Overview

GradiumSTTService provides real-time speech recognition using Gradium’s WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.

  • Gradium STT API Reference: Pipecat’s API methods for Gradium STT integration
  • Example Implementation: Complete example with interruption handling
  • Gradium Documentation: Official Gradium STT API documentation
  • Gradium Platform: Access API keys and speech models

Installation

To use Gradium services, install the required dependency:
pip install "pipecat-ai[gradium]"

Prerequisites

Gradium Account Setup

Before using Gradium STT services, you need:
  1. Gradium Account: Sign up at Gradium
  2. API Key: Generate an API key from your account dashboard
  3. Region Selection: Choose your preferred region (EU or US)

Required Environment Variables

  • GRADIUM_API_KEY: Your Gradium API key for authentication
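
As a minimal sketch, the key can be read once at startup so that a missing variable fails early instead of surfacing later as an authentication error. The helper name `load_gradium_api_key` is hypothetical and not part of Pipecat.

```python
import os


def load_gradium_api_key(env=None):
    """Return GRADIUM_API_KEY, or raise if it is unset or blank.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GRADIUM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GRADIUM_API_KEY is not set")
    return key
```

The returned value can then be passed as `api_key=` when constructing the service.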

Configuration

GradiumSTTService

  • api_key (str, required): Gradium API key for authentication.
  • api_endpoint_base_url (str, default: "wss://eu.api.gradium.ai/api/speech/asr"): WebSocket endpoint URL. Override for different regions or custom deployments.
  • params (GradiumSTTService.InputParams, default: None, deprecated): Configuration parameters for language and delay settings. Deprecated in v0.0.105. Use settings=GradiumSTTService.Settings(...) instead.
  • json_config (str, default: None, deprecated): Optional JSON configuration string for additional model settings. Deprecated in favor of params, which is itself deprecated in favor of settings.
  • settings (GradiumSTTService.Settings, default: None): Runtime-configurable settings for the STT service. See Settings below.
  • ttfs_p99_latency (float, default: GRADIUM_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.
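
Because the default endpoint targets the EU region, a deployment in another region needs to override `api_endpoint_base_url`. The sketch below shows one way to select the URL by region code. Only the EU URL comes from this page. The US URL is an assumption made by swapping the region prefix and should be checked against the Gradium documentation. The helper `gradium_endpoint` is hypothetical.

```python
# Only the EU URL is documented on this page (it is the constructor default).
# The US URL is an ASSUMPTION formed by swapping the region prefix; verify it
# against the Gradium documentation before relying on it.
GRADIUM_ENDPOINTS = {
    "eu": "wss://eu.api.gradium.ai/api/speech/asr",
    "us": "wss://us.api.gradium.ai/api/speech/asr",  # assumed, not confirmed here
}


def gradium_endpoint(region="eu"):
    """Return the WebSocket base URL for a region code such as "eu" or "us"."""
    try:
        return GRADIUM_ENDPOINTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown Gradium region: {region!r}") from None
```

The result would be passed to the constructor as `api_endpoint_base_url=gradium_endpoint("us")`.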

Settings

Runtime-configurable settings passed via the settings constructor argument using GradiumSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | None | Expected language of the audio. (Inherited from base STT settings.) Helps ground the model to a specific language and improve transcription quality. |
| delay_in_frames | int | None | Delay in audio frames (80ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. Default is 10 (800ms). |
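
Since `delay_in_frames` accepts only a fixed set of values and each frame is 80ms, it can help to validate the value and convert it to milliseconds before constructing the service. This is a sketch built from the allowed values listed above. The helper names are hypothetical and not part of Pipecat.

```python
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)  # allowed values listed above
FRAME_MS = 80  # each frame is 80 ms of audio


def delay_ms(delay_in_frames):
    """Convert a delay in frames to milliseconds, rejecting unsupported values."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(
            f"delay_in_frames must be one of {ALLOWED_DELAYS}, got {delay_in_frames}"
        )
    return delay_in_frames * FRAME_MS


def nearest_allowed_delay(target_ms):
    """Pick the allowed frame count whose delay is closest to target_ms.

    On a tie, the smaller (lower-latency) value wins because it comes first.
    """
    return min(ALLOWED_DELAYS, key=lambda frames: abs(frames * FRAME_MS - target_ms))
```

For example, `delay_ms(10)` gives 800, matching the documented default, and `nearest_allowed_delay(1000)` selects 12 frames (960ms).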

Usage

Basic Setup

import os

from pipecat.services.gradium.stt import GradiumSTTService

# Read the API key from the environment rather than hard-coding it.
stt = GradiumSTTService(
    api_key=os.getenv("GRADIUM_API_KEY"),
)

With Language and Delay Configuration

import os

from pipecat.services.gradium.stt import GradiumSTTService
from pipecat.transcriptions.language import Language

stt = GradiumSTTService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    settings=GradiumSTTService.Settings(
        language=Language.EN,
        delay_in_frames=8,  # 8 frames x 80ms = 640ms
    ),
)
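
The Notes section lists five supported languages (German, English, Spanish, French, and Portuguese), so an application that takes the language from user input may want to check it first. The sketch below assumes languages are identified by ISO 639-1 codes and compares only the base language. Whether Gradium distinguishes regional variants is not stated on this page. The helper name is hypothetical.

```python
# ISO 639-1 codes for the languages listed in the Notes section:
# German, English, Spanish, French, Portuguese.
# ASSUMPTION: languages are identified by these two-letter codes.
SUPPORTED_LANGUAGE_CODES = frozenset({"de", "en", "es", "fr", "pt"})


def is_supported_language(code):
    """Return True if a code such as "en" or "pt-BR" maps to a supported language.

    Only the base language (the part before "-" or "_") is compared.
    """
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base in SUPPORTED_LANGUAGE_CODES
```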

Notes

  • Supported languages: German, English, Spanish, French, and Portuguese.
  • Silence flushing: When VAD detects the user has stopped speaking, the service sends silence frames to flush the transcription buffer, resulting in faster final transcripts without closing the connection.
  • Audio format: Sends audio as 24 kHz 16-bit PCM in 80ms chunks.
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
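
To make the audio format note concrete: 80ms at 24 kHz is 1,920 samples, or 3,840 bytes of 16-bit PCM per chunk. The sketch below splits raw PCM into chunks of that size and builds all-zero silence chunks like those used for flushing. It assumes mono audio, which this page does not state. Zero-padding the final chunk is a choice made for this sketch rather than documented service behaviour, and the number of silence frames the service actually sends is not specified here.

```python
SAMPLE_RATE_HZ = 24_000  # 24 kHz, from the audio format note
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_MS = 80            # 80 ms per chunk
CHANNELS = 1             # ASSUMPTION: mono; the channel count is not documented here

# 24000 * 80 / 1000 = 1920 samples per chunk, 2 bytes each = 3840 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE * CHANNELS


def chunk_pcm(pcm):
    """Split raw PCM bytes into 80 ms chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(pcm), CHUNK_BYTES):
        chunk = pcm[start:start + CHUNK_BYTES]
        chunks.append(chunk.ljust(CHUNK_BYTES, b"\x00"))
    return chunks


def silence_chunks(count):
    """Return `count` chunks of digital silence (all-zero samples)."""
    return [bytes(CHUNK_BYTES)] * count
```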

Event Handlers

Gradium STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Gradium WebSocket |
| on_disconnected | Disconnected from Gradium WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gradium")