
Overview

GladiaSTTService provides real-time speech recognition over Gladia’s WebSocket API. It supports 99+ languages, custom vocabulary, translation, sentiment analysis, and audio pre-processing features.

  • Gladia STT API Reference: Pipecat’s API methods for Gladia STT integration
  • Example Implementation: Complete example with interruption handling
  • Gladia Documentation: Official Gladia documentation and features
  • Gladia Platform: Access multilingual transcription and API keys

Installation

To use Gladia services, install the required dependency:
pip install "pipecat-ai[gladia]"

Prerequisites

Gladia Account Setup

Before using Gladia STT services, you need:
  1. Gladia Account: Sign up at Gladia
  2. API Key: Generate an API key from your account dashboard
  3. Region Selection: Choose your preferred region (EU-West or US-West)

Required Environment Variables

  • GLADIA_API_KEY: Your Gladia API key for authentication
  • GLADIA_REGION: Your preferred region (optional, defaults to “eu-west”)
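A small helper makes the two variables above concrete: the key is mandatory and the region falls back to "eu-west" when unset. This is a minimal sketch, not part of Pipecat; the function name load_gladia_env and the decision to raise on an unknown region are assumptions for illustration.

```python
import os
from typing import Mapping, Optional, Tuple

# Regions listed in this page's configuration reference.
VALID_REGIONS = ("eu-west", "us-west")


def load_gladia_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: read the Gladia API key and region from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("GLADIA_API_KEY is not set")
    # GLADIA_REGION is optional; fall back to "eu-west" as documented above.
    region = env.get("GLADIA_REGION") or "eu-west"
    if region not in VALID_REGIONS:
        # Assumption: fail early instead of letting session setup fail later.
        raise ValueError(f"Unsupported GLADIA_REGION: {region!r}")
    return api_key, region


if __name__ == "__main__":
    key, region = load_gladia_env({"GLADIA_API_KEY": "test-key"})
    print(region)  # eu-west
```

The returned region can be passed to the region argument of GladiaSTTService.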

Configuration

GladiaSTTService

  • api_key (str, required): Gladia API key for authentication.
  • region (Literal['us-west', 'eu-west'], default: None): Region used to process audio. Defaults to "eu-west" when None.
  • url (str, default: "https://api.gladia.io/v2/live"): Gladia API URL for session initialization.
  • confidence (float, default: None): Minimum confidence threshold for transcriptions (0.0-1.0). Deprecated; no confidence threshold is applied.
  • encoding (str, default: "wav/pcm"): Audio encoding format. Init-only; not part of the runtime-updatable settings.
  • bit_depth (int, default: 16): Audio bit depth. Init-only; not part of the runtime-updatable settings.
  • channels (int, default: 1): Number of audio channels. Init-only; not part of the runtime-updatable settings.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, the pipeline’s configured sample rate is used.
  • model (str, default: None, deprecated): Model to use for transcription. Deprecated in v0.0.105; use settings=GladiaSTTService.Settings(...) instead.
  • params (GladiaInputParams, default: None, deprecated): Additional configuration parameters. Deprecated in v0.0.105; use settings=GladiaSTTService.Settings(...) instead.
  • settings (GladiaSTTService.Settings, default: None): Runtime-configurable settings for the STT service. See Settings below.
  • max_buffer_size (int, default: 20971520): Maximum size of the audio buffer in bytes (20 MiB by default).
  • should_interrupt (bool, default: True): Whether the bot should be interrupted when Gladia VAD detects user speech.
  • ttfs_p99_latency (float, default: 1.49): P99 latency, in seconds, from the end of speech to the final transcript. Override it with a value measured for your deployment; see stt-benchmark.
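Because max_buffer_size is expressed in bytes while encoding, bit_depth, channels, and sample_rate determine how many bytes each second of audio takes, it can help to convert the limit into seconds. This worked sketch assumes uncompressed wav/pcm audio and a 16 kHz sample rate; the real rate comes from your pipeline, and buffer_seconds is a hypothetical helper.

```python
def buffer_seconds(
    max_buffer_size: int = 20_971_520,  # default from the parameter list (20 MiB)
    sample_rate: int = 16_000,  # assumption: 16 kHz pipeline audio
    bit_depth: int = 16,
    channels: int = 1,
) -> float:
    """Seconds of uncompressed PCM audio that fit in max_buffer_size bytes."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return max_buffer_size / bytes_per_second


if __name__ == "__main__":
    print(buffer_seconds())  # 655.36 (about 11 minutes)
    print(buffer_seconds(sample_rate=24_000))
```

At 16 kHz, 16-bit mono audio uses 32,000 bytes per second, so the default buffer holds roughly 11 minutes of audio.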

Settings

Runtime-configurable settings passed via the settings constructor argument using GladiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
  • model (str, default: None): STT model identifier. Inherited from the base STT settings.
  • language (Language | str, default: None): Language for speech recognition. Inherited from the base STT settings.
  • language_config (LanguageConfig, default: None): Detailed language configuration with code-switching support.
  • custom_metadata (Dict[str, Any], default: None): Additional metadata to include with requests.
  • endpointing (float, default: None): Duration of silence, in seconds, that marks the end of speech.
  • maximum_duration_without_endpointing (int, default: 5): Maximum utterance duration, in seconds, after which the utterance is ended even if no silence was detected.
  • pre_processing (PreProcessingConfig, default: None): Audio pre-processing options (audio enhancer, speech threshold).
  • realtime_processing (RealtimeProcessingConfig, default: None): Real-time processing features (custom vocabulary, translation, NER, sentiment).
  • messages_config (MessagesConfig, default: None): WebSocket message filtering options.
  • enable_vad (bool, default: False): Enable Gladia VAD for end-of-utterance detection. When enabled, do not run another VAD in the pipeline.
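To see how endpointing and maximum_duration_without_endpointing interact, here is a toy model that walks over (duration, is_speech) frames and reports when an utterance would be closed. It is a simplified illustration, not Gladia’s server-side algorithm; the class name, the frame format, and the 0.5 s endpointing value are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class EndpointingSketch:
    """Toy model of two end-of-utterance rules; not Gladia's server logic."""

    endpointing: float = 0.5  # hypothetical: seconds of silence that end an utterance
    maximum_duration_without_endpointing: float = 5.0  # seconds, matches the default above

    def utterance_ends(self, frames: Iterable[Tuple[float, bool]]) -> List[float]:
        """Return the times (s) at which utterances end.

        frames is a sequence of (duration_s, is_speech) pairs. An utterance that
        is still open when the frames run out is not reported.
        """
        ends: List[float] = []
        t = 0.0
        in_utterance = False
        start = 0.0
        silence = 0.0
        for duration, is_speech in frames:
            t += duration
            if is_speech:
                if not in_utterance:
                    in_utterance, start = True, t - duration
                silence = 0.0
            elif in_utterance:
                silence += duration
                if silence >= self.endpointing:
                    # Rule 1: enough trailing silence closes the utterance.
                    ends.append(t)
                    in_utterance = False
                    continue
            if in_utterance and t - start >= self.maximum_duration_without_endpointing:
                # Rule 2: force the end after the maximum duration, even mid-speech.
                ends.append(t)
                in_utterance = False
        return ends
```

For example, EndpointingSketch().utterance_ends([(0.25, True)] * 4 + [(0.25, False)] * 2) returns [1.5]: one second of speech followed by 0.5 s of silence. Six seconds of uninterrupted speech in 0.5 s frames is cut at 5.0 s by the second rule instead.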

Usage

Basic Setup

import os

from pipecat.services.gladia.stt import GladiaSTTService

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)

With Language Configuration

import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import LanguageConfig

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    region="us-west",
    settings=GladiaSTTService.Settings(
        model="solaria-1",
        language_config=LanguageConfig(
            languages=["en", "es"],
            code_switching=True,
        ),
    ),
)

With Real-time Processing

import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import (
    RealtimeProcessingConfig,
    CustomVocabularyConfig,
    CustomVocabularyItem,
    TranslationConfig,
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    settings=GladiaSTTService.Settings(
        realtime_processing=RealtimeProcessingConfig(
            custom_vocabulary=True,
            custom_vocabulary_config=CustomVocabularyConfig(
                vocabulary=[
                    CustomVocabularyItem(value="Pipecat", intensity=0.8),
                    "Gladia",
                ],
            ),
            translation=True,
            translation_config=TranslationConfig(
                target_languages=["fr", "de"],
                model="enhanced",
            ),
        ),
    ),
)
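In the real-time processing example, the vocabulary list mixes a plain string ("Gladia") with a weighted CustomVocabularyItem. The sketch below shows one way such a mixed list could be normalized so that every entry carries an intensity. VocabEntry, normalize_vocabulary, the default intensity of 0.5, and the 0.0-1.0 range check are illustrative assumptions, not Pipecat or Gladia behavior.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class VocabEntry:
    """Hypothetical stand-in for a vocabulary item with an intensity weight."""

    value: str
    intensity: float


def normalize_vocabulary(
    items: List[Union[str, VocabEntry]],
    default_intensity: float = 0.5,  # assumption: weight applied to plain strings
) -> List[VocabEntry]:
    """Turn a mixed list of plain strings and weighted entries into weighted entries."""
    out: List[VocabEntry] = []
    for item in items:
        if isinstance(item, str):
            # Plain strings get the default weight.
            out.append(VocabEntry(value=item, intensity=default_intensity))
        else:
            # Assumption: intensities are expected to lie between 0.0 and 1.0.
            if not 0.0 <= item.intensity <= 1.0:
                raise ValueError(f"intensity out of range for {item.value!r}")
            out.append(item)
    return out


if __name__ == "__main__":
    print(normalize_vocabulary([VocabEntry("Pipecat", 0.8), "Gladia"]))
```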

Notes

  • Session-based connection: Gladia uses a two-step connection process: first an HTTP POST to initialize a session, then a WebSocket connection to the returned session URL. The session URL and ID are managed automatically.
  • Audio buffering: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
  • Keepalive: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
  • Built-in VAD: Set enable_vad=True in Settings to use Gladia’s server-side VAD, which emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame. When using this, do not enable another VAD in your pipeline.
  • Translation: Gladia supports real-time translation to multiple target languages. Translation results are pushed as TranslationFrames.
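The audio buffering note can be pictured as a bounded replay buffer. The sketch below is an illustration under stated assumptions, not Pipecat’s implementation: the class ReplayBuffer and its take_unsent, confirm, and on_reconnect methods are hypothetical, and it assumes buffered audio is dropped once it is known to be processed (for example, covered by a final transcript).

```python
class ReplayBuffer:
    """Hypothetical bounded audio buffer that can replay audio after a reconnect.

    Keeps at most max_size bytes, drops the oldest audio when full, and tracks
    how many of the buffered bytes have already been sent.
    """

    def __init__(self, max_size: int = 20_971_520) -> None:
        self.max_size = max_size
        self._buf = bytearray()
        self._sent = 0  # bytes at the front of _buf that were already sent

    def append(self, chunk: bytes) -> None:
        self._buf.extend(chunk)
        overflow = len(self._buf) - self.max_size
        if overflow > 0:
            # Drop the oldest bytes and shift the "sent" marker to match.
            del self._buf[:overflow]
            self._sent = max(0, self._sent - overflow)

    def take_unsent(self) -> bytes:
        """Return bytes not yet sent and mark them as sent."""
        data = bytes(self._buf[self._sent:])
        self._sent = len(self._buf)
        return data

    def confirm(self, num_bytes: int) -> None:
        """Drop audio known to be processed; it no longer needs replaying."""
        num_bytes = min(num_bytes, self._sent)
        del self._buf[:num_bytes]
        self._sent -= num_bytes

    def on_reconnect(self) -> bytes:
        """After a reconnect, resend everything still held in the buffer."""
        self._sent = 0
        return self.take_unsent()
```

Trimming from the front keeps memory bounded by max_size, at the cost of losing the oldest unconfirmed audio if a disconnect lasts long enough to overflow the buffer.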
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Event Handlers

Gladia STT supports the standard service connection events:
  • on_connected: Connected to the Gladia WebSocket
  • on_disconnected: Disconnected from the Gladia WebSocket
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gladia")
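The decorator pattern above registers a coroutine per event name, and the service awaits it when that event fires, passing itself as the first argument. A minimal stand-in makes the mechanics concrete. EventEmitterSketch and its emit method are hypothetical, not Pipecat classes, and the sketch does not model how Pipecat schedules or validates handlers.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[..., Awaitable[None]]


class EventEmitterSketch:
    """Hypothetical stand-in for a service that exposes event_handler()."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def event_handler(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers an async handler for the named event."""

        def decorator(func: Handler) -> Handler:
            self._handlers[name].append(func)
            return func

        return decorator

    async def emit(self, name: str, *args: Any) -> None:
        """Await every handler registered for name, passing the service first."""
        for handler in self._handlers[name]:
            await handler(self, *args)


if __name__ == "__main__":
    stt = EventEmitterSketch()

    @stt.event_handler("on_connected")
    async def on_connected(service):
        print("Connected to Gladia")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        print("Disconnected from Gladia")

    asyncio.run(stt.emit("on_connected"))
```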