Gladia

Overview

GladiaSTTService provides real-time speech recognition using Gladia’s WebSocket API. It supports 99+ languages, custom vocabulary, real-time translation, sentiment analysis, and audio pre-processing.

Gladia STT API Reference

Pipecat’s API methods for Gladia STT integration

Example Implementation

Complete example with interruption handling

Gladia Documentation

Official Gladia documentation and features

Gladia Platform

Access multilingual transcription and API keys

Installation

To use Gladia services, install the required dependency:

```bash
pip install "pipecat-ai[gladia]"
```

Prerequisites

Gladia Account Setup

Before using Gladia STT services, you need:

Gladia Account: Sign up at Gladia
API Key: Generate an API key from your account dashboard
Region Selection: Choose your preferred region (EU-West or US-West)

Required Environment Variables

GLADIA_API_KEY: Your Gladia API key for authentication
GLADIA_REGION: Your preferred region (optional, defaults to “eu-west”)
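The sketch below loads these variables before you construct the service. It assumes you want to fail fast when the key is missing and reject any region other than the two documented ones. The helper name `load_gladia_env` is hypothetical and is not part of Pipecat.

```python
import os
from typing import Mapping, Optional, Tuple

# The two regions documented for GladiaSTTService.
VALID_REGIONS = ("eu-west", "us-west")


def load_gladia_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (api_key, region) read from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("GLADIA_API_KEY is not set")
    # GLADIA_REGION is optional; fall back to the documented default.
    region = env.get("GLADIA_REGION") or "eu-west"
    if region not in VALID_REGIONS:
        raise ValueError(f"Unsupported GLADIA_REGION: {region!r}")
    return api_key, region
```

You would then pass the results through as `GladiaSTTService(api_key=api_key, region=region)`.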

Configuration

GladiaSTTService

api_key

str

required

Gladia API key for authentication.

region

Literal['us-west', 'eu-west']

default:"None"

Region used to process audio. Defaults to "eu-west" when None.

url

str

default:"https://api.gladia.io/v2/live"

Gladia API URL for session initialization.

confidence

float

default:"None"

Minimum confidence threshold for transcriptions (0.0-1.0). Deprecated — no confidence threshold is applied.

encoding

str

default:"wav/pcm"

Audio encoding format. Init-only — not part of runtime-updatable settings.

bit_depth

int

default:"16"

Audio bit depth. Init-only — not part of runtime-updatable settings.

channels

int

default:"1"

Number of audio channels. Init-only — not part of runtime-updatable settings.

sample_rate

int

default:"None"

Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
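For raw PCM, the init-only format parameters set the byte rate: sample_rate × (bit_depth / 8) × channels. The sketch below computes it, along with the size of one audio chunk. The 16 kHz rate and 20 ms chunk length are example values, not Pipecat defaults, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate: int, bit_depth: int = 16, channels: int = 1) -> int:
    """Raw PCM byte rate: samples per second * bytes per sample * channels."""
    if bit_depth % 8:
        raise ValueError("bit_depth must be a multiple of 8")
    return sample_rate * (bit_depth // 8) * channels


def chunk_size_bytes(sample_rate: int, chunk_ms: int, bit_depth: int = 16, channels: int = 1) -> int:
    """Number of bytes in one chunk lasting chunk_ms milliseconds."""
    return pcm_bytes_per_second(sample_rate, bit_depth, channels) * chunk_ms // 1000


# Example: 16 kHz, 16-bit mono audio sent in 20 ms chunks.
print(pcm_bytes_per_second(16000))  # 32000
print(chunk_size_bytes(16000, 20))  # 640
```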

model

str

default:"None"

deprecated

Model to use for transcription. Deprecated in v0.0.105. Use settings=GladiaSTTService.Settings(...) instead.

params

GladiaInputParams

default:"None"

deprecated

Additional configuration parameters. Deprecated in v0.0.105. Use settings=GladiaSTTService.Settings(...) instead.

settings

GladiaSTTService.Settings

default:"None"

Runtime-configurable settings for the STT service. See Settings below.

max_buffer_size

int

default:"20971520"

Maximum size of audio buffer in bytes (default 20MB).
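At 16 kHz, 16-bit mono (32,000 bytes per second), the default 20,971,520-byte (20 MiB) buffer holds about 655 seconds of audio. The sketch below models a size-capped buffer that keeps the most recent audio and can replay it after a reconnect, as the Notes section describes. It is a hypothetical illustration: the class name and the drop-oldest eviction policy are assumptions, not Pipecat's actual implementation.

```python
from collections import deque
from typing import Deque, Iterator


class BoundedAudioBuffer:
    """Hypothetical size-capped buffer that drops the oldest chunks once max_bytes is exceeded."""

    def __init__(self, max_bytes: int = 20 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self._chunks: Deque[bytes] = deque()
        self._size = 0

    @property
    def size(self) -> int:
        """Total number of buffered bytes."""
        return self._size

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Assumption: evict whole chunks from the front until the buffer fits again.
        while self._size > self.max_bytes and self._chunks:
            self._size -= len(self._chunks.popleft())

    def replay(self) -> Iterator[bytes]:
        """Yield buffered chunks oldest first, e.g. to re-send them after reconnecting."""
        yield from list(self._chunks)


# Capacity of the default buffer, in seconds, for 16 kHz, 16-bit mono audio.
bytes_per_second = 16000 * 2 * 1
print(20971520 / bytes_per_second)  # 655.36
```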

should_interrupt

bool

default:"True"

Whether the bot should be interrupted when Gladia VAD detects user speech.

ttfs_p99_latency

float

default:"1.49"

P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
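To derive a value for your deployment, measure the latency from speech end to final transcript yourself and take the 99th percentile. The sketch below uses the nearest-rank method. The sample latencies are made up for illustration, and how you collect real measurements is up to you.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Illustrative measurements in seconds: 98 fast results and two slow outliers.
latencies = [0.5] * 98 + [1.2, 2.0]
print(percentile_nearest_rank(latencies, 99))  # 1.2
```

The result can then be passed as `ttfs_p99_latency=...`.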

Settings

Runtime-configurable settings passed via the settings constructor argument using GladiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `None` | STT model identifier. (Inherited from base STT settings.) |
| `language` | `Language \| str` | `None` | Language for speech recognition. (Inherited from base STT settings.) |
| `language_config` | `LanguageConfig` | `None` | Detailed language configuration with code-switching support. |
| `custom_metadata` | `Dict[str, Any]` | `None` | Additional metadata to include with requests. |
| `endpointing` | `float` | `None` | Duration of silence, in seconds, that marks the end of an utterance. |
| `maximum_duration_without_endpointing` | `int` | `5` | Maximum utterance duration, in seconds, before the utterance ends even if no silence is detected. |
| `pre_processing` | `PreProcessingConfig` | `None` | Audio pre-processing options (audio enhancer, speech threshold). |
| `realtime_processing` | `RealtimeProcessingConfig` | `None` | Real-time processing features (custom vocabulary, translation, NER, sentiment). |
| `messages_config` | `MessagesConfig` | `None` | WebSocket message filtering options. |
| `enable_vad` | `bool` | `False` | Enable Gladia's VAD for end-of-utterance detection. Do not combine it with another VAD in the pipeline. |
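`endpointing` and `maximum_duration_without_endpointing` work together. An utterance normally ends after `endpointing` seconds of silence. If no such pause occurs, it ends once `maximum_duration_without_endpointing` seconds have passed. The sketch below is a simplified client-side model of that interaction, applied to fixed-length frames flagged as speech or silence. It is an assumption-based illustration, not Gladia's server-side algorithm. Times are integer milliseconds to avoid floating-point drift, so `endpointing=0.3` corresponds to `endpointing_ms=300`. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def utterance_end_times(
    frames: Sequence[bool],
    frame_ms: int = 100,
    endpointing_ms: int = 300,
    max_without_endpointing_ms: int = 5000,
) -> List[int]:
    """Return the times (ms) at which utterances end; frames[i] is True for speech."""
    ends: List[int] = []
    start: Optional[int] = None  # start time of the current utterance, if any
    silence_ms = 0  # continuous silence inside the current utterance
    for i, is_speech in enumerate(frames):
        frame_end = (i + 1) * frame_ms
        if start is None:
            if not is_speech:
                continue  # silence before any speech does not start an utterance
            start = i * frame_ms
            silence_ms = 0
        silence_ms = 0 if is_speech else silence_ms + frame_ms
        # End on enough trailing silence, or when the utterance has run too long.
        if silence_ms >= endpointing_ms or frame_end - start >= max_without_endpointing_ms:
            ends.append(frame_end)
            start = None
            silence_ms = 0
    return ends


print(utterance_end_times([True] * 5 + [False] * 3))  # [800]: ended by 300 ms of silence
print(utterance_end_times([True] * 60))  # [5000]: cut off by the 5 s maximum
```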

Usage

Basic Setup

```python
import os

from pipecat.services.gladia.stt import GladiaSTTService

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)
```

With Language Configuration

```python
import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import LanguageConfig

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    region="us-west",
    settings=GladiaSTTService.Settings(
        model="solaria-1",
        # Recognize English and Spanish, switching between them mid-conversation.
        language_config=LanguageConfig(
            languages=["en", "es"],
            code_switching=True,
        ),
    ),
)
```

With Real-time Processing

```python
import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import (
    RealtimeProcessingConfig,
    CustomVocabularyConfig,
    CustomVocabularyItem,
    TranslationConfig,
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    settings=GladiaSTTService.Settings(
        realtime_processing=RealtimeProcessingConfig(
            # Boost recognition of domain-specific terms.
            custom_vocabulary=True,
            custom_vocabulary_config=CustomVocabularyConfig(
                vocabulary=[
                    CustomVocabularyItem(value="Pipecat", intensity=0.8),
                    "Gladia",
                ],
            ),
            # Translate transcripts into French and German in real time.
            translation=True,
            translation_config=TranslationConfig(
                target_languages=["fr", "de"],
                model="enhanced",
            ),
        ),
    ),
)
```

Notes

Session-based connection: Gladia uses a two-step connection process: first an HTTP POST to initialize a session, then a WebSocket connection to the returned session URL. The session URL and ID are managed automatically.
Audio buffering: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
Keepalive: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
Built-in VAD: Set enable_vad=True in Settings to use Gladia’s server-side VAD, which emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame. When using this, do not enable another VAD in your pipeline.
Translation: Gladia supports real-time translation to multiple target languages. Translation results are pushed as TranslationFrames.
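The keepalive behaviour described above can be modelled as two timers. One sends an empty chunk whenever nothing has been sent for 5 s. The other treats the connection as dead after 20 s without activity. This is a hypothetical, deterministic sketch, not Pipecat's internal code. The class name is invented, and the rule that the 20 s timeout counts from the last message received is an assumption. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveTracker:
    """Hypothetical model of the keepalive timers (all times in seconds)."""

    interval: float = 5.0  # send an empty chunk after this much time without sending
    timeout: float = 20.0  # assumption: give up after this long with no server message
    last_sent: float = 0.0
    last_received: float = 0.0

    def __post_init__(self) -> None:
        if self.interval >= self.timeout:
            raise ValueError("keepalive interval must be shorter than the timeout")

    def on_send(self, now: float) -> None:
        self.last_sent = now

    def on_receive(self, now: float) -> None:
        self.last_received = now

    def should_send_keepalive(self, now: float) -> bool:
        return now - self.last_sent >= self.interval

    def timed_out(self, now: float) -> bool:
        return now - self.last_received >= self.timeout


# A sender loop would check this on every tick and send b"" when it returns True.
tracker = KeepaliveTracker()
tracker.on_send(1.0)
print(tracker.should_send_keepalive(4.0))  # False: only 3 s since the last send
print(tracker.should_send_keepalive(6.0))  # True
```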

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Event Handlers

Gladia STT supports the standard service connection events:

| Event | Description |
| --- | --- |
| `on_connected` | Connected to Gladia WebSocket |
| `on_disconnected` | Disconnected from Gladia WebSocket |

```python
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gladia")
```
