Overview

CartesiaSTTService provides real-time speech recognition using Cartesia’s WebSocket API with the ink-whisper model, supporting streaming transcription with both interim and final results for low-latency applications.

  • Cartesia STT API Reference: Pipecat’s API methods for Cartesia STT integration
  • Example Implementation: Complete example with transcription logging
  • Cartesia Documentation: Official Cartesia STT documentation and features
  • Cartesia Platform: Access API keys and transcription models

Installation

To use Cartesia services, install the required dependency:
pip install "pipecat-ai[cartesia]"

Prerequisites

Cartesia Account Setup

Before using Cartesia STT services, you need:
  1. Cartesia Account: Sign up at Cartesia
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to the ink-whisper transcription model

Required Environment Variables

  • CARTESIA_API_KEY: Your Cartesia API key for authentication
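
A small startup check can surface a missing key before the service is constructed. The sketch below is one way to do that; the helper name `load_cartesia_api_key` is hypothetical and not part of Pipecat.

```python
import os


def load_cartesia_api_key(env=None):
    """Return the Cartesia API key from the environment, or raise if unset.

    `load_cartesia_api_key` is a hypothetical helper, not a Pipecat API.
    """
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("CARTESIA_API_KEY is not set")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the WebSocket connection.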

Configuration

CartesiaSTTService

  • api_key (str, required): Cartesia API key for authentication.
  • base_url (str, default ""): Custom API endpoint URL. If empty, defaults to "api.cartesia.ai". Override for proxied deployments.
  • encoding (str, default "pcm_s16le"): Audio encoding format.
  • sample_rate (int, default None): Audio sample rate in Hz.
  • live_options (Optional[CartesiaLiveOptions], default None, deprecated): Configuration options for the transcription service. Deprecated in v0.0.105. Use settings=CartesiaSTTService.Settings(...) for model/language and direct init parameters for encoding/sample_rate instead.
  • settings (CartesiaSTTService.Settings, default None): Runtime-configurable settings for the STT service. See Settings below.
  • ttfs_p99_latency (float, default CARTESIA_TTFS_P99): P99 latency from speech end to final transcript in seconds. Override for your deployment.
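
If you measure speech-end-to-final-transcript latency in your own deployment, the 99th percentile of those samples is a natural candidate for ttfs_p99_latency. The sketch below uses the nearest-rank method, which is one common percentile definition and an assumption here. The sample values are made up for illustration, and `p99_latency` is a hypothetical helper, not a Pipecat API.

```python
def p99_latency(samples_s):
    """Nearest-rank 99th percentile of latency samples, in seconds."""
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    # Nearest rank: ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


# Illustrative (made-up) measurements in seconds.
measured = [0.21, 0.25, 0.19, 0.32, 0.28, 0.90, 0.24, 0.22, 0.27, 0.23]
ttfs_p99 = p99_latency(measured)
```

The result could then be passed as ttfs_p99_latency=ttfs_p99 when constructing CartesiaSTTService. With only a handful of samples the P99 is simply the slowest observation, so a larger sample gives a more meaningful value.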

Settings

Runtime-configurable settings passed via the settings constructor argument using CartesiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter   Type             Default         Description
model       str              "ink-whisper"   The transcription model to use. (Inherited from base STT settings.)
language    Language | str   "en"            Target language for transcription. (Inherited from base STT settings.)

Usage

Basic Setup

import os

from pipecat.services.cartesia.stt import CartesiaSTTService

# Read the API key from the CARTESIA_API_KEY environment variable.
stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

With Custom Options

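import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    # Model and language are runtime-configurable settings.
    settings=CartesiaSTTService.Settings(
        model="ink-whisper",
        language="es",
    ),
    # Audio sample rate in Hz is a direct init parameter.
    sample_rate=16000,
)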

Notes

  • Inactivity timeout: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
  • Auto-reconnect on send: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
  • Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, the service sends a "finalize" command to flush the transcription session and produce a final result.
The InputParams / params= / live_options= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
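
To make the keepalive idea concrete, the sketch below shows how silence in the default pcm_s16le encoding can be generated and how a sender might decide that a keepalive is due. It is not Pipecat's implementation: the function names, the mono 20 ms chunk, and the 30-second interval are assumptions chosen for illustration. Only the 3-minute timeout comes from the notes above.

```python
INACTIVITY_TIMEOUT_S = 180.0  # 3 minutes, per the inactivity note
KEEPALIVE_INTERVAL_S = 30.0   # assumed interval, well below the timeout


def silence_chunk(sample_rate, duration_ms=20):
    """Return mono pcm_s16le silence: two zero bytes per sample."""
    num_samples = sample_rate * duration_ms // 1000
    return bytes(2 * num_samples)


def keepalive_due(last_sent_at, now, interval=KEEPALIVE_INTERVAL_S):
    """True when no message has been sent for at least `interval` seconds."""
    return (now - last_sent_at) >= interval


chunk = silence_chunk(16000)  # 320 samples -> 640 bytes
```

Because the timeout resets with each message sent, any interval comfortably shorter than 3 minutes would keep the connection open under these assumptions.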

Event Handlers

Cartesia STT supports the standard service connection events:
Event             Description
on_connected      Connected to Cartesia WebSocket
on_disconnected   Disconnected from Cartesia WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia STT")
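
Both events can be combined to track connection state, for example to count disconnects caused by the inactivity timeout. The sketch below is self-contained and uses a stand-in object instead of a real service. `ConnectionTracker` is a hypothetical helper, and the registration lines in the trailing comment assume the decorator returned by stt.event_handler(...) can also be applied as a plain function call.

```python
import asyncio


class ConnectionTracker:
    """Hypothetical helper that records connection state changes."""

    def __init__(self):
        self.connected = False
        self.disconnects = 0

    async def on_connected(self, service):
        self.connected = True

    async def on_disconnected(self, service):
        self.connected = False
        self.disconnects += 1


async def demo():
    tracker = ConnectionTracker()
    service = object()  # stand-in for a CartesiaSTTService instance
    await tracker.on_connected(service)
    await tracker.on_disconnected(service)
    await tracker.on_connected(service)
    return tracker


tracker = asyncio.run(demo())

# Assumed registration with a real service:
#   stt.event_handler("on_connected")(tracker.on_connected)
#   stt.event_handler("on_disconnected")(tracker.on_disconnected)
```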