Overview

OpenAITTSService provides high-quality text-to-speech synthesis using OpenAI’s TTS API with multiple voice models including traditional TTS models and advanced GPT-based models. The service outputs 24kHz PCM audio with streaming capabilities for real-time applications.

OpenAI TTS API Reference

Pipecat’s API methods for OpenAI TTS integration

Example Implementation

Complete example with voice customization

OpenAI Documentation

Official OpenAI TTS API documentation

Voice Samples

Listen to available voice options

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI TTS services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an API key from your API keys page
  3. Voice Selection: Choose from available voice options (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse)
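Before constructing the service, it can be useful to check a requested voice against the documented list. The helper below is a hypothetical sketch (`KNOWN_VOICES` and `validate_voice` are illustrative names, not part of Pipecat's API):

```python
# Hypothetical helper: validate a voice name against the documented list.
# KNOWN_VOICES and validate_voice are illustrative, not part of Pipecat.
KNOWN_VOICES = {
    "alloy", "ash", "ballad", "cedar", "coral", "echo", "fable",
    "marin", "nova", "onyx", "sage", "shimmer", "verse",
}

def validate_voice(voice: str) -> str:
    """Return the voice name if it is documented, otherwise raise ValueError."""
    if voice not in KNOWN_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {sorted(KNOWN_VOICES)}"
        )
    return voice
```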

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
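A minimal sketch for reading the key with a fail-fast check, so a missing variable surfaces at startup rather than on the first API call (the `require_api_key` helper is illustrative, not part of Pipecat):

```python
import os

def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot")
    return key
```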

Configuration

OpenAITTSService

  • api_key (str, default: None): OpenAI API key for authentication. If None, uses the OPENAI_API_KEY environment variable.
  • base_url (str, default: None): Custom base URL for the OpenAI API. If None, uses the default OpenAI endpoint.
  • voice (str, default: "alloy", deprecated): Voice ID to use for synthesis. Options: alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.
  • model (str, default: "gpt-4o-mini-tts", deprecated): TTS model to use. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.
  • sample_rate (int, default: None): Output audio sample rate in Hz. If None, uses OpenAI's default of 24kHz. OpenAI TTS only supports 24kHz output.
  • params (InputParams, default: None, deprecated): Runtime-configurable voice and generation settings. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.
  • settings (OpenAITTSService.Settings, default: None): Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using OpenAITTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | TTS model identifier. (Inherited from base settings.) |
| voice | str | None | Voice identifier. (Inherited from base settings.) |
| language | Language \| str | None | Language for synthesis. (Inherited from base settings.) |
| instructions | str | NOT_GIVEN | Instructions to guide voice synthesis behavior (e.g. affect, tone, pacing). |
| speed | float | NOT_GIVEN | Voice speed control (0.25 to 4.0). |
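Since speed must stay within the 0.25 to 4.0 range, user-supplied values can be clamped before they reach the service. This is a hypothetical convenience helper (`clamp_speed` is not part of Pipecat's API):

```python
# Hypothetical helper: keep a requested speed inside OpenAI TTS's
# supported 0.25-4.0 range. Not part of Pipecat's API.
def clamp_speed(speed: float) -> float:
    """Clamp a requested playback speed into the supported 0.25-4.0 range."""
    return max(0.25, min(4.0, speed))
```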

Usage

Basic Setup

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAITTSService.Settings(
        voice="nova",
    ),
)

With Voice Customization

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAITTSService.Settings(
        voice="coral",
        model="gpt-4o-mini-tts",
        instructions="Speak in a warm, friendly tone with moderate pacing.",
        speed=1.1,
    ),
)

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.openai.tts import OpenAITTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=OpenAITTSSettings(
            instructions="Now speak more formally.",
            speed=0.9,
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Fixed sample rate: OpenAI TTS always outputs audio at 24kHz. Using a different sample rate may cause issues.
  • Model selection: The gpt-4o-mini-tts model supports the instructions parameter for controlling voice affect and tone, which traditional TTS models do not support.
  • HTTP-based service: OpenAI TTS uses HTTP streaming, so it does not have WebSocket connection events.
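Because the output rate is fixed at 24kHz, buffer sizing is easy to compute up front. The sketch below assumes 16-bit mono PCM, which is an assumption on top of the "24kHz PCM" stated above (the `pcm_bytes_for_duration` name is illustrative):

```python
# Sketch of buffer-size math for OpenAI TTS output.
# Assumes 16-bit mono PCM at the fixed 24kHz rate; the 16-bit/mono part
# is an assumption, and this helper is not part of Pipecat's API.
SAMPLE_RATE = 24_000      # Hz, fixed by OpenAI TTS
BYTES_PER_SAMPLE = 2      # 16-bit PCM (assumed)
CHANNELS = 1              # mono (assumed)

def pcm_bytes_for_duration(seconds: float) -> int:
    """Return the number of PCM bytes covering the given duration."""
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
```

Under these assumptions, one second of audio is 48,000 bytes, which can inform jitter-buffer or chunk sizes in a real-time pipeline.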