Overview

MiniMaxHttpTTSService provides text-to-speech synthesis through MiniMax’s T2A (Text-to-Audio) API, with streaming output, emotional voice control, and support for multiple languages. The available models trade latency against audio quality: turbo variants favor low latency, and hd variants favor high-definition audio.

  • MiniMax TTS API Reference: Pipecat’s API methods for MiniMax TTS integration
  • Example Implementation: Complete example with emotional voice settings
  • MiniMax Documentation: Official MiniMax T2A API documentation
  • MiniMax Platform: Access voice models and API credentials

Installation

To use MiniMax services, no additional dependencies are required beyond the base installation:
pip install "pipecat-ai"

Prerequisites

MiniMax Account Setup

Before using MiniMax TTS services, you need:
  1. MiniMax Account: Sign up at MiniMax Platform
  2. API Credentials: Get your API key and Group ID from the platform
  3. Voice Selection: Choose from available voice models and emotional settings

Required Environment Variables

  • MINIMAX_API_KEY: Your MiniMax API key for authentication
  • MINIMAX_GROUP_ID: Your MiniMax group ID
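
As a minimal sketch, the two variables above can be read once at startup so that a missing value fails early with a clear message instead of surfacing later as an authentication error. The helper name load_minimax_credentials is hypothetical and is not part of Pipecat.

```python
import os


def load_minimax_credentials(env=None):
    """Return (api_key, group_id), or raise if either variable is unset.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    # Allow a plain dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    missing = [
        name
        for name in ("MINIMAX_API_KEY", "MINIMAX_GROUP_ID")
        if not env.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["MINIMAX_API_KEY"], env["MINIMAX_GROUP_ID"]
```

The returned pair can then be passed as api_key and group_id when constructing the service.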

Configuration

MiniMaxHttpTTSService

  • api_key (str, required): MiniMax API key for authentication.
  • group_id (str, required): MiniMax Group ID that identifies your project.
  • voice_id (str, default: "Calm_Woman", deprecated): Voice identifier for synthesis. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • model (str, default: "speech-02-turbo", deprecated): TTS model name. Options include speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • base_url (str, default: "https://api.minimax.io/v1/t2a_v2"): API base URL. Use https://api.minimaxi.chat/v1/t2a_v2 for mainland China or https://api-uw.minimax.io/v1/t2a_v2 for western United States.
  • aiohttp_session (aiohttp.ClientSession, required): An aiohttp session for HTTP requests.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: None, deprecated): Runtime-configurable voice and generation settings. See InputParams below. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • settings (MiniMaxHttpTTSService.Settings, default: None): Runtime-configurable settings. See Settings below.
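
The base_url parameter accepts one of three regional endpoints. A small lookup table, sketched below, keeps that choice in one place. The URLs are the ones listed for base_url; the region labels and the function name minimax_base_url are hypothetical and exist only for this example.

```python
# Endpoint URLs are taken from the base_url description; the region keys
# ("global", "china", "us-west") are hypothetical labels for this example.
MINIMAX_BASE_URLS = {
    "global": "https://api.minimax.io/v1/t2a_v2",
    "china": "https://api.minimaxi.chat/v1/t2a_v2",
    "us-west": "https://api-uw.minimax.io/v1/t2a_v2",
}


def minimax_base_url(region="global"):
    """Return the T2A endpoint for a region label, or raise ValueError."""
    try:
        return MINIMAX_BASE_URLS[region]
    except KeyError:
        expected = ", ".join(sorted(MINIMAX_BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {expected}") from None
```

The result would be passed as base_url=minimax_base_url("us-west") when constructing the service.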

Settings

Runtime-configurable settings passed via the settings constructor argument using MiniMaxHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter            Type             Default      Description
model                str              None         Model identifier. (Inherited.)
voice                str              None         Voice identifier. (Inherited.)
language             Language | str   None         Language for synthesis. (Inherited.)
speed                float            NOT_GIVEN    Speech speed.
volume               float            NOT_GIVEN    Volume level.
pitch                int              NOT_GIVEN    Pitch adjustment.
emotion              str              NOT_GIVEN    Emotion for synthesis.
text_normalization   bool             NOT_GIVEN    Whether to apply text normalization.
latex_read           bool             NOT_GIVEN    Whether to read LaTeX formulas.
language_boost       str              NOT_GIVEN    Language boost setting.
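
The NOT_GIVEN default differs from None: it marks a field the caller never set. The sketch below illustrates that sentinel pattern in isolation, under the assumption that unset fields are simply left out of a request so the provider's own default applies. The names _NotGiven, VoiceSettings, and given_fields are hypothetical stand-ins and do not reproduce Pipecat's implementation.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


class _NotGiven:
    """Sentinel type meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # stand-in for illustration only


@dataclass
class VoiceSettings:
    """Hypothetical subset of the settings listed above."""

    speed: Any = NOT_GIVEN
    volume: Any = NOT_GIVEN
    pitch: Any = NOT_GIVEN
    emotion: Any = NOT_GIVEN


def given_fields(settings: VoiceSettings) -> Dict[str, Any]:
    """Return only the fields that were explicitly set by the caller."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not NOT_GIVEN
    }
```

With this pattern, VoiceSettings(speed=1.2, emotion="happy") yields only the speed and emotion entries, while volume and pitch stay out of the result entirely.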

Usage

Basic Setup

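import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService


async def main():
    # The service sends its requests through a caller-provided aiohttp session.
    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(
            api_key=os.getenv("MINIMAX_API_KEY"),
            group_id=os.getenv("MINIMAX_GROUP_ID"),
            aiohttp_session=session,
        )
        # Build and run the pipeline here, while the session is still open.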

With Voice Customization

import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language


async def main():
    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(
            api_key=os.getenv("MINIMAX_API_KEY"),
            group_id=os.getenv("MINIMAX_GROUP_ID"),
            aiohttp_session=session,
            # Voice, model, and generation options all go through Settings.
            settings=MiniMaxHttpTTSService.Settings(
                voice="Calm_Woman",
                model="speech-02-hd",
                language=Language.ZH,
                speed=1.2,
                emotion="happy",
            ),
        )
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • HTTP-based streaming: MiniMax uses an HTTP streaming API, not WebSocket. Audio data is returned in hex-encoded PCM chunks.
  • Emotional voice control: The emotion parameter lets you adjust the emotional tone of the voice without changing the voice model itself.
  • Model selection: The speech-2.6-* models are the latest and support additional languages (Filipino, Tamil, Persian). Use turbo variants for lower latency or hd variants for higher quality.
  • Class name: The Python class is named MiniMaxHttpTTSService, not MiniMaxTTSService.
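
To make the hex-encoded PCM note concrete, the sketch below turns one hex string into raw bytes and then into integer samples. The function names are hypothetical, and the signed 16-bit little-endian sample layout is an assumption made for this example; check the MiniMax documentation for the actual format of your configured output.

```python
import struct
from typing import List


def decode_hex_pcm(hex_chunk: str) -> bytes:
    """Convert one hex-encoded audio chunk into raw bytes."""
    return bytes.fromhex(hex_chunk)


def pcm16le_samples(pcm: bytes) -> List[int]:
    """Interpret raw bytes as signed 16-bit little-endian samples.

    The 16-bit little-endian layout is an assumption for this example.
    A trailing odd byte, if any, is ignored.
    """
    count = len(pcm) // 2
    return list(struct.unpack("<" + "h" * count, pcm[: count * 2]))


raw = decode_hex_pcm("0100ff7f0080")
samples = pcm16le_samples(raw)  # [1, 32767, -32768]
```

Each pair of hex characters becomes one byte, so a decoded chunk is half the length of the string that carried it.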