
Overview

GrokRealtimeLLMService provides real-time, multimodal conversation capabilities using xAI’s Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.

  • Grok Realtime API Reference: Pipecat’s API methods for Grok Realtime integration
  • Example Implementation: Complete Grok Realtime conversation example
  • Grok Voice Documentation: Official xAI Grok Voice Agent API documentation
  • xAI Console: Access Grok models and manage API keys

Installation

To use Grok Realtime services, install the required dependencies:
pip install "pipecat-ai[grok]"

Prerequisites

xAI Account Setup

Before using Grok Realtime services, you need:
  1. xAI Account: Sign up at xAI Console
  2. API Key: Generate a Grok API key from your account dashboard
  3. Model Access: Ensure access to Grok Voice Agent models
  4. Usage Limits: Configure appropriate usage limits and billing

Required Environment Variables

  • GROK_API_KEY: Your xAI API key for authentication
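Because the service is constructed with os.getenv("GROK_API_KEY"), an unset variable silently passes None as the key, and the problem likely surfaces only later as a failed connection. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of Pipecat.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, raising immediately if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the pipeline")
    return value


# Hypothetical usage before constructing the service:
# llm = GrokRealtimeLLMService(api_key=require_env("GROK_API_KEY"))
```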

Key Features

  • Real-time Speech-to-Speech: Direct audio processing with low latency
  • Multilingual Support: Conversations in multiple languages
  • Voice Activity Detection: Server-side VAD for automatic speech detection
  • Function Calling: Custom function tools plus built-in web, X, and file search tools
  • Multiple Voice Options: Five voices (Ara, Rex, Sal, Eve, and Leo)
  • WebSocket Support: Real-time bidirectional audio streaming

Configuration

GrokRealtimeLLMService

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | xAI API key for authentication. |
| base_url | str | "wss://api.x.ai/v1/realtime" | WebSocket base URL for the Grok Realtime API. Override for custom deployments. |
| session_properties | SessionProperties | None | Deprecated in v0.0.105; use settings=GrokRealtimeLLMService.Settings(session_properties=...) instead. Configuration properties for the realtime session. If None, uses default SessionProperties with voice "Ara" and server-side VAD enabled. See SessionProperties below. |
| settings | GrokRealtimeLLMService.Settings | None | Runtime-configurable settings. See Settings below. |
| start_audio_paused | bool | False | Whether to start with audio input paused. |

Settings

Runtime-configurable settings passed via the settings constructor argument using GrokRealtimeLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | NOT_GIVEN | Model identifier. (Inherited from base settings.) |
| system_instruction | str | NOT_GIVEN | System instruction/prompt. (Inherited from base settings.) |
| session_properties | SessionProperties | NOT_GIVEN | Session-level configuration (voice, audio config, tools, etc.). |
NOT_GIVEN values are omitted, letting the service use its own defaults. Only parameters that are explicitly set are included.
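The NOT_GIVEN convention matters because None can itself be a meaningful value (for example, turn_detection=None selects manual turn detection). The sketch below is a simplified mental model, not Pipecat's implementation: a sentinel distinct from None marks unset fields, only explicitly set fields are kept, and a runtime update overlays just those fields. `ExampleSettings`, `explicit_fields`, and `apply_delta` are hypothetical names, and this flat overlay does not claim to describe how Pipecat merges nested session_properties.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGiven:
    """Sentinel meaning 'the caller did not set this field' (distinct from None)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass
class ExampleSettings:
    # Illustrative fields mirroring the table above; not Pipecat's settings class.
    model: Any = NOT_GIVEN
    system_instruction: Any = NOT_GIVEN
    session_properties: Any = NOT_GIVEN

    def explicit_fields(self) -> dict:
        """Keep only fields that were explicitly set, including ones set to None."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


def apply_delta(current: dict, delta: ExampleSettings) -> dict:
    """Overlay the explicitly set fields of `delta` onto `current`; everything else is untouched."""
    return {**current, **delta.explicit_fields()}
```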

SessionProperties

| Parameter | Type | Default | Description |
|---|---|---|---|
| instructions | str | None | System instructions for the assistant. |
| voice | Literal["Ara", "Rex", "Sal", "Eve", "Leo"] | "Ara" | Voice the model uses to respond. |
| turn_detection | TurnDetection | TurnDetection(type="server_vad") | Turn detection configuration. Set to None for manual turn detection. |
| audio | AudioConfiguration | None | Configuration for input and output audio formats. |
| tools | List[GrokTool] | None | Available tools: web_search, x_search, file_search, or custom function tools. |
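The voice field is typed as a Literal, so a static type checker can flag a misspelled name, but a value read from configuration at runtime is not checked that way. A minimal sketch of a runtime check using the standard typing.get_args follows; `GrokVoice` and `check_voice` are hypothetical names, the voice list is copied from the table above, and the case-sensitive comparison is this sketch's choice rather than documented API behavior.

```python
from typing import Literal, get_args

# Voice names as listed in the SessionProperties table above.
GrokVoice = Literal["Ara", "Rex", "Sal", "Eve", "Leo"]
VALID_VOICES = get_args(GrokVoice)


def check_voice(voice: str) -> str:
    """Raise before connecting if `voice` is outside the documented set (case-sensitive)."""
    if voice not in VALID_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {', '.join(VALID_VOICES)}")
    return voice
```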

AudioConfiguration

The audio field in SessionProperties accepts an AudioConfiguration with two sub-configurations, input and output.

AudioInput (audio.input):
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | AudioFormat | None | Input audio format. Supports PCMAudioFormat (configurable rate), PCMUAudioFormat (8 kHz), or PCMAAudioFormat (8 kHz). |

AudioOutput (audio.output):
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | AudioFormat | None | Output audio format. Same format options as input. |
Grok PCM audio supports sample rates: 8000, 16000, 21050, 24000, 32000, 44100, and 48000 Hz.
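When choosing a PCMAudioFormat rate, it can help to check it against the list above and to know how much data each audio chunk carries. A minimal sketch follows. It copies the rate list above verbatim and assumes 16-bit (2-byte) mono linear PCM, which is an assumption about sample width that this page does not state; `pcm_chunk_bytes` is a hypothetical helper.

```python
# Sample rates listed above for Grok PCM audio (copied verbatim).
GROK_PCM_RATES = (8000, 16000, 21050, 24000, 32000, 44100, 48000)


def pcm_chunk_bytes(sample_rate: int, chunk_ms: int = 20, sample_width: int = 2, channels: int = 1) -> int:
    """Bytes in one chunk of raw PCM: samples per chunk * bytes per sample * channels."""
    if sample_rate not in GROK_PCM_RATES:
        raise ValueError(f"{sample_rate} Hz is not in the documented Grok PCM rates")
    samples = sample_rate * chunk_ms // 1000
    return samples * sample_width * channels
```

For example, a 20 ms chunk at 16 kHz is 320 samples, or 640 bytes of 16-bit mono PCM.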

Built-in Tools

Grok provides several built-in tools in addition to custom function tools:
| Tool | Type | Description |
|---|---|---|
| WebSearchTool | web_search | Search the web for current information. |
| XSearchTool | x_search | Search X (Twitter) for posts. Supports an allowed_x_handles filter. |
| FileSearchTool | file_search | Search uploaded document collections by vector_store_ids. |

Usage

Basic Setup

import os
from pipecat.services.grok.realtime import GrokRealtimeLLMService

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
)

With Session Configuration

import os

from pipecat.services.grok.realtime import GrokRealtimeLLMService
from pipecat.services.grok.realtime.events import (
    SessionProperties,
    TurnDetection,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMAudioFormat,
)

session_properties = SessionProperties(
    instructions="You are a helpful assistant.",
    voice="Rex",
    turn_detection=TurnDetection(type="server_vad"),
    audio=AudioConfiguration(
        input=AudioInput(format=PCMAudioFormat(rate=16000)),
        output=AudioOutput(format=PCMAudioFormat(rate=16000)),
    ),
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=session_properties,
    ),
)

With Built-in Tools

import os

from pipecat.services.grok.realtime import GrokRealtimeLLMService
from pipecat.services.grok.realtime.events import (
    SessionProperties,
    WebSearchTool,
    XSearchTool,
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=SessionProperties(
            instructions="You are a helpful assistant with access to web search.",
            voice="Ara",
            tools=[
                WebSearchTool(),
                XSearchTool(allowed_x_handles=["@elonmusk"]),
            ],
        ),
    ),
)

Updating Settings at Runtime

from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.grok.realtime.llm import GrokRealtimeLLMSettings
from pipecat.services.grok.realtime.events import SessionProperties

# Run inside an async function while the pipeline is running; `task` is its PipelineTask.
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GrokRealtimeLLMSettings(
            session_properties=SessionProperties(
                instructions="Now speak in Spanish.",
                voice="Eve",
            ),
        )
    )
)
As of v0.0.105, the session_properties constructor parameter is deprecated. Pass settings=GrokRealtimeLLMService.Settings(session_properties=...) instead. See the Service Settings guide for migration details.

Notes

  • Audio format auto-configuration: If audio format is not specified in session_properties, the service automatically configures PCM input/output using the pipeline’s sample rates.
  • Server-side VAD: Enabled by default. When VAD is enabled, the server handles speech detection and turn management automatically. Set turn_detection to None to manage turns manually.
  • Audio before setup: Audio is not sent to Grok until the conversation setup is complete, preventing sample rate mismatches.
  • Available voices: Ara (default), Rex, Sal, Eve, and Leo.
  • G.711 support: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
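To show what the PCMU (G.711 μ-law) option means for payload size, the sketch below encodes signed 16-bit samples into one byte each using the widely used public-domain biased-segment μ-law algorithm. It is an illustration of the format only, not code from Pipecat or xAI, and this page does not say where such conversion happens in a pipeline. `linear_to_ulaw` and `encode_ulaw` are hypothetical names. Python's audioop module once offered lin2ulaw, but it was removed in Python 3.13.

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    BIAS = 0x84   # 132, added so every value falls in a segment with a leading 1 bit
    CLIP = 32635  # largest magnitude that still fits after adding BIAS
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit among bits 7..14.
    exponent = max(((magnitude >> 7) & 0xFF).bit_length() - 1, 0)
    # Four mantissa bits taken just below the leading bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw(samples: list[int]) -> bytes:
    """Encode a sequence of 16-bit samples, one output byte per sample."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

At the fixed 8000 Hz rate, 20 ms is 160 samples, so one μ-law chunk is 160 bytes, half the 320 bytes the same audio takes as 16-bit PCM.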

Event Handlers

| Event | Description |
|---|---|
| on_conversation_item_created | Called when a new conversation item is created in the session |
| on_conversation_item_updated | Called when a conversation item is updated or completed |
@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")
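The two events pair naturally: a handler can store each item when it is created and overwrite it when it is updated, leaving the latest version of every item keyed by item_id. The sketch below runs without Pipecat by using a toy emitter that mimics the @llm.event_handler(...) decorator and the (service, item_id, item) handler signature shown above. `ToyEventEmitter`, `emit`, and the dict-shaped items are hypothetical stand-ins, not Pipecat's implementation or its item type.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[..., Awaitable[None]]


class ToyEventEmitter:
    """Stand-in for a service exposing @event_handler(name); not Pipecat's implementation."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def event_handler(self, name: str) -> Callable[[Handler], Handler]:
        def decorator(fn: Handler) -> Handler:
            self._handlers.setdefault(name, []).append(fn)
            return fn
        return decorator

    async def emit(self, name: str, *args: Any) -> None:
        # Handlers receive the service first, matching the signatures above.
        for fn in self._handlers.get(name, []):
            await fn(self, *args)


llm = ToyEventEmitter()
items: Dict[str, Any] = {}  # latest known state of each conversation item, keyed by item_id


@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    items[item_id] = item


@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    # An update replaces the stored item, so the dict ends with the final version.
    items[item_id] = item
```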