Overview

Smart Turn Detection uses an advanced machine learning model to determine when a user has finished speaking and your bot should respond. Unlike basic Voice Activity Detection (VAD), which only distinguishes speech from non-speech, Smart Turn Detection recognizes natural conversational cues, such as intonation patterns and linguistic signals, enabling more natural conversations.

GitHub Repository

Open source model for conversational turn detection

Model weights

ONNX weights file for Smart Turn v3

Key Benefits

  • Natural conversations: More human-like turn-taking patterns
  • Free to use: The model is fully open source
  • Scalable: Smart Turn v3 supports fast CPU inference directly inside your Pipecat Cloud instance

Quick Start

To enable Smart Turn Detection in your Pipecat Cloud bot, configure a TurnAnalyzerUserTurnStopStrategy with LocalSmartTurnAnalyzerV3 in your context aggregator. The model weights are bundled with Pipecat, so there’s no need to download them separately.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

async def main(room_url: str, token: str):
    transport = DailyTransport(
        room_url,
        token,
        "Voice AI Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )

    # Configure Smart Turn Detection via user turn strategies.
    # `context` is the LLM context you created earlier in your bot setup.
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[
                    TurnAnalyzerUserTurnStopStrategy(
                        turn_analyzer=LocalSmartTurnAnalyzerV3()
                    )
                ]
            ),
            # stop_secs=0.2 is the default; shown explicitly here (see note below)
            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        ),
    )

    # Continue with your pipeline setup...
Smart Turn Detection requires VAD to be enabled with stop_secs=0.2 (the default value). This value matches the model's training data and lets Smart Turn dynamically adjust response timing based on its predictions.

How It Works

  1. Audio Analysis: The system continuously analyzes incoming audio for speech patterns
  2. VAD Processing: Voice Activity Detection segments audio into speech and silence
  3. Turn Classification: When VAD detects a pause, the ML model analyzes the speech segment for natural completion cues
  4. Smart Response: The model determines if the turn is complete or if the user is likely to continue speaking
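The flow above can be sketched in plain Python. Note that `classify_turn` and `on_vad_pause` are illustrative stand-ins, not Pipecat APIs, and the toy heuristic stands in for what is really ONNX model inference on the buffered speech segment:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TurnDecision:
    complete: bool
    probability: float


def classify_turn(speech_samples: List[float]) -> TurnDecision:
    """Stand-in for the Smart Turn model (step 3). The real system runs
    ONNX inference on the speech segment; here a toy heuristic treats a
    falling trailing "energy" contour as a completed turn."""
    if len(speech_samples) < 2:
        return TurnDecision(complete=False, probability=0.0)
    falling = speech_samples[-1] < speech_samples[0]
    return TurnDecision(complete=falling, probability=0.9 if falling else 0.3)


def on_vad_pause(
    speech_buffer: List[float],
    respond: Callable[[], None],
    keep_listening: Callable[[], None],
) -> TurnDecision:
    """Steps 3-4: when VAD detects a pause, classify the buffered speech
    and either trigger the bot's response or keep waiting for the user."""
    decision = classify_turn(speech_buffer)
    if decision.complete:
        respond()
    else:
        keep_listening()
    return decision


# Usage: a falling contour is classified as a finished turn,
# so the bot responds instead of continuing to listen.
events: List[str] = []
decision = on_vad_pause(
    [0.8, 0.6, 0.2],
    respond=lambda: events.append("respond"),
    keep_listening=lambda: events.append("wait"),
)
```

In the real pipeline, VAD (steps 1-2) supplies the pause signal and the buffered audio; the key design point is that the pause alone never ends the turn, because the classifier makes the final call.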

Training Data Collection

The smart-turn model is trained on real conversational data collected through the applications below. Help us improve the model by contributing your own data or classifying existing data:

Data Collector

Contribute conversational data to improve the model

Data Classifier

Help classify turn completion patterns in conversations

More information

For more details on Smart Turn, see the following links:

Smart Turn Overview

More details about the Pipecat Smart Turn integration