Overview

Smart Turn Detection uses an advanced machine learning model to determine when a user has finished speaking and your bot should respond. Unlike basic Voice Activity Detection (VAD), which only distinguishes speech from non-speech, Smart Turn Detection recognizes natural conversational cues, such as intonation patterns and linguistic signals, enabling more natural conversations.

GitHub Repository

Open source model for conversational turn detection

Model weights

ONNX weights file for Smart Turn v3

Key Benefits

  • Natural conversations: More human-like turn-taking patterns
  • Free to use: The model is fully open source
  • Scalable: Smart Turn v3 supports fast CPU inference directly inside your Pipecat Cloud instance

Quick Start

To enable Smart Turn Detection in your Pipecat Cloud bot, configure a TurnAnalyzerUserTurnStopStrategy with LocalSmartTurnAnalyzerV3 in your context aggregator. The model weights are bundled with Pipecat, so there’s no need to download them separately.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

async def main(room_url: str, token: str):
    transport = DailyTransport(
        room_url,
        token,
        "Voice AI Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )

    # Configure Smart Turn Detection via user turn strategies.
    # `context` is the LLM context you created earlier in your bot setup.
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[
                    TurnAnalyzerUserTurnStopStrategy(
                        turn_analyzer=LocalSmartTurnAnalyzerV3()
                    )
                ]
            ),
            # stop_secs=0.2 is the default; shown explicitly here (see note below)
            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        ),
    )

    # Continue with your pipeline setup...
Smart Turn Detection requires VAD to be enabled with stop_secs=0.2 (the default value). This value matches the model's training data and lets Smart Turn dynamically adjust response timing based on its predictions.

How It Works

  1. Audio Analysis: The system continuously analyzes incoming audio for speech patterns
  2. VAD Processing: Voice Activity Detection segments audio into speech and silence
  3. Turn Classification: When VAD detects a pause, the ML model analyzes the speech segment for natural completion cues
  4. Smart Response: The model determines if the turn is complete or if the user is likely to continue speaking
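The flow above can be sketched in plain Python. Note that `classify_turn` and `on_vad_pause` are illustrative stand-ins, not Pipecat APIs, and the toy heuristic stands in for what is really ONNX model inference on the buffered speech segment:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TurnDecision:
    complete: bool
    probability: float


def classify_turn(speech_samples: List[float]) -> TurnDecision:
    """Stand-in for the Smart Turn model (step 3). The real system runs
    ONNX inference on the speech segment; here a toy heuristic treats a
    falling trailing "energy" contour as a completed turn."""
    if len(speech_samples) < 2:
        return TurnDecision(complete=False, probability=0.0)
    falling = speech_samples[-1] < speech_samples[0]
    return TurnDecision(complete=falling, probability=0.9 if falling else 0.3)


def on_vad_pause(
    speech_buffer: List[float],
    respond: Callable[[], None],
    keep_listening: Callable[[], None],
) -> TurnDecision:
    """Steps 3-4: when VAD detects a pause, classify the buffered speech
    and either trigger the bot's response or keep waiting for the user."""
    decision = classify_turn(speech_buffer)
    if decision.complete:
        respond()
    else:
        keep_listening()
    return decision


# Usage: a falling contour is classified as a finished turn,
# so the bot responds instead of continuing to listen.
events: List[str] = []
decision = on_vad_pause(
    [0.8, 0.6, 0.2],
    respond=lambda: events.append("respond"),
    keep_listening=lambda: events.append("wait"),
)
```

In the real pipeline, VAD (steps 1-2) supplies the pause signal and the buffered audio; the key design point is that the pause alone never ends the turn, because the classifier makes the final call.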

Training Data Collection

The smart-turn model is trained on real conversational data collected through the applications below. Help us improve the model by contributing your own data or classifying existing data:

Data Collector

Contribute conversational data to improve the model

Data Classifier

Help classify turn completion patterns in conversations

More information

For more details on Smart Turn, see the following links:

Smart Turn Overview

More details about the Pipecat Smart Turn integration