Files
lerobot/examples/voice_control
2025-12-17 16:31:25 +01:00
..
2025-12-17 16:31:25 +01:00
2025-12-17 16:31:25 +01:00

Voice Assistant Examples

Voice-enabled robot assistant examples using speech-to-text (STT), and text-to-speech (TTS).

Overview

These examples demonstrate how to build a voice interface for robot control:

  1. Hold SPACE → Push-to-talk recording starts
  2. Release SPACE → Recording stops
  3. STT (Whisper) → Converts speech to text (high-level task prompt)
  4. Pi0.5 → Generates robot response/utterance
  5. TTS (Kokoro) → Speaks the response back

Requirements

pip install torch transformers sounddevice numpy pynput kokoro>=0.9.2

Usage

With Pi0.5 Model

python examples/voice_assistant/voice_assistant_pi05.py \
    --pretrained_path path/to/pi05/checkpoint

How It Works

Pi0.5 Voice Integration

Pi0.5 can generate robot utterances as part of its subtask prediction. The flow:

  1. High-level prompt: User voice command is transcribed and formatted as a task prompt
  2. Subtask generation: Pi0.5 autoregressively generates a response
  3. Utterance extraction: If the response contains <utterance>...</utterance> tags, the content is extracted
  4. TTS output: The response is spoken back to the user

Configuration Options

Option Default Description
--pretrained_path None Path to Pi0.5 checkpoint
--record_seconds 5.0 Audio recording duration
--max_response_tokens 100 Max tokens in generated response