Voice Assistant Examples
Voice-enabled robot assistant examples using speech-to-text (STT) and text-to-speech (TTS).
Overview
These examples demonstrate how to build a voice interface for robot control:
- Hold SPACE → Push-to-talk recording starts
- Release SPACE → Recording stops
- STT (Whisper) → Converts speech to text (high-level task prompt)
- Pi0.5 → Generates robot response/utterance
- TTS (Kokoro) → Speaks the response back
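The stages above can be sketched as three pluggable callables chained into one turn. This is only an illustration of the data flow, not the code in the example script: `VoiceTurn` and its field names are hypothetical, and the stub lambdas stand in for Whisper, Pi0.5, and Kokoro so the snippet runs without models or audio hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceTurn:
    """One push-to-talk interaction, from recorded audio to spoken reply.

    All names here are hypothetical; each callable is a stand-in for a real stage.
    """

    transcribe: Callable[[List[float]], str]  # STT stage (Whisper in the examples)
    respond: Callable[[str], str]             # policy stage (Pi0.5 in the examples)
    speak: Callable[[str], None]              # TTS stage (Kokoro in the examples)

    def run(self, audio: List[float]) -> str:
        # 1. Speech to text: the transcript becomes the high-level task prompt.
        prompt = self.transcribe(audio).strip()
        if not prompt:
            # Nothing was said (or nothing was recognized): skip the turn.
            return ""
        # 2. The policy generates a response for the prompt.
        reply = self.respond(prompt)
        # 3. The response is spoken back to the user.
        self.speak(reply)
        return reply


# Stub stages so the sketch runs on its own.
spoken: List[str] = []
turn = VoiceTurn(
    transcribe=lambda audio: " pick up the cup " if audio else "",
    respond=lambda prompt: f"Okay, I will {prompt}.",
    speak=spoken.append,
)
reply = turn.run([0.0, 0.1, -0.1])
print(reply)
```

Keeping each stage behind a plain callable makes it easy to swap a model or to test the loop without a microphone.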
Requirements
pip install torch transformers sounddevice numpy pynput "kokoro>=0.9.2"
Usage
With Pi0.5 Model
python examples/voice_assistant/voice_assistant_pi05.py \
--pretrained_path path/to/pi05/checkpoint
How It Works
Pi0.5 Voice Integration
Pi0.5 can generate robot utterances as part of its subtask prediction. The flow:
- High-level prompt: User voice command is transcribed and formatted as a task prompt
- Subtask generation: Pi0.5 autoregressively generates a response
- Utterance extraction: If the response contains `<utterance>...</utterance>` tags, the content is extracted
- TTS output: The response is spoken back to the user
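The utterance extraction step can be sketched with a regular expression. This is a minimal illustration, not necessarily how the example script does it: the function names are hypothetical, and falling back to the full response when no tags are present is an assumption, since the flow above only describes the tagged case.

```python
import re
from typing import Optional

# Non-greedy match so only the first tagged span is captured;
# DOTALL lets the utterance span multiple lines.
UTTERANCE_RE = re.compile(r"<utterance>(.*?)</utterance>", re.DOTALL)


def extract_utterance(response: str) -> Optional[str]:
    """Return the text inside the first <utterance> tag pair, or None if absent."""
    match = UTTERANCE_RE.search(response)
    if match is None:
        return None
    return match.group(1).strip()


def text_to_speak(response: str) -> str:
    """Pick the text handed to TTS.

    Assumption: when no tags are found, the whole response is spoken.
    """
    utterance = extract_utterance(response)
    return utterance if utterance is not None else response.strip()


example = "pick up the red cup <utterance>On it, grabbing the cup!</utterance>"
print(text_to_speak(example))
```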
Configuration Options
| Option | Default | Description |
|---|---|---|
| --pretrained_path | None | Path to Pi0.5 checkpoint |
| --record_seconds | 5.0 | Audio recording duration |
| --max_response_tokens | 100 | Max tokens in generated response |
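For reference, the options in the table could be declared with `argparse` as below. This is a sketch under assumptions: the actual script may parse its arguments differently, `build_parser` is a hypothetical helper, and the types (`float` for the duration, `int` for the token limit) are inferred from the defaults shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option table; not the script's own code."""
    parser = argparse.ArgumentParser(description="Voice assistant options (sketch)")
    parser.add_argument("--pretrained_path", type=str, default=None,
                        help="Path to Pi0.5 checkpoint")
    parser.add_argument("--record_seconds", type=float, default=5.0,
                        help="Audio recording duration")
    parser.add_argument("--max_response_tokens", type=int, default=100,
                        help="Max tokens in generated response")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--pretrained_path", "path/to/pi05/checkpoint", "--record_seconds", "3"]
)
defaults = build_parser().parse_args([])
print(args)
```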