mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-16 09:09:48 +00:00
47 lines
1.4 KiB
Markdown
47 lines
1.4 KiB
Markdown
# Voice Assistant Examples
|
|
|
|
Voice-enabled robot assistant examples using speech-to-text (STT), and text-to-speech (TTS).
|
|
|
|
## Overview
|
|
|
|
These examples demonstrate how to build a voice interface for robot control:
|
|
|
|
1. **Hold SPACE** → Push-to-talk recording starts
|
|
2. **Release SPACE** → Recording stops
|
|
3. **STT (Whisper)** → Converts speech to text (high-level task prompt)
|
|
4. **Pi0.5** → Generates robot response/utterance
|
|
5. **TTS (Kokoro)** → Speaks the response back
|
|
|
|
## Requirements
|
|
|
|
```bash
|
|
pip install torch transformers sounddevice numpy pynput kokoro>=0.9.2
|
|
```
|
|
|
|
## Usage
|
|
|
|
### With Pi0.5 Model
|
|
|
|
```bash
|
|
python examples/voice_assistant/voice_assistant_pi05.py \
|
|
--pretrained_path path/to/pi05/checkpoint
|
|
```
|
|
|
|
## How It Works
|
|
|
|
### Pi0.5 Voice Integration
|
|
|
|
Pi0.5 can generate robot utterances as part of its subtask prediction. The flow:
|
|
|
|
1. **High-level prompt**: User voice command is transcribed and formatted as a task prompt
|
|
2. **Subtask generation**: Pi0.5 autoregressively generates a response
|
|
3. **Utterance extraction**: If the response contains `<utterance>...</utterance>` tags, the content is extracted
|
|
4. **TTS output**: The response is spoken back to the user
|
|
|
|
## Configuration Options
|
|
|
|
| Option | Default | Description |
|
|
|--------|---------|-------------|
|
|
| `--pretrained_path` | None | Path to Pi0.5 checkpoint |
|
|
| `--record_seconds` | 5.0 | Audio recording duration |
|
|
| `--max_response_tokens` | 100 | Max tokens in generated response | |