Add voice example

2026-07-20 16:31:55 +00:00 · 2025-12-17 16:31:25 +01:00
parent 8e05dc9a7a
commit b229e7df28
2 changed files with 383 additions and 0 deletions
@@ -0,0 +1,47 @@
+# Voice Assistant Examples
+
+Voice-enabled robot assistant examples using speech-to-text (STT), and text-to-speech (TTS).
+
+## Overview
+
+These examples demonstrate how to build a voice interface for robot control:
+
+1. **Hold SPACE** → Push-to-talk recording starts
+2. **Release SPACE** → Recording stops
+3. **STT (Whisper)** → Converts speech to text (high-level task prompt)
+4. **Pi0.5** → Generates robot response/utterance
+5. **TTS (Kokoro)** → Speaks the response back
+
+## Requirements
+
+```bash
+pip install torch transformers sounddevice numpy pynput kokoro>=0.9.2
+```
+
+## Usage
+
+### With Pi0.5 Model
+
+```bash
+python examples/voice_assistant/voice_assistant_pi05.py \
+    --pretrained_path path/to/pi05/checkpoint
+```
+
+## How It Works
+
+### Pi0.5 Voice Integration
+
+Pi0.5 can generate robot utterances as part of its subtask prediction. The flow:
+
+1. **High-level prompt**: User voice command is transcribed and formatted as a task prompt
+2. **Subtask generation**: Pi0.5 autoregressively generates a response
+3. **Utterance extraction**: If the response contains `<utterance>...</utterance>` tags, the content is extracted
+4. **TTS output**: The response is spoken back to the user
+
+## Configuration Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--pretrained_path` | None | Path to Pi0.5 checkpoint |
+| `--record_seconds` | 5.0 | Audio recording duration |
+| `--max_response_tokens` | 100 | Max tokens in generated response |