mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-19 18:49:52 +00:00
209 lines
5.7 KiB
Markdown
209 lines
5.7 KiB
Markdown
# RTC Profiling - Quick Start
|
|
|
|
Quick reference for profiling Pi0 with RTC to identify performance bottlenecks.
|
|
|
|
## 🚀 Quick Commands
|
|
|
|
### 1. Profile with Real Robot
|
|
|
|
```bash
|
|
# With RTC enabled (profiled version)
|
|
uv run examples/rtc/eval_with_real_robot_profiled.py \
|
|
--policy.path=helper2424/pi05_check_rtc \
|
|
--policy.device=mps \
|
|
--rtc.enabled=true \
|
|
--rtc.execution_horizon=20 \
|
|
--robot.type=so100_follower \
|
|
--robot.port=/dev/tty.usbmodem58FA0834591 \
|
|
--robot.cameras="{ gripper: {type: opencv, index_or_path: 0}, front: {type: opencv, index_or_path: 1}}" \
|
|
--task="Pick up object" \
|
|
--duration=30
|
|
```
|
|
|
|
### 2. Compare RTC vs No-RTC (No Robot Needed)
|
|
|
|
```bash
|
|
uv run examples/rtc/profile_rtc_comparison.py \
|
|
--policy_path=helper2424/pi05_check_rtc \
|
|
--device=mps \
|
|
--num_iterations=50 \
|
|
--execution_horizon=20
|
|
```
|
|
|
|
### 3. Detailed RTC Method Profiling
|
|
|
|
```bash
|
|
uv run examples/rtc/profile_pi0_rtc_detailed.py \
|
|
--policy_path=helper2424/pi05_check_rtc \
|
|
--device=mps \
|
|
--num_iterations=20 \
|
|
--execution_horizon=20 \
|
|
--enable_rtc_profiling
|
|
```
|
|
|
|
## 📊 What Each Tool Does
|
|
|
|
| Tool | Purpose | Needs Robot? |
|
|
|------|---------|--------------|
|
|
| `eval_with_real_robot_profiled.py` | Profile actual robot execution with RTC | ✅ Yes |
|
|
| `profile_rtc_comparison.py` | Compare RTC vs no-RTC side-by-side | ❌ No |
|
|
| `profile_pi0_rtc_detailed.py` | Deep dive into RTC internals | ❌ No |
|
|
|
|
## 🔍 Key Metrics to Watch
|
|
|
|
### Overall Performance
|
|
- **iteration.policy_inference** - Total policy inference time
|
|
- **iteration.preprocessing** - Image preprocessing time
|
|
- **iteration.postprocessing** - Action denormalization time
|
|
|
|
### RTC-Specific (with `--enable_rtc_profiling`)
|
|
- **rtc.denoise_step.base_denoising** - Time without RTC overhead
|
|
- **rtc.denoise_step.autograd_correction** - Gradient computation time
|
|
- **rtc.denoise_step.guidance_computation** - Total RTC guidance overhead
|
|
|
|
### Robot Communication
|
|
- **robot.get_observation** - Time to get robot state
|
|
- **robot.send_action** - Time to send action command
|
|
|
|
## 🎯 Quick Diagnosis
|
|
|
|
### RTC is slower than expected?
|
|
|
|
1. **Check if torch.compile is enabled**
|
|
```bash
|
|
# Add this flag
|
|
--use_torch_compile
|
|
```
|
|
|
|
2. **Try larger execution horizon**
|
|
```bash
|
|
# Increase to amortize RTC overhead
|
|
--rtc.execution_horizon=30
|
|
```
|
|
|
|
3. **Profile to find bottleneck**
|
|
```bash
|
|
uv run examples/rtc/profile_pi0_rtc_detailed.py \
|
|
--policy_path=helper2424/pi05_check_rtc \
|
|
--device=mps \
|
|
--enable_rtc_profiling
|
|
```
|
|
|
|
### Preprocessing is slow?
|
|
|
|
- Reduce image resolution in robot config
|
|
- Use fewer cameras
|
|
- Check camera FPS settings
|
|
|
|
### Policy inference is slow?
|
|
|
|
- Enable torch.compile
|
|
- Check device (MPS vs CUDA vs CPU)
|
|
- Try smaller model if available
|
|
|
|
## 📈 Expected Performance
|
|
|
|
### Typical timings on Apple Silicon (MPS):
|
|
|
|
| Component | Time (ms) | Notes |
|
|
|-----------|-----------|-------|
|
|
| Policy inference | 100-200 | Depends on model size |
|
|
| Preprocessing | 5-20 | Depends on #cameras |
|
|
| Postprocessing | 1-5 | Usually fast |
|
|
| RTC overhead | 10-50 | Should be < 50% of base |
|
|
|
|
### When RTC helps:
|
|
- ✅ Execution horizon ≥ 10
|
|
- ✅ Inference time > action execution rate
|
|
- ✅ Using torch.compile
|
|
- ✅ Proper inference_delay calculation
|
|
|
|
### When RTC might not help:
|
|
- ❌ Very fast inference already
|
|
- ❌ Small execution horizon (< 5)
|
|
- ❌ No compilation (interpreted mode)
|
|
- ❌ Inference delay not accounted for
|
|
|
|
## 🛠️ Adding Profiling to Your Code
|
|
|
|
### Quick snippet:
|
|
|
|
```python
|
|
from lerobot.utils.profiling import enable_profiling, print_profiling_summary, profile_section
|
|
|
|
# Enable at start
|
|
enable_profiling()
|
|
|
|
# Profile sections
|
|
with profile_section("my_operation"):
|
|
# ... your code ...
|
|
pass
|
|
|
|
# Print at end
|
|
print_profiling_summary()
|
|
```
|
|
|
|
### Profile specific methods:
|
|
|
|
```python
|
|
from lerobot.utils.profiling import profile_method
|
|
|
|
@profile_method
|
|
def my_slow_function():
|
|
# ... your code ...
|
|
pass
|
|
```
|
|
|
|
## 📝 Example Output
|
|
|
|
```
|
|
PROFILING SUMMARY
|
|
================================================================================
|
|
Function Count Mean (ms)
|
|
--------------------------------------------------------------------------------
|
|
iteration.policy_inference 20 150.23
|
|
iteration.preprocessing 20 12.45
|
|
rtc.denoise_step.guidance_computation 200 15.67
|
|
rtc.denoise_step.autograd_correction 200 8.23
|
|
rtc.denoise_step.base_denoising 200 120.45
|
|
================================================================================
|
|
```
|
|
|
|
## 🚨 Common Issues
|
|
|
|
### "No profiling data available"
|
|
- Did you call `enable_profiling()`?
|
|
- Running enough iterations?
|
|
|
|
### Inconsistent results
|
|
- Increase `--num_iterations`
|
|
- Check for thermal throttling
|
|
- Close other applications
|
|
|
|
### Can't find bottleneck
|
|
- Enable `--enable_rtc_profiling` for detailed breakdown
|
|
- Check both preprocessing and inference
|
|
- Compare with and without RTC
|
|
|
|
## 📖 More Details
|
|
|
|
See `PROFILING_GUIDE.md` for comprehensive documentation.
|
|
|
|
## 🤔 Still Slow?
|
|
|
|
1. Run comparison: `profile_rtc_comparison.py`
|
|
2. Run detailed profiling: `profile_pi0_rtc_detailed.py --enable_rtc_profiling`
|
|
3. Share output for help (include device, model, settings)
|
|
|
|
## ✅ Quick Checklist
|
|
|
|
Before asking for help, verify:
|
|
|
|
- [ ] Ran comparison script (with/without RTC)
|
|
- [ ] Tried torch.compile
|
|
- [ ] Tested different execution horizons (10, 20, 30)
|
|
- [ ] Profiled with detailed RTC profiling
|
|
- [ ] Checked preprocessing vs inference split
|
|
- [ ] Verified hardware (device type, thermal state)
|
|
|