lerobot/examples/rtc/PROFILING_QUICK_START.md

# RTC Profiling - Quick Start

Quick reference for profiling Pi0 with RTC to identify performance bottlenecks.

## 🚀 Quick Commands

### 1. Profile with Real Robot

```bash
# With RTC enabled (profiled version)
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0}, front: {type: opencv, index_or_path: 1}}" \
    --task="Pick up object" \
    --duration=30
```

### 2. Compare RTC vs No-RTC (No Robot Needed)

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20
```

### 3. Detailed RTC Method Profiling

```bash
uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=20 \
    --execution_horizon=20 \
    --enable_rtc_profiling
```

## 📊 What Each Tool Does

| Tool | Purpose | Needs Robot? |
|------|---------|--------------|
| `eval_with_real_robot_profiled.py` | Profile actual robot execution with RTC | ✅ Yes |
| `profile_rtc_comparison.py` | Compare RTC vs no-RTC side-by-side | ❌ No |
| `profile_pi0_rtc_detailed.py` | Deep dive into RTC internals | ❌ No |

## 🔍 Key Metrics to Watch

### Overall Performance
- **iteration.policy_inference** - Total policy inference time
- **iteration.preprocessing** - Image preprocessing time
- **iteration.postprocessing** - Action denormalization time

### RTC-Specific (with `--enable_rtc_profiling`)
- **rtc.denoise_step.base_denoising** - Time without RTC overhead
- **rtc.denoise_step.autograd_correction** - Gradient computation time
- **rtc.denoise_step.guidance_computation** - Total RTC guidance overhead

### Robot Communication
- **robot.get_observation** - Time to get robot state
- **robot.send_action** - Time to send action command

## 🎯 Quick Diagnosis

### RTC is slower than expected?

1. **Check if torch.compile is enabled**
   ```bash
   # Add this flag
   --use_torch_compile
   ```

2. **Try larger execution horizon**
   ```bash
   # Increase to amortize RTC overhead
   --rtc.execution_horizon=30
   ```

3. **Profile to find bottleneck**
   ```bash
   uv run examples/rtc/profile_pi0_rtc_detailed.py \
       --policy_path=helper2424/pi05_check_rtc \
       --device=mps \
       --enable_rtc_profiling
   ```

### Preprocessing is slow?

- Reduce image resolution in robot config
- Use fewer cameras
- Check camera FPS settings

### Policy inference is slow?

- Enable torch.compile
- Check device (MPS vs CUDA vs CPU)
- Try smaller model if available

## 📈 Expected Performance

### Typical timings on Apple Silicon (MPS):

| Component | Time (ms) | Notes |
|-----------|-----------|-------|
| Policy inference | 100-200 | Depends on model size |
| Preprocessing | 5-20 | Depends on #cameras |
| Postprocessing | 1-5 | Usually fast |
| RTC overhead | 10-50 | Should be < 50% of base |

### When RTC helps:
- ✅ Execution horizon ≥ 10
- ✅ Inference time > action execution rate
- ✅ Using torch.compile
- ✅ Proper inference_delay calculation

### When RTC might not help:
- ❌ Very fast inference already
- ❌ Small execution horizon (< 5)
- ❌ No compilation (interpreted mode)
- ❌ Inference delay not accounted for

## 🛠️ Adding Profiling to Your Code

### Quick snippet:

```python
from lerobot.utils.profiling import enable_profiling, print_profiling_summary, profile_section

# Enable at start
enable_profiling()

# Profile sections
with profile_section("my_operation"):
    # ... your code ...
    pass

# Print at end
print_profiling_summary()
```

### Profile specific methods:

```python
from lerobot.utils.profiling import profile_method

@profile_method
def my_slow_function():
    # ... your code ...
    pass
```

## 📝 Example Output

```
PROFILING SUMMARY
================================================================================
Function                                                    Count    Mean (ms)
--------------------------------------------------------------------------------
iteration.policy_inference                                    20       150.23
iteration.preprocessing                                       20        12.45
rtc.denoise_step.guidance_computation                        200        15.67
rtc.denoise_step.autograd_correction                         200         8.23
rtc.denoise_step.base_denoising                             200       120.45
================================================================================
```

## 🚨 Common Issues

### "No profiling data available"
- Did you call `enable_profiling()`?
- Running enough iterations?

### Inconsistent results
- Increase `--num_iterations`
- Check for thermal throttling
- Close other applications

### Can't find bottleneck
- Enable `--enable_rtc_profiling` for detailed breakdown
- Check both preprocessing and inference
- Compare with and without RTC

## 📖 More Details

See `PROFILING_GUIDE.md` for comprehensive documentation.

## 🤔 Still Slow?

1. Run comparison: `profile_rtc_comparison.py`
2. Run detailed profiling: `profile_pi0_rtc_detailed.py --enable_rtc_profiling`
3. Share output for help (include device, model, settings)

## ✅ Quick Checklist

Before asking for help, verify:

- [ ] Ran comparison script (with/without RTC)
- [ ] Tried torch.compile
- [ ] Tested different execution horizons (10, 20, 30)
- [ ] Profiled with detailed RTC profiling
- [ ] Checked preprocessing vs inference split
- [ ] Verified hardware (device type, thermal state)