admin/lerobot

Fork 0

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-16 00:59:46 +00:00

Files

T

Michel Aractingi c868777752 profile

2025-11-18 09:51:50 +01:00

5.7 KiB

Raw Blame History

RTC Profiling - Quick Start

Quick reference for profiling Pi0 with RTC to identify performance bottlenecks.

🚀 Quick Commands

1. Profile with Real Robot

# With RTC enabled (profiled version)
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0}, front: {type: opencv, index_or_path: 1}}" \
    --task="Pick up object" \
    --duration=30

2. Compare RTC vs No-RTC (No Robot Needed)

uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20

3. Detailed RTC Method Profiling

uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=20 \
    --execution_horizon=20 \
    --enable_rtc_profiling

📊 What Each Tool Does

Tool	Purpose	Needs Robot?
`eval_with_real_robot_profiled.py`	Profile actual robot execution with RTC	✅ Yes
`profile_rtc_comparison.py`	Compare RTC vs no-RTC side-by-side	❌ No
`profile_pi0_rtc_detailed.py`	Deep dive into RTC internals	❌ No

🔍 Key Metrics to Watch

Overall Performance

iteration.policy_inference - Total policy inference time
iteration.preprocessing - Image preprocessing time
iteration.postprocessing - Action denormalization time

RTC-Specific (with `--enable_rtc_profiling`)

rtc.denoise_step.base_denoising - Time without RTC overhead
rtc.denoise_step.autograd_correction - Gradient computation time
rtc.denoise_step.guidance_computation - Total RTC guidance overhead

Robot Communication

robot.get_observation - Time to get robot state
robot.send_action - Time to send action command

🎯 Quick Diagnosis

RTC is slower than expected?

Check if torch.compile is enabled
```
# Add this flag
--use_torch_compile
```

Try larger execution horizon

# Increase to amortize RTC overhead
--rtc.execution_horizon=30

Profile to find bottleneck

uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --enable_rtc_profiling

Preprocessing is slow?

Reduce image resolution in robot config
Use fewer cameras
Check camera FPS settings

Policy inference is slow?

Enable torch.compile
Check device (MPS vs CUDA vs CPU)
Try smaller model if available

📈 Expected Performance

Typical timings on Apple Silicon (MPS):

Component	Time (ms)	Notes
Policy inference	100-200	Depends on model size
Preprocessing	5-20	Depends on #cameras
Postprocessing	1-5	Usually fast
RTC overhead	10-50	Should be < 50% of base

When RTC helps:

✅ Execution horizon ≥ 10
✅ Inference time > action execution rate
✅ Using torch.compile
✅ Proper inference_delay calculation

When RTC might not help:

❌ Very fast inference already
❌ Small execution horizon (< 5)
❌ No compilation (interpreted mode)
❌ Inference delay not accounted for

🛠️ Adding Profiling to Your Code

Quick snippet:

from lerobot.utils.profiling import enable_profiling, print_profiling_summary, profile_section

# Enable at start
enable_profiling()

# Profile sections
with profile_section("my_operation"):
    # ... your code ...
    pass

# Print at end
print_profiling_summary()

Profile specific methods:

from lerobot.utils.profiling import profile_method

@profile_method
def my_slow_function():
    # ... your code ...
    pass

📝 Example Output

PROFILING SUMMARY
================================================================================
Function                                                    Count    Mean (ms)
--------------------------------------------------------------------------------
iteration.policy_inference                                    20       150.23
iteration.preprocessing                                       20        12.45
rtc.denoise_step.guidance_computation                        200        15.67
rtc.denoise_step.autograd_correction                         200         8.23
rtc.denoise_step.base_denoising                             200       120.45
================================================================================

🚨 Common Issues

"No profiling data available"

Did you call enable_profiling()?
Running enough iterations?

Inconsistent results

Increase --num_iterations
Check for thermal throttling
Close other applications

Can't find bottleneck

Enable --enable_rtc_profiling for detailed breakdown
Check both preprocessing and inference
Compare with and without RTC

📖 More Details

See PROFILING_GUIDE.md for comprehensive documentation.

🤔 Still Slow?

Run comparison: profile_rtc_comparison.py
Run detailed profiling: profile_pi0_rtc_detailed.py --enable_rtc_profiling
Share output for help (include device, model, settings)

✅ Quick Checklist

Before asking for help, verify:

Ran comparison script (with/without RTC)
Tried torch.compile
Tested different execution horizons (10, 20, 30)
Profiled with detailed RTC profiling
Checked preprocessing vs inference split
Verified hardware (device type, thermal state)

5.7 KiB Raw Blame History