lerobot/examples/rtc/PROFILING_QUICK_START.md
Michel Aractingi c868777752 profile
2025-11-18 09:51:50 +01:00

RTC Profiling - Quick Start

Quick reference for profiling Pi0 with RTC (Real-Time Chunking) to identify performance bottlenecks.

🚀 Quick Commands

1. Profile with Real Robot

# With RTC enabled (profiled version)
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0}, front: {type: opencv, index_or_path: 1}}" \
    --task="Pick up object" \
    --duration=30

2. Compare RTC vs No-RTC (No Robot Needed)

uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20

3. Detailed RTC Method Profiling

uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=20 \
    --execution_horizon=20 \
    --enable_rtc_profiling

📊 What Each Tool Does

| Tool | Purpose | Needs Robot? |
|------|---------|--------------|
| eval_with_real_robot_profiled.py | Profile actual robot execution with RTC | Yes |
| profile_rtc_comparison.py | Compare RTC vs no-RTC side-by-side | No |
| profile_pi0_rtc_detailed.py | Deep dive into RTC internals | No |

🔍 Key Metrics to Watch

Overall Performance

  • iteration.policy_inference - Total policy inference time
  • iteration.preprocessing - Image preprocessing time
  • iteration.postprocessing - Action denormalization time

RTC-Specific (with --enable_rtc_profiling)

  • rtc.denoise_step.base_denoising - Time without RTC overhead
  • rtc.denoise_step.autograd_correction - Gradient computation time
  • rtc.denoise_step.guidance_computation - Total RTC guidance overhead

Robot Communication

  • robot.get_observation - Time to get robot state
  • robot.send_action - Time to send action command

🎯 Quick Diagnosis

RTC is slower than expected?

  1. Check if torch.compile is enabled

    # Add this flag
    --use_torch_compile
    
  2. Try larger execution horizon

    # Increase to amortize RTC overhead
    --rtc.execution_horizon=30
    
  3. Profile to find bottleneck

    uv run examples/rtc/profile_pi0_rtc_detailed.py \
        --policy_path=helper2424/pi05_check_rtc \
        --device=mps \
        --enable_rtc_profiling
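
The horizon suggestion in step 2 can be illustrated with simple arithmetic. This is a minimal sketch under a simplifying assumption that is not confirmed by the scripts: one inference call is issued per `execution_horizon` executed actions, so the per-action cost is the inference time divided by the horizon. The function name and the 150 ms figure are hypothetical.

```python
def amortized_cost_ms(inference_ms: float, execution_horizon: int) -> float:
    """Inference cost per executed action, in milliseconds.

    Assumption (illustrative only): exactly one inference call is made
    for every `execution_horizon` actions that the robot executes.
    """
    if execution_horizon <= 0:
        raise ValueError("execution_horizon must be positive")
    return inference_ms / execution_horizon


# Hypothetical 150 ms inference call, compared across three horizons.
for horizon in (10, 20, 30):
    print(f"horizon={horizon}: {amortized_cost_ms(150.0, horizon):.2f} ms/action")
```

Under that assumption, going from a horizon of 20 to 30 lowers the per-action cost from 7.5 ms to 5 ms. Whether it helps in practice should be confirmed with the profiling run in step 3.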
    

Preprocessing is slow?

  • Reduce image resolution in robot config
  • Use fewer cameras
  • Check camera FPS settings

Policy inference is slow?

  • Enable torch.compile
  • Check device (MPS vs CUDA vs CPU)
  • Try smaller model if available

📈 Expected Performance

Typical timings on Apple Silicon (MPS):

| Component | Time (ms) | Notes |
|-----------|-----------|-------|
| Policy inference | 100-200 | Depends on model size |
| Preprocessing | 5-20 | Depends on #cameras |
| Postprocessing | 1-5 | Usually fast |
| RTC overhead | 10-50 | Should be < 50% of base |

When RTC helps:

  • Execution horizon ≥ 10
  • Inference takes longer than one control step (1 / action rate)
  • Using torch.compile
  • Proper inference_delay calculation
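
The inference_delay item above can be made concrete with a small calculation. The sketch below assumes the delay is expressed in control steps and obtained by dividing the measured inference latency by the control period and rounding up. The function name, the 30 Hz control rate, and the 150 ms latency are illustrative assumptions, not values taken from the scripts.

```python
import math


def inference_delay_steps(inference_latency_s: float, fps: float) -> int:
    """Number of control steps that elapse while one inference call runs.

    Assumption (illustrative only): delay = ceil(latency / control period),
    where the control period is 1 / fps seconds.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    step_period_s = 1.0 / fps
    return math.ceil(inference_latency_s / step_period_s)


# Hypothetical numbers: 150 ms inference latency, 30 Hz control loop.
delay = inference_delay_steps(0.150, 30.0)
print(f"inference delay: {delay} control steps")
```

With these numbers the robot executes 5 actions while the next chunk is being computed. If that delay is ignored, the new chunk is applied as if no time had passed.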

When RTC might not help:

  • Very fast inference already
  • Small execution horizon (< 5)
  • No torch.compile (eager mode)
  • Inference delay not accounted for

🛠️ Adding Profiling to Your Code

Quick snippet:

from lerobot.utils.profiling import enable_profiling, print_profiling_summary, profile_section

# Enable at start
enable_profiling()

# Profile sections
with profile_section("my_operation"):
    # ... your code ...
    pass

# Print at end
print_profiling_summary()

Profile specific methods:

from lerobot.utils.profiling import profile_method

@profile_method
def my_slow_function():
    # ... your code ...
    pass
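
For readers who want to see what such helpers do, here is a minimal stand-in built only on the standard library. It is not the implementation in lerobot.utils.profiling; the names `timed_section`, `timed_method`, and `timing_summary` are hypothetical and chosen to avoid confusion with the real API.

```python
import functools
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical storage: section name -> list of durations in milliseconds.
_timings = defaultdict(list)


@contextmanager
def timed_section(name):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed calls are still counted.
        _timings[name].append((time.perf_counter() - start) * 1000.0)


def timed_method(func):
    """Decorator that times every call to `func` under the function's name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with timed_section(func.__name__):
            return func(*args, **kwargs)
    return wrapper


def timing_summary():
    """Return {name: (call_count, mean_ms)} for all recorded sections."""
    return {name: (len(v), sum(v) / len(v)) for name, v in _timings.items()}


@timed_method
def my_slow_function():
    time.sleep(0.01)


with timed_section("my_operation"):
    my_slow_function()

for name, (count, mean_ms) in timing_summary().items():
    print(f"{name:<20} {count:>5} {mean_ms:>10.2f}")
```

The context manager and the decorator share one store, so nested sections show up as separate rows, as in the summary table this guide describes.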

📝 Example Output

PROFILING SUMMARY
================================================================================
Function                                                    Count    Mean (ms)
--------------------------------------------------------------------------------
iteration.policy_inference                                    20       150.23
iteration.preprocessing                                       20        12.45
rtc.denoise_step.guidance_computation                        200        15.67
rtc.denoise_step.autograd_correction                         200         8.23
rtc.denoise_step.base_denoising                             200       120.45
================================================================================
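
To check the summary against the "RTC overhead < 50% of base" guideline, the rows can be parsed and compared. This sketch assumes the three-column layout shown above (name, count, mean in ms); the real summary may contain more columns, in which case the parser needs adjusting. The function name is hypothetical.

```python
SUMMARY = """\
Function                                                    Count    Mean (ms)
iteration.policy_inference                                    20       150.23
iteration.preprocessing                                       20        12.45
rtc.denoise_step.guidance_computation                        200        15.67
rtc.denoise_step.autograd_correction                         200         8.23
rtc.denoise_step.base_denoising                             200       120.45
"""


def parse_summary(text):
    """Parse 'name count mean_ms' rows into {name: (count, mean_ms)}.

    Lines that do not match the assumed three-column layout
    (headers, separators) are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, count, mean_ms = parts
        try:
            rows[name] = (int(count), float(mean_ms))
        except ValueError:
            continue
    return rows


rows = parse_summary(SUMMARY)
guidance_ms = rows["rtc.denoise_step.guidance_computation"][1]
base_ms = rows["rtc.denoise_step.base_denoising"][1]
overhead = guidance_ms / base_ms
print(f"RTC guidance overhead: {overhead:.1%} of base denoising")
```

For the example numbers the overhead is about 13% of base denoising, well under the 50% guideline.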

🚨 Common Issues

"No profiling data available"

  • Did you call enable_profiling()?
  • Are you running enough iterations to collect samples?

Inconsistent results

  • Increase --num_iterations
  • Check for thermal throttling
  • Close other applications
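
One way to judge whether results are consistent is to look at the spread, not only the mean. The sketch below is a generic timing helper, not part of the profiling scripts: it discards a few warm-up runs and reports mean, standard deviation, minimum, and maximum. The function name and default counts are assumptions.

```python
import statistics
import time


def time_callable(fn, num_iterations=50, warmup=5):
    """Time `fn` and return summary statistics in milliseconds.

    Warm-up calls are executed but not recorded, since first calls often
    include one-off costs such as caching or compilation.
    """
    if num_iterations < 2:
        raise ValueError("need at least 2 iterations to compute a stdev")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "stdev_ms": statistics.stdev(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Stand-in workload; replace with the call you want to measure.
stats = time_callable(lambda: sum(range(100_000)), num_iterations=20, warmup=3)
print({key: round(value, 3) for key, value in stats.items()})
```

A standard deviation that is large relative to the mean, or a maximum far above the mean, suggests interference such as throttling or background load. On asynchronous accelerators the measured callable should block until the device work has finished, otherwise the timings only cover the launch.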

Can't find bottleneck

  • Enable --enable_rtc_profiling for detailed breakdown
  • Check both preprocessing and inference
  • Compare with and without RTC

📖 More Details

See PROFILING_GUIDE.md for comprehensive documentation.

🤔 Still Slow?

  1. Run comparison: profile_rtc_comparison.py
  2. Run detailed profiling: profile_pi0_rtc_detailed.py --enable_rtc_profiling
  3. Share output for help (include device, model, settings)
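
When sharing output, a short environment report saves a round of questions. This is a minimal sketch using only the standard library; the function name and the report fields are assumptions, and the values passed in are the ones used in the commands in this guide.

```python
import platform
import sys


def environment_report(device, policy_path, settings):
    """Build a plain-text report of the host and run settings."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
        f"machine: {platform.machine()}",
        f"device: {device}",
        f"policy: {policy_path}",
    ]
    # Sort settings so reports from different runs are easy to diff.
    lines += [f"{key}: {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines)


report = environment_report(
    device="mps",
    policy_path="helper2424/pi05_check_rtc",
    settings={"execution_horizon": 20, "num_iterations": 50},
)
print(report)
```

Paste the report together with the profiling summary so the device, model, and settings are visible in one place.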

Quick Checklist

Before asking for help, verify:

  • Ran comparison script (with/without RTC)
  • Tried torch.compile
  • Tested different execution horizons (10, 20, 30)
  • Profiled with detailed RTC profiling
  • Checked preprocessing vs inference split
  • Verified hardware (device type, thermal state)