Real-Time Chunking (RTC) Examples
This directory contains examples and evaluation scripts for Real-Time Chunking (RTC), a technique for improving action chunking policies in real-time robot control.
Overview
Real-Time Chunking addresses the challenge of maintaining consistency and reactivity when using action chunking policies with non-negligible inference latency. It uses a guidance technique during diffusion sampling to blend new action predictions with previously planned actions.
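As a rough sketch of the idea (illustrative only, not the exact lerobot implementation; the correction form and names are assumptions), each denoising step can be nudged toward the previously planned actions over a weighted prefix:

```python
import numpy as np

def rtc_guided_step(x_t, v_t, prev_chunk, weights, dt, max_guidance_weight=10.0):
    """One guided denoising step (illustrative sketch, not the exact algorithm).

    x_t: noisy action chunk, shape (horizon, action_dim)
    v_t: model-predicted velocity for this step
    prev_chunk: previously planned actions, zero-padded to the full horizon
    weights: per-timestep prefix weights in [0, 1]; 1 = follow the old plan
    """
    # Pull guided timesteps toward the previous chunk, clipping the
    # correction so guidance cannot dominate the model's own prediction.
    correction = np.clip((prev_chunk - x_t) * weights[:, None],
                         -max_guidance_weight, max_guidance_weight)
    return x_t + dt * (v_t + correction)
```

With weight 1 a timestep is pulled fully toward the previous chunk; with weight 0 the step reduces to ordinary, unguided denoising.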
Key Benefits:
- Maintains consistency between consecutive action chunks
- Reduces jitter and improves smoothness
- Adapts to inference delays dynamically
Reference: Physical Intelligence - Real-Time Chunking
Scripts
1. eval_dataset.py
Offline evaluation on dataset samples with detailed visualization and validation.
Features:
- Compare RTC vs non-RTC predictions on two random dataset samples
- Validate RTC behavior (delay region, blend region, post-horizon region)
- Generate debug visualizations:
- Denoising step comparisons (x_t, v_t, x1_t, corrections)
- Final action predictions comparison
- Support for torch.compile() optimization
- Memory-efficient sequential policy loading for large models
Usage:
# Basic usage with SmolVLA policy
uv run python examples/rtc/eval_dataset.py \
--policy.path=helper2424/smolvla_check_rtc_last3 \
--dataset.repo_id=helper2424/check_rtc \
--rtc.execution_horizon=8 \
--device=mps \
--rtc.max_guidance_weight=10.0 \
--seed=10
# With Pi0.5 policy on CUDA
uv run python examples/rtc/eval_dataset.py \
--policy.path=lerobot/pi05_libero_finetuned \
--dataset.repo_id=HuggingFaceVLA/libero \
--rtc.execution_horizon=8 \
--device=cuda
# With Pi0 policy
uv run python examples/rtc/eval_dataset.py \
--policy.path=lerobot/pi0_libero_finetuned \
--dataset.repo_id=HuggingFaceVLA/libero \
--rtc.execution_horizon=8 \
--device=cuda
# With torch.compile for faster inference
uv run python examples/rtc/eval_dataset.py \
--policy.path=helper2424/smolvla_check_rtc_last3 \
--dataset.repo_id=helper2424/check_rtc \
--rtc.execution_horizon=8 \
--device=cuda \
--use_torch_compile=true \
--torch_compile_mode=max-autotune
# Enable CUDA graphs (advanced - may cause tensor aliasing errors)
uv run python examples/rtc/eval_dataset.py \
--policy.path=helper2424/smolvla_check_rtc_last3 \
--dataset.repo_id=helper2424/check_rtc \
--use_torch_compile=true \
--torch_compile_backend=inductor \
--torch_compile_mode=max-autotune \
--torch_compile_disable_cudagraphs=false
Key Parameters:
- --policy.path: Path to the pretrained policy
- --dataset.repo_id: Dataset to evaluate on
- --rtc.execution_horizon: Number of steps to maintain consistency (default: 20)
- --rtc.max_guidance_weight: Maximum guidance weight (default: 10.0)
- --rtc.prefix_attention_schedule: Schedule type (ZEROS, ONES, LINEAR, EXP)
- --inference_delay: Inference delay for RTC (default: 4)
- --seed: Random seed for reproducibility (default: 42)
- --output_dir: Directory to save visualizations (default: rtc_debug_output)
- --device: Device to use (cuda, cpu, mps, auto)
- --use_torch_compile: Enable torch.compile() for faster inference
Output:
The script generates several visualization files in rtc_debug_output/:
- denoising_xt_comparison.png - Noisy state evolution during denoising
- denoising_vt_comparison.png - Velocity predictions during denoising
- denoising_x1t_comparison.png - Predicted final states during denoising
- denoising_correction_comparison.png - RTC guidance corrections applied
- final_actions_comparison.png - Final action predictions (prev_chunk, no_rtc, rtc)
The script also validates RTC behavior and reports:
- ✅ Delay region [0:inference_delay]: RTC = prev_chunk
- ✅ Blend region [inference_delay:execution_horizon]: prev_chunk ≤ RTC ≤ no_rtc
- ✅ Post-horizon [execution_horizon:]: RTC = no_rtc
2. eval_with_real_robot.py
Real-time evaluation on physical robots or simulation environments.
Features:
- Run policy with RTC on real robot or simulation
- Multi-threaded action execution and inference
- Action queue management with proper timing
- Latency tracking and adaptive inference delay
- Support for both robots and gym environments
- Support for torch.compile() optimization
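The multi-threaded layout above can be sketched as a producer/consumer pair: one thread executes actions at a fixed fps while a background thread refills the queue whenever it drains below a threshold (function names here are hypothetical, not the script's API):

```python
import queue
import threading
import time

def run_control_loop(predict_chunk, execute, fps=10.0, refill_threshold=3, duration=1.0):
    """Two-thread sketch: fixed-rate execution, background inference refills."""
    actions = queue.Queue()
    stop = threading.Event()

    def inference_loop():
        while not stop.is_set():
            # Request a new chunk before the queue runs dry, so inference
            # latency is hidden behind the actions still left to execute.
            if actions.qsize() < refill_threshold:
                for a in predict_chunk():
                    actions.put(a)
            time.sleep(0.001)

    worker = threading.Thread(target=inference_loop, daemon=True)
    worker.start()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            execute(actions.get(timeout=1.0))  # one action per control tick
        except queue.Empty:
            break
        time.sleep(1.0 / fps)
    stop.set()
    worker.join()
```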
Usage:
# With real robot
uv run python examples/rtc/eval_with_real_robot.py \
--policy.path=lerobot/smolvla_base \
--robot.type=so100 \
--task="pick up the cup" \
--duration=30.0
# With simulation environment
uv run python examples/rtc/eval_with_real_robot.py \
--policy.path=lerobot/smolvla_base \
--env.type=pusht \
--duration=60.0
# With policy compilation (CUDA only, not MPS)
uv run python examples/rtc/eval_with_real_robot.py \
--policy.path=lerobot/smolvla_base \
--robot.type=so100 \
--use_torch_compile=true \
--torch_compile_mode=max-autotune
Key Parameters:
- --policy.path: Path to the pretrained policy
- --robot.type or --env.type: Robot or environment to use
- --task: Task description (for VLA models)
- --rtc.execution_horizon: Number of steps to maintain consistency (default: 10)
- --rtc.max_guidance_weight: Maximum guidance weight (default: 1.0)
- --rtc.prefix_attention_schedule: Schedule type (ZEROS, ONES, LINEAR, EXP)
- --duration: How long to run (seconds, default: 30.0)
- --fps: Action execution frequency (Hz, default: 10.0)
- --action_queue_size_to_get_new_actions: Queue size threshold to request new actions (default: 30)
- --device: Device to use (cuda, cpu, mps, auto)
- --use_torch_compile: Enable torch.compile() for faster inference
Understanding RTC Parameters
execution_horizon
Number of timesteps from the previous chunk to maintain consistency with. Higher values give more consistency but potentially less reactivity.
Typical values: 8-12 steps for dataset evaluation, 10 steps for real-time execution
max_guidance_weight
Upper bound on guidance strength. Higher values give stronger consistency but may over-constrain new predictions.
Typical values:
- Dataset evaluation: 10.0-100.0 (can be higher for analysis)
- Real-time execution: 1.0-10.0 (more conservative)
prefix_attention_schedule
How to weight consistency across the overlap region:
- ZEROS: Binary (full weight up to inference_delay, then zero)
- ONES: Full weight across the entire execution_horizon
- LINEAR: Linear decay from inference_delay to execution_horizon
- EXP: Exponential decay (recommended)
Recommended: EXP
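The schedules can be sketched as follows (illustrative formulas only; the exact definitions in the RTC paper and implementation may differ):

```python
import numpy as np

def prefix_weights(schedule, inference_delay, execution_horizon):
    """Illustrative per-timestep consistency weights over the horizon."""
    t = np.arange(execution_horizon, dtype=float)
    span = max(execution_horizon - inference_delay, 1)
    ramp = np.clip((execution_horizon - t) / span, 0.0, 1.0)  # 1 -> ~0 after delay
    if schedule == "ZEROS":
        w = (t < inference_delay).astype(float)  # binary cutoff at the delay
    elif schedule == "ONES":
        w = np.ones(execution_horizon)           # full weight everywhere
    elif schedule == "LINEAR":
        w = ramp                                 # linear decay after the delay
    elif schedule == "EXP":
        w = ramp * np.exp(ramp - 1.0)            # faster, exponential-style decay
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    w[:inference_delay] = 1.0  # delay region always follows the previous chunk
    return w
```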
inference_delay
Number of timesteps from the prefix to use for guidance. Typically calculated dynamically based on inference latency in real-time execution, but fixed for dataset evaluation.
Typical values: 3-5 steps for dataset evaluation
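The dynamic calculation amounts to converting wall-clock latency into action steps, for example (a sketch; the scripts may measure and smooth latency differently):

```python
import math

def inference_delay_steps(latency_s: float, fps: float) -> int:
    """Convert measured inference latency into a number of action steps.

    While a new chunk is being computed, the robot keeps executing the old
    plan, so the first `delay` steps of the new chunk are already committed.
    """
    return math.ceil(latency_s * fps)
```

For instance, 350 ms of latency at 10 Hz commits the first 4 steps of each new chunk.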
action_queue_size_to_get_new_actions (real-time only)
Threshold for requesting new action chunks. Should be higher than inference_delay + execution_horizon to ensure smooth operation.
Typical values: 20-30 steps
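The queue bookkeeping behind this threshold can be sketched as (class and method names are hypothetical, not the script's API):

```python
from collections import deque

class ActionQueue:
    """Minimal sketch of the real-time action-queue logic."""

    def __init__(self, get_new_actions_threshold: int):
        self.queue = deque()
        self.threshold = get_new_actions_threshold

    def push_chunk(self, chunk):
        self.queue.extend(chunk)

    def pop_action(self):
        return self.queue.popleft() if self.queue else None

    def needs_new_chunk(self) -> bool:
        # Trigger inference while enough buffered actions remain to cover
        # the inference delay plus the consistency prefix.
        return len(self.queue) < self.threshold
```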
Validation Rules (Dataset Evaluation)
The dataset evaluation script validates that RTC behavior matches expectations:
1. Delay Region [0:inference_delay]: RTC actions should equal the previous chunk
   - Ensures consistency during the inference delay period
2. Blend Region [inference_delay:execution_horizon]: RTC should lie between prev_chunk and no_rtc
   - Smooth transition from the previous plan to new predictions
3. Post-Horizon [execution_horizon:]: RTC should equal no_rtc
   - Full adoption of new predictions after the execution horizon
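The three checks above can be expressed compactly (a sketch mirroring the stated rules, not the script's actual validation code):

```python
import numpy as np

def validate_rtc(rtc, no_rtc, prev_chunk, inference_delay, execution_horizon, atol=1e-5):
    """Check the delay, blend, and post-horizon regions of an RTC chunk."""
    d, h = inference_delay, execution_horizon
    delay_ok = np.allclose(rtc[:d], prev_chunk[:d], atol=atol)
    # Blend region: RTC must lie between the old plan and the new prediction.
    lo = np.minimum(prev_chunk[d:h], no_rtc[d:h]) - atol
    hi = np.maximum(prev_chunk[d:h], no_rtc[d:h]) + atol
    blend_ok = bool(np.all((rtc[d:h] >= lo) & (rtc[d:h] <= hi)))
    post_ok = np.allclose(rtc[h:], no_rtc[h:], atol=atol)
    return delay_ok, blend_ok, post_ok
```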
Tips
- Start with dataset evaluation (eval_dataset.py) to understand RTC behavior and tune parameters before running on a robot
- Use visualizations to debug unexpected behavior - check denoising steps and final actions
- Tune execution_horizon based on your inference latency and action frequency
- Monitor validation output - failures indicate potential implementation issues or misconfigured parameters
- Compare different schedules - EXP usually works best but LINEAR can be more interpretable
Troubleshooting
Validation fails in delay region
- Check that prev_chunk_left_over is properly passed to the policy
- Verify RTC guidance is being applied during denoising
- Look at denoising visualizations to see where guidance diverges
Validation fails in post-horizon region
- RTC and no_rtc use different noise - verify same noise is being used for comparison
- Check that weights are correctly zeroed out after execution horizon
- Review prefix_attention_schedule visualization
Poor performance on real robot
- Increase action_queue_size_to_get_new_actions if you see warnings
- Reduce max_guidance_weight if the robot is too conservative
- Try different prefix_attention_schedule values
- Enable torch.compile() for faster inference (CUDA only)
Memory issues with large models
- The dataset evaluation script loads policies sequentially to minimize memory
- For real-time execution, only one policy is loaded
- Use smaller batch sizes if needed