add docs for rtc

2026-07-24 02:06:15 +00:00 · 2025-11-18 17:46:49 +01:00
parent b7b0ac2456
commit 611159f8bb
6 changed files with 207 additions and 293 deletions
@@ -15,8 +15,6 @@
    title: Train a Robot with RL
  - local: hilserl_sim
    title: Train RL in Simulation
  - local: async
    title: Use Async Inference
  - local: multi_gpu_training
    title: Multi GPU training
  title: "Tutorials"
@@ -40,6 +38,12 @@
  - local: groot
    title: NVIDIA GR00T N1.5
  title: "Policies"
 - sections:
  - local: async
    title: Use Async Inference
  - local: rtc
    title: Real-Time Chunking (RTC)
  title: "Inference"
 - sections:
  - local: envhub
    title: Environments from the Hub
@@ -0,0 +1,173 @@
 # Real-Time Chunking (RTC)
 Real-Time Chunking (RTC) is an inference-time method that allows large, flow-matching based robotic policies, such as [Pi0](./pi0), [Pi0.5](./pi05), and [SmolVLA](./smolvla), to produce smooth, continuous, and reactive motion despite having high inference latency.
 These policies generate chunks of future actions (e.g., 50 steps at a time) instead of single actions.
 Because the models are large, producing each chunk takes a non-negligible amount of time, often several control steps, during which the robot still has to keep moving.
 Naively executing chunks leads to problems such as pauses, jerky transitions, or sudden changes in strategy whenever the next chunk arrives late or disagrees with the previously executed actions.
 RTC solves this by asynchronously generating the next chunk while the robot continues executing the current one, and by guiding the new chunk so it aligns smoothly with the portion of the previous chunk that has already been executed.
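To make the latency problem concrete, here is a toy, tick-based simulation that counts how many control ticks the robot spends with nothing to execute. It is not LeRobot code: the function name `count_stalls`, the `threshold` parameter, and the rule of discarding actions planned for ticks that have already passed are assumptions made for illustration. It models only the scheduling side, not the guidance RTC applies to the new chunk.

```python
from collections import deque


def count_stalls(chunk_size, delay, threshold, ticks):
    """Count control ticks on which no action was available (hypothetical model)."""
    if delay < 1:
        raise ValueError("delay must be at least one tick")
    queue = deque()   # planned actions that have not been executed yet
    arrival = None    # tick at which the in-flight chunk will arrive
    started = 0       # tick at which the in-flight inference started
    stalls = 0
    for t in range(ticks):
        if arrival == t:
            # Actions planned for ticks that already passed are stale: drop them.
            stale = t - started
            queue = deque(range(stale, chunk_size))
            arrival = None
        if arrival is None and len(queue) <= threshold:
            # Request the next chunk while the robot keeps executing.
            started, arrival = t, t + delay
        if queue:
            queue.popleft()   # execute one action this tick
        else:
            stalls += 1       # nothing to execute: the robot pauses
    return stalls


print(count_stalls(chunk_size=50, delay=4, threshold=30, ticks=200))  # 4
print(count_stalls(chunk_size=50, delay=4, threshold=0, ticks=200))   # 16
```

In this model, requesting the next chunk while 30 actions remain leaves only the 4 start-up ticks idle, whereas waiting until the queue is empty adds a 4-tick pause to every cycle.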
 ## How RTC Works (simplified)
 RTC lets the robot think ahead while it’s still moving. When the robot is carrying out one chunk of actions, RTC starts creating the next chunk early.
 But since the robot has already moved a bit by the time the new chunk is ready, RTC has to make sure the new chunk still lines up smoothly with what the robot is currently doing.
 To do this, RTC treats the beginning of the new chunk like an inpainting or “fill-in-the-gaps” problem:
 it gently adjusts the first part of the new chunk so it blends naturally with the robot’s ongoing motion. The result is no pauses, no sudden jumps.
 In technical terms, RTC adds a guidance term to the flow-matching denoising process that forces the overlapping timesteps of the new chunk to stay close to the executed portion of the previous chunk, typically using a soft transition mask.
 ## Quick Start
 ### Installation
 RTC is built into LeRobot. Just install the policy dependencies you need:
 ```bash
 # For Pi0 or Pi0.5
 pip install -e ".[pi]"
 # For SmolVLA
 pip install -e ".[smolvla]"
 ```
 ### Using RTC with Pi0
 Here's a minimal example of using RTC with Pi0:
 ```python
 from lerobot.policies.pi0 import PI0Policy, PI0Config
 from lerobot.configs.types import RTCAttentionSchedule
 from lerobot.policies.rtc.configuration_rtc import RTCConfig
 # Load Pi0 with RTC enabled
 policy_cfg = PI0Config()
 # Enable RTC
 policy_cfg.rtc_config = RTCConfig(
    enabled=True,
    execution_horizon=10,  # How many steps to blend with previous chunk
    max_guidance_weight=10.0,  # How strongly to enforce consistency
    prefix_attention_schedule=RTCAttentionSchedule.LINEAR,  # Linear blend
 )
 # Load the policy
 policy = PI0Policy.from_pretrained("lerobot/pi0_base", policy_cfg=policy_cfg, device="cuda")
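 # Now use predict_action_chunk with RTC parameters.
 # `num_steps`, `get_observation`, and `execute_actions` are placeholders for your own control loop.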
 prev_chunk_left_over = None  # Will hold the leftover from previous chunk
 inference_delay = 4  # How many steps of inference latency
 for step in range(num_steps):
    # Get observation from environment
    observation = get_observation()
    # Predict action chunk with RTC
    action_chunk = policy.predict_action_chunk(
        observation,
        inference_delay=inference_delay,
        prev_chunk_left_over=prev_chunk_left_over,
        execution_horizon=policy_cfg.rtc_config.execution_horizon,
    )
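    # Execute the first `execution_horizon` actions
    execution_horizon = policy_cfg.rtc_config.execution_horizon
    execute_actions(action_chunk[:execution_horizon])
    # Save the unexecuted remainder for the next iteration
    prev_chunk_left_over = action_chunk[execution_horizon:]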
 ```
 ## Key Parameters
 `RTCConfig` has the following parameters to tune:
 **`execution_horizon`**: How many timesteps from the previous chunk to maintain consistency with. Higher values mean smoother transitions but potentially less reactivity.
 Typical values: 8-12 steps
 ```python
 RTCConfig(execution_horizon=10)
 ```
 **`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. Higher values give stronger smoothness but may over-constrain new predictions.
 Typical values:
 - Dataset evaluation: 10.0-100.0
 - Real-time robot control: 1.0-10.0
 **`prefix_attention_schedule`**: How to weight consistency across the overlap region.
 - `LINEAR`: Linear decay from `inference_delay` to `execution_horizon` (recommended for getting started)
 - `EXP`: Exponential decay (often performs better)
 - `ONES`: Full weight across the entire `execution_horizon`
 - `ZEROS`: Binary (full weight up to `inference_delay`, then zero)
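One way to read the four schedules is as a per-timestep weight vector over the new chunk. The sketch below is plain Python and is not taken from the LeRobot source: the function name `prefix_weights`, the exact ramp formula, and the way `EXP` reshapes the ramp are illustrative assumptions that only aim to match the descriptions above.

```python
import math

SCHEDULES = {"LINEAR", "EXP", "ONES", "ZEROS"}


def prefix_weights(inference_delay, execution_horizon, chunk_size, schedule="LINEAR"):
    """Return one guidance weight in [0, 1] per timestep of the new chunk."""
    if schedule not in SCHEDULES:
        raise ValueError(f"unknown schedule: {schedule}")
    h = execution_horizon
    d = min(inference_delay, h)
    weights = []
    for i in range(chunk_size):
        if i >= h:
            w = 0.0   # past the horizon: no guidance
        elif schedule == "ONES":
            w = 1.0   # full weight across the whole horizon
        elif i < d:
            w = 1.0   # delay region: always full weight
        elif schedule == "ZEROS":
            w = 0.0   # binary cut-off right after the delay
        else:
            # Linear ramp that falls from just under 1 towards 0 at the horizon.
            w = (h - i) / (h - d + 1)
            if schedule == "EXP":
                # Assumed reshaping: decays faster than the linear ramp.
                w = w * math.expm1(w) / (math.e - 1)
        weights.append(w)
    return weights


for name in ("LINEAR", "EXP", "ONES", "ZEROS"):
    print(name, [round(w, 2) for w in prefix_weights(4, 10, 12, name)])
```

Printing the vectors side by side is a quick way to see how much of the previous chunk each schedule keeps in the overlap region.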
 **`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.
 Typical values: 3-5 steps for dataset evaluation, dynamically calculated for real-time control
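For real-time control, one simple way to derive `inference_delay` is to convert a measured inference latency into control steps. This is a minimal sketch under stated assumptions: the helper name `delay_in_steps` is hypothetical, and the example scripts may compute the value differently.

```python
import math


def delay_in_steps(latency_s, fps):
    """Control steps that elapse while one chunk is generated (hypothetical helper)."""
    if latency_s < 0 or fps <= 0:
        raise ValueError("latency_s must be >= 0 and fps must be > 0")
    # Round first so float noise (0.1 * 30 == 3.0000000000000004) does not add a step.
    return math.ceil(round(latency_s * fps, 6))


print(delay_in_steps(0.35, 10))  # 4
```

Rounding up is the conservative choice here: underestimating the delay would let the policy revise actions that the robot has already executed by the time the chunk arrives.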
 ## Testing RTC Offline
 Before running on a real robot, test RTC with dataset samples to visualize how it works:
 ```bash
 python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi0_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=10 \
    --rtc.max_guidance_weight=10.0 \
    --device=cuda
 ```
 ## Testing RTC with a Real Robot
 ```bash
 python examples/rtc/eval_with_real_robot.py \
    --policy.path=${HF_USERNAME}/policy_repo_id \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
    --task="Move green small object into the purple platform" \
    --duration=120 \
    --device=cuda
 ```
 ## How It Differs from the Async Inference in LeRobot
 Both RTC and [async inference](./async) improve real-time robot control, but they solve different problems.
 | Aspect        | Async Inference                                                            | RTC                                                 |
 | ------------- | -------------------------------------------------------------------------- | --------------------------------------------------- |
 | **Problem**   | Idle frames while waiting for inference                                    | Discontinuities between action chunks               |
 | **Solution**  | Decouple prediction from execution                                         | Guide new chunks to continue smoothly from previous |
 | **Benefit**   | No waiting, continuous action                                              | Smooth transitions, natural motion                  |
 | **Best Used** | Large models with high inference latency                                   | Flow-matching based policies                        |
 **Use both together** for maximum smoothness and reactivity!
 ## Advanced: Debug Tracking
 RTC includes built-in debug tracking to help you understand what's happening during inference:
 ```python
 # Enable debug tracking
 policy_cfg.rtc_config.debug = True
 policy_cfg.rtc_config.debug_maxlen = 100
 # After inference, access debug data
 debug_data = policy.rtc_processor.get_debug_data()
 # Visualize denoising steps, corrections, etc.
 from lerobot.policies.rtc.debug_visualizer import RTCDebugVisualizer
 visualizer = RTCDebugVisualizer()
 # ... create plots
 ```
 See `examples/rtc/eval_dataset.py` for a complete example of visualization.
 ## References
 - [Smooth-As-Butter Robot Policies](https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html) - Excellent technical explanation with real robot results
 - [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/research/real_time_chunking) - Original paper and research
 - [Kinetix RTC Implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - Reference implementation from Physical Intelligence
@@ -1,251 +0,0 @@
 # Real-Time Chunking (RTC) Examples
 This directory contains examples and evaluation scripts for Real-Time Chunking (RTC), a technique for improving action chunking policies in real-time robot control.
 ## Overview
 Real-Time Chunking addresses the challenge of maintaining consistency and reactivity when using action chunking policies with non-negligible inference latency. It uses a guidance technique during diffusion sampling to blend new action predictions with previously planned actions.
 **Key Benefits:**
 - Maintains consistency between consecutive action chunks
 - Reduces jitter and improves smoothness
 - Adapts to inference delays dynamically
 **Reference:** [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/download/real_time_chunking.pdf)
 ## Scripts
 ### 1. `eval_dataset.py`
 Offline evaluation on dataset samples with detailed visualization and validation.
 **Features:**
 - Compare RTC vs non-RTC predictions on two random dataset samples
 - Validate RTC behavior (delay region, blend region, post-horizon region)
 - Generate debug visualizations:
  - Denoising step comparisons (x_t, v_t, x1_t, corrections)
  - Final action predictions comparison
 - Support for torch.compile() optimization
 - Memory-efficient sequential policy loading for large models
 **Usage:**
 ```bash
 # Basic usage with SmolVLA policy
 uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --rtc.execution_horizon=8 \
    --device=mps \
    --rtc.max_guidance_weight=10.0 \
    --seed=10
 # With Pi0.5 policy on CUDA
 uv run python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi05_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=8 \
    --device=cuda
 # With Pi0 policy
 uv run python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi0_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=8 \
    --device=cuda
 # With torch.compile for faster inference
 uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --rtc.execution_horizon=8 \
    --device=cuda \
    --use_torch_compile=true \
    --torch_compile_mode=max-autotune
 # Enable CUDA graphs (advanced - may cause tensor aliasing errors)
 uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --use_torch_compile=true \
    --torch_compile_backend=inductor \
    --torch_compile_mode=max-autotune \
    --torch_compile_disable_cudagraphs=false
 ```
 **Key Parameters:**
 - `--policy.path`: Path to pretrained policy
 - `--dataset.repo_id`: Dataset to evaluate on
 - `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 20)
 - `--rtc.max_guidance_weight`: Maximum guidance weight (default: 10.0)
 - `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
 - `--inference_delay`: Inference delay for RTC (default: 4)
 - `--seed`: Random seed for reproducibility (default: 42)
 - `--output_dir`: Directory to save visualizations (default: rtc_debug_output)
 - `--device`: Device to use (cuda, cpu, mps, auto)
 - `--use_torch_compile`: Enable torch.compile() for faster inference
 **Output:**
 The script generates several visualization files in `rtc_debug_output/`:
 - `denoising_xt_comparison.png` - Noisy state evolution during denoising
 - `denoising_vt_comparison.png` - Velocity predictions during denoising
 - `denoising_x1t_comparison.png` - Predicted final states during denoising
 - `denoising_correction_comparison.png` - RTC guidance corrections applied
 - `final_actions_comparison.png` - Final action predictions (prev_chunk, no_rtc, rtc)
 The script also validates RTC behavior and reports:
 - ✅ Delay region [0:inference_delay]: RTC = prev_chunk
 - ✅ Blend region [inference_delay:execution_horizon]: prev_chunk ≤ RTC ≤ no_rtc
 - ✅ Post-horizon [execution_horizon:]: RTC = no_rtc
 ### 2. `eval_with_real_robot.py`
 Real-time evaluation on physical robots or simulation environments.
 **Features:**
 - Run policy with RTC on real robot or simulation
 - Multi-threaded action execution and inference
 - Action queue management with proper timing
 - Latency tracking and adaptive inference delay
 - Support for both robots and gym environments
 - Support for torch.compile() optimization
 **Usage:**
 ```bash
 # With real robot
 uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --task="pick up the cup" \
    --duration=30.0
 # With simulation environment
 uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --env.type=pusht \
    --duration=60.0
 # With policy compilation (CUDA only, not MPS)
 uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --use_torch_compile=true \
    --torch_compile_mode=max-autotune
 ```
 **Key Parameters:**
 - `--policy.path`: Path to pretrained policy
 - `--robot.type` or `--env.type`: Robot or environment to use
 - `--task`: Task description (for VLA models)
 - `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 10)
 - `--rtc.max_guidance_weight`: Maximum guidance weight (default: 1.0)
 - `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
 - `--duration`: How long to run (seconds, default: 30.0)
 - `--fps`: Action execution frequency (Hz, default: 10.0)
 - `--action_queue_size_to_get_new_actions`: Queue size threshold to request new actions (default: 30)
 - `--device`: Device to use (cuda, cpu, mps, auto)
 - `--use_torch_compile`: Enable torch.compile() for faster inference
 ## Understanding RTC Parameters
 ### `execution_horizon`
 Number of timesteps from previous chunk to maintain consistency with. Higher values mean more consistency but potentially less reactivity.
 **Typical values:** 8-12 steps for dataset evaluation, 10 steps for real-time execution
 ### `max_guidance_weight`
 Upper bound on guidance strength. Higher values give stronger consistency but may over-constrain new predictions.
 **Typical values:**
 - Dataset evaluation: 10.0-100.0 (can be higher for analysis)
 - Real-time execution: 1.0-10.0 (more conservative)
 ### `prefix_attention_schedule`
 How to weight consistency across the overlap region:
 - `ZEROS`: Binary (full weight up to inference_delay, then zero)
 - `ONES`: Full weight across entire execution_horizon
 - `LINEAR`: Linear decay from inference_delay to execution_horizon
 - `EXP`: Exponential decay (recommended)
 **Recommended:** `EXP`
 ### `inference_delay`
 Number of timesteps from the prefix to use for guidance. Typically calculated dynamically based on inference latency in real-time execution, but fixed for dataset evaluation.
 **Typical values:** 3-5 steps for dataset evaluation
 ### `action_queue_size_to_get_new_actions` (real-time only)
 Threshold for requesting new action chunks. Should be higher than `inference_delay + execution_horizon` to ensure smooth operation.
 **Typical values:** 20-30 steps
 ## Validation Rules (Dataset Evaluation)
 The dataset evaluation script validates that RTC behavior matches expectations:
 1. **Delay Region [0:inference_delay]**: RTC actions should equal previous chunk
   - Ensures consistency during the inference delay period
 2. **Blend Region [inference_delay:execution_horizon]**: RTC should be between prev_chunk and no_rtc
   - Smooth transition from previous plan to new predictions
 3. **Post-Horizon [execution_horizon:]**: RTC should equal no_rtc
   - Full adoption of new predictions after execution horizon
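The three rules can be sketched as a small checker. This is an illustration only, not the code in `eval_dataset.py`: the function name `check_rtc_regions`, the tolerance `tol`, and the use of flat lists of floats (a single action dimension) are assumptions.

```python
def check_rtc_regions(prev_chunk, no_rtc, rtc, inference_delay, execution_horizon, tol=1e-6):
    """Check the delay, blend, and post-horizon rules on 1-D action sequences."""
    d, h = inference_delay, execution_horizon
    # Rule 1: inside the delay region RTC must reproduce the previous chunk.
    delay_ok = all(abs(r - p) <= tol for r, p in zip(rtc[:d], prev_chunk[:d]))
    # Rule 2: inside the blend region RTC must lie between the two predictions.
    blend_ok = all(
        min(p, n) - tol <= r <= max(p, n) + tol
        for r, p, n in zip(rtc[d:h], prev_chunk[d:h], no_rtc[d:h])
    )
    # Rule 3: after the horizon RTC must match the unguided prediction.
    post_ok = all(abs(r - n) <= tol for r, n in zip(rtc[h:], no_rtc[h:]))
    return {"delay": delay_ok, "blend": blend_ok, "post_horizon": post_ok}


prev = [0.0] * 6
free = [1.0] * 6
guided = [0.0, 0.0, 0.25, 0.75, 1.0, 1.0]
print(check_rtc_regions(prev, free, guided, inference_delay=2, execution_horizon=4))
```

Returning one flag per region makes it easier to tell which part of the chunk deviates when a check fails.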
 ## Tips
 1. **Start with dataset evaluation** (`eval_dataset.py`) to understand RTC behavior and tune parameters before running on robot
 2. **Use visualizations** to debug unexpected behavior - check denoising steps and final actions
 3. **Tune execution_horizon** based on your inference latency and action frequency
 4. **Monitor validation output** - failures indicate potential implementation issues or misconfigured parameters
 5. **Compare different schedules** - EXP usually works best but LINEAR can be more interpretable
 ## Troubleshooting
 ### Validation fails in delay region
 - Check that `prev_chunk_left_over` is properly passed to the policy
 - Verify RTC guidance is being applied during denoising
 - Look at denoising visualizations to see where guidance diverges
 ### Validation fails in post-horizon region
 - RTC and no_rtc may be sampling different noise; verify that both runs use the same noise for the comparison
 - Check that weights are correctly zeroed out after execution horizon
 - Review prefix_attention_schedule visualization
 ### Poor performance on real robot
 - Increase `action_queue_size_to_get_new_actions` if you see warnings
 - Reduce `max_guidance_weight` if robot is too conservative
 - Try different `prefix_attention_schedule` values
 - Enable torch.compile() for faster inference (CUDA only)
 ### Memory issues with large models
 - The dataset evaluation script loads policies sequentially to minimize memory
 - For real-time execution, only one policy is loaded
 - Use smaller batch sizes if needed
 ## Related Documentation
 - [RTC Implementation](../../src/lerobot/policies/rtc/modeling_rtc.py)
 - [RTC Configuration](../../src/lerobot/policies/rtc/configuration_rtc.py)
 - [Action Queue](../../src/lerobot/policies/rtc/action_queue.py)
 - [Physical Intelligence Paper](https://www.physicalintelligence.company/download/real_time_chunking.pdf)
@@ -98,7 +98,6 @@ pygame-dep = ["pygame>=2.5.1,<2.7.0"]
 placo-dep = ["placo>=0.9.6,<0.10.0"]
 transformers-dep = ["transformers>=4.53.0,<5.0.0"]
 grpcio-dep = ["grpcio==1.73.1", "protobuf==6.31.0"] # TODO: Bump dependency (compatible with wandb)
 matplotlib-dep = ["matplotlib>=3.10.3,<4.0.0"]
 # Motors
 feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0"]
@@ -133,7 +132,7 @@ groot = [
 hilserl = ["lerobot[transformers-dep]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"]
 # Features
-async = ["lerobot[grpcio-dep]", "lerobot[matplotlib-dep]"]
+async = ["lerobot[grpcio-dep]", "matplotlib>=3.10.3,<4.0.0"]
 # Development
 dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1"]
@@ -1,49 +1,38 @@
-# Real-Time Chunking (RTC) Module
+# Real-Time Chunking (RTC)
-This module implements Real-Time Chunking and related adaptive inference techniques for robotics policies in LeRobot.
+This module contains the LeRobot implementation of **Real-Time Chunking (RTC)**, an inference-time technique for flow-matching based policies.
-## Overview
+**Note**: RTC is not a policy itself, but rather an inference enhancement that works with flow-matching based policies including [π₀](../pi0/), [π₀.₅](../pi05/), and [SmolVLA](../smolvla/).
-Real-Time Chunking (RTC) addresses the challenge of real-time inference in action chunking policies by treating chunk generation as an inpainting problem. It strategically handles overlapping timesteps between action chunks using prefix attention mechanisms.
+---
-It is particularly effective for handling long-horizon inference in robotics policies.
+## Citation
-## Integration with Policies
+If you use Real-Time Chunking in your work, please cite:
-RTC can be integrated with any policy that supports flow matching for chunking:
+```bibtex
@misc{openpi2024,
  author       = {Physical Intelligence Lab},
  title        = {OpenPI: PyTorch Implementation of π0 and π0.5 Policies},
  year         = {2024},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/Physical-Intelligence/openpi}},
  license      = {Apache-2.0}
 }
- **SmolVLA**: Vision-language-action model with RTC support
+@misc{black2025realtimeexecutionactionchunking,
- **Pi0**: Action prediction model with adaptive chunking
+      title={Real-Time Execution of Action Chunking Flow Policies},
- **Pi05**: Action prediction model with adaptive chunking
+      author={Kevin Black and Manuel Y. Galliker and Sergey Levine},
-
+      year={2025},
-## Original Implementation
+      eprint={2506.07339},
-
+      archivePrefix={arXiv},
-This implementation is based on Physical Intelligence's Kinetix RTC:
+      primaryClass={cs.RO},
-
+      url={https://arxiv.org/abs/2506.07339},
- [Original RTC implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix/blob/main/src/model.py#L214)
+}
 - [Kinetix GitHub Repository](https://github.com/Physical-Intelligence/real-time-chunking-kinetix)
 ## References
 - [Real Time Chunking Paper](https://www.physicalintelligence.company/research/real_time_chunking)
 - [Physical Intelligence Kinetix](https://github.com/Physical-Intelligence/real-time-chunking-kinetix)
 ## How to run
 ### Check with data from the dataset
 ```bash
 uv run python examples/rtc/eval_dataset.py \
 --policy.path=helper2424/smolvla_check_rtc_last3 \
 --dataset.repo_id=helper2424/check_rtc \
 --rtc.execution_horizon=8 \
 --device=mps \
 --seed=42
 ```
-This script evaluates RTC on data from a dataset and saves the results to a file; you can check the results in the `rtc_debug_output` directory.
+---
-The example output should look like this:
+## License
 ![Flow Matching with RTC](./flow_matching.png)
-It shows how flow matching works with and without RTC. The chart shows the action predictions at each timestep. The colour shows the generation progress: blue marks earlier timesteps and yellow marks later ones. The red line is the ground truth (previous action chunk).
+This implementation follows the **Apache 2.0 License**, consistent with the LeRobot project.