diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 0cf8aa9a6..ef0cb798f 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -15,8 +15,6 @@
       title: Train a Robot with RL
     - local: hilserl_sim
       title: Train RL in Simulation
-    - local: async
-      title: Use Async Inference
     - local: multi_gpu_training
       title: Multi GPU training
   title: "Tutorials"
@@ -40,6 +38,12 @@
     - local: groot
       title: NVIDIA GR00T N1.5
   title: "Policies"
+- sections:
+    - local: async
+      title: Use Async Inference
+    - local: rtc
+      title: Real-Time Chunking (RTC)
+  title: "Inference"
 - sections:
     - local: envhub
       title: Environments from the Hub
diff --git a/docs/source/rtc.mdx b/docs/source/rtc.mdx
new file mode 100644
index 000000000..4f87beb1e
--- /dev/null
+++ b/docs/source/rtc.mdx
@@ -0,0 +1,173 @@
+# Real-Time Chunking (RTC)
+
+Real-Time Chunking (RTC) is an inference-time method that allows large, flow-matching based robotic policies, such as [Pi0](./pi0), [Pi0.5](./pi05), and [SmolVLA](./smolvla), to produce smooth, continuous, and reactive motion despite having high inference latency.
+
+These policies generate chunks of future actions (e.g., 50 steps at a time) instead of single actions.
+Because the models are large, producing each chunk takes several control steps' worth of time, so the robot cannot simply stop and wait for a fresh chunk at every step.
+Naively executing chunks leads to problems such as pauses, jerky transitions, or sudden changes in strategy whenever the next chunk arrives late or disagrees with the previously executed actions.
+
+RTC solves this by asynchronously generating the next chunk while the robot continues executing the current one, and by guiding the new chunk so it aligns smoothly with the portion of the previous chunk that has already been executed.
+
+## How RTC Works (simplified)
+
+RTC lets the robot think ahead while it’s still moving. When the robot is carrying out one chunk of actions, RTC starts creating the next chunk early.
+But since the robot has already moved a bit by the time the new chunk is ready, RTC has to make sure the new chunk still lines up smoothly with what the robot is currently doing.
+
+To do this, RTC treats the beginning of the new chunk like an inpainting or “fill-in-the-gaps” problem:
+it gently adjusts the first part of the new chunk so it blends naturally with the robot’s ongoing motion. The result is no pauses and no sudden jumps.
+
+In technical terms, RTC adds a guidance term to the flow-matching denoising process that forces the overlapping timesteps of the new chunk to stay close to the executed portion of the previous chunk, typically using a soft transition mask.
+
+## Quick Start
+
+### Installation
+
+RTC is built into LeRobot. Just install the policy dependencies you need:
+
+```bash
+# For Pi0 or Pi0.5
+pip install -e ".[pi]"
+
+# For SmolVLA
+pip install -e ".[smolvla]"
+```
+
+### Using RTC with Pi0
+
+Here's a minimal example of using RTC with Pi0:
+
+```python
+from lerobot.policies.pi0 import PI0Policy, PI0Config
+from lerobot.configs.types import RTCAttentionSchedule
+from lerobot.policies.rtc.configuration_rtc import RTCConfig
+
+# Load Pi0 with RTC enabled
+policy_cfg = PI0Config()
+
+# Enable RTC
+policy_cfg.rtc_config = RTCConfig(
+    enabled=True,
+    execution_horizon=10,  # How many steps to blend with previous chunk
+    max_guidance_weight=10.0,  # How strongly to enforce consistency
+    prefix_attention_schedule=RTCAttentionSchedule.LINEAR,  # Linear blend
+)
+
+# Load the policy
+policy = PI0Policy.from_pretrained("lerobot/pi0_base", policy_cfg=policy_cfg, device="cuda")
+
+# Now use predict_action_chunk with RTC parameters
+prev_chunk_left_over = None  # Will hold the leftover from the previous chunk
+inference_delay = 4  # How many steps of inference latency
+
+for step in range(num_steps):
+    # Get observation from environment
+    observation = get_observation()
+
+    # Predict action chunk with RTC
+    action_chunk = policy.predict_action_chunk(
+        observation,
+        inference_delay=inference_delay,
+        prev_chunk_left_over=prev_chunk_left_over,
+        execution_horizon=policy_cfg.rtc_config.execution_horizon,
+    )
+
+    # Execute the first few actions from the chunk
+    execute_actions(action_chunk[: policy_cfg.rtc_config.execution_horizon])
+
+    # Keep the portion after the inference delay to guide the next chunk
+    prev_chunk_left_over = action_chunk[inference_delay:]
+```
+
+## Key Parameters
+
+`RTCConfig` has the following parameters to tune:
+
+**`execution_horizon`**: How many timesteps from the previous chunk to maintain consistency with. Higher values mean smoother transitions but potentially less reactivity.
+
+Typical values: 8-12 steps
+
+```python
+RTCConfig(execution_horizon=10)
+```
+
+**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. Higher values give stronger smoothness but may over-constrain new predictions.
+
+Typical values:
+
+- Dataset evaluation: 10.0-100.0
+- Real-time robot control: 1.0-10.0
+
+**`prefix_attention_schedule`**: How to weight consistency across the overlap region.
+
+- `LINEAR`: Linear decay from inference_delay to execution_horizon (recommended for getting started)
+- `EXP`: Exponential decay (often performs better)
+- `ONES`: Full weight across entire execution_horizon
+- `ZEROS`: Binary (full weight up to inference_delay, then zero)
+
+**`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.
+Typical values: 3-5 steps for dataset evaluation, dynamically calculated for real-time control
+
+## Testing RTC Offline
+
+Before running on a real robot, test RTC with dataset samples to visualize how it works:
+
+```bash
+python examples/rtc/eval_dataset.py \
+    --policy.path=lerobot/pi0_libero_finetuned \
+    --dataset.repo_id=HuggingFaceVLA/libero \
+    --rtc.execution_horizon=10 \
+    --rtc.max_guidance_weight=10.0 \
+    --device=cuda
+```
+
+## Testing RTC with a Real Robot
+
+```bash
+python examples/rtc/eval_with_real_robot.py \
+    --policy.path=${HF_USERNAME}/policy_repo_id \
+    --robot.type=so100_follower \
+    --robot.port=/dev/tty.usbmodem58FA0834591 \
+    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+    --task="Move the small green object onto the purple platform" \
+    --duration=120 \
+    --device=cuda
+```
+
+## How It Differs from Async Inference in LeRobot
+
+Both RTC and [async inference](./async) improve real-time robot control, but they solve different problems.
+
+| Aspect | Async Inference | RTC |
+| --- | --- | --- |
+| **Problem** | Idle frames while waiting for inference | Discontinuities between action chunks |
+| **Solution** | Decouple prediction from execution | Guide new chunks to continue smoothly from previous |
+| **Benefit** | No waiting, continuous action | Smooth transitions, natural motion |
+| **Best used for** | Large models with high inference latency | Flow-matching based policies |
+
+**Use both together** for maximum smoothness and reactivity!
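To make the schedule options above concrete, here is a small illustrative sketch of how per-timestep guidance weights could look across a new chunk. This is not the LeRobot implementation: the helper name `prefix_weights` and the exact `EXP` decay rate are assumptions for illustration only.

```python
import math


def prefix_weights(chunk_len: int, inference_delay: int, execution_horizon: int, schedule: str) -> list[float]:
    """Illustrative guidance weights over one new chunk.

    Steps before `inference_delay` will already have been executed by the time
    the new chunk arrives, so they get full weight; steps past
    `execution_horizon` are left unconstrained. The schedule only shapes the
    blend region in between.
    """
    weights = []
    for t in range(chunk_len):
        if t < inference_delay:
            weights.append(1.0)
        elif t >= execution_horizon:
            weights.append(0.0)
        else:
            # Fraction of the way through the blend region, from 0 to 1
            frac = (t - inference_delay) / (execution_horizon - inference_delay)
            if schedule == "LINEAR":
                weights.append(1.0 - frac)
            elif schedule == "EXP":
                weights.append(math.exp(-5.0 * frac))  # decay rate is arbitrary here
            elif schedule == "ONES":
                weights.append(1.0)
            elif schedule == "ZEROS":
                weights.append(0.0)
            else:
                raise ValueError(f"unknown schedule: {schedule}")
    return weights
```

With `chunk_len=12`, `inference_delay=3`, and `execution_horizon=8`, `LINEAR` ramps down from full weight across steps 3-7 and is zero afterwards, while `ZEROS` drops to zero immediately after the delay: the most reactive but least smooth choice.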
+
+## Advanced: Debug Tracking
+
+RTC includes built-in debug tracking to help you understand what's happening during inference:
+
+```python
+# Enable debug tracking
+policy_cfg.rtc_config.debug = True
+policy_cfg.rtc_config.debug_maxlen = 100
+
+# After inference, access debug data
+debug_data = policy.rtc_processor.get_debug_data()
+
+# Visualize denoising steps, corrections, etc.
+from lerobot.policies.rtc.debug_visualizer import RTCDebugVisualizer
+visualizer = RTCDebugVisualizer()
+# ... create plots
+```
+
+See `examples/rtc/eval_dataset.py` for a complete example of visualization.
+
+## References
+
+- [Smooth-As-Butter Robot Policies](https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html) - Excellent technical explanation with real robot results
+- [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/research/real_time_chunking) - Original paper and research
+- [Kinetix RTC Implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - Reference implementation from Physical Intelligence
diff --git a/examples/rtc/README.md b/examples/rtc/README.md
deleted file mode 100644
index 5128645e7..000000000
--- a/examples/rtc/README.md
+++ /dev/null
@@ -1,251 +0,0 @@
-# Real-Time Chunking (RTC) Examples
-
-This directory contains examples and evaluation scripts for Real-Time Chunking (RTC), a technique for improving action chunking policies in real-time robot control.
-
-## Overview
-
-Real-Time Chunking addresses the challenge of maintaining consistency and reactivity when using action chunking policies with non-negligible inference latency. It uses a guidance technique during diffusion sampling to blend new action predictions with previously planned actions.
- -**Key Benefits:** - -- Maintains consistency between consecutive action chunks -- Reduces jitter and improves smoothness -- Adapts to inference delays dynamically - -**Reference:** [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/download/real_time_chunking.pdf) - -## Scripts - -### 1. `eval_dataset.py` - -Offline evaluation on dataset samples with detailed visualization and validation. - -**Features:** - -- Compare RTC vs non-RTC predictions on two random dataset samples -- Validate RTC behavior (delay region, blend region, post-horizon region) -- Generate debug visualizations: - - Denoising step comparisons (x_t, v_t, x1_t, corrections) - - Final action predictions comparison -- Support for torch.compile() optimization -- Memory-efficient sequential policy loading for large models - -**Usage:** - -```bash -# Basic usage with SmolVLA policy -uv run python examples/rtc/eval_dataset.py \ - --policy.path=helper2424/smolvla_check_rtc_last3 \ - --dataset.repo_id=helper2424/check_rtc \ - --rtc.execution_horizon=8 \ - --device=mps \ - --rtc.max_guidance_weight=10.0 \ - --seed=10 - -# With Pi0.5 policy on CUDA -uv run python examples/rtc/eval_dataset.py \ - --policy.path=lerobot/pi05_libero_finetuned \ - --dataset.repo_id=HuggingFaceVLA/libero \ - --rtc.execution_horizon=8 \ - --device=cuda - -# With Pi0 policy -uv run python examples/rtc/eval_dataset.py \ - --policy.path=lerobot/pi0_libero_finetuned \ - --dataset.repo_id=HuggingFaceVLA/libero \ - --rtc.execution_horizon=8 \ - --device=cuda - -# With torch.compile for faster inference -uv run python examples/rtc/eval_dataset.py \ - --policy.path=helper2424/smolvla_check_rtc_last3 \ - --dataset.repo_id=helper2424/check_rtc \ - --rtc.execution_horizon=8 \ - --device=cuda \ - --use_torch_compile=true \ - --torch_compile_mode=max-autotune - -# Enable CUDA graphs (advanced - may cause tensor aliasing errors) -uv run python examples/rtc/eval_dataset.py \ - 
--policy.path=helper2424/smolvla_check_rtc_last3 \ - --dataset.repo_id=helper2424/check_rtc \ - --use_torch_compile=true \ - --torch_compile_backend=inductor \ - --torch_compile_mode=max-autotune \ - --torch_compile_disable_cudagraphs=false -``` - -**Key Parameters:** - -- `--policy.path`: Path to pretrained policy -- `--dataset.repo_id`: Dataset to evaluate on -- `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 20) -- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 10.0) -- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP) -- `--inference_delay`: Inference delay for RTC (default: 4) -- `--seed`: Random seed for reproducibility (default: 42) -- `--output_dir`: Directory to save visualizations (default: rtc_debug_output) -- `--device`: Device to use (cuda, cpu, mps, auto) -- `--use_torch_compile`: Enable torch.compile() for faster inference - -**Output:** - -The script generates several visualization files in `rtc_debug_output/`: - -- `denoising_xt_comparison.png` - Noisy state evolution during denoising -- `denoising_vt_comparison.png` - Velocity predictions during denoising -- `denoising_x1t_comparison.png` - Predicted final states during denoising -- `denoising_correction_comparison.png` - RTC guidance corrections applied -- `final_actions_comparison.png` - Final action predictions (prev_chunk, no_rtc, rtc) - -The script also validates RTC behavior and reports: - -- ✅ Delay region [0:inference_delay]: RTC = prev_chunk -- ✅ Blend region [inference_delay:execution_horizon]: prev_chunk ≤ RTC ≤ no_rtc -- ✅ Post-horizon [execution_horizon:]: RTC = no_rtc - -### 2. `eval_with_real_robot.py` - -Real-time evaluation on physical robots or simulation environments. 
- -**Features:** - -- Run policy with RTC on real robot or simulation -- Multi-threaded action execution and inference -- Action queue management with proper timing -- Latency tracking and adaptive inference delay -- Support for both robots and gym environments -- Support for torch.compile() optimization - -**Usage:** - -```bash -# With real robot -uv run python examples/rtc/eval_with_real_robot.py \ - --policy.path=lerobot/smolvla_base \ - --robot.type=so100 \ - --task="pick up the cup" \ - --duration=30.0 - -# With simulation environment -uv run python examples/rtc/eval_with_real_robot.py \ - --policy.path=lerobot/smolvla_base \ - --env.type=pusht \ - --duration=60.0 - -# With policy compilation (CUDA only, not MPS) -uv run python examples/rtc/eval_with_real_robot.py \ - --policy.path=lerobot/smolvla_base \ - --robot.type=so100 \ - --use_torch_compile=true \ - --torch_compile_mode=max-autotune -``` - -**Key Parameters:** - -- `--policy.path`: Path to pretrained policy -- `--robot.type` or `--env.type`: Robot or environment to use -- `--task`: Task description (for VLA models) -- `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 10) -- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 1.0) -- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP) -- `--duration`: How long to run (seconds, default: 30.0) -- `--fps`: Action execution frequency (Hz, default: 10.0) -- `--action_queue_size_to_get_new_actions`: Queue size threshold to request new actions (default: 30) -- `--device`: Device to use (cuda, cpu, mps, auto) -- `--use_torch_compile`: Enable torch.compile() for faster inference - -## Understanding RTC Parameters - -### `execution_horizon` - -Number of timesteps from previous chunk to maintain consistency with. Higher values mean more consistency but potentially less reactivity. 
- -**Typical values:** 8-12 steps for dataset evaluation, 10 steps for real-time execution - -### `max_guidance_weight` - -Upper bound on guidance strength. Higher values give stronger consistency but may over-constrain new predictions. - -**Typical values:** - -- Dataset evaluation: 10.0-100.0 (can be higher for analysis) -- Real-time execution: 1.0-10.0 (more conservative) - -### `prefix_attention_schedule` - -How to weight consistency across the overlap region: - -- `ZEROS`: Binary (full weight up to inference_delay, then zero) -- `ONES`: Full weight across entire execution_horizon -- `LINEAR`: Linear decay from inference_delay to execution_horizon -- `EXP`: Exponential decay (recommended) - -**Recommended:** `EXP` - -### `inference_delay` - -Number of timesteps from the prefix to use for guidance. Typically calculated dynamically based on inference latency in real-time execution, but fixed for dataset evaluation. - -**Typical values:** 3-5 steps for dataset evaluation - -### `action_queue_size_to_get_new_actions` (real-time only) - -Threshold for requesting new action chunks. Should be higher than `inference_delay + execution_horizon` to ensure smooth operation. - -**Typical values:** 20-30 steps - -## Validation Rules (Dataset Evaluation) - -The dataset evaluation script validates that RTC behavior matches expectations: - -1. **Delay Region [0:inference_delay]**: RTC actions should equal previous chunk - - Ensures consistency during the inference delay period - -2. **Blend Region [inference_delay:execution_horizon]**: RTC should be between prev_chunk and no_rtc - - Smooth transition from previous plan to new predictions - -3. **Post-Horizon [execution_horizon:]**: RTC should equal no_rtc - - Full adoption of new predictions after execution horizon - -## Tips - -1. **Start with dataset evaluation** (`eval_dataset.py`) to understand RTC behavior and tune parameters before running on robot -2. 
**Use visualizations** to debug unexpected behavior - check denoising steps and final actions -3. **Tune execution_horizon** based on your inference latency and action frequency -4. **Monitor validation output** - failures indicate potential implementation issues or misconfigured parameters -5. **Compare different schedules** - EXP usually works best but LINEAR can be more interpretable - -## Troubleshooting - -### Validation fails in delay region - -- Check that `prev_chunk_left_over` is properly passed to the policy -- Verify RTC guidance is being applied during denoising -- Look at denoising visualizations to see where guidance diverges - -### Validation fails in post-horizon region - -- RTC and no_rtc use different noise - verify same noise is being used for comparison -- Check that weights are correctly zeroed out after execution horizon -- Review prefix_attention_schedule visualization - -### Poor performance on real robot - -- Increase `action_queue_size_to_get_new_actions` if you see warnings -- Reduce `max_guidance_weight` if robot is too conservative -- Try different `prefix_attention_schedule` values -- Enable torch.compile() for faster inference (CUDA only) - -### Memory issues with large models - -- The dataset evaluation script loads policies sequentially to minimize memory -- For real-time execution, only one policy is loaded -- Use smaller batch sizes if needed - -## Related Documentation - -- [RTC Implementation](../../src/lerobot/policies/rtc/modeling_rtc.py) -- [RTC Configuration](../../src/lerobot/policies/rtc/configuration_rtc.py) -- [Action Queue](../../src/lerobot/policies/rtc/action_queue.py) -- [Physical Intelligence Paper](https://www.physicalintelligence.company/download/real_time_chunking.pdf) diff --git a/pyproject.toml b/pyproject.toml index e05203188..0b53457a1 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -98,7 +98,6 @@ pygame-dep = ["pygame>=2.5.1,<2.7.0"] placo-dep = ["placo>=0.9.6,<0.10.0"] transformers-dep = 
["transformers>=4.53.0,<5.0.0"] grpcio-dep = ["grpcio==1.73.1", "protobuf==6.31.0"] # TODO: Bumb dependency (compatible with wandb) -matplotlib-dep = ["matplotlib>=3.10.3,<4.0.0"] # Motors feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0"] @@ -133,7 +132,7 @@ groot = [ hilserl = ["lerobot[transformers-dep]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"] # Features -async = ["lerobot[grpcio-dep]", "lerobot[matplotlib-dep]"] +async = ["lerobot[grpcio-dep]", "matplotlib>=3.10.3,<4.0.0"] # Development dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1"] diff --git a/src/lerobot/policies/rtc/README.md b/src/lerobot/policies/rtc/README.md index 2b72b33ab..926d4e8c4 100644 --- a/src/lerobot/policies/rtc/README.md +++ b/src/lerobot/policies/rtc/README.md @@ -1,49 +1,38 @@ -# Real-Time Chunking (RTC) Module +# Real-Time Chunking (RTC) -This module implements Real-Time Chunking and related adaptive inference techniques for robotics policies in LeRobot. +This module contains the LeRobot implementation of **Real-Time Chunking (RTC)**, an inference-time technique for flow-matching based policies. -## Overview +**Note**: RTC is not a policy itself, but rather an inference enhancement that works with flow-matching based policies including [π₀](../pi0/), [π₀.₅](../pi05/), and [SmolVLA](../smolvla/). -Real-Time Chunking (RTC) addresses the challenge of real-time inference in action chunking policies by treating chunk generation as an inpainting problem. It strategically handles overlapping timesteps between action chunks using prefix attention mechanisms. +--- -It is particularly effective for handling long-horizon inference in robotics policies. 
+## Citation -## Integration with Policies +If you use Real-Time Chunking in your work, please cite: -RTC can be integrated with any policy that supports flow mathicng for chunking: +```bibtex +@misc{openpi2024, + author = {Physical Intelligence Lab}, + title = {OpenPI: PyTorch Implementation of π0 and π0.5 Policies}, + year = {2024}, + publisher = {GitHub}, + howpublished = {\url{https://github.com/Physical-Intelligence/openpi}}, + license = {Apache-2.0} +} -- **SmolVLA**: Vision-language-action model with RTC support -- **Pi0**: Action prediction model with adaptive chunking -- **Pi05**: Action prediction model with adaptive chunking - -## Original Implementation - -This implementation is based on Physical Intelligence's Kinetix RTC: - -- [Original RTC implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix/blob/main/src/model.py#L214) -- [Kinetix GitHub Repository](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - -## References - -- [Real Time Chunking Paper](https://www.physicalintelligence.company/research/real_time_chunking) -- [Physical Intelligence Kinetix](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - -## How to run - -### Check with data from the dataset - -```bash -uv run python examples/rtc/eval_dataset.py \ ---policy.path=helper2424/smolvla_check_rtc_last3 \ ---dataset.repo_id=helper2424/check_rtc \ ---rtc.execution_horizon=8 \ ---device=mps \ ---seed=42 +@misc{black2025realtimeexecutionactionchunking, + title={Real-Time Execution of Action Chunking Flow Policies}, + author={Kevin Black and Manuel Y. Galliker and Sergey Levine}, + year={2025}, + eprint={2506.07339}, + archivePrefix={arXiv}, + primaryClass={cs.RO}, + url={https://arxiv.org/abs/2506.07339}, +} ``` -This script will evaluate RTC on a data from a dataset and save the results to a file, u can check the results in the `rtc_debug_output` directory. 
+--- -The example output should look like this: -![Flow Matching with RTC](./flow_matching.png) +## License -It shows how flow matching works with RTC and without it. The chart shows values of action predictions for each timestep. The colour shows the the generation progress. The blue ones - earlier timesteps, the yellow ones - later timesteps. The red line is the ground truth (previous action chunk). +This implementation follows the **Apache 2.0 License**, consistent with the LeRobot project. diff --git a/src/lerobot/policies/rtc/flow_matching.png b/src/lerobot/policies/rtc/flow_matching.png deleted file mode 100644 index 3ef86edfd..000000000 Binary files a/src/lerobot/policies/rtc/flow_matching.png and /dev/null differ