# Real-Time Chunking (RTC)
Real-Time Chunking (RTC) is an inference-time method that allows large, flow-matching-based robotic policies, such as [Pi0](./pi0), [Pi0.5](./pi05), and [SmolVLA](./smolvla), to produce smooth, continuous, and reactive motion despite having high inference latency.

These policies generate chunks of future actions (e.g., 50 steps at a time) instead of single actions. Because the models are large, producing each chunk takes a significant fraction of the time the robot needs to execute it. Naively executing chunks leads to problems such as pauses, jerky transitions, or sudden changes in strategy whenever the next chunk arrives late or disagrees with the previously executed actions.

RTC solves this by asynchronously generating the next chunk while the robot continues executing the current one, and by guiding the new chunk so it aligns smoothly with the portion of the previous chunk that has already been executed.
## How RTC Works (simplified)
RTC lets the robot think ahead while it is still moving. While the robot is carrying out one chunk of actions, RTC starts generating the next chunk early. But since the robot has already moved by the time the new chunk is ready, RTC has to make sure the new chunk still lines up smoothly with what the robot is currently doing.

To do this, RTC treats the beginning of the new chunk as an inpainting or "fill-in-the-gaps" problem: it gently adjusts the first part of the new chunk so it blends naturally with the robot's ongoing motion. The result: no pauses and no sudden jumps.

In technical terms, RTC adds a guidance term to the flow-matching denoising process that keeps the overlapping timesteps of the new chunk close to the executed portion of the previous chunk, typically using a soft transition mask.
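The guided denoising step can be sketched in a few lines. This is a simplified toy illustration of the idea, not LeRobot's actual implementation: the function name, the exact correction form, and the weight clipping are all illustrative assumptions.

```python
import numpy as np

def guided_denoise_step(x_t, v_t, t, dt, prev_tail, prefix_weights, max_guidance_weight):
    """One toy Euler denoising step with a soft prefix-guidance correction.

    x_t:            (horizon, action_dim) current noisy action chunk
    v_t:            (horizon, action_dim) velocity predicted by the policy at time t
    prev_tail:      (prefix_len, action_dim) already-planned actions from the previous chunk
    prefix_weights: (prefix_len,) soft transition mask in [0, 1]
    """
    # Clean chunk implied by the current state and predicted velocity
    # (flow-matching convention: x1 = x_t + (1 - t) * v_t).
    x1_hat = x_t + (1.0 - t) * v_t

    # Error between the previous chunk's tail and the new chunk's overlap region.
    err = prev_tail - x1_hat[: len(prev_tail)]

    # Soft-masked correction; the guidance gain grows as t -> 1 but is
    # capped at max_guidance_weight (a toy version of the clipped weighting).
    w = np.clip(prefix_weights, 0.0, 1.0)[:, None]
    guidance = np.minimum(w / max(1.0 - t, 1e-6), max_guidance_weight)
    correction = np.zeros_like(v_t)
    correction[: len(prev_tail)] = guidance * err

    # Guided Euler update: follow the model's velocity plus the correction.
    return x_t + dt * (v_t + correction)
```

Outside the overlap region the correction is zero, so the tail of the chunk is denoised exactly as it would be without RTC.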
## Quick Start
### Installation

RTC is built into LeRobot. Just install the dependencies for the policy you need:

```bash
# For Pi0 or Pi0.5
pip install -e ".[pi]"

# For SmolVLA
pip install -e ".[smolvla]"
```
### Using RTC with Pi0
You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py). The snippet below is a simplified pseudo-example of how RTC operates with Pi0 in your pipeline:
```python
from lerobot.policies.pi0 import PI0Policy, PI0Config
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.rtc.action_queue import ActionQueue

# Configure Pi0 with RTC enabled
policy_cfg = PI0Config()
policy_cfg.rtc_config = RTCConfig(
    enabled=True,
    execution_horizon=10,  # How many steps to blend with the previous chunk
    max_guidance_weight=10.0,  # How strongly to enforce consistency
    prefix_attention_schedule=RTCAttentionSchedule.EXP,  # Exponential blend
)

# Load the policy
policy = PI0Policy.from_pretrained("lerobot/pi0_base", policy_cfg=policy_cfg, device="cuda")

# How many steps of inference latency; estimate this from your policy's
# measured inference time relative to the control frequency
inference_delay = 4

# Initialize the action queue
action_queue = ActionQueue(policy_cfg.rtc_config)

# Run this function in a separate thread
def get_actions():
    while True:
        if should_get_actions:
            prev_actions = action_queue.get_left_over()
            obs = get_robot_observations(robot)

            # Generate actions WITH RTC
            actions = policy.predict_action_chunk(
                obs,
                inference_delay=inference_delay,
                prev_chunk_left_over=prev_actions,
            )

            action_queue.merge(actions, actions, inference_delay)

# Main control loop: pop one action per step and execute it
for step in range(num_steps):
    action = action_queue.get()
    execute_actions(action)
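The queue semantics in the example above can be sketched with a toy stand-in. This is an illustration of the producer/consumer pattern only: `ToyActionQueue` and its `merge` behavior are assumptions for explanatory purposes, not LeRobot's actual `ActionQueue` API.

```python
from collections import deque
import threading

class ToyActionQueue:
    """Toy queue illustrating the pattern: the control loop pops one action
    per step while an inference thread swaps in freshly guided chunks."""

    def __init__(self):
        self._queue = deque()
        self._lock = threading.Lock()

    def get(self):
        # Pop the next action to execute (None if the queue ran dry).
        with self._lock:
            return self._queue.popleft() if self._queue else None

    def get_left_over(self):
        # Snapshot of the not-yet-executed tail of the current chunk;
        # this is what RTC guides the next chunk to agree with.
        with self._lock:
            return list(self._queue)

    def merge(self, new_chunk, inference_delay):
        # The robot already executed `inference_delay` steps while inference
        # ran, so drop that many actions from the front of the new chunk and
        # replace the remaining plan with it.
        with self._lock:
            self._queue = deque(new_chunk[inference_delay:])
```

The key invariant is that actions consumed during inference are never replayed: the new chunk takes over exactly where the old one left off.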
## Key Parameters
`RTCConfig` has the following parameters to tune:
**`execution_horizon`**: How many timesteps of the previous chunk to maintain consistency with. Higher values mean smoother transitions but potentially less reactivity. Typical values: 8-12 steps.

```python
RTCConfig(execution_horizon=10)
```
**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This hyperparameter balances the smoothness of transitions against the reactivity of the policy. For 10-step flow matching (SmolVLA, Pi0, Pi0.5), a value of 10.0 works well.
**`prefix_attention_schedule`**: How to weight consistency across the overlap region.

- `LINEAR`: Linear decay from `inference_delay` to `execution_horizon`
- `EXP`: Exponential decay (recommended for getting started)
- `ONES`: Full weight across the entire `execution_horizon`
- `ZEROS`: Binary (full weight up to `inference_delay`, then zero)
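The shapes of these schedules can be sketched as follows. This is a toy illustration of the qualitative shapes only; the exact formulas in LeRobot may differ, and the helper name is not part of the library.

```python
import numpy as np

def prefix_weights(schedule: str, inference_delay: int, execution_horizon: int) -> np.ndarray:
    """Toy prefix-attention weights over the overlap region (shapes only)."""
    steps = np.arange(execution_horizon, dtype=float)
    if schedule == "ZEROS":
        # Full weight up to inference_delay, then zero (hard cutoff)
        return (steps < inference_delay).astype(float)
    if schedule == "ONES":
        # Full weight across the whole execution horizon
        return np.ones(execution_horizon)
    if schedule == "LINEAR":
        # Full weight during the delay, then linear decay to zero
        decay = (execution_horizon - steps) / (execution_horizon - inference_delay)
        return np.where(steps < inference_delay, 1.0, np.clip(decay, 0.0, 1.0))
    if schedule == "EXP":
        # Full weight during the delay, then exponential decay
        return np.where(steps < inference_delay, 1.0, np.exp(-(steps - inference_delay)))
    raise ValueError(f"unknown schedule: {schedule}")
```

Steps the robot is guaranteed to execute (before `inference_delay`) get full weight under every schedule; the schedules differ only in how quickly the constraint relaxes afterwards.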
**`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.
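A simple way to estimate the delay is to measure wall-clock inference latency and convert it into control steps. The helper name below is illustrative, not part of LeRobot:

```python
import math

def compute_inference_delay(inference_latency_s: float, control_hz: float) -> int:
    """Convert measured inference latency into a count of control steps.

    For example, 130 ms of latency at 30 Hz control means about 4 control
    steps elapse before the new chunk is ready.
    """
    return math.ceil(inference_latency_s * control_hz)
```

Rounding up is the safe choice: underestimating the delay would guide the new chunk to match steps the robot has already executed.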
## Testing RTC Offline
Before running on a real robot, test RTC with dataset samples to visualize how it works:
```bash
python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi0_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=10 \
    --rtc.max_guidance_weight=10.0 \
    --device=cuda
```
The script generates a visualization of the denoising process, comparing standard generation (left) with RTC (right). In the RTC plots, you can see how the first few steps (blue/purple lines) are guided to match the red ground truth trajectory (previous chunk's tail), ensuring a smooth transition between chunks.
<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/flow_matching.png"
    alt="Denoising steps with and without RTC"
    width="100%"
  />
</p>
## Testing RTC with a Real Robot
```bash
python examples/rtc/eval_with_real_robot.py \
    --policy.path=${HF_USERNAME}/policy_repo_id \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
    --task="Move green small object into the purple platform" \
    --duration=120 \
    --device=cuda
```
## How It Differs from Async Inference in LeRobot
Both RTC and [async inference](./async) improve real-time robot control, but they solve different problems.

| Aspect        | Async Inference                                     | RTC                                                 |
| ------------- | --------------------------------------------------- | --------------------------------------------------- |
| **Problem**   | Idle frames while waiting for inference             | Discontinuities between action chunks               |
| **Solution**  | Decouple prediction from execution                  | Guide new chunks to continue smoothly from previous |
| **Benefit**   | No waiting, continuous action                       | Smooth transitions, natural motion                  |
| **Best used** | Large models with high inference latency            | Flow-matching based policies                        |
**Use both together** for maximum smoothness and reactivity!
## Advanced: Debug Tracking
RTC includes built-in debug tracking to help you understand what happens during inference:

```python
from lerobot.policies.rtc.debug_visualizer import RTCDebugVisualizer

# Enable debug tracking
policy_cfg.rtc_config.debug = True
policy_cfg.rtc_config.debug_maxlen = 100

# After inference, access debug data
debug_data = policy.rtc_processor.get_debug_data()

# Visualize denoising steps, corrections, etc.
visualizer = RTCDebugVisualizer()
# ... create plots
```
See `examples/rtc/eval_dataset.py` for a complete example of visualization.
## References
- [Smooth-As-Butter Robot Policies](https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html) - Excellent technical explanation with real robot results
- [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/research/real_time_chunking) - Original paper and research
- [Kinetix RTC Implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - Reference implementation from Physical Intelligence