Mirror of https://github.com/huggingface/lerobot.git, synced 2026-05-13 07:39:53 +00:00

Compare commits: 70 commits (ea87324725 … 784cdae55a)
@@ -15,8 +15,6 @@
      title: Train a Robot with RL
    - local: hilserl_sim
      title: Train RL in Simulation
    - local: async
      title: Use Async Inference
    - local: multi_gpu_training
      title: Multi GPU training
  title: "Tutorials"
@@ -40,6 +38,12 @@
    - local: groot
      title: NVIDIA GR00T N1.5
  title: "Policies"
- sections:
    - local: async
      title: Use Async Inference
    - local: rtc
      title: Real-Time Chunking (RTC)
  title: "Inference"
- sections:
    - local: envhub
      title: Environments from the Hub

@@ -0,0 +1,188 @@

# Real-Time Chunking (RTC)

Real-Time Chunking (RTC) is an inference-time method that allows large, flow-matching-based robotic policies, such as [Pi0](./pi0), [Pi0.5](./pi05), and [SmolVLA](./smolvla), to produce smooth, continuous, and reactive motion despite having high inference latency.

These policies generate chunks of future actions (e.g., 50 steps at a time) instead of single actions. Because the models are large, producing a chunk takes many control steps' worth of time; at a 10 Hz control rate, for example, 400 ms of inference latency means roughly 4 actions are consumed before the new chunk is ready. Naively executing chunks leads to problems such as pauses, jerky transitions, or sudden changes in strategy whenever the next chunk arrives late or disagrees with the previously executed actions.

RTC solves this by asynchronously generating the next chunk while the robot continues executing the current one, and by guiding the new chunk so it aligns smoothly with the portion of the previous chunk that has already been executed.

## How RTC Works (simplified)

RTC lets the robot think ahead while it's still moving. While the robot is carrying out one chunk of actions, RTC starts creating the next chunk early. But since the robot has already moved a bit by the time the new chunk is ready, RTC has to make sure the new chunk still lines up smoothly with what the robot is currently doing.

To do this, RTC treats the beginning of the new chunk as an inpainting or "fill-in-the-gaps" problem: it gently adjusts the first part of the new chunk so it blends naturally with the robot's ongoing motion. The result is no pauses and no sudden jumps.

In technical terms, RTC adds a guidance term to the flow-matching denoising process that forces the overlapping timesteps of the new chunk to stay close to the executed portion of the previous chunk, typically using a soft transition mask.
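
The sketch below condenses that guidance step. It mirrors the structure of the profiled `denoise_step` shown later in this diff (masked error, autograd correction, clamped guidance weight), but treat it as an illustration rather than the library's exact API:

```python
import torch

def guided_denoise_step(x_t, v_t, prev_chunk, weights, time, max_guidance_weight=10.0):
    """One guided flow-matching update (illustrative sketch, not the exact library code).

    x_t:        current noisy action chunk  [chunk, action_dim]
    v_t:        velocity predicted by the policy for x_t (detached)
    prev_chunk: leftover actions from the previous chunk, padded to chunk length
    weights:    soft transition mask (1 = fully constrained, 0 = free)
    time:       flow-matching time as a float, going from 1 (noise) to 0 (clean)
    """
    x_t = x_t.clone().requires_grad_(True)
    x1_t = x_t - time * v_t                      # predicted clean chunk
    err = (prev_chunk - x1_t) * weights          # masked disagreement with the old plan
    correction = torch.autograd.grad(x1_t, x_t, grad_outputs=err)[0]

    # Clamp the guidance strength (same formula as the profiled file, tau = 1 - time)
    tau = 1 - time
    c = (1 - tau) / max(tau, 1e-8)
    inv_r2 = ((1 - tau) ** 2 + tau ** 2) / max((1 - tau) ** 2, 1e-8)
    guidance_weight = min(c * inv_r2, max_guidance_weight)

    return v_t - guidance_weight * correction    # guided velocity
```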

## Quick Start

### Installation

RTC is built into LeRobot. Just install the policy dependencies you need:

```bash
# For Pi0 or Pi0.5
pip install -e ".[pi]"

# For SmolVLA
pip install -e ".[smolvla]"
```

### Using RTC with Pi0

You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py). The snippet below is a simplified pseudo-example of how RTC fits into a Pi0 pipeline (helpers such as `get_robot_observations`, `execute_actions`, and the `should_get_actions` flag are placeholders):

```python
from lerobot.policies.pi0 import PI0Policy, PI0Config
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.rtc.action_queue import ActionQueue

# Load Pi0 with RTC enabled
policy_cfg = PI0Config()

# Enable RTC
policy_cfg.rtc_config = RTCConfig(
    enabled=True,
    execution_horizon=10,  # How many steps to blend with the previous chunk
    max_guidance_weight=10.0,  # How strongly to enforce consistency
    prefix_attention_schedule=RTCAttentionSchedule.EXP,  # Exponential blend
)

# Load the policy
policy = PI0Policy.from_pretrained("lerobot/pi0_base", policy_cfg=policy_cfg, device="cuda")

# Now use predict_action_chunk with RTC parameters.
# This value should be derived from the measured inference latency of the policy.
inference_delay = 4

# Initialize the action queue
action_queue = ActionQueue(policy_cfg.rtc_config)

# Run the following function in a separate thread
def get_actions():
    while True:
        if should_get_actions:
            prev_actions = action_queue.get_left_over()
            obs = get_robot_observations(robot)

            # Generate actions WITH RTC
            actions = policy.predict_action_chunk(
                obs,
                inference_delay=inference_delay,
                prev_chunk_left_over=prev_actions,
            )

            # Merge the fresh chunk into the queue
            action_queue.merge(actions, actions, inference_delay)

# Main control loop: execute one queued action per control step
for step in range(num_steps):
    action = action_queue.get()
    execute_actions(action)
```

## Key Parameters

`RTCConfig` has the following parameters to tune:

**`execution_horizon`**: How many timesteps from the previous chunk to maintain consistency with. Higher values mean smoother transitions but potentially less reactivity.

Typical values: 8-12 steps

```python
RTCConfig(execution_horizon=10)
```

**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This hyperparameter balances the smoothness of transitions against the reactivity of the policy. For 10-step flow matching (SmolVLA, Pi0, Pi0.5), a value of 10.0 works well.
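
It is set the same way as the other config fields:

```python
RTCConfig(max_guidance_weight=10.0)
```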

**`prefix_attention_schedule`**: How to weight consistency across the overlap region.

- `LINEAR`: Linear decay from `inference_delay` to `execution_horizon`
- `EXP`: Exponential decay (recommended for getting started)
- `ONES`: Full weight across the entire `execution_horizon`
- `ZEROS`: Binary (full weight up to `inference_delay`, then zero)
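
As a rough illustration of those schedules (an assumption about the exact decay shapes, not the library's implementation):

```python
import torch

def prefix_weights(schedule: str, inference_delay: int, execution_horizon: int, chunk_size: int) -> torch.Tensor:
    """Per-timestep consistency weights; always zero past execution_horizon."""
    execution_horizon = min(execution_horizon, chunk_size)
    w = torch.zeros(chunk_size)
    w[:inference_delay] = 1.0  # the delay region is always fully constrained
    blend = torch.arange(inference_delay, execution_horizon)
    span = max(execution_horizon - inference_delay, 1)
    if schedule == "ONES":
        w[blend] = 1.0
    elif schedule == "LINEAR":
        w[blend] = 1.0 - (blend - inference_delay).float() / span
    elif schedule == "EXP":
        w[blend] = torch.exp(-(blend - inference_delay).float())
    # "ZEROS": leave the blend region at zero
    return w
```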

**`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.
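
`inference_delay` is just latency expressed in control steps, so one plausible way to estimate it is to time the policy and convert (a hypothetical helper; `fps` is your control rate):

```python
import math
import time

def estimate_inference_delay(policy, obs, fps: float = 10.0, trials: int = 5) -> int:
    """Measure average chunk-generation latency and convert it to control steps."""
    start = time.perf_counter()
    for _ in range(trials):
        policy.predict_action_chunk(obs)
    latency_s = (time.perf_counter() - start) / trials
    return math.ceil(latency_s * fps)  # e.g. 0.35 s at 10 Hz -> 4 steps
```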

## Testing RTC Offline

Before running on a real robot, test RTC with dataset samples to visualize how it works:

```bash
python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi0_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=10 \
    --rtc.max_guidance_weight=10.0 \
    --device=cuda
```

The script generates a visualization of the denoising process, comparing standard generation (left) with RTC (right). In the RTC plots, you can see how the first few steps (blue/purple lines) are guided to match the red ground-truth trajectory (the previous chunk's tail), ensuring a smooth transition between chunks.

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/flow_matching.png"
    alt="Denoising steps with and without RTC"
    width="100%"
  />
</p>

## Testing RTC with a Real Robot

```bash
python examples/rtc/eval_with_real_robot.py \
    --policy.path=${HF_USERNAME}/policy_repo_id \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
    --task="Move green small object into the purple platform" \
    --duration=120 \
    --device=cuda
```

## How It Differs from Async Inference in LeRobot

Both RTC and [async inference](./async) improve real-time robot control, but they solve different problems.

| Aspect | Async Inference | RTC |
| --- | --- | --- |
| **Problem** | Idle frames while waiting for inference | Discontinuities between action chunks |
| **Solution** | Decouple prediction from execution | Guide new chunks to continue smoothly from previous |
| **Benefit** | No waiting, continuous action | Smooth transitions, natural motion |
| **Best used with** | Large models with high inference latency | Flow-matching-based policies |

**Use both together** for maximum smoothness and reactivity!

## Advanced: Debug Tracking

RTC includes built-in debug tracking to help you understand what's happening during inference:

```python
# Enable debug tracking
policy_cfg.rtc_config.debug = True
policy_cfg.rtc_config.debug_maxlen = 100

# After inference, access debug data
debug_data = policy.rtc_processor.get_debug_data()

# Visualize denoising steps, corrections, etc.
from lerobot.policies.rtc.debug_visualizer import RTCDebugVisualizer

visualizer = RTCDebugVisualizer()
# ... create plots
```

See `examples/rtc/eval_dataset.py` for a complete visualization example.

## References

- [Smooth-As-Butter Robot Policies](https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html) - Excellent technical explanation with real-robot results
- [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/research/real_time_chunking) - Original paper and research
- [Kinetix RTC Implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix) - Reference implementation from Physical Intelligence

@@ -15,16 +15,12 @@
# limitations under the License.

import argparse
import logging
from pathlib import Path

from datatrove.executor import LocalPipelineExecutor
from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from port_datasets.droid_rlds.port_droid import DROID_SHARDS

from lerobot.datasets.aggregate import aggregate_datasets
from lerobot.utils.utils import init_logging
from port_droid import DROID_SHARDS


class AggregateDatasets(PipelineStep):
@@ -38,6 +34,11 @@ class AggregateDatasets(PipelineStep):
        self.aggr_repo_id = aggregated_repo_id

    def run(self, data=None, rank: int = 0, world_size: int = 1):
        import logging

        from lerobot.datasets.aggregate import aggregate_datasets
        from lerobot.utils.utils import init_logging

        init_logging()

        # Since aggregate_datasets already handles parallel processing internally,

@@ -20,7 +20,7 @@ from pathlib import Path
from datatrove.executor import LocalPipelineExecutor
from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from port_datasets.droid_rlds.port_droid import DROID_SHARDS
from port_droid import DROID_SHARDS


class PortDroidShards(PipelineStep):
@@ -35,7 +35,7 @@ class PortDroidShards(PipelineStep):

    def run(self, data=None, rank: int = 0, world_size: int = 1):
        from datasets.utils.tqdm import disable_progress_bars
        from port_datasets.droid_rlds.port_droid import port_droid, validate_dataset
        from port_droid import port_droid, validate_dataset

        from lerobot.utils.utils import init_logging


@@ -24,7 +24,7 @@ from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from huggingface_hub import HfApi
from huggingface_hub.constants import REPOCARD_NAME
from port_datasets.droid_rlds.port_droid import DROID_SHARDS
from port_droid import DROID_SHARDS

from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDatasetMetadata
from lerobot.datasets.utils import create_lerobot_dataset_card
@@ -185,11 +185,11 @@ class UploadDataset(PipelineStep):


def make_upload_executor(
    repo_id, job_name, logs_dir, workers, partition, cpus_per_task, mem_per_cpu, slurm=True
    repo_id, job_name, logs_dir, workers, partition, cpus_per_task, mem_per_cpu, private=False, slurm=True
):
    kwargs = {
        "pipeline": [
            UploadDataset(repo_id),
            UploadDataset(repo_id, private=private),
        ],
        "logging_dir": str(logs_dir / job_name),
    }
@@ -267,6 +267,12 @@ def main():
        default="1950M",
        help="Memory per cpu that each worker will use.",
    )
    parser.add_argument(
        "--private",
        action="store_true",
        default=False,
        help="Whether to create a private repository.",
    )

    init_logging()

@@ -1,263 +0,0 @@

# RTC Profiling Guide

This guide explains how to profile RTC (Real-Time Chunking) performance to identify bottlenecks and understand why RTC might be slower than expected.

## Quick Start

### 1. Profile with Real Robot (Profiled Version)

Use `eval_with_real_robot_profiled.py` to profile actual robot execution:

```bash
# With RTC enabled
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.id=so100_follower \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}}" \
    --task="Move green small object into the purple platform" \
    --duration=30

# Without RTC for comparison
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=false \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.id=so100_follower \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}}" \
    --task="Move green small object into the purple platform" \
    --duration=30
```

**Output**: At the end of execution, you'll see a detailed breakdown of timing for each component:

- `get_actions.policy_inference` - Time spent in policy inference
- `get_actions.preprocessing` - Time spent preprocessing observations
- `get_actions.postprocessing` - Time spent postprocessing actions
- `get_actions.action_queue_merge` - Time spent merging actions with RTC
- `robot.get_observation` - Time to get observations from the robot
- `robot.send_action` - Time to send actions to the robot
- And more...

### 2. Profile Without Robot (Comparison Script)

Use `profile_rtc_comparison.py` to profile just the policy inference without needing a robot:

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20
```

**Output**: Side-by-side comparison of performance with and without RTC, including:

- Mean/min/max inference times
- Throughput (iterations per second)
- Verdict on whether RTC is faster or slower

### 3. Enable Detailed Method-Level Profiling

For even more granular profiling, add the `--enable_detailed_profiling` flag:

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20 \
    --enable_detailed_profiling
```

This will show timing for individual methods within the policy.

## Understanding the Output

### Key Metrics to Look At

1. **get_actions.policy_inference** - This should be the largest component
   - If RTC is enabled, this includes the RTC guidance overhead
   - Compare this with/without RTC to see the overhead

2. **get_actions.preprocessing** - Image preprocessing and normalization
   - Should be relatively fast
   - If slow, consider optimizing image processing

3. **get_actions.postprocessing** - Action denormalization
   - Should be minimal
   - If slow, check the postprocessor implementation

4. **get_actions.action_queue_merge** - RTC-specific merging logic
   - Only present when RTC is enabled
   - If this is taking significant time, the RTC algorithm may need optimization

5. **robot.get_observation** - Robot communication overhead
   - If slow, check camera/sensor latency
   - Consider reducing image resolution

6. **robot.send_action** - Action execution overhead
   - Should be very fast
   - If slow, check robot communication

### Expected Performance

For a typical Pi0 policy on Apple Silicon (MPS):

- **Without RTC**: ~100-200 ms per inference
- **With RTC**: Should be similar or slightly faster due to action reuse
- **Preprocessing**: ~5-20 ms depending on the number of cameras
- **Postprocessing**: ~1-5 ms

If RTC is significantly slower, likely causes:

1. **RTC overhead exceeds benefits** - The guidance computation is expensive
2. **Execution horizon too small** - Not reusing enough actions to amortize the overhead
3. **No compilation** - Try with `--use_torch_compile`
4. **Large prev_actions buffer** - Copying/processing previous actions is slow

## Profiling Your Own Code

### Using the Profiling Decorator

Add profiling to your own methods:

```python
from lerobot.utils.profiling import profile_method, enable_profiling, print_profiling_summary

# Enable profiling
enable_profiling()

# Decorate methods you want to profile
@profile_method
def my_slow_function(x):
    # ... your code ...
    return result

# At end of execution
print_profiling_summary()
```

### Using the Profile Context Manager

For profiling specific code blocks:

```python
from lerobot.utils.profiling import profile_section, enable_profiling

enable_profiling()

with profile_section("data_loading"):
    data = load_data()

with profile_section("model_inference"):
    output = model(data)
```

### Adding Profiling to Policy Methods

To profile specific parts of the Pi0 policy, you can add decorators:

```python
# In src/lerobot/policies/pi0/modeling_pi0.py
from lerobot.utils.profiling import profile_method, profile_section

class Pi0Policy:
    @profile_method
    def predict_action_chunk(self, obs, inference_delay=0, prev_chunk_left_over=None):
        # ... existing code ...
        pass

    def _generate_actions_with_rtc(self, ...):
        with profile_section("rtc.guidance_computation"):
            # ... guidance code ...
            pass

        with profile_section("rtc.action_merging"):
            # ... merging code ...
            pass
```

## Analyzing Results

### Comparison Checklist

When comparing RTC vs non-RTC performance, check:

- [ ] Is `policy_inference` time higher with RTC?
- [ ] Is `action_queue_merge` taking significant time?
- [ ] Are you running enough iterations to amortize warmup?
- [ ] Is torch.compile enabled for a fair comparison?
- [ ] Is the execution horizon large enough? (should be >= 10-20)
- [ ] Are you testing on the same hardware/device?

### Common Bottlenecks

1. **Image preprocessing dominates**
   - Solution: Reduce image resolution, use fewer cameras, or optimize preprocessing

2. **Action queue operations are slow**
   - Solution: Review the queue implementation; consider using a ring buffer (see the sketch after this list)

3. **RTC guidance is expensive**
   - Solution: Reduce the guidance weight, simplify the guidance computation, use torch.compile

4. **Robot communication is slow**
   - Solution: Increase the baud rate, reduce action frequency, optimize the protocol

5. **Memory allocation overhead**
   - Solution: Pre-allocate buffers, reuse tensors, avoid unnecessary copies
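
For item 2, a preallocated ring buffer is one way to avoid per-chunk allocations. A minimal sketch (illustrative only, not LeRobot's `ActionQueue`):

```python
import numpy as np

class ActionRingBuffer:
    """Fixed-capacity FIFO for actions, reusing one preallocated array."""

    def __init__(self, capacity: int, action_dim: int):
        self.buf = np.zeros((capacity, action_dim), dtype=np.float32)
        self.head = 0   # index of the oldest queued action
        self.size = 0   # number of queued actions

    def push_chunk(self, chunk: np.ndarray) -> None:
        for action in chunk:
            tail = (self.head + self.size) % len(self.buf)
            self.buf[tail] = action
            if self.size == len(self.buf):
                self.head = (self.head + 1) % len(self.buf)  # drop the oldest action
            else:
                self.size += 1

    def pop(self) -> np.ndarray:
        action = self.buf[self.head].copy()
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return action
```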

## Advanced: Adding Custom Metrics

You can add custom timing metrics to the profiled script:

```python
import time

from lerobot.utils.profiling import record_timing

start = time.perf_counter()
# ... your code ...
duration = time.perf_counter() - start
record_timing("my_custom_metric", duration)
```

## Troubleshooting

### Profiling shows RTC is slower by >50%

1. Check if torch.compile is enabled: `--use_torch_compile`
2. Increase the execution horizon: `--rtc.execution_horizon=30`
3. Verify that inference_delay is calculated correctly
4. Profile with `--enable_detailed_profiling` to find the exact bottleneck

### Profiling output is empty

1. Make sure profiling is enabled with `enable_profiling()`
2. Verify you're running enough iterations (at least 10)
3. Check that the code is actually executing (not short-circuited)

### Inconsistent results between runs

1. Run more iterations: `--num_iterations=100`
2. Increase warmup iterations
3. Check for thermal throttling on the device
4. Ensure no other processes are competing for resources

## Next Steps

1. Run both profiling scripts (with/without robot)
2. Compare the timing breakdowns
3. Identify the largest bottleneck
4. Focus optimization efforts on that component
5. Re-run profiling to verify improvements

## Questions?

If profiling reveals unexpected bottlenecks or you need help interpreting results, please share:

- The full profiling output
- Your configuration (RTC enabled/disabled, execution horizon, etc.)
- Hardware specs (device type, memory, etc.)
- Policy type and size

@@ -1,208 +0,0 @@

# RTC Profiling - Quick Start

Quick reference for profiling Pi0 with RTC to identify performance bottlenecks.

## 🚀 Quick Commands

### 1. Profile with Real Robot

```bash
# With RTC enabled (profiled version)
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 0}, front: {type: opencv, index_or_path: 1}}" \
    --task="Pick up object" \
    --duration=30
```

### 2. Compare RTC vs No-RTC (No Robot Needed)

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20
```

### 3. Detailed RTC Method Profiling

```bash
uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=20 \
    --execution_horizon=20 \
    --enable_rtc_profiling
```

## 📊 What Each Tool Does

| Tool | Purpose | Needs Robot? |
|------|---------|--------------|
| `eval_with_real_robot_profiled.py` | Profile actual robot execution with RTC | ✅ Yes |
| `profile_rtc_comparison.py` | Compare RTC vs no-RTC side-by-side | ❌ No |
| `profile_pi0_rtc_detailed.py` | Deep dive into RTC internals | ❌ No |

## 🔍 Key Metrics to Watch

### Overall Performance

- **iteration.policy_inference** - Total policy inference time
- **iteration.preprocessing** - Image preprocessing time
- **iteration.postprocessing** - Action denormalization time

### RTC-Specific (with `--enable_rtc_profiling`)

- **rtc.denoise_step.base_denoising** - Time without RTC overhead
- **rtc.denoise_step.autograd_correction** - Gradient computation time
- **rtc.denoise_step.guidance_computation** - Total RTC guidance overhead

### Robot Communication

- **robot.get_observation** - Time to get robot state
- **robot.send_action** - Time to send action command

## 🎯 Quick Diagnosis

### RTC is slower than expected?

1. **Check if torch.compile is enabled**

   ```bash
   # Add this flag
   --use_torch_compile
   ```

2. **Try a larger execution horizon**

   ```bash
   # Increase to amortize RTC overhead
   --rtc.execution_horizon=30
   ```

3. **Profile to find the bottleneck**

   ```bash
   uv run examples/rtc/profile_pi0_rtc_detailed.py \
       --policy_path=helper2424/pi05_check_rtc \
       --device=mps \
       --enable_rtc_profiling
   ```

### Preprocessing is slow?

- Reduce image resolution in the robot config
- Use fewer cameras
- Check camera FPS settings

### Policy inference is slow?

- Enable torch.compile
- Check the device (MPS vs CUDA vs CPU)
- Try a smaller model if available

## 📈 Expected Performance

### Typical timings on Apple Silicon (MPS)

| Component | Time (ms) | Notes |
|-----------|-----------|-------|
| Policy inference | 100-200 | Depends on model size |
| Preprocessing | 5-20 | Depends on #cameras |
| Postprocessing | 1-5 | Usually fast |
| RTC overhead | 10-50 | Should be < 50% of base |

### When RTC helps:

- ✅ Execution horizon ≥ 10
- ✅ Inference time > action execution rate
- ✅ Using torch.compile
- ✅ Proper inference_delay calculation

### When RTC might not help:

- ❌ Very fast inference already
- ❌ Small execution horizon (< 5)
- ❌ No compilation (interpreted mode)
- ❌ Inference delay not accounted for

## 🛠️ Adding Profiling to Your Code

### Quick snippet:

```python
from lerobot.utils.profiling import enable_profiling, print_profiling_summary, profile_section

# Enable at start
enable_profiling()

# Profile sections
with profile_section("my_operation"):
    # ... your code ...
    pass

# Print at end
print_profiling_summary()
```

### Profile specific methods:

```python
from lerobot.utils.profiling import profile_method

@profile_method
def my_slow_function():
    # ... your code ...
    pass
```

## 📝 Example Output

```
PROFILING SUMMARY
================================================================================
Function                                   Count      Mean (ms)
--------------------------------------------------------------------------------
iteration.policy_inference                 20         150.23
iteration.preprocessing                    20         12.45
rtc.denoise_step.guidance_computation      200        15.67
rtc.denoise_step.autograd_correction       200        8.23
rtc.denoise_step.base_denoising            200        120.45
================================================================================
```

## 🚨 Common Issues

### "No profiling data available"

- Did you call `enable_profiling()`?
- Are you running enough iterations?

### Inconsistent results

- Increase `--num_iterations`
- Check for thermal throttling
- Close other applications

### Can't find the bottleneck

- Enable `--enable_rtc_profiling` for a detailed breakdown
- Check both preprocessing and inference
- Compare with and without RTC

## 📖 More Details

See `PROFILING_GUIDE.md` for comprehensive documentation.

## 🤔 Still Slow?

1. Run the comparison: `profile_rtc_comparison.py`
2. Run detailed profiling: `profile_pi0_rtc_detailed.py --enable_rtc_profiling`
3. Share the output for help (include device, model, settings)

## ✅ Quick Checklist

Before asking for help, verify:

- [ ] Ran the comparison script (with/without RTC)
- [ ] Tried torch.compile
- [ ] Tested different execution horizons (10, 20, 30)
- [ ] Profiled with detailed RTC profiling
- [ ] Checked the preprocessing vs inference split
- [ ] Verified hardware (device type, thermal state)

@@ -1,352 +0,0 @@

# RTC Profiling Toolkit

Complete toolkit for profiling Pi0 with RTC to identify performance bottlenecks.

## 📦 What's Included

### Scripts

1. **`eval_with_real_robot_profiled.py`**
   - Profiled version of the real-robot eval script
   - Adds timing measurements throughout execution
   - Works with actual robot hardware
   - Same usage as the original, but with profiling output

2. **`profile_rtc_comparison.py`**
   - Side-by-side comparison of RTC vs no-RTC
   - No robot needed (uses mock observations)
   - Shows a clear verdict on whether RTC is helping
   - Great for quick performance checks

3. **`profile_pi0_rtc_detailed.py`**
   - Most detailed profiling available
   - Can enable RTC method-level profiling
   - Provides insights and recommendations
   - Perfect for deep-dive investigations

4. **`add_rtc_profiling.py`**
   - Monkey-patching utility for RTC internals
   - Profiles individual RTC operations
   - Can be applied without modifying source
   - Shows exactly where RTC spends time

### Utilities

5. **`src/lerobot/utils/profiling.py`**
   - Core profiling utilities
   - Decorators for method profiling
   - Context managers for code blocks
   - Statistics collection and reporting

### Documentation

6. **`PROFILING_GUIDE.md`** - Comprehensive guide
7. **`PROFILING_QUICK_START.md`** - Quick reference

## 🚀 Quick Start

### Step 1: Compare Performance

Run this first to see if RTC is actually slower:

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20
```

**Expected output:**

```
COMPARISON SUMMARY
================================================================================
Metric                     Without RTC      With RTC        Difference
--------------------------------------------------------------------------------
Mean time (ms)             150.23           165.45          +15.22
Throughput (iter/s)        6.66             6.05            -0.61
================================================================================
VERDICT
✗ RTC is SLOWER by 10.1%
  Mean time increased by 15.22 ms

Possible reasons:
- RTC overhead exceeds benefits at current execution horizon
- No torch.compile enabled
```

### Step 2: Identify Bottleneck

If RTC is slower, find out why:

```bash
uv run examples/rtc/profile_pi0_rtc_detailed.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=20 \
    --execution_horizon=20 \
    --enable_rtc_profiling
```

**Expected output:**

```
PROFILING SUMMARY
================================================================================
Function                                   Count      Mean (ms)      Total (s)
------------------------------------------------------------------------------------
iteration.policy_inference                 20         150.23         3.00
rtc.denoise_step.guidance_computation      200        15.67          3.13
rtc.denoise_step.autograd_correction       200        8.23           1.65
iteration.preprocessing                    20         12.45          0.25
================================================================================

KEY INSIGHTS
================================================================================
Time breakdown:
  Policy inference:  150.23 ms (87.2%)
  Preprocessing:     12.45 ms (7.2%)
  Postprocessing:    2.10 ms (1.2%)

RTC breakdown:
  Base denoising:    120.45 ms
  Guidance compute:  15.67 ms
  Autograd correct:  8.23 ms
  RTC overhead:      23.90 ms (19.8% of base)

Recommendations:
  ⚠ RTC autograd overhead is significant
    → This is expected, but consider increasing execution_horizon
    → Try torch.compile if not already enabled
  💡 torch.compile not enabled
    → Try --use_torch_compile for a potential speedup
================================================================================
```

### Step 3: Try Optimizations

Based on the recommendations:

```bash
# Try with torch.compile
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20 \
    --use_torch_compile

# Try a larger execution horizon
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=30
```

### Step 4: Profile Real Robot (Optional)

Test with actual hardware:

```bash
uv run examples/rtc/eval_with_real_robot_profiled.py \
    --policy.path=helper2424/pi05_check_rtc \
    --policy.device=mps \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{...}" \
    --task="Pick up object" \
    --duration=30
```

## 🎯 Common Scenarios

### "RTC is 2x slower!"

This usually means:

- RTC overhead is high without the benefits being realized
- torch.compile needs to be enabled
- The execution horizon is too small
- inference_delay is not calculated correctly

**Try:**

1. `--use_torch_compile`
2. Increase `--execution_horizon` to 30+
3. Check the inference_delay calculation

### "RTC is only slightly slower"

This is expected! RTC overhead is typically about 10-30%. The benefit comes during **execution**, not single inference:

- Actions are reused across chunks
- Overall system latency is reduced
- The robot gets smoother actions

### "Want to optimize a specific part"

Use the profiling utilities:

```python
from lerobot.utils.profiling import enable_profiling, profile_section, print_profiling_summary

enable_profiling()

with profile_section("my_custom_operation"):
    # Your code here
    pass

print_profiling_summary()
```

## 📊 Understanding Results

### Key Metrics

**Policy Inference Time**

- Time for the forward pass through the model
- Should be the largest component (70-90%)
- Includes RTC guidance if enabled

**Preprocessing Time**

- Image normalization, resizing
- Should be < 20% of total
- If high: reduce image resolution

**RTC Guidance Overhead**

- Extra time for the RTC guidance computation
- Typically 10-30% of base inference
- If > 50%: RTC may not be beneficial at the current settings

**Autograd Correction**

- Time computing gradients for RTC
- Usually 5-15% of base inference
- Can be reduced with torch.compile

### Expected Ranges (Apple Silicon MPS)

| Metric | Good | Acceptable | Poor |
|--------|------|------------|------|
| Policy inference | 100-150 ms | 150-250 ms | >250 ms |
| Preprocessing | <20 ms | 20-50 ms | >50 ms |
| RTC overhead | 10-30% | 30-50% | >50% |

## 🔧 Optimization Guide

### If RTC overhead is too high:

1. **Enable compilation:**

   ```bash
   --use_torch_compile
   ```

   Expected improvement: 20-40% faster

2. **Increase the execution horizon:**

   ```bash
   --execution_horizon=30  # or higher
   ```

   Amortizes the RTC cost over more actions

3. **Check the guidance weight:**

   ```python
   # In config
   rtc.max_guidance_weight=1.0  # try 0.5 for less overhead
   ```

### If preprocessing is slow:

1. **Reduce image resolution:**

   ```python
   # In robot config
   cameras={
       "gripper": {"width": 320, "height": 240}  # instead of 640x480
   }
   ```

2. **Use fewer cameras:**
   - Profile which cameras are essential
   - Remove unnecessary views

### If inference is generally slow:

1. Use torch.compile (if not already)
2. Check that the device is correct (MPS vs CUDA)
3. Verify the model is in eval mode
4. Check for unnecessary gradient tracking

## 🐛 Troubleshooting

### Empty profiling output

```python
# Make sure to enable profiling!
from lerobot.utils.profiling import enable_profiling

enable_profiling()
```

### Inconsistent timings

- Run more iterations (50-100)
- Check for thermal throttling
- Close background apps
- Use `--warmup_iterations=10`

### Can't find the bottleneck

1. Start with `profile_rtc_comparison.py`
2. Then run `profile_pi0_rtc_detailed.py --enable_rtc_profiling`
3. Compare with/without RTC
4. Check each component separately

## 📖 Full Documentation

- **`PROFILING_GUIDE.md`** - Complete reference with examples
- **`PROFILING_QUICK_START.md`** - Quick commands and tips

## 🤝 Getting Help

If you're still experiencing issues:

1. Run the comparison script and save the output
2. Run detailed profiling and save the output
3. Include:
   - Policy path
   - Device type
   - RTC settings (execution_horizon, etc.)
   - Hardware specs
   - Full profiling output

## 🎓 Learning More

### Profiling your own code:

```python
from lerobot.utils.profiling import profile_method, enable_profiling

enable_profiling()

@profile_method
def my_function():
    # Automatically profiled
    pass
```

### RTC internals:

```python
from lerobot.utils.profiling import enable_profiling
from examples.rtc.add_rtc_profiling import monkey_patch_rtc_profiling

enable_profiling()
monkey_patch_rtc_profiling()

# Now RTC methods are profiled
policy.predict_action_chunk(...)
```

## ✨ Next Steps

1. Run `profile_rtc_comparison.py` to establish a baseline
2. Use `profile_pi0_rtc_detailed.py` to find bottlenecks
3. Apply optimizations (torch.compile, larger horizon)
4. Re-run the comparison to verify improvements
5. Test with the real robot using the profiled version

Happy profiling! 🚀

@@ -1,251 +0,0 @@

# Real-Time Chunking (RTC) Examples

This directory contains examples and evaluation scripts for Real-Time Chunking (RTC), a technique for improving action chunking policies in real-time robot control.

## Overview

Real-Time Chunking addresses the challenge of maintaining consistency and reactivity when using action chunking policies with non-negligible inference latency. It uses a guidance technique during diffusion sampling to blend new action predictions with previously planned actions.

**Key Benefits:**

- Maintains consistency between consecutive action chunks
- Reduces jitter and improves smoothness
- Adapts to inference delays dynamically

**Reference:** [Physical Intelligence - Real-Time Chunking](https://www.physicalintelligence.company/download/real_time_chunking.pdf)

## Scripts

### 1. `eval_dataset.py`

Offline evaluation on dataset samples with detailed visualization and validation.

**Features:**

- Compare RTC vs non-RTC predictions on two random dataset samples
- Validate RTC behavior (delay region, blend region, post-horizon region)
- Generate debug visualizations:
  - Denoising step comparisons (x_t, v_t, x1_t, corrections)
  - Final action prediction comparisons
- Support for torch.compile() optimization
- Memory-efficient sequential policy loading for large models

**Usage:**

```bash
# Basic usage with SmolVLA policy
uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --rtc.execution_horizon=8 \
    --device=mps \
    --rtc.max_guidance_weight=10.0 \
    --seed=10

# With Pi0.5 policy on CUDA
uv run python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi05_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=8 \
    --device=cuda

# With Pi0 policy
uv run python examples/rtc/eval_dataset.py \
    --policy.path=lerobot/pi0_libero_finetuned \
    --dataset.repo_id=HuggingFaceVLA/libero \
    --rtc.execution_horizon=8 \
    --device=cuda

# With torch.compile for faster inference
uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --rtc.execution_horizon=8 \
    --device=cuda \
    --use_torch_compile=true \
    --torch_compile_mode=max-autotune

# Enable CUDA graphs (advanced - may cause tensor aliasing errors)
uv run python examples/rtc/eval_dataset.py \
    --policy.path=helper2424/smolvla_check_rtc_last3 \
    --dataset.repo_id=helper2424/check_rtc \
    --use_torch_compile=true \
    --torch_compile_backend=inductor \
    --torch_compile_mode=max-autotune \
    --torch_compile_disable_cudagraphs=false
```

**Key Parameters:**

- `--policy.path`: Path to the pretrained policy
- `--dataset.repo_id`: Dataset to evaluate on
- `--rtc.execution_horizon`: Number of steps to maintain consistency over (default: 20)
- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 10.0)
- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
- `--inference_delay`: Inference delay for RTC (default: 4)
- `--seed`: Random seed for reproducibility (default: 42)
- `--output_dir`: Directory to save visualizations (default: rtc_debug_output)
- `--device`: Device to use (cuda, cpu, mps, auto)
- `--use_torch_compile`: Enable torch.compile() for faster inference

**Output:**

The script generates several visualization files in `rtc_debug_output/`:

- `denoising_xt_comparison.png` - Noisy state evolution during denoising
- `denoising_vt_comparison.png` - Velocity predictions during denoising
- `denoising_x1t_comparison.png` - Predicted final states during denoising
- `denoising_correction_comparison.png` - RTC guidance corrections applied
- `final_actions_comparison.png` - Final action predictions (prev_chunk, no_rtc, rtc)

The script also validates RTC behavior and reports:

- ✅ Delay region [0:inference_delay]: RTC = prev_chunk
- ✅ Blend region [inference_delay:execution_horizon]: prev_chunk ≤ RTC ≤ no_rtc
- ✅ Post-horizon [execution_horizon:]: RTC = no_rtc

### 2. `eval_with_real_robot.py`

Real-time evaluation on physical robots or simulation environments.

**Features:**

- Run a policy with RTC on a real robot or in simulation
- Multi-threaded action execution and inference
- Action queue management with proper timing
- Latency tracking and adaptive inference delay
- Support for both robots and gym environments
- Support for torch.compile() optimization

**Usage:**

```bash
# With a real robot
uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --task="pick up the cup" \
    --duration=30.0

# With a simulation environment
uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --env.type=pusht \
    --duration=60.0

# With policy compilation (CUDA only, not MPS)
uv run python examples/rtc/eval_with_real_robot.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --use_torch_compile=true \
    --torch_compile_mode=max-autotune
```

**Key Parameters:**

- `--policy.path`: Path to the pretrained policy
- `--robot.type` or `--env.type`: Robot or environment to use
- `--task`: Task description (for VLA models)
- `--rtc.execution_horizon`: Number of steps to maintain consistency over (default: 10)
- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 1.0)
- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
- `--duration`: How long to run (seconds, default: 30.0)
- `--fps`: Action execution frequency (Hz, default: 10.0)
- `--action_queue_size_to_get_new_actions`: Queue-size threshold at which to request new actions (default: 30)
- `--device`: Device to use (cuda, cpu, mps, auto)
- `--use_torch_compile`: Enable torch.compile() for faster inference

## Understanding RTC Parameters

### `execution_horizon`

Number of timesteps from the previous chunk to maintain consistency with. Higher values mean more consistency but potentially less reactivity.

**Typical values:** 8-12 steps for dataset evaluation, 10 steps for real-time execution

### `max_guidance_weight`

Upper bound on guidance strength. Higher values give stronger consistency but may over-constrain new predictions.

**Typical values:**

- Dataset evaluation: 10.0-100.0 (can be higher for analysis)
- Real-time execution: 1.0-10.0 (more conservative)

### `prefix_attention_schedule`

How to weight consistency across the overlap region:

- `ZEROS`: Binary (full weight up to inference_delay, then zero)
- `ONES`: Full weight across the entire execution_horizon
- `LINEAR`: Linear decay from inference_delay to execution_horizon
- `EXP`: Exponential decay (recommended)

**Recommended:** `EXP`

### `inference_delay`

Number of timesteps from the prefix to use for guidance. Typically calculated dynamically from inference latency in real-time execution, but fixed for dataset evaluation.

**Typical values:** 3-5 steps for dataset evaluation

### `action_queue_size_to_get_new_actions` (real-time only)

Threshold for requesting new action chunks (see the sketch below). Should be higher than `inference_delay + execution_horizon` to ensure smooth operation.

**Typical values:** 20-30 steps
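
In pseudo-Python, the consumer side's trigger might look like this (a sketch; the queue-length accessor and the threading flag are assumptions, not LeRobot's exact API):

```python
# Request a new chunk before the queue drains below what RTC needs for blending:
# threshold > inference_delay + execution_horizon.
threshold = 30  # action_queue_size_to_get_new_actions

while running:
    if len(action_queue) <= threshold:
        should_get_actions.set()  # wake the inference thread (hypothetical threading.Event)
    execute_action(action_queue.get())
```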

## Validation Rules (Dataset Evaluation)

The dataset evaluation script validates that RTC behavior matches expectations:

1. **Delay Region [0:inference_delay]**: RTC actions should equal the previous chunk
   - Ensures consistency during the inference delay period

2. **Blend Region [inference_delay:execution_horizon]**: RTC should be between prev_chunk and no_rtc
   - Smooth transition from the previous plan to the new predictions

3. **Post-Horizon [execution_horizon:]**: RTC should equal no_rtc
   - Full adoption of the new predictions after the execution horizon

## Tips

1. **Start with dataset evaluation** (`eval_dataset.py`) to understand RTC behavior and tune parameters before running on the robot
2. **Use the visualizations** to debug unexpected behavior - check the denoising steps and final actions
3. **Tune execution_horizon** based on your inference latency and action frequency
4. **Monitor the validation output** - failures indicate potential implementation issues or misconfigured parameters
5. **Compare different schedules** - EXP usually works best, but LINEAR can be more interpretable

## Troubleshooting

### Validation fails in the delay region

- Check that `prev_chunk_left_over` is properly passed to the policy
- Verify that RTC guidance is being applied during denoising
- Look at the denoising visualizations to see where guidance diverges

### Validation fails in the post-horizon region

- RTC and no_rtc may be using different noise - verify the same noise is used for the comparison
- Check that the weights are correctly zeroed out after the execution horizon
- Review the prefix_attention_schedule visualization

### Poor performance on the real robot

- Increase `action_queue_size_to_get_new_actions` if you see warnings
- Reduce `max_guidance_weight` if the robot is too conservative
- Try different `prefix_attention_schedule` values
- Enable torch.compile() for faster inference (CUDA only)

### Memory issues with large models

- The dataset evaluation script loads policies sequentially to minimize memory
- For real-time execution, only one policy is loaded
- Use smaller batch sizes if needed

## Related Documentation

- [RTC Implementation](../../src/lerobot/policies/rtc/modeling_rtc.py)
- [RTC Configuration](../../src/lerobot/policies/rtc/configuration_rtc.py)
- [Action Queue](../../src/lerobot/policies/rtc/action_queue.py)
- [Physical Intelligence Paper](https://www.physicalintelligence.company/download/real_time_chunking.pdf)

@@ -1,202 +0,0 @@
#!/usr/bin/env python

"""
Script to add profiling instrumentation to RTCProcessor.

This script shows which methods to profile in the RTC code to identify bottlenecks.
You can either:
1. Apply these changes directly to modeling_rtc.py
2. Use monkey patching to add profiling without modifying source
3. Use as reference for manual instrumentation

Usage:
    # Option 1: Monkey patch (no source changes)
    python examples/rtc/add_rtc_profiling.py

    # Option 2: Apply changes to source
    # Copy the profiled methods below into src/lerobot/policies/rtc/modeling_rtc.py
"""

import logging

import torch
from torch import Tensor

from lerobot.policies.rtc.modeling_rtc import RTCProcessor
from lerobot.utils.profiling import ProfileContext, enable_profiling, is_profiling_enabled

logger = logging.getLogger(__name__)


def profile_denoise_step(
    self, x_t, prev_chunk_left_over, inference_delay, time, original_denoise_step_partial, execution_horizon=None
) -> Tensor:
    """Profiled version of denoise_step."""
    if not is_profiling_enabled():
        # Call original implementation if profiling disabled
        return self._original_denoise_step(
            x_t, prev_chunk_left_over, inference_delay, time, original_denoise_step_partial, execution_horizon
        )

    with ProfileContext("rtc.denoise_step.total"):
        # In the original implementation the time goes from 0 to 1, and in our
        # implementation the time goes from 1 to 0, so we need to invert the time.
        tau = 1 - time

        if prev_chunk_left_over is None:
            # First step, no guidance - return v_t
            with ProfileContext("rtc.denoise_step.base_denoising"):
                v_t = original_denoise_step_partial(x_t)
            return v_t

        with ProfileContext("rtc.denoise_step.setup"):
            x_t = x_t.clone().detach()

            squeezed = False
            if len(x_t.shape) < 3:
                x_t = x_t.unsqueeze(0)
                squeezed = True

            if len(prev_chunk_left_over.shape) < 3:
                prev_chunk_left_over = prev_chunk_left_over.unsqueeze(0)

            if execution_horizon is None:
                execution_horizon = self.rtc_config.execution_horizon

            if execution_horizon > prev_chunk_left_over.shape[1]:
                execution_horizon = prev_chunk_left_over.shape[1]

            batch_size = x_t.shape[0]
            action_chunk_size = x_t.shape[1]
            action_dim = x_t.shape[2]

        # Padding
        with ProfileContext("rtc.denoise_step.padding"):
            if prev_chunk_left_over.shape[1] < action_chunk_size or prev_chunk_left_over.shape[2] < action_dim:
                padded = torch.zeros(batch_size, action_chunk_size, action_dim).to(x_t.device)
                padded[:, : prev_chunk_left_over.shape[1], : prev_chunk_left_over.shape[2]] = prev_chunk_left_over
                prev_chunk_left_over = padded

        # Get prefix weights
        with ProfileContext("rtc.denoise_step.get_prefix_weights"):
            weights = (
                self.get_prefix_weights(inference_delay, execution_horizon, action_chunk_size)
                .to(x_t.device)
                .unsqueeze(0)
                .unsqueeze(-1)
            )

        # Main RTC guidance computation
        with ProfileContext("rtc.denoise_step.guidance_computation"):
            with torch.enable_grad():
                # Base denoising
                with ProfileContext("rtc.denoise_step.base_denoising"):
                    v_t = original_denoise_step_partial(x_t)

                x_t.requires_grad_(True)

                # Compute x1_t
                with ProfileContext("rtc.denoise_step.compute_x1_t"):
                    x1_t = x_t - time * v_t

                # Compute error
                with ProfileContext("rtc.denoise_step.compute_error"):
                    err = (prev_chunk_left_over - x1_t) * weights
                    grad_outputs = err.clone().detach()

                # Compute correction via autograd
                with ProfileContext("rtc.denoise_step.autograd_correction"):
                    correction = torch.autograd.grad(x1_t, x_t, grad_outputs, retain_graph=False)[0]

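        # The guidance weight computed below is the closed form
        #   w(tau) = min( ((1 - tau) / tau) * (((1 - tau)^2 + tau^2) / (1 - tau)^2), max_guidance_weight )
        # where nan_to_num guards the singularities at tau -> 0 and tau -> 1 so the
        # first and last denoising steps keep a finite weight.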
        # Compute guidance weight
        with ProfileContext("rtc.denoise_step.compute_guidance_weight"):
            max_guidance_weight = torch.as_tensor(self.rtc_config.max_guidance_weight)
            tau_tensor = torch.as_tensor(tau)
            squared_one_minus_tau = (1 - tau_tensor) ** 2
            inv_r2 = (squared_one_minus_tau + tau_tensor**2) / squared_one_minus_tau
            c = torch.nan_to_num((1 - tau_tensor) / tau_tensor, posinf=max_guidance_weight)
            guidance_weight = torch.nan_to_num(c * inv_r2, posinf=max_guidance_weight)
            guidance_weight = torch.minimum(guidance_weight, max_guidance_weight)

        # Apply guidance
        with ProfileContext("rtc.denoise_step.apply_guidance"):
            result = v_t - guidance_weight * correction

        # Cleanup
        with ProfileContext("rtc.denoise_step.cleanup"):
            if squeezed:
                result = result.squeeze(0)
                correction = correction.squeeze(0)
                x1_t = x1_t.squeeze(0)
                err = err.squeeze(0)

            self.track(
                time=time,
                x1_t=x1_t,
                correction=correction,
                err=err,
                weights=weights,
                guidance_weight=guidance_weight,
                inference_delay=inference_delay,
                execution_horizon=execution_horizon,
            )

        return result


def monkey_patch_rtc_profiling():
    """Apply profiling to RTCProcessor via monkey patching.

    This modifies the RTCProcessor class at runtime to add profiling
    without changing source files.
    """
    logger.info("Applying RTC profiling monkey patch...")

    # Save original method
    RTCProcessor._original_denoise_step = RTCProcessor.denoise_step

    # Replace with profiled version
    RTCProcessor.denoise_step = profile_denoise_step

    logger.info("✓ RTC profiling enabled")


def print_usage():
    """Print usage instructions."""
    print("\n" + "=" * 80)
    print("RTC PROFILING INSTRUMENTATION")
    print("=" * 80)
    print("\nThis script provides profiling for RTCProcessor methods.")
    print("\nOption 1: Monkey Patch (Recommended)")
    print("-" * 40)
    print("Add to your script:")
    print("""
from lerobot.utils.profiling import enable_profiling, print_profiling_summary
from examples.rtc.add_rtc_profiling import monkey_patch_rtc_profiling

# Enable profiling
enable_profiling()
monkey_patch_rtc_profiling()

# ... run your code ...

# Print results
print_profiling_summary()
""")

    print("\nOption 2: Manual Source Modification")
    print("-" * 40)
    print("1. Copy profile_denoise_step() from this file")
    print("2. Replace denoise_step() in src/lerobot/policies/rtc/modeling_rtc.py")
    print("3. Add profiling imports at top of file")

    print("\nKey Metrics to Watch:")
    print("-" * 40)
    print("- rtc.denoise_step.base_denoising - Time for base policy inference")
    print("- rtc.denoise_step.autograd_correction - Time computing gradients")
    print("- rtc.denoise_step.guidance_computation - Total guidance overhead")
    print("- rtc.denoise_step.get_prefix_weights - Time computing weights")
    print("=" * 80 + "\n")


if __name__ == "__main__":
    print_usage()

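The monkey patch above is the generic wrap-and-restore pattern: save the original method on the class, then swap in a wrapper that times the call and delegates. A self-contained sketch of the same pattern, with a toy `Worker` class standing in for `RTCProcessor` (no lerobot imports):

```python
# Self-contained sketch of the wrap-and-restore profiling pattern used above.
# `Worker` is a toy stand-in for RTCProcessor; nothing here depends on lerobot.
import time
from collections import defaultdict

STATS: dict[str, list[float]] = defaultdict(list)


class Worker:
    def step(self, n: int) -> int:
        return sum(range(n))


def profiled_step(self, n: int) -> int:
    start = time.perf_counter()
    try:
        return self._original_step(n)  # delegate to the saved original
    finally:
        STATS["worker.step"].append(time.perf_counter() - start)


# Save the original once, then swap in the instrumented version.
Worker._original_step = Worker.step
Worker.step = profiled_step

w = Worker()
w.step(100_000)
print({k: f"{sum(v) * 1000:.2f} ms over {len(v)} calls" for k, v in STATS.items()})
```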
+83 -176
@@ -39,8 +39,9 @@ Usage:
     uv run python examples/rtc/eval_dataset.py \
         --policy.path=lerobot/pi05_libero_finetuned \
         --dataset.repo_id=HuggingFaceVLA/libero \
-        --rtc.execution_horizon=8 \
+        --rtc.execution_horizon=10 \
         --device=mps
+        --seed=10

     # Basic usage with pi0.5 policy with cuda device
     uv run python examples/rtc/eval_dataset.py \
@@ -543,11 +544,6 @@ class RTCEvaluator:
         logging.info("Plotting results...")
         self.plot_tracked_data(rtc_tracked_steps, no_rtc_tracked_steps, prev_chunk_left_over, num_steps)

-        # Validate RTC behavior
-        # logging.info("=" * 80)
-        # logging.info("Validating RTC behavior...")
-        # self.validate_rtc_behavior(rtc_actions, no_rtc_actions, prev_chunk_left_over)
-
         # Plot final actions comparison
         logging.info("=" * 80)
         logging.info("Plotting final actions comparison...")
@@ -556,159 +552,6 @@ class RTCEvaluator:
         logging.info("=" * 80)
         logging.info("Evaluation completed successfully")

-    def validate_rtc_behavior(self, rtc_actions, no_rtc_actions, prev_chunk_left_over):
-        """Validate RTC behavior by comparing final action predictions with expected values.
-
-        Validation rules:
-        1. During delay [0:inference_delay]: RTC should equal prev_chunk
-        2. After delay, within execution horizon [inference_delay:execution_horizon]:
-           RTC should be between prev_chunk and no_rtc
-        3. After execution horizon [execution_horizon:]: RTC should equal no_rtc
-
-        Args:
-            rtc_actions: Final actions from RTC policy (batch, time, action_dim)
-            no_rtc_actions: Final actions from non-RTC policy (batch, time, action_dim)
-            prev_chunk_left_over: Previous chunk used as ground truth (time, action_dim)
-        """
-        # Remove batch dimension if present and move to CPU
-        rtc_actions_t = rtc_actions.squeeze(0).cpu() if len(rtc_actions.shape) == 3 else rtc_actions.cpu()
-        no_rtc_actions_t = (
-            no_rtc_actions.squeeze(0).cpu() if len(no_rtc_actions.shape) == 3 else no_rtc_actions.cpu()
-        )
-        prev_chunk = prev_chunk_left_over.cpu()
-
-        logging.info(f"  rtc_actions shape: {rtc_actions_t.shape}")
-        logging.info(f"  no_rtc_actions shape: {no_rtc_actions_t.shape}")
-        logging.info(f"  prev_chunk shape: {prev_chunk.shape}")
-
-        # Determine chunk length for comparison
-        chunk_len = min(rtc_actions_t.shape[0], no_rtc_actions_t.shape[0], prev_chunk.shape[0])
-        inference_delay = self.cfg.inference_delay
-        execution_horizon = self.cfg.rtc.execution_horizon
-
-        # Tolerance for floating point comparison
-        rtol = 1e-2  # Relative tolerance
-
-        validation_passed = True
-        warnings = []
-
-        logging.info("  Validating RTC behavior:")
-        logging.info(f"    Chunk length: {chunk_len}")
-        logging.info(f"    Inference delay: {inference_delay}")
-        logging.info(f"    Execution horizon: {execution_horizon}")
-        logging.info(f"    Tolerance: rtol={rtol}")
-
-        # ============================================================================
-        # Rule 1: During delay [0:inference_delay], RTC should equal prev_chunk
-        # ============================================================================
-        if inference_delay > 0:
-            delay_end = min(inference_delay, chunk_len)
-            rtc_delay = rtc_actions_t[:delay_end]
-            prev_delay = prev_chunk[:delay_end]
-
-            logging.info(f"    rtc_delay: {rtc_delay.shape}")
-            logging.info(f"    prev_delay: {prev_delay.shape}")
-
-            if not torch.allclose(rtc_delay, prev_delay, rtol=rtol):
-                max_diff = torch.max(torch.abs(rtc_delay - prev_delay)).item()
-                mean_diff = torch.mean(torch.abs(rtc_delay - prev_delay)).item()
-                logging.info(f"    rtc_delay: {rtc_delay}")
-                logging.info(f"    prev_delay: {prev_delay}")
-                logging.info(f"    max_diff: {max_diff}")
-                logging.info(f"    mean_diff: {mean_diff}")
-                warnings.append(
-                    f"  ⚠ VALIDATION FAILED: During delay [0:{delay_end}], "
-                    f"RTC does NOT equal prev_chunk!\n"
-                    f"    Max difference: {max_diff:.6f}\n"
-                    f"    Mean difference: {mean_diff:.6f}"
-                )
-                validation_passed = False
-            else:
-                logging.info(f"  ✓ During delay [0:{delay_end}]: RTC equals prev_chunk")
-
-        # ============================================================================
-        # Rule 2: After delay, within execution horizon [inference_delay:execution_horizon]
-        #         RTC should be between prev_chunk and no_rtc
-        # ============================================================================
-        blend_start = inference_delay
-        blend_end = min(execution_horizon, chunk_len)
-
-        if blend_end > blend_start:
-            rtc_blend = rtc_actions_t[blend_start:blend_end]
-            prev_blend = prev_chunk[blend_start:blend_end]
-            no_rtc_blend = no_rtc_actions_t[blend_start:blend_end]
-
-            # Check if RTC is between prev_chunk and no_rtc (element-wise)
-            # For each element, check if it's between the min and max of prev_chunk and no_rtc
-            min_bound = torch.minimum(prev_blend, no_rtc_blend)
-            max_bound = torch.maximum(prev_blend, no_rtc_blend)
-
-            within_bounds = torch.logical_and(rtc_blend >= min_bound, rtc_blend <= max_bound)
-
-            if not torch.all(within_bounds):
-                violations = torch.sum(~within_bounds).item()
-                total_elements = within_bounds.numel()
-                violation_pct = 100.0 * violations / total_elements
-
-                # Find max violation
-                lower_violations = torch.maximum(torch.tensor(0.0), min_bound - rtc_blend)
-                upper_violations = torch.maximum(torch.tensor(0.0), rtc_blend - max_bound)
-                max_violation = torch.max(torch.maximum(lower_violations, upper_violations)).item()
-
-                warnings.append(
-                    f"  ⚠ VALIDATION FAILED: In blend region [{blend_start}:{blend_end}], "
-                    f"RTC is NOT always between prev_chunk and no_rtc!\n"
-                    f"    Violations: {violations}/{total_elements} elements ({violation_pct:.1f}%)\n"
-                    f"    Max violation distance: {max_violation:.6f}"
-                )
-                validation_passed = False
-            else:
-                logging.info(
-                    f"  ✓ Blend region [{blend_start}:{blend_end}]: RTC is between prev_chunk and no_rtc"
-                )
-
-        # ============================================================================
-        # Rule 3: After execution horizon [execution_horizon:], RTC should equal no_rtc
-        # ============================================================================
-        if execution_horizon < chunk_len:
-            rtc_after = rtc_actions_t[execution_horizon:chunk_len]
-            no_rtc_after = no_rtc_actions_t[execution_horizon:chunk_len]
-
-            logging.info(f"    rtc_after: {rtc_after}")
-            logging.info(f"    no_rtc_after: {no_rtc_after}")
-
-            if not torch.allclose(rtc_after, no_rtc_after, rtol=rtol):
-                max_diff = torch.max(torch.abs(rtc_after - no_rtc_after)).item()
-                mean_diff = torch.mean(torch.abs(rtc_after - no_rtc_after)).item()
-                warnings.append(
-                    f"  ⚠ VALIDATION FAILED: After execution horizon [{execution_horizon}:{chunk_len}], "
-                    f"RTC does NOT equal no_rtc!\n"
-                    f"    Max difference: {max_diff:.6f}\n"
-                    f"    Mean difference: {mean_diff:.6f}"
-                )
-                validation_passed = False
-            else:
-                logging.info(
-                    f"  ✓ After execution horizon [{execution_horizon}:{chunk_len}]: RTC equals no_rtc"
-                )
-
-        # ============================================================================
-        # Report results
-        # ============================================================================
-        logging.info("=" * 80)
-        if validation_passed:
-            logging.info("  ✅ VALIDATION PASSED: All RTC behavior checks passed!")
-            logging.info("    • During delay: RTC = prev_chunk ✓")
-            logging.info("    • Blend region: prev_chunk ≤ RTC ≤ no_rtc ✓")
-            logging.info("    • After execution horizon: RTC = no_rtc ✓")
-        else:
-            logging.error("  ❌ VALIDATION FAILED: RTC behavior does not match expected!")
-            logging.error("")
-            for warning in warnings:
-                logging.error(warning)
-            logging.error("")
-            logging.error("  Please check the implementation of RTC guidance.")

     def plot_final_actions_comparison(self, rtc_actions, no_rtc_actions, prev_chunk_left_over):
         """Plot final action predictions comparison on a single chart.

@@ -795,16 +638,34 @@ class RTCEvaluator:
             ax.set_xticks(range(0, max_len, max(1, max_len // 20)))  # Show ~20 ticks
             ax.set_xlim(-0.5, max_len - 0.5)

-            # Add legend only to first subplot
-            if dim_idx == 0:
-                ax.legend(loc="best", fontsize=9)
-
         axes[-1].set_xlabel("Step", fontsize=10)

+        # Collect legend handles and labels from first subplot
+        handles, labels = axes[0].get_legend_handles_labels()
+        # Remove duplicates while preserving order
+        seen = set()
+        unique_handles = []
+        unique_labels = []
+        for handle, label in zip(handles, labels, strict=True):
+            if label not in seen:
+                seen.add(label)
+                unique_handles.append(handle)
+                unique_labels.append(label)
+
+        # Add legend outside the plot area (to the right)
+        fig.legend(
+            unique_handles,
+            unique_labels,
+            loc="center right",
+            fontsize=9,
+            bbox_to_anchor=(1.0, 0.5),
+            framealpha=0.9,
+        )
+
         # Save figure
         output_path = os.path.join(self.cfg.output_dir, "final_actions_comparison.png")
-        fig.tight_layout()
-        fig.savefig(output_path, dpi=150)
+        fig.tight_layout(rect=[0, 0, 0.85, 1])  # Leave space for legend on right
+        fig.savefig(output_path, dpi=150, bbox_inches="tight")
         logging.info(f"Saved final actions comparison to {output_path}")
         plt.close(fig)

@@ -825,6 +686,7 @@ class RTCEvaluator:
             axs_corr[:, 1],  # Right column for correction
             axs_x1t[:, 1],  # Right column for x1_t
             num_steps,
+            add_labels=True,  # Add labels for RTC (right column)
         )

         self._plot_denoising_steps_from_tracker(
@@ -834,6 +696,7 @@ class RTCEvaluator:
             axs_corr[:, 0],  # Left column for correction
             axs_x1t[:, 0],  # Left column for x1_t
             num_steps,
+            add_labels=False,  # No labels for No RTC (left column)
         )

         # Plot no-RTC x_t data on right chart as orange dashed line for comparison
@@ -849,15 +712,21 @@ class RTCEvaluator:
             axs_x1t[:, 1], prev_chunk_left_over, start_from=0, color="red", label="Ground truth"
         )

-        # Plot ground truth on x_t axes
+        # Plot ground truth on x_t axes (no labels for left column)
         RTCDebugVisualizer.plot_waypoints(
-            axs_xt[:, 0], prev_chunk_left_over, start_from=0, color="red", label="Ground truth"
+            axs_xt[:, 0], prev_chunk_left_over, start_from=0, color="red", label=None
         )

         RTCDebugVisualizer.plot_waypoints(
-            axs_x1t[:, 0], prev_chunk_left_over, start_from=0, color="red", label="Ground truth"
+            axs_x1t[:, 0], prev_chunk_left_over, start_from=0, color="red", label=None
         )

+        # Add legends outside the plot area for each figure
+        self._add_figure_legend(fig_xt, axs_xt)
+        self._add_figure_legend(fig_vt, axs_vt)
+        self._add_figure_legend(fig_corr, axs_corr)
+        self._add_figure_legend(fig_x1t, axs_x1t)
+
         # Save denoising plots
         self._save_figure(fig_xt, os.path.join(self.cfg.output_dir, "denoising_xt_comparison.png"))
         self._save_figure(fig_vt, os.path.join(self.cfg.output_dir, "denoising_vt_comparison.png"))
@@ -875,13 +744,47 @@ class RTCEvaluator:

         return fig, axs

+    def _add_figure_legend(self, fig, axs):
+        """Add a legend outside the plot area on the right side.
+
+        Args:
+            fig: Matplotlib figure to add legend to
+            axs: Array of axes to collect legend handles from
+        """
+        # Collect all handles and labels from the first row of axes (right column)
+        handles, labels = axs[0, 1].get_legend_handles_labels()
+
+        # Remove duplicates while preserving order
+        seen = set()
+        unique_handles = []
+        unique_labels = []
+        for handle, label in zip(handles, labels, strict=True):
+            if label not in seen:
+                seen.add(label)
+                unique_handles.append(handle)
+                unique_labels.append(label)
+
+        # Add legend outside the plot area (to the right, close to charts)
+        if unique_handles:
+            fig.legend(
+                unique_handles,
+                unique_labels,
+                loc="center left",
+                fontsize=8,
+                bbox_to_anchor=(0.87, 0.5),
+                framealpha=0.9,
+                ncol=1,
+            )
+
     def _save_figure(self, fig, path):
-        fig.tight_layout()
-        fig.savefig(path, dpi=150)
+        fig.tight_layout(rect=[0, 0, 0.85, 1])  # Leave space for legend/colorbar on right
+        fig.savefig(path, dpi=150, bbox_inches="tight")
         logging.info(f"Saved figure to {path}")
         plt.close(fig)

-    def _plot_denoising_steps_from_tracker(self, tracked_steps, xt_axs, vt_axs, corr_axs, x1t_axs, num_steps):
+    def _plot_denoising_steps_from_tracker(
+        self, tracked_steps, xt_axs, vt_axs, corr_axs, x1t_axs, num_steps, add_labels=True
+    ):
         """Plot denoising steps from tracker data.

         Args:
@@ -891,6 +794,7 @@ class RTCEvaluator:
             corr_axs: Matplotlib axes for correction plots (array of 6 axes)
             x1t_axs: Matplotlib axes for x1_t plots (array of 6 axes)
             num_steps: Total number of denoising steps for colormap
+            add_labels: Whether to add legend labels for the plots
         """

         logging.info("=" * 80)
@@ -905,17 +809,18 @@ class RTCEvaluator:

         for step_idx, debug_step in enumerate(debug_steps):
             color = colors[step_idx % len(colors)]
+            label = f"Step {step_idx}" if add_labels else None

             # Plot x_t
             if debug_step.x_t is not None:
                 RTCDebugVisualizer.plot_waypoints(
-                    xt_axs, debug_step.x_t, start_from=0, color=color, label=f"Step {step_idx}"
+                    xt_axs, debug_step.x_t, start_from=0, color=color, label=label
                 )

             # Plot v_t
             if debug_step.v_t is not None:
                 RTCDebugVisualizer.plot_waypoints(
-                    vt_axs, debug_step.v_t, start_from=0, color=color, label=f"Step {step_idx}"
+                    vt_axs, debug_step.v_t, start_from=0, color=color, label=label
                 )

             # Plot correction on separate axes
@@ -925,17 +830,18 @@ class RTCEvaluator:
                     debug_step.correction,
                     start_from=0,
                     color=color,
-                    label=f"Step {step_idx}",
+                    label=label,
                 )

             # Plot x1_t (predicted state)
             if x1t_axs is not None and debug_step.x1_t is not None:
+                x1t_label = f"x1_t Step {step_idx}" if add_labels else None
                 RTCDebugVisualizer.plot_waypoints(
                     x1t_axs,
                     debug_step.x1_t,
                     start_from=0,
                     color=color,
-                    label=f"x1_t Step {step_idx}",
+                    label=x1t_label,
                 )

             # Plot error in orange dashed
@@ -947,6 +853,7 @@ class RTCEvaluator:
                 )

             num_dims = min(error_chunk.shape[-1], 6)
+            error_label = f"error Step {step_idx}" if add_labels else None
             for j in range(num_dims):
                 x1t_axs[j].plot(
                     np.arange(0, error_chunk.shape[0]),
@@ -954,7 +861,7 @@ class RTCEvaluator:
                     color="orange",
                     linestyle="--",
                     alpha=0.7,
-                    label=f"error Step {step_idx}",
+                    label=error_label,
                 )

         # Recalculate axis limits after plotting to ensure proper scaling

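Although `validate_rtc_behavior` is removed in this commit, the invariant it checked is worth keeping in mind when debugging RTC: during the inference delay the guided chunk must replay the previous chunk, inside the execution horizon it should land between the previous chunk and the unguided prediction, and past the horizon it should match the unguided prediction. A minimal, self-contained sketch of the middle (betweenness) check, assuming `(time, action_dim)` tensors:

```python
# Minimal sketch of the "betweenness" invariant on the blend region,
# assuming rtc, no_rtc, and prev_chunk are (time, action_dim) tensors.
import torch


def blend_region_ok(rtc, no_rtc, prev_chunk, inference_delay, execution_horizon, atol=1e-2):
    lo = torch.minimum(prev_chunk, no_rtc)[inference_delay:execution_horizon]
    hi = torch.maximum(prev_chunk, no_rtc)[inference_delay:execution_horizon]
    blend = rtc[inference_delay:execution_horizon]
    return bool(torch.all((blend >= lo - atol) & (blend <= hi + atol)))


t, d = 50, 6
prev_chunk = torch.randn(t, d)
no_rtc = torch.randn(t, d)
rtc = 0.5 * prev_chunk + 0.5 * no_rtc  # a convex blend always passes
assert blend_region_ok(rtc, no_rtc, prev_chunk, inference_delay=3, execution_horizon=10)
```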
@@ -1,631 +0,0 @@
#!/usr/bin/env python

# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Profiled version of eval_with_real_robot.py for performance analysis.

This version adds detailed timing measurements for:
- Policy inference
- Preprocessing
- Postprocessing
- Action queue operations
- Robot communication
- Thread execution times

Usage: Same as eval_with_real_robot.py but with profiling output.
"""

import logging
import math
import sys
import time
import traceback
from collections import defaultdict
from dataclasses import dataclass, field
from threading import Event, Lock, Thread

import torch
from torch import Tensor

from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig  # noqa: F401
from lerobot.cameras.realsense.configuration_realsense import RealSenseCameraConfig  # noqa: F401
from lerobot.configs import parser
from lerobot.configs.policies import PreTrainedConfig
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.datasets.utils import build_dataset_frame, hw_to_dataset_features
from lerobot.policies.factory import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc.action_queue import ActionQueue
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.rtc.latency_tracker import LatencyTracker
from lerobot.processor.factory import (
    make_default_robot_action_processor,
    make_default_robot_observation_processor,
)
from lerobot.rl.process import ProcessSignalHandler
from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    koch_follower,
    so100_follower,
    so101_follower,
)
from lerobot.robots.utils import make_robot_from_config
from lerobot.utils.constants import OBS_IMAGES
from lerobot.utils.hub import HubMixin
from lerobot.utils.utils import init_logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ProfileTimer:
    """Context manager and utility class for timing code sections."""

    def __init__(self, name: str, stats_dict: dict):
        self.name = name
        self.stats_dict = stats_dict
        self.start_time = None

    def __enter__(self):
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, *args):
        elapsed = time.perf_counter() - self.start_time
        if self.name not in self.stats_dict:
            self.stats_dict[self.name] = []
        self.stats_dict[self.name].append(elapsed)


class ProfilingStats:
    """Global profiling statistics collector."""

    def __init__(self):
        self.stats = defaultdict(list)
        self.lock = Lock()

    def record(self, name: str, duration: float):
        with self.lock:
            self.stats[name].append(duration)

    def timer(self, name: str):
        """Return a context manager for timing."""
        return ProfileTimer(name, self.stats)

    def get_summary(self) -> dict[str, dict[str, float]]:
        """Get summary statistics for all timings."""
        with self.lock:
            summary = {}
            for name, times in self.stats.items():
                if times:
                    summary[name] = {
                        "count": len(times),
                        "mean": sum(times) / len(times),
                        "min": min(times),
                        "max": max(times),
                        "total": sum(times),
                    }
            return summary

    def print_summary(self):
        """Print formatted summary of all timings."""
        summary = self.get_summary()

        logger.info("\n" + "=" * 80)
        logger.info("PROFILING SUMMARY")
        logger.info("=" * 80)

        # Sort by total time (descending)
        sorted_items = sorted(summary.items(), key=lambda x: x[1]["total"], reverse=True)

        for name, stats in sorted_items:
            logger.info(f"\n{name}:")
            logger.info(f"  Count: {stats['count']}")
            logger.info(f"  Mean:  {stats['mean'] * 1000:.2f} ms")
            logger.info(f"  Min:   {stats['min'] * 1000:.2f} ms")
            logger.info(f"  Max:   {stats['max'] * 1000:.2f} ms")
            logger.info(f"  Total: {stats['total']:.2f} s")
            logger.info(f"  Hz:    {stats['count'] / stats['total']:.2f}")

        logger.info("\n" + "=" * 80)


# Global profiling stats
profiling_stats = ProfilingStats()


class RobotWrapper:
    def __init__(self, robot: Robot):
        self.robot = robot
        self.lock = Lock()

    def get_observation(self) -> dict[str, Tensor]:
        with profiling_stats.timer("robot.get_observation"):
            with self.lock:
                return self.robot.get_observation()

    def send_action(self, action: Tensor):
        with profiling_stats.timer("robot.send_action"):
            with self.lock:
                self.robot.send_action(action)

    def observation_features(self) -> list[str]:
        with self.lock:
            return self.robot.observation_features

    def action_features(self) -> list[str]:
        with self.lock:
            return self.robot.action_features


@dataclass
class RTCDemoConfig(HubMixin):
    """Configuration for RTC demo with action chunking policies and real robots."""

    # Policy configuration
    policy: PreTrainedConfig | None = None

    # Robot configuration
    robot: RobotConfig | None = None

    # RTC configuration
    rtc: RTCConfig = field(
        default_factory=lambda: RTCConfig(
            execution_horizon=10,
            max_guidance_weight=1.0,
            prefix_attention_schedule=RTCAttentionSchedule.EXP,
        )
    )

    # Demo parameters
    duration: float = 30.0  # Duration to run the demo (seconds)
    fps: float = 10.0  # Action execution frequency (Hz)

    # Compute device
    device: str | None = None  # Device to run on (cuda, cpu, auto)

    # Queue-size threshold (in executed steps) below which new action chunks are
    # requested. It should be higher than inference delay + execution horizon.
    action_queue_size_to_get_new_actions: int = 30
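    # Worked example: at fps=10 each step is 0.1 s, so a 0.3 s inference latency is
    # ceil(0.3 / 0.1) = 3 steps of delay; with execution_horizon=10 the threshold
    # must be at least 13, so the default of 30 leaves comfortable headroom.
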
    # Task to execute
    task: str = field(default="", metadata={"help": "Task to execute"})

    # Torch compile configuration
    use_torch_compile: bool = field(
        default=False,
        metadata={"help": "Use torch.compile for faster inference (PyTorch 2.0+)"},
    )

    torch_compile_backend: str = field(
        default="inductor",
        metadata={"help": "Backend for torch.compile (inductor, aot_eager, cudagraphs)"},
    )

    torch_compile_mode: str = field(
        default="default",
        metadata={"help": "Compilation mode (default, reduce-overhead, max-autotune)"},
    )

    torch_compile_disable_cudagraphs: bool = field(
        default=True,
        metadata={
            "help": "Disable CUDA graphs in torch.compile. Required due to in-place tensor "
            "operations in denoising loop (x_t += dt * v_t) which cause tensor aliasing issues."
        },
    )

    def __post_init__(self):
        # HACK: We parse the cli args again here to get the pretrained path if there was one.
        policy_path = parser.get_path_arg("policy")
        if policy_path:
            cli_overrides = parser.get_cli_overrides("policy")
            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
            self.policy.pretrained_path = policy_path
        else:
            raise ValueError("Policy path is required")

        # Validate that robot configuration is provided
        if self.robot is None:
            raise ValueError("Robot configuration must be provided")

    @classmethod
    def __get_path_fields__(cls) -> list[str]:
        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
        return ["policy"]


def is_image_key(k: str) -> bool:
    return k.startswith(OBS_IMAGES)


def get_actions(
    policy,
    robot: RobotWrapper,
    robot_observation_processor,
    action_queue: ActionQueue,
    shutdown_event: Event,
    cfg: RTCDemoConfig,
):
    """Thread function to request action chunks from the policy with profiling.

    Args:
        policy: The policy instance (SmolVLA, Pi0, etc.)
        robot: The robot instance for getting observations
        robot_observation_processor: Processor for raw robot observations
        action_queue: Queue to put new action chunks
        shutdown_event: Event to signal shutdown
        cfg: Demo configuration
    """
    try:
        logger.info("[GET_ACTIONS] Starting get actions thread")

        latency_tracker = LatencyTracker()  # Track latency of action chunks
        fps = cfg.fps
        time_per_chunk = 1.0 / fps

        dataset_features = hw_to_dataset_features(robot.observation_features(), "observation")
        policy_device = policy.config.device

        # Load preprocessor and postprocessor from pretrained files
        logger.info(f"[GET_ACTIONS] Loading preprocessor/postprocessor from {cfg.policy.pretrained_path}")

        preprocessor, postprocessor = make_pre_post_processors(
            policy_cfg=cfg.policy,
            pretrained_path=cfg.policy.pretrained_path,
            dataset_stats=None,  # Will load from pretrained processor files
            preprocessor_overrides={
                "device_processor": {"device": cfg.policy.device},
            },
        )

        logger.info("[GET_ACTIONS] Preprocessor/postprocessor loaded successfully with embedded stats")

        get_actions_threshold = cfg.action_queue_size_to_get_new_actions

        if not cfg.rtc.enabled:
            get_actions_threshold = 0

        inference_count = 0

        while not shutdown_event.is_set():
            if action_queue.qsize() <= get_actions_threshold:
                with profiling_stats.timer("get_actions.total_iteration"):
                    inference_count += 1
                    logger.info(f"[GET_ACTIONS] Starting inference #{inference_count}")

                    current_time = time.perf_counter()
                    action_index_before_inference = action_queue.get_action_index()

                    with profiling_stats.timer("get_actions.get_prev_actions"):
                        prev_actions = action_queue.get_left_over()

                    inference_latency = latency_tracker.max()
                    inference_delay = math.ceil(inference_latency / time_per_chunk)

                    # Get observation
                    obs = robot.get_observation()

                    # Apply robot observation processor
                    with profiling_stats.timer("get_actions.robot_obs_processing"):
                        obs_processed = robot_observation_processor(obs)

                    # Build dataset frame
                    with profiling_stats.timer("get_actions.build_dataset_frame"):
                        obs_with_policy_features = build_dataset_frame(
                            dataset_features, obs_processed, prefix="observation"
                        )

                    # Convert to tensors and normalize
                    with profiling_stats.timer("get_actions.tensor_conversion"):
                        for name in obs_with_policy_features:
                            obs_with_policy_features[name] = torch.from_numpy(obs_with_policy_features[name])
                            if "image" in name:
                                obs_with_policy_features[name] = (
                                    obs_with_policy_features[name].type(torch.float32) / 255
                                )
                                obs_with_policy_features[name] = (
                                    obs_with_policy_features[name].permute(2, 0, 1).contiguous()
                                )
                            obs_with_policy_features[name] = obs_with_policy_features[name].unsqueeze(0)
                            obs_with_policy_features[name] = obs_with_policy_features[name].to(policy_device)

                        obs_with_policy_features["task"] = [cfg.task]
                        obs_with_policy_features["robot_type"] = (
                            robot.robot.name if hasattr(robot.robot, "name") else ""
                        )

                    # Preprocessing
                    with profiling_stats.timer("get_actions.preprocessing"):
                        preprocessed_obs = preprocessor(obs_with_policy_features)

                    # Policy inference
                    with profiling_stats.timer("get_actions.policy_inference"):
                        actions = policy.predict_action_chunk(
                            preprocessed_obs,
                            inference_delay=inference_delay,
                            prev_chunk_left_over=prev_actions,
                        )

                    # Clone for RTC
                    with profiling_stats.timer("get_actions.clone_actions"):
                        original_actions = actions.squeeze(0).clone()

                    # Postprocessing
                    with profiling_stats.timer("get_actions.postprocessing"):
                        postprocessed_actions = postprocessor(actions)
                        postprocessed_actions = postprocessed_actions.squeeze(0)

                    # Update latency tracker
                    new_latency = time.perf_counter() - current_time
                    new_delay = math.ceil(new_latency / time_per_chunk)
                    latency_tracker.add(new_latency)

                    logger.info(
                        f"[GET_ACTIONS] Inference #{inference_count} completed in {new_latency * 1000:.2f}ms "
                        f"(delay={new_delay} chunks)"
                    )

                    if cfg.action_queue_size_to_get_new_actions < cfg.rtc.execution_horizon + new_delay:
                        logger.warning(
                            "[GET_ACTIONS] cfg.action_queue_size_to_get_new_actions is too small; "
                            "it should be higher than inference delay + execution horizon."
                        )

                    # Merge into action queue
                    with profiling_stats.timer("get_actions.action_queue_merge"):
                        action_queue.merge(
                            original_actions, postprocessed_actions, new_delay, action_index_before_inference
                        )
            else:
                # Small sleep to prevent busy waiting
                time.sleep(0.1)

        logger.info("[GET_ACTIONS] get actions thread shutting down")
    except Exception as e:
        logger.error(f"[GET_ACTIONS] Fatal exception in get_actions thread: {e}")
        logger.error(traceback.format_exc())
        sys.exit(1)


def actor_control(
    robot: RobotWrapper,
    robot_action_processor,
    action_queue: ActionQueue,
    shutdown_event: Event,
    cfg: RTCDemoConfig,
):
    """Thread function to execute actions on the robot with profiling.

    Args:
        robot: The robot instance
        action_queue: Queue to get actions from
        shutdown_event: Event to signal shutdown
        cfg: Demo configuration
    """
    try:
        logger.info("[ACTOR] Starting actor thread")

        action_count = 0
        action_interval = 1.0 / cfg.fps

        while not shutdown_event.is_set():
            start_time = time.perf_counter()

            with profiling_stats.timer("actor.total_iteration"):
                # Get action from queue
                with profiling_stats.timer("actor.queue_get"):
                    action = action_queue.get()

                if action is not None:
                    # Process action
                    with profiling_stats.timer("actor.action_processing"):
                        action = action.cpu()
                        action_dict = {key: action[i].item() for i, key in enumerate(robot.action_features())}
                        action_processed = robot_action_processor((action_dict, None))

                    # Send to robot (includes robot.send_action timing)
                    robot.send_action(action_processed)
                    action_count += 1

            # Sleep to maintain target FPS
            dt_s = time.perf_counter() - start_time
            sleep_time = max(0, (action_interval - dt_s) - 0.001)
            if sleep_time > 0:
                time.sleep(sleep_time)

        logger.info(f"[ACTOR] Actor thread shutting down. Total actions executed: {action_count}")
    except Exception as e:
        logger.error(f"[ACTOR] Fatal exception in actor_control thread: {e}")
        logger.error(traceback.format_exc())
        sys.exit(1)


def _apply_torch_compile(policy, cfg: RTCDemoConfig):
    """Apply torch.compile to the policy's predict_action_chunk method.

    Args:
        policy: Policy instance to compile
        cfg: Configuration containing torch compile settings

    Returns:
        Policy with compiled predict_action_chunk method
    """
    # PI models handle their own compilation
    if policy.type == "pi05" or policy.type == "pi0":
        return policy

    try:
        # Check if torch.compile is available (PyTorch 2.0+)
        if not hasattr(torch, "compile"):
            logger.warning(
                f"torch.compile is not available. Requires PyTorch 2.0+. "
                f"Current version: {torch.__version__}. Skipping compilation."
            )
            return policy

        logger.info("Applying torch.compile to predict_action_chunk...")
        logger.info(f"  Backend: {cfg.torch_compile_backend}")
        logger.info(f"  Mode: {cfg.torch_compile_mode}")
        logger.info(f"  Disable CUDA graphs: {cfg.torch_compile_disable_cudagraphs}")

        # Compile the predict_action_chunk method
        compile_kwargs = {
            "backend": cfg.torch_compile_backend,
            "mode": cfg.torch_compile_mode,
        }

        # Disable CUDA graphs if requested (prevents tensor aliasing issues)
        if cfg.torch_compile_disable_cudagraphs:
            compile_kwargs["options"] = {"triton.cudagraphs": False}

        original_method = policy.predict_action_chunk
        compiled_method = torch.compile(original_method, **compile_kwargs)
        policy.predict_action_chunk = compiled_method
        logger.info("✓ Successfully compiled predict_action_chunk")

    except Exception as e:
        logger.error(f"Failed to apply torch.compile: {e}")
        logger.warning("Continuing without torch.compile")

    return policy


@parser.wrap()
def demo_cli(cfg: RTCDemoConfig):
    """Main entry point for RTC demo with profiling."""
    # Initialize logging
    init_logging()

    logger.info(f"Using device: {cfg.device}")
    logger.info("=" * 80)
    logger.info("PROFILING MODE ENABLED")
    logger.info("=" * 80)

    # Setup signal handler for graceful shutdown
    signal_handler = ProcessSignalHandler(use_threads=True, display_pid=False)
    shutdown_event = signal_handler.shutdown_event

    policy = None
    robot = None
    get_actions_thread = None
    actor_thread = None

    policy_class = get_policy_class(cfg.policy.type)

    # Load config and set compile_model for pi0/pi05 models
    config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)

    if cfg.policy.type == "pi05" or cfg.policy.type == "pi0":
        config.compile_model = cfg.use_torch_compile

    policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)

    # Turn on RTC
    policy.config.rtc_config = cfg.rtc

    # Init RTC processor
    policy.init_rtc_processor()

    assert policy.name in ["smolvla", "pi05", "pi0"], "Only smolvla, pi05, and pi0 are supported for RTC"

    policy = policy.to(cfg.device)
    policy.eval()

    # Apply torch.compile to predict_action_chunk method if enabled
    if cfg.use_torch_compile:
        policy = _apply_torch_compile(policy, cfg)

    # Create robot
    logger.info(f"Initializing robot: {cfg.robot.type}")
    robot = make_robot_from_config(cfg.robot)
    robot.connect()
    robot_wrapper = RobotWrapper(robot)

    # Create robot observation processor
    robot_observation_processor = make_default_robot_observation_processor()
    robot_action_processor = make_default_robot_action_processor()

    # Create action queue for communication between threads
    action_queue = ActionQueue(cfg.rtc)

    # Start chunk requester thread
    get_actions_thread = Thread(
        target=get_actions,
        args=(policy, robot_wrapper, robot_observation_processor, action_queue, shutdown_event, cfg),
        daemon=True,
        name="GetActions",
    )
    get_actions_thread.start()
    logger.info("Started get actions thread")

    # Start action executor thread
    actor_thread = Thread(
        target=actor_control,
        args=(robot_wrapper, robot_action_processor, action_queue, shutdown_event, cfg),
        daemon=True,
        name="Actor",
    )
    actor_thread.start()
    logger.info("Started actor thread")

    logger.info("Started stop by duration thread")

    # Main thread monitors for duration or shutdown
    logger.info(f"Running demo for {cfg.duration} seconds...")
    start_time = time.time()

    while not shutdown_event.is_set() and (time.time() - start_time) < cfg.duration:
        time.sleep(10)

        # Log queue status periodically
        if int(time.time() - start_time) % 5 == 0:
            logger.info(f"[MAIN] Action queue size: {action_queue.qsize()}")

        if time.time() - start_time > cfg.duration:
            break

    logger.info("Demo duration reached or shutdown requested")

    # Signal shutdown
    shutdown_event.set()

    # Wait for threads to finish
    if get_actions_thread and get_actions_thread.is_alive():
        logger.info("Waiting for chunk requester thread to finish...")
        get_actions_thread.join()

    if actor_thread and actor_thread.is_alive():
        logger.info("Waiting for action executor thread to finish...")
        actor_thread.join()

    # Cleanup robot
    if robot:
        robot.disconnect()
        logger.info("Robot disconnected")

    # Print profiling summary
    profiling_stats.print_summary()

    logger.info("Cleanup completed")


if __name__ == "__main__":
    demo_cli()
    logging.info("RTC demo finished")

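Stripped of the robot specifics, the script above is a bounded producer/consumer loop: one thread refills the action queue whenever it falls below a threshold, the other drains it at a fixed rate. A minimal sketch of that layout with plain Python threads, with integers standing in for actions:

```python
# Minimal sketch of the two-thread layout used above: a producer refills the
# queue when it runs low, a consumer drains it at a fixed rate. Values are
# placeholder ints standing in for actions.
import queue
import threading
import time

actions: queue.Queue[int] = queue.Queue()
shutdown = threading.Event()
REFILL_THRESHOLD, CHUNK, FPS = 5, 20, 50.0


def producer() -> None:
    step = 0
    while not shutdown.is_set():
        if actions.qsize() <= REFILL_THRESHOLD:
            for _ in range(CHUNK):  # stands in for one policy inference
                actions.put(step)
                step += 1
        else:
            time.sleep(0.01)  # avoid busy waiting while the queue is full enough


def consumer() -> None:
    while not shutdown.is_set():
        start = time.perf_counter()
        try:
            actions.get(timeout=0.1)  # stands in for robot.send_action(...)
        except queue.Empty:
            continue
        time.sleep(max(0.0, 1.0 / FPS - (time.perf_counter() - start)))


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
time.sleep(1.0)
shutdown.set()
for t in threads:
    t.join()
```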
@@ -1,358 +0,0 @@
#!/usr/bin/env python

"""
Comprehensive profiling script for Pi0 with RTC.

This script demonstrates how to use all the profiling tools to identify
bottlenecks in Pi0 policy inference with RTC enabled.

It profiles:
1. Overall inference time
2. RTC-specific operations (guidance, weights, etc.)
3. Preprocessing/postprocessing
4. Individual method timings

Usage:
    uv run examples/rtc/profile_pi0_rtc_detailed.py \
        --policy_path=helper2424/pi05_check_rtc \
        --device=mps \
        --num_iterations=20 \
        --execution_horizon=20 \
        --enable_rtc_profiling
"""

import argparse
import logging
import sys
import time

import numpy as np
import torch

from lerobot.configs.policies import PreTrainedConfig
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.policies.factory import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.utils.profiling import (
    ProfileContext,
    clear_profiling_stats,
    enable_profiling,
    get_profiling_stats,
    print_profiling_summary,
)

# Import monkey patching for RTC profiling
try:
    from examples.rtc.add_rtc_profiling import monkey_patch_rtc_profiling
except ImportError:
    logging.warning("Could not import add_rtc_profiling, detailed RTC profiling disabled")
    monkey_patch_rtc_profiling = None

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def create_mock_observation(policy_config, device: str) -> dict:
    """Create a mock observation matching policy requirements.

    Args:
        policy_config: Policy configuration
        device: Device to create tensors on

    Returns:
        Mock observation dictionary
    """
    obs = {}

    # Create mock state observation
    state_dim = 10  # Typical robot state dimension
    obs["observation.state"] = torch.randn(1, state_dim, device=device)

    # Create mock images if needed
    # For Pi0, we typically need at least one image
    image_height = 224
    image_width = 224

    # Common image keys for Pi0
    image_keys = ["observation.images.gripper", "observation.images.front"]

    for key in image_keys:
        # Images should be [B, C, H, W] and normalized to [0, 1]
        obs[key] = torch.rand(1, 3, image_height, image_width, device=device)

    # Add task
    obs["task"] = ["Pick up the object"]

    # Add language tokens and attention mask (required for Pi0)
    # These are mock values - in real usage they come from tokenizer
    max_seq_len = 32
    obs["observation.language_tokens"] = torch.randint(0, 1000, (1, max_seq_len), device=device)
    obs["observation.language_attention_mask"] = torch.ones(1, max_seq_len, device=device)

    return obs


def profile_single_iteration(
    policy,
    preprocessor,
    postprocessor,
    observation: dict,
    prev_actions: torch.Tensor | None,
    use_rtc: bool,
    inference_delay: int = 0,
) -> tuple[torch.Tensor, torch.Tensor | None, dict]:
    """Profile a single inference iteration.

    Args:
        policy: Policy instance
        preprocessor: Observation preprocessor
        postprocessor: Action postprocessor
        observation: Input observation
        prev_actions: Previous action chunk (for RTC)
        use_rtc: Whether RTC is enabled
        inference_delay: Inference delay in timesteps

    Returns:
        Tuple of (actions, new_prev_actions, timings)
    """
    timings = {}

    with ProfileContext("iteration.total"):
        # Preprocessing
        with ProfileContext("iteration.preprocessing"):
            preprocessed_obs = preprocessor(observation)

        # Policy inference
        with ProfileContext("iteration.policy_inference"):
            if use_rtc:
                actions = policy.predict_action_chunk(
                    preprocessed_obs,
                    inference_delay=inference_delay,
                    prev_chunk_left_over=prev_actions,
                )
            else:
                actions = policy.predict_action_chunk(preprocessed_obs)

        # Clone for next iteration (if RTC)
        new_prev_actions = None
        if use_rtc:
            with ProfileContext("iteration.prepare_prev_actions"):
                execution_horizon = policy.config.rtc_config.execution_horizon
                if actions.shape[1] > execution_horizon:
                    new_prev_actions = actions[:, execution_horizon:].clone()

        # Postprocessing
        with ProfileContext("iteration.postprocessing"):
            processed_actions = postprocessor(actions)

    return processed_actions, new_prev_actions, timings


def main():
    parser = argparse.ArgumentParser(description="Detailed profiling for Pi0 with RTC")
    parser.add_argument("--policy_path", type=str, required=True, help="Path to pretrained policy")
    parser.add_argument("--device", type=str, default="cuda", help="Device (cuda/cpu/mps)")
    parser.add_argument("--num_iterations", type=int, default=20, help="Number of iterations")
    parser.add_argument("--execution_horizon", type=int, default=10, help="RTC execution horizon")
    parser.add_argument("--warmup_iterations", type=int, default=5, help="Warmup iterations")
    parser.add_argument("--enable_rtc_profiling", action="store_true", help="Enable detailed RTC profiling")
    parser.add_argument("--use_torch_compile", action="store_true", help="Use torch.compile")

    args = parser.parse_args()

    logger.info("=" * 80)
    logger.info("DETAILED PI0 RTC PROFILING")
    logger.info("=" * 80)
    logger.info(f"Policy: {args.policy_path}")
    logger.info(f"Device: {args.device}")
    logger.info(f"Iterations: {args.num_iterations}")
    logger.info(f"Execution Horizon: {args.execution_horizon}")
    logger.info(f"RTC Profiling: {args.enable_rtc_profiling}")
    logger.info("=" * 80 + "\n")

    # Enable profiling
    enable_profiling()

    # Apply RTC profiling if requested
    if args.enable_rtc_profiling:
        if monkey_patch_rtc_profiling is not None:
            monkey_patch_rtc_profiling()
            logger.info("✓ Detailed RTC profiling enabled\n")
        else:
            logger.warning("⚠ Could not enable detailed RTC profiling\n")

    # Load policy
    logger.info("Loading policy...")
    config = PreTrainedConfig.from_pretrained(args.policy_path)

    if hasattr(config, "compile_model"):
        config.compile_model = args.use_torch_compile

    policy_class = get_policy_class(config.type)
    policy = policy_class.from_pretrained(args.policy_path, config=config)

    # Configure RTC
    policy.config.rtc_config = RTCConfig(
        enabled=True,
        execution_horizon=args.execution_horizon,
        max_guidance_weight=1.0,
        prefix_attention_schedule=RTCAttentionSchedule.EXP,
    )
    policy.init_rtc_processor()

    policy = policy.to(args.device)
    policy.eval()

    logger.info(f"✓ Policy loaded: {config.type}\n")

    # Create preprocessor and postprocessor
    logger.info("Loading preprocessor/postprocessor...")
    preprocessor, postprocessor = make_pre_post_processors(
        policy_cfg=config,
        pretrained_path=args.policy_path,
        dataset_stats=None,
        preprocessor_overrides={
            "device_processor": {"device": args.device},
        },
    )
    logger.info("✓ Preprocessor/postprocessor loaded\n")

    # Create mock observation
    logger.info("Creating mock observation...")
    observation = create_mock_observation(config, args.device)
    logger.info("✓ Mock observation created\n")

    # Warmup
    logger.info(f"Warming up ({args.warmup_iterations} iterations)...")
    prev_actions = None
    for i in range(args.warmup_iterations):
        with torch.no_grad():
            _, prev_actions, _ = profile_single_iteration(
                policy=policy,
                preprocessor=preprocessor,
                postprocessor=postprocessor,
                observation=observation,
                prev_actions=prev_actions,
                use_rtc=True,
                inference_delay=0,
            )

    # Clear warmup stats
    clear_profiling_stats()
    logger.info("✓ Warmup complete\n")

    # Profiled run WITH RTC
    logger.info(f"Running profiled iterations WITH RTC ({args.num_iterations} iterations)...")
    prev_actions = None
    iteration_times = []

    for i in range(args.num_iterations):
        start = time.perf_counter()

        with torch.no_grad():
            _, prev_actions, _ = profile_single_iteration(
                policy=policy,
                preprocessor=preprocessor,
                postprocessor=postprocessor,
                observation=observation,
                prev_actions=prev_actions,
                use_rtc=True,
                inference_delay=0,
            )

        # Sync CUDA if needed
        if args.device.startswith("cuda"):
            torch.cuda.synchronize()

        elapsed = time.perf_counter() - start
        iteration_times.append(elapsed)

        if (i + 1) % 5 == 0:
            logger.info(f"  Completed {i + 1}/{args.num_iterations}")

    logger.info("✓ Profiling complete\n")

    # Print summary statistics
    logger.info("\n" + "=" * 80)
    logger.info("ITERATION TIMING SUMMARY")
    logger.info("=" * 80)

    times_arr = np.array(iteration_times)
    logger.info(f"Mean time:   {np.mean(times_arr) * 1000:.2f} ms")
    logger.info(f"Median time: {np.median(times_arr) * 1000:.2f} ms")
    logger.info(f"Std dev:     {np.std(times_arr) * 1000:.2f} ms")
    logger.info(f"Min time:    {np.min(times_arr) * 1000:.2f} ms")
    logger.info(f"Max time:    {np.max(times_arr) * 1000:.2f} ms")
    logger.info(f"Total time:  {np.sum(times_arr):.2f} s")
    logger.info(f"Throughput:  {len(times_arr) / np.sum(times_arr):.2f} iter/s")
    logger.info("=" * 80 + "\n")

    # Print detailed profiling breakdown
    print_profiling_summary(sort_by="total")

    # Print key insights
    stats = get_profiling_stats()

    logger.info("\n" + "=" * 80)
    logger.info("KEY INSIGHTS")
    logger.info("=" * 80)

    # Find bottlenecks
    if stats:
        policy_inference_time = stats.get("iteration.policy_inference", {}).get("mean", 0)
        preprocessing_time = stats.get("iteration.preprocessing", {}).get("mean", 0)
        postprocessing_time = stats.get("iteration.postprocessing", {}).get("mean", 0)

        total_time = policy_inference_time + preprocessing_time + postprocessing_time

        if total_time > 0:
            logger.info("\nTime breakdown:")
            logger.info(f"  Policy inference: {policy_inference_time * 1000:.2f} ms ({policy_inference_time / total_time * 100:.1f}%)")
            logger.info(f"  Preprocessing: {preprocessing_time * 1000:.2f} ms ({preprocessing_time / total_time * 100:.1f}%)")
            logger.info(f"  Postprocessing: {postprocessing_time * 1000:.2f} ms ({postprocessing_time / total_time * 100:.1f}%)")

        # RTC-specific insights
        if args.enable_rtc_profiling:
            rtc_guidance = stats.get("rtc.denoise_step.guidance_computation", {}).get("mean", 0)
            rtc_autograd = stats.get("rtc.denoise_step.autograd_correction", {}).get("mean", 0)
            rtc_base = stats.get("rtc.denoise_step.base_denoising", {}).get("mean", 0)

            if rtc_guidance > 0:
                logger.info("\nRTC breakdown:")
                logger.info(f"  Base denoising:   {rtc_base * 1000:.2f} ms")
                logger.info(f"  Guidance compute: {rtc_guidance * 1000:.2f} ms")
                logger.info(f"  Autograd correct: {rtc_autograd * 1000:.2f} ms")
                logger.info(f"  RTC overhead:     {(rtc_guidance - rtc_base) * 1000:.2f} ms")

        # Recommendations
        logger.info("\nRecommendations:")

        if preprocessing_time > policy_inference_time * 0.3:
            logger.info("  ⚠ Preprocessing is taking >30% of time")
            logger.info("    → Consider reducing image resolution")
            logger.info("    → Consider using fewer cameras")

        if args.enable_rtc_profiling and rtc_autograd > rtc_base * 0.5:
            logger.info("  ⚠ RTC autograd overhead is significant")
            logger.info("    → This is expected, but consider increasing execution_horizon")
            logger.info("    → Try torch.compile if not already enabled")

        if not args.use_torch_compile:
            logger.info("  💡 torch.compile not enabled")
            logger.info("    → Try --use_torch_compile for potential speedup")

    logger.info("=" * 80 + "\n")


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        logger.info("\n\nProfiling interrupted by user")
        sys.exit(0)
    except Exception as e:
        logger.error(f"\n\nError during profiling: {e}")
        import traceback

        traceback.print_exc()
        sys.exit(1)

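One detail of the timing loop above generalizes beyond this script: CUDA kernels launch asynchronously, so wall-clock measurements must call `torch.cuda.synchronize()` before reading the clock. A small reusable helper illustrating the pattern (assumes only `torch`; the sync is skipped on CPU):

```python
# Minimal timing helper showing the synchronize-before-reading-the-clock pattern
# from the loop above. Assumes only torch; on CPU the sync branch is skipped.
import time

import torch


def timed(fn, *args, device: str = "cpu", iters: int = 10) -> float:
    """Return mean seconds per call of fn(*args), syncing CUDA around clock reads."""
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # drain previously queued kernels first
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters


x = torch.randn(512, 512)
print(f"matmul: {timed(torch.matmul, x, x) * 1000:.2f} ms")
```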
@@ -1,347 +0,0 @@
#!/usr/bin/env python

"""
Script to compare performance with and without RTC enabled.

This script helps identify whether RTC is actually improving or degrading performance
by running multiple inference passes and collecting detailed timing statistics.

Usage:
    # Profile with mock data (no robot needed)
    uv run examples/rtc/profile_rtc_comparison.py \
        --policy_path=helper2424/pi05_check_rtc \
        --device=mps \
        --num_iterations=50

    # Profile with specific RTC config
    uv run examples/rtc/profile_rtc_comparison.py \
        --policy_path=helper2424/pi05_check_rtc \
        --device=mps \
        --num_iterations=50 \
        --execution_horizon=20
"""

import argparse
import logging
import time
from dataclasses import dataclass

import numpy as np
import torch

from lerobot.configs.policies import PreTrainedConfig
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.policies.factory import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.utils.profiling import (
    clear_profiling_stats,
    enable_profiling,
    get_profiling_stats,
    print_profiling_summary,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class ProfileResults:
    """Results from profiling run."""

    mode: str  # "with_rtc" or "without_rtc"
    mean_time: float
    std_time: float
    min_time: float
    max_time: float
    times: list[float]
    throughput: float  # iterations per second


def create_mock_observation(policy, device: str) -> dict:
    """Create a mock observation for testing.

    Args:
        policy: Policy instance
        device: Device to create tensors on

    Returns:
        Mock observation dictionary
    """
    # Get expected input shapes from policy config
    # This is a simplified version - adjust based on actual policy requirements
    obs = {}

    # Mock image observations (if needed)
    if hasattr(policy.config, "input_shapes"):
        for key, shape in policy.config.input_shapes.items():
            if "image" in key:
                # Typical image shape: (batch, channels, height, width)
                obs[key] = torch.randn(1, *shape, device=device)
            else:
                obs[key] = torch.randn(1, *shape, device=device)

    # Add task if needed
    if "task" in policy.config.__dict__ or hasattr(policy, "accepts_task"):
        obs["task"] = ["Pick up the object"]

    # Mock state observation
    obs["observation.state"] = torch.randn(1, 10, device=device)  # Adjust size as needed

    return obs


def profile_inference(
    policy, observation: dict, num_iterations: int, use_rtc: bool, execution_horizon: int = 10
) -> ProfileResults:
    """Profile policy inference with or without RTC.

    Args:
        policy: Policy instance
        observation: Observation dictionary
        num_iterations: Number of inference iterations to run
        use_rtc: Whether to enable RTC
        execution_horizon: Execution horizon for RTC

    Returns:
        ProfileResults with timing statistics
    """
    mode = "with_rtc" if use_rtc else "without_rtc"
    logger.info(f"\n{'=' * 80}")
    logger.info(f"Profiling: {mode.upper()}")
    logger.info(f"{'=' * 80}")

    # Configure RTC
    if use_rtc:
        policy.config.rtc_config.enabled = True
        policy.config.rtc_config.execution_horizon = execution_horizon
        policy.init_rtc_processor()
    else:
        policy.config.rtc_config.enabled = False

    times = []
    prev_actions = None

    # Warmup
    logger.info("Warming up (5 iterations)...")
    for _ in range(5):
        with torch.no_grad():
            if use_rtc:
                _ = policy.predict_action_chunk(
                    observation, inference_delay=0, prev_chunk_left_over=prev_actions
                )
            else:
                _ = policy.predict_action_chunk(observation)

    # Actual profiling
    logger.info(f"Running {num_iterations} profiled iterations...")
    for i in range(num_iterations):
        start = time.perf_counter()

        with torch.no_grad():
            if use_rtc:
                actions = policy.predict_action_chunk(
                    observation, inference_delay=0, prev_chunk_left_over=prev_actions
                )
                # Simulate consuming some actions for next iteration
                if actions.shape[1] > execution_horizon:
                    prev_actions = actions[:, execution_horizon:].clone()
                else:
                    prev_actions = None
            else:
                actions = policy.predict_action_chunk(observation)

        # Synchronize if using CUDA
        if observation["observation.state"].device.type == "cuda":
            torch.cuda.synchronize()

        elapsed = time.perf_counter() - start
        times.append(elapsed)

        if (i + 1) % 10 == 0:
            logger.info(f"  Completed {i + 1}/{num_iterations} iterations")

    # Calculate statistics
    times_arr = np.array(times)
    results = ProfileResults(
        mode=mode,
        mean_time=float(np.mean(times_arr)),
        std_time=float(np.std(times_arr)),
        min_time=float(np.min(times_arr)),
        max_time=float(np.max(times_arr)),
        times=times,
        throughput=num_iterations / sum(times),
    )

    logger.info(f"\nResults for {mode}:")
    logger.info(f"  Mean time: {results.mean_time * 1000:.2f} ms")
    logger.info(f"  Std dev: {results.std_time * 1000:.2f} ms")
    logger.info(f"  Min time: {results.min_time * 1000:.2f} ms")
    logger.info(f"  Max time: {results.max_time * 1000:.2f} ms")
    logger.info(f"  Throughput: {results.throughput:.2f} iter/s")

    return results


def compare_results(results_without_rtc: ProfileResults, results_with_rtc: ProfileResults):
    """Compare and print results from both runs.

    Args:
        results_without_rtc: Results from run without RTC
        results_with_rtc: Results from run with RTC
    """
    logger.info(f"\n{'=' * 80}")
    logger.info("COMPARISON SUMMARY")
    logger.info(f"{'=' * 80}")

    mean_diff = results_with_rtc.mean_time - results_without_rtc.mean_time
    mean_diff_pct = (mean_diff / results_without_rtc.mean_time) * 100

    throughput_diff = results_with_rtc.throughput - results_without_rtc.throughput
    throughput_diff_pct = (throughput_diff / results_without_rtc.throughput) * 100

    logger.info(f"\n{'Metric':<30} {'Without RTC':>15} {'With RTC':>15} {'Difference':>15}")
    logger.info("-" * 80)
    logger.info(
        f"{'Mean time (ms)':<30} "
        f"{results_without_rtc.mean_time * 1000:>15.2f} "
        f"{results_with_rtc.mean_time * 1000:>15.2f} "
        f"{mean_diff * 1000:>+15.2f}"
    )
    logger.info(
        f"{'Std dev (ms)':<30} "
        f"{results_without_rtc.std_time * 1000:>15.2f} "
        f"{results_with_rtc.std_time * 1000:>15.2f} "
        f"{(results_with_rtc.std_time - results_without_rtc.std_time) * 1000:>+15.2f}"
    )
    logger.info(
        f"{'Min time (ms)':<30} "
        f"{results_without_rtc.min_time * 1000:>15.2f} "
        f"{results_with_rtc.min_time * 1000:>15.2f} "
        f"{(results_with_rtc.min_time - results_without_rtc.min_time) * 1000:>+15.2f}"
    )
    logger.info(
        f"{'Max time (ms)':<30} "
        f"{results_without_rtc.max_time * 1000:>15.2f} "
        f"{results_with_rtc.max_time * 1000:>15.2f} "
        f"{(results_with_rtc.max_time - results_without_rtc.max_time) * 1000:>+15.2f}"
    )
    logger.info(
        f"{'Throughput (iter/s)':<30} "
        f"{results_without_rtc.throughput:>15.2f} "
        f"{results_with_rtc.throughput:>15.2f} "
        f"{throughput_diff:>+15.2f}"
    )

    logger.info(f"\n{'=' * 80}")
    logger.info("VERDICT")
    logger.info(f"{'=' * 80}")

    if mean_diff_pct < -5:
        logger.info(f"✓ RTC is FASTER by {abs(mean_diff_pct):.1f}%")
        logger.info(f"  Mean time reduced by {abs(mean_diff) * 1000:.2f} ms")
    elif mean_diff_pct > 5:
        logger.info(f"✗ RTC is SLOWER by {mean_diff_pct:.1f}%")
        logger.info(f"  Mean time increased by {mean_diff * 1000:.2f} ms")
        logger.info("\n  Possible reasons:")
        logger.info("  - RTC overhead exceeds benefits at current execution horizon")
        logger.info("  - Inference delay calculation not accounting for RTC processing")
        logger.info("  - Additional tensor operations in RTC guidance")
    else:
        logger.info(f"≈ Performance is SIMILAR (difference: {mean_diff_pct:+.1f}%)")

    logger.info(f"{'=' * 80}\n")


def main():
    parser = argparse.ArgumentParser(description="Profile RTC performance")
    parser.add_argument(
        "--policy_path", type=str, required=True, help="Path to pretrained policy"
    )
    parser.add_argument(
        "--device", type=str, default="cuda", help="Device to run on (cuda/cpu/mps)"
    )
    parser.add_argument(
        "--num_iterations", type=int, default=50, help="Number of inference iterations"
    )
    parser.add_argument(
        "--execution_horizon", type=int, default=10, help="RTC execution horizon"
    )
    parser.add_argument(
        "--enable_detailed_profiling",
        action="store_true",
        help="Enable detailed method-level profiling",
    )
    parser.add_argument(
        "--use_torch_compile", action="store_true", help="Use torch.compile for faster inference"
    )

    args = parser.parse_args()

    # Load policy
    logger.info(f"Loading policy from {args.policy_path}")
    config = PreTrainedConfig.from_pretrained(args.policy_path)
    policy_class = get_policy_class(config.type)

    # Set compile flag if needed
    if hasattr(config, "compile_model"):
        config.compile_model = args.use_torch_compile

    policy = policy_class.from_pretrained(args.policy_path, config=config)

    # Initialize RTC config
    policy.config.rtc_config = RTCConfig(
        execution_horizon=args.execution_horizon,
        max_guidance_weight=1.0,
        prefix_attention_schedule=RTCAttentionSchedule.EXP,
    )

    policy = policy.to(args.device)
    policy.eval()

    logger.info(f"Policy loaded: {config.type}")
    logger.info(f"Device: {args.device}")
    logger.info(f"Execution horizon: {args.execution_horizon}")

    # Create mock observation
    logger.info("Creating mock observation...")
    observation = create_mock_observation(policy, args.device)

    # Enable detailed profiling if requested
    if args.enable_detailed_profiling:
        enable_profiling()
        logger.info("Detailed profiling enabled")

    # Profile without RTC
    results_without_rtc = profile_inference(
        policy=policy,
        observation=observation,
        num_iterations=args.num_iterations,
        use_rtc=False,
        execution_horizon=args.execution_horizon,
    )

    if args.enable_detailed_profiling:
        logger.info("\nDetailed profiling stats (WITHOUT RTC):")
        print_profiling_summary()
        clear_profiling_stats()

    # Profile with RTC
    results_with_rtc = profile_inference(
        policy=policy,
        observation=observation,
        num_iterations=args.num_iterations,
        use_rtc=True,
        execution_horizon=args.execution_horizon,
    )

    if args.enable_detailed_profiling:
        logger.info("\nDetailed profiling stats (WITH RTC):")
        print_profiling_summary()

    # Compare results
    compare_results(results_without_rtc, results_with_rtc)


if __name__ == "__main__":
    main()
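Editor's note: beyond the two invocations shown in the docstring, the deleted script also accepted the optional flags defined in its `main()`. A full invocation, reusing the policy path and device from the docstring above, might have looked like this:

```bash
uv run examples/rtc/profile_rtc_comparison.py \
    --policy_path=helper2424/pi05_check_rtc \
    --device=mps \
    --num_iterations=50 \
    --execution_horizon=20 \
    --enable_detailed_profiling \
    --use_torch_compile
```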
@@ -98,7 +98,6 @@ pygame-dep = ["pygame>=2.5.1,<2.7.0"]
 placo-dep = ["placo>=0.9.6,<0.10.0"]
 transformers-dep = ["transformers>=4.53.0,<5.0.0"]
 grpcio-dep = ["grpcio==1.73.1", "protobuf==6.31.0"] # TODO: Bump dependency (compatible with wandb)
-matplotlib-dep = ["matplotlib>=3.10.3,<4.0.0"]

 # Motors
 feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0"]
@@ -133,7 +132,7 @@ groot = [
 hilserl = ["lerobot[transformers-dep]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"]

 # Features
-async = ["lerobot[grpcio-dep]", "lerobot[matplotlib-dep]"]
+async = ["lerobot[grpcio-dep]", "matplotlib>=3.10.3,<4.0.0"]

 # Development
 dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1"]
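Editor's note: the net effect of this pyproject.toml change is that the `async` extra now pins matplotlib directly instead of going through the removed `matplotlib-dep` alias; the resolved versions are unchanged. A hypothetical install command pulling in the extra:

```bash
# Installs the async extra: grpcio/protobuf plus matplotlib>=3.10.3,<4.0.0
pip install "lerobot[async]"
```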
@@ -1,49 +1,38 @@
-# Real-Time Chunking (RTC) Module
-
-This module implements Real-Time Chunking and related adaptive inference techniques for robotics policies in LeRobot.
-
-## Overview
-
-Real-Time Chunking (RTC) addresses the challenge of real-time inference in action chunking policies by treating chunk generation as an inpainting problem. It strategically handles overlapping timesteps between action chunks using prefix attention mechanisms.
-
-It is particularly effective for handling long-horizon inference in robotics policies.
-
-## Integration with Policies
-
-RTC can be integrated with any policy that supports flow matching for chunking:
-
-- **SmolVLA**: Vision-language-action model with RTC support
-- **Pi0**: Action prediction model with adaptive chunking
-- **Pi05**: Action prediction model with adaptive chunking
-
-## Original Implementation
-
-This implementation is based on Physical Intelligence's Kinetix RTC:
-
-- [Original RTC implementation](https://github.com/Physical-Intelligence/real-time-chunking-kinetix/blob/main/src/model.py#L214)
-- [Kinetix GitHub Repository](https://github.com/Physical-Intelligence/real-time-chunking-kinetix)
-
-## References
-
-- [Real Time Chunking Paper](https://www.physicalintelligence.company/research/real_time_chunking)
-- [Physical Intelligence Kinetix](https://github.com/Physical-Intelligence/real-time-chunking-kinetix)
-
-## How to run
-
-### Check with data from the dataset
-
-```bash
-uv run python examples/rtc/eval_dataset.py \
-    --policy.path=helper2424/smolvla_check_rtc_last3 \
-    --dataset.repo_id=helper2424/check_rtc \
-    --rtc.execution_horizon=8 \
-    --device=mps \
-    --seed=42
-```
-
-This script evaluates RTC on data from a dataset and saves the results to a file; you can check the results in the `rtc_debug_output` directory.
-
-The example output should look like this:
-![](./rtc_debug.png)
-
-It shows how flow matching behaves with and without RTC. The chart plots the predicted action values at each timestep; colour encodes generation progress (blue for earlier denoising steps, yellow for later ones), and the red line is the ground truth (the previous action chunk).
+# Real-Time Chunking (RTC)
+
+This module contains the LeRobot implementation of **Real-Time Chunking (RTC)**, an inference-time technique for flow-matching based policies.
+
+**Note**: RTC is not a policy itself, but rather an inference enhancement that works with flow-matching based policies including [π₀](../pi0/), [π₀.₅](../pi05/), and [SmolVLA](../smolvla/).
+
+---
+
+## Citation
+
+If you use Real-Time Chunking in your work, please cite:
+
+```bibtex
+@misc{openpi2024,
+    author = {Physical Intelligence Lab},
+    title = {OpenPI: PyTorch Implementation of π0 and π0.5 Policies},
+    year = {2024},
+    publisher = {GitHub},
+    howpublished = {\url{https://github.com/Physical-Intelligence/openpi}},
+    license = {Apache-2.0}
+}
+
+@misc{black2025realtimeexecutionactionchunking,
+    title={Real-Time Execution of Action Chunking Flow Policies},
+    author={Kevin Black and Manuel Y. Galliker and Sergey Levine},
+    year={2025},
+    eprint={2506.07339},
+    archivePrefix={arXiv},
+    primaryClass={cs.RO},
+    url={https://arxiv.org/abs/2506.07339},
+}
+```
+
+---
+
+## License
+
+This implementation follows the **Apache 2.0 License**, consistent with the LeRobot project.
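Editor's note: the "inpainting with prefix attention" idea mentioned in the removed README text is easiest to see in code. The sketch below is illustrative only — it assumes a simple exponential weight schedule and uses hypothetical helper names (`prefix_weights`, `guide_chunk`); it is not the LeRobot implementation:

```python
import torch


def prefix_weights(chunk_len: int, prefix_len: int) -> torch.Tensor:
    """Weight 1.0 on the already-executed prefix, decaying toward 0 afterwards (assumed schedule)."""
    w = torch.zeros(chunk_len)
    w[:prefix_len] = 1.0
    tail = torch.arange(chunk_len - prefix_len, dtype=torch.float32)
    w[prefix_len:] = torch.exp(-tail)  # exponential falloff, analogous to RTCAttentionSchedule.EXP
    return w


def guide_chunk(new_chunk: torch.Tensor, prev_left_over: torch.Tensor, prefix_len: int) -> torch.Tensor:
    """Blend the start of a freshly generated chunk toward the leftover of the previous chunk."""
    overlap = min(prev_left_over.shape[0], new_chunk.shape[0])
    w = prefix_weights(new_chunk.shape[0], prefix_len)[:overlap].unsqueeze(-1)
    guided = new_chunk.clone()
    # Heavily weighted steps stay pinned to the old chunk; later steps follow the new one.
    guided[:overlap] = w * prev_left_over[:overlap] + (1.0 - w) * new_chunk[:overlap]
    return guided
```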
@@ -111,7 +111,3 @@ class RTCDebugVisualizer:
             if not ax.yaxis.get_label().get_text():
                 ax.set_ylabel(f"Dim {dim_idx}", fontsize=10)
             ax.grid(True, alpha=0.3)
-
-            # Add legend if label provided and this is the first dimension
-            if label and dim_idx == 0:
-                ax.legend(loc="best", fontsize=8)
Binary file not shown (deleted image, 1.3 MiB).
@@ -1,206 +0,0 @@
"""
Profiling utilities for performance analysis.

Usage:
    from lerobot.utils.profiling import profile_method, get_profiling_stats, print_profiling_summary

    @profile_method
    def my_slow_function(x):
        return x * 2

    # At end of execution:
    print_profiling_summary()
"""

import functools
import logging
import time
from collections import defaultdict
from threading import Lock
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Global profiling statistics storage
_profiling_stats: dict[str, list[float]] = defaultdict(list)
_profiling_lock = Lock()
_profiling_enabled = False


def enable_profiling():
    """Enable profiling globally."""
    global _profiling_enabled
    _profiling_enabled = True
    logger.info("Profiling enabled")


def disable_profiling():
    """Disable profiling globally."""
    global _profiling_enabled
    _profiling_enabled = False
    logger.info("Profiling disabled")


def is_profiling_enabled() -> bool:
    """Check if profiling is enabled."""
    return _profiling_enabled


def record_timing(name: str, duration: float):
    """Record a timing measurement.

    Args:
        name: Name/identifier for this timing
        duration: Duration in seconds
    """
    if not _profiling_enabled:
        return

    with _profiling_lock:
        _profiling_stats[name].append(duration)


def profile_method(func: Callable) -> Callable:
    """Decorator to profile a method or function.

    Args:
        func: Function to profile

    Returns:
        Wrapped function that records execution time
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> Any:
        if not _profiling_enabled:
            return func(*args, **kwargs)

        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            return result
        finally:
            duration = time.perf_counter() - start
            # Use fully qualified name
            name = f"{func.__module__}.{func.__qualname__}"
            record_timing(name, duration)

    return wrapper


class ProfileContext:
    """Context manager for profiling code blocks.

    Usage:
        with ProfileContext("my_operation"):
            # ... code to profile ...
    """

    def __init__(self, name: str):
        self.name = name
        self.start = None

    def __enter__(self):
        if _profiling_enabled:
            self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        if _profiling_enabled and self.start is not None:
            duration = time.perf_counter() - self.start
            record_timing(self.name, duration)


def get_profiling_stats() -> dict[str, dict[str, float]]:
    """Get summary statistics for all profiled functions.

    Returns:
        Dictionary mapping function names to their stats (count, mean, min, max, total)
    """
    with _profiling_lock:
        summary = {}
        for name, times in _profiling_stats.items():
            if times:
                summary[name] = {
                    "count": len(times),
                    "mean": sum(times) / len(times),
                    "min": min(times),
                    "max": max(times),
                    "total": sum(times),
                    "mean_ms": (sum(times) / len(times)) * 1000,
                    "min_ms": min(times) * 1000,
                    "max_ms": max(times) * 1000,
                }
        return summary


def clear_profiling_stats():
    """Clear all profiling statistics."""
    with _profiling_lock:
        _profiling_stats.clear()
    logger.info("Profiling stats cleared")


def print_profiling_summary(sort_by: str = "total"):
    """Print formatted summary of profiling statistics.

    Args:
        sort_by: Sort key ('total', 'mean', 'count', 'max')
    """
    summary = get_profiling_stats()

    if not summary:
        logger.info("No profiling data available")
        return

    logger.info("\n" + "=" * 100)
    logger.info("PROFILING SUMMARY")
    logger.info("=" * 100)

    # Sort by requested key
    sorted_items = sorted(summary.items(), key=lambda x: x[1].get(sort_by, 0), reverse=True)

    # Print header
    logger.info(
        f"{'Function':<60} {'Count':>8} {'Mean (ms)':>12} {'Min (ms)':>12} {'Max (ms)':>12} {'Total (s)':>12}"
    )
    logger.info("-" * 100)

    # Print each function's stats
    for name, stats in sorted_items:
        # Shorten long names
        display_name = name if len(name) <= 60 else "..." + name[-57:]

        logger.info(
            f"{display_name:<60} "
            f"{stats['count']:>8} "
            f"{stats['mean_ms']:>12.2f} "
            f"{stats['min_ms']:>12.2f} "
            f"{stats['max_ms']:>12.2f} "
            f"{stats['total']:>12.2f}"
        )

    logger.info("=" * 100)

    # Print summary
    total_time = sum(s["total"] for s in summary.values())
    total_calls = sum(s["count"] for s in summary.values())
    logger.info(f"\nTotal profiled time: {total_time:.2f}s across {total_calls} calls")
    logger.info("=" * 100 + "\n")


def profile_section(name: str):
    """Return a context manager for profiling a code section.

    Args:
        name: Name for this section

    Returns:
        ProfileContext instance

    Usage:
        with profile_section("data_loading"):
            data = load_data()
    """
    return ProfileContext(name)
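Editor's note: a minimal sketch of how the decorator, the context manager, and the summary printer from the file removed above compose; `do_work` is a stand-in function introduced for this example only:

```python
import time

from lerobot.utils.profiling import (
    enable_profiling,
    print_profiling_summary,
    profile_method,
    profile_section,
)

enable_profiling()


@profile_method
def do_work():
    time.sleep(0.01)  # stand-in for real computation


# Timings accumulate under both the section name and the function's qualified name.
with profile_section("iteration.policy_inference"):
    for _ in range(5):
        do_work()

print_profiling_summary(sort_by="total")
```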
@@ -23,13 +23,15 @@ from lerobot.configs.types import FeatureType, PolicyFeature, RTCAttentionSchedule  # noqa: E402
 from lerobot.policies.factory import make_pre_post_processors  # noqa: E402
 from lerobot.policies.rtc.configuration_rtc import RTCConfig  # noqa: E402
 from lerobot.policies.smolvla.configuration_smolvla import SmolVLAConfig  # noqa: F401
 from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
 from lerobot.utils.random_utils import set_seed  # noqa: E402
-from tests.utils import require_cuda  # noqa: E402
+from tests.utils import require_cuda, require_package  # noqa: E402


+@require_package("transformers")
 @require_cuda
 def test_smolvla_rtc_initialization():
+    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
+
     """Test SmolVLA policy can initialize RTC processor."""
     set_seed(42)

@@ -63,8 +65,11 @@ def test_smolvla_rtc_initialization():
     print("✓ SmolVLA RTC initialization: Test passed")


+@require_package("transformers")
 @require_cuda
 def test_smolvla_rtc_initialization_without_rtc_config():
+    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
+
     """Test SmolVLA policy can initialize without RTC config."""
     set_seed(42)

@@ -82,9 +87,12 @@ def test_smolvla_rtc_initialization_without_rtc_config():
     print("✓ SmolVLA RTC initialization without RTC config: Test passed")


+@require_package("transformers")
 @require_cuda
 @pytest.mark.skipif(True, reason="Requires pretrained SmolVLA model weights")
 def test_smolvla_rtc_inference_with_prev_chunk():
+    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
+
     """Test SmolVLA policy inference with RTC and previous chunk."""
     set_seed(42)

@@ -162,9 +170,12 @@ def test_smolvla_rtc_inference_with_prev_chunk():
     print("✓ SmolVLA RTC inference with prev_chunk: Test passed")


+@require_package("transformers")
 @require_cuda
 @pytest.mark.skipif(True, reason="Requires pretrained SmolVLA model weights")
 def test_smolvla_rtc_inference_without_prev_chunk():
+    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
+
     """Test SmolVLA policy inference with RTC but no previous chunk (RTC should have no effect)."""
     set_seed(42)

@@ -233,9 +244,12 @@ def test_smolvla_rtc_inference_without_prev_chunk():
     print("✓ SmolVLA RTC inference without prev_chunk: Test passed")


+@require_package("transformers")
 @require_cuda
 @pytest.mark.skipif(True, reason="Requires pretrained SmolVLA model weights")
 def test_smolvla_rtc_validation_rules():
+    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # noqa: F401
+
     """Test SmolVLA policy with RTC follows all three validation rules."""
     set_seed(42)
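Editor's note: the diff above gates each test on `require_package("transformers")`. The real helper lives in `tests/utils.py` and is not shown in this diff; a minimal equivalent, assuming it is a skipif-style pytest marker, would be:

```python
import importlib.util

import pytest


def require_package(package_name: str):
    """Skip the decorated test when `package_name` is not importable (illustrative version)."""

    def decorator(func):
        return pytest.mark.skipif(
            importlib.util.find_spec(package_name) is None,
            reason=f"{package_name} is not installed",
        )(func)

    return decorator
```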