
# Real-Time Chunking (RTC) Examples

This directory contains examples and evaluation scripts for Real-Time Chunking (RTC), a technique for improving action chunking policies in real-time robot control.

## Overview

Real-Time Chunking addresses the challenge of maintaining both consistency and reactivity when running action chunking policies under non-negligible inference latency. It applies a guidance technique during flow-matching denoising to blend new action predictions with previously planned actions (a sketch of this step follows below).

Key Benefits:

- Maintains consistency between consecutive action chunks
- Reduces jitter and improves smoothness
- Adapts to inference delays dynamically

Reference: Physical Intelligence - Real-Time Chunking
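
To make the guidance step concrete, here is a minimal sketch of one prefix-guided denoise update. It is illustrative only: the function and argument names are assumptions, not the `RTCProcessor` API, and the actual implementation derives its correction via autograd rather than the direct blend shown here.

```python
import torch

def guided_denoise_step(
    x_t: torch.Tensor,        # partially denoised chunk, (chunk_len, action_dim)
    step_fn,                  # the policy's ordinary denoise update
    prev_prefix: torch.Tensor,  # overlap from the previous chunk, (n, action_dim)
    weights: torch.Tensor,    # prefix attention schedule weights, (n,)
    guidance_weight: float,
    max_guidance_weight: float,
) -> torch.Tensor:
    """One denoise update with prefix guidance (illustrative sketch)."""
    x_next = step_fn(x_t).clone()                  # ordinary flow-matching step
    g = min(guidance_weight, max_guidance_weight)  # bounded guidance strength
    n = prev_prefix.shape[0]
    # Pull the overlap region toward the previously planned actions.
    x_next[:n] += g * weights[:, None] * (prev_prefix - x_next[:n])
    return x_next
```

Early in the overlap the weights are high, so the actions the robot is already executing barely change; later steps are left increasingly free, which keeps consecutive chunks consistent without freezing the whole new prediction.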

## Scripts

### 1. `real_time_chunking_evaluate.py`

Real-time evaluation on physical robots or simulation environments.

Features:

- Run the policy with RTC on a real robot or in simulation
- Compare RTC vs non-RTC actions in real time
- Multi-threaded action execution and inference
- Support for `torch.compile()` optimization

Usage:

```bash
# With real robot
uv run python examples/rtc/real_time_chunking_evaluate.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --task="pick up the cup"

# With simulation environment
uv run python examples/rtc/real_time_chunking_evaluate.py \
    --policy.path=lerobot/smolvla_base \
    --env.type=pusht \
    --duration=60.0

# Disable verbose comparison (faster)
uv run python examples/rtc/real_time_chunking_evaluate.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --verbose_rtc_comparison=false

# With policy compilation (CUDA only, not MPS)
uv run python examples/rtc/real_time_chunking_evaluate.py \
    --policy.path=lerobot/smolvla_base \
    --robot.type=so100 \
    --compile_policy=true \
    --compile_mode=max-autotune
```

Key Parameters:

- `--policy.path`: Path to the pretrained policy
- `--robot.type` or `--env.type`: Robot or environment to use
- `--rtc.execution_horizon`: Number of steps to maintain consistency over (default: 10)
- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 1.0)
- `--rtc.prefix_attention_schedule`: Schedule type (`ZEROS`, `ONES`, `LINEAR`, `EXP`)
- `--verbose_rtc_comparison`: Enable detailed RTC comparison logging (default: true)
- `--duration`: How long to run, in seconds (default: 30.0)
- `--fps`: Action execution frequency, in Hz (default: 10.0)

### 2. `evaluate_rtc_on_dataset.py`

Offline evaluation on dataset samples to measure RTC effectiveness.

Features:

- Evaluate RTC on dataset samples without running a robot
- Compare RTC vs non-RTC predictions
- Measure consistency and ground-truth alignment
- Simulate different inference delays
- Save detailed metrics to JSON

Usage:

```bash
# Basic evaluation
uv run python examples/rtc/evaluate_rtc_on_dataset.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=lerobot/pusht \
    --num_iterations=100

# Simulate inference delay (every 3rd step)
uv run python examples/rtc/evaluate_rtc_on_dataset.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=lerobot/pusht \
    --num_iterations=200 \
    --skip_steps=3

# Custom RTC configuration
uv run python examples/rtc/evaluate_rtc_on_dataset.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=lerobot/pusht \
    --num_iterations=100 \
    --rtc.execution_horizon=12 \
    --rtc.max_guidance_weight=5.0 \
    --rtc.prefix_attention_schedule=LINEAR

# Save results to file
uv run python examples/rtc/evaluate_rtc_on_dataset.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=lerobot/pusht \
    --num_iterations=100 \
    --output_path=results/rtc_evaluation.json

# Verbose mode with detailed logging
uv run python examples/rtc/evaluate_rtc_on_dataset.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=lerobot/pusht \
    --num_iterations=50 \
    --verbose=true
```

Key Parameters:

- `--policy.path`: Path to the pretrained policy
- `--dataset.repo_id`: Dataset to evaluate on
- `--num_iterations`: Number of samples to evaluate (default: 100)
- `--skip_steps`: Steps to skip between inferences, simulating inference delay (default: 1)
- `--start_episode`: Episode to start from (default: 0)
- `--output_path`: Path to save the results JSON
- `--verbose`: Enable detailed per-sample logging
- `--device`: Device to use (`cuda`, `cpu`, `mps`, `auto`)

Metrics Reported:

- RTC vs Ground Truth MSE: how close RTC predictions are to the actual actions
- No-RTC vs Ground Truth MSE: baseline without RTC
- RTC Improvement: absolute and relative improvement over baseline
- RTC Consistency: how well RTC maintains consistency in the prefix region
  - Prefix MSE
  - Mean/max error in the overlap region
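
In rough terms, each metric corresponds to a computation like the sketch below. The array shapes, names, and the exact alignment of the overlap region are assumptions for illustration; see the script for the authoritative definitions.

```python
import numpy as np

def chunk_metrics(rtc, no_rtc, ground_truth, prev_chunk, inference_delay, execution_horizon):
    """Illustrative metrics for one sample; chunks are (chunk_len, action_dim) arrays."""
    rtc_mse = float(np.mean((rtc - ground_truth) ** 2))
    no_rtc_mse = float(np.mean((no_rtc - ground_truth) ** 2))
    # Consistency: compare the start of the new chunk against the
    # still-pending part of the previous chunk (the prefix/overlap region).
    prev_overlap = prev_chunk[inference_delay:execution_horizon]
    new_overlap = rtc[: execution_horizon - inference_delay]
    prefix_err = np.abs(new_overlap - prev_overlap)
    return {
        "rtc_vs_ground_truth_mse": rtc_mse,
        "no_rtc_vs_ground_truth_mse": no_rtc_mse,
        "improvement_absolute": no_rtc_mse - rtc_mse,
        "improvement_relative_percent": 100.0 * (no_rtc_mse - rtc_mse) / no_rtc_mse,
        "prefix_mse": float(np.mean(prefix_err**2)),
        "prefix_mean_error": float(np.mean(prefix_err)),
        "prefix_max_error": float(np.max(prefix_err)),
    }
```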

### 3. `run_dataset_evaluation.sh`

Convenience script with multiple evaluation scenarios.

Usage:

```bash
# Edit the script to set your policy and dataset
# Then run all examples:
./examples/rtc/run_dataset_evaluation.sh

# Or run individual examples from the script
```

## Understanding RTC Parameters

### `execution_horizon`

The number of timesteps from the previous chunk to maintain consistency with. Higher values mean more consistency but potentially less reactivity.

Typical values: 8-12 steps

### `max_guidance_weight`

Upper bound on the guidance strength. Higher values enforce stronger consistency but may over-constrain new predictions.

Typical values: 1.0-10.0

### `prefix_attention_schedule`

How to weight consistency across the overlap region:

- `ZEROS`: binary (full weight up to `inference_delay`, then zero)
- `ONES`: full weight across the entire `execution_horizon`
- `LINEAR`: linear decay from `inference_delay` to `execution_horizon`
- `EXP`: exponential decay (recommended)

Recommended: `EXP`
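
The sketch below is one plausible reading of these four schedules, following the descriptions above; the decay rates are illustrative assumptions, not the formulas used by the library.

```python
import numpy as np

def prefix_weights(schedule: str, inference_delay: int, execution_horizon: int) -> np.ndarray:
    """Illustrative per-step consistency weights over the overlap region."""
    t = np.arange(execution_horizon, dtype=float)
    if schedule == "ZEROS":
        # Binary: full weight for actions already committed to execution, zero after.
        return (t < inference_delay).astype(float)
    if schedule == "ONES":
        # Constant full weight across the whole horizon.
        return np.ones(execution_horizon)
    if schedule == "LINEAR":
        # Full weight up to inference_delay, then linear decay toward zero.
        decay = (execution_horizon - t) / max(execution_horizon - inference_delay, 1)
        return np.clip(np.where(t < inference_delay, 1.0, decay), 0.0, 1.0)
    if schedule == "EXP":
        # Full weight up to inference_delay, then exponential decay
        # (unit decay rate chosen purely for illustration).
        return np.where(t < inference_delay, 1.0, np.exp(-(t - inference_delay)))
    raise ValueError(f"Unknown schedule: {schedule}")
```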

### `skip_steps` (evaluation only)

Simulates inference delay by running inference only every N-th step. This helps show how RTC performs under realistic delays.

Example: `skip_steps=3` means the policy infers every 3 steps, simulating an action execution frequency 3x the inference frequency.
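
As a toy version of that loop (all names here are stand-ins, not the script's internals):

```python
# Toy illustration of skip_steps=3: a new chunk is predicted every third
# step; the steps in between replay actions from the chunk already in flight.
skip_steps = 3

def predict_chunk(step: int) -> list[int]:
    # Stand-in for one policy inference returning skip_steps actions.
    return [step + i for i in range(skip_steps)]

chunk: list[int] = []
for step in range(9):
    if step % skip_steps == 0:
        chunk = predict_chunk(step)    # inference happens only here
    action = chunk[step % skip_steps]  # other steps reuse the in-flight chunk
    print(f"step {step}: action {action}")
```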

## Output Format (Dataset Evaluation)

When using `--output_path`, results are saved in JSON format:

```json
{
  "summary": {
    "rtc_vs_ground_truth_mse": {
      "mean": 0.00123,
      "std": 0.00045,
      "min": 0.00012,
      "max": 0.00456
    },
    "improvement": {
      "absolute": 0.00034,
      "relative_percent": 12.5
    },
    ...
  },
  "config": {
    "num_iterations": 100,
    "skip_steps": 3,
    "execution_horizon": 10,
    ...
  },
  "detailed_results": [
    {
      "sample_idx": 0,
      "rtc_vs_ground_truth_mse": 0.00112,
      "no_rtc_vs_ground_truth_mse": 0.00145,
      ...
    },
    ...
  ]
}
```
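
For quick inspection, the saved file can be read back with the standard library; only fields shown in the excerpt above are assumed here.

```python
import json

# Load a saved evaluation and print the headline numbers.
with open("results/rtc_evaluation.json") as f:
    results = json.load(f)

mse = results["summary"]["rtc_vs_ground_truth_mse"]
improvement = results["summary"]["improvement"]
print(f"RTC MSE: {mse['mean']:.6f} ± {mse['std']:.6f}")
print(f"Relative improvement: {improvement['relative_percent']:.2f}%")
```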

## Tips

1. Start with dataset evaluation to understand RTC behavior before running on a robot
2. Use verbose mode to debug unexpected behavior
3. Tune `execution_horizon` based on your inference latency and action frequency
4. Monitor the consistency metrics: very low consistency may indicate that `execution_horizon` is too small
5. Compare different schedules: `EXP` usually works best, but `LINEAR` can be more interpretable

## Troubleshooting

### High RTC vs no-RTC difference but no improvement

- Try reducing `max_guidance_weight`
- Check whether `execution_horizon` is too large

### Poor consistency metrics

- Increase `execution_horizon`
- Check that `skip_steps` is not larger than your action chunk size
- Verify that episodes are being reset correctly

### RTC worse than no-RTC

- RTC may not help if inference is faster than action execution
- Try a different `prefix_attention_schedule`
- Ensure `execution_horizon` matches your use case

## Example Results

Example output from dataset evaluation:

```
================================================================================
EVALUATION SUMMARY
================================================================================

Ground Truth Alignment:
  RTC MSE:        0.001234 ± 0.000456
  No-RTC MSE:     0.001567 ± 0.000512

RTC Improvement:
  Absolute:       0.000333
  Relative:       21.23%

RTC vs No-RTC Difference:
  MSE:            0.000112 ± 0.000034

RTC Consistency (Prefix Region):
  MSE:            0.000089 ± 0.000023
  Mean Error:     0.007654 ± 0.002341
  Max Error:      0.023456 ± 0.008765
```