Mirror of https://github.com/huggingface/lerobot.git, synced 2026-05-18 02:00:03 +00:00.

## Scripts

### 1. `eval_dataset.py`

Offline evaluation on dataset samples with detailed visualization and validation.

**Features:**

- Compare RTC vs non-RTC predictions on two random dataset samples
- Validate RTC behavior (delay region, blend region, post-horizon region)
- Generate debug visualizations:
  - Denoising step comparisons (x_t, v_t, x1_t, corrections)
  - Final action predictions comparison
- Support for torch.compile() optimization
- Memory-efficient sequential policy loading for large models

**Usage:**

```bash
# Basic usage with SmolVLA policy
uv run python examples/rtc/eval_dataset.py \
  --policy.path=helper2424/smolvla_check_rtc_last3 \
  --dataset.repo_id=helper2424/check_rtc \
  --rtc.execution_horizon=8 \
  --device=mps \
  --rtc.max_guidance_weight=10.0 \
  --seed=10

# With Pi0.5 policy on CUDA
uv run python examples/rtc/eval_dataset.py \
  --policy.path=lerobot/pi05_libero_finetuned \
  --dataset.repo_id=HuggingFaceVLA/libero \
  --rtc.execution_horizon=8 \
  --device=cuda

# With Pi0 policy
uv run python examples/rtc/eval_dataset.py \
  --policy.path=lerobot/pi0_libero_finetuned \
  --dataset.repo_id=HuggingFaceVLA/libero \
  --rtc.execution_horizon=8 \
  --device=cuda

# With torch.compile for faster inference
uv run python examples/rtc/eval_dataset.py \
  --policy.path=helper2424/smolvla_check_rtc_last3 \
  --dataset.repo_id=helper2424/check_rtc \
  --rtc.execution_horizon=8 \
  --device=cuda \
  --use_torch_compile=true \
  --torch_compile_mode=max-autotune

# Enable CUDA graphs (advanced; may cause tensor aliasing errors)
uv run python examples/rtc/eval_dataset.py \
  --policy.path=helper2424/smolvla_check_rtc_last3 \
  --dataset.repo_id=helper2424/check_rtc \
  --use_torch_compile=true \
  --torch_compile_backend=inductor \
  --torch_compile_mode=max-autotune \
  --torch_compile_disable_cudagraphs=false
```

**Key Parameters:**

- `--policy.path`: Path to pretrained policy
- `--dataset.repo_id`: Dataset to evaluate on
- `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 20)
- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 10.0)
- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
- `--inference_delay`: Inference delay for RTC, in steps (default: 4)
- `--seed`: Random seed for reproducibility (default: 42)
- `--output_dir`: Directory to save visualizations (default: rtc_debug_output)
- `--device`: Device to use (cuda, cpu, mps, auto)
- `--use_torch_compile`: Enable torch.compile() for faster inference

**Output:**

The script generates several visualization files in the output directory (`--output_dir`, default `rtc_debug_output/`):

- `denoising_xt_comparison.png` - Noisy state evolution during denoising
- `denoising_vt_comparison.png` - Velocity predictions during denoising
- `denoising_x1t_comparison.png` - Predicted final states during denoising
- `denoising_correction_comparison.png` - RTC guidance corrections applied
- `final_actions_comparison.png` - Final action predictions (prev_chunk, no_rtc, rtc)

The script also validates RTC behavior and reports:

- ✅ Delay region [0:inference_delay]: RTC = prev_chunk
- ✅ Blend region [inference_delay:execution_horizon]: RTC lies between prev_chunk and no_rtc
- ✅ Post-horizon region [execution_horizon:]: RTC = no_rtc

### 2. `eval_with_real_robot.py`

Real-time evaluation on physical robots or simulation environments.

**Features:**

- Run policy with RTC on real robot or simulation
- Multi-threaded action execution and inference
- Action queue management with proper timing
- Latency tracking and adaptive inference delay
- Support for both robots and gym environments
- Support for torch.compile() optimization

**Usage:**

```bash
# With real robot
uv run python examples/rtc/eval_with_real_robot.py \
  --policy.path=lerobot/smolvla_base \
  --robot.type=so100 \
  --task="pick up the cup" \
  --duration=30.0

# With simulation environment
uv run python examples/rtc/eval_with_real_robot.py \
  --policy.path=lerobot/smolvla_base \
  --env.type=pusht \
  --duration=60.0

# With policy compilation (CUDA only, not MPS)
uv run python examples/rtc/eval_with_real_robot.py \
  --policy.path=lerobot/smolvla_base \
  --robot.type=so100 \
  --use_torch_compile=true \
  --torch_compile_mode=max-autotune
```

**Key Parameters:**

- `--policy.path`: Path to pretrained policy
- `--robot.type` or `--env.type`: Robot or environment to use
- `--task`: Task description (for VLA models)
- `--rtc.execution_horizon`: Number of steps to maintain consistency (default: 10)
- `--rtc.max_guidance_weight`: Maximum guidance weight (default: 1.0)
- `--rtc.prefix_attention_schedule`: Schedule type (ZEROS, ONES, LINEAR, EXP)
- `--duration`: How long to run (seconds, default: 30.0)
- `--fps`: Action execution frequency (Hz, default: 10.0)
- `--action_queue_size_to_get_new_actions`: Queue size threshold to request new actions (default: 30)
- `--device`: Device to use (cuda, cpu, mps, auto)
- `--use_torch_compile`: Enable torch.compile() for faster inference
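
The pattern behind "multi-threaded action execution and inference" and "action queue management" can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the script's implementation. `ActionBuffer`, `run`, and `fake_policy` are hypothetical names, the policy is a stand-in that returns absolute timestep indices, and the RTC guidance itself is left out. The point of the sketch is the hand-off. When a new chunk arrives, the steps that were executed while inference was running are dropped from it, so execution continues without repeating or skipping a timestep.

```python
import threading
import time
from collections import deque


class ActionBuffer:
    """Hypothetical thread-safe action buffer (not LeRobot's ActionQueue API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._actions = deque()
        self._executed = 0  # number of actions handed to the robot so far

    def pop(self):
        with self._lock:
            if not self._actions:
                return None  # starved: no chunk has arrived yet
            self._executed += 1
            return self._actions.popleft()

    def size(self):
        with self._lock:
            return len(self._actions)

    def executed(self):
        with self._lock:
            return self._executed

    def replace(self, new_chunk, executed_at_request):
        """Swap in a new chunk, dropping the steps already executed during inference."""
        with self._lock:
            delay = self._executed - executed_at_request  # measured delay, in steps
            self._actions = deque(new_chunk[delay:])
            return delay


def run(policy, fps, duration_s, refill_threshold):
    """Run inference in a worker thread and execution in this one; return executed actions."""
    buf = ActionBuffer()
    stop = threading.Event()
    performed = []

    def inference_loop():
        while not stop.is_set():
            if buf.size() <= refill_threshold:
                start_step = buf.executed()
                chunk = policy(start_step)  # slow call; execution keeps draining the buffer
                buf.replace(chunk, start_step)
            else:
                time.sleep(0.001)

    worker = threading.Thread(target=inference_loop, daemon=True)
    worker.start()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        action = buf.pop()
        if action is not None:
            performed.append(action)  # stand-in for sending the action to the robot
        time.sleep(1.0 / fps)
    stop.set()
    worker.join()
    return performed


def fake_policy(start_step, chunk_size=20, latency_s=0.03):
    """Stand-in policy: after a fixed latency, returns the absolute timesteps it covers."""
    time.sleep(latency_s)
    return list(range(start_step, start_step + chunk_size))


actions = run(fake_policy, fps=100, duration_s=0.5, refill_threshold=5)
print(actions == list(range(len(actions))))  # True: no timestep is repeated or skipped
```

A real RTC loop would also pass the policy the actions still queued from the previous chunk and the current inference-delay estimate. Guidance can then keep the new chunk consistent with those actions instead of simply replacing them.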

## Understanding RTC Parameters

### `execution_horizon`

Number of timesteps over which the new chunk is kept consistent with the previous one. Higher values give more consistency but can reduce reactivity to new observations.

**Typical values:** 8-12 steps for dataset evaluation, 10 steps for real-time execution

### `max_guidance_weight`

Upper bound on the guidance strength. Higher values enforce stronger consistency but may over-constrain new predictions.

**Typical values:**

- Dataset evaluation: 10.0-100.0 (higher values are acceptable for offline analysis)
- Real-time execution: 1.0-10.0 (more conservative)

### `prefix_attention_schedule`

How to weight consistency across the overlap region. Available schedules: `ZEROS`, `ONES`, `LINEAR`, `EXP`.

**Recommended:** `EXP`
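
To make the schedules concrete, the sketch below computes one weight per step of a chunk. A weight of 1.0 means "match the previous chunk" and 0.0 means "ignore it". It assumes the soft-masking scheme described in the Physical Intelligence RTC paper listed under Related Documentation, which gives full weight in the delay region, decays across the blend region, and drops to zero from `execution_horizon` on. `prefix_weights` is a hypothetical helper, and the exact formulas in `modeling_rtc.py` may differ.

```python
import math


def prefix_weights(inference_delay: int, execution_horizon: int, chunk_size: int, schedule: str) -> list[float]:
    """Hypothetical helper: guidance weight per step (1.0 = follow prev_chunk, 0.0 = ignore it)."""
    start = min(inference_delay, execution_horizon)  # delay region cannot extend past the horizon
    end = execution_horizon
    weights = []
    for i in range(chunk_size):
        if i >= end:
            w = 0.0  # post-horizon region: no guidance toward the previous chunk
        elif schedule == "ONES":
            w = 1.0  # full weight all the way to the horizon
        elif schedule == "ZEROS":
            w = 1.0 if i < start else 0.0  # only the delay region is guided
        elif schedule in ("LINEAR", "EXP"):
            # 1.0 through the delay region, then a linear ramp that reaches 0 at `end`
            w = min(max((start - 1 - i) / (end - start + 1) + 1.0, 0.0), 1.0)
            if schedule == "EXP":
                w = w * math.expm1(w) / (math.e - 1)  # same endpoints, faster decay
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        weights.append(w)
    return weights


print([round(w, 3) for w in prefix_weights(2, 6, 10, "LINEAR")])
# [1.0, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0, 0.0, 0.0, 0.0]
```

`EXP` starts from the same full weight as `LINEAR` but falls off faster through the blend region, so the new prediction takes over sooner.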

### `inference_delay`

Number of leading timesteps of the new chunk that are pinned to the previous chunk (the delay region), because those actions are executed while inference is still running. In real-time execution it is typically calculated dynamically from the measured inference latency. In dataset evaluation it is fixed (`--inference_delay`).

**Typical values:** 3-5 steps for dataset evaluation
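
In real-time execution, the delay is essentially the inference latency expressed in control steps. The minimal sketch below makes two assumptions. First, the delay is the latency multiplied by the action rate (`--fps`) and rounded up. Second, as a design choice of this sketch, it uses the slowest recent inference call so the delay region is not underestimated. `estimate_inference_delay` is a hypothetical helper, not the script's API.

```python
import math
from collections.abc import Sequence


def estimate_inference_delay(recent_latencies_s: Sequence[float], fps: float) -> int:
    """Hypothetical helper: control steps that elapse while one inference call runs."""
    if not recent_latencies_s:
        raise ValueError("need at least one latency measurement")
    worst = max(recent_latencies_s)  # conservative: size the delay for the slowest recent call
    return math.ceil(worst * fps)  # a partially elapsed step still counts as committed


# 0.25 s of inference at 10 Hz spans 2.5 control periods, so 3 steps are already committed.
print(estimate_inference_delay([0.12, 0.25, 0.20], fps=10.0))  # 3
```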

### `action_queue_size_to_get_new_actions` (real-time only)

Queue length at which a new action chunk is requested. Set it higher than `inference_delay + execution_horizon` for smooth operation. Inference consumes `inference_delay` queued actions, and this margin leaves at least `execution_horizon` actions from the previous chunk when the new chunk arrives.

**Typical values:** 20-30 steps
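
The refill rule can be expressed as two small checks. This is a sketch under assumptions. Both function names are hypothetical, and requesting a chunk once the queue has drained to the threshold (`<=`) is a choice made here, not necessarily the script's exact comparison.

```python
def check_queue_threshold(threshold: int, inference_delay: int, execution_horizon: int) -> None:
    """Hypothetical sanity check for action_queue_size_to_get_new_actions."""
    minimum = inference_delay + execution_horizon
    if threshold <= minimum:
        raise ValueError(
            f"action_queue_size_to_get_new_actions={threshold} should be greater than "
            f"inference_delay + execution_horizon = {minimum}"
        )


def needs_new_chunk(queue_size: int, threshold: int) -> bool:
    """Hypothetical trigger: request the next chunk once the queue has drained to the threshold."""
    return queue_size <= threshold


# Real-time defaults from this README: threshold 30, execution_horizon 10; assume a delay of 4.
check_queue_threshold(30, inference_delay=4, execution_horizon=10)  # passes: 30 > 14
print(needs_new_chunk(queue_size=45, threshold=30))  # False
print(needs_new_chunk(queue_size=30, threshold=30))  # True
```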

## Validation Rules (Dataset Evaluation)

The dataset evaluation script validates that RTC behavior matches expectations:

1. **Delay Region [0:inference_delay]**: RTC actions should equal the previous chunk
   - Ensures consistency during the inference delay period

2. **Blend Region [inference_delay:execution_horizon]**: RTC should lie between prev_chunk and no_rtc
   - Smooth transition from the previous plan to the new predictions

3. **Post-Horizon [execution_horizon:]**: RTC should equal no_rtc
   - Full adoption of new predictions after the execution horizon
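
The three rules translate directly into per-step checks. The minimal sketch below covers a single action dimension. It assumes `prev_chunk`, `no_rtc`, and `rtc` are equal-length lists aligned on the same timesteps, and that a small absolute tolerance is acceptable. `validate_rtc` and its `atol` parameter are hypothetical, not the script's actual API.

```python
def validate_rtc(prev_chunk, no_rtc, rtc, inference_delay, execution_horizon, atol=1e-4):
    """Hypothetical check of the delay, blend, and post-horizon rules for one action dimension."""
    # Delay region: RTC must reproduce the previous chunk.
    delay = all(abs(rtc[i] - prev_chunk[i]) <= atol for i in range(inference_delay))
    # Blend region: RTC must lie between the old plan and the unguided prediction.
    blend = all(
        min(prev_chunk[i], no_rtc[i]) - atol <= rtc[i] <= max(prev_chunk[i], no_rtc[i]) + atol
        for i in range(inference_delay, execution_horizon)
    )
    # Post-horizon region: RTC must match the unguided prediction.
    post_horizon = all(abs(rtc[i] - no_rtc[i]) <= atol for i in range(execution_horizon, len(rtc)))
    return {"delay": delay, "blend": blend, "post_horizon": post_horizon}


# Synthetic data: the old plan is 0.0, the fresh prediction is 1.0, and the "RTC" chunk
# ramps from one to the other across the blend region.
prev_chunk = [0.0] * 10
no_rtc = [1.0] * 10
rtc = [0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 1.0, 1.0]
print(validate_rtc(prev_chunk, no_rtc, rtc, inference_delay=2, execution_horizon=6))
# {'delay': True, 'blend': True, 'post_horizon': True}
```

A `False` entry points to the matching Troubleshooting entry for that region.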

## Tips

1. **Start with dataset evaluation** (`eval_dataset.py`) to understand RTC behavior and tune parameters before running on a robot
2. **Use visualizations** to debug unexpected behavior: check the denoising steps and final actions
3. **Tune execution_horizon** based on your inference latency and action frequency
4. **Monitor validation output**: failures indicate potential implementation issues or misconfigured parameters
5. **Compare different schedules**: EXP usually works best, but LINEAR can be more interpretable

## Troubleshooting

### Validation fails in delay region

- Check that `prev_chunk_left_over` is properly passed to the policy
- Verify that RTC guidance is applied during denoising
- Look at the denoising visualizations to see where guidance diverges

### Validation fails in post-horizon region

- Make sure the RTC and no_rtc runs use the same noise; with different noise, their post-horizon actions will not match
- Check that weights are correctly zeroed out after the execution horizon
- Review the prefix_attention_schedule visualization

### Poor performance on real robot

- Increase `action_queue_size_to_get_new_actions` if you see warnings
- Reduce `max_guidance_weight` if the robot is too conservative
- Try different `prefix_attention_schedule` values
- Enable torch.compile() for faster inference (CUDA only)

### Memory issues with large models

- The dataset evaluation script loads policies sequentially to minimize memory usage
- For real-time execution, only one policy is loaded
- Use smaller batch sizes if needed

## Related Documentation

- [RTC Implementation](../../src/lerobot/policies/rtc/modeling_rtc.py)
- [RTC Configuration](../../src/lerobot/policies/rtc/configuration_rtc.py)
- [Action Queue](../../src/lerobot/policies/rtc/action_queue.py)
- [Physical Intelligence Paper](https://www.physicalintelligence.company/download/real_time_chunking.pdf)