# Training-Time RTC

Training-Time RTC teaches the model to handle inference delay during training. It feeds the **ground-truth action prefix** to the model and trains only on the remaining postfix actions. This keeps chunk transitions smooth without doing any inference-time inpainting.

Based on: [Training-Time Action Conditioning for Efficient Real-Time Chunking](https://arxiv.org/abs/2512.05964).

LeRobot supports this for `pi0`, `pi05`, and `smolvla` without changing model parameters.

---
## How It Works

### At Training Time
- Sample a delay `d` per batch element.
- Keep the first `d` action steps as **ground truth** (no noise).
- Add noise only to the postfix actions.
- Set the flow-matching timestep to **1.0** for prefix tokens and normal timesteps for postfix tokens.
- Mask the loss to train only on the postfix.
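
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the conditioning logic, not LeRobot's actual implementation; it follows this doc's convention that timestep **1.0** denotes the clean (unnoised) action.

```python
import numpy as np

def make_rtc_training_batch(actions, min_delay=0, max_delay=6, rng=None):
    """Build one training example with training-time RTC conditioning.

    `actions`: (horizon, dim) ground-truth action chunk.
    Convention (from this doc): timestep 1.0 = clean action, so the
    prefix is fed unnoised with t = 1.0 and only the postfix is noised.
    """
    rng = rng or np.random.default_rng()
    horizon, dim = actions.shape
    d = int(rng.integers(min_delay, max_delay + 1))  # sampled delay (UNIFORM)

    t = rng.uniform(0.0, 1.0)                 # flow-matching timestep for postfix
    noise = rng.standard_normal((horizon, dim))
    x_t = t * actions + (1.0 - t) * noise     # interpolant: t = 1 -> clean action

    timesteps = np.full(horizon, t)
    timesteps[:d] = 1.0                       # prefix marked as "clean"
    x_t[:d] = actions[:d]                     # ground-truth prefix, no noise

    loss_mask = np.zeros(horizon, dtype=bool)
    loss_mask[d:] = True                      # train only on the postfix
    return x_t, timesteps, loss_mask, d
```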

### At Inference Time

When `rtc_training_config.enabled=true`, the model uses training-time RTC inference:
- Replace prefix positions in `x_t` with the previous chunk's leftover actions.
- Set the timestep to **1.0** for prefix positions.
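
As a sketch, this conditioning reduces to overwriting the first `d` positions of the denoising state. The helper below is hypothetical (not a LeRobot API), assuming NumPy arrays for the state and leftover actions:

```python
import numpy as np

def apply_rtc_prefix(x_t, timesteps, prev_chunk_left_over, inference_delay):
    """Overwrite the first `inference_delay` positions of the denoising
    state with actions already committed from the previous chunk, and
    mark those positions as clean (timestep 1.0)."""
    d = min(inference_delay, len(prev_chunk_left_over), len(x_t))
    x_t = x_t.copy()
    timesteps = timesteps.copy()
    x_t[:d] = prev_chunk_left_over[:d]  # previous chunk's leftover actions
    timesteps[:d] = 1.0                 # prefix treated as fully denoised
    return x_t, timesteps
```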

---

## Quick Start (CLI)

```bash
lerobot-train \
  --policy.type=pi0 \
  --dataset.repo_id=your/dataset \
  --policy.rtc_training_config.enabled=true \
  --policy.rtc_training_config.min_delay=0 \
  --policy.rtc_training_config.max_delay=6 \
  --policy.rtc_training_config.delay_distribution=UNIFORM
```

---

## Inference with Training-Time RTC

After training with `rtc_training_config`, use the same config at inference. The model will automatically use training-time RTC inference:

```python
policy = PI0Policy.from_pretrained("path/to/trained/model")
# rtc_training_config is loaded from the saved config

actions = policy.predict_action_chunk(
    batch,
    inference_delay=5,  # estimated delay in timesteps
    prev_chunk_left_over=previous_actions,  # from previous chunk
)
```

---

## Key Parameters

`RTCTrainingConfig` is available on the policy config (`pi0`, `pi05`, `smolvla`, `xvla`):

- **`enabled`**: Toggle training-time RTC (both training and inference).
- **`min_delay` / `max_delay`**: Delay range (inclusive).
- **`delay_distribution`**:
  - `UNIFORM`: uniform in `[min_delay, max_delay]`
  - `EXP`: exponentially decayed distribution over delays
- **`exp_decay`**: Exponential decay factor for `EXP` sampling.
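
To make the two distributions concrete, here is a hypothetical sampling sketch. The exact form of the `EXP` weighting in LeRobot is an assumption here (weight proportional to `exp_decay ** delay`, so smaller delays get more probability mass):

```python
import numpy as np

def sample_delay(min_delay, max_delay, distribution="UNIFORM",
                 exp_decay=0.5, rng=None):
    """Sample an inference delay from [min_delay, max_delay] (inclusive)."""
    rng = rng or np.random.default_rng()
    delays = np.arange(min_delay, max_delay + 1)
    if distribution == "UNIFORM":
        probs = np.full(len(delays), 1.0 / len(delays))
    elif distribution == "EXP":
        # Assumed weighting: exponentially more mass on smaller delays.
        weights = exp_decay ** (delays - min_delay)
        probs = weights / weights.sum()
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return int(rng.choice(delays, p=probs))
```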

---

## Notes and Recommendations

- Start with `min_delay=0` and `max_delay` around your expected worst-case inference delay.
- Use `EXP` if you want more supervision on smaller delays.

---

## Related Docs

- [Real-Time Chunking (Inference-Time RTC)](./rtc)
- [Pi0](./pi0), [Pi0.5](./pi05), [SmolVLA](./smolvla)