Merge remote-tracking branch 'origin/main' into feat/language-columns

2026-05-17 17:50:09 +00:00 · 2026-05-06 12:09:13 +02:00
parent e3e9374e2c ce24063efd
commit 5c30b14929
146 changed files with 9361 additions and 4180 deletions
@@ -63,6 +63,8 @@
    title: SARM
  title: "Reward Models"
 - sections:
+  - local: inference
+    title: Policy Deployment (lerobot-rollout)
  - local: async
    title: Use Async Inference
  - local: rtc
@@ -50,30 +50,30 @@ This process can be repeated iteratively: deploy, collect, fine-tune, repeat. Ea

 ### Teleoperator Requirements

-The `examples/hil` HIL scripts require **teleoperators with active motors** that can:
+The `lerobot-rollout --strategy.type=dagger` mode requires **teleoperators with active motors** that can:

 - Enable/disable torque programmatically
 - Move to target positions (to mirror the robot state when pausing)

-**Compatible teleoperators in the current `examples/hil` scripts:**
+**Compatible teleoperators:**

 - `openarm_mini` - OpenArm Mini
 - `so_leader` - SO100 / SO101 leader arm

 > [!IMPORTANT]
-> The provided `examples/hil` commands default to `bi_openarm_follower` + `openarm_mini`.
+> The provided commands default to `bi_openarm_follower` + `openarm_mini`.
 > `so_follower` + `so_leader` configs are also registered and can be used via CLI flags.

 ---

 ## Script

-A single script handles both synchronous and RTC-based inference. Toggle RTC with `--rtc.enabled=true`:
+Use `lerobot-rollout` with `--strategy.type=dagger` for HIL data collection. Select the inference backend with `--inference.type=sync|rtc`:

-| Mode                     | Flag                 | Models                |
-| ------------------------ | -------------------- | --------------------- |
-| Standard (default)       | _(no flag needed)_   | ACT, Diffusion Policy |
-| Real-Time Chunking (RTC) | `--rtc.enabled=true` | Pi0, Pi0.5, SmolVLA   |
+| Mode                     | Flag                   | Models                |
+| ------------------------ | ---------------------- | --------------------- |
+| Standard (default)       | _(no flag needed)_     | ACT, Diffusion Policy |
+| Real-Time Chunking (RTC) | `--inference.type=rtc` | Pi0, Pi0.5, SmolVLA   |

 ---

@@ -97,7 +97,7 @@ python src/lerobot/scripts/lerobot_train.py \
 **Standard inference (ACT, Diffusion Policy):**

 ```bash
-python examples/hil/hil_data_collection.py \
+lerobot-rollout --strategy.type=dagger \
    --robot.type=bi_openarm_follower \
    --robot.left_arm_config.port=can1 \
    --robot.left_arm_config.side=left \
@@ -108,11 +108,10 @@ python examples/hil/hil_data_collection.py \
    --teleop.port_left=/dev/ttyACM0 \
    --teleop.port_right=/dev/ttyACM1 \
    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
-    --dataset.repo_id=your-username/hil-dataset \
+    --dataset.repo_id=your-username/rollout_hil_dataset \
    --dataset.single_task="Fold the T-shirt properly" \
    --dataset.fps=30 \
-    --dataset.episode_time_s=1000 \
-    --dataset.num_episodes=50 \
+    --strategy.num_episodes=50 \
    --interpolation_multiplier=2
 ```

@@ -121,11 +120,11 @@ python examples/hil/hil_data_collection.py \
 For models with high inference latency, enable RTC for smooth execution:

 ```bash
-python examples/hil/hil_data_collection.py \
-    --rtc.enabled=true \
-    --rtc.execution_horizon=20 \
-    --rtc.max_guidance_weight=5.0 \
-    --rtc.prefix_attention_schedule=LINEAR \
+lerobot-rollout --strategy.type=dagger \
+    --inference.type=rtc \
+    --inference.rtc.execution_horizon=20 \
+    --inference.rtc.max_guidance_weight=5.0 \
+    --inference.rtc.prefix_attention_schedule=LINEAR \
    --robot.type=bi_openarm_follower \
    --robot.left_arm_config.port=can1 \
    --robot.left_arm_config.side=left \
@@ -136,11 +135,10 @@ python examples/hil/hil_data_collection.py \
    --teleop.port_left=/dev/ttyACM0 \
    --teleop.port_right=/dev/ttyACM1 \
    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
-    --dataset.repo_id=your-username/hil-rtc-dataset \
+    --dataset.repo_id=your-username/rollout_hil_rtc_dataset \
    --dataset.single_task="Fold the T-shirt properly" \
    --dataset.fps=30 \
-    --dataset.episode_time_s=1000 \
-    --dataset.num_episodes=50 \
+    --strategy.num_episodes=50 \
    --interpolation_multiplier=3
 ```

@@ -235,7 +233,7 @@ This HIL data collection approach builds on ideas from interactive imitation lea

 - **HG-DAgger** (Kelly et al., 2019) made this practical for robotics: a human expert monitors the robot and only intervenes when needed, rather than labeling every state. The gating between autonomous and human control is exactly the pause → takeover → return-to-policy loop used in the scripts here.

- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the HIL scripts in `examples/hil`.
+- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the DAgger strategy in `lerobot-rollout`.

 - **π0.6/RECAP** (Physical Intelligence, 2025) applies the same iterative collect-and-finetune loop at scale with VLA models, showing that even large pretrained policies benefit substantially from targeted human corrections on their own failure modes. π0.6 is trained using RECAP.

@@ -509,121 +509,42 @@ hf upload ${HF_USER}/act_so101_test${CKPT} \

 ## Run inference and evaluate your policy

-You can use the `record` script from [`lerobot-record`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input, to run inference and evaluate your policy. For instance, run this command or API example to run inference and record 10 evaluation episodes:
+Use `lerobot-rollout` to deploy a trained policy on your robot. You can choose different strategies depending on your needs:

 <hfoptions id="eval">
-<hfoption id="Command">
+<hfoption id="Base mode (no recording)">
 ```bash
-lerobot-record  \
+lerobot-rollout \
+  --strategy.type=base \
+  --policy.path=${HF_USER}/my_policy \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM1 \
  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
-  --robot.id=my_awesome_follower_arm \
-  --display_data=false \
-  --dataset.repo_id=${HF_USER}/eval_so100 \
-  --dataset.single_task="Put lego brick into the transparent box" \
-  --dataset.streaming_encoding=true \
-  --dataset.encoder_threads=2 \
-  # --dataset.vcodec=auto \
-  # <- Teleop optional if you want to teleoperate in between episodes \
-  # --teleop.type=so100_leader \
-  # --teleop.port=/dev/ttyACM0 \
-  # --teleop.id=my_awesome_leader_arm \
-  --policy.path=${HF_USER}/my_policy
+  --task="Put lego brick into the transparent box" \
+  --duration=60
 ```
 </hfoption>
-<hfoption id="API example">
-
-<!-- prettier-ignore-start -->
-```python
-from lerobot.cameras.opencv import OpenCVCameraConfig
-from lerobot.datasets import LeRobotDataset
-from lerobot.utils.feature_utils import hw_to_dataset_features
-from lerobot.policies.act import ACTPolicy
-from lerobot.policies import make_pre_post_processors
-from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
-from lerobot.scripts.lerobot_record import record_loop
-from lerobot.common.control_utils import init_keyboard_listener
-from lerobot.utils.utils import log_say
-from lerobot.utils.visualization_utils import init_rerun
-
-
-NUM_EPISODES = 5
-FPS = 30
-EPISODE_TIME_SEC = 60
-TASK_DESCRIPTION = "My task description"
-HF_MODEL_ID = "<hf_username>/<model_repo_id>"
-HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
-
-# Create the robot configuration
-camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
-robot_config = SO100FollowerConfig(
-    port="/dev/tty.usbmodem58760434471", id="my_awesome_follower_arm", cameras=camera_config
-)
-
-# Initialize the robot
-robot = SO100Follower(robot_config)
-
-# Initialize the policy
-policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
-
-# Configure the dataset features
-action_features = hw_to_dataset_features(robot.action_features, "action")
-obs_features = hw_to_dataset_features(robot.observation_features, "observation")
-dataset_features = {**action_features, **obs_features}
-
-# Create the dataset
-dataset = LeRobotDataset.create(
-    repo_id=HF_DATASET_ID,
-    fps=FPS,
-    features=dataset_features,
-    robot_type=robot.name,
-    use_videos=True,
-    image_writer_threads=4,
-)
-
-# Initialize the keyboard listener and rerun visualization
-_, events = init_keyboard_listener()
-init_rerun(session_name="recording")
-
-# Connect the robot
-robot.connect()
-
-preprocessor, postprocessor = make_pre_post_processors(
-    policy_cfg=policy,
-    pretrained_path=HF_MODEL_ID,
-    dataset_stats=dataset.meta.stats,
-)
-
-for episode_idx in range(NUM_EPISODES):
-    log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
-
-    # Run the policy inference loop
-    record_loop(
-        robot=robot,
-        events=events,
-        fps=FPS,
-        policy=policy,
-        preprocessor=preprocessor,
-        postprocessor=postprocessor,
-        dataset=dataset,
-        control_time_s=EPISODE_TIME_SEC,
-        single_task=TASK_DESCRIPTION,
-        display_data=True,
-    )
-
-    dataset.save_episode()
-
-# Clean up
-robot.disconnect()
-dataset.push_to_hub()
+<hfoption id="Sentry mode (with recording)">
+```bash
+lerobot-rollout \
+  --strategy.type=sentry \
+  --strategy.upload_every_n_episodes=5 \
+  --policy.path=${HF_USER}/my_policy \
+  --robot.type=so100_follower \
+  --robot.port=/dev/ttyACM1 \
+  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
+  --dataset.repo_id=${HF_USER}/eval_so100 \
+  --dataset.single_task="Put lego brick into the transparent box" \
+  --duration=600
 ```
-<!-- prettier-ignore-end -->
-
 </hfoption>
 </hfoptions>

-As you can see, it's almost the same command as previously used to record your training dataset. Two things changed:
+The `--strategy.type` flag selects the execution mode:

-1. There is an additional `--control.policy.path` argument which indicates the path to your policy checkpoint with (e.g. `outputs/train/eval_act_so101_test/checkpoints/last/pretrained_model`). You can also use the model repository if you uploaded a model checkpoint to the hub (e.g. `${HF_USER}/act_so101_test`).
-2. The name of dataset begins by `eval` to reflect that you are running inference (e.g. `${HF_USER}/eval_act_so101_test`).
+- `base`: Autonomous rollout with no data recording (useful for quick evaluation)
+- `sentry`: Continuous recording with auto-upload (useful for large-scale evaluation)
+- `highlight`: Ring buffer recording with keystroke save (useful for capturing interesting events)
+- `dagger`: Human-in-the-loop data collection (see [HIL Data Collection](./hil_data_collection))
+
+All strategies support `--inference.type=rtc` for smooth execution with slow VLA models (Pi0, Pi0.5, SmolVLA).
@@ -0,0 +1,261 @@
+# Policy Deployment (lerobot-rollout)
+
+`lerobot-rollout` is the single CLI for deploying trained policies on real robots. It supports multiple execution strategies and inference backends, from quick evaluation to continuous recording and human-in-the-loop data collection.
+
+## Quick Start
+
+No extra dependencies are needed beyond your robot and policy extras.
+
+```bash
+lerobot-rollout \
+    --strategy.type=base \
+    --policy.path=lerobot/act_koch_real \
+    --robot.type=koch_follower \
+    --robot.port=/dev/ttyACM0 \
+    --task="pick up cube" \
+    --duration=30
+```
+
+This runs the policy for 30 seconds with no recording.
+
+---
+
+## Strategies
+
+Select a strategy with `--strategy.type=<name>`. Each strategy defines a different control loop with its own recording and interaction semantics.
+
+### Base (`--strategy.type=base`)
+
+Autonomous policy execution with no data recording. Use this for quick evaluation, demos, or when you only need to observe the robot.
+
+```bash
+lerobot-rollout \
+    --strategy.type=base \
+    --policy.path=${HF_USER}/my_policy \
+    --robot.type=so100_follower \
+    --robot.port=/dev/ttyACM0 \
+    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+    --task="Put lego brick into the box" \
+    --duration=60
+```
+
+| Flag             | Description                                            |
+| ---------------- | ------------------------------------------------------ |
+| `--duration`     | Run time in seconds (0 = infinite)                     |
+| `--task`         | Task description passed to the policy                  |
+| `--display_data` | Stream observations/actions to Rerun for visualization |
+
+### Sentry (`--strategy.type=sentry`)
+
+Continuous autonomous recording with periodic upload to the Hugging Face Hub. Episode boundaries are auto-computed from camera resolution and FPS so each saved episode produces a complete video file, keeping uploads efficient.
+
+Policy state (hidden state, RTC queue) persists across episode boundaries: the robot does not reset between episodes.
+
+```bash
+lerobot-rollout \
+    --strategy.type=sentry \
+    --strategy.upload_every_n_episodes=5 \
+    --policy.path=${HF_USER}/my_policy \
+    --robot.type=so100_follower \
+    --robot.port=/dev/ttyACM0 \
+    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+    --dataset.repo_id=${HF_USER}/rollout_eval_data \
+    --dataset.single_task="Put lego brick into the box" \
+    --duration=3600
+```
+
+| Flag                                   | Description                                                 |
+| -------------------------------------- | ----------------------------------------------------------- |
+| `--strategy.upload_every_n_episodes`   | Push to Hub every N episodes (default: 5)                   |
+| `--strategy.target_video_file_size_mb` | Target video file size for episode rotation (default: auto) |
+| `--dataset.repo_id`                    | **Required.** Hub repository for the recorded dataset       |
+| `--dataset.push_to_hub`                | Whether to push to Hub on teardown (default: true)          |
+
+### Highlight (`--strategy.type=highlight`)
+
+Autonomous rollout with on-demand recording via a memory-bounded ring buffer. The robot runs continuously while the buffer captures the last N seconds of telemetry. Press the save key to flush the buffer and start live recording; press it again to save the episode.
+
+```bash
+lerobot-rollout \
+    --strategy.type=highlight \
+    --strategy.ring_buffer_seconds=30 \
+    --strategy.save_key=s \
+    --strategy.push_key=h \
+    --policy.path=${HF_USER}/my_policy \
+    --robot.type=koch_follower \
+    --robot.port=/dev/ttyACM0 \
+    --dataset.repo_id=${HF_USER}/rollout_highlight_data \
+    --dataset.single_task="Pick up the red cube"
+```
+
+**Keyboard controls:**
+
+| Key                | Action                                                   |
+| ------------------ | -------------------------------------------------------- |
+| `s` (configurable) | Start recording (flushes buffer) / stop and save episode |
+| `h` (configurable) | Push dataset to Hub                                      |
+| `ESC`              | Stop the session                                         |
+
+| Flag                                   | Description                                    |
+| -------------------------------------- | ---------------------------------------------- |
+| `--strategy.ring_buffer_seconds`       | Duration of buffered telemetry (default: 30)   |
+| `--strategy.ring_buffer_max_memory_mb` | Memory cap for the ring buffer (default: 2048) |
+| `--strategy.save_key`                  | Key to toggle recording (default: `s`)         |
+| `--strategy.push_key`                  | Key to push to Hub (default: `h`)              |
+
+### DAgger (`--strategy.type=dagger`)
+
+Human-in-the-loop data collection. Alternates between autonomous policy execution and human intervention via a teleoperator. Intervention frames are tagged with `intervention=True`. Requires a teleoperator (`--teleop.type`).
+
+See the [Human-In-the-Loop Data Collection](./hil_data_collection) guide for a detailed walkthrough.
+
+**Corrections-only mode** (default): Only human correction windows are recorded. Each correction becomes one episode.
+
+```bash
+lerobot-rollout \
+    --strategy.type=dagger \
+    --strategy.num_episodes=20 \
+    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
+    --robot.type=bi_openarm_follower \
+    --teleop.type=openarm_mini \
+    --dataset.repo_id=${HF_USER}/rollout_hil_data \
+    --dataset.single_task="Fold the T-shirt"
+```
+
+**Continuous recording mode** (`--strategy.record_autonomous=true`): Both autonomous and correction frames are recorded with time-based episode rotation (same as Sentry).
+
+```bash
+lerobot-rollout \
+    --strategy.type=dagger \
+    --strategy.record_autonomous=true \
+    --strategy.num_episodes=50 \
+    --policy.path=${HF_USER}/my_policy \
+    --robot.type=so100_follower \
+    --robot.port=/dev/ttyACM0 \
+    --teleop.type=so101_leader \
+    --teleop.port=/dev/ttyACM1 \
+    --dataset.repo_id=${HF_USER}/rollout_dagger_data \
+    --dataset.single_task="Grasp the block"
+```
+
+**Keyboard controls** (default input device):
+
+| Key     | Action                                      |
+| ------- | ------------------------------------------- |
+| `Space` | Pause / resume policy execution             |
+| `Tab`   | Start / stop human correction               |
+| `Enter` | Push dataset to Hub (corrections-only mode) |
+| `ESC`   | Stop the session                            |
+
+Foot pedal input is also supported via `--strategy.input_device=pedal`. Configure pedal codes with `--strategy.pedal.*` flags.
+
+| Flag                                 | Description                                             |
+| ------------------------------------ | ------------------------------------------------------- |
+| `--strategy.num_episodes`            | Number of correction episodes to record (default: 10)   |
+| `--strategy.record_autonomous`       | Record autonomous frames too (default: false)           |
+| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5)               |
+| `--strategy.input_device`            | Input device: `keyboard` or `pedal` (default: keyboard) |
+| `--teleop.type`                      | **Required.** Teleoperator type                         |
+
+---
+
+## Inference Backends
+
+Select a backend with `--inference.type=<name>`. All strategies work with both backends.
+
+### Sync (default)
+
+One policy call per control tick. The main loop blocks until the action is computed.
+
+Works with all policies. No extra flags needed.
+
+### Real-Time Chunking (`--inference.type=rtc`)
+
+A background thread produces action chunks asynchronously. The main control loop polls for the next ready action while the policy computes the next chunk in parallel.
+
+Use RTC with large, slow VLA models (Pi0, Pi0.5, SmolVLA) for smooth, continuous motion despite high inference latency.
+
+```bash
+lerobot-rollout \
+    --strategy.type=base \
+    --inference.type=rtc \
+    --inference.rtc.execution_horizon=10 \
+    --inference.rtc.max_guidance_weight=10.0 \
+    --policy.path=${HF_USER}/pi0_policy \
+    --robot.type=so100_follower \
+    --robot.port=/dev/ttyACM0 \
+    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+    --task="Pick up the cube" \
+    --duration=60 \
+    --device=cuda
+```
+
+| Flag                                        | Description                                                    |
+| ------------------------------------------- | -------------------------------------------------------------- |
+| `--inference.rtc.execution_horizon`         | Steps to blend with previous chunk (default: varies by policy) |
+| `--inference.rtc.max_guidance_weight`       | Consistency enforcement strength (default: varies by policy)   |
+| `--inference.rtc.prefix_attention_schedule` | Blend schedule: `LINEAR`, `EXP`, `ONES`, `ZEROS`               |
+| `--inference.queue_threshold`               | Max queue size before backpressure (default: 30)               |
+
+See the [Real-Time Chunking](./rtc) guide for details on tuning RTC parameters.
+
+---
+
+## Common Flags
+
+| Flag                              | Description                                                       | Default |
+| --------------------------------- | ----------------------------------------------------------------- | ------- |
+| `--policy.path`                   | **Required.** HF Hub model ID or local checkpoint path            | --      |
+| `--robot.type`                    | **Required.** Robot type (e.g. `so100_follower`, `koch_follower`) | --      |
+| `--robot.port`                    | Serial port for the robot                                         | --      |
+| `--robot.cameras`                 | Camera configuration (JSON dict)                                  | --      |
+| `--fps`                           | Control loop frequency                                            | 30      |
+| `--duration`                      | Run time in seconds (0 = infinite)                                | 0       |
+| `--device`                        | Torch device (`cpu`, `cuda`, `mps`)                               | auto    |
+| `--task`                          | Task description (used when no dataset is provided)               | --      |
+| `--display_data`                  | Stream telemetry to Rerun visualization                           | false   |
+| `--display_ip` / `--display_port` | Remote Rerun server address                                       | --      |
+| `--interpolation_multiplier`      | Action interpolation factor                                       | 1       |
+| `--use_torch_compile`             | Enable `torch.compile` for inference                              | false   |
+| `--resume`                        | Resume a previous recording session                               | false   |
+| `--play_sounds`                   | Vocal synthesis for events                                        | true    |
+
+---
+
+## Programmatic Usage
+
+For custom deployments (e.g. with kinematics processors), use the rollout module API directly:
+
+```python
+from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
+from lerobot.rollout.inference import SyncInferenceConfig
+from lerobot.rollout.strategies import BaseStrategy
+from lerobot.utils.process import ProcessSignalHandler
+
+cfg = RolloutConfig(
+    robot=my_robot_config,
+    policy=my_policy_config,
+    strategy=BaseStrategyConfig(),
+    inference=SyncInferenceConfig(),
+    fps=30,
+    duration=60,
+    task="my task",
+)
+
+signal_handler = ProcessSignalHandler(use_threads=True)
+ctx = build_rollout_context(
+    cfg,
+    signal_handler.shutdown_event,
+    robot_action_processor=my_custom_action_processor,       # optional
+    robot_observation_processor=my_custom_obs_processor,     # optional
+)
+
+strategy = BaseStrategy(cfg.strategy)
+try:
+    strategy.setup(ctx)
+    strategy.run(ctx)
+finally:
+    strategy.teardown(ctx)
+```
+
+See `examples/so100_to_so100_EE/rollout.py` and `examples/phone_to_so100/rollout.py` for full examples with kinematics processors.
@@ -1,7 +1,34 @@
 # Language columns and recipes

-LeRobot stores reusable language annotations directly next to frame data in `data/chunk-*/file-*.parquet`.
-The two optional columns are:
+Most LeRobot datasets ship with a single `task` string per episode — fine for
+short, single-instruction skills, but not enough for the longer-horizon,
+multi-modal robot policies the field is moving toward (high-level planning,
+memory, interjections, VQA, tool use). To support those policies without
+forking the dataset format, LeRobot extends `LeRobotDataset` with two optional
+language columns and a small recipe layer that turns those rows into
+chat-style training samples on the fly.
+
+The design splits cleanly into three layers:
+
+1. **Data in the dataset** — language annotations stored next to frames in
+   `data/chunk-*/file-*.parquet` as two optional columns (`language_persistent`
+   and `language_events`). Datasets without these columns keep their existing
+   behavior.
+2. **Recipe** — a YAML file that declares which annotation rows to bind and
+   how to lay them out as chat turns (`role`, `content`, optional images,
+   optional tool calls). Recipes are pure config; no Python required to add a
+   new one.
+3. **Training format** — at sample time, `RenderMessagesStep` resolves the
+   recipe against the per-frame annotations and emits HF-style `messages` plus
+   LeRobot-specific sidecars (`message_streams`, `target_message_indices`)
+   that policy processors consume.
+
+This page describes each layer in turn.
+
+## Layer 1 — language columns in the dataset
+
+The two optional columns live next to frame data in
+`data/chunk-*/file-*.parquet`:

 - `language_persistent`: a list of rows broadcast across every frame in an episode for state that remains active, such as `subtask`, `plan`, and `memory`.
 - `language_events`: a list of rows only on the exact frame where an event was emitted, such as `interjection`, `vqa`, and speech tool calls.
@@ -26,9 +53,9 @@ the validator enforce this via `validate_camera_field(style, camera)`.

 `meta/tasks.parquet` remains the canonical source for the task. The special `${task}` recipe binding always reads that task string and does not depend on language annotations.

-## Architecture
+### Architecture

-The language stack has three layers:
+The language stack itself has three internal modules backing layer 1:

 1. `lerobot.datasets.language` defines the schema, style registry, and `column_for_style`.
 2. `lerobot.datasets.language_render` resolves rows and renders messages.
@@ -36,7 +63,7 @@ The language stack has three layers:

 `LeRobotDataset` stays recipe-agnostic. It passes `language_persistent` and `language_events` through when present, and unannotated datasets keep their existing behavior.

-## Temporal semantics
+### Temporal semantics

 Persistent styles are active after emission until replaced:

@@ -52,7 +79,7 @@ Event styles only exist on their exact timestamp:

 Exact event matching has no tolerance window, so writers must stamp event rows with frame timestamps from the parquet data.

-## View-dependent resolution
+### View-dependent resolution

 For view-dependent styles (`vqa`, `motion`, `trace`), the resolver gains a
 `camera=` filter parallel to `role=` and `tool_name=`. Datasets with multiple
@@ -78,9 +105,11 @@ ask_vqa_top:

 Add one such sub-recipe per camera the dataset records.

-## Recipe anatomy
+## Layer 2 — recipe anatomy

-Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`.
+Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`. They
+declare which annotation rows to pull (via `bindings`) and how to compose them
+into chat turns (`messages`).

 ```yaml
 messages:
@@ -88,6 +117,13 @@ messages:
  - { role: assistant, content: "${subtask}", stream: low_level, target: true }
 ```

+A recipe can also branch into a weighted **blend** of sub-recipes. At sample
+time, exactly one branch is selected deterministically from the sample index,
+so different frames train different objectives (e.g. memory updates vs.
+low-level execution vs. VQA) without any Python wiring.
+
+## Layer 3 — training format
+
 Rendered samples use HF-style chat messages plus LeRobot sidecars:

 ```python
@@ -96,12 +132,7 @@ sample["message_streams"]
 sample["target_message_indices"]
 ```

-The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone.
-
-## Blends
-
-Blend recipes select one weighted sub-recipe deterministically from the sample index.
-The canonical `recipes/pi05_hirobot.yaml` combines memory updates, interjection responses, high-level subtask prediction, low-level execution, and VQA.
+The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone, which keeps the same dataset usable across SmolVLA, Pi0.5, and any future VLM that expects OpenAI-style chat messages.

 ## Graceful absence

@@ -61,17 +61,6 @@ lerobot-eval \
  --rename_map='{"observation.images.image": "observation.images.base_0_rgb", "observation.images.image2": "observation.images.left_wrist_0_rgb"}'
 ```

-### Recording
-
-`lerobot-record` also supports rename maps, nested under the dataset config:
-
-```bash
-lerobot-record \ # When running inference
-  --policy.path="<user>/smolVLA_finetuned" \
-  ... \
-  --dataset.rename_map='{"observation.images.glove2": "observation.images.image"}'
-```
-
 ## Alternative: edit the policy config directly

 If you always use the same dataset or environment, you can **edit the policy's `config.json`** so its observation keys match your data source. Then no rename map is needed.
@@ -105,10 +94,10 @@ XVLA-base has three visual inputs and `empty_cameras=0` by default. Your dataset

 ## Quick reference

-| Goal                                      | What to do                                                                  |
-| ----------------------------------------- | --------------------------------------------------------------------------- |
-| Dataset keys ≠ policy keys                | `--rename_map='{"dataset_key": "policy_key", ...}'`                         |
-| Env keys ≠ policy keys (eval)             | `--rename_map='{"env_key": "policy_key", ...}'`                             |
-| Recording with different keys (inference) | `--dataset.rename_map='{"source_key": "policy_key", ...}'`.                 |
-| Fewer cameras than policy expects         | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
-| Avoid passing a rename map                | Edit the policy's `config.json` so its keys match your data source          |
+| Goal                                    | What to do                                                                  |
+| --------------------------------------- | --------------------------------------------------------------------------- |
+| Dataset keys ≠ policy keys              | `--rename_map='{"dataset_key": "policy_key", ...}'`                         |
+| Env keys ≠ policy keys (eval)           | `--rename_map='{"env_key": "policy_key", ...}'`                             |
+| Rollout with different keys (inference) | `--rename_map='{"source_key": "policy_key", ...}'`.                         |
+| Fewer cameras than policy expects       | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
+| Avoid passing a rename map              | Edit the policy's `config.json` so its keys match your data source          |
@@ -34,7 +34,7 @@ pip install -e ".[smolvla]"

 ### Using RTC with Pi0

-You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py).
+You can use `lerobot-rollout --strategy.type=base --inference.type=rtc` for RTC deployment on real robots.
 The snippet below provides a simplified pseudo-example of how RTC operates with Pi0 in your pipeline:

 ```python
@@ -137,8 +137,12 @@ The script generates a visualization of the denoising process, comparing standar
 ## Testing RTC with a Real Robot

 ```bash
-python examples/rtc/eval_with_real_robot.py \
+lerobot-rollout \
+    --strategy.type=base \
    --policy.path=${HF_USERNAME}/policy_repo_id \
+    --inference.type=rtc \
+    --inference.rtc.execution_horizon=10 \
+    --inference.rtc.max_guidance_weight=10.0 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
@@ -178,7 +182,7 @@ visualizer = RTCDebugVisualizer()
 # ... create plots
 ```

-See `examples/rtc/eval_dataset.py` for a complete example of visualization.
+See `examples/rtc/eval_dataset.py` for a complete example of offline RTC visualization.

 ## References

@@ -46,7 +46,7 @@ This ensures identical task states map to consistent progress values, even acros

 ## Inputs and Targets (What the new code expects)

-SARM is trained through its processor (`src/lerobot/policies/sarm/processor_sarm.py`), which:
+SARM is trained through its processor (`src/lerobot/rewards/sarm/processor_sarm.py`), which:

 - **Encodes** images and task text with CLIP (ViT-B/32) into `video_features` and `text_features`
 - **Pads/truncates** robot state into `state_features` (up to `max_state_dim`)
@@ -347,7 +347,7 @@ Use `compute_rabc_weights.py` with `--visualize-only` to visualize model predict
 <hfoption id="single_stage">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -360,7 +360,7 @@ python src/lerobot/policies/sarm/compute_rabc_weights.py \
 <hfoption id="dense_only">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -373,7 +373,7 @@ python src/lerobot/policies/sarm/compute_rabc_weights.py \
 <hfoption id="dual">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -429,7 +429,7 @@ The weighting follows **Equations 8-9** from the paper:
 First, run the SARM model on all frames in your dataset to compute progress values:

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --head-mode sparse \
@@ -465,15 +465,15 @@ This script:

 ### Step 5b: Train Policy with RA-BC

-Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`). Currently PI0, PI0.5 and SmolVLA are supported with RA-BC:
+Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`) if not explicitly provided. Currently PI0, PI0.5 and SmolVLA are supported with RA-BC:

 ```bash
 lerobot-train \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --use_rabc=true \
-  --rabc_head_mode=sparse \
-  --rabc_kappa=0.01 \
+  --sample_weighting.type=rabc \
+  --sample_weighting.head_mode=sparse \
+  --sample_weighting.kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -488,12 +488,13 @@ The training script automatically:

 **RA-BC Arguments:**

-| Argument               | Description                                                | Default                            |
-| ---------------------- | ---------------------------------------------------------- | ---------------------------------- |
-| `--use_rabc`           | Enable RA-BC sample weighting                              | `false`                            |
-| `--rabc_progress_path` | Path to progress parquet file (auto-detected from dataset) | `sarm_progress.parquet` in dataset |
-| `--rabc_head_mode`     | Which SARM head's progress to use: `sparse` or `dense`     | `sparse`                           |
-| `--rabc_kappa`         | Threshold κ for high-quality samples                       | `0.01`                             |
+| Argument                           | Description                                            | Default                 |
+| ---------------------------------- | ------------------------------------------------------ | ----------------------- |
+| `--sample_weighting.type`          | Weighting strategy type (`rabc` or `uniform`)          | `rabc`                  |
+| `--sample_weighting.progress_path` | Path to progress parquet file                          | `sarm_progress.parquet` |
+| `--sample_weighting.head_mode`     | Which SARM head's progress to use: `sparse` or `dense` | `sparse`                |
+| `--sample_weighting.kappa`         | Threshold κ for high-quality samples                   | `0.01`                  |
+| `--sample_weighting.epsilon`       | Small constant for numerical stability                 | `1e-6`                  |

 ### Tuning RA-BC Kappa

@@ -511,30 +512,30 @@ The `kappa` parameter is the threshold that determines which samples get full we

 Monitor these WandB metrics during training:

-| Metric             | Healthy Range | Problem Indicator         |
-| ------------------ | ------------- | ------------------------- |
-| `rabc_mean_weight` | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
-| `rabc_delta_mean`  | > 0           | Should be positive        |
-| `rabc_delta_std`   | > 0           | Variance in data quality  |
+| Metric                        | Healthy Range | Problem Indicator         |
+| ----------------------------- | ------------- | ------------------------- |
+| `sample_weight_mean_weight`   | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
+| `sample_weighting/delta_mean` | > 0           | Should be positive        |
+| `sample_weighting/delta_std`  | > 0           | Variance in data quality  |

-**If `rabc_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.
+**If `sample_weight_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.

 **Setting kappa based on your data:**

-The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `rabc_delta_mean` and `rabc_delta_std`:
+The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `sample_weighting/delta_mean` and `sample_weighting/delta_std`:

 ```
 # If delta_mean ≈ 0.03 and delta_std ≈ 0.02:
 # Most deltas fall in range [0.01, 0.05]

 # Option 1: Set kappa = delta_mean (medium selectivity)
--rabc_kappa=0.03
+--sample_weighting.kappa=0.03

 # Option 2: Set kappa = delta_mean + delta_std (high selectivity)
--rabc_kappa=0.05
+--sample_weighting.kappa=0.05

 # Option 3: Set kappa = delta_mean + 2*delta_std (very selective)
--rabc_kappa=0.07
+--sample_weighting.kappa=0.07
 ```

 **When RA-BC may not help:**
@@ -550,8 +551,8 @@ accelerate launch \
  src/lerobot/scripts/lerobot_train.py \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --use_rabc=true \
-  --rabc_kappa=0.01 \
+  --sample_weighting.type=rabc \
+  --sample_weighting.kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -576,7 +577,7 @@ accelerate launch \
 ### RA-BC

 1. **Train SARM first**: RA-BC quality depends entirely on SARM quality
-2. **Monitor `rabc_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))
+2. **Monitor `sample_weight_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))

 ---

@@ -6,9 +6,9 @@ runtime dispatches to a real implementation (TTS, controller, logger, …).

 This page covers:

-1. Where the tool catalog lives (PR 1).
-2. How the annotation pipeline produces tool-call atoms (PR 2).
-3. How to add your own tool (PR 3).
+1. Where the tool catalog lives.
+2. How the annotation pipeline produces tool-call atoms.
+3. How to add your own tool.

 ## Where tools are declared

@@ -64,8 +64,8 @@ prompt_str = tokenizer.apply_chat_template(
 ```

 **The implementations** — runnable Python — live under
-`src/lerobot/tools/`, one file per tool. The `say` implementation
-arrives in PR 3 and wraps Kyutai's pocket-tts model.
+`src/lerobot/tools/`, one file per tool. The canonical `say`
+implementation wraps Kyutai's pocket-tts model.

 ## Per-row tool *invocations*

@@ -114,8 +114,7 @@ loop.

 Add an entry under `meta/info.json["tools"]`. Either edit the file
 directly on disk *before* running the annotation pipeline (it'll be
-preserved) or hand it to `lerobot-annotate` via a config flag (PR 2 —
-exact CLI lands with the pipeline change).
+preserved) or hand it to `lerobot-annotate` via a config flag.

 ```json
 {
@@ -167,12 +166,12 @@ class RecordObservationTool:
 ```

 One file per tool keeps dependencies isolated — `record_observation`
-might pull `pillow`, while `say` (PR 3) pulls `pocket-tts`. Users
-installing only the tools they need avoid heavy transitive deps.
+might pull `pillow`, while `say` pulls `pocket-tts`. Users installing
+only the tools they need avoid heavy transitive deps.

 ### Step 3 — register it

-Add to `src/lerobot/tools/registry.py` (PR 3):
+Add to `src/lerobot/tools/registry.py`:

 ```python
 from .record_observation import RecordObservationTool
@@ -184,14 +183,6 @@ That's it. At runtime `get_tools(meta)` looks up each schema in
 `meta.tools`, instantiates the matching registered class, and returns
 a name → instance dict the dispatcher can route into.

-## Where this fits in the three-PR stack
-
-| Layer | PR | What lands |
-|---|---|---|
-| Catalog storage in `meta/info.json` + `meta.tools` accessor | PR 1 | This page; `SAY_TOOL_SCHEMA`, `DEFAULT_TOOLS` constants in `lerobot.datasets.language`; `LeRobotDatasetMetadata.tools` property |
-| Annotation pipeline writes `tools` to meta after a run; honors anything users pre-populated | PR 2 | `lerobot-annotate` ensures `meta/info.json["tools"]` includes the canonical `say` and merges any user-declared tools |
-| Runnable implementations under `src/lerobot/tools/`; runtime dispatcher; `say.py` wired to Kyutai's pocket-tts | PR 3 | One file per tool; `Tool` protocol; `TOOL_REGISTRY`; optional `[tools]` extra in `pyproject.toml` |
-
 If you want to use a tool *without* writing an implementation (e.g. for
 training-time chat-template formatting only), step 1 alone is enough —
 the model still learns to *generate* the call. Steps 2 and 3 are only
@@ -274,7 +274,8 @@ python src/lerobot/scripts/lerobot_train.py \
 Once trained, we recommend deploying policies using inference-time RTC:

 ```bash
-python examples/rtc/eval_with_real_robot.py \
+lerobot-rollout \
+  --strategy.type=base \
  --policy.path=your-username/your-repo-id \
  --policy.device=cuda \
  --robot.type=unitree_g1 \
@@ -284,7 +285,7 @@ python examples/rtc/eval_with_real_robot.py \
  --task="task_description" \
  --duration=1000 \
  --fps=30 \
-  --rtc.enabled=true
+  --inference.type=rtc
 ```

 ---
@@ -220,7 +220,7 @@ REAL_DIM = 12
 # Postprocessing: Trim 20D predictions to 12D for deployment
 ```

-See the [action_hub.py](/home/jade_choghari/robot/lerobot/src/lerobot/policies/xvla/action_hub.py) implementation for details.
+See the [action_hub.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/action_hub.py) implementation for details.

 #### Auto Action Mode (Recommended)

@@ -519,9 +519,9 @@ If you use X-VLA in your research, please cite:

 - [X-VLA Paper](https://arxiv.org/pdf/2510.10274)
 - [LeRobot Documentation](https://github.com/huggingface/lerobot)
- [Action Registry Implementation](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/action_hub.py)
- [Processor Implementation](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/processor_xvla.py)
- [Model Configuration](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/configuration_xvla.py)
+- [Action Registry Implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/action_hub.py)
+- [Processor Implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/processor_xvla.py)
+- [Model Configuration](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/configuration_xvla.py)

 ## Contributing