fix(rl): enhance intervention handling in actor and learner

fix(rl): improve action processing for discrete and continuous actions
fix(rl): postprocess action in actor
2026-06-19 01:07:18 +00:00 · 2026-04-26 23:09:33 +02:00 · 2026-04-26 22:47:52 +02:00 · 2026-04-26 18:15:04 +02:00 · 2026-04-26 18:11:26 +02:00 · 2026-04-26 18:08:13 +02:00
129 changed files with 3836 additions and 8560 deletions
@@ -33,7 +33,7 @@ jobs:
      github.event.workflow_run.event == 'pull_request' &&
      github.event.workflow_run.conclusion == 'success' &&
      github.repository == 'huggingface/lerobot'
-    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
+    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@9ad2de8582b56c017cb530c1165116d40433f1c6  # main
    with:
      package_name: lerobot
    secrets:
@@ -55,7 +55,7 @@ jobs:
      github.repository == 'huggingface/lerobot'
    permissions:
      contents: read
-    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3  # main
    with:
      commit_sha: ${{ github.sha }}
      package: lerobot
@@ -78,7 +78,7 @@ jobs:
    permissions:
      contents: read
      pull-requests: write
-    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3  # main
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
@@ -24,14 +24,14 @@ on:

 env:
  CLOSE_ISSUE_MESSAGE: >
-    This issue was closed because it has been stalled for 30 days with no activity.
+    This issue was closed because it has been stalled for 14 days with no activity.
    Feel free to reopen if it is still relevant, or to ping a collaborator if you have any questions.
  CLOSE_PR_MESSAGE: >
-    This PR was closed because it has been stalled for 30 days with no activity.
+    This PR was closed because it has been stalled for 21 days with no activity.
    Feel free to reopen if it is still relevant, or to ping a collaborator if you have any questions.
  WARN_ISSUE_MESSAGE: >
    This issue has been automatically marked as stale because it has not had
-    recent activity (1 year). It will be closed if no further activity occurs.
+    recent activity (6 months). It will be closed if no further activity occurs.
    Any change, comment or update to this issue will reset this count.
    Thank you for your contributions.
  WARN_PR_MESSAGE: >
@@ -59,10 +59,10 @@ jobs:
          stale-pr-label: stale
          exempt-issue-labels: never-stale
          exempt-pr-labels: never-stale
-          days-before-issue-stale: 365
-          days-before-issue-close: 30
+          days-before-issue-stale: 180
+          days-before-issue-close: 14
          days-before-pr-stale: 365
-          days-before-pr-close: 30
+          days-before-pr-close: 21
          delete-branch: true
          close-issue-message: ${{ env.CLOSE_ISSUE_MESSAGE }}
          close-pr-message: ${{ env.CLOSE_PR_MESSAGE }}
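
Review note: the new thresholds above can be sanity-checked with a small calculation. This is a minimal sketch, assuming the usual semantics of these inputs (the close countdown starts once the item has been marked stale) and ignoring the workflow's own schedule, which may delay the real dates. The helper name `stale_timeline` is hypothetical and not part of the repository.

```python
from datetime import date, timedelta

# Values copied from the updated workflow inputs in this diff.
DAYS_BEFORE_ISSUE_STALE = 180
DAYS_BEFORE_ISSUE_CLOSE = 14
DAYS_BEFORE_PR_STALE = 365
DAYS_BEFORE_PR_CLOSE = 21


def stale_timeline(last_activity: date, days_before_stale: int, days_before_close: int) -> tuple:
    """Return (stale_on, close_on), assuming no activity after `last_activity`.

    Hypothetical helper for illustration only: it assumes the close countdown
    begins on the day the item is marked stale.
    """
    stale_on = last_activity + timedelta(days=days_before_stale)
    close_on = stale_on + timedelta(days=days_before_close)
    return stale_on, close_on


# Example: an issue last touched on 1 January 2026.
issue_stale_on, issue_close_on = stale_timeline(
    date(2026, 1, 1), DAYS_BEFORE_ISSUE_STALE, DAYS_BEFORE_ISSUE_CLOSE
)
pr_stale_on, pr_close_on = stale_timeline(
    date(2026, 1, 1), DAYS_BEFORE_PR_STALE, DAYS_BEFORE_PR_CLOSE
)
```

Under these assumptions an untouched issue is closed 194 days after its last activity, and an untouched PR after 386 days.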
@@ -1,4 +1,3 @@
 include src/lerobot/templates/lerobot_modelcard_template.md
-include src/lerobot/templates/lerobot_rewardmodel_modelcard_template.md
 include src/lerobot/datasets/card_template.md
 include src/lerobot/envs/metaworld_config.json
@@ -18,8 +18,9 @@
 # docker build -f docker/Dockerfile.internal -t lerobot-internal .

 # Configure the base image for CI with GPU access
-ARG CUDA_VERSION=12.6.3
-ARG OS_VERSION=24.04
+# TODO(Steven): Bump these versions
+ARG CUDA_VERSION=12.4.1
+ARG OS_VERSION=22.04
 FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${OS_VERSION}

 # Define Python version argument
@@ -35,13 +36,16 @@ ENV DEBIAN_FRONTEND=noninteractive \

 # Install Python, system dependencies, and uv (as root)
 RUN apt-get update && apt-get install -y --no-install-recommends \
-    build-essential git curl \
-    libglib2.0-0 libgl1 libegl1 ffmpeg \
+    software-properties-common build-essential git curl \
+    libglib2.0-0 libgl1-mesa-glx libegl1-mesa ffmpeg \
    libusb-1.0-0-dev speech-dispatcher libgeos-dev portaudio19-dev \
    cmake pkg-config ninja-build \
-    python${PYTHON_VERSION} \
-    python${PYTHON_VERSION}-venv \
-    python${PYTHON_VERSION}-dev \
+    && add-apt-repository -y ppa:deadsnakes/ppa \
+    && apt-get update \
+    && apt-get install -y --no-install-recommends \
+       python${PYTHON_VERSION} \
+       python${PYTHON_VERSION}-venv \
+       python${PYTHON_VERSION}-dev \
    && curl -LsSf https://astral.sh/uv/install.sh | sh \
    && mv /root/.local/bin/uv /usr/local/bin/uv \
    && useradd --create-home --shell /bin/bash user_lerobot \
@@ -61,8 +61,6 @@
    title: SARM
  title: "Reward Models"
 - sections:
-  - local: inference
-    title: Policy Deployment (lerobot-rollout)
  - local: async
    title: Use Async Inference
  - local: rtc
@@ -50,30 +50,30 @@ This process can be repeated iteratively: deploy, collect, fine-tune, repeat. Ea

 ### Teleoperator Requirements

-The `lerobot-rollout --strategy.type=dagger` mode requires **teleoperators with active motors** that can:
+The `examples/hil` HIL scripts require **teleoperators with active motors** that can:

 - Enable/disable torque programmatically
 - Move to target positions (to mirror the robot state when pausing)

-**Compatible teleoperators:**
+**Compatible teleoperators in the current `examples/hil` scripts:**

 - `openarm_mini` - OpenArm Mini
 - `so_leader` - SO100 / SO101 leader arm

 > [!IMPORTANT]
-> The provided commands default to `bi_openarm_follower` + `openarm_mini`.
+> The provided `examples/hil` commands default to `bi_openarm_follower` + `openarm_mini`.
 > `so_follower` + `so_leader` configs are also registered and can be used via CLI flags.

 ---

 ## Script

-Use `lerobot-rollout` with `--strategy.type=dagger` for HIL data collection. Select the inference backend with `--inference.type=sync|rtc`:
+A single script handles both synchronous and RTC-based inference. Toggle RTC with `--rtc.enabled=true`:

-| Mode                     | Flag                   | Models                |
-| ------------------------ | ---------------------- | --------------------- |
-| Standard (default)       | _(no flag needed)_     | ACT, Diffusion Policy |
-| Real-Time Chunking (RTC) | `--inference.type=rtc` | Pi0, Pi0.5, SmolVLA   |
+| Mode                     | Flag                 | Models                |
+| ------------------------ | -------------------- | --------------------- |
+| Standard (default)       | _(no flag needed)_   | ACT, Diffusion Policy |
+| Real-Time Chunking (RTC) | `--rtc.enabled=true` | Pi0, Pi0.5, SmolVLA   |

 ---

@@ -97,7 +97,7 @@ python src/lerobot/scripts/lerobot_train.py \
 **Standard inference (ACT, Diffusion Policy):**

 ```bash
-lerobot-rollout --strategy.type=dagger \
+python examples/hil/hil_data_collection.py \
    --robot.type=bi_openarm_follower \
    --robot.left_arm_config.port=can1 \
    --robot.left_arm_config.side=left \
@@ -108,10 +108,11 @@ lerobot-rollout --strategy.type=dagger \
    --teleop.port_left=/dev/ttyACM0 \
    --teleop.port_right=/dev/ttyACM1 \
    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
-    --dataset.repo_id=your-username/rollout_hil_dataset \
+    --dataset.repo_id=your-username/hil-dataset \
    --dataset.single_task="Fold the T-shirt properly" \
    --dataset.fps=30 \
-    --strategy.num_episodes=50 \
+    --dataset.episode_time_s=1000 \
+    --dataset.num_episodes=50 \
    --interpolation_multiplier=2
 ```

@@ -120,11 +121,11 @@ lerobot-rollout --strategy.type=dagger \
 For models with high inference latency, enable RTC for smooth execution:

 ```bash
-lerobot-rollout --strategy.type=dagger \
-    --inference.type=rtc \
-    --inference.rtc.execution_horizon=20 \
-    --inference.rtc.max_guidance_weight=5.0 \
-    --inference.rtc.prefix_attention_schedule=LINEAR \
+python examples/hil/hil_data_collection.py \
+    --rtc.enabled=true \
+    --rtc.execution_horizon=20 \
+    --rtc.max_guidance_weight=5.0 \
+    --rtc.prefix_attention_schedule=LINEAR \
    --robot.type=bi_openarm_follower \
    --robot.left_arm_config.port=can1 \
    --robot.left_arm_config.side=left \
@@ -135,10 +136,11 @@ lerobot-rollout --strategy.type=dagger \
    --teleop.port_left=/dev/ttyACM0 \
    --teleop.port_right=/dev/ttyACM1 \
    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
-    --dataset.repo_id=your-username/rollout_hil_rtc_dataset \
+    --dataset.repo_id=your-username/hil-rtc-dataset \
    --dataset.single_task="Fold the T-shirt properly" \
    --dataset.fps=30 \
-    --strategy.num_episodes=50 \
+    --dataset.episode_time_s=1000 \
+    --dataset.num_episodes=50 \
    --interpolation_multiplier=3
 ```

@@ -233,7 +235,7 @@ This HIL data collection approach builds on ideas from interactive imitation lea

 - **HG-DAgger** (Kelly et al., 2019) made this practical for robotics: a human expert monitors the robot and only intervenes when needed, rather than labeling every state. The gating between autonomous and human control is exactly the pause → takeover → return-to-policy loop used in the scripts here.

- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the DAgger strategy in `lerobot-rollout`.
+- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the HIL scripts in `examples/hil`.

 - **π0.6/RECAP** (Physical Intelligence, 2025) applies the same iterative collect-and-finetune loop at scale with VLA models, showing that even large pretrained policies benefit substantially from targeted human corrections on their own failure modes. π0.6 is trained using RECAP.
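
The pause → takeover → return-to-policy gating described above can be pictured as a small state machine. The sketch below is illustrative only and is not the code in `examples/hil`: the mode names, event names, and the `label_frames` helper are assumptions made for this example.

```python
from enum import Enum, auto
from typing import List, Optional


class Mode(Enum):
    AUTONOMOUS = auto()  # the policy drives the robot
    PAUSED = auto()      # the policy is halted while the teleoperator mirrors the robot
    HUMAN = auto()       # the human drives the robot; frames count as corrections


# Hypothetical event names, chosen for this sketch only.
TRANSITIONS = {
    (Mode.AUTONOMOUS, "pause"): Mode.PAUSED,
    (Mode.PAUSED, "takeover"): Mode.HUMAN,
    (Mode.PAUSED, "resume"): Mode.AUTONOMOUS,
    (Mode.HUMAN, "release"): Mode.AUTONOMOUS,
}


def step(mode: Mode, event: str) -> Mode:
    """Advance the gate; events that are invalid in the current mode are ignored."""
    return TRANSITIONS.get((mode, event), mode)


def label_frames(events: List[Optional[str]]) -> List[bool]:
    """Return one intervention flag per control tick, given at most one event per tick."""
    mode = Mode.AUTONOMOUS
    flags = []
    for event in events:
        if event is not None:
            mode = step(mode, event)
        flags.append(mode is Mode.HUMAN)
    return flags


# Six control ticks: pause on tick 1, take over on tick 2, hand back on tick 4.
flags = label_frames([None, "pause", "takeover", None, "release", None])
```

Only the ticks spent in `HUMAN` mode are flagged, which matches the idea that the expert labels just the states where an intervention was needed.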

@@ -509,42 +509,121 @@ hf upload ${HF_USER}/act_so101_test${CKPT} \

 ## Run inference and evaluate your policy

-Use `lerobot-rollout` to deploy a trained policy on your robot. You can choose different strategies depending on your needs:
+You can use [`lerobot-record`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input to run inference and evaluate your policy. For instance, run this command or API example to run inference and record evaluation episodes:

 <hfoptions id="eval">
-<hfoption id="Base mode (no recording)">
+<hfoption id="Command">
 ```bash
-lerobot-rollout \
-  --strategy.type=base \
-  --policy.path=${HF_USER}/my_policy \
-  --robot.type=so100_follower \
-  --robot.port=/dev/ttyACM1 \
-  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
-  --task="Put lego brick into the transparent box" \
-  --duration=60
-```
-</hfoption>
-<hfoption id="Sentry mode (with recording)">
-```bash
-lerobot-rollout \
-  --strategy.type=sentry \
-  --strategy.upload_every_n_episodes=5 \
-  --policy.path=${HF_USER}/my_policy \
+lerobot-record  \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM1 \
  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
+  --robot.id=my_awesome_follower_arm \
+  --display_data=false \
  --dataset.repo_id=${HF_USER}/eval_so100 \
  --dataset.single_task="Put lego brick into the transparent box" \
-  --duration=600
+  --dataset.streaming_encoding=true \
+  --dataset.encoder_threads=2 \
+  --policy.path=${HF_USER}/my_policy
+  # Optional flags (add them above the last flag, each followed by " \"):
+  # --dataset.vcodec=auto
+  # --teleop.type=so100_leader    <- teleop lets you teleoperate in between episodes
+  # --teleop.port=/dev/ttyACM0
+  # --teleop.id=my_awesome_leader_arm
 ```
+</hfoption>
+<hfoption id="API example">
+
+<!-- prettier-ignore-start -->
+```python
+from lerobot.cameras.opencv import OpenCVCameraConfig
+from lerobot.datasets import LeRobotDataset
+from lerobot.utils.feature_utils import hw_to_dataset_features
+from lerobot.policies.act import ACTPolicy
+from lerobot.policies import make_pre_post_processors
+from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
+from lerobot.scripts.lerobot_record import record_loop
+from lerobot.common.control_utils import init_keyboard_listener
+from lerobot.utils.utils import log_say
+from lerobot.utils.visualization_utils import init_rerun
+
+
+NUM_EPISODES = 5
+FPS = 30
+EPISODE_TIME_SEC = 60
+TASK_DESCRIPTION = "My task description"
+HF_MODEL_ID = "<hf_username>/<model_repo_id>"
+HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
+
+# Create the robot configuration
+camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
+robot_config = SO100FollowerConfig(
+    port="/dev/tty.usbmodem58760434471", id="my_awesome_follower_arm", cameras=camera_config
+)
+
+# Initialize the robot
+robot = SO100Follower(robot_config)
+
+# Initialize the policy
+policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
+
+# Configure the dataset features
+action_features = hw_to_dataset_features(robot.action_features, "action")
+obs_features = hw_to_dataset_features(robot.observation_features, "observation")
+dataset_features = {**action_features, **obs_features}
+
+# Create the dataset
+dataset = LeRobotDataset.create(
+    repo_id=HF_DATASET_ID,
+    fps=FPS,
+    features=dataset_features,
+    robot_type=robot.name,
+    use_videos=True,
+    image_writer_threads=4,
+)
+
+# Initialize the keyboard listener and rerun visualization
+_, events = init_keyboard_listener()
+init_rerun(session_name="recording")
+
+# Connect the robot
+robot.connect()
+
+preprocessor, postprocessor = make_pre_post_processors(
+    policy_cfg=policy,
+    pretrained_path=HF_MODEL_ID,
+    dataset_stats=dataset.meta.stats,
+)
+
+for episode_idx in range(NUM_EPISODES):
+    log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
+
+    # Run the policy inference loop
+    record_loop(
+        robot=robot,
+        events=events,
+        fps=FPS,
+        policy=policy,
+        preprocessor=preprocessor,
+        postprocessor=postprocessor,
+        dataset=dataset,
+        control_time_s=EPISODE_TIME_SEC,
+        single_task=TASK_DESCRIPTION,
+        display_data=True,
+    )
+
+    dataset.save_episode()
+
+# Clean up
+robot.disconnect()
+dataset.push_to_hub()
+```
+<!-- prettier-ignore-end -->
+
 </hfoption>
 </hfoptions>

-The `--strategy.type` flag selects the execution mode:
+As you can see, it's almost the same command as previously used to record your training dataset. Two things changed:

- `base`: Autonomous rollout with no data recording (useful for quick evaluation)
- `sentry`: Continuous recording with auto-upload (useful for large-scale evaluation)
- `highlight`: Ring buffer recording with keystroke save (useful for capturing interesting events)
- `dagger`: Human-in-the-loop data collection (see [HIL Data Collection](./hil_data_collection))
-
-All strategies support `--inference.type=rtc` for smooth execution with slow VLA models (Pi0, Pi0.5, SmolVLA).
+1. There is an additional `--policy.path` argument, which indicates the path to your policy checkpoint (e.g. `outputs/train/eval_act_so101_test/checkpoints/last/pretrained_model`). You can also use the model repository if you uploaded a model checkpoint to the hub (e.g. `${HF_USER}/act_so101_test`).
+2. The name of the dataset begins with `eval` to reflect that you are running inference (e.g. `${HF_USER}/eval_act_so101_test`).
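
The two differences listed above can be captured in a few lines when launching evaluation runs from a script. This is a minimal sketch: `build_eval_record_args` is a hypothetical helper, it uses the `--policy.path` and `--dataset.repo_id` flags that appear in the command above, and it leaves out the robot and camera flags, which still have to be appended.

```python
from typing import List


def build_eval_record_args(policy_path: str, hf_user: str, run_name: str) -> List[str]:
    """Build the evaluation-specific part of a `lerobot-record` command line.

    Hypothetical helper for illustration: robot and camera flags are not
    included and must be added by the caller.
    """
    # Prefix the dataset name with `eval_` to mark it as an inference run.
    dataset_repo_id = f"{hf_user}/eval_{run_name}"
    return [
        "lerobot-record",
        f"--policy.path={policy_path}",
        f"--dataset.repo_id={dataset_repo_id}",
    ]


args = build_eval_record_args(
    policy_path="outputs/train/eval_act_so101_test/checkpoints/last/pretrained_model",
    hf_user="my_user",
    run_name="act_so101_test",
)
```

The resulting list can be extended with the robot flags and passed to `subprocess.run` once the hardware is connected.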
@@ -1,261 +0,0 @@
-# Policy Deployment (lerobot-rollout)
-
-`lerobot-rollout` is the single CLI for deploying trained policies on real robots. It supports multiple execution strategies and inference backends, from quick evaluation to continuous recording and human-in-the-loop data collection.
-
-## Quick Start
-
-No extra dependencies are needed beyond your robot and policy extras.
-
-```bash
-lerobot-rollout \
-    --strategy.type=base \
-    --policy.path=lerobot/act_koch_real \
-    --robot.type=koch_follower \
-    --robot.port=/dev/ttyACM0 \
-    --task="pick up cube" \
-    --duration=30
-```
-
-This runs the policy for 30 seconds with no recording.
-
---
-
-## Strategies
-
-Select a strategy with `--strategy.type=<name>`. Each strategy defines a different control loop with its own recording and interaction semantics.
-
-### Base (`--strategy.type=base`)
-
-Autonomous policy execution with no data recording. Use this for quick evaluation, demos, or when you only need to observe the robot.
-
-```bash
-lerobot-rollout \
-    --strategy.type=base \
-    --policy.path=${HF_USER}/my_policy \
-    --robot.type=so100_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
-    --task="Put lego brick into the box" \
-    --duration=60
-```
-
-| Flag             | Description                                            |
-| ---------------- | ------------------------------------------------------ |
-| `--duration`     | Run time in seconds (0 = infinite)                     |
-| `--task`         | Task description passed to the policy                  |
-| `--display_data` | Stream observations/actions to Rerun for visualization |
-
-### Sentry (`--strategy.type=sentry`)
-
-Continuous autonomous recording with periodic upload to the Hugging Face Hub. Episode boundaries are auto-computed from camera resolution and FPS so each saved episode produces a complete video file, keeping uploads efficient.
-
-Policy state (hidden state, RTC queue) persists across episode boundaries: the robot does not reset between episodes.
-
-```bash
-lerobot-rollout \
-    --strategy.type=sentry \
-    --strategy.upload_every_n_episodes=5 \
-    --policy.path=${HF_USER}/my_policy \
-    --robot.type=so100_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
-    --dataset.repo_id=${HF_USER}/rollout_eval_data \
-    --dataset.single_task="Put lego brick into the box" \
-    --duration=3600
-```
-
-| Flag                                   | Description                                                 |
-| -------------------------------------- | ----------------------------------------------------------- |
-| `--strategy.upload_every_n_episodes`   | Push to Hub every N episodes (default: 5)                   |
-| `--strategy.target_video_file_size_mb` | Target video file size for episode rotation (default: auto) |
-| `--dataset.repo_id`                    | **Required.** Hub repository for the recorded dataset       |
-| `--dataset.push_to_hub`                | Whether to push to Hub on teardown (default: true)          |
-
-### Highlight (`--strategy.type=highlight`)
-
-Autonomous rollout with on-demand recording via a memory-bounded ring buffer. The robot runs continuously while the buffer captures the last N seconds of telemetry. Press the save key to flush the buffer and start live recording; press it again to save the episode.
-
-```bash
-lerobot-rollout \
-    --strategy.type=highlight \
-    --strategy.ring_buffer_seconds=30 \
-    --strategy.save_key=s \
-    --strategy.push_key=h \
-    --policy.path=${HF_USER}/my_policy \
-    --robot.type=koch_follower \
-    --robot.port=/dev/ttyACM0 \
-    --dataset.repo_id=${HF_USER}/rollout_highlight_data \
-    --dataset.single_task="Pick up the red cube"
-```
-
-**Keyboard controls:**
-
-| Key                | Action                                                   |
-| ------------------ | -------------------------------------------------------- |
-| `s` (configurable) | Start recording (flushes buffer) / stop and save episode |
-| `h` (configurable) | Push dataset to Hub                                      |
-| `ESC`              | Stop the session                                         |
-
-| Flag                                   | Description                                    |
-| -------------------------------------- | ---------------------------------------------- |
-| `--strategy.ring_buffer_seconds`       | Duration of buffered telemetry (default: 30)   |
-| `--strategy.ring_buffer_max_memory_mb` | Memory cap for the ring buffer (default: 2048) |
-| `--strategy.save_key`                  | Key to toggle recording (default: `s`)         |
-| `--strategy.push_key`                  | Key to push to Hub (default: `h`)              |
-
-### DAgger (`--strategy.type=dagger`)
-
-Human-in-the-loop data collection. Alternates between autonomous policy execution and human intervention via a teleoperator. Intervention frames are tagged with `intervention=True`. Requires a teleoperator (`--teleop.type`).
-
-See the [Human-In-the-Loop Data Collection](./hil_data_collection) guide for a detailed walkthrough.
-
-**Corrections-only mode** (default): Only human correction windows are recorded. Each correction becomes one episode.
-
-```bash
-lerobot-rollout \
-    --strategy.type=dagger \
-    --strategy.num_episodes=20 \
-    --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
-    --robot.type=bi_openarm_follower \
-    --teleop.type=openarm_mini \
-    --dataset.repo_id=${HF_USER}/rollout_hil_data \
-    --dataset.single_task="Fold the T-shirt"
-```
-
-**Continuous recording mode** (`--strategy.record_autonomous=true`): Both autonomous and correction frames are recorded with time-based episode rotation (same as Sentry).
-
-```bash
-lerobot-rollout \
-    --strategy.type=dagger \
-    --strategy.record_autonomous=true \
-    --strategy.num_episodes=50 \
-    --policy.path=${HF_USER}/my_policy \
-    --robot.type=so100_follower \
-    --robot.port=/dev/ttyACM0 \
-    --teleop.type=so101_leader \
-    --teleop.port=/dev/ttyACM1 \
-    --dataset.repo_id=${HF_USER}/rollout_dagger_data \
-    --dataset.single_task="Grasp the block"
-```
-
-**Keyboard controls** (default input device):
-
-| Key     | Action                                      |
-| ------- | ------------------------------------------- |
-| `Space` | Pause / resume policy execution             |
-| `Tab`   | Start / stop human correction               |
-| `Enter` | Push dataset to Hub (corrections-only mode) |
-| `ESC`   | Stop the session                            |
-
-Foot pedal input is also supported via `--strategy.input_device=pedal`. Configure pedal codes with `--strategy.pedal.*` flags.
-
-| Flag                                 | Description                                             |
-| ------------------------------------ | ------------------------------------------------------- |
-| `--strategy.num_episodes`            | Number of correction episodes to record (default: 10)   |
-| `--strategy.record_autonomous`       | Record autonomous frames too (default: false)           |
-| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5)               |
-| `--strategy.input_device`            | Input device: `keyboard` or `pedal` (default: keyboard) |
-| `--teleop.type`                      | **Required.** Teleoperator type                         |
-
---
-
-## Inference Backends
-
-Select a backend with `--inference.type=<name>`. All strategies work with both backends.
-
-### Sync (default)
-
-One policy call per control tick. The main loop blocks until the action is computed.
-
-Works with all policies. No extra flags needed.
-
-### Real-Time Chunking (`--inference.type=rtc`)
-
-A background thread produces action chunks asynchronously. The main control loop polls for the next ready action while the policy computes the next chunk in parallel.
-
-Use RTC with large, slow VLA models (Pi0, Pi0.5, SmolVLA) for smooth, continuous motion despite high inference latency.
-
-```bash
-lerobot-rollout \
-    --strategy.type=base \
-    --inference.type=rtc \
-    --inference.rtc.execution_horizon=10 \
-    --inference.rtc.max_guidance_weight=10.0 \
-    --policy.path=${HF_USER}/pi0_policy \
-    --robot.type=so100_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
-    --task="Pick up the cube" \
-    --duration=60 \
-    --device=cuda
-```
-
-| Flag                                        | Description                                                    |
-| ------------------------------------------- | -------------------------------------------------------------- |
-| `--inference.rtc.execution_horizon`         | Steps to blend with previous chunk (default: varies by policy) |
-| `--inference.rtc.max_guidance_weight`       | Consistency enforcement strength (default: varies by policy)   |
-| `--inference.rtc.prefix_attention_schedule` | Blend schedule: `LINEAR`, `EXP`, `ONES`, `ZEROS`               |
-| `--inference.queue_threshold`               | Max queue size before backpressure (default: 30)               |
-
-See the [Real-Time Chunking](./rtc) guide for details on tuning RTC parameters.
-
---
-
-## Common Flags
-
-| Flag                              | Description                                                       | Default |
-| --------------------------------- | ----------------------------------------------------------------- | ------- |
-| `--policy.path`                   | **Required.** HF Hub model ID or local checkpoint path            | --      |
-| `--robot.type`                    | **Required.** Robot type (e.g. `so100_follower`, `koch_follower`) | --      |
-| `--robot.port`                    | Serial port for the robot                                         | --      |
-| `--robot.cameras`                 | Camera configuration (JSON dict)                                  | --      |
-| `--fps`                           | Control loop frequency                                            | 30      |
-| `--duration`                      | Run time in seconds (0 = infinite)                                | 0       |
-| `--device`                        | Torch device (`cpu`, `cuda`, `mps`)                               | auto    |
-| `--task`                          | Task description (used when no dataset is provided)               | --      |
-| `--display_data`                  | Stream telemetry to Rerun visualization                           | false   |
-| `--display_ip` / `--display_port` | Remote Rerun server address                                       | --      |
-| `--interpolation_multiplier`      | Action interpolation factor                                       | 1       |
-| `--use_torch_compile`             | Enable `torch.compile` for inference                              | false   |
-| `--resume`                        | Resume a previous recording session                               | false   |
-| `--play_sounds`                   | Vocal synthesis for events                                        | true    |
-
---
-
-## Programmatic Usage
-
-For custom deployments (e.g. with kinematics processors), use the rollout module API directly:
-
-```python
-from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
-from lerobot.rollout.inference import SyncInferenceConfig
-from lerobot.rollout.strategies import BaseStrategy
-from lerobot.utils.process import ProcessSignalHandler
-
-cfg = RolloutConfig(
-    robot=my_robot_config,
-    policy=my_policy_config,
-    strategy=BaseStrategyConfig(),
-    inference=SyncInferenceConfig(),
-    fps=30,
-    duration=60,
-    task="my task",
-)
-
-signal_handler = ProcessSignalHandler(use_threads=True)
-ctx = build_rollout_context(
-    cfg,
-    signal_handler.shutdown_event,
-    robot_action_processor=my_custom_action_processor,       # optional
-    robot_observation_processor=my_custom_obs_processor,     # optional
-)
-
-strategy = BaseStrategy(cfg.strategy)
-try:
-    strategy.setup(ctx)
-    strategy.run(ctx)
-finally:
-    strategy.teardown(ctx)
-```
-
-See `examples/so100_to_so100_EE/rollout.py` and `examples/phone_to_so100/rollout.py` for full examples with kinematics processors.
@@ -61,6 +61,17 @@ lerobot-eval \
  --rename_map='{"observation.images.image": "observation.images.base_0_rgb", "observation.images.image2": "observation.images.left_wrist_0_rgb"}'
 ```

+### Recording
+
+`lerobot-record` also supports rename maps, nested under the dataset config:
+
+```bash
+lerobot-record \ # When running inference
+  --policy.path="<user>/smolVLA_finetuned" \
+  ... \
+  --dataset.rename_map='{"observation.images.glove2": "observation.images.image"}'
+```
+
 ## Alternative: edit the policy config directly

 If you always use the same dataset or environment, you can **edit the policy's `config.json`** so its observation keys match your data source. Then no rename map is needed.
@@ -94,10 +105,10 @@ XVLA-base has three visual inputs and `empty_cameras=0` by default. Your dataset

 ## Quick reference

-| Goal                                    | What to do                                                                  |
-| --------------------------------------- | --------------------------------------------------------------------------- |
-| Dataset keys ≠ policy keys              | `--rename_map='{"dataset_key": "policy_key", ...}'`                         |
-| Env keys ≠ policy keys (eval)           | `--rename_map='{"env_key": "policy_key", ...}'`                             |
-| Rollout with different keys (inference) | `--rename_map='{"source_key": "policy_key", ...}'`.                         |
-| Fewer cameras than policy expects       | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
-| Avoid passing a rename map              | Edit the policy's `config.json` so its keys match your data source          |
+| Goal                                      | What to do                                                                  |
+| ----------------------------------------- | --------------------------------------------------------------------------- |
+| Dataset keys ≠ policy keys                | `--rename_map='{"dataset_key": "policy_key", ...}'`                         |
+| Env keys ≠ policy keys (eval)             | `--rename_map='{"env_key": "policy_key", ...}'`                             |
+| Recording with different keys (inference) | `--dataset.rename_map='{"source_key": "policy_key", ...}'`                  |
+| Fewer cameras than policy expects         | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
+| Avoid passing a rename map                | Edit the policy's `config.json` so its keys match your data source          |
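
To make the effect of a rename map concrete, the sketch below applies one to a plain observation dictionary. It is not LeRobot's implementation: `apply_rename_map` is a hypothetical helper, and the sample observation values are placeholders. It only illustrates the source-key to policy-key mapping that the flags in the table describe.

```python
import json
from typing import Any, Dict


def apply_rename_map(observation: Dict[str, Any], rename_map: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `observation` with keys renamed; unmapped keys pass through."""
    return {rename_map.get(key, key): value for key, value in observation.items()}


# The CLI flag carries the map as a JSON string, as in the examples above.
rename_map = json.loads('{"observation.images.glove2": "observation.images.image"}')

# Placeholder observation: a camera frame under the source key plus a state vector.
observation = {"observation.images.glove2": "frame", "observation.state": [0.0, 0.1]}
renamed = apply_rename_map(observation, rename_map)
```

After the call, the camera frame is stored under `observation.images.image`, the key the policy expects, while `observation.state` is untouched.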
@@ -34,7 +34,7 @@ pip install -e ".[smolvla]"

 ### Using RTC with Pi0

-You can use `lerobot-rollout --strategy.type=base --inference.type=rtc` for RTC deployment on real robots.
+You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py).
 The snippet below provides a simplified pseudo-example of how RTC operates with Pi0 in your pipeline:

 ```python
@@ -137,12 +137,8 @@ The script generates a visualization of the denoising process, comparing standar
 ## Testing RTC with a Real Robot

 ```bash
-lerobot-rollout \
-    --strategy.type=base \
+python examples/rtc/eval_with_real_robot.py \
    --policy.path=${HF_USERNAME}/policy_repo_id \
-    --inference.type=rtc \
-    --inference.rtc.execution_horizon=10 \
-    --inference.rtc.max_guidance_weight=10.0 \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58FA0834591 \
    --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
@@ -182,7 +178,7 @@ visualizer = RTCDebugVisualizer()
 # ... create plots
 ```

-See `examples/rtc/eval_dataset.py` for a complete example of offline RTC visualization.
+See `examples/rtc/eval_dataset.py` for a complete visualization example.

 ## References

@@ -46,7 +46,7 @@ This ensures identical task states map to consistent progress values, even acros

 ## Inputs and Targets (What the new code expects)

-SARM is trained through its processor (`src/lerobot/rewards/sarm/processor_sarm.py`), which:
+SARM is trained through its processor (`src/lerobot/policies/sarm/processor_sarm.py`), which:

 - **Encodes** images and task text with CLIP (ViT-B/32) into `video_features` and `text_features`
 - **Pads/truncates** robot state into `state_features` (up to `max_state_dim`)
@@ -347,7 +347,7 @@ Use `compute_rabc_weights.py` with `--visualize-only` to visualize model predict
 <hfoption id="single_stage">

 ```bash
-python -m lerobot.rewards.sarm.compute_rabc_weights \
+python src/lerobot/policies/sarm/compute_rabc_weights.py \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -360,7 +360,7 @@ python -m lerobot.rewards.sarm.compute_rabc_weights \
 <hfoption id="dense_only">

 ```bash
-python -m lerobot.rewards.sarm.compute_rabc_weights \
+python src/lerobot/policies/sarm/compute_rabc_weights.py \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -373,7 +373,7 @@ python -m lerobot.rewards.sarm.compute_rabc_weights \
 <hfoption id="dual">

 ```bash
-python -m lerobot.rewards.sarm.compute_rabc_weights \
+python src/lerobot/policies/sarm/compute_rabc_weights.py \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -429,7 +429,7 @@ The weighting follows **Equations 8-9** from the paper:
 First, run the SARM model on all frames in your dataset to compute progress values:

 ```bash
-python -m lerobot.rewards.sarm.compute_rabc_weights \
+python src/lerobot/policies/sarm/compute_rabc_weights.py \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --head-mode sparse \
@@ -465,15 +465,15 @@ This script:

 ### Step 5b: Train Policy with RA-BC

-Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`) if not explicitly provided. Currently PI0, PI0.5 and SmolVLA are supported with RA-BC:
+Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`) unless you pass `--rabc_progress_path`. RA-BC currently supports PI0, PI0.5, and SmolVLA:

 ```bash
 lerobot-train \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --sample_weighting.type=rabc \
-  --sample_weighting.head_mode=sparse \
-  --sample_weighting.kappa=0.01 \
+  --use_rabc=true \
+  --rabc_head_mode=sparse \
+  --rabc_kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -488,13 +488,12 @@ The training script automatically:

 **RA-BC Arguments:**

-| Argument                           | Description                                            | Default                 |
-| ---------------------------------- | ------------------------------------------------------ | ----------------------- |
-| `--sample_weighting.type`          | Weighting strategy type (`rabc` or `uniform`)          | `rabc`                  |
-| `--sample_weighting.progress_path` | Path to progress parquet file                          | `sarm_progress.parquet` |
-| `--sample_weighting.head_mode`     | Which SARM head's progress to use: `sparse` or `dense` | `sparse`                |
-| `--sample_weighting.kappa`         | Threshold κ for high-quality samples                   | `0.01`                  |
-| `--sample_weighting.epsilon`       | Small constant for numerical stability                 | `1e-6`                  |
+| Argument               | Description                                                | Default                            |
+| ---------------------- | ---------------------------------------------------------- | ---------------------------------- |
+| `--use_rabc`           | Enable RA-BC sample weighting                              | `false`                            |
+| `--rabc_progress_path` | Path to progress parquet file (auto-detected from dataset) | `sarm_progress.parquet` in dataset |
+| `--rabc_head_mode`     | Which SARM head's progress to use: `sparse` or `dense`     | `sparse`                           |
+| `--rabc_kappa`         | Threshold κ for high-quality samples                       | `0.01`                             |

 ### Tuning RA-BC Kappa

@@ -512,30 +511,30 @@ The `kappa` parameter is the threshold that determines which samples get full we

 Monitor these WandB metrics during training:

-| Metric                        | Healthy Range | Problem Indicator         |
-| ----------------------------- | ------------- | ------------------------- |
-| `sample_weight_mean_weight`   | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
-| `sample_weighting/delta_mean` | > 0           | Should be positive        |
-| `sample_weighting/delta_std`  | > 0           | Variance in data quality  |
+| Metric             | Healthy Range | Problem Indicator         |
+| ------------------ | ------------- | ------------------------- |
+| `rabc_mean_weight` | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
+| `rabc_delta_mean`  | > 0           | Should be positive        |
+| `rabc_delta_std`   | > 0           | Variance in data quality  |

-**If `sample_weight_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.
+**If `rabc_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.

 **Setting kappa based on your data:**

-The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `sample_weighting/delta_mean` and `sample_weighting/delta_std`:
+The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `rabc_delta_mean` and `rabc_delta_std`:

 ```
 # If delta_mean ≈ 0.03 and delta_std ≈ 0.02:
 # Most deltas fall in range [0.01, 0.05]

 # Option 1: Set kappa = delta_mean (medium selectivity)
--sample_weighting.kappa=0.03
+--rabc_kappa=0.03

 # Option 2: Set kappa = delta_mean + delta_std (high selectivity)
--sample_weighting.kappa=0.05
+--rabc_kappa=0.05

 # Option 3: Set kappa = delta_mean + 2*delta_std (very selective)
--sample_weighting.kappa=0.07
+--rabc_kappa=0.07
 ```
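
The three options above are plain arithmetic on the logged statistics. A minimal sketch, assuming you have read `rabc_delta_mean` and `rabc_delta_std` off your WandB run; the helper name `kappa_candidates` is hypothetical and not part of LeRobot:

```python
def kappa_candidates(delta_mean: float, delta_std: float) -> dict[str, float]:
    """Return the three kappa settings described above, rounded for CLI use."""
    return {
        "medium": round(delta_mean, 4),  # Option 1: delta_mean
        "high": round(delta_mean + delta_std, 4),  # Option 2: mean + 1 std
        "very_high": round(delta_mean + 2 * delta_std, 4),  # Option 3: mean + 2 std
    }


candidates = kappa_candidates(delta_mean=0.03, delta_std=0.02)
for name, kappa in candidates.items():
    print(f"{name}: --rabc_kappa={kappa}")
```

With the example statistics this reproduces the values 0.03, 0.05, and 0.07 used in the options above.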

 **When RA-BC may not help:**
@@ -551,8 +550,8 @@ accelerate launch \
  src/lerobot/scripts/lerobot_train.py \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --sample_weighting.type=rabc \
-  --sample_weighting.kappa=0.01 \
+  --use_rabc=true \
+  --rabc_kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -577,7 +576,7 @@ accelerate launch \
 ### RA-BC

 1. **Train SARM first**: RA-BC quality depends entirely on SARM quality
-2. **Monitor `sample_weight_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))
+2. **Monitor `rabc_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))

 ---

@@ -274,8 +274,7 @@ python src/lerobot/scripts/lerobot_train.py \
 Once trained, we recommend deploying policies using inference-time RTC:

 ```bash
-lerobot-rollout \
-  --strategy.type=base \
+python examples/rtc/eval_with_real_robot.py \
  --policy.path=your-username/your-repo-id \
  --policy.device=cuda \
  --robot.type=unitree_g1 \
@@ -285,7 +284,7 @@ lerobot-rollout \
  --task="task_description" \
  --duration=1000 \
  --fps=30 \
-  --inference.type=rtc
+  --rtc.enabled=true
 ```

 ---
@@ -69,7 +69,7 @@ class ComputeProgressShards(PipelineStep):
        import torch
        from tqdm import tqdm

-        from lerobot.rewards.sarm.compute_rabc_weights import (
+        from lerobot.policies.sarm.compute_rabc_weights import (
            generate_all_frame_indices,
            interpolate_progress,
            load_sarm_resources,
@@ -0,0 +1,226 @@
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Shared utilities for Human-in-the-Loop data collection scripts."""
+
+import logging
+import time
+from dataclasses import dataclass, field
+from pathlib import Path
+
+from lerobot.common.control_utils import is_headless
+from lerobot.processor import (
+    IdentityProcessorStep,
+    RobotAction,
+    RobotObservation,
+    RobotProcessorPipeline,
+    observation_to_transition,
+    robot_action_observation_to_transition,
+    transition_to_observation,
+    transition_to_robot_action,
+)
+from lerobot.robots import Robot
+from lerobot.teleoperators import Teleoperator
+from lerobot.utils.robot_utils import precise_sleep
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class HILDatasetConfig:
+    repo_id: str
+    single_task: str
+    root: str | Path | None = None
+    fps: int = 30
+    episode_time_s: float = 120
+    num_episodes: int = 50
+    video: bool = True
+    push_to_hub: bool = True
+    private: bool = False
+    tags: list[str] | None = None
+    num_image_writer_processes: int = 0
+    num_image_writer_threads_per_camera: int = 4
+    video_encoding_batch_size: int = 1
+    vcodec: str = "auto"
+    streaming_encoding: bool = True
+    encoder_queue_maxsize: int = 30
+    encoder_threads: int | None = None
+    rename_map: dict[str, str] = field(default_factory=dict)
+
+
+def teleop_has_motor_control(teleop: Teleoperator) -> bool:
+    """Check if teleoperator has motor control capabilities."""
+    return all(hasattr(teleop, attr) for attr in ("enable_torque", "disable_torque", "write_goal_positions"))
+
+
+def teleop_disable_torque(teleop: Teleoperator) -> None:
+    """Disable teleop torque if supported."""
+    if hasattr(teleop, "disable_torque"):
+        teleop.disable_torque()
+
+
+def teleop_enable_torque(teleop: Teleoperator) -> None:
+    """Enable teleop torque if supported."""
+    if hasattr(teleop, "enable_torque"):
+        teleop.enable_torque()
+
+
+def teleop_smooth_move_to(teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50):
+    """Smoothly move teleop to target position if motor control is available."""
+    if not teleop_has_motor_control(teleop):
+        logger.warning("Teleop does not support motor control - cannot mirror robot position")
+        return
+
+    teleop_enable_torque(teleop)
+    current = teleop.get_action()
+    steps = max(int(duration_s * fps), 1)
+
+    for step in range(steps + 1):
+        t = step / steps
+        interp = {}
+        for k in current:
+            if k in target_pos:
+                interp[k] = current[k] * (1 - t) + target_pos[k] * t
+            else:
+                interp[k] = current[k]
+        teleop.write_goal_positions(interp)
+        time.sleep(1 / fps)
+
+
+def init_keyboard_listener():
+    """Initialize keyboard listener with HIL controls."""
+    events = {
+        "exit_early": False,
+        "rerecord_episode": False,
+        "stop_recording": False,
+        "policy_paused": False,
+        "correction_active": False,
+        "resume_policy": False,
+        "in_reset": False,
+        "start_next_episode": False,
+    }
+
+    if is_headless():
+        logger.warning("Headless environment - keyboard controls unavailable")
+        return None, events
+
+    from pynput import keyboard
+
+    def on_press(key):
+        try:
+            if events["in_reset"]:
+                if key in [keyboard.Key.space, keyboard.Key.right]:
+                    logger.info("[HIL] Starting next episode...")
+                    events["start_next_episode"] = True
+                elif hasattr(key, "char") and key.char == "c":
+                    events["start_next_episode"] = True
+                elif key == keyboard.Key.esc:
+                    logger.info("[HIL] ESC - Stop recording, pushing to hub...")
+                    events["stop_recording"] = True
+                    events["start_next_episode"] = True
+            else:
+                if key == keyboard.Key.space:
+                    if not events["policy_paused"] and not events["correction_active"]:
+                        logger.info("[HIL] PAUSED - Press 'c' to take control or 'p' to resume policy")
+                        events["policy_paused"] = True
+                elif hasattr(key, "char") and key.char == "c":
+                    if events["policy_paused"] and not events["correction_active"]:
+                        logger.info("[HIL] Taking control...")
+                        events["start_next_episode"] = True
+                elif hasattr(key, "char") and key.char == "p":
+                    if events["policy_paused"] or events["correction_active"]:
+                        logger.info("[HIL] Resuming policy...")
+                        events["resume_policy"] = True
+                elif key == keyboard.Key.right:
+                    logger.info("[HIL] End episode")
+                    events["exit_early"] = True
+                elif key == keyboard.Key.left:
+                    logger.info("[HIL] Re-record episode")
+                    events["rerecord_episode"] = True
+                    events["exit_early"] = True
+                elif key == keyboard.Key.esc:
+                    logger.info("[HIL] ESC - Stop recording...")
+                    events["stop_recording"] = True
+                    events["exit_early"] = True
+        except Exception as e:
+            logger.warning("Key error: %s", e)
+
+    listener = keyboard.Listener(on_press=on_press)
+    listener.start()
+    return listener, events
+
+
+def make_identity_processors():
+    """Create identity processors for recording."""
+    teleop_proc = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
+        steps=[IdentityProcessorStep()],
+        to_transition=robot_action_observation_to_transition,
+        to_output=transition_to_robot_action,
+    )
+    obs_proc = RobotProcessorPipeline[RobotObservation, RobotObservation](
+        steps=[IdentityProcessorStep()],
+        to_transition=observation_to_transition,
+        to_output=transition_to_observation,
+    )
+    return teleop_proc, obs_proc
+
+
+def reset_loop(robot: Robot, teleop: Teleoperator, events: dict, fps: int):
+    """Reset period where human repositions environment."""
+    logger.info("[HIL] RESET")
+
+    events["in_reset"] = True
+    events["start_next_episode"] = False
+
+    obs = robot.get_observation()
+    robot_pos = {k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features}
+    teleop_smooth_move_to(teleop, robot_pos, duration_s=2.0, fps=50)
+
+    logger.info("Press SPACE, →, or 'c' to enable teleoperation (ESC to stop)")
+    while not events["start_next_episode"] and not events["stop_recording"]:
+        precise_sleep(0.05)
+
+    if events["stop_recording"]:
+        return
+
+    events["start_next_episode"] = False
+    teleop_disable_torque(teleop)
+    logger.info("Teleop enabled - press SPACE, →, or 'c' to start episode")
+
+    while not events["start_next_episode"] and not events["stop_recording"]:
+        loop_start = time.perf_counter()
+        action = teleop.get_action()
+        robot.send_action(action)
+        # Clamp to zero so an overrunning iteration never requests a negative sleep.
+        precise_sleep(max(1 / fps - (time.perf_counter() - loop_start), 0.0))
+
+    events["in_reset"] = False
+    events["start_next_episode"] = False
+    events["exit_early"] = False
+    events["policy_paused"] = False
+    events["correction_active"] = False
+    events["resume_policy"] = False
+
+
+def print_controls(rtc: bool = False):
+    """Print control instructions."""
+    mode = "Human-in-the-Loop Data Collection" + (" (RTC)" if rtc else "")
+    logger.info(
+        "%s\n  Controls:\n"
+        "    SPACE  - Pause policy\n"
+        "    c      - Take control\n"
+        "    p      - Resume policy after pause/correction\n"
+        "    →      - End episode\n"
+        "    ←      - Re-record episode\n"
+        "    ESC    - Stop and push to hub",
+        mode,
+    )
@@ -14,21 +14,17 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import logging
-import time
-
-from lerobot.common.control_utils import init_keyboard_listener, predict_action
+from lerobot.common.control_utils import init_keyboard_listener
 from lerobot.datasets import LeRobotDataset
 from lerobot.policies import make_pre_post_processors
 from lerobot.policies.act import ACTPolicy
-from lerobot.policies.utils import make_robot_action
 from lerobot.processor import make_default_processors
 from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig
+from lerobot.scripts.lerobot_record import record_loop
 from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
-from lerobot.utils.robot_utils import precise_sleep
+from lerobot.utils.feature_utils import hw_to_dataset_features
 from lerobot.utils.utils import log_say
-from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
+from lerobot.utils.visualization_utils import init_rerun

 NUM_EPISODES = 2
 FPS = 30
@@ -39,9 +35,6 @@ HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"


 def main():
-    # NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
-    # This script provides a self-contained example for educational purposes.
-
    # Create the robot configuration & robot
    robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")

@@ -90,67 +83,43 @@ def main():
            raise ValueError("Robot is not connected!")

        print("Starting evaluate loop...")
-        control_interval = 1 / FPS
        recorded_episodes = 0
        while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
            log_say(f"Running inference, recording eval episode {recorded_episodes} of {NUM_EPISODES}")

-            # Inline evaluation loop: predict actions and send to robot
-            timestamp = 0
-            start_episode_t = time.perf_counter()
-            while timestamp < EPISODE_TIME_SEC:
-                start_loop_t = time.perf_counter()
-
-                if events["exit_early"]:
-                    events["exit_early"] = False
-                    break
-
-                # Get robot observation
-                obs = robot.get_observation()
-                obs_processed = robot_observation_processor(obs)
-                observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
-
-                # Predict action using the policy
-                action_tensor = predict_action(
-                    observation=observation_frame,
-                    policy=policy,
-                    device=policy.config.device,
-                    preprocessor=preprocessor,
-                    postprocessor=postprocessor,
-                    use_amp=policy.config.device.type == "cuda",
-                    task=TASK_DESCRIPTION,
-                    robot_type=robot.name,
-                )
-
-                # Convert policy output to robot action dict
-                action_values = make_robot_action(action_tensor, dataset.features)
-
-                # Process and send action to robot
-                robot_action_to_send = robot_action_processor((action_values, obs))
-                robot.send_action(robot_action_to_send)
-
-                # Write to dataset
-                action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
-                frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
-                dataset.add_frame(frame)
-
-                log_rerun_data(observation=obs_processed, action=action_values)
-
-                dt_s = time.perf_counter() - start_loop_t
-                sleep_time_s = control_interval - dt_s
-                if sleep_time_s < 0:
-                    logging.warning(
-                        f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
-                    )
-                precise_sleep(max(sleep_time_s, 0.0))
-                timestamp = time.perf_counter() - start_episode_t
+            # Main record loop
+            record_loop(
+                robot=robot,
+                events=events,
+                fps=FPS,
+                policy=policy,
+                preprocessor=preprocessor,  # Pass the pre and post policy processors
+                postprocessor=postprocessor,
+                dataset=dataset,
+                control_time_s=EPISODE_TIME_SEC,
+                single_task=TASK_DESCRIPTION,
+                display_data=True,
+                teleop_action_processor=teleop_action_processor,
+                robot_action_processor=robot_action_processor,
+                robot_observation_processor=robot_observation_processor,
+            )

            # Reset the environment if not stopping or re-recording
            if not events["stop_recording"] and (
                (recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"]
            ):
                log_say("Reset the environment")
-                log_say("Waiting for environment reset, press right arrow key when ready...")
+                record_loop(
+                    robot=robot,
+                    events=events,
+                    fps=FPS,
+                    control_time_s=EPISODE_TIME_SEC,
+                    single_task=TASK_DESCRIPTION,
+                    display_data=True,
+                    teleop_action_processor=teleop_action_processor,
+                    robot_action_processor=robot_action_processor,
+                    robot_observation_processor=robot_observation_processor,
+                )

            if events["rerecord_episode"]:
                log_say("Re-record episode")
@@ -45,6 +45,9 @@ def main():
    leader_arm = SO100Leader(leader_arm_config)
    keyboard = KeyboardTeleop(keyboard_config)

+    # TODO(Steven): Update this example to use pipelines
+    teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
+
    # Configure the dataset features
    action_features = hw_to_dataset_features(robot.action_features, ACTION)
    obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
@@ -74,10 +77,6 @@ def main():
        if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
            raise ValueError("Robot or teleop is not connected!")

-        teleop_action_processor, robot_action_processor, robot_observation_processor = (
-            make_default_processors()
-        )
-
        print("Starting record loop...")
        recorded_episodes = 0
        while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
@@ -88,14 +87,14 @@ def main():
                robot=robot,
                events=events,
                fps=FPS,
-                teleop_action_processor=teleop_action_processor,
-                robot_action_processor=robot_action_processor,
-                robot_observation_processor=robot_observation_processor,
                dataset=dataset,
                teleop=[leader_arm, keyboard],
                control_time_s=EPISODE_TIME_SEC,
                single_task=TASK_DESCRIPTION,
                display_data=True,
+                teleop_action_processor=teleop_action_processor,
+                robot_action_processor=robot_action_processor,
+                robot_observation_processor=robot_observation_processor,
            )

            # Reset the environment if not stopping or re-recording
@@ -107,13 +106,13 @@ def main():
                    robot=robot,
                    events=events,
                    fps=FPS,
-                    teleop_action_processor=teleop_action_processor,
-                    robot_action_processor=robot_action_processor,
-                    robot_observation_processor=robot_observation_processor,
                    teleop=[leader_arm, keyboard],
                    control_time_s=RESET_TIME_SEC,
                    single_task=TASK_DESCRIPTION,
                    display_data=True,
+                    teleop_action_processor=teleop_action_processor,
+                    robot_action_processor=robot_action_processor,
+                    robot_observation_processor=robot_observation_processor,
                )

            if events["rerecord_episode"]:
@@ -1,77 +0,0 @@
-# !/usr/bin/env python
-
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Run a trained policy on LeKiwi without recording (base rollout).
-
-Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
-no dataset) with :class:`SyncInferenceConfig` (inline policy call per
-control tick).  For a CLI entry point with the same capabilities plus
-recording, upload, and human-in-the-loop variants, see ``lerobot-rollout``.
-"""
-
-from lerobot.configs import PreTrainedConfig
-from lerobot.robots.lekiwi import LeKiwiClientConfig
-from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
-from lerobot.rollout.inference import SyncInferenceConfig
-from lerobot.rollout.strategies import BaseStrategy
-from lerobot.utils.process import ProcessSignalHandler
-from lerobot.utils.utils import init_logging
-
-FPS = 30
-DURATION_SEC = 60
-TASK_DESCRIPTION = "My task description"
-HF_MODEL_ID = "<hf_username>/<model_repo_id>"
-
-
-def main():
-    init_logging()
-
-    # Robot: LeKiwi client — make sure lekiwi_host is already running on the robot.
-    robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
-
-    # Policy: load the pretrained config.  ``pretrained_path`` is read downstream
-    # by ``build_rollout_context`` to reload the full model.
-    policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
-    policy_config.pretrained_path = HF_MODEL_ID
-
-    # Assemble the rollout config: base strategy (no recording) + sync inference.
-    cfg = RolloutConfig(
-        robot=robot_config,
-        policy=policy_config,
-        strategy=BaseStrategyConfig(),
-        inference=SyncInferenceConfig(),
-        fps=FPS,
-        duration=DURATION_SEC,
-        task=TASK_DESCRIPTION,
-    )
-
-    # Graceful Ctrl-C: the strategy loop exits when shutdown_event is set.
-    signal_handler = ProcessSignalHandler(use_threads=True)
-
-    # Build the context (connects robot, loads policy, wires the inference strategy).
-    # No custom processors here — LeKiwi runs on raw joint features.
-    ctx = build_rollout_context(cfg, signal_handler.shutdown_event)
-
-    strategy = BaseStrategy(cfg.strategy)
-    try:
-        strategy.setup(ctx)
-        strategy.run(ctx)
-    finally:
-        strategy.teardown(ctx)
-
-
-if __name__ == "__main__":
-    main()
@@ -14,17 +14,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import logging
-import time
-
 from lerobot.cameras.opencv import OpenCVCameraConfig
-from lerobot.common.control_utils import init_keyboard_listener, predict_action
+from lerobot.common.control_utils import init_keyboard_listener
 from lerobot.configs import FeatureType, PolicyFeature
 from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
 from lerobot.model.kinematics import RobotKinematics
 from lerobot.policies import make_pre_post_processors
 from lerobot.policies.act import ACTPolicy
-from lerobot.policies.utils import make_robot_action
 from lerobot.processor import (
    RobotProcessorPipeline,
    make_default_teleop_action_processor,
@@ -38,12 +34,11 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
    ForwardKinematicsJointsToEE,
    InverseKinematicsEEToJoints,
 )
+from lerobot.scripts.lerobot_record import record_loop
 from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
-from lerobot.utils.robot_utils import precise_sleep
+from lerobot.utils.feature_utils import combine_feature_dicts
 from lerobot.utils.utils import log_say
-from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
+from lerobot.utils.visualization_utils import init_rerun

 NUM_EPISODES = 5
 FPS = 30
@@ -54,9 +49,6 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"


 def main():
-    # NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
-    # This script provides a self-contained example for educational purposes.
-
    # Create the robot configuration & robot
    camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
    robot_config = SO100FollowerConfig(
@@ -151,67 +143,43 @@ def main():
            raise ValueError("Robot is not connected!")

        print("Starting evaluate loop...")
-        control_interval = 1 / FPS
        episode_idx = 0
        for episode_idx in range(NUM_EPISODES):
            log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")

-            # Inline evaluation loop: predict actions and send to robot
-            timestamp = 0
-            start_episode_t = time.perf_counter()
-            while timestamp < EPISODE_TIME_SEC:
-                start_loop_t = time.perf_counter()
-
-                if events["exit_early"]:
-                    events["exit_early"] = False
-                    break
-
-                # Get robot observation
-                obs = robot.get_observation()
-                obs_processed = robot_joints_to_ee_pose_processor(obs)
-                observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
-
-                # Predict action using the policy
-                action_tensor = predict_action(
-                    observation=observation_frame,
-                    policy=policy,
-                    device=policy.config.device,
-                    preprocessor=preprocessor,
-                    postprocessor=postprocessor,
-                    use_amp=policy.config.device.type == "cuda",
-                    task=TASK_DESCRIPTION,
-                    robot_type=robot.name,
-                )
-
-                # Convert policy output to robot action dict
-                action_values = make_robot_action(action_tensor, dataset.features)
-
-                # Process and send action to robot (EE -> joints via IK)
-                robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
-                robot.send_action(robot_action_to_send)
-
-                # Write to dataset
-                action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
-                frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
-                dataset.add_frame(frame)
-
-                log_rerun_data(observation=obs_processed, action=action_values)
-
-                dt_s = time.perf_counter() - start_loop_t
-                sleep_time_s = control_interval - dt_s
-                if sleep_time_s < 0:
-                    logging.warning(
-                        f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
-                    )
-                precise_sleep(max(sleep_time_s, 0.0))
-                timestamp = time.perf_counter() - start_episode_t
+            # Main record loop
+            record_loop(
+                robot=robot,
+                events=events,
+                fps=FPS,
+                policy=policy,
+                preprocessor=preprocessor,  # Pass the pre and post policy processors
+                postprocessor=postprocessor,
+                dataset=dataset,
+                control_time_s=EPISODE_TIME_SEC,
+                single_task=TASK_DESCRIPTION,
+                display_data=True,
+                teleop_action_processor=make_default_teleop_action_processor(),
+                robot_action_processor=robot_ee_to_joints_processor,
+                robot_observation_processor=robot_joints_to_ee_pose_processor,
+            )

            # Reset the environment if not stopping or re-recording
            if not events["stop_recording"] and (
                (episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
            ):
                log_say("Reset the environment")
-                log_say("Waiting for environment reset, press right arrow key when ready...")
+                record_loop(
+                    robot=robot,
+                    events=events,
+                    fps=FPS,
+                    control_time_s=EPISODE_TIME_SEC,
+                    single_task=TASK_DESCRIPTION,
+                    display_data=True,
+                    teleop_action_processor=make_default_teleop_action_processor(),
+                    robot_action_processor=robot_ee_to_joints_processor,
+                    robot_observation_processor=robot_joints_to_ee_pose_processor,
+                )

            if events["rerecord_episode"]:
                log_say("Re-record episode")
@@ -222,6 +190,7 @@ def main():

            # Save episode
            dataset.save_episode()
+            episode_idx += 1
    finally:
        # Clean up
        log_say("Stop recording")
@@ -65,15 +65,14 @@ def main():
    robot = SO100Follower(robot_config)
    phone = Phone(teleop_config)

-    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
-    #   https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
+    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
    kinematics_solver = RobotKinematics(
        urdf_path="./SO101/so101_new_calib.urdf",
        target_frame_name="gripper_frame_link",
        joint_names=list(robot.bus.motors.keys()),
    )

-    # Build pipeline to convert phone action to EE action (with gripper velocity mapped to joint).
+    # Build pipeline to convert phone action to EE action
    phone_to_robot_ee_pose_processor = RobotProcessorPipeline[
        tuple[RobotAction, RobotObservation], RobotAction
    ](
@@ -95,7 +94,7 @@ def main():
        to_output=transition_to_robot_action,
    )

-    # Build pipeline to convert EE action to joints action (IK).
+    # Build pipeline to convert EE action to joints action
    robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
        steps=[
            InverseKinematicsEEToJoints(
@@ -108,7 +107,7 @@ def main():
        to_output=transition_to_robot_action,
    )

-    # Build pipeline to convert joint observation to EE observation (FK).
+    # Build pipeline to convert joint observation to EE observation
    robot_joints_to_ee_pose = RobotProcessorPipeline[RobotObservation, RobotObservation](
        steps=[
            ForwardKinematicsJointsToEE(
@@ -119,12 +118,13 @@ def main():
        to_output=transition_to_observation,
    )

-    # Create the dataset, deriving features from the pipelines so the on-disk schema
-    # matches exactly what the pipelines produce at runtime.
+    # Create the dataset
    dataset = LeRobotDataset.create(
        repo_id=HF_REPO_ID,
        fps=FPS,
        features=combine_feature_dicts(
+            # Run the feature contract of the pipelines.
+            # This tells you what the features will look like after the pipeline steps.
            aggregate_pipeline_dataset_features(
                pipeline=phone_to_robot_ee_pose_processor,
                initial_features=create_initial_features(action=phone.action_features),
@@ -163,14 +163,14 @@ def main():
                robot=robot,
                events=events,
                fps=FPS,
-                teleop_action_processor=phone_to_robot_ee_pose_processor,
-                robot_action_processor=robot_ee_to_joints_processor,
-                robot_observation_processor=robot_joints_to_ee_pose,
                teleop=phone,
                dataset=dataset,
                control_time_s=EPISODE_TIME_SEC,
                single_task=TASK_DESCRIPTION,
                display_data=True,
+                teleop_action_processor=phone_to_robot_ee_pose_processor,
+                robot_action_processor=robot_ee_to_joints_processor,
+                robot_observation_processor=robot_joints_to_ee_pose,
            )

            # Reset the environment if not stopping or re-recording
@@ -182,13 +182,13 @@ def main():
                    robot=robot,
                    events=events,
                    fps=FPS,
-                    teleop_action_processor=phone_to_robot_ee_pose_processor,
-                    robot_action_processor=robot_ee_to_joints_processor,
-                    robot_observation_processor=robot_joints_to_ee_pose,
                    teleop=phone,
                    control_time_s=RESET_TIME_SEC,
                    single_task=TASK_DESCRIPTION,
                    display_data=True,
+                    teleop_action_processor=phone_to_robot_ee_pose_processor,
+                    robot_action_processor=robot_ee_to_joints_processor,
+                    robot_observation_processor=robot_joints_to_ee_pose,
                )

            if events["rerecord_episode"]:
@@ -1,126 +0,0 @@
-# !/usr/bin/env python
-
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Run a trained EE-space policy on SO100 (phone-trained) without recording.
-
-Mirrors ``examples/so100_to_so100_EE/rollout.py`` — the model was trained
-with phone teleoperation in EE space, so at deployment we only need the
-joint↔EE conversion on the robot side; the phone is not used.
-
-Uses :class:`BaseStrategy` (no recording) + :class:`SyncInferenceConfig`
-(inline policy call).  For recording during rollout, switch to Sentry,
-Highlight, or DAgger via ``lerobot-rollout --strategy.type=...``.
-"""
-
-from lerobot.cameras.opencv import OpenCVCameraConfig
-from lerobot.configs import PreTrainedConfig
-from lerobot.model.kinematics import RobotKinematics
-from lerobot.processor import (
-    RobotProcessorPipeline,
-    observation_to_transition,
-    robot_action_observation_to_transition,
-    transition_to_observation,
-    transition_to_robot_action,
-)
-from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
-from lerobot.robots.so_follower.robot_kinematic_processor import (
-    ForwardKinematicsJointsToEE,
-    InverseKinematicsEEToJoints,
-)
-from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
-from lerobot.rollout.inference import SyncInferenceConfig
-from lerobot.rollout.strategies import BaseStrategy
-from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.process import ProcessSignalHandler
-from lerobot.utils.utils import init_logging
-
-FPS = 30
-DURATION_SEC = 60
-TASK_DESCRIPTION = "My task description"
-HF_MODEL_ID = "<hf_username>/<model_repo_id>"
-
-
-def main():
-    init_logging()
-
-    camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
-    robot_config = SO100FollowerConfig(
-        port="/dev/tty.usbmodem58760434471",
-        id="my_awesome_follower_arm",
-        cameras=camera_config,
-        use_degrees=True,
-    )
-
-    # Peek at motor names once to build the kinematic solver.
-    temp_robot = SO100Follower(robot_config)
-    motor_names = list(temp_robot.bus.motors.keys())
-
-    kinematics_solver = RobotKinematics(
-        urdf_path="./SO101/so101_new_calib.urdf",
-        target_frame_name="gripper_frame_link",
-        joint_names=motor_names,
-    )
-
-    robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
-        steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
-        to_transition=observation_to_transition,
-        to_output=transition_to_observation,
-    )
-
-    robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
-        steps=[
-            InverseKinematicsEEToJoints(
-                kinematics=kinematics_solver,
-                motor_names=motor_names,
-                initial_guess_current_joints=True,
-            ),
-        ],
-        to_transition=robot_action_observation_to_transition,
-        to_output=transition_to_robot_action,
-    )
-
-    policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
-    policy_config.pretrained_path = HF_MODEL_ID
-
-    cfg = RolloutConfig(
-        robot=robot_config,
-        policy=policy_config,
-        strategy=BaseStrategyConfig(),
-        inference=SyncInferenceConfig(),
-        fps=FPS,
-        duration=DURATION_SEC,
-        task=TASK_DESCRIPTION,
-    )
-
-    signal_handler = ProcessSignalHandler(use_threads=True)
-
-    ctx = build_rollout_context(
-        cfg,
-        signal_handler.shutdown_event,
-        robot_action_processor=robot_ee_to_joints_processor,
-        robot_observation_processor=robot_joints_to_ee_pose_processor,
-    )
-
-    strategy = BaseStrategy(cfg.strategy)
-    try:
-        strategy.setup(ctx)
-        strategy.run(ctx)
-    finally:
-        strategy.teardown(ctx)
-
-
-if __name__ == "__main__":
-    main()
@@ -0,0 +1,673 @@
+#!/usr/bin/env python
+
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Demo script showing how to use Real-Time Chunking (RTC) with action chunking policies on real robots.
+
+This script demonstrates:
+1. Creating a robot and policy (SmolVLA, Pi0, etc.) with RTC
+2. Consuming actions from the policy while the robot executes
+3. Periodically requesting new action chunks in the background using threads
+4. Managing action buffers and timing for real-time operation
+
+For simulation environments, see eval_with_simulation.py
+
+Usage:
+    # Run on a real robot with RTC enabled
+    uv run examples/rtc/eval_with_real_robot.py \
+        --policy.path=<USER>/smolvla_check_rtc_last3 \
+        --policy.device=mps \
+        --rtc.enabled=true \
+        --rtc.execution_horizon=20 \
+        --robot.type=so100_follower \
+        --robot.port=/dev/tty.usbmodem58FA0834591 \
+        --robot.id=so100_follower \
+        --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+        --task="Move green small object into the purple platform" \
+        --duration=120
+
+    # Run on a real robot with RTC disabled
+    uv run examples/rtc/eval_with_real_robot.py \
+        --policy.path=<USER>/smolvla_check_rtc_last3 \
+        --policy.device=mps \
+        --rtc.enabled=false \
+        --robot.type=so100_follower \
+        --robot.port=/dev/tty.usbmodem58FA0834591 \
+        --robot.id=so100_follower \
+        --robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+        --task="Move green small object into the purple platform" \
+        --duration=120
+
+    # Run RTC on a real robot with the pi0.5 policy
+    uv run examples/rtc/eval_with_real_robot.py \
+        --policy.path=<USER>/pi05_check_rtc \
+        --policy.device=mps \
+        --rtc.enabled=true \
+        --rtc.execution_horizon=20 \
+        --robot.type=so100_follower \
+        --robot.port=/dev/tty.usbmodem58FA0834591 \
+        --robot.id=so100_follower \
+        --robot.cameras="{ gripper: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}}" \
+        --task="Move green small object into the purple platform" \
+        --duration=120
+
+    # Run RTC with bi_openarm_follower (dual-arm OpenArms) and pi0.5 policy
+    python examples/rtc/eval_with_real_robot.py \
+        --policy.path=lerobot-data-collection/folding_final \
+        --robot.type=bi_openarm_follower \
+        --robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}}' \
+        --robot.left_arm_config.port=can0 \
+        --robot.left_arm_config.side=left \
+        --robot.left_arm_config.can_interface=socketcan \
+        --robot.left_arm_config.disable_torque_on_disconnect=true \
+        --robot.left_arm_config.max_relative_target=8.0 \
+        --robot.right_arm_config.port=can1 \
+        --robot.right_arm_config.side=right \
+        --robot.right_arm_config.can_interface=socketcan \
+        --robot.right_arm_config.disable_torque_on_disconnect=true \
+        --robot.right_arm_config.max_relative_target=8.0 \
+        --task="Fold the T-shirt properly" \
+        --fps=30 \
+        --duration=2000 \
+        --interpolation_multiplier=3 \
+        --rtc.enabled=true \
+        --rtc.execution_horizon=20 \
+        --rtc.max_guidance_weight=5.0 \
+        --rtc.prefix_attention_schedule=LINEAR \
+        --device=cuda
+"""
+
+import logging
+import math
+import sys
+import time
+import traceback
+from dataclasses import dataclass, field
+from threading import Event, Lock, Thread
+
+import torch
+from torch import Tensor
+
+from lerobot.cameras.opencv import OpenCVCameraConfig  # noqa: F401
+from lerobot.cameras.realsense import RealSenseCameraConfig  # noqa: F401
+from lerobot.cameras.zmq import ZMQCameraConfig  # noqa: F401
+from lerobot.configs import PreTrainedConfig, RTCAttentionSchedule, parser
+from lerobot.policies import get_policy_class, make_pre_post_processors
+from lerobot.policies.rtc import ActionInterpolator, ActionQueue, LatencyTracker, RTCConfig
+from lerobot.processor import (
+    NormalizerProcessorStep,
+    RelativeActionsProcessorStep,
+    TransitionKey,
+    create_transition,
+    make_default_robot_action_processor,
+    make_default_robot_observation_processor,
+    to_relative_actions,
+)
+from lerobot.rl.process import ProcessSignalHandler
+from lerobot.robots import (  # noqa: F401
+    Robot,
+    RobotConfig,
+    bi_openarm_follower,
+    bi_so_follower,
+    koch_follower,
+    so_follower,
+    unitree_g1,
+)
+from lerobot.robots.utils import make_robot_from_config
+from lerobot.utils.constants import OBS_IMAGES, OBS_STATE
+from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
+from lerobot.utils.hub import HubMixin
+from lerobot.utils.utils import init_logging
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+class RobotWrapper:
+    def __init__(self, robot: Robot):
+        self.robot = robot
+        self.lock = Lock()
+
+    def get_observation(self) -> dict[str, Tensor]:
+        with self.lock:
+            return self.robot.get_observation()
+
+    def send_action(self, action: Tensor):
+        with self.lock:
+            self.robot.send_action(action)
+
+    def observation_features(self) -> dict[str, type | tuple]:
+        with self.lock:
+            return self.robot.observation_features
+
+    def action_features(self) -> dict[str, type]:
+        with self.lock:
+            return self.robot.action_features
+
+
+@dataclass
+class RTCDemoConfig(HubMixin):
+    """Configuration for RTC demo with action chunking policies and real robots."""
+
+    # Policy configuration
+    policy: PreTrainedConfig | None = None
+
+    # Robot configuration
+    robot: RobotConfig | None = None
+
+    # RTC configuration
+    rtc: RTCConfig = field(
+        default_factory=lambda: RTCConfig(
+            execution_horizon=10,
+            max_guidance_weight=1.0,
+            prefix_attention_schedule=RTCAttentionSchedule.EXP,
+        )
+    )
+
+    # Demo parameters
+    duration: float = 30.0  # Duration to run the demo (seconds)
+    fps: float = 10.0  # Action execution frequency (Hz)
+    interpolation_multiplier: int = 1  # Control rate multiplier (1=off, 2=2x, 3=3x)
+
+    # Compute device
+    device: str | None = None  # Device to run on (cuda, cpu, auto)
+
+    # Request a new action chunk once the queue holds this many actions or fewer.
+    # It should be higher than inference delay + execution horizon.
+    action_queue_size_to_get_new_actions: int = 30
+
+    # Task to execute
+    task: str = field(default="", metadata={"help": "Task to execute"})
+
+    # Torch compile configuration
+    use_torch_compile: bool = field(
+        default=False,
+        metadata={"help": "Use torch.compile for faster inference (PyTorch 2.0+)"},
+    )
+
+    torch_compile_backend: str = field(
+        default="inductor",
+        metadata={"help": "Backend for torch.compile (inductor, aot_eager, cudagraphs)"},
+    )
+
+    torch_compile_mode: str = field(
+        default="default",
+        metadata={"help": "Compilation mode (default, reduce-overhead, max-autotune)"},
+    )
+
+    torch_compile_disable_cudagraphs: bool = field(
+        default=True,
+        metadata={
+            "help": "Disable CUDA graphs in torch.compile. Required due to in-place tensor "
+            "operations in denoising loop (x_t += dt * v_t) which cause tensor aliasing issues."
+        },
+    )
+
+    def __post_init__(self):
+        # HACK: We parse the CLI args again here to get the pretrained path if there was one.
+        policy_path = parser.get_path_arg("policy")
+        if policy_path:
+            cli_overrides = parser.get_cli_overrides("policy")
+            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
+            self.policy.pretrained_path = policy_path
+        else:
+            raise ValueError("Policy path is required")
+
+        # Validate that robot configuration is provided
+        if self.robot is None:
+            raise ValueError("Robot configuration must be provided")
+
+    @classmethod
+    def __get_path_fields__(cls) -> list[str]:
+        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
+        return ["policy"]
+
+
+def is_image_key(k: str) -> bool:
+    return k.startswith(OBS_IMAGES)
+
+
+def _reanchor_relative_rtc_prefix(
+    prev_actions_absolute: Tensor,
+    current_state: Tensor,
+    relative_step: RelativeActionsProcessorStep,
+    normalizer_step: NormalizerProcessorStep | None,
+    policy_device: torch.device | str,
+) -> Tensor:
+    """Convert absolute leftovers into model-space for relative-action RTC policies.
+
+    When a policy uses relative actions, the RTC prefix (leftover actions from
+    the previous chunk) is stored in absolute space. Before feeding it back to
+    the policy we need to re-express it relative to the *current* robot state
+    and then re-normalize.
+    """
+    state = current_state.detach().cpu()
+    if state.dim() == 1:
+        state = state.unsqueeze(0)
+
+    action_cpu = prev_actions_absolute.detach().cpu()
+    mask = relative_step._build_mask(action_cpu.shape[-1])
+    relative_actions = to_relative_actions(action_cpu, state, mask)
+
+    transition = create_transition(action=relative_actions)
+    if normalizer_step is not None:
+        transition = normalizer_step(transition)
+
+    return transition[TransitionKey.ACTION].to(policy_device)
+
+
+def get_actions(
+    policy,
+    robot: RobotWrapper,
+    robot_observation_processor,
+    action_queue: ActionQueue,
+    shutdown_event: Event,
+    cfg: RTCDemoConfig,
+):
+    """Thread function to request action chunks from the policy.
+
+    Args:
+        policy: The policy instance (SmolVLA, Pi0, etc.)
+        robot: The robot instance for getting observations
+        robot_observation_processor: Processor for raw robot observations
+        action_queue: Queue to put new action chunks
+        shutdown_event: Event to signal shutdown
+        cfg: Demo configuration
+    """
+    try:
+        logger.info("[GET_ACTIONS] Starting get actions thread")
+
+        latency_tracker = LatencyTracker()  # Track latency of action chunks
+        fps = cfg.fps
+        time_per_chunk = 1.0 / fps  # Seconds per action step (one control tick), not per chunk
+
+        # Keep only the .pos joints and camera streams: the policy is assumed to be trained
+        # on positions, not on the full pos/vel/torque state the robot exposes.
+        observation_features_hw = {
+            key: value
+            for key, value in robot.observation_features().items()
+            if key.endswith(".pos") or isinstance(value, tuple)
+        }
+
+        dataset_features = hw_to_dataset_features(observation_features_hw, "observation")
+        policy_device = policy.config.device
+
+        # Load preprocessor and postprocessor from pretrained files
+        # The stats are embedded in the processor .safetensors files
+        logger.info(f"[GET_ACTIONS] Loading preprocessor/postprocessor from {cfg.policy.pretrained_path}")
+
+        preprocessor, postprocessor = make_pre_post_processors(
+            policy_cfg=cfg.policy,
+            pretrained_path=cfg.policy.pretrained_path,
+            dataset_stats=None,  # Will load from pretrained processor files
+            preprocessor_overrides={
+                "device_processor": {"device": cfg.policy.device},
+            },
+        )
+
+        logger.info("[GET_ACTIONS] Preprocessor/postprocessor loaded successfully with embedded stats")
+
+        relative_step = next(
+            (s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
+            None,
+        )
+        normalizer_step = next(
+            (s for s in preprocessor.steps if isinstance(s, NormalizerProcessorStep)),
+            None,
+        )
+        if relative_step is not None:
+            if relative_step.action_names is None:
+                cfg_names = getattr(cfg.policy, "action_feature_names", None)
+                if cfg_names:
+                    relative_step.action_names = list(cfg_names)
+                else:
+                    relative_step.action_names = [
+                        k for k in robot.robot.action_features if k.endswith(".pos")
+                    ]
+            logger.info("[GET_ACTIONS] Relative actions enabled: will re-anchor RTC prefix")
+
+        get_actions_threshold = cfg.action_queue_size_to_get_new_actions
+
+        if not cfg.rtc.enabled:
+            get_actions_threshold = 0
+
+        while not shutdown_event.is_set():
+            if action_queue.qsize() <= get_actions_threshold:
+                current_time = time.perf_counter()
+                action_index_before_inference = action_queue.get_action_index()
+                prev_actions = action_queue.get_left_over()
+
+                inference_latency = latency_tracker.max()
+                inference_delay = math.ceil(inference_latency / time_per_chunk)
+
+                obs = robot.get_observation()
+
+                # Apply robot observation processor
+                obs_processed = robot_observation_processor(obs)
+
+                obs_with_policy_features = build_dataset_frame(
+                    dataset_features, obs_processed, prefix="observation"
+                )
+
+                for name in obs_with_policy_features:
+                    obs_with_policy_features[name] = torch.from_numpy(obs_with_policy_features[name])
+                    if "image" in name:
+                        obs_with_policy_features[name] = (
+                            obs_with_policy_features[name].type(torch.float32) / 255
+                        )
+                        obs_with_policy_features[name] = (
+                            obs_with_policy_features[name].permute(2, 0, 1).contiguous()
+                        )
+                    obs_with_policy_features[name] = obs_with_policy_features[name].unsqueeze(0)
+                    obs_with_policy_features[name] = obs_with_policy_features[name].to(policy_device)
+
+                obs_with_policy_features["task"] = [cfg.task]  # Task should be a list, not a string!
+                obs_with_policy_features["robot_type"] = (
+                    robot.robot.name if hasattr(robot.robot, "name") else ""
+                )
+
+                preprocessed_obs = preprocessor(obs_with_policy_features)
+
+                # Re-anchor leftover actions for relative-action policies.
+                # We need the *postprocessed* (absolute) leftover, not the original
+                # (normalized/relative) one that get_left_over() returns.
+                if (
+                    prev_actions is not None
+                    and relative_step is not None
+                    and OBS_STATE in obs_with_policy_features
+                ):
+                    with action_queue.lock:
+                        if action_queue.queue is not None:
+                            prev_actions_abs = action_queue.queue[action_queue.last_index :].clone()
+                        else:
+                            prev_actions_abs = None
+                    if prev_actions_abs is not None and prev_actions_abs.numel() > 0:
+                        prev_actions = _reanchor_relative_rtc_prefix(
+                            prev_actions_absolute=prev_actions_abs,
+                            current_state=obs_with_policy_features[OBS_STATE],
+                            relative_step=relative_step,
+                            normalizer_step=normalizer_step,
+                            policy_device=policy_device,
+                        )
+
+                # Generate actions WITH RTC
+                actions = policy.predict_action_chunk(
+                    preprocessed_obs,
+                    inference_delay=inference_delay,
+                    prev_chunk_left_over=prev_actions,
+                )
+
+                # Store original actions (before postprocessing) for RTC
+                original_actions = actions.squeeze(0).clone()
+
+                postprocessed_actions = postprocessor(actions)
+
+                postprocessed_actions = postprocessed_actions.squeeze(0)
+
+                new_latency = time.perf_counter() - current_time
+                new_delay = math.ceil(new_latency / time_per_chunk)
+                latency_tracker.add(new_latency)
+
+                if cfg.action_queue_size_to_get_new_actions < cfg.rtc.execution_horizon + new_delay:
+                    logger.warning(
+                        "[GET_ACTIONS] cfg.action_queue_size_to_get_new_actions is too small. It should be higher than inference delay + execution horizon."
+                    )
+
+                action_queue.merge(
+                    original_actions, postprocessed_actions, new_delay, action_index_before_inference
+                )
+            else:
+                # Small sleep to prevent busy waiting
+                time.sleep(0.1)
+
+        logger.info("[GET_ACTIONS] get actions thread shutting down")
+    except Exception as e:
+        logger.error(f"[GET_ACTIONS] Fatal exception in get_actions thread: {e}")
+        logger.error(traceback.format_exc())
+        sys.exit(1)
+
+
+def actor_control(
+    robot: RobotWrapper,
+    robot_action_processor,
+    action_queue: ActionQueue,
+    shutdown_event: Event,
+    cfg: RTCDemoConfig,
+):
+    """Thread function to execute actions on the robot.
+
+    Args:
+        robot: The robot instance
+        action_queue: Queue to get actions from
+        shutdown_event: Event to signal shutdown
+        cfg: Demo configuration
+    """
+    try:
+        logger.info("[ACTOR] Starting actor thread")
+
+        action_keys = [k for k in robot.action_features() if k.endswith(".pos")]
+
+        action_count = 0
+        interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
+        action_interval = interpolator.get_control_interval(cfg.fps)
+
+        while not shutdown_event.is_set():
+            start_time = time.perf_counter()
+
+            if interpolator.needs_new_action():
+                new_action = action_queue.get()
+                if new_action is not None:
+                    interpolator.add(new_action.cpu())
+
+            action = interpolator.get()
+            if action is not None:
+                action = action.cpu()
+                action_dict = {key: action[i].item() for i, key in enumerate(action_keys)}
+                action_processed = robot_action_processor((action_dict, None))
+                robot.send_action(action_processed)
+                action_count += 1
+
+            dt_s = time.perf_counter() - start_time
+            time.sleep(max(0, (action_interval - dt_s) - 0.001))
+
+        logger.info(f"[ACTOR] Actor thread shutting down. Total actions executed: {action_count}")
+    except Exception as e:
+        logger.error(f"[ACTOR] Fatal exception in actor_control thread: {e}")
+        logger.error(traceback.format_exc())
+        sys.exit(1)
+
+
+def _apply_torch_compile(policy, cfg: RTCDemoConfig):
+    """Apply torch.compile to the policy's predict_action_chunk method.
+
+    Args:
+        policy: Policy instance to compile
+        cfg: Configuration containing torch compile settings
+
+    Returns:
+        Policy with compiled predict_action_chunk method
+    """
+
+    # PI models handle their own compilation
+    if policy.type == "pi05" or policy.type == "pi0":
+        return policy
+
+    try:
+        # Check if torch.compile is available (PyTorch 2.0+)
+        if not hasattr(torch, "compile"):
+            logger.warning(
+                f"torch.compile is not available. Requires PyTorch 2.0+. "
+                f"Current version: {torch.__version__}. Skipping compilation."
+            )
+            return policy
+
+        logger.info("Applying torch.compile to predict_action_chunk...")
+        logger.info(f"  Backend: {cfg.torch_compile_backend}")
+        logger.info(f"  Mode: {cfg.torch_compile_mode}")
+        logger.info(f"  Disable CUDA graphs: {cfg.torch_compile_disable_cudagraphs}")
+
+        # Compile the predict_action_chunk method
+        # - CUDA graphs disabled to prevent tensor aliasing from in-place ops (x_t += dt * v_t)
+        compile_kwargs = {
+            "backend": cfg.torch_compile_backend,
+            "mode": cfg.torch_compile_mode,
+        }
+
+        # Disable CUDA graphs if requested (prevents tensor aliasing issues)
+        if cfg.torch_compile_disable_cudagraphs:
+            compile_kwargs["options"] = {"triton.cudagraphs": False}
+
+        original_method = policy.predict_action_chunk
+        compiled_method = torch.compile(original_method, **compile_kwargs)
+        policy.predict_action_chunk = compiled_method
+        logger.info("✓ Successfully compiled predict_action_chunk")
+
+    except Exception as e:
+        logger.error(f"Failed to apply torch.compile: {e}")
+        logger.warning("Continuing without torch.compile")
+
+    return policy
+
+
+@parser.wrap()
+def demo_cli(cfg: RTCDemoConfig):
+    """Main entry point for RTC demo with draccus configuration."""
+
+    # Initialize logging
+    init_logging()
+
+    logger.info(f"Using device: {cfg.device}")
+
+    # Setup signal handler for graceful shutdown
+    signal_handler = ProcessSignalHandler(use_threads=True, display_pid=False)
+    shutdown_event = signal_handler.shutdown_event
+
+    policy = None
+    robot = None
+    get_actions_thread = None
+    actor_thread = None
+
+    policy_class = get_policy_class(cfg.policy.type)
+
+    # Load config and set compile_model for pi0/pi05 models
+    config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
+
+    if cfg.policy.type == "pi05" or cfg.policy.type == "pi0":
+        config.compile_model = cfg.use_torch_compile
+
+    if config.use_peft:
+        from peft import PeftConfig, PeftModel
+
+        peft_pretrained_path = cfg.policy.pretrained_path
+        peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
+
+        policy = policy_class.from_pretrained(
+            pretrained_name_or_path=peft_config.base_model_name_or_path, config=config
+        )
+        policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
+    else:
+        policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)
+
+    # Turn on RTC
+    policy.config.rtc_config = cfg.rtc
+
+    # Init the RTC processor explicitly: by default, if RTC is disabled in the config,
+    # the processor won't be created
+    policy.init_rtc_processor()
+
+    assert policy.name in ["smolvla", "pi05", "pi0"], "Only smolvla, pi05, and pi0 are supported for RTC"
+
+    policy = policy.to(cfg.device)
+    policy.eval()
+
+    # Apply torch.compile to predict_action_chunk method if enabled
+    if cfg.use_torch_compile:
+        policy = _apply_torch_compile(policy, cfg)
+
+    # Create robot
+    logger.info(f"Initializing robot: {cfg.robot.type}")
+    robot = make_robot_from_config(cfg.robot)
+    robot.connect()
+    robot_wrapper = RobotWrapper(robot)
+
+    # Create robot observation and action processors
+    robot_observation_processor = make_default_robot_observation_processor()
+    robot_action_processor = make_default_robot_action_processor()
+
+    # Create action queue for communication between threads
+    action_queue = ActionQueue(cfg.rtc)
+
+    # Start chunk requester thread
+    get_actions_thread = Thread(
+        target=get_actions,
+        args=(policy, robot_wrapper, robot_observation_processor, action_queue, shutdown_event, cfg),
+        daemon=True,
+        name="GetActions",
+    )
+    get_actions_thread.start()
+    logger.info("Started get actions thread")
+
+    # Start action executor thread
+    actor_thread = Thread(
+        target=actor_control,
+        args=(robot_wrapper, robot_action_processor, action_queue, shutdown_event, cfg),
+        daemon=True,
+        name="Actor",
+    )
+    actor_thread.start()
+    logger.info("Started actor thread")
+
+    logger.info("Main thread will stop the demo once the configured duration elapses")
+
+    # Main thread monitors for duration or shutdown
+    logger.info(f"Running demo for {cfg.duration} seconds...")
+    start_time = time.time()
+
+    while not shutdown_event.is_set() and (time.time() - start_time) < cfg.duration:
+        time.sleep(10)
+
+        # Log queue status periodically
+        if int(time.time() - start_time) % 5 == 0:
+            logger.info(f"[MAIN] Action queue size: {action_queue.qsize()}")
+
+        if time.time() - start_time > cfg.duration:
+            break
+
+    logger.info("Demo duration reached or shutdown requested")
+
+    # Signal shutdown
+    shutdown_event.set()
+
+    # Wait for threads to finish
+    if get_actions_thread and get_actions_thread.is_alive():
+        logger.info("Waiting for chunk requester thread to finish...")
+        get_actions_thread.join()
+
+    if actor_thread and actor_thread.is_alive():
+        logger.info("Waiting for action executor thread to finish...")
+        actor_thread.join()
+
+    # Cleanup robot
+    if robot:
+        robot.disconnect()
+        logger.info("Robot disconnected")
+
+    logger.info("Cleanup completed")
+
+
+if __name__ == "__main__":
+    demo_cli()
+    logging.info("RTC demo finished")
@@ -14,17 +14,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import logging
-import time
-
 from lerobot.cameras.opencv import OpenCVCameraConfig
-from lerobot.common.control_utils import init_keyboard_listener, predict_action
+from lerobot.common.control_utils import init_keyboard_listener
 from lerobot.configs import FeatureType, PolicyFeature
 from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
 from lerobot.model.kinematics import RobotKinematics
 from lerobot.policies import make_pre_post_processors
 from lerobot.policies.act import ACTPolicy
-from lerobot.policies.utils import make_robot_action
 from lerobot.processor import (
    RobotProcessorPipeline,
    make_default_teleop_action_processor,
@@ -38,12 +34,11 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
    ForwardKinematicsJointsToEE,
    InverseKinematicsEEToJoints,
 )
+from lerobot.scripts.lerobot_record import record_loop
 from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
-from lerobot.utils.robot_utils import precise_sleep
+from lerobot.utils.feature_utils import combine_feature_dicts
 from lerobot.utils.utils import log_say
-from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
+from lerobot.utils.visualization_utils import init_rerun

 NUM_EPISODES = 5
 FPS = 30
@@ -54,9 +49,6 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"


 def main():
-    # NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
-    # This script provides a self-contained example for educational purposes.
-
    # Create the robot configuration & robot
    camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
    robot_config = SO100FollowerConfig(
@@ -151,67 +143,43 @@ def main():
            raise ValueError("Robot is not connected!")

        print("Starting evaluate loop...")
-        control_interval = 1 / FPS
        episode_idx = 0
        for episode_idx in range(NUM_EPISODES):
            log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")

-            # Inline evaluation loop: predict actions and send to robot
-            timestamp = 0
-            start_episode_t = time.perf_counter()
-            while timestamp < EPISODE_TIME_SEC:
-                start_loop_t = time.perf_counter()
-
-                if events["exit_early"]:
-                    events["exit_early"] = False
-                    break
-
-                # Get robot observation
-                obs = robot.get_observation()
-                obs_processed = robot_joints_to_ee_pose_processor(obs)
-                observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
-
-                # Predict action using the policy
-                action_tensor = predict_action(
-                    observation=observation_frame,
-                    policy=policy,
-                    device=policy.config.device,
-                    preprocessor=preprocessor,
-                    postprocessor=postprocessor,
-                    use_amp=policy.config.device.type == "cuda",
-                    task=TASK_DESCRIPTION,
-                    robot_type=robot.name,
-                )
-
-                # Convert policy output to robot action dict
-                action_values = make_robot_action(action_tensor, dataset.features)
-
-                # Process and send action to robot (EE -> joints via IK)
-                robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
-                robot.send_action(robot_action_to_send)
-
-                # Write to dataset
-                action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
-                frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
-                dataset.add_frame(frame)
-
-                log_rerun_data(observation=obs_processed, action=action_values)
-
-                dt_s = time.perf_counter() - start_loop_t
-                sleep_time_s = control_interval - dt_s
-                if sleep_time_s < 0:
-                    logging.warning(
-                        f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
-                    )
-                precise_sleep(max(sleep_time_s, 0.0))
-                timestamp = time.perf_counter() - start_episode_t
+            # Main record loop
+            record_loop(
+                robot=robot,
+                events=events,
+                fps=FPS,
+                policy=policy,
+                preprocessor=preprocessor,  # Pass the pre and post policy processors
+                postprocessor=postprocessor,
+                dataset=dataset,
+                control_time_s=EPISODE_TIME_SEC,
+                single_task=TASK_DESCRIPTION,
+                display_data=True,
+                teleop_action_processor=make_default_teleop_action_processor(),
+                robot_action_processor=robot_ee_to_joints_processor,
+                robot_observation_processor=robot_joints_to_ee_pose_processor,
+            )

            # Reset the environment if not stopping or re-recording
            if not events["stop_recording"] and (
                (episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
            ):
                log_say("Reset the environment")
-                log_say("Waiting for environment reset, press right arrow key when ready...")
+                record_loop(
+                    robot=robot,
+                    events=events,
+                    fps=FPS,
+                    control_time_s=EPISODE_TIME_SEC,
+                    single_task=TASK_DESCRIPTION,
+                    display_data=True,
+                    teleop_action_processor=make_default_teleop_action_processor(),
+                    robot_action_processor=robot_ee_to_joints_processor,
+                    robot_observation_processor=robot_joints_to_ee_pose_processor,
+                )

            if events["rerecord_episode"]:
                log_say("Re-record episode")
@@ -222,6 +190,7 @@ def main():

            # Save episode
            dataset.save_episode()
+            episode_idx += 1
    finally:
        # Clean up
        log_say("Stop recording")
@@ -62,20 +62,21 @@ def main():
    follower = SO100Follower(follower_config)
    leader = SO100Leader(leader_config)

-    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
-    #   https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
+    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
    follower_kinematics_solver = RobotKinematics(
        urdf_path="./SO101/so101_new_calib.urdf",
        target_frame_name="gripper_frame_link",
        joint_names=list(follower.bus.motors.keys()),
    )
+
+    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
    leader_kinematics_solver = RobotKinematics(
        urdf_path="./SO101/so101_new_calib.urdf",
        target_frame_name="gripper_frame_link",
        joint_names=list(leader.bus.motors.keys()),
    )

-    # Build pipeline to convert follower joints to EE observation.
+    # Build pipeline to convert follower joints to EE observation
    follower_joints_to_ee = RobotProcessorPipeline[RobotObservation, RobotObservation](
        steps=[
            ForwardKinematicsJointsToEE(
@@ -86,7 +87,7 @@ def main():
        to_output=transition_to_observation,
    )

-    # Build pipeline to convert leader joints to EE action.
+    # Build pipeline to convert leader joints to EE action
    leader_joints_to_ee = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
        steps=[
            ForwardKinematicsJointsToEE(
@@ -97,9 +98,9 @@ def main():
        to_output=transition_to_robot_action,
    )

-    # Build pipeline to convert EE action to follower joints (with safety bounds).
+    # Build pipeline to convert EE action to follower joints
    ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
-        steps=[
+        [
            EEBoundsAndSafety(
                end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
                max_ee_step_m=0.10,
@@ -114,12 +115,13 @@ def main():
        to_output=transition_to_robot_action,
    )

-    # Create the dataset, deriving features from the pipelines so the on-disk schema
-    # matches exactly what the pipelines produce at runtime.
+    # Create the dataset
    dataset = LeRobotDataset.create(
        repo_id=HF_REPO_ID,
        fps=FPS,
        features=combine_feature_dicts(
+            # Run the feature contract of the pipelines
+            # This tells you what the features will look like after the pipeline steps
            aggregate_pipeline_dataset_features(
                pipeline=leader_joints_to_ee,
                initial_features=create_initial_features(action=leader.action_features),
@@ -142,7 +144,7 @@ def main():

    # Initialize the keyboard listener and rerun visualization
    listener, events = init_keyboard_listener()
-    init_rerun(session_name="recording_so100_ee")
+    init_rerun(session_name="recording_phone")

    try:
        if not leader.is_connected or not follower.is_connected:
@@ -158,14 +160,14 @@ def main():
                robot=follower,
                events=events,
                fps=FPS,
-                teleop_action_processor=leader_joints_to_ee,
-                robot_action_processor=ee_to_follower_joints,
-                robot_observation_processor=follower_joints_to_ee,
                teleop=leader,
                dataset=dataset,
                control_time_s=EPISODE_TIME_SEC,
                single_task=TASK_DESCRIPTION,
                display_data=True,
+                teleop_action_processor=leader_joints_to_ee,
+                robot_action_processor=ee_to_follower_joints,
+                robot_observation_processor=follower_joints_to_ee,
            )

            # Reset the environment if not stopping or re-recording
@@ -177,13 +179,13 @@ def main():
                    robot=follower,
                    events=events,
                    fps=FPS,
-                    teleop_action_processor=leader_joints_to_ee,
-                    robot_action_processor=ee_to_follower_joints,
-                    robot_observation_processor=follower_joints_to_ee,
                    teleop=leader,
                    control_time_s=RESET_TIME_SEC,
                    single_task=TASK_DESCRIPTION,
                    display_data=True,
+                    teleop_action_processor=leader_joints_to_ee,
+                    robot_action_processor=ee_to_follower_joints,
+                    robot_observation_processor=follower_joints_to_ee,
                )

            if events["rerecord_episode"]:
@@ -1,134 +0,0 @@
-# !/usr/bin/env python
-
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Run a trained EE-space policy on SO100 without recording (base rollout).
-
-Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
-no dataset) with :class:`SyncInferenceConfig` (inline policy call per
-control tick).  The custom observation/action processors convert between
-joint space (robot hardware) and end-effector space (policy I/O) via
-forward/inverse kinematics.
-"""
-
-from lerobot.cameras.opencv import OpenCVCameraConfig
-from lerobot.configs import PreTrainedConfig
-from lerobot.model.kinematics import RobotKinematics
-from lerobot.processor import (
-    RobotProcessorPipeline,
-    observation_to_transition,
-    robot_action_observation_to_transition,
-    transition_to_observation,
-    transition_to_robot_action,
-)
-from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
-from lerobot.robots.so_follower.robot_kinematic_processor import (
-    ForwardKinematicsJointsToEE,
-    InverseKinematicsEEToJoints,
-)
-from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
-from lerobot.rollout.inference import SyncInferenceConfig
-from lerobot.rollout.strategies import BaseStrategy
-from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.process import ProcessSignalHandler
-from lerobot.utils.utils import init_logging
-
-FPS = 30
-DURATION_SEC = 60
-TASK_DESCRIPTION = "My task description"
-HF_MODEL_ID = "<hf_username>/<model_repo_id>"
-
-
-def main():
-    init_logging()
-
-    # Robot configuration — the rollout engine will connect it inside build_rollout_context.
-    camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
-    robot_config = SO100FollowerConfig(
-        port="/dev/tty.usbmodem5A460814411",
-        id="my_awesome_follower_arm",
-        cameras=camera_config,
-        use_degrees=True,
-    )
-
-    # Kinematic solver: we need the motor-name list, so peek at the robot once.
-    # (The rollout engine owns the connected instance; we only use this for introspection.)
-    temp_robot = SO100Follower(robot_config)
-    motor_names = list(temp_robot.bus.motors.keys())
-
-    # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
-    #   https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
-    kinematics_solver = RobotKinematics(
-        urdf_path="./SO101/so101_new_calib.urdf",
-        target_frame_name="gripper_frame_link",
-        joint_names=motor_names,
-    )
-
-    # Joint-space observation → EE-space observation (consumed by the policy).
-    robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
-        steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
-        to_transition=observation_to_transition,
-        to_output=transition_to_observation,
-    )
-
-    # EE-space action (produced by the policy) → joint-space action (sent to robot).
-    robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
-        steps=[
-            InverseKinematicsEEToJoints(
-                kinematics=kinematics_solver,
-                motor_names=motor_names,
-                initial_guess_current_joints=True,
-            ),
-        ],
-        to_transition=robot_action_observation_to_transition,
-        to_output=transition_to_robot_action,
-    )
-
-    # Policy config (full model is loaded inside build_rollout_context).
-    policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
-    policy_config.pretrained_path = HF_MODEL_ID
-
-    cfg = RolloutConfig(
-        robot=robot_config,
-        policy=policy_config,
-        strategy=BaseStrategyConfig(),
-        inference=SyncInferenceConfig(),
-        fps=FPS,
-        duration=DURATION_SEC,
-        task=TASK_DESCRIPTION,
-    )
-
-    signal_handler = ProcessSignalHandler(use_threads=True)
-
-    # Pass the EE kinematic processors via kwargs; the defaults (identity) would
-    # otherwise skip the joint↔EE conversion and the policy would receive the
-    # wrong observation/action space.
-    ctx = build_rollout_context(
-        cfg,
-        signal_handler.shutdown_event,
-        robot_action_processor=robot_ee_to_joints_processor,
-        robot_observation_processor=robot_joints_to_ee_pose_processor,
-    )
-
-    strategy = BaseStrategy(cfg.strategy)
-    try:
-        strategy.setup(ctx)
-        strategy.run(ctx)
-    finally:
-        strategy.teardown(ctx)
-
-
-if __name__ == "__main__":
-    main()
@@ -10,7 +10,7 @@ from lerobot.datasets import LeRobotDataset
 from lerobot.envs.configs import HILSerlProcessorConfig, HILSerlRobotEnvConfig
 from lerobot.policies import SACConfig
 from lerobot.policies.sac.modeling_sac import SACPolicy
-from lerobot.rewards.classifier.modeling_classifier import Classifier
+from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
 from lerobot.rl.buffer import ReplayBuffer
 from lerobot.rl.gym_manipulator import make_robot_env
 from lerobot.robots.so_follower import SO100FollowerConfig
@@ -1,7 +1,7 @@
 import torch

 from lerobot.datasets import LeRobotDataset
-from lerobot.rewards import RewardClassifierConfig, make_reward_model, make_reward_pre_post_processors
+from lerobot.policies import RewardClassifierConfig, make_policy, make_pre_post_processors


 def main():
@@ -22,10 +22,10 @@ def main():
        model_name="microsoft/resnet-18",
    )

-    # Make reward model, preprocessor, and optimizer
-    reward_model = make_reward_model(config, dataset_stats=dataset.meta.stats)
-    optimizer = config.get_optimizer_preset().build(reward_model.parameters())
-    preprocessor, _ = make_reward_pre_post_processors(config, dataset_stats=dataset.meta.stats)
+    # Make policy, preprocessor, and optimizer
+    policy = make_policy(config, ds_meta=dataset.meta)
+    optimizer = config.get_optimizer_preset().build(policy.parameters())
+    preprocessor, _ = make_pre_post_processors(policy_cfg=config, dataset_stats=dataset.meta.stats)

    classifier_id = "<user>/reward_classifier_hil_serl_example"

@@ -42,7 +42,7 @@ def main():
            batch = preprocessor(batch)

            # Forward pass
-            loss, output_dict = reward_model.forward(batch)
+            loss, output_dict = policy.forward(batch)

            # Backward pass and optimization
            optimizer.zero_grad()
@@ -58,8 +58,8 @@ def main():

    print("Training finished!")

-    # You can now save the trained reward model.
-    reward_model.push_to_hub(classifier_id)
+    # You can now save the trained policy.
+    policy.push_to_hub(classifier_id)


 if __name__ == "__main__":
@@ -128,7 +128,7 @@ dataset_viz = ["lerobot[dataset]", "lerobot[viz]"]
 av-dep = ["av>=15.0.0,<16.0.0"]
 pygame-dep = ["pygame>=2.5.1,<2.7.0"]
 placo-dep = ["placo>=0.9.6,<0.9.17"]
-transformers-dep = ["transformers>=5.4.0,<5.6.0"]
+transformers-dep = ["transformers==5.3.0"] # TODO(Steven): https://github.com/huggingface/lerobot/pull/3249
 grpcio-dep = ["grpcio==1.73.1", "protobuf>=6.31.1,<6.32.0"]
 can-dep = ["python-can>=4.2.0,<5.0.0"]
 peft-dep = ["peft>=0.18.0,<1.0.0"]
@@ -289,7 +289,6 @@ lerobot-find-joint-limits="lerobot.scripts.lerobot_find_joint_limits:main"
 lerobot-imgtransform-viz="lerobot.scripts.lerobot_imgtransform_viz:main"
 lerobot-edit-dataset="lerobot.scripts.lerobot_edit_dataset:main"
 lerobot-setup-can="lerobot.scripts.lerobot_setup_can:main"
-lerobot-rollout="lerobot.scripts.lerobot_rollout:main"

 # ---------------- Tool Configurations ----------------
 [tool.setuptools.package-data]
@@ -41,12 +41,8 @@ def cfg_to_group(
            return tag
        return tag[:max_tag_length]

-    if cfg.is_reward_model_training:
-        trainable_tag = f"reward_model:{cfg.reward_model.type}"
-    else:
-        trainable_tag = f"policy:{cfg.policy.type}"
    lst = [
-        trainable_tag,
+        f"policy:{cfg.policy.type}",
        f"seed:{cfg.seed}",
    ]
    if cfg.dataset is not None:
@@ -21,7 +21,6 @@ are intentionally NOT re-exported here to avoid circular dependencies
 Import them directly: ``from lerobot.configs.train import TrainPipelineConfig``
 """

-from .dataset import DatasetRecordConfig
 from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
 from .policies import PreTrainedConfig
 from .types import (
@@ -40,7 +39,6 @@ __all__ = [
    "PolicyFeature",
    "RTCAttentionSchedule",
    # Config classes
-    "DatasetRecordConfig",
    "DatasetConfig",
    "EvalConfig",
    "PeftConfig",
@@ -1,80 +0,0 @@
-# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Shared dataset recording configuration used by both ``lerobot-record`` and ``lerobot-rollout``."""
-
-from dataclasses import dataclass
-from datetime import datetime
-from pathlib import Path
-
-
-@dataclass
-class DatasetRecordConfig:
-    # Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
-    repo_id: str = ""
-    # A short but accurate description of the task performed during the recording (e.g. "Pick the Lego block and drop it in the box on the right.")
-    single_task: str = ""
-    # Root directory where the dataset will be stored (e.g. 'dataset/path'). If None, defaults to $HF_LEROBOT_HOME/repo_id.
-    root: str | Path | None = None
-    # Limit the frames per second.
-    fps: int = 30
-    # Number of seconds for data recording for each episode.
-    episode_time_s: int | float = 60
-    # Number of seconds for resetting the environment after each episode.
-    reset_time_s: int | float = 60
-    # Number of episodes to record.
-    num_episodes: int = 50
-    # Encode frames in the dataset into video
-    video: bool = True
-    # Upload dataset to Hugging Face hub.
-    push_to_hub: bool = True
-    # Upload on private repository on the Hugging Face hub.
-    private: bool = False
-    # Add tags to your dataset on the hub.
-    tags: list[str] | None = None
-    # Number of subprocesses handling the saving of frames as PNG. Set to 0 to use threads only;
-    # set to ≥1 to use subprocesses, each using threads to write images. The best number of processes
-    # and threads depends on your system. We recommend 4 threads per camera with 0 processes.
-    # If fps is unstable, adjust the thread count. If still unstable, try using 1 or more subprocesses.
-    num_image_writer_processes: int = 0
-    # Number of threads writing the frames as png images on disk, per camera.
-    # Too many threads might cause unstable teleoperation fps due to main thread being blocked.
-    # Not enough threads might cause low camera fps.
-    num_image_writer_threads_per_camera: int = 4
-    # Number of episodes to record before batch encoding videos
-    # Set to 1 for immediate encoding (default behavior), or higher for batched encoding
-    video_encoding_batch_size: int = 1
-    # Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto',
-    # or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'.
-    # Use 'auto' to auto-detect the best available hardware encoder.
-    vcodec: str = "libsvtav1"
-    # Enable streaming video encoding: encode frames in real-time during capture instead
-    # of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
-    streaming_encoding: bool = False
-    # Maximum number of frames to buffer per camera when using streaming encoding.
-    # ~1s buffer at 30fps. Provides backpressure if the encoder can't keep up.
-    encoder_queue_maxsize: int = 30
-    # Number of threads per encoder instance. None = auto (codec default).
-    # Lower values reduce CPU usage, maps to 'lp' (via svtav1-params) for libsvtav1 and 'threads' for h264/hevc..
-    encoder_threads: int | None = None
-
-    def stamp_repo_id(self) -> None:
-        """Append a date-time tag to ``repo_id`` so each recording session gets a unique name.
-
-        Must be called explicitly at dataset *creation* time — not on resume,
-        where the existing ``repo_id`` (already stamped) must be preserved.
-        """
-        if self.repo_id:
-            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-            self.repo_id = f"{self.repo_id}_{timestamp}"
@@ -1,163 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import abc
-import builtins
-import json
-import logging
-import os
-import tempfile
-from dataclasses import dataclass, field
-from pathlib import Path
-from typing import Any, TypeVar
-
-import draccus
-from huggingface_hub import hf_hub_download
-from huggingface_hub.constants import CONFIG_NAME
-from huggingface_hub.errors import HfHubHTTPError
-
-from lerobot.configs.types import PolicyFeature
-from lerobot.optim.optimizers import OptimizerConfig
-from lerobot.optim.schedulers import LRSchedulerConfig
-from lerobot.utils.device_utils import auto_select_torch_device, is_torch_device_available
-from lerobot.utils.hub import HubMixin
-
-T = TypeVar("T", bound="RewardModelConfig")
-logger = logging.getLogger(__name__)
-
-
-@dataclass
-class RewardModelConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC):
-    """Base configuration for reward models.
-
-    Args:
-    input_features: A dictionary defining the PolicyFeature of the input data for the reward. The key represents
-        the input data name, and the value is PolicyFeature, which consists of FeatureType and shape attributes.
-    output_features: A dictionary defining the PolicyFeature of the output data for the reward. The key represents
-        the output data name, and the value is PolicyFeature, which consists of FeatureType and shape attributes.
-    """
-
-    # Reuses PolicyFeature
-    input_features: dict[str, PolicyFeature] = field(default_factory=dict)
-    output_features: dict[str, PolicyFeature] = field(default_factory=dict)
-
-    device: str | None = None
-
-    pretrained_path: str | None = None
-
-    push_to_hub: bool = False
-    repo_id: str | None = None
-
-    # Hub metadata
-    license: str | None = None
-    tags: list[str] | None = None
-    private: bool | None = None
-
-    def __post_init__(self) -> None:
-        if not self.device or not is_torch_device_available(self.device):
-            auto_device = auto_select_torch_device()
-            logger.warning(f"Device '{self.device}' is not available. Switching to '{auto_device}'.")
-            self.device = auto_device.type
-
-    @property
-    def type(self) -> str:
-        choice_name = self.get_choice_name(self.__class__)
-        if not isinstance(choice_name, str):
-            raise TypeError(f"Expected string from get_choice_name, got {type(choice_name)}")
-        return choice_name
-
-    @property
-    def observation_delta_indices(self) -> list | None:  # type: ignore[type-arg]
-        return None
-
-    @property
-    def action_delta_indices(self) -> list | None:  # type: ignore[type-arg]
-        return None
-
-    @property
-    def reward_delta_indices(self) -> list | None:  # type: ignore[type-arg]
-        return None
-
-    @abc.abstractmethod
-    def get_optimizer_preset(self) -> OptimizerConfig:
-        raise NotImplementedError
-
-    def get_scheduler_preset(self) -> LRSchedulerConfig | None:
-        return None
-
-    def validate_features(self) -> None:
-        pass
-
-    def _save_pretrained(self, save_directory: Path) -> None:
-        with open(save_directory / CONFIG_NAME, "w") as f, draccus.config_type("json"):
-            draccus.dump(self, f, indent=4)
-
-    @classmethod
-    def from_pretrained(
-        cls: builtins.type[T],
-        pretrained_name_or_path: str | Path,
-        *,
-        force_download: bool = False,
-        resume_download: bool | None = None,
-        proxies: dict[Any, Any] | None = None,
-        token: str | bool | None = None,
-        cache_dir: str | Path | None = None,
-        local_files_only: bool = False,
-        revision: str | None = None,
-        **reward_kwargs: Any,
-    ) -> T:
-        model_id = str(pretrained_name_or_path)
-        config_file: str | None = None
-        if Path(model_id).is_dir():
-            if CONFIG_NAME in os.listdir(model_id):
-                config_file = os.path.join(model_id, CONFIG_NAME)
-            else:
-                logger.error(f"{CONFIG_NAME} not found in {Path(model_id).resolve()}")
-        else:
-            try:
-                config_file = hf_hub_download(
-                    repo_id=model_id,
-                    filename=CONFIG_NAME,
-                    revision=revision,
-                    cache_dir=cache_dir,
-                    force_download=force_download,
-                    proxies=proxies,
-                    resume_download=resume_download,
-                    token=token,
-                    local_files_only=local_files_only,
-                )
-            except HfHubHTTPError as e:
-                raise FileNotFoundError(
-                    f"{CONFIG_NAME} not found on the HuggingFace Hub in {model_id}"
-                ) from e
-
-        if config_file is None:
-            raise FileNotFoundError(f"{CONFIG_NAME} not found in {model_id}")
-
-        # HACK: Parse the original config to get the config subclass, so that we can
-        # apply cli overrides.
-        with draccus.config_type("json"):
-            orig_config = draccus.parse(cls, config_file, args=[])
-
-        with open(config_file) as f:
-            config = json.load(f)
-
-        config.pop("type", None)
-        with tempfile.NamedTemporaryFile("w+", delete=False, suffix=".json") as f:
-            json.dump(config, f)
-            config_file = f.name
-
-        cli_overrides = reward_kwargs.pop("cli_overrides", [])
-        with draccus.config_type("json"):
-            return draccus.parse(orig_config.__class__, config_file, args=cli_overrides)
@@ -13,9 +13,7 @@
 # limitations under the License.
 import builtins
 import datetime as dt
-import json
 import os
-import tempfile
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any
@@ -28,57 +26,18 @@ from lerobot import envs
 from lerobot.configs import parser
 from lerobot.optim import LRSchedulerConfig, OptimizerConfig
 from lerobot.utils.hub import HubMixin
-from lerobot.utils.sample_weighting import SampleWeightingConfig

 from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
 from .policies import PreTrainedConfig
-from .rewards import RewardModelConfig

 TRAIN_CONFIG_NAME = "train_config.json"


-def _migrate_legacy_rabc_fields(config: dict[str, Any]) -> dict[str, Any] | None:
-    """Return migrated payload for legacy RA-BC fields, or None when no migration is needed."""
-    legacy_fields = (
-        "use_rabc",
-        "rabc_progress_path",
-        "rabc_kappa",
-        "rabc_epsilon",
-        "rabc_head_mode",
-    )
-    if not any(key in config for key in legacy_fields):
-        return None
-
-    migrated_config = dict(config)
-    use_rabc = bool(migrated_config.pop("use_rabc", False))
-    rabc_progress_path = migrated_config.pop("rabc_progress_path", None)
-    rabc_kappa = migrated_config.pop("rabc_kappa", None)
-    rabc_epsilon = migrated_config.pop("rabc_epsilon", None)
-    rabc_head_mode = migrated_config.pop("rabc_head_mode", None)
-
-    # New configs may already define sample_weighting explicitly. In that case,
-    # legacy fields are ignored after being stripped from the payload.
-    if migrated_config.get("sample_weighting") is None and use_rabc:
-        sample_weighting: dict[str, Any] = {"type": "rabc"}
-        if rabc_progress_path is not None:
-            sample_weighting["progress_path"] = rabc_progress_path
-        if rabc_kappa is not None:
-            sample_weighting["kappa"] = rabc_kappa
-        if rabc_epsilon is not None:
-            sample_weighting["epsilon"] = rabc_epsilon
-        if rabc_head_mode is not None:
-            sample_weighting["head_mode"] = rabc_head_mode
-        migrated_config["sample_weighting"] = sample_weighting
-
-    return migrated_config
-
-
 @dataclass
 class TrainPipelineConfig(HubMixin):
    dataset: DatasetConfig
    env: envs.EnvConfig | None = None
    policy: PreTrainedConfig | None = None
-    reward_model: RewardModelConfig | None = None
    # Set `dir` to where you would like to save all of the run outputs. If you run another training session
    # with the same value for `dir` its contents will be overwritten unless you set `resume` to true.
    output_dir: Path | None = None
@@ -113,41 +72,27 @@ class TrainPipelineConfig(HubMixin):
    wandb: WandBConfig = field(default_factory=WandBConfig)
    peft: PeftConfig | None = None

-    # Sample weighting configuration (e.g., for RA-BC training)
-    sample_weighting: SampleWeightingConfig | None = None
+    # RA-BC (Reward-Aligned Behavior Cloning) parameters
+    use_rabc: bool = False  # Enable reward-weighted training
+    rabc_progress_path: str | None = None  # Path to precomputed SARM progress parquet file
+    rabc_kappa: float = 0.01  # Hard threshold for high-quality samples
+    rabc_epsilon: float = 1e-6  # Small constant for numerical stability
+    rabc_head_mode: str | None = "sparse"  # For dual-head models: "sparse" or "dense"

    # Rename map for the observation to override the image and state keys
    rename_map: dict[str, str] = field(default_factory=dict)
    checkpoint_path: Path | None = field(init=False, default=None)

-    @property
-    def is_reward_model_training(self) -> bool:
-        """True when the config targets a reward model rather than a policy."""
-        return self.reward_model is not None
-
-    @property
-    def trainable_config(self) -> PreTrainedConfig | RewardModelConfig:
-        """Return whichever config (policy or reward_model) is active."""
-        if self.is_reward_model_training:
-            return self.reward_model  # type: ignore[return-value]
-        return self.policy  # type: ignore[return-value]
-
    def validate(self) -> None:
        # HACK: We parse again the cli args here to get the pretrained paths if there was some.
        policy_path = parser.get_path_arg("policy")
-        reward_model_path = parser.get_path_arg("reward_model")
-
-        if reward_model_path:
-            cli_overrides = parser.get_cli_overrides("reward_model")
-            self.reward_model = RewardModelConfig.from_pretrained(
-                reward_model_path, cli_overrides=cli_overrides
-            )
-            self.reward_model.pretrained_path = str(Path(reward_model_path))
-        elif policy_path:
+        if policy_path:
+            # Only load the policy config
            cli_overrides = parser.get_cli_overrides("policy")
            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
            self.policy.pretrained_path = Path(policy_path)
        elif self.resume:
+            # The entire train config is already loaded, we just need to get the checkpoint dir
            config_path = parser.parse_arg("config_path")
            if not config_path:
                raise ValueError(
@@ -163,22 +108,18 @@ class TrainPipelineConfig(HubMixin):
            policy_dir = Path(config_path).parent
            if self.policy is not None:
                self.policy.pretrained_path = policy_dir
-            if self.reward_model is not None:
-                self.reward_model.pretrained_path = str(policy_dir)
            self.checkpoint_path = policy_dir.parent

-        if self.policy is None and self.reward_model is None:
+        if self.policy is None:
            raise ValueError(
-                "Neither policy nor reward_model is configured. "
-                "Please specify one with `--policy.path` or `--reward_model.path`."
+                "Policy is not configured. Please specify a pretrained policy with `--policy.path`."
            )

-        active_cfg = self.trainable_config
        if not self.job_name:
            if self.env is None:
-                self.job_name = f"{active_cfg.type}"
+                self.job_name = f"{self.policy.type}"
            else:
-                self.job_name = f"{self.env.type}_{active_cfg.type}"
+                self.job_name = f"{self.env.type}_{self.policy.type}"

        if not self.resume and isinstance(self.output_dir, Path) and self.output_dir.is_dir():
            raise FileExistsError(
@@ -196,16 +137,26 @@ class TrainPipelineConfig(HubMixin):
        if not self.use_policy_training_preset and (self.optimizer is None or self.scheduler is None):
            raise ValueError("Optimizer and Scheduler must be set when the policy presets are not used.")
        elif self.use_policy_training_preset and not self.resume:
-            self.optimizer = active_cfg.get_optimizer_preset()
-            self.scheduler = active_cfg.get_scheduler_preset()
+            self.optimizer = self.policy.get_optimizer_preset()
+            self.scheduler = self.policy.get_scheduler_preset()

-        if hasattr(active_cfg, "push_to_hub") and active_cfg.push_to_hub and not active_cfg.repo_id:
-            raise ValueError("'repo_id' argument missing. Please specify it to push the model to the hub.")
+        if self.policy.push_to_hub and not self.policy.repo_id:
+            raise ValueError(
+                "'policy.repo_id' argument missing. Please specify it to push the model to the hub."
+            )
+
+        if self.use_rabc and not self.rabc_progress_path:
+            # Auto-detect from dataset path
+            repo_id = self.dataset.repo_id
+            if self.dataset.root:
+                self.rabc_progress_path = str(Path(self.dataset.root) / "sarm_progress.parquet")
+            else:
+                self.rabc_progress_path = f"hf://datasets/{repo_id}/sarm_progress.parquet"

    @classmethod
    def __get_path_fields__(cls) -> list[str]:
-        """Keys for draccus pretrained-path loading."""
-        return ["policy", "reward_model"]
+        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
+        return ["policy"]

    def to_dict(self) -> dict[str, Any]:
        return draccus.encode(self)  # type: ignore[no-any-return]  # because of the third-party library draccus uses Any as the return type
@@ -256,15 +207,6 @@ class TrainPipelineConfig(HubMixin):
                ) from e

        cli_args = kwargs.pop("cli_args", [])
-        if config_file is not None:
-            with open(config_file) as f:
-                config = json.load(f)
-            migrated_config = _migrate_legacy_rabc_fields(config)
-            if migrated_config is not None:
-                with tempfile.NamedTemporaryFile("w+", delete=False, suffix=".json") as f:
-                    json.dump(migrated_config, f)
-                    config_file = f.name
-
        with draccus.config_type("json"):
            return draccus.parse(cls, config_file, args=cli_args)

@@ -97,8 +97,8 @@ def update_data_df(df, src_meta, dst_meta):
        pd.DataFrame: Updated DataFrame with adjusted indices.
    """

-    df["episode_index"] = df["episode_index"] + dst_meta.info.total_episodes
-    df["index"] = df["index"] + dst_meta.info.total_frames
+    df["episode_index"] = df["episode_index"] + dst_meta.info["total_episodes"]
+    df["index"] = df["index"] + dst_meta.info["total_frames"]

    src_task_names = src_meta.tasks.index.take(df["task_index"].to_numpy())
    df["task_index"] = dst_meta.tasks.loc[src_task_names, "task_index"].to_numpy()
@@ -225,9 +225,9 @@ def update_meta_data(
        # Clean up temporary columns
        df = df.drop(columns=["_orig_chunk", "_orig_file"])

-    df["dataset_from_index"] = df["dataset_from_index"] + dst_meta.info.total_frames
-    df["dataset_to_index"] = df["dataset_to_index"] + dst_meta.info.total_frames
-    df["episode_index"] = df["episode_index"] + dst_meta.info.total_episodes
+    df["dataset_from_index"] = df["dataset_from_index"] + dst_meta.info["total_frames"]
+    df["dataset_to_index"] = df["dataset_to_index"] + dst_meta.info["total_frames"]
+    df["episode_index"] = df["episode_index"] + dst_meta.info["total_episodes"]

    return df

@@ -237,8 +237,8 @@ def aggregate_datasets(
    aggr_repo_id: str,
    roots: list[Path] | None = None,
    aggr_root: Path | None = None,
-    data_files_size_in_mb: int | None = None,
-    video_files_size_in_mb: int | None = None,
+    data_files_size_in_mb: float | None = None,
+    video_files_size_in_mb: float | None = None,
    chunk_size: int | None = None,
 ):
    """Aggregates multiple LeRobot datasets into a single unified dataset.
@@ -313,8 +313,8 @@ def aggregate_datasets(
        # to avoid interference between different source datasets
        data_idx.pop("src_to_dst", None)

-        dst_meta.info.total_episodes += src_meta.total_episodes
-        dst_meta.info.total_frames += src_meta.total_frames
+        dst_meta.info["total_episodes"] += src_meta.total_episodes
+        dst_meta.info["total_frames"] += src_meta.total_frames

    finalize_aggregation(dst_meta, all_metadata)
    logging.info("Aggregation complete.")
@@ -640,10 +640,14 @@ def finalize_aggregation(aggr_meta, all_metadata):
    write_tasks(aggr_meta.tasks, aggr_meta.root)

    logging.info("write info")
-    aggr_meta.info.total_tasks = len(aggr_meta.tasks)
-    aggr_meta.info.total_episodes = sum(m.total_episodes for m in all_metadata)
-    aggr_meta.info.total_frames = sum(m.total_frames for m in all_metadata)
-    aggr_meta.info.splits = {"train": f"0:{sum(m.total_episodes for m in all_metadata)}"}
+    aggr_meta.info.update(
+        {
+            "total_tasks": len(aggr_meta.tasks),
+            "total_episodes": sum(m.total_episodes for m in all_metadata),
+            "total_frames": sum(m.total_frames for m in all_metadata),
+            "splits": {"train": f"0:{sum(m.total_episodes for m in all_metadata)}"},
+        }
+    )
    write_info(aggr_meta.info, aggr_meta.root)

    logging.info("write stats")
@@ -37,11 +37,13 @@ from .io_utils import (
    load_subtasks,
    load_tasks,
    write_info,
+    write_json,
    write_stats,
    write_tasks,
 )
 from .utils import (
    DEFAULT_EPISODES_PATH,
+    INFO_PATH,
    check_version_compatibility,
    get_safe_version,
    has_legacy_hub_download_metadata,
@@ -226,7 +228,7 @@ class LeRobotDatasetMetadata:
    @property
    def _version(self) -> packaging.version.Version:
        """Codebase version used to create this dataset."""
-        return packaging.version.parse(self.info.codebase_version)
+        return packaging.version.parse(self.info["codebase_version"])

    def get_data_file_path(self, ep_index: int) -> Path:
        """Return the relative parquet file path for the given episode index.
@@ -281,27 +283,27 @@ class LeRobotDatasetMetadata:
    @property
    def data_path(self) -> str:
        """Formattable string for the parquet files."""
-        return self.info.data_path
+        return self.info["data_path"]

    @property
    def video_path(self) -> str | None:
        """Formattable string for the video files."""
-        return self.info.video_path
+        return self.info["video_path"]

    @property
    def robot_type(self) -> str | None:
        """Robot type used in recording this dataset."""
-        return self.info.robot_type
+        return self.info["robot_type"]

    @property
    def fps(self) -> int:
        """Frames per second used during data collection."""
-        return self.info.fps
+        return self.info["fps"]

    @property
    def features(self) -> dict[str, dict]:
        """All features contained in the dataset."""
-        return self.info.features
+        return self.info["features"]

    @property
    def image_keys(self) -> list[str]:
@@ -331,32 +333,32 @@ class LeRobotDatasetMetadata:
    @property
    def total_episodes(self) -> int:
        """Total number of episodes available."""
-        return self.info.total_episodes
+        return self.info["total_episodes"]

    @property
    def total_frames(self) -> int:
        """Total number of frames saved in this dataset."""
-        return self.info.total_frames
+        return self.info["total_frames"]

    @property
    def total_tasks(self) -> int:
        """Total number of different tasks performed in this dataset."""
-        return self.info.total_tasks
+        return self.info["total_tasks"]

    @property
    def chunks_size(self) -> int:
        """Max number of files per chunk."""
-        return self.info.chunks_size
+        return self.info["chunks_size"]

    @property
    def data_files_size_in_mb(self) -> int:
        """Max size of data file in mega bytes."""
-        return self.info.data_files_size_in_mb
+        return self.info["data_files_size_in_mb"]

    @property
    def video_files_size_in_mb(self) -> int:
        """Max size of video file in mega bytes."""
-        return self.info.video_files_size_in_mb
+        return self.info["video_files_size_in_mb"]

    def get_task_index(self, task: str) -> int | None:
        """
@@ -500,10 +502,10 @@ class LeRobotDatasetMetadata:
        self._save_episode_metadata(episode_dict)

        # Update info
-        self.info.total_episodes += 1
-        self.info.total_frames += episode_length
-        self.info.total_tasks = len(self.tasks)
-        self.info.splits = {"train": f"0:{self.info.total_episodes}"}
+        self.info["total_episodes"] += 1
+        self.info["total_frames"] += episode_length
+        self.info["total_tasks"] = len(self.tasks)
+        self.info["splits"] = {"train": f"0:{self.info['total_episodes']}"}

        write_info(self.info, self.root)

@@ -522,7 +524,7 @@ class LeRobotDatasetMetadata:
        for key in video_keys:
            if not self.features[key].get("info", None):
                video_path = self.root / self.video_path.format(video_key=key, chunk_index=0, file_index=0)
-                self.info.features[key]["info"] = get_video_info(video_path)
+                self.info["features"][key]["info"] = get_video_info(video_path)

    def update_chunk_settings(
        self,
@@ -544,17 +546,17 @@ class LeRobotDatasetMetadata:
        if chunks_size is not None:
            if chunks_size <= 0:
                raise ValueError(f"chunks_size must be positive, got {chunks_size}")
-            self.info.chunks_size = chunks_size
+            self.info["chunks_size"] = chunks_size

        if data_files_size_in_mb is not None:
            if data_files_size_in_mb <= 0:
                raise ValueError(f"data_files_size_in_mb must be positive, got {data_files_size_in_mb}")
-            self.info.data_files_size_in_mb = data_files_size_in_mb
+            self.info["data_files_size_in_mb"] = data_files_size_in_mb

        if video_files_size_in_mb is not None:
            if video_files_size_in_mb <= 0:
                raise ValueError(f"video_files_size_in_mb must be positive, got {video_files_size_in_mb}")
-            self.info.video_files_size_in_mb = video_files_size_in_mb
+            self.info["video_files_size_in_mb"] = video_files_size_in_mb

        # Update the info file on disk
        write_info(self.info, self.root)
@@ -651,7 +653,7 @@ class LeRobotDatasetMetadata:
                f"Features contain video keys {obj.video_keys}, but 'use_videos' is set to False. "
                "Either remove video features from the features dict, or set 'use_videos=True'."
            )
-        write_info(obj.info, obj.root)
+        write_json(obj.info, obj.root / INFO_PATH)
        obj.revision = None
        obj._pq_writer = None
        obj.latest_episode = None
@@ -897,10 +897,14 @@ def _copy_and_reindex_episodes_metadata(

    dst_meta.finalize()

-    dst_meta.info.total_episodes = len(episode_mapping)
-    dst_meta.info.total_frames = total_frames
-    dst_meta.info.total_tasks = len(dst_meta.tasks) if dst_meta.tasks is not None else 0
-    dst_meta.info.splits = {"train": f"0:{len(episode_mapping)}"}
+    dst_meta.info.update(
+        {
+            "total_episodes": len(episode_mapping),
+            "total_frames": total_frames,
+            "total_tasks": len(dst_meta.tasks) if dst_meta.tasks is not None else 0,
+            "splits": {"train": f"0:{len(episode_mapping)}"},
+        }
+    )
    write_info(dst_meta.info, dst_meta.root)

    if not all_stats:
@@ -1065,20 +1069,21 @@ def _copy_episodes_metadata_and_stats(
    if episodes_dir.exists():
        shutil.copytree(episodes_dir, dst_episodes_dir, dirs_exist_ok=True)

-    dst_meta.info.total_episodes = src_dataset.meta.total_episodes
-    dst_meta.info.total_frames = src_dataset.meta.total_frames
-    dst_meta.info.total_tasks = src_dataset.meta.total_tasks
-    # Preserve original splits if available, otherwise create default
-    dst_meta.info.splits = (
-        src_dataset.meta.info.splits
-        if src_dataset.meta.info.splits
-        else {"train": f"0:{src_dataset.meta.total_episodes}"}
+    dst_meta.info.update(
+        {
+            "total_episodes": src_dataset.meta.total_episodes,
+            "total_frames": src_dataset.meta.total_frames,
+            "total_tasks": src_dataset.meta.total_tasks,
+            "splits": src_dataset.meta.info.get("splits", {"train": f"0:{src_dataset.meta.total_episodes}"}),
+        }
    )

    if dst_meta.video_keys and src_dataset.meta.video_keys:
        for key in dst_meta.video_keys:
            if key in src_dataset.meta.features:
-                dst_meta.info.features[key]["info"] = src_dataset.meta.info.features[key].get("info", {})
+                dst_meta.info["features"][key]["info"] = src_dataset.meta.info["features"][key].get(
+                    "info", {}
+                )

    write_info(dst_meta.info, dst_meta.root)

@@ -1520,7 +1525,7 @@ def modify_tasks(
    write_tasks(new_task_df, root)

    # Update info.json
-    dataset.meta.info.total_tasks = len(unique_tasks)
+    dataset.meta.info["total_tasks"] = len(unique_tasks)
    write_info(dataset.meta.info, root)

    # Reload metadata to reflect changes
@@ -1853,10 +1858,10 @@ def convert_image_to_video_dataset(
        episodes_df.to_parquet(episodes_path, index=False)

        # Update metadata info
-        new_meta.info.total_episodes = len(episode_indices)
-        new_meta.info.total_frames = sum(ep["length"] for ep in all_episode_metadata.values())
-        new_meta.info.total_tasks = dataset.meta.total_tasks
-        new_meta.info.splits = {"train": f"0:{len(episode_indices)}"}
+        new_meta.info["total_episodes"] = len(episode_indices)
+        new_meta.info["total_frames"] = sum(ep["length"] for ep in all_episode_metadata.values())
+        new_meta.info["total_tasks"] = dataset.meta.total_tasks
+        new_meta.info["splits"] = {"train": f"0:{len(episode_indices)}"}

        # Update video info for all image keys (now videos)
        # We need to manually set video info since update_video_info() checks video_keys first
@@ -1865,7 +1870,7 @@ def convert_image_to_video_dataset(
                video_path = new_meta.root / new_meta.video_path.format(
                    video_key=img_key, chunk_index=0, file_index=0
                )
-                new_meta.info.features[img_key]["info"] = get_video_info(video_path)
+                new_meta.info["features"][img_key]["info"] = get_video_info(video_path)

        write_info(new_meta.info, new_meta.root)

@@ -19,7 +19,6 @@ from pprint import pformat
 import torch

 from lerobot.configs import PreTrainedConfig
-from lerobot.configs.rewards import RewardModelConfig
 from lerobot.configs.train import TrainPipelineConfig
 from lerobot.transforms import ImageTransforms
 from lerobot.utils.constants import ACTION, IMAGENET_STATS, OBS_PREFIX, REWARD
@@ -31,14 +30,12 @@ from .streaming_dataset import StreamingLeRobotDataset


 def resolve_delta_timestamps(
-    cfg: PreTrainedConfig | RewardModelConfig, ds_meta: LeRobotDatasetMetadata
+    cfg: PreTrainedConfig, ds_meta: LeRobotDatasetMetadata
 ) -> dict[str, list] | None:
-    """Resolves delta_timestamps by reading from the 'delta_indices' properties of the config.
+    """Resolves delta_timestamps by reading from the 'delta_indices' properties of the PreTrainedConfig.

    Args:
-        cfg (PreTrainedConfig | RewardModelConfig): The config to read delta_indices from. Both
-            ``PreTrainedConfig`` and concrete ``RewardModelConfig`` subclasses expose the
-            ``{observation,action,reward}_delta_indices`` properties used below.
+        cfg (PreTrainedConfig): The PreTrainedConfig to read delta_indices from.
        ds_meta (LeRobotDatasetMetadata): The dataset from which features and fps are used to build
            delta_timestamps against.

@@ -85,7 +82,7 @@ def make_dataset(cfg: TrainPipelineConfig) -> LeRobotDataset | MultiLeRobotDatas
        ds_meta = LeRobotDatasetMetadata(
            cfg.dataset.repo_id, root=cfg.dataset.root, revision=cfg.dataset.revision
        )
-        delta_timestamps = resolve_delta_timestamps(cfg.trainable_config, ds_meta)
+        delta_timestamps = resolve_delta_timestamps(cfg.policy, ds_meta)
        if not cfg.dataset.streaming:
            dataset = LeRobotDataset(
                cfg.dataset.repo_id,
@@ -28,7 +28,6 @@ from .utils import (
    DEFAULT_DATA_PATH,
    DEFAULT_VIDEO_FILE_SIZE_IN_MB,
    DEFAULT_VIDEO_PATH,
-    DatasetInfo,
 )


@@ -79,8 +78,8 @@ def create_empty_dataset_info(
    chunks_size: int | None = None,
    data_files_size_in_mb: int | None = None,
    video_files_size_in_mb: int | None = None,
-) -> DatasetInfo:
-    """Create a template ``DatasetInfo`` object for a new dataset's ``meta/info.json``.
+) -> dict:
+    """Create a template dictionary for a new dataset's `info.json`.

    Args:
        codebase_version (str): The version of the LeRobot codebase.
@@ -88,24 +87,25 @@ def create_empty_dataset_info(
        features (dict): The LeRobot features dictionary for the dataset.
        use_videos (bool): Whether the dataset will store videos.
        robot_type (str | None): The type of robot used, if any.
-        chunks_size (int | None): Max files per chunk directory. Defaults to ``DEFAULT_CHUNK_SIZE``.
-        data_files_size_in_mb (int | None): Max parquet file size in MB. Defaults to ``DEFAULT_DATA_FILE_SIZE_IN_MB``.
-        video_files_size_in_mb (int | None): Max video file size in MB. Defaults to ``DEFAULT_VIDEO_FILE_SIZE_IN_MB``.

    Returns:
-        DatasetInfo: A typed dataset information object with initial metadata.
+        dict: A dictionary with the initial dataset metadata.
    """
-    return DatasetInfo(
-        codebase_version=codebase_version,
-        fps=fps,
-        features=features,
-        robot_type=robot_type,
-        chunks_size=chunks_size or DEFAULT_CHUNK_SIZE,
-        data_files_size_in_mb=data_files_size_in_mb or DEFAULT_DATA_FILE_SIZE_IN_MB,
-        video_files_size_in_mb=video_files_size_in_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB,
-        data_path=DEFAULT_DATA_PATH,
-        video_path=DEFAULT_VIDEO_PATH if use_videos else None,
-    )
+    return {
+        "codebase_version": codebase_version,
+        "robot_type": robot_type,
+        "total_episodes": 0,
+        "total_frames": 0,
+        "total_tasks": 0,
+        "chunks_size": chunks_size or DEFAULT_CHUNK_SIZE,
+        "data_files_size_in_mb": data_files_size_in_mb or DEFAULT_DATA_FILE_SIZE_IN_MB,
+        "video_files_size_in_mb": video_files_size_in_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB,
+        "fps": fps,
+        "splits": {},
+        "data_path": DEFAULT_DATA_PATH,
+        "video_path": DEFAULT_VIDEO_PATH if use_videos else None,
+        "features": features,
+    }


 def check_delta_timestamps(
@@ -39,7 +39,6 @@ from .utils import (
    EPISODES_DIR,
    INFO_PATH,
    STATS_PATH,
-    DatasetInfo,
    serialize_dict,
 )

@@ -116,21 +115,25 @@ def embed_images(dataset: datasets.Dataset) -> datasets.Dataset:
    return dataset


-def write_info(info: DatasetInfo, local_dir: Path) -> None:
-    write_json(info.to_dict(), local_dir / INFO_PATH)
+def write_info(info: dict, local_dir: Path) -> None:
+    write_json(info, local_dir / INFO_PATH)


-def load_info(local_dir: Path) -> DatasetInfo:
+def load_info(local_dir: Path) -> dict:
    """Load dataset info metadata from its standard file path.

+    Also converts shape lists to tuples for consistency.
+
    Args:
        local_dir (Path): The root directory of the dataset.

    Returns:
-        DatasetInfo: The typed dataset information object.
+        dict: The dataset information dictionary.
    """
-    raw = load_json(local_dir / INFO_PATH)
-    return DatasetInfo.from_dict(raw)
+    info = load_json(local_dir / INFO_PATH)
+    for ft in info["features"].values():
+        ft["shape"] = tuple(ft["shape"])
+    return info


 def write_stats(stats: dict, local_dir: Path) -> None:
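Reviewer note: a minimal sketch of why `load_info` now converts shapes back to tuples. JSON has no tuple type, so a shape written as a tuple is read back as a list. The helper name `restore_shapes` and the feature entry below are hypothetical, chosen only for illustration; the loop itself mirrors the one added to `load_info`.

```python
import json


def restore_shapes(info: dict) -> dict:
    """Convert every feature shape from list to tuple, in place (hypothetical helper)."""
    for ft in info["features"].values():
        ft["shape"] = tuple(ft["shape"])
    return info


# Hypothetical feature entry; real datasets define their own keys.
info = {"fps": 30, "features": {"observation.state": {"dtype": "float32", "shape": (6,)}}}

# json.dumps writes the tuple as a JSON array, so json.loads returns a list.
loaded = json.loads(json.dumps(info))
shape_before = loaded["features"]["observation.state"]["shape"]

# After the conversion the shape is a tuple again, as the rest of the code expects.
restored = restore_shapes(loaded)
shape_after = restored["features"]["observation.state"]["shape"]
```

With a plain dict there is no `__post_init__` hook, so the conversion has to live in the loader; any caller that reads `info.json` by another route would still see lists.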
@@ -630,8 +630,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
        encoder_threads: int | None = None,
-        video_files_size_in_mb: int | None = None,
-        data_files_size_in_mb: int | None = None,
    ) -> "LeRobotDataset":
        """Create a new LeRobotDataset from scratch for recording data.

@@ -679,8 +677,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
            root=root,
            use_videos=use_videos,
            metadata_buffer_size=metadata_buffer_size,
-            video_files_size_in_mb=video_files_size_in_mb,
-            data_files_size_in_mb=data_files_size_in_mb,
        )
        obj.repo_id = obj.meta.repo_id
        obj._requested_root = obj.meta.root
@@ -123,7 +123,7 @@ class MultiLeRobotDataset(torch.utils.data.Dataset):

        NOTE: For now, this relies on a check in __init__ to make sure all sub-datasets have the same info.
        """
-        return self._datasets[0].meta.info.fps
+        return self._datasets[0].meta.info["fps"]

    @property
    def video(self) -> bool:
@@ -133,7 +133,7 @@ class MultiLeRobotDataset(torch.utils.data.Dataset):

        NOTE: For now, this relies on a check in __init__ to make sure all sub-datasets have the same info.
        """
-        return len(self._datasets[0].meta.video_keys) > 0
+        return self._datasets[0].meta.info.get("video", False)

    @property
    def features(self) -> datasets.Features:
@@ -434,7 +434,7 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):

    def _make_padding_camera_frame(self, camera_key: str):
        """Variable-shape padding frame for given camera keys, given in (H, W, C)"""
-        return torch.zeros(self.meta.info.features[camera_key]["shape"]).permute(-1, 0, 1)
+        return torch.zeros(self.meta.info["features"][camera_key]["shape"]).permute(-1, 0, 1)

    def _get_video_frame_padding_mask(
        self,
@@ -14,11 +14,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import contextlib
-import dataclasses
 import importlib.resources
 import json
 import logging
-from dataclasses import dataclass, field
 from pathlib import Path

 import datasets
@@ -72,9 +70,6 @@ class ForwardCompatibilityError(CompatibilityError):
        super().__init__(message)


-logger = logging.getLogger(__name__)
-
-
 DEFAULT_CHUNK_SIZE = 1000  # Max number of files per chunk
 DEFAULT_DATA_FILE_SIZE_IN_MB = 100  # Max size per file
 DEFAULT_VIDEO_FILE_SIZE_IN_MB = 200  # Max size per file
@@ -99,123 +94,6 @@ LEGACY_EPISODES_STATS_PATH = "meta/episodes_stats.jsonl"
 LEGACY_TASKS_PATH = "meta/tasks.jsonl"


-@dataclass
-class DatasetInfo:
-    """Typed representation of the ``meta/info.json`` file for a LeRobot dataset.
-
-    Replaces the previously untyped ``dict`` returned by ``load_info()`` and
-    created by ``create_empty_dataset_info()``.  Using a dataclass provides
-    explicit field definitions, IDE auto-completion, and validation at
-    construction time.
-    """
-
-    codebase_version: str
-    fps: int
-    features: dict[str, dict]
-
-    # Episode / frame counters — start at zero for new datasets
-    total_episodes: int = 0
-    total_frames: int = 0
-    total_tasks: int = 0
-
-    # Storage settings
-    chunks_size: int = field(default=DEFAULT_CHUNK_SIZE)
-    data_files_size_in_mb: int = field(default=DEFAULT_DATA_FILE_SIZE_IN_MB)
-    video_files_size_in_mb: int = field(default=DEFAULT_VIDEO_FILE_SIZE_IN_MB)
-
-    # File path templates
-    data_path: str = field(default=DEFAULT_DATA_PATH)
-    video_path: str | None = field(default=DEFAULT_VIDEO_PATH)
-
-    # Optional metadata
-    robot_type: str | None = None
-    splits: dict[str, str] = field(default_factory=dict)
-
-    def __post_init__(self) -> None:
-        # Coerce feature shapes from list to tuple — JSON deserialisation
-        # returns lists, but the rest of the codebase expects tuples.
-        for ft in self.features.values():
-            if isinstance(ft.get("shape"), list):
-                ft["shape"] = tuple(ft["shape"])
-
-        if self.fps <= 0:
-            raise ValueError(f"fps must be positive, got {self.fps}")
-        if self.chunks_size <= 0:
-            raise ValueError(f"chunks_size must be positive, got {self.chunks_size}")
-        if self.data_files_size_in_mb <= 0:
-            raise ValueError(f"data_files_size_in_mb must be positive, got {self.data_files_size_in_mb}")
-        if self.video_files_size_in_mb <= 0:
-            raise ValueError(f"video_files_size_in_mb must be positive, got {self.video_files_size_in_mb}")
-
-    def to_dict(self) -> dict:
-        """Return a JSON-serialisable dict.
-
-        Converts tuple shapes back to lists so ``json.dump`` can handle them.
-        """
-        d = dataclasses.asdict(self)
-        for ft in d["features"].values():
-            if isinstance(ft.get("shape"), tuple):
-                ft["shape"] = list(ft["shape"])
-        return d
-
-    @classmethod
-    def from_dict(cls, data: dict) -> "DatasetInfo":
-        """Construct from a raw dict (e.g. loaded directly from JSON).
-
-        Unknown keys are ignored for forward compatibility with datasets that
-        carry additional fields (e.g. ``total_videos`` from v2.x). A warning is
-        logged when such fields are present.
-        """
-        known = {f.name for f in dataclasses.fields(cls)}
-        unknown = sorted(k for k in data if k not in known)
-        if unknown:
-            logger.warning(f"Unknown fields in DatasetInfo: {unknown}. These will be ignored.")
-        return cls(**{k: v for k, v in data.items() if k in known})
-
-    # ---------------------------------------------------------------------------
-    # Temporary dict-style compatibility layer
-    # Allows existing ``info["key"]`` call-sites to keep working without changes.
-    # Once all callers have been migrated to attribute access, remove these.
-    # ---------------------------------------------------------------------------
-    def __getitem__(self, key: str):
-        import warnings
-
-        warnings.warn(
-            f"Accessing DatasetInfo with dict-style syntax info['{key}'] is deprecated. "
-            f"Use attribute access info.{key} instead.",
-            DeprecationWarning,
-            stacklevel=2,
-        )
-        try:
-            return getattr(self, key)
-        except AttributeError as err:
-            raise KeyError(key) from err
-
-    def __setitem__(self, key: str, value) -> None:
-        import warnings
-
-        warnings.warn(
-            f"Setting DatasetInfo with dict-style syntax info['{key}'] = ... is deprecated. "
-            f"Use attribute assignment info.{key} = ... instead.",
-            DeprecationWarning,
-            stacklevel=2,
-        )
-        if not hasattr(self, key):
-            raise KeyError(f"DatasetInfo has no field '{key}'")
-        setattr(self, key, value)
-
-    def __contains__(self, key: str) -> bool:
-        """Check if a field exists (dict-like interface)."""
-        return hasattr(self, key)
-
-    def get(self, key: str, default=None):
-        """Get attribute value with default fallback (dict-like interface)."""
-        try:
-            return getattr(self, key)
-        except AttributeError:
-            return default
-
-
 def has_legacy_hub_download_metadata(root: Path) -> bool:
    """Return ``True`` when *root* looks like a legacy Hub ``local_dir`` mirror.

@@ -416,7 +294,7 @@ def create_branch(repo_id: str, *, branch: str, repo_type: str | None = None) ->

 def create_lerobot_dataset_card(
    tags: list | None = None,
-    dataset_info: DatasetInfo | None = None,
+    dataset_info: dict | None = None,
    **kwargs,
 ) -> DatasetCard:
    """Create a `DatasetCard` for a LeRobot dataset.
@@ -427,7 +305,7 @@ def create_lerobot_dataset_card(

    Args:
        tags (list | None): A list of tags to add to the dataset card.
-        dataset_info (DatasetInfo | None): The dataset's info object, which will
+        dataset_info (dict | None): The dataset's info dictionary, which will
            be displayed on the card.
        **kwargs: Additional keyword arguments to populate the card template.

@@ -440,7 +318,7 @@ def create_lerobot_dataset_card(
        card_tags += tags
    if dataset_info:
        dataset_structure = "[meta/info.json](meta/info.json):\n"
-        dataset_structure += f"```json\n{json.dumps(dataset_info.to_dict(), indent=4)}\n```\n"
+        dataset_structure += f"```json\n{json.dumps(dataset_info, indent=4)}\n```\n"
        kwargs = {**kwargs, "dataset_structure": dataset_structure}
    card_data = DatasetCardData(
        license=kwargs.get("license"),
@@ -12,8 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from lerobot.utils.action_interpolator import ActionInterpolator as ActionInterpolator
-
 from .act.configuration_act import ACTConfig as ACTConfig
 from .diffusion.configuration_diffusion import DiffusionConfig as DiffusionConfig
 from .factory import get_policy_class, make_policy, make_policy_config, make_pre_post_processors
@@ -23,7 +21,10 @@ from .pi0.configuration_pi0 import PI0Config as PI0Config
 from .pi0_fast.configuration_pi0_fast import PI0FastConfig as PI0FastConfig
 from .pi05.configuration_pi05 import PI05Config as PI05Config
 from .pretrained import PreTrainedPolicy as PreTrainedPolicy
+from .rtc import ActionInterpolator as ActionInterpolator
 from .sac.configuration_sac import SACConfig as SACConfig
+from .sac.reward_model.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
+from .sarm.configuration_sarm import SARMConfig as SARMConfig
 from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
 from .tdmpc.configuration_tdmpc import TDMPCConfig as TDMPCConfig
 from .utils import make_robot_action, prepare_observation_for_inference
@@ -44,7 +45,9 @@ __all__ = [
    "PI0Config",
    "PI0FastConfig",
    "PI05Config",
+    "RewardClassifierConfig",
    "SACConfig",
+    "SARMConfig",
    "SmolVLAConfig",
    "TDMPCConfig",
    "VQBeTConfig",
@@ -52,6 +52,8 @@ from .pi0.configuration_pi0 import PI0Config
 from .pi05.configuration_pi05 import PI05Config
 from .pretrained import PreTrainedPolicy
 from .sac.configuration_sac import SACConfig
+from .sac.reward_model.configuration_classifier import RewardClassifierConfig
+from .sarm.configuration_sarm import SARMConfig
 from .smolvla.configuration_smolvla import SmolVLAConfig
 from .tdmpc.configuration_tdmpc import TDMPCConfig
 from .utils import validate_visual_features_consistency
@@ -87,7 +89,7 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:

    Args:
        name: The name of the policy. Supported names are "tdmpc", "diffusion", "act",
-            "multi_task_dit", "vqbet", "pi0", "pi05", "sac", "smolvla", "wall_x".
+            "multi_task_dit", "vqbet", "pi0", "pi05", "sac", "reward_classifier", "smolvla", "wall_x".
    Returns:
        The policy class corresponding to the given name.

@@ -130,10 +132,18 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
        from .sac.modeling_sac import SACPolicy

        return SACPolicy
+    elif name == "reward_classifier":
+        from .sac.reward_model.modeling_classifier import Classifier
+
+        return Classifier
    elif name == "smolvla":
        from .smolvla.modeling_smolvla import SmolVLAPolicy

        return SmolVLAPolicy
+    elif name == "sarm":
+        from .sarm.modeling_sarm import SARMRewardModel
+
+        return SARMRewardModel
    elif name == "groot":
        from .groot.modeling_groot import GrootPolicy

@@ -163,7 +173,7 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
    Args:
        policy_type: The type of the policy. Supported types include "tdmpc",
                     "multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "sac",
-                     "smolvla", "wall_x".
+                     "smolvla", "reward_classifier", "wall_x".
        **kwargs: Keyword arguments to be passed to the configuration class constructor.

    Returns:
@@ -190,6 +200,8 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
        return SACConfig(**kwargs)
    elif policy_type == "smolvla":
        return SmolVLAConfig(**kwargs)
+    elif policy_type == "reward_classifier":
+        return RewardClassifierConfig(**kwargs)
    elif policy_type == "groot":
        return GrootConfig(**kwargs)
    elif policy_type == "xvla":
@@ -366,6 +378,14 @@ def make_pre_post_processors(
            dataset_stats=kwargs.get("dataset_stats"),
        )

+    elif isinstance(policy_cfg, RewardClassifierConfig):
+        from .sac.reward_model.processor_classifier import make_classifier_processor
+
+        processors = make_classifier_processor(
+            config=policy_cfg,
+            dataset_stats=kwargs.get("dataset_stats"),
+        )
+
    elif isinstance(policy_cfg, SmolVLAConfig):
        from .smolvla.processor_smolvla import make_smolvla_pre_post_processors

@@ -374,6 +394,14 @@ def make_pre_post_processors(
            dataset_stats=kwargs.get("dataset_stats"),
        )

+    elif isinstance(policy_cfg, SARMConfig):
+        from .sarm.processor_sarm import make_sarm_pre_post_processors
+
+        processors = make_sarm_pre_post_processors(
+            config=policy_cfg,
+            dataset_stats=kwargs.get("dataset_stats"),
+            dataset_meta=kwargs.get("dataset_meta"),
+        )
    elif isinstance(policy_cfg, GrootConfig):
        from .groot.processor_groot import make_groot_pre_post_processors

@@ -514,7 +542,7 @@ def make_policy(

        logging.info("Loading policy's PEFT adapter.")

-        peft_pretrained_path = str(cfg.pretrained_path)
+        peft_pretrained_path = cfg.pretrained_path
        peft_config = PeftConfig.from_pretrained(peft_pretrained_path)

        kwargs["pretrained_name_or_path"] = peft_config.base_model_name_or_path
@@ -527,9 +555,7 @@ def make_policy(
            )

        policy = policy_cls.from_pretrained(**kwargs)
-        policy = PeftModel.from_pretrained(
-            policy, peft_pretrained_path, config=peft_config, is_trainable=True
-        )
+        policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)

    else:
        # Make a fresh policy.
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from dataclasses import field
+from dataclasses import dataclass, field
 from typing import TYPE_CHECKING

 import torch
@@ -109,6 +109,7 @@ class MultiEmbodimentActionEncoder(nn.Module):
        return x


+@dataclass
 class FlowmatchingActionHeadConfig(PretrainedConfig):
    """NOTE: N1.5 uses XEmbFlowmatchingPolicyHeadConfig as action head"""

@@ -444,13 +444,13 @@ class PaliGemmaWithExpertModel(
        if image.dtype != torch.float32:
            image = image.to(torch.float32)
        image_outputs = self.paligemma.model.get_image_features(image)
-        features = image_outputs.pooler_output
+        features = image_outputs.pooler_output * self.paligemma.config.text_config.hidden_size**0.5
        if features.dtype != out_dtype:
            features = features.to(out_dtype)
        return features

    def embed_language_tokens(self, tokens: torch.Tensor):
-        return self.paligemma.model.language_model.get_input_embeddings()(tokens)
+        return self.paligemma.model.language_model.embed_tokens(tokens)

    def forward(
        self,
@@ -666,7 +666,8 @@ class PI0Pytorch(nn.Module):  # see openpi `PI0Pytorch`
        # Process language tokens
        def lang_embed_func(lang_tokens):
            lang_emb = self.paligemma_with_expert.embed_language_tokens(lang_tokens)
-            return lang_emb
+            lang_emb_dim = lang_emb.shape[-1]
+            return lang_emb * math.sqrt(lang_emb_dim)

        lang_emb = self._apply_checkpoint(lang_embed_func, lang_tokens)
        embs.append(lang_emb)
@@ -747,8 +748,16 @@ class PI0Pytorch(nn.Module):  # see openpi `PI0Pytorch`

        return embs, pad_masks, att_masks, adarms_cond

-    def forward(self, images, img_masks, lang_tokens, lang_masks, state, actions, noise, time) -> Tensor:
+    def forward(
+        self, images, img_masks, lang_tokens, lang_masks, state, actions, noise=None, time=None
+    ) -> Tensor:
        """Do a full training forward pass and compute the loss."""
+        if noise is None:
+            noise = self.sample_noise(actions.shape, actions.device)
+
+        if time is None:
+            time = self.sample_time(actions.shape[0], actions.device)
+
        time_expanded = time[:, None, None]
        x_t = time_expanded * noise + (1 - time_expanded) * actions
        u_t = noise - actions
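Reviewer note: the default-sampling pattern in this hunk can be sketched without torch. This is an illustrative list-based version, not the model's code. The function name `flow_matching_targets` and the Gaussian and uniform samplers are assumptions, since the real `sample_noise` and `sample_time` are not shown in this diff. Only the two formulas for `x_t` and `u_t` come from the hunk.

```python
import random


def flow_matching_targets(actions, noise=None, time=None, rng=None):
    """Return (x_t, u_t) for a flat list of action values.

    noise and time are sampled only when the caller does not supply them,
    which is the behavior this hunk moves into forward().
    """
    if rng is None:
        rng = random.Random()
    if noise is None:
        # Assumed sampler: standard normal noise, one value per action entry.
        noise = [rng.gauss(0.0, 1.0) for _ in actions]
    if time is None:
        # Assumed sampler: uniform time in [0, 1).
        time = rng.random()
    # Same formulas as in the diff: interpolate between actions and noise.
    x_t = [time * n + (1 - time) * a for n, a in zip(noise, actions)]
    u_t = [n - a for n, a in zip(noise, actions)]
    return x_t, u_t


# Explicit noise and time give a deterministic result, useful in tests.
x_t, u_t = flow_matching_targets([1.0, 2.0], noise=[0.0, 0.0], time=0.25)

# Omitting them falls back to sampling inside the function.
x_rand, u_rand = flow_matching_targets([1.0, 2.0])
```

Keeping `noise` and `time` as optional arguments lets tests pin both values while the training path simply omits them.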
@@ -1283,11 +1292,8 @@ class PI0Policy(PreTrainedPolicy):
        state = self.prepare_state(batch)
        actions = self.prepare_action(batch)

-        noise = self.model.sample_noise(actions.shape, actions.device)
-        time = self.model.sample_time(actions.shape[0], actions.device)
-
        # Compute loss
-        losses = self.model.forward(images, img_masks, lang_tokens, lang_masks, state, actions, noise, time)
+        losses = self.model.forward(images, img_masks, lang_tokens, lang_masks, state, actions)

        # Truncate losses to actual action dimensions
        original_action_dim = self.config.output_features[ACTION].shape[0]
@@ -728,8 +728,14 @@ class PI05Pytorch(nn.Module):  # see openpi `PI0Pytorch`

        return embs, pad_masks, att_masks, adarms_cond

-    def forward(self, images, img_masks, tokens, masks, actions, noise, time) -> Tensor:
+    def forward(self, images, img_masks, tokens, masks, actions, noise=None, time=None) -> Tensor:
        """Do a full training forward pass and compute the loss."""
+        if noise is None:
+            noise = self.sample_noise(actions.shape, actions.device)
+
+        if time is None:
+            time = self.sample_time(actions.shape[0], actions.device)
+
        time_expanded = time[:, None, None]
        x_t = time_expanded * noise + (1 - time_expanded) * actions
        u_t = noise - actions
@@ -1256,11 +1262,8 @@ class PI05Policy(PreTrainedPolicy):

        actions = self.prepare_action(batch)

-        noise = self.model.sample_noise(actions.shape, actions.device)
-        time = self.model.sample_time(actions.shape[0], actions.device)
-
        # Compute loss (no separate state needed for PI05)
-        losses = self.model.forward(images, img_masks, tokens, masks, actions, noise, time)
+        losses = self.model.forward(images, img_masks, tokens, masks, actions)

        # Truncate losses to actual action dimensions
        original_action_dim = self.config.output_features[ACTION].shape[0]
@@ -16,6 +16,7 @@

 import builtins
 import logging
+import math
 from collections import deque
 from pathlib import Path
 from typing import TYPE_CHECKING, Literal, TypedDict, Unpack
@@ -226,7 +227,6 @@ class PI0FastPaliGemma(nn.Module):
        # forward(..., adarms_cond=...) is supported (same as pi0/pi05).
        if use_adarms[0]:
            text_config = self.paligemma.config.text_config
-            del self.paligemma.model.language_model
            self.paligemma.model.language_model = PiGemmaModel(text_config)

        self.to_bfloat16_for_selected_params(precision)
@@ -260,15 +260,13 @@ class PI0FastPaliGemma(nn.Module):
        if image.dtype != torch.float32:
            image = image.to(torch.float32)
        image_outputs = self.paligemma.model.get_image_features(image)
-        features = image_outputs.pooler_output
-        norm = 2048**0.5
-        features = features / norm * norm
+        features = image_outputs.pooler_output * self.paligemma.config.text_config.hidden_size**0.5
        if features.dtype != out_dtype:
            features = features.to(out_dtype)
        return features

    def embed_language_tokens(self, tokens: torch.Tensor):
-        return self.paligemma.model.language_model.get_input_embeddings()(tokens)
+        return self.paligemma.model.language_model.embed_tokens(tokens)

    def forward(
        self,
@@ -418,7 +416,8 @@ class PI0FastPytorch(nn.Module):  # see openpi `PI0Pytorch`
        # Process language instruction tokens
        def lang_embed_func(tokens):
            lang_emb = self.paligemma_with_expert.embed_language_tokens(tokens)
-            return lang_emb
+            lang_emb_dim = lang_emb.shape[-1]
+            return lang_emb * math.sqrt(lang_emb_dim)

        lang_emb = self._apply_checkpoint(lang_embed_func, tokens)
        embs.append(lang_emb)
@@ -432,7 +431,8 @@ class PI0FastPytorch(nn.Module):  # see openpi `PI0Pytorch`

            def fast_action_embed_func(fast_action_tokens):
                fast_emb = self.paligemma_with_expert.embed_language_tokens(fast_action_tokens)
-                return fast_emb
+                fast_emb_dim = fast_emb.shape[-1]
+                return fast_emb * math.sqrt(fast_emb_dim)

            fast_action_emb = self._apply_checkpoint(fast_action_embed_func, fast_action_tokens)
            embs.append(fast_action_emb)
@@ -665,6 +665,7 @@ class PI0FastPytorch(nn.Module):  # see openpi `PI0Pytorch`
            if t < max_decoding_steps - 1:
                # embed the newly generated token
                next_token_emb = self.paligemma_with_expert.embed_language_tokens(next_token)
+                next_token_emb = next_token_emb * math.sqrt(next_token_emb.shape[-1])
                if prefix_embs.dtype == torch.bfloat16:
                    next_token_emb = next_token_emb.to(dtype=torch.bfloat16)

@@ -769,6 +770,7 @@ class PI0FastPytorch(nn.Module):  # see openpi `PI0Pytorch`
            # Embed the single previous token
            # We use embed_language_tokens directly to avoid overhead of full prefix embedding
            next_token_emb = self.paligemma_with_expert.embed_language_tokens(next_token)
+            next_token_emb = next_token_emb * math.sqrt(next_token_emb.shape[-1])
            if prefix_embs.dtype == torch.bfloat16:
                next_token_emb = next_token_emb.to(dtype=torch.bfloat16)

@@ -197,9 +197,6 @@ class PiGemmaModel(GemmaModel):  # type: ignore[misc]

    def __init__(self, config: GemmaConfig, **kwargs):
        super().__init__(config, **kwargs)
-        # Free parent-allocated layers/norm before replacing to avoid ~2x peak memory.
-        del self.layers
-        del self.norm
        # if not getattr(config, "use_adarms", False):
        #     return
        cond_dim = getattr(config, "adarms_cond_dim", None)
@@ -331,7 +328,6 @@ class PiGemmaForCausalLM(GemmaForCausalLM):  # type: ignore[misc]

    def __init__(self, config: GemmaConfig, **kwargs):
        super().__init__(config, **kwargs)
-        del self.model
        self.model = PiGemmaModel(config)


@@ -340,7 +336,6 @@ class PaliGemmaModelWithPiGemma(PaliGemmaModel):

    def __init__(self, config):
        super().__init__(config)
-        del self.language_model
        self.language_model = PiGemmaModel(config.text_config)


@@ -349,7 +344,6 @@ class PaliGemmaForConditionalGenerationWithPiGemma(PaliGemmaForConditionalGenera

    def __init__(self, config):
        super().__init__(config)
-        del self.model
        self.model = PaliGemmaModelWithPiGemma(config)

    # Make modules available through conditional class for BC
@@ -19,7 +19,6 @@ from .action_queue import ActionQueue
 from .configuration_rtc import RTCConfig
 from .latency_tracker import LatencyTracker
 from .modeling_rtc import RTCProcessor
-from .relative import reanchor_relative_rtc_prefix

 __all__ = [
    "ActionInterpolator",
@@ -27,5 +26,4 @@ __all__ = [
    "LatencyTracker",
    "RTCConfig",
    "RTCProcessor",
-    "reanchor_relative_rtc_prefix",
 ]
@@ -1,4 +1,116 @@
-# Moved to lerobot.utils.action_interpolator — re-exported for backwards compatibility.
-from lerobot.utils.action_interpolator import ActionInterpolator
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

-__all__ = ["ActionInterpolator"]
+"""Action interpolation for smoother robot control.
+
+Provides configurable Nx control rate by interpolating between consecutive actions.
+Useful with RTC and action-chunking policies to reduce jerkiness.
+"""
+
+from torch import Tensor
+
+
+class ActionInterpolator:
+    """Interpolates between consecutive actions for smoother control.
+
+    When enabled with multiplier N, produces N actions per policy action
+    by linearly interpolating between the previous and current action.
+
+    Example with multiplier=3:
+        prev_action -> [1/3 interpolated, 2/3 interpolated, current_action]
+
+    This effectively multiplies the control rate for smoother motion.
+
+    Usage:
+        interpolator = ActionInterpolator(multiplier=2)  # 2x control rate
+
+        # In control loop:
+        if interpolator.needs_new_action():
+            new_action = queue.get()
+            if new_action is not None:
+                interpolator.add(new_action.cpu())
+
+        action = interpolator.get()
+        if action is not None:
+            robot.send_action(action)
+    """
+
+    def __init__(self, multiplier: int = 1):
+        """Initialize the interpolator.
+
+        Args:
+            multiplier: Control rate multiplier (1 = no interpolation, 2 = 2x, 3 = 3x, etc.)
+        """
+        if multiplier < 1:
+            raise ValueError(f"multiplier must be >= 1, got {multiplier}")
+        self.multiplier = multiplier
+        self._prev: Tensor | None = None
+        self._buffer: list[Tensor] = []
+        self._idx = 0
+
+    @property
+    def enabled(self) -> bool:
+        """Whether interpolation is active (multiplier > 1)."""
+        return self.multiplier > 1
+
+    def reset(self):
+        """Reset interpolation state (call between episodes)."""
+        self._prev = None
+        self._buffer = []
+        self._idx = 0
+
+    def needs_new_action(self) -> bool:
+        """Check if a new action is needed from the queue."""
+        return self._idx >= len(self._buffer)
+
+    def add(self, action: Tensor) -> None:
+        """Add a new action and compute interpolated sequence.
+
+        Args:
+            action: New action tensor from policy/queue (already on CPU).
+        """
+        if self.multiplier > 1 and self._prev is not None:
+            self._buffer = []
+            for i in range(1, self.multiplier + 1):
+                t = i / self.multiplier
+                interp = self._prev + t * (action - self._prev)
+                self._buffer.append(interp)
+        else:
+            # No interpolation: multiplier is 1 or there is no previous action yet, so emit the action as-is.
+            self._buffer = [action.clone()]
+        self._prev = action.clone()
+        self._idx = 0
+
+    def get(self) -> Tensor | None:
+        """Get the next interpolated action.
+
+        Returns:
+            Next action tensor, or None if buffer is exhausted.
+        """
+        if self._idx >= len(self._buffer):
+            return None
+        action = self._buffer[self._idx]
+        self._idx += 1
+        return action
+
+    def get_control_interval(self, fps: float) -> float:
+        """Get the control interval based on interpolation multiplier.
+
+        Args:
+            fps: Base frames per second.
+
+        Returns:
+            Control interval in seconds (divided by multiplier).
+        """
+        return 1.0 / (fps * self.multiplier)
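Reviewer note: a scalar sketch of the interpolation schedule used by `ActionInterpolator.add` and of `get_control_interval`. It uses plain floats instead of tensors, and the function names `interpolate_steps` and `control_interval` are hypothetical stand-ins rather than part of the class.

```python
def interpolate_steps(prev: float, current: float, multiplier: int) -> list[float]:
    """Return `multiplier` values stepping linearly from prev to current.

    The last value is always `current`, matching t = i / multiplier
    for i in 1..multiplier in the class above.
    """
    if multiplier < 1:
        raise ValueError(f"multiplier must be >= 1, got {multiplier}")
    return [prev + (i / multiplier) * (current - prev) for i in range(1, multiplier + 1)]


def control_interval(fps: float, multiplier: int) -> float:
    """Seconds between sent actions when the control rate is fps * multiplier."""
    return 1.0 / (fps * multiplier)


# Four sub-steps between a previous action of 0.0 and a new action of 1.0.
steps = interpolate_steps(0.0, 1.0, 4)

# A 30 FPS policy with multiplier 2 sends an action every 1/60 s.
interval = control_interval(30, 2)
```

The previous action itself is never re-emitted, so each policy action yields exactly `multiplier` commands once a previous action exists.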
@@ -92,10 +92,10 @@ class ActionQueue:
        Returns:
            int: Number of unconsumed actions.
        """
-        with self.lock:
-            if self.queue is None:
-                return 0
-            return len(self.queue) - self.last_index
+        if self.queue is None:
+            return 0
+        length = len(self.queue)
+        return length - self.last_index

    def empty(self) -> bool:
        """Check if the queue is empty.
@@ -103,10 +103,11 @@ class ActionQueue:
        Returns:
            bool: True if no actions remain, False otherwise.
        """
-        with self.lock:
-            if self.queue is None:
-                return True
-            return len(self.queue) - self.last_index <= 0
+        if self.queue is None:
+            return True
+
+        length = len(self.queue)
+        return length - self.last_index <= 0

    def get_action_index(self) -> int:
        """Get the current action consumption index.
@@ -114,8 +115,7 @@ class ActionQueue:
        Returns:
            int: Index of the next action to be consumed.
        """
-        with self.lock:
-            return self.last_index
+        return self.last_index

    def get_left_over(self) -> Tensor | None:
        """Get leftover original actions for RTC prev_chunk_left_over.
@@ -35,7 +35,7 @@ class RTCConfig:
    """

    # Infrastructure
-    enabled: bool = True
+    enabled: bool = False

    # Core RTC settings
    # Todo change to exp
@@ -1,58 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Relative-action helpers for Real-Time Chunking (RTC)."""
-
-from __future__ import annotations
-
-import torch
-
-from lerobot.processor import (
-    NormalizerProcessorStep,
-    RelativeActionsProcessorStep,
-    TransitionKey,
-    create_transition,
-    to_relative_actions,
-)
-
-
-def reanchor_relative_rtc_prefix(
-    prev_actions_absolute: torch.Tensor,
-    current_state: torch.Tensor,
-    relative_step: RelativeActionsProcessorStep,
-    normalizer_step: NormalizerProcessorStep | None,
-    policy_device: torch.device | str,
-) -> torch.Tensor:
-    """Convert absolute leftover actions into model-space for relative-action RTC policies.
-
-    When using relative actions, the RTC prefix (previous chunk's unexecuted tail)
-    is stored in absolute coordinates. Before feeding it back to the policy, this
-    helper re-expresses those actions relative to the robot's current joint state
-    and optionally normalizes them so the policy receives correctly scaled inputs.
-    """
-    state = current_state.detach().cpu()
-    if state.dim() == 1:
-        state = state.unsqueeze(0)
-
-    action_cpu = prev_actions_absolute.detach().cpu()
-    mask = relative_step._build_mask(action_cpu.shape[-1])
-    relative_actions = to_relative_actions(action_cpu, state, mask)
-
-    transition = create_transition(action=relative_actions)
-    if normalizer_step is not None:
-        transition = normalizer_step(transition)
-
-    return transition[TransitionKey.ACTION].to(policy_device)
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -13,15 +15,14 @@
 # limitations under the License.
 from dataclasses import dataclass, field

-from lerobot.configs import NormalizationMode
-from lerobot.configs.rewards import RewardModelConfig
+from lerobot.configs import NormalizationMode, PreTrainedConfig
 from lerobot.optim import AdamWConfig, LRSchedulerConfig, OptimizerConfig
 from lerobot.utils.constants import OBS_IMAGE


-@RewardModelConfig.register_subclass(name="reward_classifier")
+@PreTrainedConfig.register_subclass(name="reward_classifier")
@dataclass
-class RewardClassifierConfig(RewardModelConfig):
+class RewardClassifierConfig(PreTrainedConfig):
    """Configuration for the Reward Classifier model."""

    name: str = "reward_classifier"
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -17,10 +19,11 @@ import logging
 import torch
 from torch import Tensor, nn

-from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig
-from lerobot.rewards.pretrained import PreTrainedRewardModel
 from lerobot.utils.constants import OBS_IMAGE, REWARD

+from ...pretrained import PreTrainedPolicy
+from .configuration_classifier import RewardClassifierConfig
+

 class ClassifierOutput:
    """Wrapper for classifier outputs with additional metadata."""
@@ -96,7 +99,7 @@ class SpatialLearnedEmbeddings(nn.Module):
        return output


-class Classifier(PreTrainedRewardModel):
+class Classifier(PreTrainedPolicy):
    """Image classifier built on top of a pre-trained encoder."""

    name = "reward_classifier"
@@ -232,16 +235,6 @@ class Classifier(PreTrainedRewardModel):

        return ClassifierOutput(logits=logits, probabilities=probabilities, hidden_states=encoder_outputs)

-    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
-        """Returns 1.0 for success, 0.0 for failure based on image observations."""
-        images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]
-        output = self.predict(images)
-
-        if self.config.num_classes == 2:
-            return (output.probabilities > 0.5).float()
-        else:
-            return torch.argmax(output.probabilities, dim=1).float()
-
    def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict[str, Tensor]]:
        """Standard forward pass for training compatible with train.py."""
        # Extract images and labels
@@ -276,6 +269,10 @@ class Classifier(PreTrainedRewardModel):

    def predict_reward(self, batch, threshold=0.5):
        """Eval method. Returns predicted reward with the decision threshold as argument."""
+        # Normalize inputs and targets before extracting images
+        batch = self.normalize_inputs(batch)
+        batch = self.normalize_targets(batch)
+
        # Extract images from batch dict
        images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]

@@ -285,3 +282,28 @@ class Classifier(PreTrainedRewardModel):
            return (probs > threshold).float()
        else:
            return torch.argmax(self.predict(images).probabilities, dim=1)
+
+    def get_optim_params(self):
+        """Return optimizer parameters for the policy."""
+        return self.parameters()
+
+    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
+        """
+        This method is required by PreTrainedPolicy but not used for reward classifiers.
+        The reward classifier is not an actor and does not select actions.
+        """
+        raise NotImplementedError("Reward classifiers do not select actions")
+
+    def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
+        """
+        This method is required by PreTrainedPolicy but not used for reward classifiers.
+        The reward classifier is not an actor and does not produce action chunks.
+        """
+        raise NotImplementedError("Reward classifiers do not predict action chunks")
+
+    def reset(self):
+        """
+        This method is required by PreTrainedPolicy but is a no-op for reward classifiers.
+        The reward classifier is not an actor, so there is nothing to reset.
+        """
+        pass
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -25,7 +27,8 @@ from lerobot.processor import (
    policy_action_to_transition,
    transition_to_policy_action,
 )
-from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig
+
+from .configuration_classifier import RewardClassifierConfig


 def make_classifier_processor(
@@ -49,6 +52,8 @@ def make_classifier_processor(
    Args:
        config: The configuration object for the RewardClassifier.
        dataset_stats: A dictionary of statistics for normalization.
+        preprocessor_kwargs: Additional arguments for the pre-processor pipeline.
+        postprocessor_kwargs: Additional arguments for the post-processor pipeline.

    Returns:
        A tuple containing the configured pre-processor and post-processor pipelines.
@@ -0,0 +1 @@
+../../../../docs/source/policy_sarm_README.md
@@ -1,4 +1,4 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -14,6 +14,5 @@

 from .configuration_sarm import SARMConfig
 from .modeling_sarm import SARMRewardModel
-from .processor_sarm import make_sarm_pre_post_processors

-__all__ = ["SARMConfig", "SARMRewardModel", "make_sarm_pre_post_processors"]
+__all__ = ["SARMConfig", "SARMRewardModel"]
@@ -25,18 +25,18 @@ need ~num_frames/30 queries instead of one per frame (~30x speedup).

 Usage:
    # Full RA-BC computation with visualizations
-    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
+    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4

    # Faster computation with stride (compute every 5 frames, interpolate the rest)
-    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
+    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --stride 5

    # Visualize predictions only (no RA-BC computation)
-    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
+    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --visualize-only \\
@@ -58,9 +58,10 @@ import torch
 from tqdm import tqdm

 from lerobot.datasets import LeRobotDataset
-from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel
-from lerobot.rewards.sarm.processor_sarm import make_sarm_pre_post_processors
-from lerobot.rewards.sarm.sarm_utils import normalize_stage_tau
+
+from .modeling_sarm import SARMRewardModel
+from .processor_sarm import make_sarm_pre_post_processors
+from .sarm_utils import normalize_stage_tau


 def get_reward_model_path_from_parquet(parquet_path: Path) -> str | None:
@@ -712,12 +713,12 @@ def main():
        epilog="""
 Examples:
    # Full RA-BC computation with visualizations
-    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
+    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4

    # Visualize predictions only (no RA-BC computation)
-    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
+    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --visualize-only \\
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 Qianzhong Chen, Justin Yu, Mac Schwager, Pieter Abbeel, Yide Shentu, Philipp Wu
 # and The HuggingFace Inc. team. All rights reserved.
 #
@@ -20,15 +22,14 @@ Paper: https://arxiv.org/abs/2509.25358

 from dataclasses import dataclass, field

-from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature
-from lerobot.configs.rewards import RewardModelConfig
+from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature, PreTrainedConfig
 from lerobot.optim import AdamWConfig, CosineDecayWithWarmupSchedulerConfig
 from lerobot.utils.constants import OBS_IMAGES, OBS_STATE


-@RewardModelConfig.register_subclass("sarm")
+@PreTrainedConfig.register_subclass("sarm")
@dataclass
-class SARMConfig(RewardModelConfig):
+class SARMConfig(PreTrainedConfig):
    """Configuration class for SARM (Stage-Aware Reward Modeling).

    Supports three annotation modes:
@@ -109,6 +110,7 @@ class SARMConfig(RewardModelConfig):

    def __post_init__(self):
        super().__post_init__()
+
        if self.annotation_mode not in ["single_stage", "dense_only", "dual"]:
            raise ValueError(
                f"annotation_mode must be 'single_stage', 'dense_only', or 'dual', got {self.annotation_mode}"
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 Qianzhong Chen, Justin Yu, Mac Schwager, Pieter Abbeel, Yide Shentu, Philipp Wu
 # and The HuggingFace Inc. team. All rights reserved.
 #
@@ -32,13 +34,14 @@ import torch.nn as nn
 import torch.nn.functional as F  # noqa: N812
 from torch import Tensor

-from lerobot.rewards.pretrained import PreTrainedRewardModel
-from lerobot.rewards.sarm.configuration_sarm import SARMConfig
-from lerobot.rewards.sarm.sarm_utils import (
+from lerobot.utils.constants import OBS_STR
+
+from ..pretrained import PreTrainedPolicy
+from .configuration_sarm import SARMConfig
+from .sarm_utils import (
    normalize_stage_tau,
    pad_state_to_max_dim,
 )
-from lerobot.utils.constants import OBS_STR


 class StageTransformer(nn.Module):
@@ -350,7 +353,7 @@ def gen_stage_emb(num_classes: int, targets: torch.Tensor) -> torch.Tensor:
    return stage_onehot


-class SARMRewardModel(PreTrainedRewardModel):
+class SARMRewardModel(PreTrainedPolicy):
    """
    SARM Reward Model for stage-aware task completion rewards.

@@ -468,23 +471,6 @@ class SARMRewardModel(PreTrainedRewardModel):
        self.subtask_model.to(device)
        return self

-    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
-        """Compute dense progress reward in [0, 1] from batch.
-
-        Expects batch to contain:
-        - "observation_features" or video embeddings: (B, T, 512)
-        - "language_embedding" or text embeddings: (B, 512)
-        - optionally "observation.state": (B, T, state_dim)
-        """
-        text_emb = batch.get("language_embedding", batch.get("text_features"))
-        video_emb = batch.get("observation_features", batch.get("video_features"))
-        state = batch.get("observation.state", batch.get("state_features"))
-
-        rewards = self.calculate_rewards(text_emb, video_emb, state)
-        if isinstance(rewards, np.ndarray):
-            rewards = torch.from_numpy(rewards).float()
-        return rewards
-
    @torch.no_grad()
    def calculate_rewards(
        self,
@@ -645,9 +631,17 @@ class SARMRewardModel(PreTrainedRewardModel):
        return self.parameters()

    def reset(self):
-        """SARM has no episode-level state to reset."""
+        """Required by PreTrainedPolicy but not used for reward models."""
        pass

+    def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
+        """Required by PreTrainedPolicy but not used for reward models."""
+        raise NotImplementedError("SARM model does not predict action chunks")
+
+    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
+        """Required by PreTrainedPolicy but not used for SARM."""
+        raise NotImplementedError("SARM model does not select actions")
+
    def _train_step(
        self,
        img_emb: torch.Tensor,  # (B, N, T, D)
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -58,15 +60,16 @@ from lerobot.processor import (
    policy_action_to_transition,
    transition_to_policy_action,
 )
-from lerobot.rewards.sarm.configuration_sarm import SARMConfig
-from lerobot.rewards.sarm.sarm_utils import (
+from lerobot.types import EnvTransition, PolicyAction, TransitionKey
+from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME
+
+from .configuration_sarm import SARMConfig
+from .sarm_utils import (
    apply_rewind_augmentation,
    compute_absolute_indices,
    find_stage_and_tau,
    pad_state_to_max_dim,
 )
-from lerobot.types import EnvTransition, PolicyAction, TransitionKey
-from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME


 class SARMEncodingProcessorStep(ProcessorStep):
@@ -1,3 +1,5 @@
+#!/usr/bin/env python
+
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -22,7 +22,7 @@ from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    is_flash_attn_2_available,
-    is_flash_attn_greater_or_equal,
+    is_flash_attn_greater_or_equal_2_10,
    is_torchdynamo_compiling,
    logging,
    replace_return_docstrings,
@@ -890,7 +890,7 @@ class Qwen2_5_VLFlashAttention2(Qwen2_5_VLAttention):
        # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
        # flash_attn<2.1 generates top-left aligned causal mask, while what is needed here is bottom-right alignment, that was made default for flash_attn>=2.1. This attribute is used to handle this difference. Reference: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.1.0.
        # Beware that with flash_attn<2.1, using q_seqlen != k_seqlen (except for the case q_seqlen == 1) produces a wrong mask (top-left).
-        self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal("2.1.0")
+        self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal_2_10()

    def forward(
        self,
@@ -45,7 +45,7 @@ from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    is_flash_attn_2_available,
-    is_flash_attn_greater_or_equal,
+    is_flash_attn_greater_or_equal_2_10,
    logging,
    replace_return_docstrings,
 )
@@ -909,7 +909,7 @@ class Florence2FlashAttention2(Florence2Attention):
        # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
        # flash_attn<2.1 generates top-left aligned causal mask, while what is needed here is bottom-right alignment, that was made default for flash_attn>=2.1. This attribute is used to handle this difference. Reference: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.1.0.
        # Beware that with flash_attn<2.1, using q_seqlen != k_seqlen (except for the case q_seqlen == 1) produces a wrong mask (top-left).
-        self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal("2.1.0")
+        self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal_2_10()

    def _reshape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
        return tensor.view(bsz, seq_len, self.num_heads, self.head_dim)
@@ -321,6 +321,7 @@ class GymHILAdapterProcessorStep(ProcessorStep):
    This step normalizes the `transition` object by:
    1. Copying `teleop_action` from `info` to `complementary_data`.
    2. Copying `is_intervention` from `info` (using the string key) to `info` (using the enum key).
+    3. Copying `discrete_penalty` from `info` to `complementary_data`.
    """

    def __call__(self, transition: EnvTransition) -> EnvTransition:
@@ -330,6 +331,9 @@ class GymHILAdapterProcessorStep(ProcessorStep):
        if TELEOP_ACTION_KEY in info:
            complementary_data[TELEOP_ACTION_KEY] = info[TELEOP_ACTION_KEY]

+        if DISCRETE_PENALTY_KEY in info:
+            complementary_data[DISCRETE_PENALTY_KEY] = info[DISCRETE_PENALTY_KEY]
+
        if "is_intervention" in info:
            info[TeleopEvents.IS_INTERVENTION] = info["is_intervention"]

@@ -348,18 +352,24 @@ class GymHILAdapterProcessorStep(ProcessorStep):
@ProcessorStepRegistry.register("gripper_penalty_processor")
 class GripperPenaltyProcessorStep(ProcessorStep):
    """
-    Applies a penalty for inefficient gripper usage.
+    Applies a small per-transition cost on the discrete gripper action.

-    This step penalizes actions that attempt to close an already closed gripper or
-    open an already open one, based on position thresholds.
+    Fires only when the commanded action would actually transition the gripper
+    from one extreme to the other (close-while-open or open-while-closed).
+    This discourages gripper oscillation while leaving "stay" and saturating-further
+    commands unpenalized.

    Attributes:
        penalty: The negative reward value to apply.
        max_gripper_pos: The maximum position value for the gripper, used for normalization.
+        open_threshold: Normalized state below which the gripper is considered "open".
+        closed_threshold: Normalized state above which the gripper is considered "closed".
    """

-    penalty: float = -0.01
+    penalty: float = -0.02
    max_gripper_pos: float = 30.0
+    open_threshold: float = 0.1
+    closed_threshold: float = 0.9

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        """
@@ -391,9 +401,13 @@ class GripperPenaltyProcessorStep(ProcessorStep):
        gripper_state_normalized = current_gripper_pos / self.max_gripper_pos

        # Calculate penalty boolean as in original
-        gripper_penalty_bool = (gripper_state_normalized < 0.5 and gripper_action_normalized > 0.5) or (
-            gripper_state_normalized > 0.75 and gripper_action_normalized < 0.5
-        )
+        #   - currently open  AND target is closed  -> close transition
+        #   - currently closed AND target is open   -> open transition
+        is_open = gripper_state_normalized < self.open_threshold
+        is_closed = gripper_state_normalized > self.closed_threshold
+        cmd_close = gripper_action_normalized > self.closed_threshold
+        cmd_open = gripper_action_normalized < self.open_threshold
+        gripper_penalty_bool = (is_open and cmd_close) or (is_closed and cmd_open)

        gripper_penalty = self.penalty * int(gripper_penalty_bool)
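
Review note: the new threshold logic in this hunk can be checked in isolation. The sketch below is a standalone restatement, not the processor step itself. The function name `gripper_penalty` and its scalar arguments are hypothetical; the defaults mirror the values introduced in this diff (`penalty=-0.02`, `open_threshold=0.1`, `closed_threshold=0.9`).

```python
def gripper_penalty(
    state_norm: float,
    action_norm: float,
    penalty: float = -0.02,
    open_threshold: float = 0.1,
    closed_threshold: float = 0.9,
) -> float:
    """Return the penalty only for a full open<->closed transition command."""
    # Classify the current (normalized) gripper state.
    is_open = state_norm < open_threshold
    is_closed = state_norm > closed_threshold
    # Classify the commanded (normalized) gripper action.
    cmd_close = action_norm > closed_threshold
    cmd_open = action_norm < open_threshold
    # Penalize only close-while-open or open-while-closed.
    fires = (is_open and cmd_close) or (is_closed and cmd_open)
    return penalty * int(fires)


# Open gripper commanded to close: penalized.
close_from_open = gripper_penalty(state_norm=0.05, action_norm=0.95)
# Closed gripper commanded to open: penalized.
open_from_closed = gripper_penalty(state_norm=0.95, action_norm=0.05)
# Mid-travel gripper commanded to close: neither extreme, so no penalty.
close_from_mid = gripper_penalty(state_norm=0.5, action_norm=0.95)
# Open gripper commanded to stay open: no penalty.
stay_open = gripper_penalty(state_norm=0.05, action_norm=0.05)
```

Compared with the removed condition (state below 0.5 or above 0.75, action split at 0.5), states and commands between the two thresholds now never trigger the penalty.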

@@ -409,11 +423,14 @@ class GripperPenaltyProcessorStep(ProcessorStep):
        Returns the configuration of the step for serialization.

        Returns:
-            A dictionary containing the penalty value and max gripper position.
+            A dictionary containing the penalty value, max gripper position,
+            and the open/closed thresholds.
        """
        return {
            "penalty": self.penalty,
            "max_gripper_pos": self.max_gripper_pos,
+            "open_threshold": self.open_threshold,
+            "closed_threshold": self.closed_threshold,
        }

    def reset(self) -> None:
@@ -557,7 +574,7 @@ class RewardClassifierProcessorStep(ProcessorStep):
    def __post_init__(self):
        """Initializes the reward classifier model after the dataclass is created."""
        if self.pretrained_path is not None:
-            from lerobot.rewards.classifier.modeling_classifier import Classifier
+            from lerobot.policies.sac.reward_model.modeling_classifier import Classifier

            self.reward_classifier = Classifier.from_pretrained(self.pretrained_path)
            self.reward_classifier.to(self.device)
@@ -134,6 +134,15 @@ class _NormalizationMixin:
        if self.dtype is None:
            self.dtype = torch.float32
        self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype)
+        self._reshape_visual_stats()
+
+    def _reshape_visual_stats(self) -> None:
+        """Reshape visual stats from ``[C]`` to ``[C, 1, 1]`` for image broadcasting."""
+        for key, feature in self.features.items():
+            if feature.type == FeatureType.VISUAL and key in self._tensor_stats:
+                for stat_name, stat_tensor in self._tensor_stats[key].items():
+                    if isinstance(stat_tensor, Tensor) and stat_tensor.ndim == 1:
+                        self._tensor_stats[key][stat_name] = stat_tensor.reshape(-1, 1, 1)

    def to(
        self, device: torch.device | str | None = None, dtype: torch.dtype | None = None
@@ -152,6 +161,7 @@ class _NormalizationMixin:
        if dtype is not None:
            self.dtype = dtype
        self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype)
+        self._reshape_visual_stats()
        return self

    def state_dict(self) -> dict[str, Tensor]:
@@ -201,6 +211,7 @@ class _NormalizationMixin:
            # Don't load from state_dict, keep the explicitly provided stats
            # But ensure _tensor_stats is properly initialized
            self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype)  # type: ignore[assignment]
+            self._reshape_visual_stats()
            return

        # Normal behavior: load stats from state_dict
@@ -211,6 +222,7 @@ class _NormalizationMixin:
            self._tensor_stats.setdefault(key, {})[stat_name] = tensor.to(
                dtype=torch.float32, device=self.device
            )
+        self._reshape_visual_stats()

        # Reconstruct the original stats dict from tensor stats for compatibility with to() method
        # and other functions that rely on self.stats
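
Review note: the reason `_reshape_visual_stats` reshapes `[C]` to `[C, 1, 1]` can be seen in a small broadcasting example. This is an illustrative sketch only. NumPy stands in for torch here (both align shapes from the trailing dimension), and the array names and sizes are made up for the example.

```python
import numpy as np

# A channel-first image [C, H, W] and per-channel statistics of shape [C].
image = np.ones((3, 4, 5), dtype=np.float32)
mean = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# Shapes are aligned from the trailing dimension, so [C] is matched
# against W (5), not against the channel axis, and the subtraction fails.
try:
    image - mean
    flat_stats_broadcast = True
except ValueError:
    flat_stats_broadcast = False

# Reshaping to [C, 1, 1] aligns the statistics with the channel axis and
# lets them broadcast over H and W.
mean_chw = mean.reshape(-1, 1, 1)
normalized = image - mean_chw
```

If `W` happened to equal `C`, the flat `[C]` statistics would broadcast silently along the wrong axis instead of raising, which is the harder failure to notice.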
@@ -142,10 +142,6 @@ class RelativeActionsProcessorStep(ProcessorStep):
        new_transition[TransitionKey.ACTION] = to_relative_actions(action, state, mask)
        return new_transition

-    def get_cached_state(self) -> torch.Tensor | None:
-        """Return the cached ``observation.state`` used as the reference point for relative/absolute action conversions."""
-        return self._last_state
-
    def get_config(self) -> dict[str, Any]:
        return {
            "enabled": self.enabled,
@@ -186,8 +182,7 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
                "but relative_step is None. Ensure relative_step is set when constructing the postprocessor."
            )

-        cached_state = self.relative_step.get_cached_state()
-        if cached_state is None:
+        if self.relative_step._last_state is None:
            raise RuntimeError(
                "AbsoluteActionsProcessorStep requires state from RelativeActionsProcessorStep "
                "but no state has been cached. Ensure the preprocessor runs before the postprocessor."
@@ -199,7 +194,9 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
            return new_transition

        mask = self.relative_step._build_mask(action.shape[-1])
-        new_transition[TransitionKey.ACTION] = to_absolute_actions(action, cached_state, mask)
+        new_transition[TransitionKey.ACTION] = to_absolute_actions(
+            action, self.relative_step._last_state, mask
+        )
        return new_transition

    def get_config(self) -> dict[str, Any]:
@@ -1,36 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .classifier.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
-from .factory import (
-    get_reward_model_class as get_reward_model_class,
-    make_reward_model as make_reward_model,
-    make_reward_model_config as make_reward_model_config,
-    make_reward_pre_post_processors as make_reward_pre_post_processors,
-)
-from .pretrained import PreTrainedRewardModel as PreTrainedRewardModel
-from .sarm.configuration_sarm import SARMConfig as SARMConfig
-
-__all__ = [
-    # Configuration classes
-    "RewardClassifierConfig",
-    "SARMConfig",
-    # Base class
-    "PreTrainedRewardModel",
-    # Factory functions
-    "get_reward_model_class",
-    "make_reward_model",
-    "make_reward_model_config",
-    "make_reward_pre_post_processors",
-]
@@ -1,238 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import importlib
-import logging
-from typing import Any
-
-import torch
-
-from lerobot.configs.rewards import RewardModelConfig
-from lerobot.processor import PolicyAction, PolicyProcessorPipeline
-from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig
-from lerobot.rewards.pretrained import PreTrainedRewardModel
-from lerobot.rewards.sarm.configuration_sarm import SARMConfig
-
-
-def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
-    """
-    Retrieves a reward model class by its registered name.
-
-    This function uses dynamic imports to avoid loading all reward model classes into
-    memory at once, improving startup time and reducing dependencies.
-
-    Args:
-        name: The name of the reward model. Supported names are "reward_classifier",
-              "sarm".
-
-    Returns:
-        The reward model class corresponding to the given name.
-
-    Raises:
-        ValueError: If the reward model name is not recognized.
-    """
-    if name == "reward_classifier":
-        from lerobot.rewards.classifier.modeling_classifier import Classifier
-
-        return Classifier
-    elif name == "sarm":
-        from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel
-
-        return SARMRewardModel
-    else:
-        try:
-            return _get_reward_model_cls_from_name(name=name)
-        except Exception as e:
-            raise ValueError(f"Reward model type '{name}' is not available.") from e
-
-
-def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:
-    """
-    Instantiates a reward model configuration object based on the reward type.
-
-    This factory function simplifies the creation of reward model configuration objects
-    by mapping a string identifier to the corresponding config class.
-
-    Args:
-        reward_type: The type of the reward model. Supported types include
-                     "reward_classifier", "sarm".
-        **kwargs: Keyword arguments to be passed to the configuration class constructor.
-
-    Returns:
-        An instance of a `RewardModelConfig` subclass.
-
-    Raises:
-        ValueError: If the `reward_type` is not recognized.
-    """
-    if reward_type == "reward_classifier":
-        return RewardClassifierConfig(**kwargs)
-    elif reward_type == "sarm":
-        return SARMConfig(**kwargs)
-    else:
-        try:
-            config_cls = RewardModelConfig.get_choice_class(reward_type)
-            return config_cls(**kwargs)
-        except Exception as e:
-            raise ValueError(f"Reward model type '{reward_type}' is not available.") from e
-
-
-def make_reward_model(cfg: RewardModelConfig, **kwargs) -> PreTrainedRewardModel:
-    """
-    Instantiate a reward model from its configuration.
-
-    Args:
-        cfg: The configuration for the reward model to be created. If
-             `cfg.pretrained_path` is set, the model will be loaded with weights
-             from that path.
-        **kwargs: Additional keyword arguments forwarded to the model constructor
-            (e.g., ``dataset_stats``, ``dataset_meta``).
-
-    Returns:
-        An instantiated and device-placed reward model.
-    """
-    reward_cls = get_reward_model_class(cfg.type)
-
-    kwargs["config"] = cfg
-
-    if cfg.pretrained_path:
-        kwargs["pretrained_name_or_path"] = cfg.pretrained_path
-        reward_model = reward_cls.from_pretrained(**kwargs)
-    else:
-        reward_model = reward_cls(**kwargs)
-
-    reward_model.to(cfg.device)
-    assert isinstance(reward_model, torch.nn.Module)
-
-    return reward_model
-
-
-def make_reward_pre_post_processors(
-    reward_cfg: RewardModelConfig,
-    **kwargs,
-) -> tuple[
-    PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
-    PolicyProcessorPipeline[PolicyAction, PolicyAction],
-]:
-    """
-    Create pre- and post-processor pipelines for a given reward model.
-
-    Each reward model type has a dedicated factory function for its processors.
-
-    Args:
-        reward_cfg: The configuration of the reward model for which to create processors.
-        **kwargs: Additional keyword arguments passed to the processor factory
-            (e.g., ``dataset_stats``, ``dataset_meta``).
-
-    Returns:
-        A tuple containing the input (pre-processor) and output (post-processor) pipelines.
-
-    Raises:
-        ValueError: If a processor factory is not implemented for the given reward
-            model configuration type.
-    """
-    # Create a new processor based on reward model type
-    if isinstance(reward_cfg, RewardClassifierConfig):
-        from lerobot.rewards.classifier.processor_classifier import make_classifier_processor
-
-        return make_classifier_processor(
-            config=reward_cfg,
-            dataset_stats=kwargs.get("dataset_stats"),
-        )
-
-    elif isinstance(reward_cfg, SARMConfig):
-        from lerobot.rewards.sarm.processor_sarm import make_sarm_pre_post_processors
-
-        return make_sarm_pre_post_processors(
-            config=reward_cfg,
-            dataset_stats=kwargs.get("dataset_stats"),
-            dataset_meta=kwargs.get("dataset_meta"),
-        )
-
-    else:
-        try:
-            processors = _make_processors_from_reward_model_config(
-                config=reward_cfg,
-                dataset_stats=kwargs.get("dataset_stats"),
-            )
-        except Exception as e:
-            raise ValueError(
-                f"Processor for reward model type '{reward_cfg.type}' is not implemented."
-            ) from e
-        return processors
-
-
-def _get_reward_model_cls_from_name(name: str) -> type[PreTrainedRewardModel]:
-    """Get reward model class from its registered name using dynamic imports.
-
-    This is used as a helper function to import reward models from 3rd party lerobot
-    plugins.
-
-    Args:
-        name: The name of the reward model.
-
-    Returns:
-        The reward model class corresponding to the given name.
-    """
-    if name not in RewardModelConfig.get_known_choices():
-        raise ValueError(
-            f"Unknown reward model name '{name}'. "
-            f"Available reward models: {RewardModelConfig.get_known_choices()}"
-        )
-
-    config_cls = RewardModelConfig.get_choice_class(name)
-    config_cls_name = config_cls.__name__
-
-    model_name = config_cls_name.removesuffix("Config")
-    if model_name == config_cls_name:
-        raise ValueError(
-            f"The config class name '{config_cls_name}' does not follow the expected naming convention. "
-            f"Make sure it ends with 'Config'!"
-        )
-
-    cls_name = model_name + "RewardModel"
-    module_path = config_cls.__module__.replace("configuration_", "modeling_")
-
-    module = importlib.import_module(module_path)
-    reward_cls = getattr(module, cls_name)
-    return reward_cls
-
-
-def _make_processors_from_reward_model_config(
-    config: RewardModelConfig,
-    dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None,
-) -> tuple[Any, Any]:
-    """Create pre- and post-processors from a reward model configuration using dynamic imports.
-
-    This is used as a helper function to import processor factories from 3rd party
-    lerobot reward model plugins.
-
-    Args:
-        config: The reward model configuration object.
-        dataset_stats: Dataset statistics for normalization.
-
-    Returns:
-        A tuple containing the input (pre-processor) and output (post-processor) pipelines.
-    """
-    reward_type = config.type
-    function_name = f"make_{reward_type}_pre_post_processors"
-    module_path = config.__class__.__module__.replace("configuration_", "processor_")
-    logging.debug(
-        f"Instantiating reward pre/post processors using function '{function_name}' "
-        f"from module '{module_path}'"
-    )
-    module = importlib.import_module(module_path)
-    function = getattr(module, function_name)
-    return function(config, dataset_stats=dataset_stats)
@@ -1,244 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import abc
-import builtins
-import logging
-import os
-from importlib.resources import files
-from pathlib import Path
-from tempfile import TemporaryDirectory
-from typing import TYPE_CHECKING, Any, TypeVar
-
-import packaging
-import safetensors
-from huggingface_hub import HfApi, ModelCard, ModelCardData, hf_hub_download
-from huggingface_hub.constants import SAFETENSORS_SINGLE_FILE
-from huggingface_hub.errors import HfHubHTTPError
-from safetensors.torch import load_model as load_model_as_safetensor, save_model as save_model_as_safetensor
-from torch import Tensor, nn
-
-from lerobot.configs.rewards import RewardModelConfig
-from lerobot.utils.hub import HubMixin
-
-if TYPE_CHECKING:
-    from lerobot.configs.train import TrainPipelineConfig
-
-T = TypeVar("T", bound="PreTrainedRewardModel")
-
-
-class PreTrainedRewardModel(nn.Module, HubMixin, abc.ABC):
-    """Base class for reward models."""
-
-    config_class: None
-    name: None
-
-    def __init__(self, config: RewardModelConfig, *inputs, **kwargs):
-        super().__init__()
-        if not isinstance(config, RewardModelConfig):
-            raise ValueError(
-                f"Parameter config in `{self.__class__.__name__}(config)` should be an instance of class "
-                "`RewardModelConfig`. To create a model from a pretrained model use "
-                f"`model = {self.__class__.__name__}.from_pretrained(PRETRAINED_MODEL_NAME)`"
-            )
-        self.config = config
-
-    def __init_subclass__(cls, **kwargs):
-        super().__init_subclass__(**kwargs)
-        if not getattr(cls, "config_class", None):
-            raise TypeError(f"Class {cls.__name__} must define 'config_class'")
-        if not getattr(cls, "name", None):
-            raise TypeError(f"Class {cls.__name__} must define 'name'")
-
-    def _save_pretrained(self, save_directory: Path) -> None:
-        self.config._save_pretrained(save_directory)
-        model_to_save = self.module if hasattr(self, "module") else self
-        save_model_as_safetensor(model_to_save, str(save_directory / SAFETENSORS_SINGLE_FILE))
-
-    @classmethod
-    def from_pretrained(
-        cls: builtins.type[T],
-        pretrained_name_or_path: str | Path,
-        *,
-        config: RewardModelConfig | None = None,
-        force_download: bool = False,
-        resume_download: bool | None = None,
-        proxies: dict | None = None,
-        token: str | bool | None = None,
-        cache_dir: str | Path | None = None,
-        local_files_only: bool = False,
-        revision: str | None = None,
-        strict: bool = False,
-        **kwargs,
-    ) -> T:
-        """
-        The reward model is set in evaluation mode by default using `reward.eval()` (dropout modules are
-        deactivated). To train it, you should first set it back in training mode with `reward.train()`.
-        """
-        if config is None:
-            config = RewardModelConfig.from_pretrained(
-                pretrained_name_or_path=pretrained_name_or_path,
-                force_download=force_download,
-                resume_download=resume_download,
-                proxies=proxies,
-                token=token,
-                cache_dir=cache_dir,
-                local_files_only=local_files_only,
-                revision=revision,
-                **kwargs,
-            )
-        model_id = str(pretrained_name_or_path)
-        instance = cls(config, **kwargs)
-        if os.path.isdir(model_id):
-            print("Loading weights from local directory")
-            model_file = os.path.join(model_id, SAFETENSORS_SINGLE_FILE)
-            reward = cls._load_as_safetensor(instance, model_file, config.device or "cpu", strict)
-        else:
-            try:
-                model_file = hf_hub_download(
-                    repo_id=model_id,
-                    filename=SAFETENSORS_SINGLE_FILE,
-                    revision=revision,
-                    cache_dir=cache_dir,
-                    force_download=force_download,
-                    proxies=proxies,
-                    resume_download=resume_download,
-                    token=token,
-                    local_files_only=local_files_only,
-                )
-                reward = cls._load_as_safetensor(instance, model_file, config.device or "cpu", strict)
-            except HfHubHTTPError as e:
-                raise FileNotFoundError(
-                    f"{SAFETENSORS_SINGLE_FILE} not found on the HuggingFace Hub in {model_id}"
-                ) from e
-
-        reward.to(config.device)
-        reward.eval()
-        return reward
-
-    @classmethod
-    def _load_as_safetensor(cls, model: T, model_file: str, map_location: str, strict: bool) -> T:
-        # Create base kwargs
-        kwargs = {"strict": strict}
-
-        # Add device parameter for newer versions that support it
-        if packaging.version.parse(safetensors.__version__) >= packaging.version.parse("0.4.3"):
-            kwargs["device"] = map_location
-
-        # Load the model with appropriate kwargs
-        missing_keys, unexpected_keys = load_model_as_safetensor(model, model_file, **kwargs)
-        if missing_keys:
-            logging.warning(f"Missing key(s) when loading model: {missing_keys}")
-        if unexpected_keys:
-            logging.warning(f"Unexpected key(s) when loading model: {unexpected_keys}")
-
-        # For older versions, manually move to device if needed
-        if "device" not in kwargs and map_location != "cpu":
-            logging.warning(
-                "Loading model weights on other devices than 'cpu' is not supported natively in your version of safetensors."
-                " This means that the model is loaded on 'cpu' first and then copied to the device."
-                " This leads to a slower loading time."
-                " Please update safetensors to version 0.4.3 or above for improved performance."
-            )
-            model.to(map_location)
-        return model
-
-    def get_optim_params(self):
-        """
-        Returns the reward-model-specific parameters dict to be passed on to the optimizer.
-        """
-        return self.parameters()
-
-    def reset(self) -> None:
-        """Reset any internal state."""
-        pass
-
-    @abc.abstractmethod
-    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
-        """Compute a scalar reward signal for a batch of observations.
-
-        Args:
-            batch: Dictionary containing at minimum observation tensors.
-                   May also contain "action", "next_observation.*", etc.
-
-        Returns:
-            Tensor of shape ``(batch_size,)`` with reward values.
-        """
-        ...
-
-    def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict[str, Any]]:
-        """Training forward pass — override for trainable reward models."""
-        raise NotImplementedError(
-            f"{self.__class__.__name__} is not trainable. Only use compute_reward() for inference."
-        )
-
-    @property
-    def is_trainable(self) -> bool:
-        """Whether this reward model can be trained via ``lerobot-train``.
-
-        Trainable reward models override :meth:`forward`; zero-shot models
-        inherit the base implementation that raises ``NotImplementedError``.
-        """
-        return type(self).forward is not PreTrainedRewardModel.forward
-
-    def push_model_to_hub(self, cfg: "TrainPipelineConfig"):
-        api = HfApi()
-        repo_id = api.create_repo(
-            repo_id=self.config.repo_id, private=self.config.private, exist_ok=True
-        ).repo_id
-
-        # Push the files to the repo in a single commit
-        with TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
-            saved_path = Path(tmp) / repo_id
-
-            self.save_pretrained(saved_path)  # Calls _save_pretrained and stores model tensors
-
-            card = self.generate_model_card(
-                cfg.dataset.repo_id, self.config.type, self.config.license, self.config.tags
-            )
-            card.save(str(saved_path / "README.md"))
-
-            cfg.save_pretrained(saved_path)  # Calls _save_pretrained and stores train config
-
-            commit_info = api.upload_folder(
-                repo_id=repo_id,
-                repo_type="model",
-                folder_path=saved_path,
-                commit_message="Upload reward model weights, train config and readme",
-                allow_patterns=["*.safetensors", "*.json", "*.yaml", "*.md"],
-                ignore_patterns=["*.tmp", "*.log"],
-            )
-
-            logging.info(f"Model pushed to {commit_info.repo_url.url}")
-
-    def generate_model_card(
-        self, dataset_repo_id: str, model_type: str, license: str | None, tags: list[str] | None
-    ) -> ModelCard:
-        card_data = ModelCardData(
-            license=license or "apache-2.0",
-            library_name="lerobot",
-            pipeline_tag="robotics",
-            tags=list(set(tags or []).union({"robotics", "lerobot", "reward-model", model_type})),
-            model_name=model_type,
-            datasets=dataset_repo_id,
-        )
-
-        template_card = (
-            files("lerobot.templates")
-            .joinpath("lerobot_rewardmodel_modelcard_template.md")
-            .read_text(encoding="utf-8")
-        )
-        card = ModelCard.from_template(card_data, template_str=template_card)
-        card.validate()
-        return card
@@ -60,7 +60,7 @@ from torch.multiprocessing import Event, Queue
 from lerobot.cameras import opencv  # noqa: F401
 from lerobot.configs import parser
 from lerobot.configs.train import TrainRLServerPipelineConfig
-from lerobot.policies import make_policy
+from lerobot.policies import make_policy, make_pre_post_processors
 from lerobot.policies.sac.modeling_sac import SACPolicy
 from lerobot.robots import so_follower  # noqa: F401
 from lerobot.teleoperators import gamepad, so_leader  # noqa: F401
@@ -76,7 +76,6 @@ from lerobot.transport.utils import (
 )
 from lerobot.types import TransitionKey
 from lerobot.utils.device_utils import get_safe_torch_device
-from lerobot.utils.process import ProcessSignalHandler
 from lerobot.utils.random_utils import set_seed
 from lerobot.utils.robot_utils import precise_sleep
 from lerobot.utils.transition import (
@@ -90,11 +89,12 @@ from lerobot.utils.utils import (
 )

 from .gym_manipulator import (
-    create_transition,
    make_processors,
    make_robot_env,
+    reset_and_build_transition,
    step_env_and_process_transition,
 )
+from .process import ProcessSignalHandler
 from .queue import get_last_item_from_queue

 # Main entry point
@@ -261,13 +261,12 @@ def act_with_policy(
    policy = policy.eval()
    assert isinstance(policy, nn.Module)

-    obs, info = online_env.reset()
-    env_processor.reset()
-    action_processor.reset()
+    preprocessor, postprocessor = make_pre_post_processors(
+        policy_cfg=cfg.policy,
+        dataset_stats=cfg.policy.dataset_stats,
+    )

-    # Process initial observation
-    transition = create_transition(observation=obs, info=info)
-    transition = env_processor(transition)
+    transition = reset_and_build_transition(online_env, env_processor, action_processor)

    # NOTE: For the moment we will solely handle the case of a single environment
    sum_reward_episode = 0
@@ -291,8 +290,21 @@ def act_with_policy(

        # Time policy inference and check if it meets FPS requirement
        with policy_timer:
-            # Extract observation from transition for policy
-            action = policy.select_action(batch=observation)
+            normalized_observation = preprocessor.process_observation(observation)
+            action = policy.select_action(batch=normalized_observation)
+            # Unnormalize only the continuous part. When `num_discrete_actions` is set,
+            # `select_action` appends an argmax index (already in env space) as the last
+            # element. Action stats cover only the continuous dims, so passing the full
+            # vector to the unnormalizer would cause a shape mismatch and would also
+            # corrupt the discrete index by treating it as a normalized value.
+            if cfg.policy.num_discrete_actions is not None:
+                continuous_action = postprocessor.process_action(action[..., :-1])
+                discrete_action = action[..., -1:].to(
+                    device=continuous_action.device, dtype=continuous_action.dtype
+                )
+                action = torch.cat([continuous_action, discrete_action], dim=-1)
+            else:
+                action = postprocessor.process_action(action)
        policy_fps = policy_timer.fps_last

        log_policy_frequency_issue(policy_fps=policy_fps, cfg=cfg, interaction_step=interaction_step)
@@ -326,7 +338,8 @@ def act_with_policy(

        # Check for intervention from transition info
        intervention_info = new_transition[TransitionKey.INFO]
-        if intervention_info.get(TeleopEvents.IS_INTERVENTION, False):
+        is_intervention = bool(intervention_info.get(TeleopEvents.IS_INTERVENTION, False))
+        if is_intervention:
            episode_intervention = True
            episode_intervention_steps += 1

@@ -334,6 +347,10 @@ def act_with_policy(
            "discrete_penalty": torch.tensor(
                [new_transition[TransitionKey.COMPLEMENTARY_DATA].get("discrete_penalty", 0.0)]
            ),
+            # Forward the intervention flag so the learner can route this transition
+            # into the offline replay buffer (see `process_transitions` in learner.py).
+            # Use the plain string key so the payload survives torch.load(weights_only=True).
+            TeleopEvents.IS_INTERVENTION.value: is_intervention,
        }
        # Create transition for learner (convert to old format)
        list_transition_to_send_to_learner.append(
@@ -390,14 +407,7 @@ def act_with_policy(
            episode_intervention_steps = 0
            episode_total_steps = 0

-            # Reset environment and processors
-            obs, info = online_env.reset()
-            env_processor.reset()
-            action_processor.reset()
-
-            # Process initial observation
-            transition = create_transition(observation=obs, info=info)
-            transition = env_processor(transition)
+            transition = reset_and_build_transition(online_env, env_processor, action_processor)

        if cfg.env.fps is not None:
            dt_time = time.perf_counter() - start_time
@@ -193,15 +193,15 @@ def convert_lerobot_dataset_to_cropped_lerobot_dataset(
        fps=int(original_dataset.fps),
        root=new_dataset_root,
        robot_type=original_dataset.meta.robot_type,
-        features=original_dataset.meta.info.features,
+        features=original_dataset.meta.info["features"],
        use_videos=len(original_dataset.meta.video_keys) > 0,
    )

    # Update the metadata for every image key that will be cropped:
    # (Here we simply set the shape to be the final resize_size.)
    for key in crop_params_dict:
-        if key in new_dataset.meta.info.features:
-            new_dataset.meta.info.features[key]["shape"] = (3, *resize_size)
+        if key in new_dataset.meta.info["features"]:
+            new_dataset.meta.info["features"][key]["shape"] = [3] + list(resize_size)

    # TODO:  Directly modify the mp4 video + meta info features, instead of recreating a dataset
    prev_episode_index = 0
@@ -383,10 +383,21 @@ def make_processors(
            GymHILAdapterProcessorStep(),
            Numpy2TorchActionProcessorStep(),
            VanillaObservationProcessorStep(),
-            AddBatchDimensionProcessorStep(),
-            DeviceProcessorStep(device=device),
        ]

+        # Add time limit processor if reset config exists
+        if cfg.processor.reset is not None:
+            env_pipeline_steps.append(
+                TimeLimitProcessorStep(max_episode_steps=int(cfg.processor.reset.control_time_s * cfg.fps))
+            )
+
+        env_pipeline_steps.extend(
+            [
+                AddBatchDimensionProcessorStep(),
+                DeviceProcessorStep(device=device),
+            ]
+        )
+
        return DataProcessorPipeline(
            steps=env_pipeline_steps, to_transition=identity_transition, to_output=identity_transition
        ), DataProcessorPipeline(
@@ -551,8 +562,19 @@ def step_env_and_process_transition(
    terminated = terminated or processed_action_transition[TransitionKey.DONE]
    truncated = truncated or processed_action_transition[TransitionKey.TRUNCATED]
    complementary_data = processed_action_transition[TransitionKey.COMPLEMENTARY_DATA].copy()
+
+    if hasattr(env, "get_raw_joint_positions"):
+        raw_joint_positions = env.get_raw_joint_positions()
+        if raw_joint_positions is not None:
+            complementary_data["raw_joint_positions"] = raw_joint_positions
+
+    # Merge env and action-processor info: keep every env key and copy over only the
+    # `TeleopEvents` enum keys from the action processor (its other keys are dropped).
+    action_info = processed_action_transition[TransitionKey.INFO]
    new_info = info.copy()
-    new_info.update(processed_action_transition[TransitionKey.INFO])
+    for key, value in action_info.items():
+        if isinstance(key, TeleopEvents):
+            new_info[key] = value

    new_transition = create_transition(
        observation=obs,
@@ -568,6 +590,24 @@ def step_env_and_process_transition(
    return new_transition


+def reset_and_build_transition(
+    env: gym.Env,
+    env_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
+    action_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
+) -> EnvTransition:
+    """Reset the env and both processors, then return the first env-processed transition."""
+    obs, info = env.reset()
+    env_processor.reset()
+    action_processor.reset()
+    complementary_data: dict[str, Any] = {}
+    if hasattr(env, "get_raw_joint_positions"):
+        raw_joint_positions = env.get_raw_joint_positions()
+        if raw_joint_positions is not None:
+            complementary_data["raw_joint_positions"] = raw_joint_positions
+    transition = create_transition(observation=obs, info=info, complementary_data=complementary_data)
+    return env_processor(data=transition)
+
+
 def control_loop(
    env: gym.Env,
    env_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
@@ -593,17 +633,7 @@ def control_loop(
    print("- When not intervening, robot will stay still")
    print("- Press Ctrl+C to exit")

-    # Reset environment and processors
-    obs, info = env.reset()
-    complementary_data = (
-        {"raw_joint_positions": info.pop("raw_joint_positions")} if "raw_joint_positions" in info else {}
-    )
-    env_processor.reset()
-    action_processor.reset()
-
-    # Process initial observation
-    transition = create_transition(observation=obs, info=info, complementary_data=complementary_data)
-    transition = env_processor(data=transition)
+    transition = reset_and_build_transition(env, env_processor, action_processor)

    # Determine if gripper is used
    use_gripper = cfg.env.processor.gripper.use_gripper if cfg.env.processor.gripper is not None else True
@@ -665,7 +695,7 @@ def control_loop(
        # Create a neutral action (no movement)
        neutral_action = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32)
        if use_gripper:
-            neutral_action = torch.cat([neutral_action, torch.tensor([0.0])])  # Gripper stay
+            neutral_action = torch.cat([neutral_action, torch.tensor([1.0])])  # Gripper stay (1 = stay)

        # Use the new step function
        transition = step_env_and_process_transition(
@@ -723,12 +753,7 @@ def control_loop(
                    dataset.save_episode()

            # Reset for new episode
-            obs, info = env.reset()
-            env_processor.reset()
-            action_processor.reset()
-
-            transition = create_transition(observation=obs, info=info)
-            transition = env_processor(transition)
+            transition = reset_and_build_transition(env, env_processor, action_processor)

        # Maintain fps timing
        precise_sleep(max(dt - (time.perf_counter() - step_start_time), 0.0))
@@ -70,7 +70,7 @@ from lerobot.common.wandb_utils import WandBLogger
 from lerobot.configs import parser
 from lerobot.configs.train import TrainRLServerPipelineConfig
 from lerobot.datasets import LeRobotDataset, make_dataset
-from lerobot.policies import make_policy
+from lerobot.policies import make_policy, make_pre_post_processors
 from lerobot.policies.sac.modeling_sac import SACPolicy
 from lerobot.robots import so_follower  # noqa: F401
 from lerobot.teleoperators import gamepad, so_leader  # noqa: F401
@@ -90,7 +90,6 @@ from lerobot.utils.constants import (
    TRAINING_STATE_DIR,
 )
 from lerobot.utils.device_utils import get_safe_torch_device
-from lerobot.utils.process import ProcessSignalHandler
 from lerobot.utils.random_utils import set_seed
 from lerobot.utils.transition import move_state_dict_to_device, move_transition_to_device
 from lerobot.utils.utils import (
@@ -100,6 +99,7 @@ from lerobot.utils.utils import (

 from .buffer import ReplayBuffer, concatenate_batch_transitions
 from .learner_service import MAX_WORKERS, SHUTDOWN_TIMEOUT, LearnerService
+from .process import ProcessSignalHandler


@parser.wrap()
@@ -317,6 +317,11 @@ def add_actor_information_and_train(

    policy.train()

+    preprocessor, _postprocessor = make_pre_post_processors(
+        policy_cfg=cfg.policy,
+        dataset_stats=cfg.policy.dataset_stats,
+    )
+
    push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)

    last_time_policy_pushed = time.time()
@@ -405,8 +410,8 @@ def add_actor_information_and_train(

            actions = batch[ACTION]
            rewards = batch["reward"]
-            observations = batch["state"]
-            next_observations = batch["next_state"]
+            observations = preprocessor.process_observation(batch["state"])
+            next_observations = preprocessor.process_observation(batch["next_state"])
            done = batch["done"]
            check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)

@@ -463,8 +468,8 @@ def add_actor_information_and_train(

        actions = batch[ACTION]
        rewards = batch["reward"]
-        observations = batch["state"]
-        next_observations = batch["next_state"]
+        observations = preprocessor.process_observation(batch["state"])
+        next_observations = preprocessor.process_observation(batch["next_state"])
        done = batch["done"]

        check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
@@ -1163,7 +1168,7 @@ def process_transitions(

            # Add to offline buffer if it's an intervention
            if dataset_repo_id is not None and transition.get("complementary_info", {}).get(
-                TeleopEvents.IS_INTERVENTION
+                TeleopEvents.IS_INTERVENTION.value
            ):
                offline_replay_buffer.add(**transition)

@@ -353,7 +353,8 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
        speed_factor: A scaling factor to convert the normalized velocity command to a position change.
        clip_min: The minimum allowed gripper joint position.
        clip_max: The maximum allowed gripper joint position.
-        discrete_gripper: If True, treat the input action as discrete (0: open, 1: close, 2: stay).
+        discrete_gripper: If True, interpret the input as a discrete class index
+            {0 = close, 1 = stay, 2 = open}, matching `GamepadTeleop.GripperAction`.
    """

    speed_factor: float = 20.0
@@ -377,10 +378,10 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
            raise ValueError("Joints observation is require for computing robot kinematics")

        if self.discrete_gripper:
-            # Discrete gripper actions are in [0, 1, 2]
-            # 0: open, 1: close, 2: stay
-            # We need to shift them to [-1, 0, 1] and then scale them to clip_max
-            gripper_vel = (gripper_vel - 1) * self.clip_max
+            # Map discrete command {0=close, 1=stay, 2=open} -> signed velocity.
+            # Negation accounts for SO100 sign (joint position increases on close).
+            #   0 -> +clip_max (close), 1 -> 0 (stay), 2 -> -clip_max (open)
+            gripper_vel = -(gripper_vel - 1) * self.clip_max

        # Compute desired gripper position
        delta = gripper_vel * float(self.speed_factor)
@@ -1,87 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Policy deployment engine with pluggable rollout strategies."""
-
-from lerobot.utils.import_utils import require_package
-
-require_package("datasets", extra="dataset")
-
-from .configs import (
-    BaseStrategyConfig,
-    DAggerKeyboardConfig,
-    DAggerPedalConfig,
-    DAggerStrategyConfig,
-    HighlightStrategyConfig,
-    RolloutConfig,
-    RolloutStrategyConfig,
-    SentryStrategyConfig,
-)
-from .context import (
-    DatasetContext,
-    HardwareContext,
-    PolicyContext,
-    ProcessorContext,
-    RolloutContext,
-    RuntimeContext,
-    build_rollout_context,
-)
-from .inference import (
-    InferenceEngine,
-    InferenceEngineConfig,
-    RTCInferenceConfig,
-    RTCInferenceEngine,
-    SyncInferenceConfig,
-    SyncInferenceEngine,
-    create_inference_engine,
-)
-from .strategies import (
-    BaseStrategy,
-    DAggerStrategy,
-    HighlightStrategy,
-    RolloutStrategy,
-    SentryStrategy,
-    create_strategy,
-)
-
-__all__ = [
-    "BaseStrategy",
-    "BaseStrategyConfig",
-    "DAggerKeyboardConfig",
-    "DAggerPedalConfig",
-    "DAggerStrategy",
-    "DAggerStrategyConfig",
-    "DatasetContext",
-    "HardwareContext",
-    "HighlightStrategy",
-    "HighlightStrategyConfig",
-    "InferenceEngine",
-    "InferenceEngineConfig",
-    "PolicyContext",
-    "ProcessorContext",
-    "RTCInferenceConfig",
-    "RTCInferenceEngine",
-    "RolloutConfig",
-    "RolloutContext",
-    "RolloutStrategy",
-    "RolloutStrategyConfig",
-    "RuntimeContext",
-    "SentryStrategy",
-    "SentryStrategyConfig",
-    "SyncInferenceConfig",
-    "SyncInferenceEngine",
-    "build_rollout_context",
-    "create_inference_engine",
-    "create_strategy",
-]
@@ -1,323 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Configuration dataclasses for the rollout deployment engine."""
-
-from __future__ import annotations
-
-import abc
-import logging
-from dataclasses import dataclass, field
-
-import draccus
-
-from lerobot.configs import PreTrainedConfig, parser
-from lerobot.configs.dataset import DatasetRecordConfig
-from lerobot.robots.config import RobotConfig
-from lerobot.teleoperators.config import TeleoperatorConfig
-from lerobot.utils.device_utils import auto_select_torch_device, is_torch_device_available
-
-from .inference import InferenceEngineConfig, SyncInferenceConfig
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Strategy configs (polymorphic dispatch via draccus ChoiceRegistry)
-# ---------------------------------------------------------------------------
-
-
-@dataclass
-class RolloutStrategyConfig(draccus.ChoiceRegistry, abc.ABC):
-    """Abstract base for rollout strategy configurations.
-
-    Use ``--strategy.type=<name>`` on the CLI to select a strategy.
-    """
-
-    @property
-    def type(self) -> str:
-        return self.get_choice_name(self.__class__)
-
-
-@RolloutStrategyConfig.register_subclass("base")
-@dataclass
-class BaseStrategyConfig(RolloutStrategyConfig):
-    """Autonomous rollout with no data recording."""
-
-    pass
-
-
-@RolloutStrategyConfig.register_subclass("sentry")
-@dataclass
-class SentryStrategyConfig(RolloutStrategyConfig):
-    """Continuous autonomous rollout with always-on recording.
-
-    Episode duration is derived from camera resolution, FPS, and
-    ``target_video_file_size_mb`` so that each saved episode produces a
-    video file that has crossed the target size.  This aligns episode
-    boundaries with the dataset's video file chunking, so each
-    ``push_to_hub`` call uploads complete video files rather than
-    re-uploading a growing file that hasn't crossed the chunk boundary.
-    """
-
-    upload_every_n_episodes: int = 5
-    # Target video file size in MB for episode rotation.  Episodes are
-    # saved once the estimated video duration would exceed this limit.
-    # Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when set to None.
-    target_video_file_size_mb: int | None = None
-
-
-@RolloutStrategyConfig.register_subclass("highlight")
-@dataclass
-class HighlightStrategyConfig(RolloutStrategyConfig):
-    """Autonomous rollout with on-demand recording via ring buffer.
-
-    A memory-bounded ring buffer continuously captures telemetry.  When
-    the user presses the save key, the buffer contents are flushed to
-    the dataset and live recording continues until the key is pressed
-    again.
-    """
-
-    ring_buffer_seconds: float = 10.0
-    ring_buffer_max_memory_mb: int = 1024
-    save_key: str = "s"
-    push_key: str = "h"
-
-
-@dataclass
-class DAggerKeyboardConfig:
-    """Keyboard key bindings for DAgger controls.
-
-    Keys are specified as single characters (e.g. ``"c"``, ``"h"``) or
-    special key names (e.g. ``"space"``, ``"tab"``, ``"enter"``).
-    """
-
-    pause_resume: str = "space"
-    correction: str = "tab"
-    upload: str = "enter"
-
-
-@dataclass
-class DAggerPedalConfig:
-    """Foot pedal configuration for DAgger controls.
-
-    Pedal codes are evdev key code strings (e.g. ``"KEY_A"``).
-    """
-
-    device_path: str = "/dev/input/by-id/usb-PCsensor_FootSwitch-event-kbd"
-    pause_resume: str = "KEY_A"
-    correction: str = "KEY_B"
-    upload: str = "KEY_C"
-
-
-@RolloutStrategyConfig.register_subclass("dagger")
-@dataclass
-class DAggerStrategyConfig(RolloutStrategyConfig):
-    """Human-in-the-loop data collection (DAgger / RaC).
-
-    Alternates between autonomous policy execution and human intervention.
-    Intervention frames are tagged with ``intervention=True``.
-
-    Input is controlled via either a keyboard or foot pedal, selected by
-    ``input_device``.  Each device exposes three actions:
-
-    1. **pause_resume** — toggle policy execution on/off.
-    2. **correction** — toggle human correction recording.
-    3. **upload** — push dataset to hub on demand (corrections-only mode).
-
-    When ``record_autonomous=False`` (default) only human-correction windows
-    are recorded — each correction becomes its own episode.  Set to ``True``
-    to record both autonomous and correction frames with size-based episode
-    rotation (same as Sentry) and background uploading.  ``push_to_hub`` is
-    blocked while a correction is in progress.
-    """
-
-    # Number of correction episodes to collect (corrections-only mode).
-    # When None, falls back to ``--dataset.num_episodes``.
-    num_episodes: int | None = None
-    record_autonomous: bool = False
-    upload_every_n_episodes: int = 5
-    # Target video file size in MB for episode rotation (record_autonomous
-    # mode only).  Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when None.
-    target_video_file_size_mb: int | None = None
-    input_device: str = "keyboard"
-    keyboard: DAggerKeyboardConfig = field(default_factory=DAggerKeyboardConfig)
-    pedal: DAggerPedalConfig = field(default_factory=DAggerPedalConfig)
-
-    def __post_init__(self):
-        if self.input_device not in ("keyboard", "pedal"):
-            raise ValueError(f"DAgger input_device must be 'keyboard' or 'pedal', got '{self.input_device}'")
-
-
-# ---------------------------------------------------------------------------
-# Top-level rollout config
-# ---------------------------------------------------------------------------
-
-
-@dataclass
-class RolloutConfig:
-    """Top-level configuration for the ``lerobot-rollout`` CLI.
-
-    Combines hardware, policy, strategy, and runtime settings.  The
-    ``__post_init__`` method performs fail-fast validation to reject
-    invalid flag combinations early.
-    """
-
-    # Hardware
-    robot: RobotConfig | None = None
-    teleop: TeleoperatorConfig | None = None
-
-    # Policy (loaded from --policy.path via __post_init__)
-    policy: PreTrainedConfig | None = None
-
-    # Strategy (polymorphic: --strategy.type=base|sentry|highlight|dagger)
-    strategy: RolloutStrategyConfig = field(default_factory=BaseStrategyConfig)
-
-    # Inference backend (polymorphic: --inference.type=sync|rtc)
-    inference: InferenceEngineConfig = field(default_factory=SyncInferenceConfig)
-
-    # Dataset (required for sentry, highlight, dagger; None for base)
-    dataset: DatasetRecordConfig | None = None
-
-    # Runtime
-    fps: float = 30.0
-    duration: float = 0.0  # 0 = infinite (24/7 mode)
-    interpolation_multiplier: int = 1
-    device: str | None = None
-    task: str = ""
-    display_data: bool = False
-    # Display data on a remote Rerun server
-    display_ip: str | None = None
-    # Port of the remote Rerun server
-    display_port: int | None = None
-    # Whether to display compressed images in Rerun
-    display_compressed_images: bool = False
-    # Use vocal synthesis to read events
-    play_sounds: bool = True
-    resume: bool = False
-    # Rename map for mapping robot/dataset observation keys to policy keys
-    rename_map: dict[str, str] = field(default_factory=dict)
-
-    # Hardware teardown
-    # When True (default), smoothly interpolate the robot back to the joint
-    # positions captured at startup before disconnecting.  Set to False to
-    # leave the robot in its final achieved pose at shutdown.
-    return_to_initial_position: bool = True
-
-    # Torch compile
-    use_torch_compile: bool = False
-    torch_compile_backend: str = "inductor"
-    torch_compile_mode: str = "default"
-    compile_warmup_inferences: int = 2
-
-    def __post_init__(self):
-        """Validate config invariants and load the policy config from ``--policy.path``."""
-        # --- Strategy-specific validation ---
-        if isinstance(self.strategy, DAggerStrategyConfig) and self.teleop is None:
-            raise ValueError("DAgger strategy requires --teleop.type to be set")
-
-        # TODO(Steven): DAgger shouldn't require a dataset (user may want to just rollout+intervene without recording), but for now we require it to simplify the implementation.
-        needs_dataset = isinstance(
-            self.strategy, (SentryStrategyConfig, HighlightStrategyConfig, DAggerStrategyConfig)
-        )
-        if needs_dataset and (self.dataset is None or not self.dataset.repo_id):
-            raise ValueError(f"{self.strategy.type} strategy requires --dataset.repo_id to be set")
-
-        if isinstance(self.strategy, BaseStrategyConfig) and self.dataset is not None:
-            raise ValueError(
-                "Base strategy does not record data. Use sentry, highlight, or dagger for recording."
-            )
-
-        # Sentry MUST use streaming encoding to avoid disk I/O blocking the control loop
-        if (
-            isinstance(self.strategy, SentryStrategyConfig)
-            and self.dataset is not None
-            and not self.dataset.streaming_encoding
-        ):
-            logger.warning("Sentry mode forces streaming_encoding=True")
-            self.dataset.streaming_encoding = True
-
-        # Highlight writes frames while the policy is still running, so streaming is mandatory.
-        if (
-            isinstance(self.strategy, HighlightStrategyConfig)
-            and self.dataset is not None
-            and not self.dataset.streaming_encoding
-        ):
-            logger.warning("Highlight mode forces streaming_encoding=True")
-            self.dataset.streaming_encoding = True
-
-        # DAgger: streaming is mandatory only when the autonomous phase is also recorded.
-        if isinstance(self.strategy, DAggerStrategyConfig) and self.dataset is not None:
-            if self.strategy.record_autonomous and not self.dataset.streaming_encoding:
-                logger.warning("DAgger with record_autonomous=True forces streaming_encoding=True")
-                self.dataset.streaming_encoding = True
-            elif not self.strategy.record_autonomous and not self.dataset.streaming_encoding:
-                logger.info(
-                    "Streaming encoding is disabled for DAgger corrections-only mode. "
-                    "Consider enabling it for faster episode saving: "
-                    "--dataset.streaming_encoding=true --dataset.encoder_threads=2"
-                )
-
-        # DAgger: resolve num_episodes from dataset config when not explicitly set.
-        if isinstance(self.strategy, DAggerStrategyConfig) and self.strategy.num_episodes is None:
-            if self.dataset is not None:
-                self.strategy.num_episodes = self.dataset.num_episodes
-                logger.info(
-                    "DAgger num_episodes not set — using --dataset.num_episodes=%d",
-                    self.strategy.num_episodes,
-                )
-            else:
-                raise ValueError(
-                    "DAgger num_episodes must be set either via --strategy.num_episodes or --dataset.num_episodes"
-                )
-
-        # --- Policy loading ---
-        if self.robot is None:
-            raise ValueError("--robot.type is required for rollout")
-
-        policy_path = parser.get_path_arg("policy")
-        if policy_path:
-            cli_overrides = parser.get_cli_overrides("policy")
-            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
-            self.policy.pretrained_path = policy_path
-        if self.policy is None:
-            raise ValueError("--policy.path is required for rollout")
-
-        # --- Task resolution ---
-        # When any --dataset.* flag is passed, draccus creates a DatasetRecordConfig with single_task="".
-        # If the user set the task via the top-level --task flag, propagate it so that all
-        # downstream consumers (inference engine, dataset frame builders) see it.
-        if self.dataset is not None and not self.dataset.single_task and self.task:
-            logger.info("Propagating top-level task '%s' to dataset config", self.task)
-            self.dataset.single_task = self.task
-        elif self.dataset is not None and self.dataset.single_task and not self.task:
-            logger.info("Propagating dataset single_task '%s' to top-level task", self.dataset.single_task)
-            self.task = self.dataset.single_task
-
-        # --- Device resolution ---
-        # Resolve device from the policy config when not explicitly set so all
-        # components (policy.to, preprocessor, inference engine) use the same
-        # device string instead of inconsistent fallbacks.
-        if self.device is None or not is_torch_device_available(self.device):
-            resolved = self.policy.device
-            if resolved:
-                self.device = resolved
-                logger.info("Resolved device from policy config: %s", self.device)
-            else:
-                self.device = auto_select_torch_device().type
-                logger.info("Policy config does not specify a device; auto-selected device: %s", self.device)
-
-    @classmethod
-    def __get_path_fields__(cls) -> list[str]:
-        return ["policy"]
@@ -1,454 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Rollout context: shared state created once before strategy dispatch.
-
-Grouped into five topical sub-contexts — :class:`RuntimeContext`,
-:class:`HardwareContext`, :class:`PolicyContext`, :class:`ProcessorContext`,
-and :class:`DatasetContext` — assembled into :class:`RolloutContext`.
-"""
-
-from __future__ import annotations
-
-import logging
-from dataclasses import dataclass, field
-from threading import Event
-
-import torch
-
-from lerobot.configs import FeatureType
-from lerobot.datasets import (
-    LeRobotDataset,
-    aggregate_pipeline_dataset_features,
-    create_initial_features,
-)
-from lerobot.policies import get_policy_class, make_pre_post_processors
-from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.processor import (
-    PolicyProcessorPipeline,
-    RobotAction,
-    RobotObservation,
-    RobotProcessorPipeline,
-    make_default_processors,
-    rename_stats,
-)
-from lerobot.processor.relative_action_processor import RelativeActionsProcessorStep
-from lerobot.robots import make_robot_from_config
-from lerobot.teleoperators import Teleoperator, make_teleoperator_from_config
-from lerobot.utils.feature_utils import combine_feature_dicts, hw_to_dataset_features
-
-from .configs import BaseStrategyConfig, DAggerStrategyConfig, RolloutConfig
-from .inference import (
-    InferenceEngine,
-    RTCInferenceConfig,
-    SyncInferenceConfig,
-    create_inference_engine,
-)
-from .robot_wrapper import ThreadSafeRobot
-
-logger = logging.getLogger(__name__)
-
-
-def _resolve_action_key_order(
-    policy_action_names: list[str] | None, dataset_action_names: list[str]
-) -> list[str]:
-    """Choose action name ordering for mapping policy tensor outputs to robot action dicts."""
-    if not policy_action_names:
-        return dataset_action_names
-    policy_action_names = list(policy_action_names)
-    if len(policy_action_names) != len(dataset_action_names):
-        logger.warning(
-            "policy.action_feature_names length (%d) != dataset action dim (%d); using dataset order",
-            len(policy_action_names),
-            len(dataset_action_names),
-        )
-        return dataset_action_names
-    if set(dataset_action_names) != set(policy_action_names):
-        logger.warning("policy.action_feature_names keys don't match dataset; using dataset order")
-        return dataset_action_names
-    return policy_action_names
-
-
-# ---------------------------------------------------------------------------
-# Sub-contexts
-# ---------------------------------------------------------------------------
-
-
-@dataclass
-class RuntimeContext:
-    """Runtime knobs shared with every strategy."""
-
-    cfg: RolloutConfig
-    shutdown_event: Event
-
-
-@dataclass
-class HardwareContext:
-    """Connected hardware.
-
-    The raw robot is available via ``robot_wrapper.inner`` when needed
-    (e.g. for disconnect); strategies should otherwise go through the
-    thread-safe wrapper.
-
-    ``initial_position`` stores the robot's joint positions at connect
-    time.  Strategies use it to return the robot to a safe pose before
-    shutting down.
-    """
-
-    robot_wrapper: ThreadSafeRobot
-    teleop: Teleoperator | None
-    initial_position: dict | None = None
-
-
-@dataclass
-class PolicyContext:
-    """Loaded policy and its inference engine."""
-
-    policy: PreTrainedPolicy
-    preprocessor: PolicyProcessorPipeline
-    postprocessor: PolicyProcessorPipeline
-    inference: InferenceEngine
-
-
-@dataclass
-class ProcessorContext:
-    """Robot-side pipelines (run outside the policy)."""
-
-    teleop_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
-    robot_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
-    robot_observation_processor: RobotProcessorPipeline[RobotObservation, RobotObservation]
-
-
-@dataclass
-class DatasetContext:
-    """Dataset and feature bookkeeping."""
-
-    dataset: LeRobotDataset | None
-    dataset_features: dict = field(default_factory=dict)
-    hw_features: dict = field(default_factory=dict)
-    ordered_action_keys: list[str] = field(default_factory=list)
-
-
-@dataclass
-class RolloutContext:
-    """Bundle of sub-contexts passed to every rollout strategy.
-
-    Built once by :func:`build_rollout_context` before strategy dispatch.
-    """
-
-    runtime: RuntimeContext
-    hardware: HardwareContext
-    policy: PolicyContext
-    processors: ProcessorContext
-    data: DatasetContext
-
-
-# ---------------------------------------------------------------------------
-# Build
-# ---------------------------------------------------------------------------
-
-
-def build_rollout_context(
-    cfg: RolloutConfig,
-    shutdown_event: Event,
-    teleop_action_processor: RobotProcessorPipeline | None = None,
-    robot_action_processor: RobotProcessorPipeline | None = None,
-    robot_observation_processor: RobotProcessorPipeline | None = None,
-) -> RolloutContext:
-    """Wire up policy, processors, hardware, dataset, and inference engine.
-
-    The order is policy-first / hardware-last so a bad ``--policy.path``
-    fails fast without touching the robot.
-    """
-    is_rtc = isinstance(cfg.inference, RTCInferenceConfig)
-
-    # --- 1. Policy (heavy I/O, but no hardware yet) -------------------
-    logger.info("Loading policy from '%s'...", cfg.policy.pretrained_path)
-    policy_config = cfg.policy
-    policy_class = get_policy_class(policy_config.type)
-
-    if hasattr(policy_config, "compile_model"):
-        policy_config.compile_model = cfg.use_torch_compile
-
-    if policy_config.type == "vqbet" and cfg.device == "mps":
-        raise NotImplementedError(
-            "Current implementation of VQBeT does not support `mps` backend. "
-            "Please use `cpu` or `cuda` backend."
-        )
-
-    if policy_config.use_peft:
-        from peft import PeftConfig, PeftModel
-
-        peft_path = policy_config.pretrained_path
-        peft_config = PeftConfig.from_pretrained(peft_path)
-        policy = policy_class.from_pretrained(
-            pretrained_name_or_path=peft_config.base_model_name_or_path, config=policy_config
-        )
-        policy = PeftModel.from_pretrained(policy, peft_path, config=peft_config)
-    else:
-        policy = policy_class.from_pretrained(policy_config.pretrained_path, config=policy_config)
-
-    if is_rtc:
-        policy.config.rtc_config = cfg.inference.rtc
-        if hasattr(policy, "init_rtc_processor"):
-            policy.init_rtc_processor()
-
-    policy = policy.to(cfg.device)
-    policy.eval()
-    logger.info("Policy loaded: type=%s, device=%s", policy_config.type, cfg.device)
-
-    if cfg.use_torch_compile and policy.type not in ("pi0", "pi05"):
-        try:
-            if hasattr(torch, "compile"):
-                compile_kwargs = {
-                    "backend": cfg.torch_compile_backend,
-                    "mode": cfg.torch_compile_mode,
-                    "options": {"triton.cudagraphs": False},
-                }
-                policy.predict_action_chunk = torch.compile(policy.predict_action_chunk, **compile_kwargs)
-                logger.info("torch.compile applied to predict_action_chunk")
-        except Exception as e:
-            logger.warning("Failed to apply torch.compile: %s", e)
-
-    # --- 2. Robot-side processors (user-supplied or defaults) --------
-    if (
-        teleop_action_processor is None
-        or robot_action_processor is None
-        or robot_observation_processor is None
-    ):
-        _t, _r, _o = make_default_processors()
-        teleop_action_processor = teleop_action_processor or _t
-        robot_action_processor = robot_action_processor or _r
-        robot_observation_processor = robot_observation_processor or _o
-
-    # --- 3. Hardware (heaviest side-effect, deferred) -----------------
-    logger.info("Connecting robot (%s)...", cfg.robot.type if cfg.robot else "?")
-    robot = make_robot_from_config(cfg.robot)
-    robot.connect()
-    logger.info("Robot connected: %s", robot.name)
-
-    # Store the initial joint positions so we can return to a safe pose on shutdown.
-    initial_obs = robot.get_observation()
-    initial_position = {k: v for k, v in initial_obs.items() if k.endswith(".pos")}
-    logger.info("Captured initial robot position (%d keys)", len(initial_position))
-
-    robot_wrapper = ThreadSafeRobot(robot)
-
-    teleop = None
-    if cfg.teleop is not None:
-        logger.info("Connecting teleoperator (%s)...", cfg.teleop.type if cfg.teleop else "?")
-        teleop = make_teleoperator_from_config(cfg.teleop)
-        teleop.connect()
-        logger.info("Teleoperator connected")
-
-    # TODO(Steven): once Teleoperator motor-control methods are standardised
-    # (``enable_torque`` / ``disable_torque`` / ``write_goal_positions``), gate
-    # the DAgger strategy on their presence here and fail fast with a helpful
-    # message instead of relying on the operator to pre-align the leader by
-    # hand.  See :func:`DAggerStrategy._apply_transition` for the matching
-    # disabled call sites.
-    # if isinstance(cfg.strategy, DAggerStrategyConfig) and teleop is not None:
-    #     required_teleop_methods = ("enable_torque", "disable_torque", "write_goal_positions")
-    #     missing = [m for m in required_teleop_methods if not callable(getattr(teleop, m, None))]
-    #     if missing:
-    #         teleop.disconnect()
-    #         raise ValueError(
-    #             f"DAgger strategy requires a teleoperator with motor control methods "
-    #             f"{required_teleop_methods}. '{type(teleop).__name__}' is missing: {missing}"
-    #         )
-
-    # --- 4. Features + action-key reconciliation ---------------------
-    # TODO(Steven): Only ``.pos`` joint features are routed to the policy as state and as the
-    # action target; velocity and torque channels (when present) are kept in
-    # the raw observation but excluded from the policy-facing tensors.
-    all_obs_features = robot.observation_features
-    # ``observation_features`` values are either a tuple (camera shape) or the
-    # ``float`` type itself used as a sentinel for scalar motor features —
-    # see ``dict[str, type | tuple]`` annotation on ``Robot.observation_features``.
-    observation_features_hw = {
-        k: v
-        for k, v in all_obs_features.items()
-        if isinstance(v, tuple) or (v is float and k.endswith(".pos"))
-    }
-    action_features_hw = {k: v for k, v in robot.action_features.items() if k.endswith(".pos")}
-
-    # The action side is always needed: sync inference reads action names from
-    # ``dataset_features[ACTION]`` to map policy tensors back to robot actions.
-    action_dataset_features = aggregate_pipeline_dataset_features(
-        pipeline=teleop_action_processor,
-        initial_features=create_initial_features(action=action_features_hw),
-        use_videos=cfg.dataset.video if cfg.dataset else True,
-    )
-    # Observation-side aggregation is needed because of build_dataset_frame
-    observation_dataset_features = aggregate_pipeline_dataset_features(
-        pipeline=robot_observation_processor,
-        initial_features=create_initial_features(observation=observation_features_hw),
-        use_videos=cfg.dataset.video if cfg.dataset else True,
-    )
-    dataset_features = combine_feature_dicts(action_dataset_features, observation_dataset_features)
-    hw_features = hw_to_dataset_features(observation_features_hw, "observation")
-    raw_action_keys = list(action_features_hw.keys())
-    policy_action_names = getattr(policy_config, "action_feature_names", None)
-    ordered_action_keys = _resolve_action_key_order(
-        list(policy_action_names) if policy_action_names else None,
-        raw_action_keys,
-    )
-
-    # Validate visual features if no rename_map is active
-    rename_map = cfg.rename_map
-    if not rename_map:
-        expected_visuals = {
-            k for k, v in policy_config.input_features.items() if v.type == FeatureType.VISUAL
-        }
-        provided_visuals = {
-            f"observation.images.{k}" for k, v in robot.observation_features.items() if isinstance(v, tuple)
-        }
-        policy_subset = expected_visuals.issubset(provided_visuals)
-        hw_subset = provided_visuals.issubset(expected_visuals)
-        if not (policy_subset or hw_subset):
-            raise ValueError(
-                f"Visual feature mismatch between policy and robot hardware.\n"
-                f"Policy expects: {expected_visuals}\n"
-                f"Robot provides: {provided_visuals}"
-            )
-
-    # --- 5. Dataset -------------
-    dataset = None
-    if cfg.dataset is not None and not isinstance(cfg.strategy, BaseStrategyConfig):
-        logger.info("Setting up dataset (repo_id=%s)...", cfg.dataset.repo_id)
-        if cfg.resume:
-            dataset = LeRobotDataset.resume(
-                cfg.dataset.repo_id,
-                root=cfg.dataset.root,
-                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
-                vcodec=cfg.dataset.vcodec,
-                streaming_encoding=cfg.dataset.streaming_encoding,
-                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
-                encoder_threads=cfg.dataset.encoder_threads,
-                image_writer_processes=cfg.dataset.num_image_writer_processes,
-                image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
-                * len(robot.cameras if hasattr(robot, "cameras") else []),
-            )
-        else:
-            if isinstance(cfg.strategy, DAggerStrategyConfig):
-                dataset_features["intervention"] = {
-                    "dtype": "bool",
-                    "shape": (1,),
-                    "names": None,
-                }
-
-            repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
-            if not repo_name.startswith("rollout_"):
-                raise ValueError(
-                    "Dataset names for rollout must start with 'rollout_'. "
-                    "Use --dataset.repo_id=<user>/rollout_<name> for policy deployment datasets."
-                )
-            cfg.dataset.stamp_repo_id()
-            target_video_mb = getattr(cfg.strategy, "target_video_file_size_mb", None)
-            dataset = LeRobotDataset.create(
-                cfg.dataset.repo_id,
-                cfg.dataset.fps,
-                root=cfg.dataset.root,
-                robot_type=robot.name,
-                features=dataset_features,
-                use_videos=cfg.dataset.video,
-                image_writer_processes=cfg.dataset.num_image_writer_processes,
-                image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
-                * len(robot.cameras if hasattr(robot, "cameras") else []),
-                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
-                vcodec=cfg.dataset.vcodec,
-                streaming_encoding=cfg.dataset.streaming_encoding,
-                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
-                encoder_threads=cfg.dataset.encoder_threads,
-                video_files_size_in_mb=target_video_mb,
-            )
-
-    if dataset is not None:
-        logger.info("Dataset ready: %s (%d existing episodes)", dataset.repo_id, dataset.num_episodes)
-
-    # --- 6. Policy pre/post processors (needs dataset stats if any) ---
-    dataset_stats = None
-    if dataset is not None:
-        dataset_stats = rename_stats(
-            dataset.meta.stats,
-            cfg.rename_map,
-        )
-
-    preprocessor, postprocessor = make_pre_post_processors(
-        policy_cfg=policy_config,
-        pretrained_path=cfg.policy.pretrained_path,
-        dataset_stats=dataset_stats,
-        preprocessor_overrides={
-            "device_processor": {"device": cfg.device},
-            "rename_observations_processor": {"rename_map": cfg.rename_map},
-        },
-    )
-
-    if isinstance(cfg.inference, SyncInferenceConfig) and any(
-        isinstance(step, RelativeActionsProcessorStep) and step.enabled
-        for step in getattr(preprocessor, "steps", ())
-    ):
-        raise NotImplementedError(
-            "SyncInferenceEngine does not support policies with relative actions for now. "
-            "Use --inference.type=rtc or remove relative action processor steps from the policy pipeline."
-        )
-
-    # --- 7. Inference strategy (needs policy + pre/post + hardware) --
-    logger.info(
-        "Creating inference engine (type=%s)...",
-        cfg.inference.type if hasattr(cfg.inference, "type") else "sync",
-    )
-    task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
-    inference_strategy = create_inference_engine(
-        cfg.inference,
-        policy=policy,
-        preprocessor=preprocessor,
-        postprocessor=postprocessor,
-        robot_wrapper=robot_wrapper,
-        hw_features=hw_features,
-        dataset_features=dataset_features,
-        ordered_action_keys=ordered_action_keys,
-        task=task_str,
-        fps=cfg.fps,
-        device=cfg.device,
-        use_torch_compile=cfg.use_torch_compile,
-        compile_warmup_inferences=cfg.compile_warmup_inferences,
-        shutdown_event=shutdown_event,
-    )
-
-    # --- 8. Assemble ---------------------------------------------------
-    logger.info("Rollout context assembled successfully")
-    return RolloutContext(
-        runtime=RuntimeContext(cfg=cfg, shutdown_event=shutdown_event),
-        hardware=HardwareContext(
-            robot_wrapper=robot_wrapper, teleop=teleop, initial_position=initial_position
-        ),
-        policy=PolicyContext(
-            policy=policy,
-            preprocessor=preprocessor,
-            postprocessor=postprocessor,
-            inference=inference_strategy,
-        ),
-        processors=ProcessorContext(
-            teleop_action_processor=teleop_action_processor,
-            robot_action_processor=robot_action_processor,
-            robot_observation_processor=robot_observation_processor,
-        ),
-        data=DatasetContext(
-            dataset=dataset,
-            dataset_features=dataset_features,
-            hw_features=hw_features,
-            ordered_action_keys=ordered_action_keys,
-        ),
-    )
@@ -1,39 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Inference engine package — backend-agnostic action production.
-
-Concrete backends (``sync``, ``rtc``, ...) expose the same small interface so
-rollout strategies never branch on which backend is in use.
-"""
-
-from .base import InferenceEngine
-from .factory import (
-    InferenceEngineConfig,
-    RTCInferenceConfig,
-    SyncInferenceConfig,
-    create_inference_engine,
-)
-from .rtc import RTCInferenceEngine
-from .sync import SyncInferenceEngine
-
-__all__ = [
-    "InferenceEngine",
-    "InferenceEngineConfig",
-    "RTCInferenceConfig",
-    "RTCInferenceEngine",
-    "SyncInferenceConfig",
-    "SyncInferenceEngine",
-    "create_inference_engine",
-]
@@ -1,89 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Inference engine ABC.
-
-Rollout strategies consume actions through this small interface so they
-do not need to know whether inference happens inline on the control thread
-or asynchronously in a background thread (RTC).
-"""
-
-from __future__ import annotations
-
-import abc
-
-import torch
-
-
-class InferenceEngine(abc.ABC):
-    """Abstract backend for producing actions during rollout.
-
-    Subclasses decide whether inference happens inline on the control
-    thread or asynchronously in a background thread.  The contract is
-    minimal so additional backends can be plugged in without touching
-    rollout strategies.
-
-    Lifecycle
-    ---------
-    ``start`` — prepare the backend (e.g. launch a background thread).
-    ``stop`` — shut the backend down cleanly.
-    ``reset`` — clear episode-scoped state (policy hidden state, queues…).
-
-    Action production
-    -----------------
-    ``get_action(obs_frame)`` — return the next action tensor, or
-    ``None`` if none is available (e.g. async queue empty).  Sync
-    backends always compute from ``obs_frame``; async backends ignore
-    it (they receive observations via ``notify_observation``).
-
-    Optional hooks
-    --------------
-    ``notify_observation`` / ``pause`` / ``resume`` have a no-op default
-    so rollout strategies can invoke them unconditionally.
-    """
-
-    @abc.abstractmethod
-    def start(self) -> None:
-        """Initialise the backend."""
-
-    @abc.abstractmethod
-    def stop(self) -> None:
-        """Tear the backend down."""
-
-    @abc.abstractmethod
-    def reset(self) -> None:
-        """Clear episode-scoped state."""
-
-    @abc.abstractmethod
-    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
-        """Return the next action tensor, or ``None`` if unavailable."""
-
-    def notify_observation(self, obs: dict) -> None:  # noqa: B027
-        """Publish the latest processed observation.  Default: no-op."""
-
-    def pause(self) -> None:  # noqa: B027
-        """Pause background inference.  Default: no-op."""
-
-    def resume(self) -> None:  # noqa: B027
-        """Resume background inference.  Default: no-op."""
-
-    @property
-    def ready(self) -> bool:
-        """True once the backend can produce actions (e.g. warmup done)."""
-        return True
-
-    @property
-    def failed(self) -> bool:
-        """True if an unrecoverable error occurred in the backend."""
-        return False
@@ -1,128 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Inference engine configs and factory.
-
-Selection is explicit via ``--inference.type=sync|rtc``.  Adding a new
-backend requires registering its config subclass and dispatching it in
-:func:`create_inference_engine`.
-"""
-
-from __future__ import annotations
-
-import abc
-import logging
-from dataclasses import dataclass, field
-from threading import Event
-
-import draccus
-
-from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.policies.rtc.configuration_rtc import RTCConfig
-from lerobot.processor import PolicyProcessorPipeline
-
-from ..robot_wrapper import ThreadSafeRobot
-from .base import InferenceEngine
-from .rtc import RTCInferenceEngine
-from .sync import SyncInferenceEngine
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Configs
-# ---------------------------------------------------------------------------
-
-
-@dataclass
-class InferenceEngineConfig(draccus.ChoiceRegistry, abc.ABC):
-    """Abstract base for inference backend configuration.
-
-    Use ``--inference.type=<name>`` on the CLI to select a backend.
-    """
-
-    @property
-    def type(self) -> str:
-        return self.get_choice_name(self.__class__)
-
-
-@InferenceEngineConfig.register_subclass("sync")
-@dataclass
-class SyncInferenceConfig(InferenceEngineConfig):
-    """Inline synchronous inference (one policy call per control tick)."""
-
-
-@InferenceEngineConfig.register_subclass("rtc")
-@dataclass
-class RTCInferenceConfig(InferenceEngineConfig):
-    """Real-Time Chunking: async policy inference in a background thread."""
-
-    # Eagerly constructed so draccus exposes nested fields directly on the CLI
-    # (e.g. ``--inference.rtc.execution_horizon=...``).
-    rtc: RTCConfig = field(default_factory=RTCConfig)
-    queue_threshold: int = 30
-
-
-# ---------------------------------------------------------------------------
-# Factory
-# ---------------------------------------------------------------------------
-
-
-def create_inference_engine(
-    config: InferenceEngineConfig,
-    *,
-    policy: PreTrainedPolicy,
-    preprocessor: PolicyProcessorPipeline,
-    postprocessor: PolicyProcessorPipeline,
-    robot_wrapper: ThreadSafeRobot,
-    hw_features: dict,
-    dataset_features: dict,
-    ordered_action_keys: list[str],
-    task: str,
-    fps: float,
-    device: str | None,
-    use_torch_compile: bool = False,
-    compile_warmup_inferences: int = 2,
-    shutdown_event: Event | None = None,
-) -> InferenceEngine:
-    """Instantiate the appropriate inference engine from a config object."""
-    logger.info("Creating inference engine: %s", config.type)
-    if isinstance(config, SyncInferenceConfig):
-        return SyncInferenceEngine(
-            policy=policy,
-            preprocessor=preprocessor,
-            postprocessor=postprocessor,
-            dataset_features=dataset_features,
-            ordered_action_keys=ordered_action_keys,
-            task=task,
-            device=device,
-            robot_type=robot_wrapper.robot_type,
-        )
-    if isinstance(config, RTCInferenceConfig):
-        return RTCInferenceEngine(
-            policy=policy,
-            preprocessor=preprocessor,
-            postprocessor=postprocessor,
-            robot_wrapper=robot_wrapper,
-            rtc_config=config.rtc,
-            hw_features=hw_features,
-            task=task,
-            fps=fps,
-            device=device,
-            use_torch_compile=use_torch_compile,
-            compile_warmup_inferences=compile_warmup_inferences,
-            rtc_queue_threshold=config.queue_threshold,
-            shutdown_event=shutdown_event,
-        )
-    raise ValueError(f"Unknown inference engine type: {type(config).__name__}")
@@ -1,360 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Real-Time Chunking inference engine.
-
-A background thread produces action chunks asynchronously via
-:meth:`policy.predict_action_chunk`.  The main control loop polls
-``get_action`` for the next ready action; observations flow the other
-way via ``notify_observation``.
-"""
-
-from __future__ import annotations
-
-import logging
-import math
-import time
-import traceback
-from threading import Event, Lock, Thread
-from typing import Any
-
-import torch
-
-from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.policies.rtc import ActionQueue, LatencyTracker, reanchor_relative_rtc_prefix
-from lerobot.policies.rtc.configuration_rtc import RTCConfig
-from lerobot.policies.utils import prepare_observation_for_inference
-from lerobot.processor import (
-    NormalizerProcessorStep,
-    PolicyProcessorPipeline,
-    RelativeActionsProcessorStep,
-)
-from lerobot.utils.feature_utils import build_dataset_frame
-
-from ..robot_wrapper import ThreadSafeRobot
-from .base import InferenceEngine
-
-logger = logging.getLogger(__name__)
-
-# How long the RTC loop sleeps when paused, idle, or backpressured by a full queue.
-_RTC_IDLE_SLEEP_S: float = 0.01
-# Backoff between transient inference errors (per consecutive failure).
-_RTC_ERROR_RETRY_DELAY_S: float = 0.5
-# Consecutive transient errors tolerated before giving up and propagating shutdown.
-_RTC_MAX_CONSECUTIVE_ERRORS: int = 10
-# Hard timeout for joining the RTC thread on stop().
-_RTC_JOIN_TIMEOUT_S: float = 3.0
-
-
-# ---------------------------------------------------------------------------
-# RTC helpers
-# ---------------------------------------------------------------------------
-
-
-def _normalize_prev_actions_length(prev_actions: torch.Tensor, target_steps: int) -> torch.Tensor:
-    """Pad or truncate RTC prefix actions to a fixed length for stable compiled inference."""
-    if prev_actions.ndim != 2:
-        raise ValueError(f"Expected 2D [T, A] tensor, got shape={tuple(prev_actions.shape)}")
-    steps, action_dim = prev_actions.shape
-    if steps == target_steps:
-        return prev_actions
-    if steps > target_steps:
-        return prev_actions[:target_steps]
-    padded = torch.zeros((target_steps, action_dim), dtype=prev_actions.dtype, device=prev_actions.device)
-    padded[:steps] = prev_actions
-    return padded
-
-
-# ---------------------------------------------------------------------------
-# RTCInferenceEngine
-# ---------------------------------------------------------------------------
-
-
-class RTCInferenceEngine(InferenceEngine):
-    """Async RTC inference: a background thread produces action chunks.
-
-    ``get_action`` pops the next action from the shared queue (or
-    returns ``None`` if the queue is empty).  The main loop should call
-    ``notify_observation`` every tick and ``pause``/``resume`` around
-    human-intervention phases.
-    """
-
-    def __init__(
-        self,
-        policy: PreTrainedPolicy,
-        preprocessor: PolicyProcessorPipeline,
-        postprocessor: PolicyProcessorPipeline,
-        robot_wrapper: ThreadSafeRobot,
-        rtc_config: RTCConfig,
-        hw_features: dict,
-        task: str,
-        fps: float,
-        device: str | None,
-        use_torch_compile: bool = False,
-        compile_warmup_inferences: int = 2,
-        rtc_queue_threshold: int = 30,
-        shutdown_event: Event | None = None,
-    ) -> None:
-        self._policy = policy
-        self._preprocessor = preprocessor
-        self._postprocessor = postprocessor
-        self._robot = robot_wrapper
-        self._rtc_config = rtc_config
-        self._hw_features = hw_features
-        self._task = task
-        self._fps = fps
-        self._device = device or "cpu"
-        self._use_torch_compile = use_torch_compile
-        self._compile_warmup_inferences = compile_warmup_inferences
-        self._rtc_queue_threshold = rtc_queue_threshold
-
-        self._action_queue: ActionQueue | None = None
-        self._obs_holder: dict[str, Any] = {}
-        self._obs_lock = Lock()
-        self._policy_active = Event()
-        self._compile_warmup_done = Event()
-        self._shutdown_event = Event()
-        self._rtc_error = Event()
-        self._global_shutdown_event = shutdown_event
-        self._rtc_thread: Thread | None = None
-
-        if not self._use_torch_compile:
-            self._compile_warmup_done.set()
-            logger.info("RTCInferenceEngine initialized (torch.compile disabled, no warmup needed)")
-        else:
-            logger.info(
-                "RTCInferenceEngine initialized (torch.compile enabled, %d warmup inferences)",
-                compile_warmup_inferences,
-            )
-
-        # Processor introspection for relative-action re-anchoring.
-        self._relative_step = next(
-            (s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
-            None,
-        )
-        self._normalizer_step = next(
-            (s for s in preprocessor.steps if isinstance(s, NormalizerProcessorStep)),
-            None,
-        )
-        if self._relative_step is not None:
-            if self._relative_step.action_names is None:
-                cfg_names = getattr(policy.config, "action_feature_names", None)
-                if cfg_names:
-                    self._relative_step.action_names = list(cfg_names)
-                else:
-                    self._relative_step.action_names = [
-                        k for k in robot_wrapper.action_features if k.endswith(".pos")
-                    ]
-            logger.info("Relative actions enabled: RTC prefix will be re-anchored")
-
-    # ------------------------------------------------------------------
-    # Lifecycle
-    # ------------------------------------------------------------------
-
-    @property
-    def ready(self) -> bool:
-        """True once torch.compile warmup is complete (or immediately if compile is disabled)."""
-        return self._compile_warmup_done.is_set()
-
-    @property
-    def failed(self) -> bool:
-        """True if the RTC background thread exited due to an unrecoverable error."""
-        return self._rtc_error.is_set()
-
-    @property
-    def action_queue(self) -> ActionQueue | None:
-        """The shared action queue between the RTC thread and the main loop."""
-        return self._action_queue
-
-    def start(self) -> None:
-        """Launch the RTC background thread."""
-        self._action_queue = ActionQueue(self._rtc_config)
-        self._obs_holder = {
-            "obs": None,
-            "robot_type": self._robot.robot_type,
-        }
-        self._shutdown_event.clear()
-        self._rtc_thread = Thread(
-            target=self._rtc_loop,
-            daemon=True,
-            name="RTCInference",
-        )
-        self._rtc_thread.start()
-        logger.info("RTC inference thread started")
-
-    def stop(self) -> None:
-        """Signal the RTC thread to stop and wait for it."""
-        logger.info("Stopping RTC inference thread...")
-        self._shutdown_event.set()
-        self._policy_active.clear()
-        if self._rtc_thread is not None and self._rtc_thread.is_alive():
-            self._rtc_thread.join(timeout=_RTC_JOIN_TIMEOUT_S)
-            if self._rtc_thread.is_alive():
-                logger.warning("RTC thread did not join within %.1fs", _RTC_JOIN_TIMEOUT_S)
-            else:
-                logger.info("RTC inference thread stopped")
-            self._rtc_thread = None
-
-    def pause(self) -> None:
-        """Pause the RTC background thread."""
-        logger.info("Pausing RTC inference thread")
-        self._policy_active.clear()
-
-    def resume(self) -> None:
-        """Resume the RTC background thread."""
-        logger.info("Resuming RTC inference thread")
-        self._policy_active.set()
-
-    def reset(self) -> None:
-        """Reset the policy, processors, and action queue."""
-        logger.info("Resetting RTC inference state (policy + processors + queue)")
-        self._policy.reset()
-        self._preprocessor.reset()
-        self._postprocessor.reset()
-        if self._action_queue is not None:
-            self._action_queue.clear()
-
-    # ------------------------------------------------------------------
-    # Action production (called from main thread)
-    # ------------------------------------------------------------------
-
-    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
-        """Pop the next action from the RTC queue (ignores ``obs_frame``)."""
-        if self._action_queue is None:
-            return None
-        return self._action_queue.get()
-
-    def notify_observation(self, obs: dict) -> None:
-        """Publish the latest observation for the RTC thread to consume."""
-        with self._obs_lock:
-            self._obs_holder["obs"] = obs
-
-    # ------------------------------------------------------------------
-    # RTC: background inference thread
-    # ------------------------------------------------------------------
-
-    def _rtc_loop(self) -> None:
-        """Background thread that generates action chunks via RTC."""
-        try:
-            latency_tracker = LatencyTracker()
-            time_per_chunk = 1.0 / self._fps
-            policy_device = torch.device(self._device)
-
-            warmup_required = max(1, self._compile_warmup_inferences) if self._use_torch_compile else 0
-            inference_count = 0
-            consecutive_errors = 0
-
-            while not self._shutdown_event.is_set():
-                if not self._policy_active.is_set():
-                    time.sleep(_RTC_IDLE_SLEEP_S)
-                    continue
-
-                queue = self._action_queue
-                with self._obs_lock:
-                    obs = self._obs_holder.get("obs")
-                if queue is None or obs is None:
-                    time.sleep(_RTC_IDLE_SLEEP_S)
-                    continue
-
-                if queue.qsize() <= self._rtc_queue_threshold:
-                    try:
-                        current_time = time.perf_counter()
-                        idx_before = queue.get_action_index()
-                        prev_actions = queue.get_left_over()
-
-                        latency = latency_tracker.max()
-                        delay = math.ceil(latency / time_per_chunk) if latency else 0
-
-                        obs_batch = build_dataset_frame(self._hw_features, obs, prefix="observation")
-                        obs_batch = prepare_observation_for_inference(
-                            obs_batch, policy_device, self._task, self._robot.robot_type
-                        )
-                        obs_batch["task"] = [self._task]
-
-                        preprocessed = self._preprocessor(obs_batch)
-
-                        if prev_actions is not None and self._relative_step is not None:
-                            # Rebase against the raw cached state so the leftover tail stays in
-                            # the training-time coordinate frame.
-                            raw_state = self._relative_step.get_cached_state()
-                            if raw_state is not None:
-                                prev_abs = queue.get_processed_left_over()
-                                if prev_abs is not None and prev_abs.numel() > 0:
-                                    prev_actions = reanchor_relative_rtc_prefix(
-                                        prev_actions_absolute=prev_abs,
-                                        current_state=raw_state,
-                                        relative_step=self._relative_step,
-                                        normalizer_step=self._normalizer_step,
-                                        policy_device=policy_device,
-                                    )
-
-                        if prev_actions is not None:
-                            prev_actions = _normalize_prev_actions_length(
-                                prev_actions, target_steps=self._rtc_config.execution_horizon
-                            )
-
-                        actions = self._policy.predict_action_chunk(
-                            preprocessed, inference_delay=delay, prev_chunk_left_over=prev_actions
-                        )
-
-                        original = actions.squeeze(0).clone()
-                        processed = self._postprocessor(actions).squeeze(0)
-                        new_latency = time.perf_counter() - current_time
-                        new_delay = math.ceil(new_latency / time_per_chunk)
-
-                        inference_count += 1
-                        consecutive_errors = 0
-                        is_warmup = self._use_torch_compile and inference_count <= warmup_required
-                        if is_warmup:
-                            latency_tracker.reset()
-                        else:
-                            latency_tracker.add(new_latency)
-
-                        queue.merge(original, processed, new_delay, idx_before)
-
-                        if (
-                            is_warmup
-                            and inference_count >= warmup_required
-                            and not self._compile_warmup_done.is_set()
-                        ):
-                            self._compile_warmup_done.set()
-                            logger.info("Compile warmup complete (%d inferences)", inference_count)
-
-                        logger.debug("RTC inference latency=%.2fs, queue=%d", new_latency, queue.qsize())
-
-                    except Exception as e:
-                        consecutive_errors += 1
-                        logger.error(
-                            "RTC inference error (%d/%d): %s",
-                            consecutive_errors,
-                            _RTC_MAX_CONSECUTIVE_ERRORS,
-                            e,
-                        )
-                        logger.debug(traceback.format_exc())
-                        if consecutive_errors >= _RTC_MAX_CONSECUTIVE_ERRORS:
-                            # Persistent failure: stop retrying and propagate shutdown.
-                            raise
-                        time.sleep(_RTC_ERROR_RETRY_DELAY_S)
-                else:
-                    time.sleep(_RTC_IDLE_SLEEP_S)
-
-        except Exception as e:
-            logger.error("Fatal error in RTC thread: %s", e)
-            logger.error(traceback.format_exc())
-            self._rtc_error.set()
-            # Unblock any warmup waiters so the main loop doesn't spin forever
-            self._compile_warmup_done.set()
-            # Signal the top-level shutdown so strategies exit their control loops
-            if self._global_shutdown_event is not None:
-                self._global_shutdown_event.set()
@@ -1,122 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Synchronous inference engine: inline policy call per control tick."""
-
-from __future__ import annotations
-
-import logging
-from contextlib import nullcontext
-from copy import copy
-
-import torch
-
-from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.policies.utils import make_robot_action, prepare_observation_for_inference
-from lerobot.processor import PolicyProcessorPipeline
-
-from .base import InferenceEngine
-
-logger = logging.getLogger(__name__)
-
-
-# TODO(Steven): support relative-action policies.  The per-tick flow refreshes
-# ``RelativeActionsProcessorStep._last_state`` every call, so cached chunk
-# actions popped on later ticks get reanchored to the *current* robot state and
-# absolute targets drift through the chunk.  Relative-action policies are
-# rejected at context-build time today; RTC postprocesses the whole chunk and
-# is unaffected.
-#
-# Candidate fix: drive the policy via ``predict_action_chunk`` and serve a
-# local FIFO of postprocessed actions.  Eliminates drift by construction and
-# saves per-tick pre/post work, but bypasses ``select_action`` — needs
-# fallbacks for SAC (raises), ACT temporal ensembling (ensembler lives in
-# ``select_action``), and Diffusion-family (obs-history queues populated as a
-# side effect of ``select_action``).
-
-
-class SyncInferenceEngine(InferenceEngine):
-    """Inline synchronous inference: compute one action per call.
-
-    ``get_action`` runs the full policy pipeline (pre/post-processor +
-    ``select_action``) on the given observation frame and returns a
-    CPU action tensor reordered to match the dataset action keys.
-    """
-
-    def __init__(
-        self,
-        policy: PreTrainedPolicy,
-        preprocessor: PolicyProcessorPipeline,
-        postprocessor: PolicyProcessorPipeline,
-        dataset_features: dict,
-        ordered_action_keys: list[str],
-        task: str,
-        device: str | None,
-        robot_type: str,
-    ) -> None:
-        self._policy = policy
-        self._preprocessor = preprocessor
-        self._postprocessor = postprocessor
-        self._dataset_features = dataset_features
-        self._ordered_action_keys = ordered_action_keys
-        self._task = task
-        self._device = torch.device(device or "cpu")
-        self._robot_type = robot_type
-        logger.info(
-            "SyncInferenceEngine initialized (device=%s, action_keys=%d)",
-            self._device,
-            len(ordered_action_keys),
-        )
-
-    def start(self) -> None:
-        """No background resources to start."""
-        logger.info("SyncInferenceEngine started (inline mode — no background thread)")
-
-    def stop(self) -> None:
-        """No background resources to stop."""
-        logger.info("SyncInferenceEngine stopped")
-
-    def reset(self) -> None:
-        """Reset the policy and pre/post-processors."""
-        logger.info("Resetting sync inference state (policy + processors)")
-        self._policy.reset()
-        self._preprocessor.reset()
-        self._postprocessor.reset()
-
-    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
-        """Run the full inference pipeline on ``obs_frame`` and return an action tensor."""
-        if obs_frame is None:
-            return None
-        # Shallow copy is intentional: the caller (`send_next_action`) builds
-        # ``obs_frame`` fresh per tick via ``build_dataset_frame``, so the
-        # tensor/array values are not shared with any other reader.
-        observation = copy(obs_frame)
-        autocast_ctx = (
-            torch.autocast(device_type=self._device.type)
-            if self._device.type == "cuda" and self._policy.config.use_amp
-            else nullcontext()
-        )
-        with torch.inference_mode(), autocast_ctx:
-            observation = prepare_observation_for_inference(
-                observation, self._device, self._task, self._robot_type
-            )
-            observation = self._preprocessor(observation)
-            action = self._policy.select_action(observation)
-            action = self._postprocessor(action)
-        action_tensor = action.squeeze(0).cpu()
-
-        # Reorder to match dataset action ordering so the caller can treat
-        # the returned tensor uniformly across backends.
-        action_dict = make_robot_action(action_tensor, self._dataset_features)
-        return torch.tensor([action_dict[k] for k in self._ordered_action_keys])
@@ -1,112 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Memory-bounded ring buffer for the Highlight Reel rollout strategy."""
-
-from __future__ import annotations
-
-from collections import deque
-
-import numpy as np
-import torch
-
-
-class RolloutRingBuffer:
-    """Fixed-capacity circular buffer for observation/action frames.
-
-    Stores the last *N* seconds of telemetry in memory, bounded by both
-    time (``max_seconds``) and memory (``max_memory_mb``).  When either
-    limit is reached the oldest frames are evicted.
-
-    .. note::
-       This class is **single-threaded**.  ``append``/``drain``/``clear``
-       must all be called from the same thread (the rollout main loop).
-       Concurrent access from a background thread will corrupt
-       ``_current_bytes`` accounting.
-
-    Parameters
-    ----------
-    max_seconds:
-        Maximum duration of buffered telemetry.
-    max_memory_mb:
-        Hard memory cap in MiB.  Frames are evicted when the estimated
-        total size exceeds this.
-    fps:
-        Frames per second — used to convert ``max_seconds`` to a frame
-        count.
-    """
-
-    def __init__(self, max_seconds: float = 30.0, max_memory_mb: int = 2048, fps: float = 30.0) -> None:
-        self._max_frames = int(max_seconds * fps)
-        self._max_bytes = int(max_memory_mb * 1024 * 1024)
-        self._buffer: deque[dict] = deque(maxlen=self._max_frames)
-        self._current_bytes: int = 0
-
-    # ------------------------------------------------------------------
-    # Public API
-    # ------------------------------------------------------------------
-
-    def append(self, frame: dict) -> None:
-        """Add *frame* to the buffer, evicting the oldest if at capacity."""
-        frame_bytes = _estimate_frame_bytes(frame)
-
-        # Evict oldest frames until both the frame cap and the memory cap leave room.
-        while self._buffer and (
-            len(self._buffer) >= self._max_frames or self._current_bytes + frame_bytes > self._max_bytes
-        ):
-            self._current_bytes -= _estimate_frame_bytes(self._buffer.popleft())
-        self._buffer.append(frame)
-        self._current_bytes += frame_bytes
-
-    def drain(self) -> list[dict]:
-        """Return all buffered frames and clear the buffer."""
-        frames = list(self._buffer)
-        self._buffer.clear()
-        self._current_bytes = 0
-        return frames
-
-    def clear(self) -> None:
-        """Discard all buffered frames."""
-        self._buffer.clear()
-        self._current_bytes = 0
-
-    def __len__(self) -> int:
-        return len(self._buffer)
-
-    @property
-    def estimated_bytes(self) -> int:
-        """Estimated total byte size of all buffered frames."""
-        return self._current_bytes
-
-
-# ------------------------------------------------------------------
-# Helpers
-# ------------------------------------------------------------------
-
-
-def _estimate_frame_bytes(frame: dict) -> int:
-    """Rough byte estimate for a single frame dictionary."""
-    total = 0
-    for v in frame.values():
-        if isinstance(v, torch.Tensor):
-            # Not every ``torch`` release exposes ``Tensor.nbytes``; compute the size explicitly
-            # so the memory cap is honoured even when frames hold unconverted tensors.
-            total += v.nelement() * v.element_size()
-        elif isinstance(v, np.ndarray) or hasattr(v, "nbytes"):
-            total += v.nbytes
-        elif isinstance(v, (int, float)):
-            total += 8
-        elif isinstance(v, (str, bytes)):
-            total += len(v)
-    return max(total, 1)  # avoid zero-size frames
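A minimal, standalone sketch of the byte accounting in the removed buffer, assuming plain `bytes` payloads instead of frame dictionaries. `ByteCappedBuffer` and its parameters are hypothetical names used only for illustration; the point is that the running byte count stays consistent only if every eviction, whether triggered by the byte cap or by the frame cap, goes through the same bookkeeping path.

```python
from collections import deque


class ByteCappedBuffer:
    """Illustrative FIFO buffer bounded by both a frame count and a byte budget."""

    def __init__(self, max_frames: int, max_bytes: int) -> None:
        self._max_frames = max_frames
        self._max_bytes = max_bytes
        # No maxlen here: evictions are done by hand so the byte count is always updated.
        self._buffer: deque[bytes] = deque()
        self._current_bytes = 0

    def append(self, payload: bytes) -> None:
        size = max(len(payload), 1)  # treat empty payloads as one byte
        # Evict the oldest entries until the new payload fits under both caps.
        while self._buffer and (
            self._current_bytes + size > self._max_bytes
            or len(self._buffer) >= self._max_frames
        ):
            evicted = self._buffer.popleft()
            self._current_bytes -= max(len(evicted), 1)
        self._buffer.append(payload)
        self._current_bytes += size

    def __len__(self) -> int:
        return len(self._buffer)

    @property
    def estimated_bytes(self) -> int:
        return self._current_bytes


buf = ByteCappedBuffer(max_frames=3, max_bytes=10)
for chunk in (b"aaaa", b"bbbb", b"cccc", b"d", b"e"):
    buf.append(chunk)
contents = list(buf._buffer)
```

With these inputs the third append evicts on the byte cap and the fifth evicts on the frame cap, leaving three payloads whose sizes sum to the reported byte count.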
@@ -1,79 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Thread-safe robot wrapper for concurrent observation/action access."""
-
-from __future__ import annotations
-
-from threading import Lock
-from typing import Any
-
-from lerobot.robots import Robot
-
-
-class ThreadSafeRobot:
-    """Lock-protected wrapper around a :class:`Robot` for use with background threads.
-
-    When RTC inference runs in a background thread while the main loop
-    executes actions, both threads may access the robot concurrently.
-    This wrapper serialises ``get_observation`` and ``send_action`` calls.
-
-    Read-only properties are proxied without the lock since they don't
-    mutate hardware state.
-    """
-
-    def __init__(self, robot: Robot) -> None:
-        self._robot = robot
-        self._lock = Lock()
-
-    # -- Lock-protected I/O --------------------------------------------------
-
-    def get_observation(self) -> dict[str, Any]:
-        with self._lock:
-            return self._robot.get_observation()
-
-    def send_action(self, action: dict[str, Any] | Any) -> Any:
-        with self._lock:
-            return self._robot.send_action(action)
-
-    # -- Read-only proxies (no lock needed) -----------------------------------
-
-    @property
-    def observation_features(self) -> dict:
-        return self._robot.observation_features
-
-    @property
-    def action_features(self) -> dict:
-        return self._robot.action_features
-
-    @property
-    def name(self) -> str:
-        return self._robot.name
-
-    @property
-    def robot_type(self) -> str:
-        return self._robot.robot_type
-
-    @property
-    def cameras(self):
-        return getattr(self._robot, "cameras", {})
-
-    @property
-    def is_connected(self) -> bool:
-        return self._robot.is_connected
-
-    @property
-    def inner(self) -> Robot:
-        """Access the underlying robot (e.g. for connect/disconnect)."""
-        return self._robot
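The serialisation that the removed wrapper provided can be sketched without any hardware. `FakeRobot` and `LockedRobot` below are hypothetical stand-ins, not lerobot classes; the sketch only shows that holding one lock around both I/O methods keeps a reader thread and a writer thread from overlapping inside the robot.

```python
import threading


class FakeRobot:
    """Hypothetical robot that tracks how many calls are inside it at once."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0
        self.calls = 0

    def _io(self) -> None:
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        self.calls += 1
        self.active -= 1

    def get_observation(self) -> dict:
        self._io()
        return {"pos": 0.0}

    def send_action(self, action: dict) -> dict:
        self._io()
        return action


class LockedRobot:
    """Serialises get_observation and send_action behind a single lock."""

    def __init__(self, robot: FakeRobot) -> None:
        self._robot = robot
        self._lock = threading.Lock()

    def get_observation(self) -> dict:
        with self._lock:
            return self._robot.get_observation()

    def send_action(self, action: dict) -> dict:
        with self._lock:
            return self._robot.send_action(action)


robot = FakeRobot()
wrapped = LockedRobot(robot)


def reader() -> None:
    for _ in range(1000):
        wrapped.get_observation()


def writer() -> None:
    for _ in range(1000):
        wrapped.send_action({"pos": 1.0})


threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every call into `FakeRobot` happens while the lock is held, the counters are updated by one thread at a time and at most one call is ever in flight.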
@@ -1,36 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Rollout strategies — public API re-exports."""
-
-from .base import BaseStrategy
-from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
-from .dagger import DAggerEvents, DAggerPhase, DAggerStrategy
-from .factory import create_strategy
-from .highlight import HighlightStrategy
-from .sentry import SentryStrategy
-
-__all__ = [
-    "BaseStrategy",
-    "DAggerEvents",
-    "DAggerPhase",
-    "DAggerStrategy",
-    "HighlightStrategy",
-    "RolloutStrategy",
-    "SentryStrategy",
-    "create_strategy",
-    "estimate_max_episode_seconds",
-    "safe_push_to_hub",
-    "send_next_action",
-]
@@ -1,85 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Base rollout strategy: autonomous policy execution with no data recording."""
-
-from __future__ import annotations
-
-import logging
-import time
-
-from lerobot.utils.robot_utils import precise_sleep
-
-from ..context import RolloutContext
-from .core import RolloutStrategy, send_next_action
-
-logger = logging.getLogger(__name__)
-
-
-class BaseStrategy(RolloutStrategy):
-    """Autonomous policy rollout with no data recording.
-
-    All actions flow through the ``robot_action_processor`` pipeline
-    before reaching the robot.
-    """
-
-    def setup(self, ctx: RolloutContext) -> None:
-        """Initialise the inference engine."""
-        self._init_engine(ctx)
-        logger.info("Base strategy ready")
-
-    def run(self, ctx: RolloutContext) -> None:
-        """Run the autonomous control loop until shutdown or duration expires."""
-        engine = self._engine
-        cfg = ctx.runtime.cfg
-        robot = ctx.hardware.robot_wrapper
-        interpolator = self._interpolator
-
-        control_interval = interpolator.get_control_interval(cfg.fps)
-
-        start_time = time.perf_counter()
-        engine.resume()
-        logger.info("Base strategy control loop started")
-
-        while not ctx.runtime.shutdown_event.is_set():
-            loop_start = time.perf_counter()
-
-            if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
-                logger.info("Duration limit reached (%.0fs)", cfg.duration)
-                break
-
-            obs = robot.get_observation()
-            obs_processed = self._process_observation_and_notify(ctx.processors, obs)
-
-            if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
-                continue
-
-            action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
-            self._log_telemetry(obs_processed, action_dict, ctx.runtime)
-
-            dt = time.perf_counter() - loop_start
-            if (sleep_t := control_interval - dt) > 0:
-                precise_sleep(sleep_t)
-            else:
-                logger.warning(
-                    f"Control loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
-                )
-
-    def teardown(self, ctx: RolloutContext) -> None:
-        """Disconnect hardware and stop inference."""
-        self._teardown_hardware(
-            ctx.hardware,
-            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
-        )
-        logger.info("Base strategy teardown complete")
@@ -1,304 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Rollout strategy ABC and shared action-dispatch helper."""
-
-from __future__ import annotations
-
-import abc
-import logging
-import time
-from typing import TYPE_CHECKING
-
-from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
-from lerobot.utils.action_interpolator import ActionInterpolator
-from lerobot.utils.constants import OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame
-from lerobot.utils.robot_utils import precise_sleep
-from lerobot.utils.visualization_utils import log_rerun_data
-
-from ..inference import InferenceEngine
-
-if TYPE_CHECKING:
-    from ..configs import RolloutStrategyConfig
-    from ..context import HardwareContext, ProcessorContext, RolloutContext, RuntimeContext
-
-logger = logging.getLogger(__name__)
-
-
-class RolloutStrategy(abc.ABC):
-    """Abstract base for rollout execution strategies.
-
-    Each concrete strategy implements a self-contained control loop with
-    its own recording/interaction semantics.  Strategies are mutually
-    exclusive — only one runs per session.
-    """
-
-    def __init__(self, config: RolloutStrategyConfig) -> None:
-        self.config = config
-        self._engine: InferenceEngine | None = None
-        self._interpolator: ActionInterpolator | None = None
-        self._warmup_flushed: bool = False
-        self._cached_obs_processed: dict | None = None
-
-    def _init_engine(self, ctx: RolloutContext) -> None:
-        """Attach the inference engine and action interpolator, then start the backend.
-
-        Creates an :class:`ActionInterpolator` from the config's
-        ``interpolation_multiplier`` and starts the inference engine.
-        Call this from ``setup()`` so strategies share identical
-        initialisation without duplicating code.
-        """
-        self._interpolator = ActionInterpolator(multiplier=ctx.runtime.cfg.interpolation_multiplier)
-        self._engine = ctx.policy.inference
-        logger.info("Starting inference engine...")
-        self._engine.reset()
-        self._engine.start()
-        self._warmup_flushed = False
-        self._cached_obs_processed = None
-        logger.info("Inference engine started")
-
-    def _process_observation_and_notify(self, processors: ProcessorContext, obs_raw: dict) -> dict:
-        """Run the observation processor and notify the engine — throttled to policy ticks.
-
-        Callers are responsible for calling ``robot.get_observation()`` every loop
-        iteration so ``obs_raw`` stays fresh for the action post-processor.  This
-        helper gates only the comparatively expensive bits — the processor pipeline
-        and ``engine.notify_observation`` — to fire when the interpolator signals
-        it needs a new action (once per ``interpolation_multiplier`` ticks).  On
-        interpolated ticks the cached ``obs_processed`` is reused.
-
-        With ``interpolation_multiplier == 1`` this is equivalent to the unthrottled
-        path: ``needs_new_action()`` is True every tick.
-
-        The cache is implicitly invalidated whenever ``interpolator.reset()`` is
-        called (warmup completion, DAgger phase transitions back to AUTONOMOUS),
-        because reset makes ``needs_new_action()`` return True on the next call.
-        """
-        if self._cached_obs_processed is None or self._interpolator.needs_new_action():
-            obs_processed = processors.robot_observation_processor(obs_raw)
-            self._engine.notify_observation(obs_processed)
-            self._cached_obs_processed = obs_processed
-        return self._cached_obs_processed
-
-    def _handle_warmup(self, use_torch_compile: bool, loop_start: float, control_interval: float) -> bool:
-        """Handle torch.compile warmup phase.
-
-        Returns ``True`` if the caller should ``continue`` (still warming
-        up).  On the first post-warmup iteration the engine and
-        interpolator are reset so stale warmup state is discarded.
-        """
-        engine = self._engine
-        interpolator = self._interpolator
-        if not use_torch_compile:
-            return False
-        if not engine.ready:
-            dt = time.perf_counter() - loop_start
-            if (sleep_t := control_interval - dt) > 0:
-                precise_sleep(sleep_t)
-            return True
-        if not self._warmup_flushed:
-            logger.info("Warmup complete — flushing stale state and resuming engine")
-            engine.reset()
-            interpolator.reset()
-            self._warmup_flushed = True
-            engine.resume()
-        return False
-
-    def _teardown_hardware(self, hw: HardwareContext, return_to_initial_position: bool = True) -> None:
-        """Stop the inference engine, optionally return robot to initial position, and disconnect hardware."""
-        if self._engine is not None:
-            logger.info("Stopping inference engine...")
-            self._engine.stop()
-        robot = hw.robot_wrapper.inner
-        if robot.is_connected:
-            if return_to_initial_position and hw.initial_position:
-                logger.info("Returning robot to initial position before shutdown...")
-                self._return_to_initial_position(hw)
-            elif not return_to_initial_position:
-                logger.info(
-                    "Skipping return-to-initial-position (disabled by config); leaving robot in final pose."
-                )
-            logger.info("Disconnecting robot...")
-            robot.disconnect()
-        teleop = hw.teleop
-        if teleop is not None and teleop.is_connected:
-            logger.info("Disconnecting teleoperator...")
-            teleop.disconnect()
-
-    @staticmethod
-    def _return_to_initial_position(hw: HardwareContext, duration_s: float = 3.0, fps: int = 50) -> None:
-        """Smoothly interpolate the robot back to its initial position."""
-        robot = hw.robot_wrapper
-        target = hw.initial_position
-        try:
-            current_obs = robot.get_observation()
-            current_pos = {k: v for k, v in current_obs.items() if k in target}
-            steps = max(int(duration_s * fps), 1)
-            for step in range(1, steps + 1):
-                t = step / steps
-                interp = {}
-                for k in current_pos:
-                    interp[k] = current_pos[k] * (1 - t) + target[k] * t
-                robot.send_action(interp)
-                precise_sleep(1 / fps)
-        except Exception as e:
-            logger.warning("Could not return to initial position: %s", e)
-
-    @staticmethod
-    def _log_telemetry(
-        obs_processed: dict | None,
-        action_dict: dict | None,
-        runtime_ctx: RuntimeContext,
-    ) -> None:
-        """Log observation/action telemetry to Rerun if display_data is enabled."""
-        cfg = runtime_ctx.cfg
-        if not cfg.display_data:
-            return
-        log_rerun_data(
-            observation=obs_processed,
-            action=action_dict,
-            compress_images=cfg.display_compressed_images,
-        )
-
-    @abc.abstractmethod
-    def setup(self, ctx: RolloutContext) -> None:
-        """Strategy-specific initialisation (keyboard listeners, buffers, etc.)."""
-
-    @abc.abstractmethod
-    def run(self, ctx: RolloutContext) -> None:
-        """Main rollout loop.  Returns when shutdown is requested or duration expires."""
-
-    @abc.abstractmethod
-    def teardown(self, ctx: RolloutContext) -> None:
-        """Cleanup: save dataset, stop threads, disconnect hardware."""
-
-
-# ---------------------------------------------------------------------------
-# Shared helpers
-# ---------------------------------------------------------------------------
-
-
-def safe_push_to_hub(dataset, tags=None, private=False) -> bool:
-    """Push dataset to hub, skipping if no episodes have been saved.
-
-    Returns ``True`` if the push was attempted, ``False`` if skipped.
-    """
-    if dataset.num_episodes == 0:
-        logger.warning("No episodes saved — skipping push to hub")
-        return False
-    dataset.push_to_hub(tags=tags, private=private)
-    return True
-
-
-def estimate_max_episode_seconds(
-    dataset_features: dict,
-    fps: float,
-    target_size_mb: float = DEFAULT_VIDEO_FILE_SIZE_IN_MB,
-) -> float:
-    """Conservatively estimate how many seconds of video will exceed *target_size_mb*.
-
-    Each camera produces its own video file, so the episode duration is
-    driven by the **slowest** camera to fill ``target_size_mb`` — i.e.
-    the one with the fewest pixels per frame (lowest bitrate).
-
-    Uses a deliberately **low** bits-per-pixel estimate so the computed
-    duration is *longer* than reality.  By the time the timer fires the
-    actual video file is guaranteed to have crossed the target size,
-    which aligns episode boundaries with the dataset's video-file
-    chunking — each ``push_to_hub`` uploads complete files rather than
-    re-uploading a still-growing one.
-
-    The estimate ignores codec-specific settings (CRF, preset) on purpose:
-    we only need a rough lower bound on bitrate, not a precise prediction.
-
-    Falls back to 300 s (5 min) when no video features are present.
-    """
-    # 0.1 bits-per-pixel is a *low* estimate for CRF-30 streaming video of
-    # robot footage (real-world is typically 0.1 – 0.3 bpp).  Under-
-    # estimating the bitrate over-estimates the time → the episode will be
-    # *larger* than target_size_mb when we save, which is what we want.
-    conservative_bpp = 0.1
-
-    # Collect per-camera pixel counts — each camera has its own video file.
-    camera_pixels = []
-    for feat in dataset_features.values():
-        if feat.get("dtype") == "video":
-            shape = feat.get("shape", ())
-
-            # (H, W, C) — bits-per-pixel is a per-spatial-pixel metric,
-            # so we exclude the channel dimension from the count.
-            if len(shape) == 3:
-                pixels = shape[0] * shape[1]
-                camera_pixels.append(pixels)
-            else:
-                raise ValueError(f"Unexpected video feature shape: {shape}")
-
-    if not camera_pixels:
-        return 300.0
-
-    # Use the smallest camera: it produces the lowest bitrate and therefore
-    # takes the longest to reach the target — the conservative choice.
-    min_pixels = min(camera_pixels)
-    bits_per_frame = min_pixels * conservative_bpp
-    bytes_per_second = (bits_per_frame * fps) / 8
-
-    # Guard against division by zero just in case
-    if bytes_per_second <= 0:
-        return 300.0
-
-    return (target_size_mb * 1024 * 1024) / bytes_per_second
-
-
-# ---------------------------------------------------------------------------
-# Shared action-dispatch helper
-# ---------------------------------------------------------------------------
-
-
-def send_next_action(
-    obs_processed: dict,
-    obs_raw: dict,
-    ctx: RolloutContext,
-    interpolator: ActionInterpolator,
-) -> dict | None:
-    """Dispatch the next action to the robot.
-
-    Pulls the next action tensor from the inference engine, feeds the
-    interpolator, and sends the interpolated action through the
-    ``robot_action_processor`` to the robot.  Works identically for
-    sync and async backends — the rollout strategy never needs to branch.
-
-    Returns the action dict that was sent, or ``None`` if no action was
-    ready (e.g. empty async queue, interpolator not yet primed).
-    """
-    engine = ctx.policy.inference
-    features = ctx.data.dataset_features
-    ordered_keys = ctx.data.ordered_action_keys
-
-    if interpolator.needs_new_action():
-        obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-        action_tensor = engine.get_action(obs_frame)
-        if action_tensor is not None:
-            interpolator.add(action_tensor.cpu())
-
-    interp = interpolator.get()
-    if interp is None:
-        return None
-
-    if len(interp) != len(ordered_keys):
-        raise ValueError(f"Interpolated tensor length ({len(interp)}) != action keys ({len(ordered_keys)})")
-    action_dict = {k: interp[i].item() for i, k in enumerate(ordered_keys)}
-    processed = ctx.processors.robot_action_processor((action_dict, obs_raw))
-    ctx.hardware.robot_wrapper.send_action(processed)
-    return action_dict
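A worked instance of the arithmetic described in the `estimate_max_episode_seconds` docstring, as a standalone sketch. The camera resolution, frame rate and the 100 MiB target are assumed values chosen for illustration; they are not taken from the removed code, and the real default comes from `DEFAULT_VIDEO_FILE_SIZE_IN_MB`.

```python
# Assumed inputs for illustration only.
height, width = 480, 640      # smallest camera, channels excluded
fps = 30.0
target_size_mb = 100.0        # hypothetical target, not the library default
conservative_bpp = 0.1        # deliberately low bits-per-pixel estimate

pixels = height * width                        # 307200 spatial pixels
bits_per_frame = pixels * conservative_bpp     # about 30720 bits
bytes_per_second = bits_per_frame * fps / 8    # about 115200 bytes/s

seconds = (target_size_mb * 1024 * 1024) / bytes_per_second
minutes = seconds / 60
```

Under these assumptions the estimate comes out at roughly 910 seconds, a little over 15 minutes. A camera with more pixels would shorten the estimate, which is why the function takes the minimum pixel count across cameras.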
@@ -1,832 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""DAgger rollout strategy: Human-in-the-Loop data collection.
-
-Implements the RaC paradigm (Recovery and Correction) for interactive
-imitation learning.  Alternates between autonomous policy execution and
-human intervention via teleoperator.
-
-Input is controlled via either a keyboard or foot pedal, selected by
-the ``input_device`` config field.  Each device exposes three actions:
-
-    1. **pause_resume** — Toggle policy execution (AUTONOMOUS <-> PAUSED).
-    2. **correction**   — Toggle correction recording (PAUSED <-> CORRECTING).
-    3. **upload**        — Push dataset to hub on demand (corrections-only mode).
-    ESC (keyboard only) — Stop session.
-
-Recording modes:
-    ``record_autonomous=True``:  Sentry-like continuous recording with
-        time-based episode rotation.  Both autonomous and correction
-        frames are recorded; corrections tagged ``intervention=True``.
-    ``record_autonomous=False``: Only correction windows are recorded.
-        Each correction (start to stop) becomes one episode.
-
-Teleoperator handover:
-    On AUTONOMOUS → PAUSED, actuated teleops (those with non-empty
-    ``feedback_features``, e.g. SO-101, OpenArmMini) are smoothly driven to
-    the follower's last position via ``send_feedback`` so the operator takes
-    over without a jerk.  Non-actuated teleops cannot be driven,
-    so on PAUSED → CORRECTING the follower is instead slid to the teleop's
-    current pose before the correction begins.
-"""
-
-from __future__ import annotations
-
-import contextlib
-import enum
-import logging
-import os
-import sys
-import time
-from concurrent.futures import Future, ThreadPoolExecutor
-from threading import Event, Lock
-from typing import Any
-
-import numpy as np
-
-from lerobot.common.control_utils import is_headless
-from lerobot.datasets import VideoEncodingManager
-from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
-from lerobot.teleoperators import Teleoperator
-from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame
-from lerobot.utils.import_utils import _pynput_available
-from lerobot.utils.pedal import start_pedal_listener
-from lerobot.utils.robot_utils import precise_sleep
-from lerobot.utils.utils import log_say
-
-from ..configs import DAggerKeyboardConfig, DAggerPedalConfig, DAggerStrategyConfig
-from ..context import RolloutContext
-from ..robot_wrapper import ThreadSafeRobot
-from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
-
-PYNPUT_AVAILABLE = _pynput_available
-keyboard = None
-if PYNPUT_AVAILABLE:
-    try:
-        if ("DISPLAY" not in os.environ) and ("linux" in sys.platform):
-            logging.info("No DISPLAY set. Skipping pynput import.")
-            PYNPUT_AVAILABLE = False
-        else:
-            from pynput import keyboard
-    except Exception as e:
-        PYNPUT_AVAILABLE = False
-        logging.info(f"Could not import pynput: {e}")
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# DAgger state machine
-# ---------------------------------------------------------------------------
-
-
-class DAggerPhase(enum.Enum):
-    """Observable phases of a DAgger episode."""
-
-    AUTONOMOUS = "autonomous"  # Policy driving
-    PAUSED = "paused"  # Engine paused, teleop aligned, awaiting input
-    CORRECTING = "correcting"  # Human driving via teleop, recording interventions
-
-
-# Valid (current_phase, event) -> next_phase
-_DAGGER_TRANSITIONS: dict[tuple[DAggerPhase, str], DAggerPhase] = {
-    (DAggerPhase.AUTONOMOUS, "pause_resume"): DAggerPhase.PAUSED,
-    (DAggerPhase.PAUSED, "pause_resume"): DAggerPhase.AUTONOMOUS,
-    (DAggerPhase.PAUSED, "correction"): DAggerPhase.CORRECTING,
-    (DAggerPhase.CORRECTING, "correction"): DAggerPhase.PAUSED,
-}
-
-
-class DAggerEvents:
-    """Thread-safe container for DAgger input device events.
-
-    The keyboard/pedal threads write transition requests; the main loop
-    consumes them.
-    """
-
-    def __init__(self) -> None:
-        self._lock = Lock()
-        self._phase = DAggerPhase.AUTONOMOUS
-        self._pending_transition: str | None = None
-
-        # Session-level flags
-        self.stop_recording = Event()
-        self.upload_requested = Event()
-
-    # -- Thread-safe phase access ------------------------------------------
-
-    @property
-    def phase(self) -> DAggerPhase:
-        """Current phase of the DAgger state machine."""
-        with self._lock:
-            return self._phase
-
-    @phase.setter
-    def phase(self, value: DAggerPhase) -> None:
-        with self._lock:
-            self._phase = value
-
-    def request_transition(self, event: str) -> None:
-        """Request a phase transition (called from keyboard/pedal threads).
-
-        Only enqueues the request if it corresponds to a valid transition
-        from the current phase, preventing impossible state changes.
-        """
-        with self._lock:
-            if (self._phase, event) in _DAGGER_TRANSITIONS:
-                self._pending_transition = event
-
-    def consume_transition(self) -> tuple[DAggerPhase, DAggerPhase] | None:
-        """Consume a pending transition (called from main loop)."""
-        with self._lock:
-            if self._pending_transition is None:
-                return None
-            key = (self._phase, self._pending_transition)
-            self._pending_transition = None
-            new_phase = _DAGGER_TRANSITIONS.get(key)
-            if new_phase is None:
-                return None
-            old_phase = self._phase
-            self._phase = new_phase
-            return old_phase, new_phase
-
-    def reset(self) -> None:
-        """Reset all transient state for a fresh session."""
-        with self._lock:
-            self._phase = DAggerPhase.AUTONOMOUS
-            self._pending_transition = None
-        self.upload_requested.clear()
-
-
-# ---------------------------------------------------------------------------
-# Teleoperator helpers
-# ---------------------------------------------------------------------------
-
-
-def _teleop_supports_feedback(teleop: Teleoperator) -> bool:
-    """Return True when the teleop can receive position feedback (is actuated).
-    TODO(Maxime): See if it is possible to unify this interface across teleops instead of duck-typing.
-    """
-    return (
-        bool(teleop.feedback_features)
-        and hasattr(teleop, "disable_torque")
-        and hasattr(teleop, "enable_torque")
-    )
-
-
-def _teleop_smooth_move_to(
-    teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 30
-) -> None:
-    """Smoothly move an actuated teleop to ``target_pos`` via linear interpolation.
-
-    Requires the teleoperator to support feedback
-    (i.e. have non-empty ``feedback_features`` and implement ``disable_torque`` / ``enable_torque``).
-
-    TODO(Maxime): This blocks for up to ``duration_s`` seconds. During this time the
-    follower robot receives no new actions, which could be an issue on LeKiwi.
-    """
-    teleop.enable_torque()
-    current = teleop.get_action()
-    steps = max(int(duration_s * fps), 1)
-
-    for step in range(steps + 1):
-        t = step / steps
-        interp = {
-            k: current[k] * (1 - t) + target_pos[k] * t if k in target_pos else current[k] for k in current
-        }
-        teleop.send_feedback(interp)
-        time.sleep(1 / fps)
-
-
-def _follower_smooth_move_to(
-    robot: ThreadSafeRobot, current: dict, target: dict, duration_s: float = 1.0, fps: int = 30
-) -> None:
-    """Smoothly move the follower robot from ``current`` to ``target`` action.
-
-    Used when the teleop is non-actuated: instead of driving the leader arm
-    to the follower, we bring the follower to the teleop's current pose.
-    Both ``current`` and ``target`` must be in robot-action key space.
-    """
-    steps = max(int(duration_s * fps), 1)
-
-    for step in range(steps + 1):
-        t = step / steps
-        interp = {k: current[k] * (1 - t) + target[k] * t if k in target else current[k] for k in current}
-        robot.send_action(interp)
-        time.sleep(1 / fps)
-
-
-# ---------------------------------------------------------------------------
-# Input device handlers
-# ---------------------------------------------------------------------------
-
-
-def _init_dagger_keyboard(events: DAggerEvents, cfg: DAggerKeyboardConfig):
-    """Initialise keyboard listener with DAgger 3-key controls.
-
-    Returns the pynput Listener (or ``None`` in headless mode or when
-    pynput is unavailable).
-    """
-    if not PYNPUT_AVAILABLE or is_headless():
-        logger.warning("Headless environment or pynput unavailable — keyboard controls disabled")
-        return None
-
-    # Map config key names to pynput Key objects for special keys
-    special_keys = {
-        "space": keyboard.Key.space,
-        "tab": keyboard.Key.tab,
-        "enter": keyboard.Key.enter,
-    }
-
-    def _resolve_key(key) -> str | None:
-        """Resolve a pynput key event to a config-comparable string."""
-        if key == keyboard.Key.esc:
-            return "esc"
-        for name, pynput_key in special_keys.items():
-            if key == pynput_key:
-                return name
-        if hasattr(key, "char") and key.char:
-            return key.char
-        return None
-
-    # Build mapping: resolved key string -> DAgger event name
-    key_to_event = {
-        cfg.pause_resume: "pause_resume",
-        cfg.correction: "correction",
-    }
-
-    def on_press(key):
-        try:
-            resolved = _resolve_key(key)
-            if resolved is None:
-                return
-            if resolved == "esc":
-                logger.info("Stop recording...")
-                events.stop_recording.set()
-                return
-            if resolved in key_to_event:
-                events.request_transition(key_to_event[resolved])
-            if resolved == cfg.upload:
-                events.upload_requested.set()
-        except Exception as e:
-            logger.debug("Key error: %s", e)
-
-    listener = keyboard.Listener(on_press=on_press)
-    listener.start()
-    logger.info(
-        "DAgger keyboard listener started (pause_resume='%s', correction='%s', upload='%s', ESC=stop)",
-        cfg.pause_resume,
-        cfg.correction,
-        cfg.upload,
-    )
-    return listener
-
-
-def _init_dagger_pedal(events: DAggerEvents, cfg: DAggerPedalConfig):
-    """Initialise foot pedal listener with DAgger 3-pedal controls.
-
-    Returns the pedal listener thread (or ``None`` if evdev is unavailable).
-    """
-    code_to_event = {
-        cfg.pause_resume: "pause_resume",
-        cfg.correction: "correction",
-    }
-
-    def on_press(code: str) -> None:
-        if code in code_to_event:
-            events.request_transition(code_to_event[code])
-        if code == cfg.upload:
-            events.upload_requested.set()
-
-    logger.info("Initializing DAgger foot pedal listener (device=%s)", cfg.device_path)
-    return start_pedal_listener(on_press, device_path=cfg.device_path)
-
-
-# ---------------------------------------------------------------------------
-# DAgger Strategy
-# ---------------------------------------------------------------------------
-
-
-class DAggerStrategy(RolloutStrategy):
-    """Human-in-the-Loop data collection with intervention tagging.
-
-    State machine::
-
-        AUTONOMOUS --(key1)--> PAUSED --(key2)--> CORRECTING --(key2)--> PAUSED
-                               --(key1)--> AUTONOMOUS
-
-    Recording modes:
-        ``record_autonomous=True``: Sentry-like continuous recording with
-            time-based episode rotation.  Intervention frames tagged True.
-        ``record_autonomous=False``: Only correction windows recorded.
-            Each correction = one episode.  Upload on demand via key3.
-    """
-
-    config: DAggerStrategyConfig
-
-    def __init__(self, config: DAggerStrategyConfig):
-        super().__init__(config)
-        self._listener = None
-        self._pedal_thread = None
-        self._events = DAggerEvents()
-        self._push_executor: ThreadPoolExecutor | None = None
-        self._pending_push: Future | None = None
-        self._needs_push = Event()
-        self._episode_lock = Lock()
-
-    def setup(self, ctx: RolloutContext) -> None:
-        """Initialise the inference engine and input device listener."""
-        self._init_engine(ctx)
-        self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dagger-push")
-        target_mb = self.config.target_video_file_size_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB
-        self._episode_duration_s = estimate_max_episode_seconds(
-            ctx.data.dataset_features, ctx.runtime.cfg.fps, target_size_mb=target_mb
-        )
-
-        if self.config.input_device == "keyboard":
-            self._listener = _init_dagger_keyboard(self._events, self.config.keyboard)
-        else:
-            self._pedal_thread = _init_dagger_pedal(self._events, self.config.pedal)
-
-        record_mode = "all frames (sentry-like)" if self.config.record_autonomous else "corrections only"
-        logger.info(
-            "DAgger strategy ready (input=%s, episodes=%d, record=%s, episode_duration=%.0fs)",
-            self.config.input_device,
-            self.config.num_episodes,
-            record_mode,
-            self._episode_duration_s,
-        )
-
-    def run(self, ctx: RolloutContext) -> None:
-        """Run DAgger episodes with human-in-the-loop intervention."""
-        if self.config.record_autonomous:
-            self._run_continuous(ctx)
-        else:
-            self._run_corrections_only(ctx)
-
-    def teardown(self, ctx: RolloutContext) -> None:
-        """Stop listeners, finalise the dataset, and disconnect hardware."""
-        play_sounds = ctx.runtime.cfg.play_sounds
-        logger.info("Stopping DAgger recording")
-        log_say("Stopping DAgger recording", play_sounds)
-
-        if self._listener is not None and not is_headless():
-            logger.info("Stopping keyboard listener")
-            self._listener.stop()
-
-        # Flush any queued/running push cleanly
-        if self._push_executor is not None:
-            logger.info("Shutting down push executor (waiting for pending pushes)...")
-            self._push_executor.shutdown(wait=True)
-            self._push_executor = None
-
-        if ctx.data.dataset is not None:
-            logger.info("Finalizing dataset...")
-            ctx.data.dataset.finalize()
-            if self._needs_push.is_set() and ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
-                logger.info("Pushing final dataset to hub...")
-                if safe_push_to_hub(
-                    ctx.data.dataset,
-                    tags=ctx.runtime.cfg.dataset.tags,
-                    private=ctx.runtime.cfg.dataset.private,
-                ):
-                    logger.info("Dataset uploaded to hub")
-                    log_say("Dataset uploaded to hub", play_sounds)
-
-        self._teardown_hardware(
-            ctx.hardware,
-            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
-        )
-        logger.info("DAgger strategy teardown complete")
-
-    # ------------------------------------------------------------------
-    # Continuous recording mode (record_autonomous=True)
-    # ------------------------------------------------------------------
-
-    def _run_continuous(self, ctx: RolloutContext) -> None:
-        """Sentry-like continuous recording with intervention tagging.
-
-        Episodes are auto-rotated at the duration estimated in ``setup`` and
-        uploaded in the background every ``upload_every_n_episodes`` episodes.
-        Both autonomous and correction frames are recorded; corrections are
-        tagged with ``intervention=True``.
-        """
-        engine = self._engine
-        cfg = ctx.runtime.cfg
-        robot = ctx.hardware.robot_wrapper
-        teleop = ctx.hardware.teleop
-        dataset = ctx.data.dataset
-        events = self._events
-        interpolator = self._interpolator
-        features = ctx.data.dataset_features
-
-        control_interval = interpolator.get_control_interval(cfg.fps)
-        record_stride = max(1, cfg.interpolation_multiplier)
-        task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
-        play_sounds = cfg.play_sounds
-
-        engine.reset()
-        interpolator.reset()
-        events.reset()
-        engine.resume()
-
-        last_action: dict[str, Any] | None = None
-        record_tick = 0
-        start_time = time.perf_counter()
-        episode_start = time.perf_counter()
-        episodes_since_push = 0
-        episode_duration_s = self._episode_duration_s
-        logger.info("DAgger continuous recording started (episode_duration=%.0fs)", episode_duration_s)
-
-        with VideoEncodingManager(dataset):
-            try:
-                while not events.stop_recording.is_set() and not ctx.runtime.shutdown_event.is_set():
-                    loop_start = time.perf_counter()
-
-                    if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
-                        logger.info("Duration limit reached (%.0fs)", cfg.duration)
-                        break
-
-                    # Process transitions
-                    transition = events.consume_transition()
-                    if transition is not None:
-                        old_phase, new_phase = transition
-                        self._apply_transition(
-                            old_phase,
-                            new_phase,
-                            engine,
-                            interpolator,
-                            ctx,
-                            last_action,
-                        )
-                        if new_phase == DAggerPhase.AUTONOMOUS:
-                            last_action = None
-
-                    phase = events.phase
-                    obs = robot.get_observation()
-
-                    # --- CORRECTING: human teleop control ---
-                    # TODO(Steven): teleop runs at the same FPS as the policy. To
-                    # decouple the two, sample teleop at its native rate and
-                    # interpolate to the control loop's tick rate.
-                    if phase == DAggerPhase.CORRECTING:
-                        obs_processed = ctx.processors.robot_observation_processor(obs)
-                        teleop_action = teleop.get_action()
-                        processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
-                        robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
-                        robot.send_action(robot_action_to_send)
-                        last_action = robot_action_to_send
-                        self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)
-                        if record_tick % record_stride == 0:
-                            obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                            action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
-                            frame = {
-                                **obs_frame,
-                                **action_frame,
-                                "task": task_str,
-                                "intervention": np.array([True], dtype=bool),
-                            }
-                            dataset.add_frame(frame)
-                        record_tick += 1
-
-                    # --- PAUSED: hold position ---
-                    elif phase == DAggerPhase.PAUSED:
-                        if last_action:
-                            robot.send_action(last_action)
-
-                    # --- AUTONOMOUS: policy control ---
-                    else:
-                        obs_processed = self._process_observation_and_notify(ctx.processors, obs)
-
-                        if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
-                            continue
-
-                        action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
-                        if action_dict is not None:
-                            self._log_telemetry(obs_processed, action_dict, ctx.runtime)
-                            last_action = ctx.processors.robot_action_processor((action_dict, obs))
-                            if record_tick % record_stride == 0:
-                                obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                                action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
-                                frame = {
-                                    **obs_frame,
-                                    **action_frame,
-                                    "task": task_str,
-                                    "intervention": np.array([False], dtype=bool),
-                                }
-                                dataset.add_frame(frame)
-                            record_tick += 1
-
-                    # Episode rotation derived from the video file-size target.
-                    # Saving is deferred while a correction is ongoing so the
-                    # episode boundary lands on a clean autonomous frame.
-                    elapsed = time.perf_counter() - episode_start
-                    if elapsed >= episode_duration_s and phase != DAggerPhase.CORRECTING:
-                        with self._episode_lock:
-                            dataset.save_episode()
-                        episodes_since_push += 1
-                        self._needs_push.set()
-                        logger.info(
-                            "Episode saved (total: %d, elapsed: %.1fs)",
-                            dataset.num_episodes,
-                            elapsed,
-                        )
-                        log_say(f"Episode {dataset.num_episodes} saved", play_sounds)
-
-                        if episodes_since_push >= self.config.upload_every_n_episodes:
-                            self._background_push(dataset, cfg)
-                            episodes_since_push = 0
-
-                        episode_start = time.perf_counter()
-
-                    dt = time.perf_counter() - loop_start
-                    if (sleep_t := control_interval - dt) > 0:
-                        precise_sleep(sleep_t)
-                    else:
-                        logger.warning(
-                            f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
-                        )
-
-            finally:
-                logger.info("DAgger continuous control loop ended — pausing engine")
-                engine.pause()
-                with contextlib.suppress(Exception):
-                    with self._episode_lock:
-                        dataset.save_episode()
-                    self._needs_push.set()
-                    logger.info("Final in-progress episode saved")
-
-    # ------------------------------------------------------------------
-    # Corrections-only mode (record_autonomous=False)
-    # ------------------------------------------------------------------
-
-    def _run_corrections_only(self, ctx: RolloutContext) -> None:
-        """Record only human correction windows.  Each correction = one episode.
-
-        The policy runs autonomously without recording.  When the user
-        pauses and starts a correction, frames are recorded with
-        ``intervention=True``.  Stopping the correction saves the episode.
-        The dataset can be uploaded on demand via the upload key/pedal.
-        """
-        engine = self._engine
-        cfg = ctx.runtime.cfg
-        robot = ctx.hardware.robot_wrapper
-        teleop = ctx.hardware.teleop
-        dataset = ctx.data.dataset
-        events = self._events
-        interpolator = self._interpolator
-        features = ctx.data.dataset_features
-
-        control_interval = interpolator.get_control_interval(cfg.fps)
-        record_stride = max(1, cfg.interpolation_multiplier)
-        task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
-        play_sounds = cfg.play_sounds
-
-        engine.reset()
-        interpolator.reset()
-        events.reset()
-        engine.resume()
-
-        last_action: dict[str, Any] | None = None
-        start_time = time.perf_counter()
-        record_tick = 0
-        recorded = 0
-        logger.info(
-            "DAgger corrections-only recording started (target: %d episodes)", self.config.num_episodes
-        )
-
-        with VideoEncodingManager(dataset):
-            try:
-                while (
-                    recorded < self.config.num_episodes
-                    and not events.stop_recording.is_set()
-                    and not ctx.runtime.shutdown_event.is_set()
-                ):
-                    loop_start = time.perf_counter()
-
-                    if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
-                        logger.info("Duration limit reached (%.0fs)", cfg.duration)
-                        break
-
-                    # Process transitions
-                    transition = events.consume_transition()
-                    if transition is not None:
-                        old_phase, new_phase = transition
-                        self._apply_transition(
-                            old_phase,
-                            new_phase,
-                            engine,
-                            interpolator,
-                            ctx,
-                            last_action,
-                        )
-                        if new_phase == DAggerPhase.AUTONOMOUS:
-                            last_action = None
-
-                        # Correction ended -> save episode (blocking if not streaming)
-                        if old_phase == DAggerPhase.CORRECTING and new_phase == DAggerPhase.PAUSED:
-                            with self._episode_lock:
-                                dataset.save_episode()
-                            recorded += 1
-                            self._needs_push.set()
-                            logger.info(
-                                "Correction %d/%d saved",
-                                recorded,
-                                self.config.num_episodes,
-                            )
-                            log_say(f"Correction {recorded} saved", play_sounds)
-
-                    # On-demand upload
-                    if events.upload_requested.is_set():
-                        events.upload_requested.clear()
-                        logger.info("Upload requested by user")
-                        self._background_push(dataset, cfg)
-
-                    phase = events.phase
-                    obs = robot.get_observation()
-
-                    # --- CORRECTING: human teleop control + recording ---
-                    # TODO(Steven): teleop runs at the same FPS as the policy. To
-                    # decouple the two, sample teleop at its native rate and
-                    # interpolate to the control loop's tick rate.
-                    if phase == DAggerPhase.CORRECTING:
-                        obs_processed = ctx.processors.robot_observation_processor(obs)
-                        teleop_action = teleop.get_action()
-                        processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
-                        robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
-                        robot.send_action(robot_action_to_send)
-                        last_action = robot_action_to_send
-                        self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)
-
-                        if record_tick % record_stride == 0:
-                            obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                            action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
-                            dataset.add_frame(
-                                {
-                                    **obs_frame,
-                                    **action_frame,
-                                    "task": task_str,
-                                    "intervention": np.array([True], dtype=bool),
-                                }
-                            )
-                        record_tick += 1
-
-                    # --- PAUSED: hold position ---
-                    elif phase == DAggerPhase.PAUSED:
-                        if last_action:
-                            robot.send_action(last_action)
-
-                    # --- AUTONOMOUS: policy control (no recording) ---
-                    else:
-                        obs_processed = self._process_observation_and_notify(ctx.processors, obs)
-
-                        if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
-                            continue
-
-                        action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
-                        if action_dict is not None:
-                            self._log_telemetry(obs_processed, action_dict, ctx.runtime)
-                            last_action = ctx.processors.robot_action_processor((action_dict, obs))
-
-                    dt = time.perf_counter() - loop_start
-                    if (sleep_t := control_interval - dt) > 0:
-                        precise_sleep(sleep_t)
-                    else:
-                        logger.warning(
-                            f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
-                        )
-
-            finally:
-                logger.info("DAgger corrections-only loop ended — pausing engine")
-                engine.pause()
-                with contextlib.suppress(Exception):
-                    with self._episode_lock:
-                        dataset.save_episode()
-                    self._needs_push.set()
-                    logger.info("Final in-progress episode saved")
-
-    # ------------------------------------------------------------------
-    # State-machine transition side-effects
-    # ------------------------------------------------------------------
-
-    @staticmethod
-    def _apply_transition(
-        old_phase: DAggerPhase,
-        new_phase: DAggerPhase,
-        engine,
-        interpolator,
-        ctx: RolloutContext,
-        prev_action: dict | None,
-    ) -> None:
-        """Execute side-effects for a validated phase transition, including smooth handovers.
-
-        AUTONOMOUS -> PAUSED (actuated teleop):
-            Pause the engine, then drive the leader arm to the follower's last
-            commanded position so the operator takes over without a jerk.
-
-        PAUSED -> CORRECTING (non-actuated teleop):
-            Slide the follower to the teleop's current pose so the robot meets
-            the operator's hand rather than jumping to it on the first frame.
-
-        CORRECTING -> PAUSED (actuated teleop):
-            Re-enable torque to hold position after correction.
-            This may be useful if the correction recording is cancelled.
-
-        PAUSED -> AUTONOMOUS:
-            Reset and resume the inference engine.
-        """
-        teleop = ctx.hardware.teleop
-        robot = ctx.hardware.robot_wrapper
-
-        logger.info("Phase transition: %s -> %s", old_phase.value, new_phase.value)
-        if old_phase == DAggerPhase.AUTONOMOUS and new_phase == DAggerPhase.PAUSED:
-            logger.info("Pausing engine - robot holds position")
-            engine.pause()
-
-            if _teleop_supports_feedback(teleop) and prev_action is not None:
-                # TODO(Maxime): prev_action is in robot action key space (output of robot_action_processor).
-                # send_feedback expects teleop feedback key space. For homogeneous setups (e.g. SO-101
-                # leader + SO-101 follower) the keys are identical so this works. If the processor pipeline
-                # does non-trivial key renaming (e.g. a rename_map on action keys), the interpolation in
-                # _teleop_smooth_move_to silently no-ops and the arm doesn't move.
-                logger.info("Smooth handover: moving leader arm to follower position")
-                _teleop_smooth_move_to(teleop, prev_action)
-
-        elif old_phase == DAggerPhase.PAUSED and new_phase == DAggerPhase.CORRECTING:
-            logger.info("Entering correction mode - human teleop control")
-            if not _teleop_supports_feedback(teleop) and prev_action is not None:
-                logger.info("Smooth handover: sliding follower to teleop position")
-                obs = robot.get_observation()
-                teleop_action = teleop.get_action()
-                processed = ctx.processors.teleop_action_processor((teleop_action, obs))
-                target = ctx.processors.robot_action_processor((processed, obs))
-                _follower_smooth_move_to(robot, prev_action, target)
-
-            # unlock the teleop for human control
-            if _teleop_supports_feedback(teleop):
-                teleop.disable_torque()
-
-        elif old_phase == DAggerPhase.CORRECTING and new_phase == DAggerPhase.PAUSED:
-            if _teleop_supports_feedback(teleop):
-                teleop.enable_torque()
-
-        elif new_phase == DAggerPhase.AUTONOMOUS:
-            logger.info("Resuming autonomous mode - resetting engine and interpolator")
-            interpolator.reset()
-            engine.reset()
-            engine.resume()
-
-            # release teleop before resuming the policy
-            if _teleop_supports_feedback(teleop):
-                teleop.disable_torque()
-
-    # ------------------------------------------------------------------
-    # Background push (shared by both modes)
-    # ------------------------------------------------------------------
-
-    def _background_push(self, dataset, cfg) -> None:
-        """Queue a Hub push on the single-worker executor.
-
-        The executor's max_workers=1 guarantees at most one push runs at
-        a time; submitted tasks are queued rather than dropped.  Pushes
-        are skipped while the operator is mid-correction to avoid
-        uploading a partially-recorded episode.
-        """
-        if self._push_executor is None:
-            return
-
-        if self._events.phase == DAggerPhase.CORRECTING:
-            logger.info("Skipping push — correction in progress")
-            return
-
-        if self._pending_push is not None and not self._pending_push.done():
-            logger.info("Previous push still in progress; queueing next")
-
-        def _push():
-            try:
-                with self._episode_lock:
-                    if safe_push_to_hub(
-                        dataset,
-                        tags=cfg.dataset.tags if cfg.dataset else None,
-                        private=cfg.dataset.private if cfg.dataset else False,
-                    ):
-                        self._needs_push.clear()
-                        logger.info("Background push to hub complete")
-            except Exception as e:
-                logger.error("Background push failed: %s", e)
-
-        self._pending_push = self._push_executor.submit(_push)
-        logger.info("Background push task submitted")
@@ -1,45 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Strategy factory: config type-name → strategy class dispatch."""
-
-from __future__ import annotations
-
-from typing import TYPE_CHECKING
-
-from .base import BaseStrategy
-from .core import RolloutStrategy
-from .dagger import DAggerStrategy
-from .highlight import HighlightStrategy
-from .sentry import SentryStrategy
-
-if TYPE_CHECKING:
-    from ..configs import RolloutStrategyConfig
-
-
-def create_strategy(config: RolloutStrategyConfig) -> RolloutStrategy:
-    """Instantiate the appropriate strategy from a config object.
-
-    Dispatches on ``config.type`` (the name registered via
-    ``draccus.ChoiceRegistry``).
-    """
-    if config.type == "base":
-        return BaseStrategy(config)
-    if config.type == "sentry":
-        return SentryStrategy(config)
-    if config.type == "highlight":
-        return HighlightStrategy(config)
-    if config.type == "dagger":
-        return DAggerStrategy(config)
-    raise ValueError(f"Unknown strategy type '{config.type}'. Available: base, sentry, highlight, dagger")
@@ -1,283 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Highlight Reel strategy: on-demand recording via ring buffer."""
-
-from __future__ import annotations
-
-import contextlib
-import logging
-import os
-import sys
-import time
-from concurrent.futures import Future, ThreadPoolExecutor
-from threading import Event as ThreadingEvent, Lock
-
-from lerobot.common.control_utils import is_headless
-from lerobot.datasets import VideoEncodingManager
-from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame
-from lerobot.utils.import_utils import _pynput_available, require_package
-from lerobot.utils.robot_utils import precise_sleep
-from lerobot.utils.utils import log_say
-
-from ..configs import HighlightStrategyConfig
-from ..context import RolloutContext
-from ..ring_buffer import RolloutRingBuffer
-from .core import RolloutStrategy, safe_push_to_hub, send_next_action
-
-PYNPUT_AVAILABLE = _pynput_available
-keyboard = None
-if PYNPUT_AVAILABLE:
-    try:
-        if ("DISPLAY" not in os.environ) and ("linux" in sys.platform):
-            logging.info("No DISPLAY set. Skipping pynput import.")
-            PYNPUT_AVAILABLE = False
-        else:
-            from pynput import keyboard
-    except Exception as e:
-        PYNPUT_AVAILABLE = False
-        logging.info(f"Could not import pynput: {e}")
-
-logger = logging.getLogger(__name__)
-
-
-class HighlightStrategy(RolloutStrategy):
-    """Autonomous rollout with on-demand recording via ring buffer.
-
-    The robot runs autonomously while a memory-bounded ring buffer
-    captures continuous telemetry.  When the user presses the save key:
-
-    1. The ring buffer is flushed to the dataset (last *Z* seconds).
-    2. Live recording continues until the save key is pressed again.
-    3. The episode is saved and the ring buffer resumes capturing.
-
-    Requires ``streaming_encoding=True`` (enforced in config validation)
-    so that ``dataset.add_frame`` is a non-blocking queue put — flushing
-    the entire ring buffer in one tick must not stall the control loop.
-    """
-
-    config: HighlightStrategyConfig
-
-    def __init__(self, config: HighlightStrategyConfig):
-        super().__init__(config)
-        require_package("pynput", extra="pynput-dep")
-        self._ring: RolloutRingBuffer | None = None
-        self._listener = None
-        self._save_requested = ThreadingEvent()
-        self._recording_live = ThreadingEvent()
-        self._push_requested = ThreadingEvent()
-        self._push_executor: ThreadPoolExecutor | None = None
-        self._pending_push: Future | None = None
-        self._episode_lock = Lock()
-
-    def setup(self, ctx: RolloutContext) -> None:
-        """Initialise the inference engine, ring buffer, and keyboard listener."""
-        self._init_engine(ctx)
-
-        self._ring = RolloutRingBuffer(
-            max_seconds=self.config.ring_buffer_seconds,
-            max_memory_mb=self.config.ring_buffer_max_memory_mb,
-            fps=ctx.runtime.cfg.fps,
-        )
-
-        self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="highlight-push")
-        logger.info(
-            "Ring buffer initialized (max_seconds=%.0f, max_memory=%.0fMB)",
-            self.config.ring_buffer_seconds,
-            self.config.ring_buffer_max_memory_mb,
-        )
-        self._setup_keyboard(ctx.runtime.shutdown_event)
-        logger.info(
-            "Highlight strategy ready (buffer=%.0fs, save='%s', push='%s')",
-            self.config.ring_buffer_seconds,
-            self.config.save_key,
-            self.config.push_key,
-        )
-
-    def run(self, ctx: RolloutContext) -> None:
-        """Run the autonomous loop, buffering frames and recording on demand."""
-        engine = self._engine
-        cfg = ctx.runtime.cfg
-        robot = ctx.hardware.robot_wrapper
-        dataset = ctx.data.dataset
-        ring = self._ring
-        interpolator = self._interpolator
-        features = ctx.data.dataset_features
-
-        control_interval = interpolator.get_control_interval(cfg.fps)
-
-        engine.resume()
-        play_sounds = cfg.play_sounds
-
-        start_time = time.perf_counter()
-        task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
-        logger.info("Highlight strategy recording started (press '%s' to save)", self.config.save_key)
-
-        with VideoEncodingManager(dataset):
-            try:
-                while not ctx.runtime.shutdown_event.is_set():
-                    loop_start = time.perf_counter()
-
-                    if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
-                        logger.info("Duration limit reached (%.0fs)", cfg.duration)
-                        break
-
-                    obs = robot.get_observation()
-                    obs_processed = self._process_observation_and_notify(ctx.processors, obs)
-
-                    if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
-                        continue
-
-                    action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
-
-                    if action_dict is not None:
-                        self._log_telemetry(obs_processed, action_dict, ctx.runtime)
-                        obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                        action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
-                        frame = {**obs_frame, **action_frame, "task": task_str}
-
-                        # NOTE: ``is_set()`` then ``clear()`` is not atomic
-                        # against the keyboard thread setting the flag again
-                        # in between — but that is benign: we lose at most one
-                        # toggle, processed on the next iteration.
-                        if self._save_requested.is_set():
-                            self._save_requested.clear()
-                            if not self._recording_live.is_set():
-                                logger.info(
-                                    "Flushing ring buffer (%d frames) + starting live recording",
-                                    len(ring),
-                                )
-                                for buffered_frame in ring.drain():
-                                    dataset.add_frame(buffered_frame)
-                                self._recording_live.set()
-                            else:
-                                dataset.add_frame(frame)
-                                with self._episode_lock:
-                                    dataset.save_episode()
-                                logger.info("Episode saved (total: %d)", dataset.num_episodes)
-                                log_say(
-                                    f"Episode {dataset.num_episodes} saved",
-                                    play_sounds,
-                                )
-                                self._recording_live.clear()
-                                continue  # frame already consumed — skip ring.append
-
-                        if self._push_requested.is_set():
-                            self._push_requested.clear()
-                            logger.info("Push requested by user")
-                            self._background_push(dataset, cfg)
-
-                        if self._recording_live.is_set():
-                            dataset.add_frame(frame)
-                        else:
-                            ring.append(frame)
-
-                    dt = time.perf_counter() - loop_start
-                    if (sleep_t := control_interval - dt) > 0:
-                        precise_sleep(sleep_t)
-                    else:
-                        logger.warning(
-                            f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
-                        )
-
-            finally:
-                logger.info("Highlight control loop ended")
-                if self._recording_live.is_set():
-                    logger.info("Saving in-progress live episode")
-                    with contextlib.suppress(Exception), self._episode_lock:
-                        dataset.save_episode()
-
-    def teardown(self, ctx: RolloutContext) -> None:
-        """Stop listeners, finalise the dataset, and disconnect hardware."""
-        play_sounds = ctx.runtime.cfg.play_sounds
-        logger.info("Stopping highlight recording")
-        log_say("Stopping highlight recording", play_sounds)
-
-        if self._listener is not None:
-            logger.info("Stopping keyboard listener")
-            self._listener.stop()
-
-        if self._push_executor is not None:
-            logger.info("Shutting down push executor (waiting for pending pushes)...")
-            self._push_executor.shutdown(wait=True)
-            self._push_executor = None
-
-        if ctx.data.dataset is not None:
-            logger.info("Finalizing dataset...")
-            ctx.data.dataset.finalize()
-            if ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
-                logger.info("Pushing final dataset to hub...")
-                if safe_push_to_hub(
-                    ctx.data.dataset,
-                    tags=ctx.runtime.cfg.dataset.tags,
-                    private=ctx.runtime.cfg.dataset.private,
-                ):
-                    logger.info("Dataset uploaded to hub")
-                    log_say("Dataset uploaded to hub", play_sounds)
-
-        self._teardown_hardware(
-            ctx.hardware,
-            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
-        )
-        logger.info("Highlight strategy teardown complete")
-
-    def _setup_keyboard(self, shutdown_event: ThreadingEvent) -> None:
-        """Set up keyboard listener for save and push keys."""
-        if is_headless():
-            logger.warning("Headless environment — highlight keys unavailable")
-            return
-
-        try:
-            save_key = self.config.save_key
-            push_key = self.config.push_key
-
-            def on_press(key):
-                with contextlib.suppress(Exception):
-                    if hasattr(key, "char") and key.char == save_key:
-                        self._save_requested.set()
-                    elif hasattr(key, "char") and key.char == push_key:
-                        self._push_requested.set()
-                    elif key == keyboard.Key.esc:
-                        self._save_requested.clear()
-                        shutdown_event.set()
-
-            self._listener = keyboard.Listener(on_press=on_press)
-            self._listener.start()
-            logger.info("Keyboard listener started (save='%s', push='%s', ESC=stop)", save_key, push_key)
-        except ImportError:
-            logger.warning("pynput not available — keyboard listener disabled")
-
-    def _background_push(self, dataset, cfg) -> None:
-        """Queue a Hub push on the single-worker executor."""
-        if self._push_executor is None:
-            return
-
-        if self._pending_push is not None and not self._pending_push.done():
-            logger.info("Previous push still in progress; queueing next")
-
-        def _push():
-            try:
-                with self._episode_lock:
-                    if safe_push_to_hub(
-                        dataset,
-                        tags=cfg.dataset.tags if cfg.dataset else None,
-                        private=cfg.dataset.private if cfg.dataset else False,
-                    ):
-                        logger.info("Background push to hub complete")
-            except Exception as e:
-                logger.error("Background push failed: %s", e)
-
-        self._pending_push = self._push_executor.submit(_push)
-        logger.info("Background push task submitted")
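
Review note: the removed highlight strategy combined a bounded ring buffer with a save-key toggle. The first press flushed the buffered frames and switched to live recording; the second press closed the episode. The sketch below replays that control flow on integer frame ids. It is a minimal illustration, not the removed implementation: `FrameRing` and `simulate` are hypothetical names, the real `RolloutRingBuffer` also enforced a memory cap, and key presses are modelled here as a set of tick indices.

```python
from collections import deque


class FrameRing:
    """Hypothetical stand-in for RolloutRingBuffer: keeps only the newest frames."""

    def __init__(self, max_seconds: float, fps: int):
        # deque(maxlen=...) evicts the oldest frame automatically once full.
        self._frames = deque(maxlen=max(1, int(max_seconds * fps)))

    def append(self, frame: dict) -> None:
        self._frames.append(frame)

    def drain(self):
        # Yield frames oldest-first and leave the buffer empty.
        while self._frames:
            yield self._frames.popleft()

    def __len__(self) -> int:
        return len(self._frames)


def simulate(num_ticks: int, toggle_ticks: set, ring: FrameRing) -> list:
    """Replay the save-key toggle logic; return saved episodes as lists of frame ids."""
    episodes, current, live = [], [], False
    for t in range(num_ticks):
        frame = {"t": t}
        if t in toggle_ticks:
            if not live:
                # First press: flush the buffered history, then record live.
                current.extend(f["t"] for f in ring.drain())
                live = True
            else:
                # Second press: the current frame closes the episode.
                current.append(t)
                episodes.append(current)
                current, live = [], False
                continue  # frame already consumed, skip ring.append
        if live:
            current.append(t)
        else:
            ring.append(frame)
    return episodes
```

With a 3-frame ring and presses at ticks 5 and 7, the saved episode holds the three buffered frames followed by the live frames 5 to 7, and the ring resumes capturing afterwards.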
@@ -1,231 +0,0 @@
-# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Sentry rollout strategy: continuous autonomous recording with auto-upload."""
-
-from __future__ import annotations
-
-import contextlib
-import logging
-import time
-from concurrent.futures import Future, ThreadPoolExecutor
-from threading import Event, Lock
-
-from lerobot.datasets import VideoEncodingManager
-from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
-from lerobot.utils.constants import ACTION, OBS_STR
-from lerobot.utils.feature_utils import build_dataset_frame
-from lerobot.utils.robot_utils import precise_sleep
-from lerobot.utils.utils import log_say
-
-from ..configs import SentryStrategyConfig
-from ..context import RolloutContext
-from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
-
-logger = logging.getLogger(__name__)
-
-
-class SentryStrategy(RolloutStrategy):
-    """Continuous autonomous rollout with always-on recording.
-
-    Episode duration is derived from camera resolution, FPS, and
-    ``DEFAULT_VIDEO_FILE_SIZE_IN_MB`` so that each saved episode
-    produces a video file that has crossed the chunk-size boundary.
-    This keeps ``push_to_hub`` efficient — it uploads complete video
-    files rather than re-uploading a still-growing one.
-
-    The dataset is pushed to the Hub via a bounded single-worker executor
-    so no push is ever silently dropped and exactly one push runs at a
-    time.
-
-    Policy state (hidden state, RTC queue) intentionally persists across
-    episode boundaries — Sentry slices one continuous rollout, the robot
-    does not reset between slices.
-
-    Requires ``streaming_encoding=True`` (enforced in config validation)
-    to prevent disk I/O from blocking the control loop.
-    """
-
-    config: SentryStrategyConfig
-
-    def __init__(self, config: SentryStrategyConfig):
-        super().__init__(config)
-        self._push_executor: ThreadPoolExecutor | None = None
-        self._pending_push: Future | None = None
-        self._needs_push = Event()
-        self._episode_lock = Lock()
-
-    def setup(self, ctx: RolloutContext) -> None:
-        """Initialise the inference engine and background push executor."""
-        self._init_engine(ctx)
-        self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="sentry-push")
-        target_mb = self.config.target_video_file_size_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB
-        self._episode_duration_s = estimate_max_episode_seconds(
-            ctx.data.dataset_features, ctx.runtime.cfg.fps, target_size_mb=target_mb
-        )
-        logger.info(
-            "Sentry strategy ready (episode_duration=%.0fs, upload_every=%d eps)",
-            self._episode_duration_s,
-            self.config.upload_every_n_episodes,
-        )
-
-    def run(self, ctx: RolloutContext) -> None:
-        """Run the continuous recording loop with automatic episode rotation."""
-        engine = self._engine
-        cfg = ctx.runtime.cfg
-        robot = ctx.hardware.robot_wrapper
-        dataset = ctx.data.dataset
-        interpolator = self._interpolator
-        features = ctx.data.dataset_features
-
-        control_interval = interpolator.get_control_interval(cfg.fps)
-
-        engine.resume()
-        play_sounds = cfg.play_sounds
-        episode_duration_s = self._episode_duration_s
-
-        start_time = time.perf_counter()
-        episode_start = time.perf_counter()
-        episodes_since_push = 0
-        task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
-        logger.info("Sentry recording started (episode_duration=%.0fs)", episode_duration_s)
-
-        with VideoEncodingManager(dataset):
-            try:
-                while not ctx.runtime.shutdown_event.is_set():
-                    loop_start = time.perf_counter()
-
-                    if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
-                        logger.info("Duration limit reached (%.0fs)", cfg.duration)
-                        break
-
-                    obs = robot.get_observation()
-                    obs_processed = self._process_observation_and_notify(ctx.processors, obs)
-
-                    if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
-                        continue
-
-                    action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
-
-                    if action_dict is not None:
-                        self._log_telemetry(obs_processed, action_dict, ctx.runtime)
-                        obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                        action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
-                        frame = {**obs_frame, **action_frame, "task": task_str}
-                        # ``add_frame`` writes to the in-progress episode buffer; the
-                        # background pusher only ever touches *finalised* episode
-                        # artifacts on disk.  The two operate on disjoint state, so
-                        # ``add_frame`` does not need ``_episode_lock``.
-                        dataset.add_frame(frame)
-
-                    # Episode rotation derived from video file-size target.
-                    # The duration is a conservative estimate so the actual
-                    # video has crossed DEFAULT_VIDEO_FILE_SIZE_IN_MB by now,
-                    # keeping push_to_hub efficient (uploads complete files).
-                    elapsed = time.perf_counter() - episode_start
-                    if elapsed >= episode_duration_s:
-                        # ``save_episode`` finalises the in-progress episode and
-                        # flushes it to disk; ``_episode_lock`` serialises this with
-                        # ``push_to_hub`` (run in the background executor) so the
-                        # pusher never reads a half-written episode.
-                        with self._episode_lock:
-                            dataset.save_episode()
-                        episodes_since_push += 1
-                        self._needs_push.set()
-                        logger.info(
-                            "Episode saved (total: %d, elapsed: %.1fs)",
-                            dataset.num_episodes,
-                            elapsed,
-                        )
-                        log_say(f"Episode {dataset.num_episodes} saved", play_sounds)
-
-                        if episodes_since_push >= self.config.upload_every_n_episodes:
-                            self._background_push(dataset, cfg)
-                            episodes_since_push = 0
-
-                        episode_start = time.perf_counter()
-
-                    dt = time.perf_counter() - loop_start
-                    if (sleep_t := control_interval - dt) > 0:
-                        precise_sleep(sleep_t)
-                    else:
-                        logger.warning(
-                            f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
-                        )
-
-            finally:
-                logger.info("Sentry control loop ended — saving final episode")
-                with contextlib.suppress(Exception):
-                    with self._episode_lock:
-                        dataset.save_episode()
-                    self._needs_push.set()
-
-    def teardown(self, ctx: RolloutContext) -> None:
-        """Flush pending pushes, finalise the dataset, and disconnect hardware."""
-        play_sounds = ctx.runtime.cfg.play_sounds
-        logger.info("Stopping sentry recording")
-        log_say("Stopping sentry recording", play_sounds)
-
-        # Flush any queued/running push cleanly.
-        if self._push_executor is not None:
-            logger.info("Shutting down push executor (waiting for pending pushes)...")
-            self._push_executor.shutdown(wait=True)
-            self._push_executor = None
-
-        if ctx.data.dataset is not None:
-            logger.info("Finalizing dataset...")
-            ctx.data.dataset.finalize()
-            if self._needs_push.is_set() and ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
-                logger.info("Pushing final dataset to hub...")
-                if safe_push_to_hub(
-                    ctx.data.dataset,
-                    tags=ctx.runtime.cfg.dataset.tags,
-                    private=ctx.runtime.cfg.dataset.private,
-                ):
-                    logger.info("Dataset uploaded to hub")
-                    log_say("Dataset uploaded to hub", play_sounds)
-
-        self._teardown_hardware(
-            ctx.hardware,
-            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
-        )
-        logger.info("Sentry strategy teardown complete")
-
-    def _background_push(self, dataset, cfg) -> None:
-        """Queue a Hub push on the single-worker executor.
-
-        The executor's max_workers=1 guarantees at most one push runs at
-        a time; submitted tasks are queued rather than dropped.
-        """
-        if self._push_executor is None:
-            return
-
-        if self._pending_push is not None and not self._pending_push.done():
-            logger.info("Previous push still in progress; queueing next")
-
-        def _push():
-            try:
-                with self._episode_lock:
-                    if safe_push_to_hub(
-                        dataset,
-                        tags=cfg.dataset.tags if cfg.dataset else None,
-                        private=cfg.dataset.private if cfg.dataset else False,
-                    ):
-                        self._needs_push.clear()
-                        logger.info("Background push to hub complete")
-            except Exception as e:
-                logger.error("Background push failed: %s", e)
-
-        self._pending_push = self._push_executor.submit(_push)
-        logger.info("Background push task submitted")
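
Review note: the removed sentry strategy serialised `save_episode` against Hub pushes with a lock and queued pushes on a single-worker executor, so pushes ran one at a time and none were dropped. The sketch below shows that pattern in isolation. `PushQueue` and `push_fn` are hypothetical names introduced for illustration; `push_fn` stands in for `safe_push_to_hub` and is assumed to return a truthy value on success.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class PushQueue:
    """Hypothetical sketch: one push at a time, never concurrent with a save."""

    def __init__(self, push_fn):
        self._push_fn = push_fn  # assumed upload callable, returns True on success
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="push")
        self._lock = threading.Lock()
        self.needs_push = threading.Event()
        self.saved = []

    def save_episode(self, episode) -> None:
        # The lock keeps a running push from observing a half-written episode.
        with self._lock:
            self.saved.append(episode)
        self.needs_push.set()

    def request_push(self) -> Future:
        def _push():
            try:
                with self._lock:
                    if self._push_fn(list(self.saved)):
                        self.needs_push.clear()
            except Exception as exc:  # keep the worker thread alive on failure
                print(f"push failed: {exc}")

        # max_workers=1 means later submissions wait in the queue.
        return self._executor.submit(_push)

    def close(self) -> None:
        # Wait for queued and running pushes before returning.
        self._executor.shutdown(wait=True)
```

Calling `close()` mirrors the `shutdown(wait=True)` call in `teardown`: any queued push finishes before the dataset is finalised.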
@@ -70,7 +70,6 @@ from lerobot.datasets.io_utils import (
    get_parquet_file_size_in_mb,
    get_parquet_num_frames,
    load_info,
-    load_json,
    write_episodes,
    write_info,
    write_stats,
@@ -82,11 +81,9 @@ from lerobot.datasets.utils import (
    DEFAULT_DATA_PATH,
    DEFAULT_VIDEO_FILE_SIZE_IN_MB,
    DEFAULT_VIDEO_PATH,
-    INFO_PATH,
    LEGACY_EPISODES_PATH,
    LEGACY_EPISODES_STATS_PATH,
    LEGACY_TASKS_PATH,
-    DatasetInfo,
    update_chunk_file_indices,
 )
 from lerobot.datasets.video_utils import concatenate_video_files, get_video_duration_in_s
@@ -168,7 +165,7 @@ def legacy_load_tasks(local_dir: Path) -> tuple[dict, dict]:
 def validate_local_dataset_version(local_path: Path) -> None:
    """Validate that the local dataset has the expected v2.1 version."""
    info = load_info(local_path)
-    dataset_version = info.codebase_version or "unknown"
+    dataset_version = info.get("codebase_version", "unknown")
    if dataset_version != V21:
        raise ValueError(
            f"Local dataset has codebase version '{dataset_version}', expected '{V21}'. "
@@ -259,14 +256,14 @@ def convert_data(root: Path, new_root: Path, data_file_size_in_mb: int):

 def get_video_keys(root):
    info = load_info(root)
-    features = info.features
+    features = info["features"]
    video_keys = [key for key, ft in features.items() if ft["dtype"] == "video"]
    return video_keys


 def get_image_keys(root):
    info = load_info(root)
-    features = info.features
+    features = info["features"]
    image_keys = [key for key, ft in features.items() if ft["dtype"] == "image"]
    return image_keys

@@ -437,8 +434,7 @@ def convert_episodes_metadata(root, new_root, episodes_metadata, episodes_video_


 def convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb):
-    # Load as raw dict to remove legacy v2.1 fields before constructing DatasetInfo.
-    info = load_json(root / INFO_PATH)
+    info = load_info(root)
    info["codebase_version"] = V30
    del info["total_chunks"]
    del info["total_videos"]
@@ -453,9 +449,7 @@ def convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb):
            # already has fps in video_info
            continue
        info["features"][key]["fps"] = info["fps"]
-    # Convert raw dict to typed DatasetInfo before writing
-    dataset_info = DatasetInfo.from_dict(info)
-    write_info(dataset_info, new_root)
+    write_info(info, new_root)


 def convert_dataset(
@@ -13,62 +13,70 @@
 # limitations under the License.

 """
-Records a dataset via teleoperation.  This is a pure data-collection
-tool — no policy inference.  For deploying trained policies, use
-``lerobot-rollout`` instead.
+Records a dataset. Robot actions can be generated either by teleoperation or by a policy.

 Requires: pip install 'lerobot[core_scripts]'  (includes dataset + hardware + viz extras)

 Example:

 ```shell
-lerobot-record \\
-    --robot.type=so100_follower \\
-    --robot.port=/dev/tty.usbmodem58760431541 \\
-    --robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \\
-    --robot.id=black \\
-    --teleop.type=so100_leader \\
-    --teleop.port=/dev/tty.usbmodem58760431551 \\
-    --teleop.id=blue \\
-    --dataset.repo_id=<my_username>/<my_dataset_name> \\
-    --dataset.num_episodes=2 \\
-    --dataset.single_task="Grab the cube" \\
-    --dataset.streaming_encoding=true \\
-    --dataset.encoder_threads=2 \\
+lerobot-record \
+    --robot.type=so100_follower \
+    --robot.port=/dev/tty.usbmodem58760431541 \
+    --robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
+    --robot.id=black \
+    --dataset.repo_id=<my_username>/<my_dataset_name> \
+    --dataset.num_episodes=2 \
+    --dataset.single_task="Grab the cube" \
+    --dataset.streaming_encoding=true \
+    --dataset.encoder_threads=2 \
    --display_data=true
+    # <- Optional: specify video codec (auto, h264, hevc, libsvtav1). Default is libsvtav1. \
+    # --dataset.vcodec=h264 \
+    # <- Teleop is optional: use it to record by teleoperation, or to teleoperate between episodes when recording with a policy \
+    # --teleop.type=so100_leader \
+    # --teleop.port=/dev/tty.usbmodem58760431551 \
+    # --teleop.id=blue \
+    # <- Policy is optional: set it to record with a policy \
+    # --policy.path=${HF_USER}/my_policy \
 ```

 Example recording with bimanual so100:
 ```shell
-lerobot-record \\
-  --robot.type=bi_so_follower \\
-  --robot.left_arm_config.port=/dev/tty.usbmodem5A460822851 \\
-  --robot.right_arm_config.port=/dev/tty.usbmodem5A460814411 \\
-  --robot.id=bimanual_follower \\
+lerobot-record \
+  --robot.type=bi_so_follower \
+  --robot.left_arm_config.port=/dev/tty.usbmodem5A460822851 \
+  --robot.right_arm_config.port=/dev/tty.usbmodem5A460814411 \
+  --robot.id=bimanual_follower \
  --robot.left_arm_config.cameras='{
    wrist: {"type": "opencv", "index_or_path": 1, "width": 640, "height": 480, "fps": 30},
    top: {"type": "opencv", "index_or_path": 3, "width": 640, "height": 480, "fps": 30},
  }' --robot.right_arm_config.cameras='{
    wrist: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
    front: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
-  }' \\
-  --teleop.type=bi_so_leader \\
-  --teleop.left_arm_config.port=/dev/tty.usbmodem5A460852721 \\
-  --teleop.right_arm_config.port=/dev/tty.usbmodem5A460819811 \\
-  --teleop.id=bimanual_leader \\
-  --display_data=true \\
-  --dataset.repo_id=${HF_USER}/bimanual-so-handover-cube \\
-  --dataset.num_episodes=25 \\
-  --dataset.single_task="Grab and handover the red cube to the other arm" \\
-  --dataset.streaming_encoding=true \\
+  }' \
+  --teleop.type=bi_so_leader \
+  --teleop.left_arm_config.port=/dev/tty.usbmodem5A460852721 \
+  --teleop.right_arm_config.port=/dev/tty.usbmodem5A460819811 \
+  --teleop.id=bimanual_leader \
+  --display_data=true \
+  --dataset.repo_id=${HF_USER}/bimanual-so-handover-cube \
+  --dataset.num_episodes=25 \
+  --dataset.single_task="Grab and handover the red cube to the other arm" \
+  --dataset.streaming_encoding=true \
+  # --dataset.vcodec=auto \
  --dataset.encoder_threads=2
 ```
 """

 import logging
 import time
-from dataclasses import asdict, dataclass
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
 from pprint import pformat
+from typing import Any
+
+import torch

 from lerobot.cameras import CameraConfig  # noqa: F401
 from lerobot.cameras.opencv import OpenCVCameraConfig  # noqa: F401
@@ -78,10 +86,11 @@ from lerobot.cameras.zmq import ZMQCameraConfig  # noqa: F401
 from lerobot.common.control_utils import (
    init_keyboard_listener,
    is_headless,
+    predict_action,
+    sanity_check_dataset_name,
    sanity_check_dataset_robot_compatibility,
 )
-from lerobot.configs import parser
-from lerobot.configs.dataset import DatasetRecordConfig
+from lerobot.configs import PreTrainedConfig, parser
 from lerobot.datasets import (
    LeRobotDataset,
    VideoEncodingManager,
@@ -89,11 +98,21 @@ from lerobot.datasets import (
    create_initial_features,
    safe_stop_image_writer,
 )
+from lerobot.policies import (
+    ActionInterpolator,
+    PreTrainedPolicy,
+    make_policy,
+    make_pre_post_processors,
+    make_robot_action,
+)
 from lerobot.processor import (
+    PolicyAction,
+    PolicyProcessorPipeline,
    RobotAction,
    RobotObservation,
    RobotProcessorPipeline,
    make_default_processors,
+    rename_stats,
 )
 from lerobot.robots import (  # noqa: F401
    Robot,
@@ -127,6 +146,7 @@ from lerobot.teleoperators import (  # noqa: F401
 )
 from lerobot.teleoperators.keyboard import KeyboardTeleop
 from lerobot.utils.constants import ACTION, OBS_STR
+from lerobot.utils.device_utils import get_safe_torch_device
 from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
 from lerobot.utils.import_utils import register_third_party_plugins
 from lerobot.utils.robot_utils import precise_sleep
@@ -137,12 +157,71 @@ from lerobot.utils.utils import (
 from lerobot.utils.visualization_utils import init_rerun, log_rerun_data


+@dataclass
+class DatasetRecordConfig:
+    # Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
+    repo_id: str
+    # A short but accurate description of the task performed during the recording (e.g. "Pick the Lego block and drop it in the box on the right.")
+    single_task: str
+    # Root directory where the dataset will be stored (e.g. 'dataset/path'). If None, defaults to $HF_LEROBOT_HOME/repo_id.
+    root: str | Path | None = None
+    # Limit the frames per second.
+    fps: int = 30
+    # Number of seconds for data recording for each episode.
+    episode_time_s: int | float = 60
+    # Number of seconds for resetting the environment after each episode.
+    reset_time_s: int | float = 60
+    # Number of episodes to record.
+    num_episodes: int = 50
+    # Encode frames in the dataset into video
+    video: bool = True
+    # Upload dataset to Hugging Face hub.
+    push_to_hub: bool = True
+    # Upload on private repository on the Hugging Face hub.
+    private: bool = False
+    # Add tags to your dataset on the hub.
+    tags: list[str] | None = None
+    # Number of subprocesses handling the saving of frames as PNG. Set to 0 to use threads only;
+    # set to ≥1 to use subprocesses, each using threads to write images. The best number of processes
+    # and threads depends on your system. We recommend 4 threads per camera with 0 processes.
+    # If fps is unstable, adjust the thread count. If still unstable, try using 1 or more subprocesses.
+    num_image_writer_processes: int = 0
+    # Number of threads writing the frames as png images on disk, per camera.
+    # Too many threads might cause unstable teleoperation fps due to main thread being blocked.
+    # Not enough threads might cause low camera fps.
+    num_image_writer_threads_per_camera: int = 4
+    # Number of episodes to record before batch encoding videos
+    # Set to 1 for immediate encoding (default behavior), or higher for batched encoding
+    video_encoding_batch_size: int = 1
+    # Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto',
+    # or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'.
+    # Use 'auto' to auto-detect the best available hardware encoder.
+    vcodec: str = "libsvtav1"
+    # Enable streaming video encoding: encode frames in real-time during capture instead
+    # of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
+    streaming_encoding: bool = False
+    # Maximum number of frames to buffer per camera when using streaming encoding.
+    # ~1s buffer at 30fps. Provides backpressure if the encoder can't keep up.
+    encoder_queue_maxsize: int = 30
+    # Number of threads per encoder instance. None = auto (codec default).
+    # Lower values reduce CPU usage. Maps to 'lp' (via svtav1-params) for libsvtav1 and to 'threads' for h264/hevc.
+    encoder_threads: int | None = None
+    # Rename map for the observation to override the image and state keys
+    rename_map: dict[str, str] = field(default_factory=dict)
+
+    def __post_init__(self):
+        if self.single_task is None:
+            raise ValueError("You need to provide a task as argument in `single_task`.")
+
+
 @dataclass
 class RecordConfig:
    robot: RobotConfig
    dataset: DatasetRecordConfig
-    # Teleoperator to control the robot (required)
+    # Teleoperator used to control the robot (optional)
    teleop: TeleoperatorConfig | None = None
+    # Policy used to control the robot (optional)
+    policy: PreTrainedConfig | None = None
    # Display all cameras on screen
    display_data: bool = False
    # Display data on a remote Rerun server
@@ -155,14 +234,27 @@ class RecordConfig:
    play_sounds: bool = True
    # Resume recording on an existing dataset.
    resume: bool = False
+    # Action interpolation multiplier for smoother policy control (1=off, 2=2x, 3=3x)
+    # Only applies when using a policy (not teleop)
+    interpolation_multiplier: int = 1

    def __post_init__(self):
-        if self.teleop is None:
-            raise ValueError(
-                "A teleoperator is required for recording. "
-                "Use --teleop.type=... to specify one. "
-                "For policy-based deployment, use lerobot-rollout instead."
-            )
+        # HACK: We parse the CLI args again here to get the pretrained path, if one was given.
+        policy_path = parser.get_path_arg("policy")
+
+        if policy_path:
+            cli_overrides = parser.get_cli_overrides("policy")
+
+            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
+            self.policy.pretrained_path = policy_path
+
+        if self.teleop is None and self.policy is None:
+            raise ValueError("Choose a policy, a teleoperator or both to control the robot")
+
+    @classmethod
+    def __get_path_fields__(cls) -> list[str]:
+        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
+        return ["policy"]


 """ --------------- record_loop() data flow --------------------------
@@ -172,14 +264,18 @@ class RecordConfig:
           V
     [ robot_observation_processor ] ---> processed_obs
           V
-     [ Teleoperator ]
-     |
-     |  [teleop.get_action] -> raw_action
-     |          |
-     |          V
-     | [teleop_action_processor]
-     |          |
-     '---> processed_teleop_action
+     .-----( ACTION LOGIC )------------------.
+     V                                       V
+     [ From Teleoperator ]                   [ From Policy ]
+     |                                       |
+     |  [teleop.get_action] -> raw_action    |   [predict_action]
+     |          |                            |          |
+     |          V                            |          V
+     | [teleop_action_processor]             |          |
+     |          |                            |          |
+     '---> processed_teleop_action           '---> processed_policy_action
+     |                                       |
+     '-------------------------.-------------'
                               V
                  [ robot_action_processor ] --> robot_action_to_send
                               V
@@ -207,9 +303,13 @@ def record_loop(
    ],  # runs after robot
    dataset: LeRobotDataset | None = None,
    teleop: Teleoperator | list[Teleoperator] | None = None,
+    policy: PreTrainedPolicy | None = None,
+    preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]] | None = None,
+    postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction] | None = None,
    control_time_s: int | None = None,
    single_task: str | None = None,
    display_data: bool = False,
+    interpolator: ActionInterpolator | None = None,
    display_compressed_images: bool = False,
 ):
    if dataset is not None and dataset.fps != fps:
@@ -240,7 +340,21 @@ def record_loop(
                "For multi-teleop, the list must contain exactly one KeyboardTeleop and one arm teleoperator. Currently only supported for LeKiwi robot."
            )

-    control_interval = 1 / fps
+    # Reset policy and processor if they are provided
+    if policy is not None and preprocessor is not None and postprocessor is not None:
+        policy.reset()
+        preprocessor.reset()
+        postprocessor.reset()
+
+    # Reset interpolator if provided
+    if interpolator is not None:
+        interpolator.reset()
+
+    # Calculate control interval based on interpolation
+    use_interpolation = interpolator is not None and interpolator.enabled and policy is not None
+    control_interval = interpolator.get_control_interval(fps) if interpolator else 1 / fps
+    # Pre-compute action key order outside the hot loop — it won't change mid-episode.
+    action_keys = sorted(robot.action_features) if use_interpolation else []

    no_action_count = 0
    timestamp = 0
@@ -258,11 +372,63 @@ def record_loop(
        # Applies a pipeline to the raw robot observation, default is IdentityProcessor
        obs_processed = robot_observation_processor(obs)

-        if dataset is not None:
+        if policy is not None or dataset is not None:
            observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)

-        # Get action from teleop
-        if isinstance(teleop, Teleoperator):
+        # Track whether this iteration should be recorded to the dataset.
+        # Interpolated-only iterations send actions to the robot but don't record frames,
+        # keeping the dataset at the original fps while the robot moves at the higher rate.
+        is_record_frame = True
+
+        # Get action from either policy or teleop
+        if policy is not None and preprocessor is not None and postprocessor is not None:
+            # With interpolation: only call policy when interpolator needs new action
+            if use_interpolation:
+                ran_inference = False
+
+                if interpolator.needs_new_action():
+                    action_values = predict_action(
+                        observation=observation_frame,
+                        policy=policy,
+                        device=get_safe_torch_device(policy.config.device),
+                        preprocessor=preprocessor,
+                        postprocessor=postprocessor,
+                        use_amp=policy.config.use_amp,
+                        task=single_task,
+                        robot_type=robot.robot_type,
+                    )
+                    act_processed_policy = make_robot_action(action_values, dataset.features)
+                    robot_action_to_send = robot_action_processor((act_processed_policy, obs))
+
+                    action_tensor = torch.tensor([robot_action_to_send[k] for k in action_keys])
+                    interpolator.add(action_tensor)
+                    ran_inference = True
+
+                interp_action = interpolator.get()
+                if interp_action is not None:
+                    robot_action_to_send = {k: interp_action[i].item() for i, k in enumerate(action_keys)}
+                    action_values = robot_action_to_send
+                else:
+                    continue
+
+                is_record_frame = ran_inference
+            else:
+                action_values = predict_action(
+                    observation=observation_frame,
+                    policy=policy,
+                    device=get_safe_torch_device(policy.config.device),
+                    preprocessor=preprocessor,
+                    postprocessor=postprocessor,
+                    use_amp=policy.config.use_amp,
+                    task=single_task,
+                    robot_type=robot.robot_type,
+                )
+                act_processed_policy: RobotAction = make_robot_action(action_values, dataset.features)
+                # Applies a pipeline to the action, default is IdentityProcessor
+                robot_action_to_send = robot_action_processor((act_processed_policy, obs))
+                action_values = robot_action_to_send
+
+        elif policy is None and isinstance(teleop, Teleoperator):
            act = teleop.get_action()
            if robot.name == "unitree_g1":
                teleop.send_feedback(obs)
@@ -272,7 +438,7 @@ def record_loop(
            action_values = act_processed_teleop
            robot_action_to_send = robot_action_processor((act_processed_teleop, obs))

-        elif isinstance(teleop, list):
+        elif policy is None and isinstance(teleop, list):
            arm_action = teleop_arm.get_action()
            arm_action = {f"arm_{k}": v for k, v in arm_action.items()}
            keyboard_action = teleop_keyboard.get_action()
@@ -285,7 +451,7 @@ def record_loop(
            no_action_count += 1
            if no_action_count == 1 or no_action_count % 10 == 0:
                logging.warning(
-                    "No teleoperator provided, skipping action generation. "
+                    "No policy or teleoperator provided, skipping action generation. "
                    "This is likely to happen when resetting the environment without a teleop device. "
                    "The robot won't be at its rest position at the start of the next episode."
                )
@@ -297,8 +463,8 @@ def record_loop(
        # TODO(steven, pepijn, adil): we should use a pipeline step to clip the action, so the sent action is the action that we input to the robot.
        _sent_action = robot.send_action(robot_action_to_send)

-        # Write to dataset
-        if dataset is not None:
+        # Write to dataset (only on iterations flagged by is_record_frame, not interpolated-only iterations)
+        if dataset is not None and is_record_frame:
            action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
            frame = {**observation_frame, **action_frame, "task": single_task}
            dataset.add_frame(frame)
@@ -322,12 +488,7 @@ def record_loop(


@parser.wrap()
-def record(
-    cfg: RecordConfig,
-    teleop_action_processor: RobotProcessorPipeline | None = None,
-    robot_action_processor: RobotProcessorPipeline | None = None,
-    robot_observation_processor: RobotProcessorPipeline | None = None,
-) -> LeRobotDataset:
+def record(cfg: RecordConfig) -> LeRobotDataset:
    init_logging()
    logging.info(pformat(asdict(cfg)))
    if cfg.display_data:
@@ -341,16 +502,7 @@ def record(
    robot = make_robot_from_config(cfg.robot)
    teleop = make_teleoperator_from_config(cfg.teleop) if cfg.teleop is not None else None

-    # Fall back to identity pipelines when the caller doesn't supply processors.
-    if (
-        teleop_action_processor is None
-        or robot_action_processor is None
-        or robot_observation_processor is None
-    ):
-        _t, _r, _o = make_default_processors()
-        teleop_action_processor = teleop_action_processor or _t
-        robot_action_processor = robot_action_processor or _r
-        robot_observation_processor = robot_observation_processor or _o
+    teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()

    dataset_features = combine_feature_dicts(
        aggregate_pipeline_dataset_features(
@@ -388,14 +540,8 @@ def record(
            )
            sanity_check_dataset_robot_compatibility(dataset, robot, cfg.dataset.fps, dataset_features)
        else:
-            # Reject eval_ prefix — for policy evaluation use lerobot-rollout
-            repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
-            if repo_name.startswith("eval_"):
-                raise ValueError(
-                    "Dataset names starting with 'eval_' are reserved for policy evaluation. "
-                    "lerobot-record is for data collection only. Use lerobot-rollout for policy deployment."
-                )
-            cfg.dataset.stamp_repo_id()
+            # Create empty dataset or load existing saved episodes
+            sanity_check_dataset_name(cfg.dataset.repo_id, cfg.policy)
            dataset = LeRobotDataset.create(
                cfg.dataset.repo_id,
                cfg.dataset.fps,
@@ -412,6 +558,30 @@ def record(
                encoder_threads=cfg.dataset.encoder_threads,
            )

+        # Load pretrained policy
+        policy = (
+            None
+            if cfg.policy is None
+            else make_policy(cfg.policy, ds_meta=dataset.meta, rename_map=cfg.dataset.rename_map)
+        )
+        preprocessor = None
+        postprocessor = None
+        interpolator = None
+        if cfg.policy is not None:
+            preprocessor, postprocessor = make_pre_post_processors(
+                policy_cfg=cfg.policy,
+                pretrained_path=cfg.policy.pretrained_path,
+                dataset_stats=rename_stats(dataset.meta.stats, cfg.dataset.rename_map),
+                preprocessor_overrides={
+                    "device_processor": {"device": cfg.policy.device},
+                    "rename_observations_processor": {"rename_map": cfg.dataset.rename_map},
+                },
+            )
+            # Create interpolator for smoother policy control
+            if cfg.interpolation_multiplier > 1:
+                interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
+                logging.info(f"Action interpolation enabled: {cfg.interpolation_multiplier}x control rate")
+
        robot.connect()
        if teleop is not None:
            teleop.connect()
@@ -435,10 +605,14 @@ def record(
                    robot_action_processor=robot_action_processor,
                    robot_observation_processor=robot_observation_processor,
                    teleop=teleop,
+                    policy=policy,
+                    preprocessor=preprocessor,
+                    postprocessor=postprocessor,
                    dataset=dataset,
                    control_time_s=cfg.dataset.episode_time_s,
                    single_task=cfg.dataset.single_task,
                    display_data=cfg.display_data,
+                    interpolator=interpolator,
                    display_compressed_images=display_compressed_images,
                )

@@ -486,10 +660,7 @@ def record(
            listener.stop()

        if cfg.dataset.push_to_hub:
-            if dataset and dataset.num_episodes > 0:
-                dataset.push_to_hub(tags=cfg.dataset.tags, private=cfg.dataset.private)
-            else:
-                logging.warning("No episodes saved — skipping push to hub")
+            dataset.push_to_hub(tags=cfg.dataset.tags, private=cfg.dataset.private)

        log_say("Exiting", cfg.play_sounds)
    return dataset
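
The interpolation path in `record_loop()` above relies on an interpolator exposing `reset()`, `needs_new_action()`, `add()`, `get()`, `enabled` and `get_control_interval(fps)`. The diff does not show how `ActionInterpolator` is implemented, so the following is only a minimal sketch of one plausible behaviour (linear blending between consecutive policy actions). The class name `LinearActionInterpolator`, the use of plain lists instead of tensors, and the blending rule are all assumptions made for illustration, not lerobot's actual code.

```python
from __future__ import annotations


class LinearActionInterpolator:
    """Hypothetical stand-in for illustration; NOT lerobot's ActionInterpolator."""

    def __init__(self, multiplier: int = 1) -> None:
        if multiplier < 1:
            raise ValueError("multiplier must be >= 1")
        self.multiplier = multiplier
        self._prev: list[float] | None = None  # last action received from the policy
        self._queue: list[list[float]] = []  # pending actions to send to the robot

    @property
    def enabled(self) -> bool:
        # A multiplier of 1 means "no interpolation".
        return self.multiplier > 1

    def reset(self) -> None:
        self._prev = None
        self._queue = []

    def needs_new_action(self) -> bool:
        # Ask the policy for a new action only once the queue is drained.
        return not self._queue

    def get_control_interval(self, fps: float) -> float:
        # The robot is driven `multiplier` times faster than the dataset fps.
        return 1.0 / (fps * self.multiplier)

    def add(self, action: list[float]) -> None:
        if self._prev is None:
            # First action of the episode: nothing to blend from yet.
            self._queue = [list(action)]
        else:
            m = self.multiplier
            # Evenly spaced points from the previous action (exclusive)
            # to the new action (inclusive).
            self._queue = [
                [p + (a - p) * step / m for p, a in zip(self._prev, action)]
                for step in range(1, m + 1)
            ]
        self._prev = list(action)

    def get(self) -> list[float] | None:
        # Returns None when no action is pending, mirroring the
        # `if interp_action is not None` check in the loop above.
        return self._queue.pop(0) if self._queue else None
```

Under these assumptions, `interpolation_multiplier=2` at 30 fps gives a control interval of 1/60 s. After the first action, each policy call yields two robot commands, while only the iteration that ran inference is written to the dataset.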
Author	SHA1	Message	Date
Khalil Meftah	ef8bfffbd7	fix(rl): enhance intervention handling in actor and learner	2026-04-26 23:09:33 +02:00
Khalil Meftah	f887ab3f6a	fix(rl): improve action processing for discrete and continuous actions	2026-04-26 22:47:52 +02:00
Khalil Meftah	c2556439e5	fix(rl): postprocess action in actor	2026-04-26 18:15:04 +02:00
Khalil Meftah	d2a046dfc5	fix(rl): mirror gym_manipulator in actor	2026-04-26 18:11:26 +02:00
Khalil Meftah	613d581f6c	remove debug	2026-04-26 18:08:13 +02:00
Khalil Meftah	58b6d844c4	debug	2026-04-26 17:33:15 +02:00
Khalil Meftah	30e1886b64	fix(rl): merge environment and action-processor info in transition processing	2026-04-26 17:12:37 +02:00
Khalil Meftah	9c9064e5be	fix(rl): update neutral gripper action	2026-04-26 16:42:53 +02:00
Khalil Meftah	494f469a2b	fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100	2026-04-26 16:41:55 +02:00
Khalil Meftah	cd105f65cb	fix(rl): add time limit processor to environment pipeline	2026-04-26 16:38:20 +02:00
Khalil Meftah	9c2af818ff	fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline	2026-04-26 16:36:21 +02:00
Khalil Meftah	6495bb9706	add processor to main	2026-04-24 17:06:57 +02:00
				`@@ -0,0 +1 @@`
				`../../../../docs/source/policy_sarm_README.md`