Compare commits

..

48 Commits

Author SHA1 Message Date
Khalil Meftah 7b82a5c381 feat(rl): add bus-control primitives and smooth move functionality for leader intervention 2026-04-27 18:31:13 +02:00
Khalil Meftah 13418dcd7b feat(rl): port haptic follow + torque toggle from #2596 to leader intervention 2026-04-27 17:50:29 +02:00
Khalil Meftah a3cb9f5317 feat(rl): leader arm as HIL-SERL intervention device (position-only) 2026-04-27 17:26:29 +02:00
Khalil Meftah e298474bf3 fix(tests): gate RL tests on the datasets extra 2026-04-27 16:53:34 +02:00
Khalil Meftah 577f14337a refactor(tests): remove grpc import checks from test files for cleaner code 2026-04-27 16:20:13 +02:00
Khalil Meftah 47be90f040 refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility 2026-04-27 15:59:59 +02:00
Khalil Meftah 47dd65347e refactor(rl): add type property to RLAlgorithmConfig for better clarity 2026-04-27 15:57:24 +02:00
Khalil Meftah fd5a788120 refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation 2026-04-27 15:55:16 +02:00
Khalil Meftah 9ce9e01469 refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable 2026-04-27 13:39:03 +02:00
Khalil Meftah 21c16a27f0 Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
2026-04-27 11:52:19 +02:00
Khalil Meftah b3164543f4 fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
2026-04-27 11:35:21 +02:00
Khalil Meftah f3993cbbb1 fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
2026-04-27 11:35:20 +02:00
Khalil Meftah c278cfa026 fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
2026-04-27 11:35:20 +02:00
Khalil Meftah 77d18659b1 fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
2026-04-27 11:35:19 +02:00
Khalil Meftah 6347edefb1 fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
2026-04-27 11:35:18 +02:00
Khalil Meftah eda47eca18 fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
2026-04-27 11:35:18 +02:00
Khalil Meftah a64e6f5070 fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
2026-04-27 11:35:17 +02:00
Khalil Meftah 3def86c2c3 fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
2026-04-27 11:35:17 +02:00
Khalil Meftah 356a64d8c4 fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
2026-04-27 11:35:16 +02:00
Khalil Meftah 38b88c414c perf(observation_processor): add CUDA support for image processing 2026-04-24 13:36:26 +02:00
Khalil Meftah 1ed32210c7 refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic 2026-04-24 13:18:33 +02:00
Khalil Meftah 06255996ea refactor(policies): rename policies/sac → policies/gaussian_actor 2026-04-23 19:13:18 +02:00
Khalil Meftah 8065bf15c7 fix test for flat dict structure 2026-04-21 12:06:25 +02:00
Khalil Meftah 8191d2d87f remove unused type alias 2026-04-21 11:56:27 +02:00
Khalil Meftah 6b93f31238 fix docstring 2026-04-21 11:55:17 +02:00
Khalil Meftah a4c0c9e358 update losses names in tests 2026-04-21 11:53:32 +02:00
Khalil Meftah a84b0e8132 refactor(sac): decouple algorithm hyperparameters from policy config 2026-04-18 16:40:56 +02:00
Khalil Meftah 2487a6ee6d perf(rl): use async iterators in OnlineOfflineMixer.get_iterator 2026-04-18 16:02:28 +02:00
Khalil Meftah 72fb0faf62 refactor(sac): simplify optimizer return structure 2026-04-18 15:45:22 +02:00
Khalil Meftah 2c97cb23c8 refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity 2026-04-18 15:39:32 +02:00
Khalil Meftah 87d4c9879c fix(sac): clarify torch.compile status 2026-04-18 15:19:35 +02:00
Khalil Meftah e4c1a8472d fix(config): update vision encoder model name to lerobot/resnet10 2026-04-18 15:15:59 +02:00
Khalil Meftah d7e25c8326 refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages 2026-04-16 15:46:34 +02:00
Khalil Meftah a5ad273b62 fix(tests): skip tests that require grpc if not available 2026-04-15 16:30:20 +02:00
Khalil Meftah 23bece96a4 fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests 2026-04-15 16:12:08 +02:00
Khalil Meftah 7a1c9e74c3 fix: skip tests that require grpc if not available 2026-04-15 15:18:04 +02:00
Khalil Meftah c88cf979f1 fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error 2026-04-15 11:49:38 +02:00
Khalil Meftah 79a9ebdaa6 fix: add try/finally to control_loop to ensure image writer cleanup on exit 2026-04-14 17:54:35 +02:00
Khalil Meftah da6e36fd03 Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor 2026-04-14 17:14:56 +02:00
Khalil Meftah 64dc08cb7b fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer 2026-04-14 16:35:08 +02:00
Khalil Meftah e6d282108d Fix: add kwargs in reward classifier __init__() 2026-04-14 11:13:43 +02:00
Khalil Meftah a8838c081b perf: remove redundant CPU→GPU→CPU transition move in learner 2026-04-13 19:06:28 +02:00
Khalil Meftah ee0814ef60 refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference 2026-04-13 18:31:17 +02:00
Khalil Meftah 7b0bdf2a98 fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample() 2026-04-13 18:27:24 +02:00
Khalil Meftah 9422dc98c2 fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
2026-04-13 13:30:50 +02:00
Khalil Meftah 11a0b0174f fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345

Co-authored-by: jpizarrom <jpizarrom@gmail.com>
2026-04-13 12:31:00 +02:00
Khalil Meftah 036b310a97 chore: clarify torch.compile disabled note in SACAlgorithm 2026-04-13 11:49:27 +02:00
Khalil Meftah e022207c75 refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring 2026-04-13 11:39:48 +02:00
156 changed files with 7526 additions and 8512 deletions
+22 -4
@@ -2,6 +2,11 @@
Short, imperative summary (e.g., "fix(robots): handle None in sensor parser"). See [CONTRIBUTING.md](../CONTRIBUTING.md) for PR conventions.
## Type / Scope
- **Type**: (Bug | Feature | Docs | Performance | Test | CI | Chore)
- **Scope**: (optional — name of module or package affected)
## Summary / Motivation
- One-paragraph description of what changes and why.
@@ -14,14 +19,28 @@ Short, imperative summary (e.g., "fix(robots): handle None in sensor parser"). S
## What changed
- Short, concrete bullets explaining the functional changes (how the behavior or output differs now).
- Short, concrete bullets of the modifications (files/behaviour).
- Short note if this introduces breaking changes and migration steps.
## How was this tested (or how to run locally)
- Tests added: list new tests or test files. `pytest -q tests/ -k <keyword>`
- Tests added: list new tests or test files.
- Manual checks / dataset runs performed.
- Instructions for the reviewer for reproducing with a quick example or CLI (if applicable)
- Instructions for the reviewer
Example:
- Ran the relevant tests:
```bash
pytest -q tests/ -k <keyword>
```
- Reproduce with a quick example or CLI (if applicable):
```bash
lerobot-train --some.option=true
```
## Checklist (required before merge)
@@ -29,7 +48,6 @@ Short, imperative summary (e.g., "fix(robots): handle None in sensor parser"). S
- [ ] All tests pass locally (`pytest`)
- [ ] Documentation updated
- [ ] CI is green
- [ ] Community Review: I have reviewed another contributor's open PR and linked it here: # (insert PR number/link)
## Reviewer notes
@@ -33,7 +33,7 @@ jobs:
github.event.workflow_run.event == 'pull_request' &&
github.event.workflow_run.conclusion == 'success' &&
github.repository == 'huggingface/lerobot'
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@9ad2de8582b56c017cb530c1165116d40433f1c6 # main
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3 # main
with:
package_name: lerobot
secrets:
-18
@@ -217,24 +217,6 @@ jobs:
- name: Run end-to-end tests
run: make test-end-to-end
slack-notification:
name: Slack Notification
needs: [cpu-tests, gpu-tests, upgrade-lock]
if: always() && needs.upgrade-lock.outputs.changed == 'true'
runs-on: ubuntu-latest
permissions:
contents: read
env:
CI_SLACK_CHANNEL: ${{ secrets.CI_SLACK_CHANNEL }}
steps:
- name: Post to a Slack channel
uses: huggingface/hf-workflows/.github/actions/post-slack@a88e7fa2eaee28de5a4d6142381b1fb792349b67 # main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: "Results of the latest dependency tests (CPU + GPU)"
status: ${{ (needs.cpu-tests.result == 'success' && needs.gpu-tests.result == 'success') && 'success' || 'failure' }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# This job creates or updates a PR with the upgraded lockfile
open-pr:
name: Open PR
+1 -4
@@ -78,9 +78,6 @@ Use the templates for required fields and examples.
- **Issues:** Follow the [ticket template](https://github.com/huggingface/lerobot/blob/main/.github/ISSUE_TEMPLATE/bug-report.yml).
- **Pull requests:** Rebase on `upstream/main`, use a descriptive branch (don't work on `main`), run `pre-commit` and tests locally, and follow the [PR template](https://github.com/huggingface/lerobot/blob/main/.github/PULL_REQUEST_TEMPLATE.md).
> [!IMPORTANT]
> Community Review Policy: To help scale our efforts and foster a collaborative environment, we ask contributors to review at least one other person's open PR before their own receives attention. This shared responsibility multiplies our review capacity and helps everyone's code get merged faster!
Once you have submitted your PR and completed a peer review, a member of the LeRobot team will review your contribution.
One member of the LeRobot team will then review your contribution.
Thank you for contributing to LeRobot!
-2
@@ -61,8 +61,6 @@
title: SARM
title: "Reward Models"
- sections:
- local: inference
title: Policy Deployment (lerobot-rollout)
- local: async
title: Use Async Inference
- local: rtc
+19 -17
@@ -50,30 +50,30 @@ This process can be repeated iteratively: deploy, collect, fine-tune, repeat. Ea
### Teleoperator Requirements
The `lerobot-rollout --strategy.type=dagger` mode requires **teleoperators with active motors** that can:
The `examples/hil` HIL scripts require **teleoperators with active motors** that can:
- Enable/disable torque programmatically
- Move to target positions (to mirror the robot state when pausing)
**Compatible teleoperators:**
**Compatible teleoperators in the current `examples/hil` scripts:**
- `openarm_mini` - OpenArm Mini
- `so_leader` - SO100 / SO101 leader arm
> [!IMPORTANT]
> The provided commands default to `bi_openarm_follower` + `openarm_mini`.
> The provided `examples/hil` commands default to `bi_openarm_follower` + `openarm_mini`.
> `so_follower` + `so_leader` configs are also registered and can be used via CLI flags.
---
## Script
Use `lerobot-rollout` with `--strategy.type=dagger` for HIL data collection. Select the inference backend with `--inference.type=sync|rtc`:
A single script handles both synchronous and RTC-based inference. Toggle RTC with `--rtc.enabled=true`:
| Mode | Flag | Models |
| ------------------------ | ---------------------- | --------------------- |
| Standard (default) | _(no flag needed)_ | ACT, Diffusion Policy |
| Real-Time Chunking (RTC) | `--inference.type=rtc` | Pi0, Pi0.5, SmolVLA |
| Mode | Flag | Models |
| ------------------------ | -------------------- | --------------------- |
| Standard (default) | _(no flag needed)_ | ACT, Diffusion Policy |
| Real-Time Chunking (RTC) | `--rtc.enabled=true` | Pi0, Pi0.5, SmolVLA |
---
@@ -97,7 +97,7 @@ python src/lerobot/scripts/lerobot_train.py \
**Standard inference (ACT, Diffusion Policy):**
```bash
lerobot-rollout --strategy.type=dagger \
python examples/hil/hil_data_collection.py \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
@@ -111,7 +111,8 @@ lerobot-rollout --strategy.type=dagger \
--dataset.repo_id=your-username/hil-dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--strategy.num_episodes=50 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--interpolation_multiplier=2
```
@@ -120,11 +121,11 @@ lerobot-rollout --strategy.type=dagger \
For models with high inference latency, enable RTC for smooth execution:
```bash
lerobot-rollout --strategy.type=dagger \
--inference.type=rtc \
--inference.rtc.execution_horizon=20 \
--inference.rtc.max_guidance_weight=5.0 \
--inference.rtc.prefix_attention_schedule=LINEAR \
python examples/hil/hil_data_collection.py \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
--rtc.prefix_attention_schedule=LINEAR \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
@@ -138,7 +139,8 @@ lerobot-rollout --strategy.type=dagger \
--dataset.repo_id=your-username/hil-rtc-dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--strategy.num_episodes=50 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--interpolation_multiplier=3
```
@@ -233,7 +235,7 @@ This HIL data collection approach builds on ideas from interactive imitation lea
- **HG-DAgger** (Kelly et al., 2019) made this practical for robotics: a human expert monitors the robot and only intervenes when needed, rather than labeling every state. The gating between autonomous and human control is exactly the pause → takeover → return-to-policy loop used in the scripts here.
- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the DAgger strategy in `lerobot-rollout`.
- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the HIL scripts in `examples/hil`.
- **π0.6/RECAP** (Physical Intelligence, 2025) applies the same iterative collect-and-finetune loop at scale with VLA models, showing that even large pretrained policies benefit substantially from targeted human corrections on their own failure modes. π0.6 is trained using RECAP.
+3 -3
@@ -820,10 +820,10 @@ The LeRobot system uses a distributed actor-learner architecture for training. T
Create a training configuration file (example available [here](https://huggingface.co/datasets/lerobot/config_examples/resolve/main/rl/train_config.json)). The training config is based on the main `TrainRLServerPipelineConfig` class in `lerobot/configs/train.py`.
1. Configure the policy settings (`type="sac"`, `device`, etc.)
1. Configure the policy settings (`type="gaussian_actor"`, `device`, etc.)
2. Set `dataset` to your cropped dataset
3. Configure environment settings with crop parameters
4. Check the other parameters related to SAC in [configuration_sac.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/sac/configuration_sac.py#L79).
4. Check the other parameters related to the Gaussian Actor in [configuration_gaussian_actor.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/gaussian_actor/configuration_gaussian_actor.py#L79).
5. Verify that the `policy` config is correct with the right `input_features` and `output_features` for your task.
**Starting the Learner**
@@ -926,7 +926,7 @@ The ideal behaviour is that your intervention rate should drop gradually during
Some configuration values have a disproportionate impact on training stability and speed:
- **`temperature_init`** (`policy.temperature_init`) initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
- **`temperature_init`** (`algorithm.temperature_init`) initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
- **`policy_parameters_push_frequency`** (`policy.actor_learner_config.policy_parameters_push_frequency`) interval in _seconds_ between two weight pushes from the learner to the actor. The default is `4 s`. Decrease to **1-2 s** to provide fresher weights (at the cost of more network traffic); increase only if your connection is slow, as this will reduce sample efficiency.
- **`storage_device`** (`policy.storage_device`) device on which the learner keeps the policy parameters. If you have spare GPU memory, set this to `"cuda"` (instead of the default `"cpu"`). Keeping the weights on-GPU removes CPU→GPU transfer overhead and can significantly increase the number of learner updates per second.
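For orientation, the dotted paths above map onto a nested training config. Below is a minimal sketch only; the exact schema of `TrainRLServerPipelineConfig` is an assumption here, including whether `algorithm` sits beside or under `policy`:

```python
# Sketch: the values discussed above, expressed as the nesting implied by
# their dotted paths. The exact config schema is an assumption, not taken
# from the diff above.
train_config_overrides = {
    "policy": {
        "storage_device": "cuda",  # keep learner weights on-GPU to avoid CPU->GPU transfers
        "actor_learner_config": {
            "policy_parameters_push_frequency": 2,  # seconds between weight pushes to the actor
        },
    },
    "algorithm": {
        "temperature_init": 1e-2,  # initial SAC entropy temperature
    },
}
```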
+105 -32
@@ -32,12 +32,6 @@ Once you've gathered enough trajectories, you'll train a neural network to i
If you run into any issues at any point, jump into our [Discord community](https://discord.com/invite/s3KuuzsPFb) for support.
<Tip>
Want to quickly get the right commands for your setup? The [quickstart notebook](https://github.com/huggingface/lerobot/blob/main/examples/notebooks/quickstart.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/lerobot/blob/main/examples/notebooks/quickstart.ipynb) lets you configure your robot once and generates all the commands below ready to paste.
</Tip>
## Set up and Calibrate
If you haven't yet set up and calibrated your robot and teleop device, please do so by following the robot-specific tutorial.
@@ -509,42 +503,121 @@ hf upload ${HF_USER}/act_so101_test${CKPT} \
## Run inference and evaluate your policy
Use `lerobot-rollout` to deploy a trained policy on your robot. You can choose different strategies depending on your needs:
You can use the `record` script from [`lerobot-record`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input to run inference and evaluate your policy. For instance, run the following command or API example to run inference and record 10 evaluation episodes:
<hfoptions id="eval">
<hfoption id="Base mode (no recording)">
<hfoption id="Command">
```bash
lerobot-rollout \
--strategy.type=base \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--task="Put lego brick into the transparent box" \
--duration=60
```
</hfoption>
<hfoption id="Sentry mode (with recording)">
```bash
lerobot-rollout \
--strategy.type=sentry \
--strategy.upload_every_n_episodes=5 \
--policy.path=${HF_USER}/my_policy \
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--duration=600
--dataset.streaming_encoding=true \
--dataset.encoder_threads=2 \
# --dataset.vcodec=auto \
# <- Teleop optional if you want to teleoperate in between episodes \
# --teleop.type=so100_leader \
# --teleop.port=/dev/ttyACM0 \
# --teleop.id=my_awesome_leader_arm \
--policy.path=${HF_USER}/my_policy
```
</hfoption>
<hfoption id="API example">
<!-- prettier-ignore-start -->
```python
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.datasets import LeRobotDataset
from lerobot.utils.feature_utils import hw_to_dataset_features
from lerobot.policies.act import ACTPolicy
from lerobot.policies import make_pre_post_processors
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.scripts.lerobot_record import record_loop
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
NUM_EPISODES = 5
FPS = 30
EPISODE_TIME_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
# Create the robot configuration
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471", id="my_awesome_follower_arm", cameras=camera_config
)
# Initialize the robot
robot = SO100Follower(robot_config)
# Initialize the policy
policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
# Configure the dataset features
action_features = hw_to_dataset_features(robot.action_features, "action")
obs_features = hw_to_dataset_features(robot.observation_features, "observation")
dataset_features = {**action_features, **obs_features}
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=dataset_features,
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Initialize the keyboard listener and rerun visualization
_, events = init_keyboard_listener()
init_rerun(session_name="recording")
# Connect the robot
robot.connect()
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
)
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Run the policy inference loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
)
dataset.save_episode()
# Clean up
robot.disconnect()
dataset.push_to_hub()
```
<!-- prettier-ignore-end -->
</hfoption>
</hfoptions>
The `--strategy.type` flag selects the execution mode:
As you can see, it's almost the same command as previously used to record your training dataset. Two things changed:
- `base`: Autonomous rollout with no data recording (useful for quick evaluation)
- `sentry`: Continuous recording with auto-upload (useful for large-scale evaluation)
- `highlight`: Ring buffer recording with keystroke save (useful for capturing interesting events)
- `dagger`: Human-in-the-loop data collection (see [HIL Data Collection](./hil_data_collection))
All strategies support `--inference.type=rtc` for smooth execution with slow VLA models (Pi0, Pi0.5, SmolVLA).
1. There is an additional `--policy.path` argument which indicates the path to your policy checkpoint (e.g. `outputs/train/eval_act_so101_test/checkpoints/last/pretrained_model`). You can also use the model repository if you uploaded a model checkpoint to the hub (e.g. `${HF_USER}/act_so101_test`).
2. The name of the dataset begins with `eval` to reflect that you are running inference (e.g. `${HF_USER}/eval_act_so101_test`).
-261
@@ -1,261 +0,0 @@
# Policy Deployment (lerobot-rollout)
`lerobot-rollout` is the single CLI for deploying trained policies on real robots. It supports multiple execution strategies and inference backends, from quick evaluation to continuous recording and human-in-the-loop data collection.
## Quick Start
No extra dependencies are needed beyond your robot and policy extras.
```bash
lerobot-rollout \
--strategy.type=base \
--policy.path=lerobot/act_koch_real \
--robot.type=koch_follower \
--robot.port=/dev/ttyACM0 \
--task="pick up cube" \
--duration=30
```
This runs the policy for 30 seconds with no recording.
---
## Strategies
Select a strategy with `--strategy.type=<name>`. Each strategy defines a different control loop with its own recording and interaction semantics.
### Base (`--strategy.type=base`)
Autonomous policy execution with no data recording. Use this for quick evaluation, demos, or when you only need to observe the robot.
```bash
lerobot-rollout \
--strategy.type=base \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Put lego brick into the box" \
--duration=60
```
| Flag | Description |
| ---------------- | ------------------------------------------------------ |
| `--duration` | Run time in seconds (0 = infinite) |
| `--task` | Task description passed to the policy |
| `--display_data` | Stream observations/actions to Rerun for visualization |
### Sentry (`--strategy.type=sentry`)
Continuous autonomous recording with periodic upload to the Hugging Face Hub. Episode boundaries are auto-computed from camera resolution and FPS so each saved episode produces a complete video file, keeping uploads efficient.
Policy state (hidden state, RTC queue) persists across episode boundaries: the robot does not reset between episodes.
```bash
lerobot-rollout \
--strategy.type=sentry \
--strategy.upload_every_n_episodes=5 \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--dataset.repo_id=${HF_USER}/eval_data \
--dataset.single_task="Put lego brick into the box" \
--duration=3600
```
| Flag | Description |
| -------------------------------------- | ----------------------------------------------------------- |
| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5) |
| `--strategy.target_video_file_size_mb` | Target video file size for episode rotation (default: auto) |
| `--dataset.repo_id` | **Required.** Hub repository for the recorded dataset |
| `--dataset.push_to_hub` | Whether to push to Hub on teardown (default: true) |
### Highlight (`--strategy.type=highlight`)
Autonomous rollout with on-demand recording via a memory-bounded ring buffer. The robot runs continuously while the buffer captures the last N seconds of telemetry. Press the save key to flush the buffer and start live recording; press it again to save the episode.
```bash
lerobot-rollout \
--strategy.type=highlight \
--strategy.ring_buffer_seconds=30 \
--strategy.save_key=s \
--strategy.push_key=h \
--policy.path=${HF_USER}/my_policy \
--robot.type=koch_follower \
--robot.port=/dev/ttyACM0 \
--dataset.repo_id=${HF_USER}/highlight_data \
--dataset.single_task="Pick up the red cube"
```
**Keyboard controls:**
| Key | Action |
| ------------------ | -------------------------------------------------------- |
| `s` (configurable) | Start recording (flushes buffer) / stop and save episode |
| `h` (configurable) | Push dataset to Hub |
| `ESC` | Stop the session |
| Flag | Description |
| -------------------------------------- | ---------------------------------------------- |
| `--strategy.ring_buffer_seconds` | Duration of buffered telemetry (default: 30) |
| `--strategy.ring_buffer_max_memory_mb` | Memory cap for the ring buffer (default: 2048) |
| `--strategy.save_key` | Key to toggle recording (default: `s`) |
| `--strategy.push_key` | Key to push to Hub (default: `h`) |
### DAgger (`--strategy.type=dagger`)
Human-in-the-loop data collection. Alternates between autonomous policy execution and human intervention via a teleoperator. Intervention frames are tagged with `intervention=True`. Requires a teleoperator (`--teleop.type`).
See the [Human-In-the-Loop Data Collection](./hil_data_collection) guide for a detailed walkthrough.
**Corrections-only mode** (default): Only human correction windows are recorded. Each correction becomes one episode.
```bash
lerobot-rollout \
--strategy.type=dagger \
--strategy.num_episodes=20 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--robot.type=bi_openarm_follower \
--teleop.type=openarm_mini \
--dataset.repo_id=${HF_USER}/hil_data \
--dataset.single_task="Fold the T-shirt"
```
**Continuous recording mode** (`--strategy.record_autonomous=true`): Both autonomous and correction frames are recorded with time-based episode rotation (same as Sentry).
```bash
lerobot-rollout \
--strategy.type=dagger \
--strategy.record_autonomous=true \
--strategy.num_episodes=50 \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--dataset.repo_id=${HF_USER}/dagger_data \
--dataset.single_task="Grasp the block"
```
**Keyboard controls** (default input device):
| Key | Action |
| ------- | ------------------------------------------- |
| `Space` | Pause / resume policy execution |
| `Tab` | Start / stop human correction |
| `Enter` | Push dataset to Hub (corrections-only mode) |
| `ESC` | Stop the session |
Foot pedal input is also supported via `--strategy.input_device=pedal`. Configure pedal codes with `--strategy.pedal.*` flags.
| Flag | Description |
| ------------------------------------ | ------------------------------------------------------- |
| `--strategy.num_episodes` | Number of correction episodes to record (default: 10) |
| `--strategy.record_autonomous` | Record autonomous frames too (default: false) |
| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5) |
| `--strategy.input_device` | Input device: `keyboard` or `pedal` (default: keyboard) |
| `--teleop.type` | **Required.** Teleoperator type |
---
## Inference Backends
Select a backend with `--inference.type=<name>`. All strategies work with both backends.
### Sync (default)
One policy call per control tick. The main loop blocks until the action is computed.
Works with all policies. No extra flags needed.
### Real-Time Chunking (`--inference.type=rtc`)
A background thread produces action chunks asynchronously. The main control loop polls for the next ready action while the policy computes the next chunk in parallel.
Use RTC with large, slow VLA models (Pi0, Pi0.5, SmolVLA) for smooth, continuous motion despite high inference latency.
```bash
lerobot-rollout \
--strategy.type=base \
--inference.type=rtc \
--inference.rtc.execution_horizon=10 \
--inference.rtc.max_guidance_weight=10.0 \
--policy.path=${HF_USER}/pi0_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Pick up the cube" \
--duration=60 \
--device=cuda
```
| Flag | Description |
| ------------------------------------------- | -------------------------------------------------------------- |
| `--inference.rtc.execution_horizon` | Steps to blend with previous chunk (default: varies by policy) |
| `--inference.rtc.max_guidance_weight` | Consistency enforcement strength (default: varies by policy) |
| `--inference.rtc.prefix_attention_schedule` | Blend schedule: `LINEAR`, `EXP`, `ONES`, `ZEROS` |
| `--inference.queue_threshold` | Max queue size before backpressure (default: 30) |
See the [Real-Time Chunking](./rtc) guide for details on tuning RTC parameters.
---
## Common Flags
| Flag | Description | Default |
| --------------------------------- | ----------------------------------------------------------------- | ------- |
| `--policy.path` | **Required.** HF Hub model ID or local checkpoint path | -- |
| `--robot.type` | **Required.** Robot type (e.g. `so100_follower`, `koch_follower`) | -- |
| `--robot.port` | Serial port for the robot | -- |
| `--robot.cameras` | Camera configuration (JSON dict) | -- |
| `--fps` | Control loop frequency | 30 |
| `--duration` | Run time in seconds (0 = infinite) | 0 |
| `--device` | Torch device (`cpu`, `cuda`, `mps`) | auto |
| `--task` | Task description (used when no dataset is provided) | -- |
| `--display_data` | Stream telemetry to Rerun visualization | false |
| `--display_ip` / `--display_port` | Remote Rerun server address | -- |
| `--interpolation_multiplier` | Action interpolation factor | 1 |
| `--use_torch_compile` | Enable `torch.compile` for inference | false |
| `--resume` | Resume a previous recording session | false |
| `--play_sounds` | Vocal synthesis for events | true |
---
## Programmatic Usage
For custom deployments (e.g. with kinematics processors), use the rollout module API directly:
```python
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.utils.process import ProcessSignalHandler
cfg = RolloutConfig(
robot=my_robot_config,
policy=my_policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=30,
duration=60,
task="my task",
)
signal_handler = ProcessSignalHandler(use_threads=True)
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=my_custom_action_processor, # optional
robot_observation_processor=my_custom_obs_processor, # optional
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
```
See `examples/so100_to_so100_EE/rollout.py` and `examples/phone_to_so100/rollout.py` for full examples with kinematics processors.
+3 -7
@@ -34,7 +34,7 @@ pip install -e ".[smolvla]"
### Using RTC with Pi0
You can use `lerobot-rollout --strategy.type=base --inference.type=rtc` for RTC deployment on real robots.
You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py).
The snippet below provides a simplified pseudo-example of how RTC operates with Pi0 in your pipeline:
```python
@@ -137,12 +137,8 @@ The script generates a visualization of the denoising process, comparing standar
## Testing RTC with a Real Robot
```bash
lerobot-rollout \
--strategy.type=base \
python examples/rtc/eval_with_real_robot.py \
--policy.path=${HF_USERNAME}/policy_repo_id \
--inference.type=rtc \
--inference.rtc.execution_horizon=10 \
--inference.rtc.max_guidance_weight=10.0 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
@@ -182,7 +178,7 @@ visualizer = RTCDebugVisualizer()
# ... create plots
```
See `examples/rtc/eval_dataset.py` for a complete example of offline RTC visualization.
See `examples/rtc/eval_dataset.py` for a complete example of visualization.
## References
+1 -1
@@ -284,7 +284,7 @@ python examples/rtc/eval_with_real_robot.py \
--task="task_description" \
--duration=1000 \
--fps=30 \
--inference.type=rtc
--rtc.enabled=true
```
---
File diff suppressed because it is too large
+226
@@ -0,0 +1,226 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Shared utilities for Human-in-the-Loop data collection scripts."""
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path
from lerobot.common.control_utils import is_headless
from lerobot.processor import (
IdentityProcessorStep,
RobotAction,
RobotObservation,
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots import Robot
from lerobot.teleoperators import Teleoperator
from lerobot.utils.robot_utils import precise_sleep
logger = logging.getLogger(__name__)
@dataclass
class HILDatasetConfig:
repo_id: str
single_task: str
root: str | Path | None = None
fps: int = 30
episode_time_s: float = 120
num_episodes: int = 50
video: bool = True
push_to_hub: bool = True
private: bool = False
tags: list[str] | None = None
num_image_writer_processes: int = 0
num_image_writer_threads_per_camera: int = 4
video_encoding_batch_size: int = 1
vcodec: str = "auto"
streaming_encoding: bool = True
encoder_queue_maxsize: int = 30
encoder_threads: int | None = None
rename_map: dict[str, str] = field(default_factory=dict)
def teleop_has_motor_control(teleop: Teleoperator) -> bool:
"""Check if teleoperator has motor control capabilities."""
return all(hasattr(teleop, attr) for attr in ("enable_torque", "disable_torque", "write_goal_positions"))
def teleop_disable_torque(teleop: Teleoperator) -> None:
"""Disable teleop torque if supported."""
if hasattr(teleop, "disable_torque"):
teleop.disable_torque()
def teleop_enable_torque(teleop: Teleoperator) -> None:
"""Enable teleop torque if supported."""
if hasattr(teleop, "enable_torque"):
teleop.enable_torque()
def teleop_smooth_move_to(teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50):
"""Smoothly move teleop to target position if motor control is available."""
if not teleop_has_motor_control(teleop):
logger.warning("Teleop does not support motor control - cannot mirror robot position")
return
teleop_enable_torque(teleop)
current = teleop.get_action()
steps = max(int(duration_s * fps), 1)
for step in range(steps + 1):
t = step / steps
interp = {}
for k in current:
if k in target_pos:
interp[k] = current[k] * (1 - t) + target_pos[k] * t
else:
interp[k] = current[k]
teleop.write_goal_positions(interp)
time.sleep(1 / fps)
def init_keyboard_listener():
"""Initialize keyboard listener with HIL controls."""
events = {
"exit_early": False,
"rerecord_episode": False,
"stop_recording": False,
"policy_paused": False,
"correction_active": False,
"resume_policy": False,
"in_reset": False,
"start_next_episode": False,
}
if is_headless():
logger.warning("Headless environment - keyboard controls unavailable")
return None, events
from pynput import keyboard
def on_press(key):
try:
if events["in_reset"]:
if key in [keyboard.Key.space, keyboard.Key.right]:
logger.info("[HIL] Starting next episode...")
events["start_next_episode"] = True
elif hasattr(key, "char") and key.char == "c":
events["start_next_episode"] = True
elif key == keyboard.Key.esc:
logger.info("[HIL] ESC - Stop recording, pushing to hub...")
events["stop_recording"] = True
events["start_next_episode"] = True
else:
if key == keyboard.Key.space:
if not events["policy_paused"] and not events["correction_active"]:
logger.info("[HIL] PAUSED - Press 'c' to take control or 'p' to resume policy")
events["policy_paused"] = True
elif hasattr(key, "char") and key.char == "c":
if events["policy_paused"] and not events["correction_active"]:
logger.info("[HIL] Taking control...")
events["start_next_episode"] = True
elif hasattr(key, "char") and key.char == "p":
if events["policy_paused"] or events["correction_active"]:
logger.info("[HIL] Resuming policy...")
events["resume_policy"] = True
elif key == keyboard.Key.right:
logger.info("[HIL] End episode")
events["exit_early"] = True
elif key == keyboard.Key.left:
logger.info("[HIL] Re-record episode")
events["rerecord_episode"] = True
events["exit_early"] = True
elif key == keyboard.Key.esc:
logger.info("[HIL] ESC - Stop recording...")
events["stop_recording"] = True
events["exit_early"] = True
except Exception as e:
logger.info(f"Key error: {e}")
listener = keyboard.Listener(on_press=on_press)
listener.start()
return listener, events
def make_identity_processors():
"""Create identity processors for recording."""
teleop_proc = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[IdentityProcessorStep()],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
obs_proc = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[IdentityProcessorStep()],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
return teleop_proc, obs_proc
def reset_loop(robot: Robot, teleop: Teleoperator, events: dict, fps: int):
"""Reset period where human repositions environment."""
logger.info("[HIL] RESET")
events["in_reset"] = True
events["start_next_episode"] = False
obs = robot.get_observation()
robot_pos = {k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features}
teleop_smooth_move_to(teleop, robot_pos, duration_s=2.0, fps=50)
logger.info("Press any key to enable teleoperation")
while not events["start_next_episode"] and not events["stop_recording"]:
precise_sleep(0.05)
if events["stop_recording"]:
return
events["start_next_episode"] = False
teleop_disable_torque(teleop)
logger.info("Teleop enabled - press any key to start episode")
while not events["start_next_episode"] and not events["stop_recording"]:
loop_start = time.perf_counter()
action = teleop.get_action()
robot.send_action(action)
precise_sleep(1 / fps - (time.perf_counter() - loop_start))
events["in_reset"] = False
events["start_next_episode"] = False
events["exit_early"] = False
events["policy_paused"] = False
events["correction_active"] = False
events["resume_policy"] = False
def print_controls(rtc: bool = False):
"""Print control instructions."""
mode = "Human-in-the-Loop Data Collection" + (" (RTC)" if rtc else "")
logger.info(
"%s\n Controls:\n"
" SPACE - Pause policy\n"
" c - Take control\n"
" p - Resume policy after pause/correction\n"
" → - End episode\n"
" ESC - Stop and push to hub",
mode,
)
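Taken together, these helpers suggest a session loop along the following lines. This is a hypothetical sketch only: the episode body, the helpers being in scope (they are defined in the new module above, whose path is not shown), and an already-connected `robot`/`teleop` pair are all assumptions.

```python
from lerobot.robots import Robot
from lerobot.teleoperators import Teleoperator


def run_hil_session(robot: Robot, teleop: Teleoperator, fps: int = 30) -> None:
    """Hypothetical wiring of the helpers above into a collection session."""
    # Keyboard controls; listener is None in a headless environment.
    listener, events = init_keyboard_listener()
    # Pass-through processors used when recording raw teleop/robot features.
    teleop_proc, obs_proc = make_identity_processors()
    print_controls()

    while not events["stop_recording"]:
        # Operator repositions the scene; the leader arm mirrors the follower,
        # then torque is released so the human can take over.
        reset_loop(robot, teleop, events, fps)
        if events["stop_recording"]:
            break
        # ... run one episode of policy execution + human correction here ...

    if listener is not None:
        listener.stop()
```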
+31 -62
@@ -14,21 +14,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.datasets import LeRobotDataset
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import make_default_processors
from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.feature_utils import hw_to_dataset_features
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
from lerobot.utils.visualization_utils import init_rerun
NUM_EPISODES = 2
FPS = 30
@@ -39,9 +35,6 @@ HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
@@ -90,67 +83,43 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Running inference, recording eval episode {recorded_episodes} of {NUM_EPISODES}")
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_observation_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot
robot_action_to_send = robot_action_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
log_say("Waiting for environment reset, press right arrow key when ready...")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
if events["rerecord_episode"]:
log_say("Re-record episode")
+9 -10
@@ -45,6 +45,9 @@ def main():
leader_arm = SO100Leader(leader_arm_config)
keyboard = KeyboardTeleop(keyboard_config)
# TODO(Steven): Update this example to use pipelines
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
# Configure the dataset features
action_features = hw_to_dataset_features(robot.action_features, ACTION)
obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
@@ -74,10 +77,6 @@ def main():
if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
raise ValueError("Robot or teleop is not connected!")
teleop_action_processor, robot_action_processor, robot_observation_processor = (
make_default_processors()
)
print("Starting record loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
@@ -88,14 +87,14 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
dataset=dataset,
teleop=[leader_arm, keyboard],
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Reset the environment if not stopping or re-recording
@@ -107,13 +106,13 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
teleop=[leader_arm, keyboard],
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
if events["rerecord_episode"]:
-77
@@ -1,77 +0,0 @@
# !/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained policy on LeKiwi without recording (base rollout).
Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
no dataset) with :class:`SyncInferenceConfig` (inline policy call per
control tick). For a CLI entry point with the same capabilities plus
recording, upload, and human-in-the-loop variants, see ``lerobot-rollout``.
"""
from lerobot.configs import PreTrainedConfig
from lerobot.robots.lekiwi import LeKiwiClientConfig
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
# Robot: LeKiwi client — make sure lekiwi_host is already running on the robot.
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
# Policy: load the pretrained config. ``pretrained_path`` is read downstream
# by ``build_rollout_context`` to reload the full model.
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
# Assemble the rollout config: base strategy (no recording) + sync inference.
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
# Graceful Ctrl-C: the strategy loop exits when shutdown_event is set.
signal_handler = ProcessSignalHandler(use_threads=True)
# Build the context (connects robot, loads policy, wires the inference strategy).
# No custom processors here — LeKiwi runs on raw joint features.
ctx = build_rollout_context(cfg, signal_handler.shutdown_event)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
-342
@@ -1,342 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 🤗 LeRobot Quickstart\n",
"\n",
"Calibration → teleoperation → data collection → training → evaluation.\n",
"\n",
"Install the required dependencies: `pip install -e .[notebook,dataset,training,viz,hardware]`.\n",
"\n",
"**How to use:**\n",
"1. Edit the **Configuration** cell with your settings.\n",
"2. Run all cells (`Run All`).\n",
"3. Each section prints a ready-to-paste terminal command - copy it and run it.\n",
"\n",
"Each setup is different, please refer to the [LeRobot documentation](https://huggingface.co/docs/lerobot/il_robots) for more details on each step and available options. <br>\n",
"Feel free to make this notebook your own and adapt it to your needs!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Utils"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def _cameras_arg(cameras: dict) -> str:\n",
" if not cameras:\n",
" return \"\"\n",
" entries = [f\"{n}: {{{', '.join(f'{k}: {v}' for k, v in cfg.items())}}}\" for n, cfg in cameras.items()]\n",
" return \"{ \" + \", \".join(entries) + \" }\"\n",
"\n",
"\n",
"def print_cmd(*parts: str) -> None:\n",
" \"\"\"Print a shell command with line continuations, skipping empty parts.\"\"\"\n",
" non_empty = [p for p in parts if p]\n",
" print(\" \\\\\\n \".join(non_empty))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Configuration\n",
"\n",
"Edit this cell, then **Run All** to generate all commands below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Robot (follower) - run `lerobot-find-port` to discover the port\n",
"ROBOT_TYPE = \"so101_follower\"\n",
"ROBOT_PORT = \"/dev/ttyACM0\"\n",
"ROBOT_ID = \"my_follower_arm\"\n",
"\n",
"# Teleop (leader) - run `lerobot-find-port` to discover the port\n",
"TELEOP_TYPE = \"so101_leader\"\n",
"TELEOP_PORT = \"/dev/ttyACM1\"\n",
"TELEOP_ID = \"my_leader_arm\"\n",
"\n",
"# Cameras - set to {} to disable\n",
"# Run `lerobot-find-cameras opencv` to list available cameras and their indices\n",
"CAMERAS = {\n",
" \"top\": {\"type\": \"opencv\", \"index_or_path\": 2, \"width\": 640, \"height\": 480, \"fps\": 30},\n",
" \"wrist\": {\"type\": \"opencv\", \"index_or_path\": 4, \"width\": 640, \"height\": 480, \"fps\": 30},\n",
"}\n",
"\n",
"# Dataset\n",
"HF_USER = \"your_hf_username\" # `huggingface-cli whoami` to find your username\n",
"DATASET_NAME = \"my_so101_dataset\"\n",
"TASK_DESCRIPTION = \"pick and place the block\"\n",
"NUM_EPISODES = 10\n",
"\n",
"# Training\n",
"POLICY_TYPE = \"act\" # act, diffusion, smolvla, ...\n",
"POLICY_DEVICE = \"cuda\" # cuda / cpu / mps\n",
"TRAIN_STEPS = 10_000\n",
"SAVE_FREQ = 2_000\n",
"OUTPUT_DIR = f\"outputs/train/{DATASET_NAME}\"\n",
"\n",
"# Inference - Hub repo ID or local checkpoint path\n",
"# e.g. set to f\"{OUTPUT_DIR}/checkpoints/last\" to use a local checkpoint\n",
"POLICY_PATH = f\"{HF_USER}/{DATASET_NAME}_{POLICY_TYPE}\"\n",
"LAST_CHECKPOINT_PATH = f\"{OUTPUT_DIR}/checkpoints/last\"\n",
"\n",
"# Derived\n",
"DATASET_REPO_ID = f\"{HF_USER}/{DATASET_NAME}\"\n",
"DATASET_ROOT = f\"data/{DATASET_NAME}\"\n",
"POLICY_REPO_ID = f\"{HF_USER}/{DATASET_NAME}_{POLICY_TYPE}\"\n",
"EVAL_REPO_ID = f\"{HF_USER}/eval_{DATASET_NAME}\"\n",
"CAMERAS_ARG = _cameras_arg(CAMERAS)\n",
"CAMERAS_FLAG = f'--robot.cameras=\"{CAMERAS_ARG}\"' if CAMERAS_ARG else \"\"\n",
"\n",
"print(f\"Robot : {ROBOT_TYPE} @ {ROBOT_PORT}\")\n",
"print(f\"Teleop : {TELEOP_TYPE} @ {TELEOP_PORT}\")\n",
"print(f\"Cameras: {list(CAMERAS) or 'none'}\")\n",
"print(f\"Dataset: {DATASET_REPO_ID} ({NUM_EPISODES} episodes) saved to {DATASET_ROOT}\")\n",
"print(f\"Policy : {POLICY_TYPE} -> {POLICY_REPO_ID}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 1. Calibration\n",
"\n",
"Run once per arm before first use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Follower\n",
"print_cmd(\n",
" \"lerobot-calibrate\",\n",
" f\"--robot.type={ROBOT_TYPE}\",\n",
" f\"--robot.port={ROBOT_PORT}\",\n",
" f\"--robot.id={ROBOT_ID}\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Leader\n",
"print_cmd(\n",
" \"lerobot-calibrate\",\n",
" f\"--teleop.type={TELEOP_TYPE}\",\n",
" f\"--teleop.port={TELEOP_PORT}\",\n",
" f\"--teleop.id={TELEOP_ID}\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 2. Teleoperation\n",
"\n",
"See the [teleoperation docs](https://huggingface.co/docs/lerobot/il_robots#teleoperate) and the [cameras guide](https://huggingface.co/docs/lerobot/cameras) for more options."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_cmd(\n",
" \"lerobot-teleoperate\",\n",
" f\"--robot.type={ROBOT_TYPE}\",\n",
" f\"--robot.port={ROBOT_PORT}\",\n",
" f\"--robot.id={ROBOT_ID}\",\n",
" CAMERAS_FLAG,\n",
" f\"--teleop.type={TELEOP_TYPE}\",\n",
" f\"--teleop.port={TELEOP_PORT}\",\n",
" f\"--teleop.id={TELEOP_ID}\",\n",
" \"--display_data=true\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 3. Record Dataset\n",
"\n",
"See the [recording docs](https://huggingface.co/docs/lerobot/il_robots#record-a-dataset) for tips on gathering good data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_cmd(\n",
" \"lerobot-record\",\n",
" f\"--robot.type={ROBOT_TYPE}\",\n",
" f\"--robot.port={ROBOT_PORT}\",\n",
" f\"--robot.id={ROBOT_ID}\",\n",
" CAMERAS_FLAG,\n",
" f\"--teleop.type={TELEOP_TYPE}\",\n",
" f\"--teleop.port={TELEOP_PORT}\",\n",
" f\"--teleop.id={TELEOP_ID}\",\n",
" f\"--dataset.repo_id={DATASET_REPO_ID}\",\n",
" f\"--dataset.num_episodes={NUM_EPISODES}\",\n",
" f'--dataset.single_task=\"{TASK_DESCRIPTION}\"',\n",
" \"--dataset.streaming_encoding=true\",\n",
" \"--display_data=true\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Resume a previously interrupted recording session\n",
"print_cmd(\n",
" \"lerobot-record\",\n",
" f\"--robot.type={ROBOT_TYPE}\",\n",
" f\"--robot.port={ROBOT_PORT}\",\n",
" f\"--robot.id={ROBOT_ID}\",\n",
" CAMERAS_FLAG,\n",
" f\"--teleop.type={TELEOP_TYPE}\",\n",
" f\"--teleop.port={TELEOP_PORT}\",\n",
" f\"--teleop.id={TELEOP_ID}\",\n",
" f\"--dataset.repo_id={DATASET_REPO_ID}\",\n",
" f\"--dataset.root={DATASET_ROOT}\",\n",
" f\"--dataset.num_episodes={NUM_EPISODES}\",\n",
" f'--dataset.single_task=\"{TASK_DESCRIPTION}\"',\n",
" \"--dataset.streaming_encoding=true\",\n",
" \"--display_data=true\",\n",
" \"--resume=true\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 4. Train Policy\n",
"\n",
"See the [training docs](https://huggingface.co/docs/lerobot/il_robots#train-a-policy) for configuration options and tips."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_cmd(\n",
" \"lerobot-train\",\n",
" f\"--dataset.repo_id={DATASET_REPO_ID}\",\n",
" f\"--policy.type={POLICY_TYPE}\",\n",
" f\"--policy.device={POLICY_DEVICE}\",\n",
" f\"--policy.repo_id={POLICY_REPO_ID}\",\n",
" f\"--output_dir={OUTPUT_DIR}\",\n",
" f\"--steps={TRAIN_STEPS}\",\n",
" f\"--save_freq={SAVE_FREQ}\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Resume a previously interrupted training session\n",
"print_cmd(\n",
" \"lerobot-train\",\n",
" f\"--config_path={LAST_CHECKPOINT_PATH}/pretrained_model/train_config.json\",\n",
" \"--resume=true\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 5. Inference\n",
"\n",
"Uses `POLICY_PATH` from the Configuration cell (defaults to the Hub repo ID). You can also put there the `LAST_CHECKPOINT_PATH`.\n",
"\n",
"See the [inference docs](https://huggingface.co/docs/lerobot/il_robots#run-inference-and-evaluate-your-policy) for details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_cmd(\n",
" \"lerobot-record\",\n",
" f\"--policy.path={POLICY_PATH}\",\n",
" f\"--robot.type={ROBOT_TYPE}\",\n",
" f\"--robot.port={ROBOT_PORT}\",\n",
" f\"--robot.id={ROBOT_ID}\",\n",
" CAMERAS_FLAG,\n",
" f\"--teleop.type={TELEOP_TYPE}\",\n",
" f\"--teleop.port={TELEOP_PORT}\",\n",
" f\"--teleop.id={TELEOP_ID}\",\n",
" f\"--dataset.repo_id={EVAL_REPO_ID}\",\n",
" f\"--dataset.num_episodes={NUM_EPISODES}\",\n",
" f'--dataset.single_task=\"{TASK_DESCRIPTION}\"',\n",
" \"--dataset.streaming_encoding=true\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "lerobot (3.12.3)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
+32 -63
@@ -14,17 +14,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.configs import FeatureType, PolicyFeature
from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
from lerobot.model.kinematics import RobotKinematics
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import (
RobotProcessorPipeline,
make_default_teleop_action_processor,
@@ -38,12 +34,11 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.scripts.lerobot_record import record_loop
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.feature_utils import combine_feature_dicts
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
from lerobot.utils.visualization_utils import init_rerun
NUM_EPISODES = 5
FPS = 30
@@ -54,9 +49,6 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
@@ -151,67 +143,43 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_joints_to_ee_pose_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot (EE -> joints via IK)
robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
log_say("Waiting for environment reset, press right arrow key when ready...")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
if events["rerecord_episode"]:
log_say("Re-record episode")
@@ -222,6 +190,7 @@ def main():
# Save episode
dataset.save_episode()
episode_idx += 1
finally:
# Clean up
log_say("Stop recording")
+13 -13
@@ -65,15 +65,14 @@ def main():
robot = SO100Follower(robot_config)
phone = Phone(teleop_config)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert phone action to EE action (with gripper velocity mapped to joint).
# Build pipeline to convert phone action to EE action
phone_to_robot_ee_pose_processor = RobotProcessorPipeline[
tuple[RobotAction, RobotObservation], RobotAction
](
@@ -95,7 +94,7 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to joints action (IK).
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
@@ -108,7 +107,7 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert joint observation to EE observation (FK).
# Build pipeline to convert joint observation to EE observation
robot_joints_to_ee_pose = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
@@ -119,12 +118,13 @@ def main():
to_output=transition_to_observation,
)
# Create the dataset, deriving features from the pipelines so the on-disk schema
# matches exactly what the pipelines produce at runtime.
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=phone_to_robot_ee_pose_processor,
initial_features=create_initial_features(action=phone.action_features),
@@ -163,14 +163,14 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
teleop=phone,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
)
# Reset the environment if not stopping or re-recording
@@ -182,13 +182,13 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
teleop=phone,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
)
if events["rerecord_episode"]:
-126
@@ -1,126 +0,0 @@
# !/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained EE-space policy on SO100 (phone-trained) without recording.
Mirrors ``examples/so100_to_so100_EE/rollout.py`` — the model was trained
with phone teleoperation in EE space, so at deployment we only need the
joint↔EE conversion on the robot side; the phone is not used.
Uses :class:`BaseStrategy` (no recording) + :class:`SyncInferenceConfig`
(inline policy call). For recording during rollout, switch to Sentry,
Highlight, or DAgger via ``lerobot-rollout --strategy.type=...``.
"""
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.configs import PreTrainedConfig
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor import (
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
# Peek at motor names once to build the kinematic solver.
temp_robot = SO100Follower(robot_config)
motor_names = list(temp_robot.bus.motors.keys())
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=motor_names,
)
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=motor_names,
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
signal_handler = ProcessSignalHandler(use_threads=True)
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
+673
@@ -0,0 +1,673 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Demo script showing how to use Real-Time Chunking (RTC) with action chunking policies on real robots.
This script demonstrates:
1. Creating a robot and policy (SmolVLA, Pi0, etc.) with RTC
2. Consuming actions from the policy while the robot executes
3. Periodically requesting new action chunks in the background using threads
4. Managing action buffers and timing for real-time operation
For simulation environments, see eval_with_simulation.py
Usage:
# Run on a real robot with RTC enabled
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/smolvla_check_rtc_last3 \
--policy.device=mps \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run on a real robot with RTC disabled (baseline)
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/smolvla_check_rtc_last3 \
--policy.device=mps \
--rtc.enabled=false \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run on a real robot with the pi0.5 policy and RTC enabled
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/pi05_check_rtc \
--policy.device=mps \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run RTC with bi_openarm_follower (dual-arm OpenArms) and pi0.5 policy
python examples/rtc/eval_with_real_robot.py \
--policy.path=lerobot-data-collection/folding_final \
--robot.type=bi_openarm_follower \
--robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}}' \
--robot.left_arm_config.port=can0 \
--robot.left_arm_config.side=left \
--robot.left_arm_config.can_interface=socketcan \
--robot.left_arm_config.disable_torque_on_disconnect=true \
--robot.left_arm_config.max_relative_target=8.0 \
--robot.right_arm_config.port=can1 \
--robot.right_arm_config.side=right \
--robot.right_arm_config.can_interface=socketcan \
--robot.right_arm_config.disable_torque_on_disconnect=true \
--robot.right_arm_config.max_relative_target=8.0 \
--task="Fold the T-shirt properly" \
--fps=30 \
--duration=2000 \
--interpolation_multiplier=3 \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
--rtc.prefix_attention_schedule=LINEAR \
--device=cuda
"""
import logging
import math
import sys
import time
import traceback
from dataclasses import dataclass, field
from threading import Event, Lock, Thread
import torch
from torch import Tensor
from lerobot.cameras.opencv import OpenCVCameraConfig # noqa: F401
from lerobot.cameras.realsense import RealSenseCameraConfig # noqa: F401
from lerobot.cameras.zmq import ZMQCameraConfig # noqa: F401
from lerobot.configs import PreTrainedConfig, RTCAttentionSchedule, parser
from lerobot.policies import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc import ActionInterpolator, ActionQueue, LatencyTracker, RTCConfig
from lerobot.processor import (
NormalizerProcessorStep,
RelativeActionsProcessorStep,
TransitionKey,
create_transition,
make_default_robot_action_processor,
make_default_robot_observation_processor,
to_relative_actions,
)
from lerobot.rl.process import ProcessSignalHandler
from lerobot.robots import ( # noqa: F401
Robot,
RobotConfig,
bi_openarm_follower,
bi_so_follower,
koch_follower,
so_follower,
unitree_g1,
)
from lerobot.robots.utils import make_robot_from_config
from lerobot.utils.constants import OBS_IMAGES, OBS_STATE
from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
from lerobot.utils.hub import HubMixin
from lerobot.utils.utils import init_logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class RobotWrapper:
def __init__(self, robot: Robot):
self.robot = robot
self.lock = Lock()
def get_observation(self) -> dict[str, Tensor]:
with self.lock:
return self.robot.get_observation()
def send_action(self, action: Tensor):
with self.lock:
self.robot.send_action(action)
def observation_features(self) -> list[str]:
with self.lock:
return self.robot.observation_features
def action_features(self) -> list[str]:
with self.lock:
return self.robot.action_features
@dataclass
class RTCDemoConfig(HubMixin):
"""Configuration for RTC demo with action chunking policies and real robots."""
# Policy configuration
policy: PreTrainedConfig | None = None
# Robot configuration
robot: RobotConfig | None = None
# RTC configuration
rtc: RTCConfig = field(
default_factory=lambda: RTCConfig(
execution_horizon=10,
max_guidance_weight=1.0,
prefix_attention_schedule=RTCAttentionSchedule.EXP,
)
)
# Demo parameters
duration: float = 30.0 # Duration to run the demo (seconds)
fps: float = 10.0 # Action execution frequency (Hz)
interpolation_multiplier: int = 1 # Control rate multiplier (1=off, 2=2x, 3=3x)
# Compute device
device: str | None = None # Device to run on (cuda, cpu, auto)
# Queue-size threshold for requesting new actions: once the number of queued (not yet executed)
# actions drops to this value or below, a new chunk is requested.
# It should be higher than inference delay + execution horizon.
action_queue_size_to_get_new_actions: int = 30
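# Illustrative sizing (assumed numbers): at fps=10 each step lasts 0.1 s, so ~1 s of inference
# latency corresponds to a delay of ~10 steps; with execution_horizon=20 the threshold should
# then stay above 10 + 20 = 30 queued actions.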
# Task to execute
task: str = field(default="", metadata={"help": "Task to execute"})
# Torch compile configuration
use_torch_compile: bool = field(
default=False,
metadata={"help": "Use torch.compile for faster inference (PyTorch 2.0+)"},
)
torch_compile_backend: str = field(
default="inductor",
metadata={"help": "Backend for torch.compile (inductor, aot_eager, cudagraphs)"},
)
torch_compile_mode: str = field(
default="default",
metadata={"help": "Compilation mode (default, reduce-overhead, max-autotune)"},
)
torch_compile_disable_cudagraphs: bool = field(
default=True,
metadata={
"help": "Disable CUDA graphs in torch.compile. Required due to in-place tensor "
"operations in denoising loop (x_t += dt * v_t) which cause tensor aliasing issues."
},
)
def __post_init__(self):
# HACK: We parse the CLI args again here to pick up the pretrained path if one was provided.
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
else:
raise ValueError("Policy path is required")
# Validate that robot configuration is provided
if self.robot is None:
raise ValueError("Robot configuration must be provided")
@classmethod
def __get_path_fields__(cls) -> list[str]:
"""This enables the parser to load config from the policy using `--policy.path=local/dir`"""
return ["policy"]
def is_image_key(k: str) -> bool:
return k.startswith(OBS_IMAGES)
def _reanchor_relative_rtc_prefix(
prev_actions_absolute: Tensor,
current_state: Tensor,
relative_step: RelativeActionsProcessorStep,
normalizer_step: NormalizerProcessorStep | None,
policy_device: torch.device | str,
) -> Tensor:
"""Convert absolute leftovers into model-space for relative-action RTC policies.
When a policy uses relative actions, the RTC prefix (leftover actions from
the previous chunk) is stored in absolute space. Before feeding it back to
the policy we need to re-express it relative to the *current* robot state
and then re-normalize.
"""
state = current_state.detach().cpu()
if state.dim() == 1:
state = state.unsqueeze(0)
action_cpu = prev_actions_absolute.detach().cpu()
mask = relative_step._build_mask(action_cpu.shape[-1])
relative_actions = to_relative_actions(action_cpu, state, mask)
transition = create_transition(action=relative_actions)
if normalizer_step is not None:
transition = normalizer_step(transition)
return transition[TransitionKey.ACTION].to(policy_device)
def get_actions(
policy,
robot: RobotWrapper,
robot_observation_processor,
action_queue: ActionQueue,
shutdown_event: Event,
cfg: RTCDemoConfig,
):
"""Thread function to request action chunks from the policy.
Args:
policy: The policy instance (SmolVLA, Pi0, etc.)
robot: The robot instance for getting observations
robot_observation_processor: Processor for raw robot observations
action_queue: Queue to put new action chunks
shutdown_event: Event to signal shutdown
cfg: Demo configuration
"""
try:
logger.info("[GET_ACTIONS] Starting get actions thread")
latency_tracker = LatencyTracker() # Track latency of action chunks
fps = cfg.fps
time_per_chunk = 1.0 / fps
# Only keep .pos joints + camera streams if the policy was trained on positions,
# not the full pos/vel/torque state the robot exposes.
observation_features_hw = {
key: value
for key, value in robot.observation_features().items()
if key.endswith(".pos") or isinstance(value, tuple)
}
dataset_features = hw_to_dataset_features(observation_features_hw, "observation")
policy_device = policy.config.device
# Load preprocessor and postprocessor from pretrained files
# The stats are embedded in the processor .safetensors files
logger.info(f"[GET_ACTIONS] Loading preprocessor/postprocessor from {cfg.policy.pretrained_path}")
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
pretrained_path=cfg.policy.pretrained_path,
dataset_stats=None, # Will load from pretrained processor files
preprocessor_overrides={
"device_processor": {"device": cfg.policy.device},
},
)
logger.info("[GET_ACTIONS] Preprocessor/postprocessor loaded successfully with embedded stats")
relative_step = next(
(s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
None,
)
normalizer_step = next(
(s for s in preprocessor.steps if isinstance(s, NormalizerProcessorStep)),
None,
)
if relative_step is not None:
if relative_step.action_names is None:
cfg_names = getattr(cfg.policy, "action_feature_names", None)
if cfg_names:
relative_step.action_names = list(cfg_names)
else:
relative_step.action_names = [
k for k in robot.robot.action_features if k.endswith(".pos")
]
logger.info("[GET_ACTIONS] Relative actions enabled: will re-anchor RTC prefix")
get_actions_threshold = cfg.action_queue_size_to_get_new_actions
if not cfg.rtc.enabled:
get_actions_threshold = 0
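# Without RTC there is no prefix blending across chunks, so new actions are only requested
# once the queue is empty (threshold 0) instead of overlapping inference with execution.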
while not shutdown_event.is_set():
if action_queue.qsize() <= get_actions_threshold:
current_time = time.perf_counter()
action_index_before_inference = action_queue.get_action_index()
prev_actions = action_queue.get_left_over()
inference_latency = latency_tracker.max()
inference_delay = math.ceil(inference_latency / time_per_chunk)
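# Illustrative example: 0.45 s of worst-case latency at fps=10 (0.1 s per step) gives a
# 5-step delay, i.e. how many prefix actions will already have executed when the new chunk lands.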
obs = robot.get_observation()
# Apply robot observation processor
obs_processed = robot_observation_processor(obs)
obs_with_policy_features = build_dataset_frame(
dataset_features, obs_processed, prefix="observation"
)
for name in obs_with_policy_features:
obs_with_policy_features[name] = torch.from_numpy(obs_with_policy_features[name])
if "image" in name:
obs_with_policy_features[name] = (
obs_with_policy_features[name].type(torch.float32) / 255
)
obs_with_policy_features[name] = (
obs_with_policy_features[name].permute(2, 0, 1).contiguous()
)
obs_with_policy_features[name] = obs_with_policy_features[name].unsqueeze(0)
obs_with_policy_features[name] = obs_with_policy_features[name].to(policy_device)
obs_with_policy_features["task"] = [cfg.task] # Task should be a list, not a string!
obs_with_policy_features["robot_type"] = (
robot.robot.name if hasattr(robot.robot, "name") else ""
)
preprocessed_obs = preprocessor(obs_with_policy_features)
# Re-anchor leftover actions for relative-action policies.
# We need the *postprocessed* (absolute) leftover, not the original
# (normalized/relative) one that get_left_over() returns.
if (
prev_actions is not None
and relative_step is not None
and OBS_STATE in obs_with_policy_features
):
with action_queue.lock:
if action_queue.queue is not None:
prev_actions_abs = action_queue.queue[action_queue.last_index :].clone()
else:
prev_actions_abs = None
if prev_actions_abs is not None and prev_actions_abs.numel() > 0:
prev_actions = _reanchor_relative_rtc_prefix(
prev_actions_absolute=prev_actions_abs,
current_state=obs_with_policy_features[OBS_STATE],
relative_step=relative_step,
normalizer_step=normalizer_step,
policy_device=policy_device,
)
# Generate actions WITH RTC
actions = policy.predict_action_chunk(
preprocessed_obs,
inference_delay=inference_delay,
prev_chunk_left_over=prev_actions,
)
# Store original actions (before postprocessing) for RTC
original_actions = actions.squeeze(0).clone()
postprocessed_actions = postprocessor(actions)
postprocessed_actions = postprocessed_actions.squeeze(0)
new_latency = time.perf_counter() - current_time
new_delay = math.ceil(new_latency / time_per_chunk)
latency_tracker.add(new_latency)
if cfg.action_queue_size_to_get_new_actions < cfg.rtc.execution_horizon + new_delay:
logger.warning(
"[GET_ACTIONS] cfg.action_queue_size_to_get_new_actions Too small, It should be higher than inference delay + execution horizon."
)
action_queue.merge(
original_actions, postprocessed_actions, new_delay, action_index_before_inference
)
else:
# Small sleep to prevent busy waiting
time.sleep(0.1)
logger.info("[GET_ACTIONS] get actions thread shutting down")
except Exception as e:
logger.error(f"[GET_ACTIONS] Fatal exception in get_actions thread: {e}")
logger.error(traceback.format_exc())
sys.exit(1)
def actor_control(
robot: RobotWrapper,
robot_action_processor,
action_queue: ActionQueue,
shutdown_event: Event,
cfg: RTCDemoConfig,
):
"""Thread function to execute actions on the robot.
Args:
robot: The robot instance
action_queue: Queue to get actions from
shutdown_event: Event to signal shutdown
cfg: Demo configuration
"""
try:
logger.info("[ACTOR] Starting actor thread")
action_keys = [k for k in robot.action_features() if k.endswith(".pos")]
action_count = 0
interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
action_interval = interpolator.get_control_interval(cfg.fps)
while not shutdown_event.is_set():
start_time = time.perf_counter()
if interpolator.needs_new_action():
new_action = action_queue.get()
if new_action is not None:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action is not None:
action = action.cpu()
action_dict = {key: action[i].item() for i, key in enumerate(action_keys)}
action_processed = robot_action_processor((action_dict, None))
robot.send_action(action_processed)
action_count += 1
dt_s = time.perf_counter() - start_time
time.sleep(max(0, (action_interval - dt_s) - 0.001))
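# Sleep for the remainder of the control interval, shaving ~1 ms (presumably to absorb
# scheduling jitter) so the send rate stays close to the interpolated control rate.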
logger.info(f"[ACTOR] Actor thread shutting down. Total actions executed: {action_count}")
except Exception as e:
logger.error(f"[ACTOR] Fatal exception in actor_control thread: {e}")
logger.error(traceback.format_exc())
sys.exit(1)
def _apply_torch_compile(policy, cfg: RTCDemoConfig):
"""Apply torch.compile to the policy's predict_action_chunk method.
Args:
policy: Policy instance to compile
cfg: Configuration containing torch compile settings
Returns:
Policy with compiled predict_action_chunk method
"""
# PI models handle their own compilation
if policy.type == "pi05" or policy.type == "pi0":
return policy
try:
# Check if torch.compile is available (PyTorch 2.0+)
if not hasattr(torch, "compile"):
logger.warning(
f"torch.compile is not available. Requires PyTorch 2.0+. "
f"Current version: {torch.__version__}. Skipping compilation."
)
return policy
logger.info("Applying torch.compile to predict_action_chunk...")
logger.info(f" Backend: {cfg.torch_compile_backend}")
logger.info(f" Mode: {cfg.torch_compile_mode}")
logger.info(f" Disable CUDA graphs: {cfg.torch_compile_disable_cudagraphs}")
# Compile the predict_action_chunk method
# - CUDA graphs disabled to prevent tensor aliasing from in-place ops (x_t += dt * v_t)
compile_kwargs = {
"backend": cfg.torch_compile_backend,
"mode": cfg.torch_compile_mode,
}
# Disable CUDA graphs if requested (prevents tensor aliasing issues)
if cfg.torch_compile_disable_cudagraphs:
compile_kwargs["options"] = {"triton.cudagraphs": False}
original_method = policy.predict_action_chunk
compiled_method = torch.compile(original_method, **compile_kwargs)
policy.predict_action_chunk = compiled_method
logger.info("✓ Successfully compiled predict_action_chunk")
except Exception as e:
logger.error(f"Failed to apply torch.compile: {e}")
logger.warning("Continuing without torch.compile")
return policy
@parser.wrap()
def demo_cli(cfg: RTCDemoConfig):
"""Main entry point for RTC demo with draccus configuration."""
# Initialize logging
init_logging()
logger.info(f"Using device: {cfg.device}")
# Setup signal handler for graceful shutdown
signal_handler = ProcessSignalHandler(use_threads=True, display_pid=False)
shutdown_event = signal_handler.shutdown_event
policy = None
robot = None
get_actions_thread = None
actor_thread = None
policy_class = get_policy_class(cfg.policy.type)
# Load config and set compile_model for pi0/pi05 models
config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
if cfg.policy.type == "pi05" or cfg.policy.type == "pi0":
config.compile_model = cfg.use_torch_compile
if config.use_peft:
from peft import PeftConfig, PeftModel
peft_pretrained_path = cfg.policy.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
policy = policy_class.from_pretrained(
pretrained_name_or_path=peft_config.base_model_name_or_path, config=config
)
policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
else:
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)
# Turn on RTC
policy.config.rtc_config = cfg.rtc
# Initialize the RTC processor explicitly: by default it is not created
# when RTC is disabled in the config.
policy.init_rtc_processor()
assert policy.name in ["smolvla", "pi05", "pi0"], "Only smolvla, pi05, and pi0 are supported for RTC"
policy = policy.to(cfg.device)
policy.eval()
# Apply torch.compile to predict_action_chunk method if enabled
if cfg.use_torch_compile:
policy = _apply_torch_compile(policy, cfg)
# Create robot
logger.info(f"Initializing robot: {cfg.robot.type}")
robot = make_robot_from_config(cfg.robot)
robot.connect()
robot_wrapper = RobotWrapper(robot)
# Create robot observation processor
robot_observation_processor = make_default_robot_observation_processor()
robot_action_processor = make_default_robot_action_processor()
# Create action queue for communication between threads
action_queue = ActionQueue(cfg.rtc)
# Start chunk requester thread
get_actions_thread = Thread(
target=get_actions,
args=(policy, robot_wrapper, robot_observation_processor, action_queue, shutdown_event, cfg),
daemon=True,
name="GetActions",
)
get_actions_thread.start()
logger.info("Started get actions thread")
# Start action executor thread
actor_thread = Thread(
target=actor_control,
args=(robot_wrapper, robot_action_processor, action_queue, shutdown_event, cfg),
daemon=True,
name="Actor",
)
actor_thread.start()
logger.info("Started actor thread")
logger.info("Started stop by duration thread")
# Main thread monitors for duration or shutdown
logger.info(f"Running demo for {cfg.duration} seconds...")
start_time = time.time()
while not shutdown_event.is_set() and (time.time() - start_time) < cfg.duration:
time.sleep(10)
# Log queue status periodically
if int(time.time() - start_time) % 5 == 0:
logger.info(f"[MAIN] Action queue size: {action_queue.qsize()}")
if time.time() - start_time > cfg.duration:
break
logger.info("Demo duration reached or shutdown requested")
# Signal shutdown
shutdown_event.set()
# Wait for threads to finish
if get_actions_thread and get_actions_thread.is_alive():
logger.info("Waiting for chunk requester thread to finish...")
get_actions_thread.join()
if actor_thread and actor_thread.is_alive():
logger.info("Waiting for action executor thread to finish...")
actor_thread.join()
# Cleanup robot
if robot:
robot.disconnect()
logger.info("Robot disconnected")
logger.info("Cleanup completed")
if __name__ == "__main__":
demo_cli()
logging.info("RTC demo finished")
+32 -63
@@ -14,17 +14,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.configs import FeatureType, PolicyFeature
from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
from lerobot.model.kinematics import RobotKinematics
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import (
RobotProcessorPipeline,
make_default_teleop_action_processor,
@@ -38,12 +34,11 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.scripts.lerobot_record import record_loop
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.feature_utils import combine_feature_dicts
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
from lerobot.utils.visualization_utils import init_rerun
NUM_EPISODES = 5
FPS = 30
@@ -54,9 +49,6 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
@@ -151,67 +143,43 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_joints_to_ee_pose_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot (EE -> joints via IK)
robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
log_say("Waiting for environment reset, press right arrow key when ready...")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
if events["rerecord_episode"]:
log_say("Re-record episode")
@@ -222,6 +190,7 @@ def main():
# Save episode
dataset.save_episode()
episode_idx += 1
finally:
# Clean up
log_say("Stop recording")
+17 -15
@@ -62,20 +62,21 @@ def main():
follower = SO100Follower(follower_config)
leader = SO100Leader(leader_config)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
follower_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(follower.bus.motors.keys()),
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()),
)
# Build pipeline to convert follower joints to EE observation.
# Build pipeline to convert follower joints to EE observation
follower_joints_to_ee = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
@@ -86,7 +87,7 @@ def main():
to_output=transition_to_observation,
)
# Build pipeline to convert leader joints to EE action.
# Build pipeline to convert leader joints to EE action
leader_joints_to_ee = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
ForwardKinematicsJointsToEE(
@@ -97,9 +98,9 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to follower joints (with safety bounds).
# Build pipeline to convert EE action to follower joints
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
[
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
@@ -114,12 +115,13 @@ def main():
to_output=transition_to_robot_action,
)
# Create the dataset, deriving features from the pipelines so the on-disk schema
# matches exactly what the pipelines produce at runtime.
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=leader_joints_to_ee,
initial_features=create_initial_features(action=leader.action_features),
@@ -142,7 +144,7 @@ def main():
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="recording_so100_ee")
init_rerun(session_name="recording_phone")
try:
if not leader.is_connected or not follower.is_connected:
@@ -158,14 +160,14 @@ def main():
robot=follower,
events=events,
fps=FPS,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
teleop=leader,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
)
# Reset the environment if not stopping or re-recording
@@ -177,13 +179,13 @@ def main():
robot=follower,
events=events,
fps=FPS,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
teleop=leader,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
)
if events["rerecord_episode"]:
-134
@@ -1,134 +0,0 @@
# !/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained EE-space policy on SO100 without recording (base rollout).
Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
no dataset) with :class:`SyncInferenceConfig` (inline policy call per
control tick). The custom observation/action processors convert between
joint space (robot hardware) and end-effector space (policy I/O) via
forward/inverse kinematics.
"""
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.configs import PreTrainedConfig
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor import (
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
# Robot configuration — the rollout engine will connect it inside build_rollout_context.
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
# Kinematic solver: we need the motor-name list, so peek at the robot once.
# (The rollout engine owns the connected instance; we only use this for introspection.)
temp_robot = SO100Follower(robot_config)
motor_names = list(temp_robot.bus.motors.keys())
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=motor_names,
)
# Joint-space observation → EE-space observation (consumed by the policy).
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# EE-space action (produced by the policy) → joint-space action (sent to robot).
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=motor_names,
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Policy config (full model is loaded inside build_rollout_context).
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
signal_handler = ProcessSignalHandler(use_threads=True)
# Pass the EE kinematic processors via kwargs; the defaults (identity) would
# otherwise skip the joint↔EE conversion and the policy would receive the
# wrong observation/action space.
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
+170
@@ -0,0 +1,170 @@
# !/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""SO100 leader / follower teleop with HIL-SERL-style intervention toggle.
This is a position-only standalone demo of the leader-arm intervention pattern
used by the HIL-SERL training stack (see ``lerobot.processor.LeaderArmInterventionStep``
and ``lerobot.teleoperators.so_leader.SOLeaderFollower``).
Behaviour:
* **Following mode** (default): The follower is idle, the leader is
torque-enabled and haptically tracks the follower's pose. The user can
grab the leader at any time without fighting the position loop.
* **Intervention mode** (toggled by pressing SPACE): The leader's torque is
released, the user moves the leader freely and the follower mirrors the
leader's end-effector position via ``[delta_x, delta_y, delta_z]`` deltas,
identical to how the real HIL-SERL action pipeline records interventions.
Keyboard:
* ``SPACE`` -- toggle intervention on/off.
* ``q`` -- exit the loop cleanly.
"""
from __future__ import annotations
import time
import numpy as np
from lerobot.model.kinematics import RobotKinematics
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.teleoperators.so_leader import SOLeaderFollower, SOLeaderTeleopConfig
from lerobot.teleoperators.utils import TeleopEvents
from lerobot.utils.robot_utils import precise_sleep
FPS = 30
# Per-axis EE-delta normalization (metres). Same convention as
# `LeaderArmInterventionStep`: the normalised delta is `(p_leader - p_follower) / step`,
# clipped to [-1, 1]. Keep these small so a single tick is a safe motion.
EE_STEP_SIZES = {"x": 0.010, "y": 0.010, "z": 0.010}
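# Worked example: a 25 mm leader/follower offset along x with a 0.010 m step normalises to
# 2.5 and clips to 1.0, so the follower moves at most 10 mm along x in a single tick.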
# Workspace bounds (metres) -- a tight box around the resting pose to keep the
# follower from running into its joint limits during the demo.
EE_BOUNDS = {"min": np.array([-0.20, -0.30, 0.02]), "max": np.array([0.30, 0.30, 0.40])}
URDF_PATH = "./SO101/so101_new_calib.urdf"
TARGET_FRAME = "gripper_frame_link"
def _joints_dict_to_array(joints: dict[str, float], motor_names: list[str]) -> np.ndarray:
return np.array([joints[f"{m}.pos"] for m in motor_names], dtype=float)
def _array_to_joints_dict(arr: np.ndarray, motor_names: list[str]) -> dict[str, float]:
return {f"{m}.pos": float(v) for m, v in zip(motor_names, arr, strict=True)}
def main() -> None:
follower_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_follower_arm", use_degrees=True
)
leader_config = SOLeaderTeleopConfig(
port="/dev/tty.usbmodem5A460819811",
id="my_leader_arm",
use_degrees=True,
leader_follower_mode=True,
use_gripper=True,
)
follower = SO100Follower(follower_config)
leader = SOLeaderFollower(leader_config)
follower_motor_names = list(follower.bus.motors.keys())
leader_motor_names = list(leader.bus.motors.keys())
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
follower_kinematics = RobotKinematics(
urdf_path=URDF_PATH, target_frame_name=TARGET_FRAME, joint_names=follower_motor_names
)
leader_kinematics = RobotKinematics(
urdf_path=URDF_PATH, target_frame_name=TARGET_FRAME, joint_names=leader_motor_names
)
follower.connect()
leader.connect()
print("Starting leader-follower intervention demo...")
print(" - Press SPACE to toggle intervention.")
print(" - Press 'q' to exit.")
try:
while True:
t0 = time.perf_counter()
# 1. Read both arms.
follower_obs = follower.get_observation()
follower_joints_dict = {f"{m}.pos": float(follower_obs[f"{m}.pos"]) for m in follower_motor_names}
leader_joints_dict = leader.get_action()
# 2. Haptic follow: push follower joints back to the leader. The
# leader's `send_action` gates motor writes on its intervention
# state internally (torque on while following, off while intervening).
leader.send_action(follower_joints_dict)
# 3. Pull teleop events (SPACE toggle, 'q' terminate).
events = leader.get_teleop_events()
if events.get(TeleopEvents.TERMINATE_EPISODE):
print("Termination requested -- exiting.")
break
is_intervention = events.get(TeleopEvents.IS_INTERVENTION, False)
if is_intervention:
# 4a. Compute leader/follower EE poses, take the *normalised
# position-only delta*, and integrate it onto the follower's
# current EE pose to get a target. This mirrors the action
# space recorded by `LeaderArmInterventionStep` during HIL-SERL.
leader_arr = _joints_dict_to_array(leader_joints_dict, leader_motor_names)
follower_arr = _joints_dict_to_array(follower_joints_dict, follower_motor_names)
p_leader = leader_kinematics.forward_kinematics(leader_arr)[:3, 3]
p_follower_mat = follower_kinematics.forward_kinematics(follower_arr)
p_follower = p_follower_mat[:3, 3]
raw_delta = p_leader - p_follower
step_vec = np.array([EE_STEP_SIZES["x"], EE_STEP_SIZES["y"], EE_STEP_SIZES["z"]], dtype=float)
delta_norm = np.clip(raw_delta / step_vec, -1.0, 1.0)
delta_m = delta_norm * step_vec
target_pose = p_follower_mat.copy()
target_pose[:3, 3] = np.clip(p_follower + delta_m, EE_BOUNDS["min"], EE_BOUNDS["max"])
# IK -> joint-space goal for the follower's arm chain. The
# gripper joint is kept separate and driven from the leader's
# gripper position directly (no IK).
target_joints = follower_kinematics.inverse_kinematics(
current_joint_pos=follower_arr,
desired_ee_pose=target_pose,
orientation_weight=0.0,
)
follower_action = _array_to_joints_dict(target_joints, follower_motor_names)
follower_action["gripper.pos"] = float(leader_joints_dict.get("gripper.pos", 50.0))
follower.send_action(follower_action)
# 4b. Following mode: leave the follower alone -- the leader just
# tracks it haptically. In real HIL-SERL training this is where the
# policy would step the follower forward.
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
finally:
leader.disconnect()
follower.disconnect()
if __name__ == "__main__":
main()
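For readers skimming step 4a above: the action integrated each tick is the leader/follower position error, normalised per axis by EE_STEP_SIZES and clipped to [-1, 1], so the follower never moves more than one step size per loop. A minimal numeric sketch of that math (numpy only; the poses are made-up illustration values, not outputs of the script):
import numpy as np
# Made-up poses: leader EE 3 cm ahead in x and 5 mm above the follower.
p_leader = np.array([0.230, 0.000, 0.105])
p_follower = np.array([0.200, 0.000, 0.100])
step_vec = np.array([0.010, 0.010, 0.010])  # EE_STEP_SIZES per axis, in metres
raw_delta = p_leader - p_follower                      # [0.030, 0.000, 0.005]
delta_norm = np.clip(raw_delta / step_vec, -1.0, 1.0)  # [1.0, 0.0, 0.5] -> the recorded action
delta_m = delta_norm * step_vec                        # [0.010, 0.000, 0.005] applied this tick
target = np.clip(p_follower + delta_m, np.array([-0.20, -0.30, 0.02]), np.array([0.30, 0.30, 0.40]))
The x error (3 cm) saturates at one full step (1 cm), while the z error (5 mm) passes through at half a step; the final np.clip keeps the target inside EE_BOUNDS.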
+25 -22
@@ -4,13 +4,13 @@ from pathlib import Path
from queue import Empty, Full
import torch
import torch.optim as optim
from lerobot.datasets import LeRobotDataset
from lerobot.envs.configs import HILSerlProcessorConfig, HILSerlRobotEnvConfig
from lerobot.policies import SACConfig
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
from lerobot.policies import GaussianActorConfig
from lerobot.policies.gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy
from lerobot.policies.gaussian_actor.reward_model.modeling_classifier import Classifier
from lerobot.rl.algorithms.sac import SACAlgorithm, SACAlgorithmConfig
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.gym_manipulator import make_robot_env
from lerobot.robots.so_follower import SO100FollowerConfig
@@ -28,7 +28,7 @@ def run_learner(
transitions_queue: mp.Queue,
parameters_queue: mp.Queue,
shutdown_event: mp.Event,
policy_learner: SACPolicy,
policy_learner: GaussianActorPolicy,
online_buffer: ReplayBuffer,
offline_buffer: ReplayBuffer,
lr: float = 3e-4,
@@ -40,8 +40,9 @@ def run_learner(
policy_learner.train()
policy_learner.to(device)
# Create Adam optimizer from scratch - simple and clean
optimizer = optim.Adam(policy_learner.parameters(), lr=lr)
algo_config = SACAlgorithmConfig.from_policy_config(policy_learner.config)
algorithm = SACAlgorithm(policy=policy_learner, config=algo_config)
algorithm.make_optimizers_and_scheduler()
print(f"[LEARNER] Online buffer capacity: {online_buffer.capacity}")
print(f"[LEARNER] Offline buffer capacity: {offline_buffer.capacity}")
@@ -83,24 +84,26 @@ def run_learner(
else:
batch[key] = online_batch[key]
loss, _ = policy_learner.forward(batch)
def batch_iter(b=batch):
while True:
yield b
optimizer.zero_grad()
loss.backward()
optimizer.step()
stats = algorithm.update(batch_iter())
training_step += 1
if training_step % LOG_EVERY == 0:
log_dict = stats.to_log_dict()
print(
f"[LEARNER] Training step {training_step}, Loss: {loss.item():.4f}, "
f"[LEARNER] Training step {training_step}, "
f"critic_loss: {log_dict.get('critic', 'N/A'):.4f}, "
f"Buffers: Online={len(online_buffer)}, Offline={len(offline_buffer)}"
)
# Send updated parameters to actor every 10 training steps
if training_step % SEND_EVERY == 0:
try:
state_dict = {k: v.cpu() for k, v in policy_learner.state_dict().items()}
parameters_queue.put_nowait(state_dict)
weights = algorithm.get_weights()
parameters_queue.put_nowait(weights)
print("[LEARNER] Sent updated parameters to actor")
except Full:
# Missing write due to queue not being consumed (should happen rarely)
@@ -113,7 +116,7 @@ def run_actor(
transitions_queue: mp.Queue,
parameters_queue: mp.Queue,
shutdown_event: mp.Event,
policy_actor: SACPolicy,
policy_actor: GaussianActorPolicy,
reward_classifier: Classifier,
env_cfg: HILSerlRobotEnvConfig,
device: torch.device = "mps",
@@ -144,15 +147,15 @@ def run_actor(
while step < MAX_STEPS_PER_EPISODE and not shutdown_event.is_set():
try:
new_params = parameters_queue.get_nowait()
policy_actor.load_state_dict(new_params)
new_weights = parameters_queue.get_nowait()
policy_actor.load_state_dict(new_weights)
print("[ACTOR] Updated policy parameters from learner")
except Empty: # No new updated parameters available from learner, waiting
pass
# Get action from policy
# Get action from policy (returns full action: continuous + discrete)
policy_obs = make_policy_obs(obs, device=device)
action_tensor = policy_actor.select_action(policy_obs) # predicts a single action
action_tensor = policy_actor.select_action(policy_obs)
action = action_tensor.squeeze(0).cpu().numpy()
# Step environment
@@ -261,14 +264,14 @@ def main():
action_features = hw_to_dataset_features(env.robot.action_features, "action")
# Create SAC policy for action selection
policy_cfg = SACConfig(
policy_cfg = GaussianActorConfig(
device=device,
input_features=obs_features,
output_features=action_features,
)
policy_actor = SACPolicy(policy_cfg)
policy_learner = SACPolicy(policy_cfg)
policy_actor = GaussianActorPolicy(policy_cfg)
policy_learner = GaussianActorPolicy(policy_cfg)
demonstrations_repo_id = "lerobot/example_hil_serl_dataset"
offline_dataset = LeRobotDataset(repo_id=demonstrations_repo_id)
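Taken together, the learner-side hunks above replace the hand-rolled Adam step (optimizer.zero_grad / loss.backward / optimizer.step) with the SACAlgorithm interface. A condensed sketch of the post-change loop, using only the calls visible in this diff (SACAlgorithmConfig.from_policy_config, make_optimizers_and_scheduler, update, get_weights); buffer sampling and step bookkeeping are stubbed with hypothetical names:
algo_config = SACAlgorithmConfig.from_policy_config(policy_learner.config)
algorithm = SACAlgorithm(policy=policy_learner, config=algo_config)
algorithm.make_optimizers_and_scheduler()
def batch_iter(b):
    # update() consumes an iterator of batches; re-yield the same batch, as in the hunk above.
    while True:
        yield b
for training_step in range(1, total_steps + 1):       # total_steps: placeholder
    batch = sample_online_offline_batch()             # hypothetical helper mixing both buffers
    stats = algorithm.update(batch_iter(batch))
    if training_step % SEND_EVERY == 0:
        parameters_queue.put_nowait(algorithm.get_weights())  # consumed by the actor's load_state_dict
stats.to_log_dict() then exposes per-model losses (e.g. the "critic" entry logged above) instead of a single scalar loss.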
+8 -15
@@ -108,9 +108,9 @@ training = [
"wandb>=0.24.0,<0.25.0",
]
hardware = [
"lerobot[pynput-dep]",
"lerobot[pyserial-dep]",
"lerobot[deepdiff-dep]",
"pynput>=1.7.8,<1.9.0",
"pyserial>=3.5,<4.0",
"deepdiff>=7.0.1,<9.0.0",
]
viz = [
"rerun-sdk>=0.24.0,<0.27.0",
@@ -136,14 +136,10 @@ scipy-dep = ["scipy>=1.14.0,<2.0.0"]
diffusers-dep = ["diffusers>=0.27.2,<0.36.0"]
qwen-vl-utils-dep = ["qwen-vl-utils>=0.0.11,<0.1.0"]
matplotlib-dep = ["matplotlib>=3.10.3,<4.0.0", "contourpy>=1.3.0,<2.0.0"] # NOTE: Explicitly listing contourpy helps the resolver converge faster.
pyserial-dep = ["pyserial>=3.5,<4.0"]
deepdiff-dep = ["deepdiff>=7.0.1,<9.0.0"]
pynput-dep = ["pynput>=1.7.8,<1.9.0"]
pyzmq-dep = ["pyzmq>=26.2.1,<28.0.0"]
# Motors
feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0", "lerobot[pyserial-dep]", "lerobot[deepdiff-dep]"]
dynamixel = ["dynamixel-sdk>=3.7.31,<3.9.0", "lerobot[pyserial-dep]", "lerobot[deepdiff-dep]"]
feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0"]
dynamixel = ["dynamixel-sdk>=3.7.31,<3.9.0"]
damiao = ["lerobot[can-dep]"]
robstride = ["lerobot[can-dep]"]
@@ -151,11 +147,10 @@ robstride = ["lerobot[can-dep]"]
openarms = ["lerobot[damiao]"]
gamepad = ["lerobot[pygame-dep]", "hidapi>=0.14.0,<0.15.0"]
hopejr = ["lerobot[feetech]", "lerobot[pygame-dep]"]
lekiwi = ["lerobot[feetech]", "lerobot[pyzmq-dep]"]
lekiwi = ["lerobot[feetech]", "pyzmq>=26.2.1,<28.0.0"]
unitree_g1 = [
# "unitree-sdk2==1.0.1",
"lerobot[pyzmq-dep]",
"lerobot[pyserial-dep]",
"pyzmq>=26.2.1,<28.0.0",
"onnxruntime>=1.16.0,<2.0.0",
"onnx>=1.16.0,<2.0.0",
"meshcat>=0.3.0,<0.4.0",
@@ -201,8 +196,7 @@ async = ["lerobot[grpcio-dep]", "lerobot[matplotlib-dep]"]
peft = ["lerobot[transformers-dep]", "lerobot[peft-dep]"]
# Development
dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1", "mypy>=1.19.1", "ruff>=0.14.1", "lerobot[notebook]"]
notebook = ["jupyter>=1.0.0,<2.0.0", "ipykernel>=6.0.0,<7.0.0"]
dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1", "mypy>=1.19.1", "ruff>=0.14.1"]
test = ["pytest>=8.1.0,<9.0.0", "pytest-timeout>=2.4.0,<3.0.0", "pytest-cov>=5.0.0,<8.0.0", "mock-serial>=0.0.1,<0.1.0 ; sys_platform != 'win32'"]
video_benchmark = ["scikit-image>=0.23.2,<0.26.0", "pandas>=2.2.2,<2.4.0"]
@@ -275,7 +269,6 @@ lerobot-find-joint-limits="lerobot.scripts.lerobot_find_joint_limits:main"
lerobot-imgtransform-viz="lerobot.scripts.lerobot_imgtransform_viz:main"
lerobot-edit-dataset="lerobot.scripts.lerobot_edit_dataset:main"
lerobot-setup-can="lerobot.scripts.lerobot_setup_can:main"
lerobot-rollout="lerobot.scripts.lerobot_rollout:main"
# ---------------- Tool Configurations ----------------
[tool.setuptools.package-data]
@@ -33,7 +33,7 @@ import cv2 # type: ignore # TODO: add type stubs for OpenCV
import numpy as np # type: ignore # TODO: add type stubs for numpy
from lerobot.utils.decorators import check_if_not_connected
from lerobot.utils.import_utils import _reachy2_sdk_available, require_package
from lerobot.utils.import_utils import _reachy2_sdk_available
if TYPE_CHECKING or _reachy2_sdk_available:
from reachy2_sdk.media.camera import CameraView
@@ -76,7 +76,6 @@ class Reachy2Camera(Camera):
Args:
config: The configuration settings for the camera.
"""
require_package("reachy2_sdk", extra="reachy2")
super().__init__(config)
self.config = config
@@ -19,18 +19,16 @@ Provides the RealSenseCamera class for capturing frames from Intel RealSense cam
import logging
import time
from threading import Event, Lock, Thread
from typing import TYPE_CHECKING, Any
from typing import Any
import cv2 # type: ignore # TODO: add type stubs for OpenCV
import numpy as np # type: ignore # TODO: add type stubs for numpy
from numpy.typing import NDArray # type: ignore # TODO: add type stubs for numpy.typing
from lerobot.utils.import_utils import _pyrealsense2_available, require_package
if TYPE_CHECKING or _pyrealsense2_available:
import pyrealsense2 as rs
else:
rs = None
try:
import pyrealsense2 as rs # type: ignore # TODO: add type stubs for pyrealsense2
except Exception as e:
logging.info(f"Could not import realsense: {e}")
from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
from lerobot.utils.errors import DeviceNotConnectedError
@@ -114,7 +112,7 @@ class RealSenseCamera(Camera):
Args:
config: The configuration settings for the camera.
"""
require_package("pyrealsense2", extra="intelrealsense")
super().__init__(config)
self.config = config
+9 -11
@@ -28,19 +28,12 @@ import json
import logging
import time
from threading import Event, Lock, Thread
from typing import TYPE_CHECKING, Any
from typing import Any
import cv2
import numpy as np
from numpy.typing import NDArray
from lerobot.utils.import_utils import _zmq_available, require_package
if TYPE_CHECKING or _zmq_available:
import zmq
else:
zmq = None
from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
from lerobot.utils.errors import DeviceNotConnectedError
@@ -81,8 +74,8 @@ class ZMQCamera(Camera):
"""
def __init__(self, config: ZMQCameraConfig):
require_package("pyzmq", extra="pyzmq-dep", import_name="zmq")
super().__init__(config)
import zmq
self.config = config
self.server_address = config.server_address
@@ -124,6 +117,8 @@ class ZMQCamera(Camera):
logger.info(f"Connecting to {self}...")
try:
import zmq
self.context = zmq.Context()
self.socket = self.context.socket(zmq.SUB)
self.socket.setsockopt_string(zmq.SUBSCRIBE, "")
@@ -185,8 +180,11 @@ class ZMQCamera(Camera):
try:
message = self.socket.recv_string()
except zmq.Again as e:
raise TimeoutError(f"{self} timeout after {self.timeout_ms}ms") from e
except Exception as e:
# zmq is lazy-imported in connect(), so check by name to avoid a top-level import
if type(e).__name__ == "Again":
raise TimeoutError(f"{self} timeout after {self.timeout_ms}ms") from e
raise
# Decode JSON message
data = json.loads(message)
+4 -7
@@ -28,12 +28,6 @@ import numpy as np
import torch
from lerobot.policies import PreTrainedPolicy, prepare_observation_for_inference
from lerobot.utils.import_utils import _deepdiff_available, require_package
if TYPE_CHECKING or _deepdiff_available:
from deepdiff import DeepDiff
else:
DeepDiff = None
if TYPE_CHECKING:
from lerobot.datasets import LeRobotDataset
@@ -223,7 +217,10 @@ def sanity_check_dataset_robot_compatibility(
Raises:
ValueError: If any of the checked metadata fields do not match.
"""
require_package("deepdiff", extra="deepdiff-dep")
from lerobot.utils.import_utils import require_package
require_package("deepdiff", extra="hardware")
from deepdiff import DeepDiff
from lerobot.utils.constants import DEFAULT_FEATURES
+1
@@ -99,6 +99,7 @@ def save_checkpoint(
optimizer (Optimizer | None, optional): The optimizer to save the state from. Defaults to None.
scheduler (LRScheduler | None, optional): The scheduler to save the state from. Defaults to None.
preprocessor: The preprocessor/pipeline to save. Defaults to None.
postprocessor: The postprocessor/pipeline to save. Defaults to None.
"""
pretrained_dir = checkpoint_dir / PRETRAINED_MODEL_DIR
policy.save_pretrained(pretrained_dir)
-2
@@ -21,7 +21,6 @@ are intentionally NOT re-exported here to avoid circular dependencies
Import them directly: ``from lerobot.configs.train import TrainPipelineConfig``
"""
from .dataset import DatasetRecordConfig
from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
from .policies import PreTrainedConfig
from .types import (
@@ -40,7 +39,6 @@ __all__ = [
"PolicyFeature",
"RTCAttentionSchedule",
# Config classes
"DatasetRecordConfig",
"DatasetConfig",
"EvalConfig",
"PeftConfig",
-77
@@ -1,77 +0,0 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Shared dataset recording configuration used by both ``lerobot-record`` and ``lerobot-rollout``."""
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
@dataclass
class DatasetRecordConfig:
# Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
repo_id: str = ""
# A short but accurate description of the task performed during the recording (e.g. "Pick the Lego block and drop it in the box on the right.")
single_task: str = ""
# Root directory where the dataset will be stored (e.g. 'dataset/path'). If None, defaults to $HF_LEROBOT_HOME/repo_id.
root: str | Path | None = None
# Limit the frames per second.
fps: int = 30
# Number of seconds for data recording for each episode.
episode_time_s: int | float = 60
# Number of seconds for resetting the environment after each episode.
reset_time_s: int | float = 60
# Number of episodes to record.
num_episodes: int = 50
# Encode frames in the dataset into video
video: bool = True
# Upload dataset to Hugging Face hub.
push_to_hub: bool = True
# Upload on private repository on the Hugging Face hub.
private: bool = False
# Add tags to your dataset on the hub.
tags: list[str] | None = None
# Number of subprocesses handling the saving of frames as PNG. Set to 0 to use threads only;
# set to ≥1 to use subprocesses, each using threads to write images. The best number of processes
# and threads depends on your system. We recommend 4 threads per camera with 0 processes.
# If fps is unstable, adjust the thread count. If still unstable, try using 1 or more subprocesses.
num_image_writer_processes: int = 0
# Number of threads writing the frames as png images on disk, per camera.
# Too many threads might cause unstable teleoperation fps due to main thread being blocked.
# Not enough threads might cause low camera fps.
num_image_writer_threads_per_camera: int = 4
# Number of episodes to record before batch encoding videos
# Set to 1 for immediate encoding (default behavior), or higher for batched encoding
video_encoding_batch_size: int = 1
# Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto',
# or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'.
# Use 'auto' to auto-detect the best available hardware encoder.
vcodec: str = "libsvtav1"
# Enable streaming video encoding: encode frames in real-time during capture instead
# of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
streaming_encoding: bool = False
# Maximum number of frames to buffer per camera when using streaming encoding.
# ~1s buffer at 30fps. Provides backpressure if the encoder can't keep up.
encoder_queue_maxsize: int = 30
# Number of threads per encoder instance. None = auto (codec default).
# Lower values reduce CPU usage, maps to 'lp' (via svtav1-params) for libsvtav1 and 'threads' for h264/hevc..
encoder_threads: int | None = None
# Rename map for the observation to override the image and state keys
rename_map: dict[str, str] = field(default_factory=dict)
def __post_init__(self) -> None:
if self.repo_id:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
self.repo_id = f"{self.repo_id}_{timestamp}"
-3
@@ -35,9 +35,6 @@ class DatasetConfig:
revision: str | None = None
use_imagenet_stats: bool = True
video_backend: str = field(default_factory=get_safe_default_codec)
# When True, video frames are returned as uint8 tensors (0-255) instead of float32 (0.0-1.0).
# This reduces memory and speeds up DataLoader IPC. The training pipeline handles the conversion.
return_uint8: bool = False
streaming: bool = False
def __post_init__(self) -> None:
-9
@@ -56,8 +56,6 @@ class TrainPipelineConfig(HubMixin):
# Number of workers for the dataloader.
num_workers: int = 4
batch_size: int = 8
prefetch_factor: int = 4
persistent_workers: bool = True
steps: int = 100_000
eval_freq: int = 20_000
log_freq: int = 200
@@ -209,10 +207,3 @@ class TrainPipelineConfig(HubMixin):
cli_args = kwargs.pop("cli_args", [])
with draccus.config_type("json"):
return draccus.parse(cls, config_file, args=cli_args)
@dataclass(kw_only=True)
class TrainRLServerPipelineConfig(TrainPipelineConfig):
# NOTE: In RL, we don't need an offline dataset
# TODO: Make `TrainPipelineConfig.dataset` optional
dataset: DatasetConfig | None = None # type: ignore[assignment] # because the parent class has made it's type non-optional
+10 -25
@@ -16,7 +16,6 @@
"""Private reader component for LeRobotDataset. Handles random-access reading (HF dataset, delta indices, video decoding)."""
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import datasets
@@ -50,7 +49,6 @@ class DatasetReader:
video_backend: str,
delta_timestamps: dict[str, list[float]] | None,
image_transforms: Callable | None,
return_uint8: bool = False,
):
"""Initialize the reader with metadata, filtering, and transform config.
@@ -75,7 +73,6 @@ class DatasetReader:
self._tolerance_s = tolerance_s
self._video_backend = video_backend
self._image_transforms = image_transforms
self._return_uint8 = return_uint8
self.hf_dataset: datasets.Dataset | None = None
self._absolute_to_relative_idx: dict[int, int] | None = None
@@ -108,8 +105,10 @@ class DatasetReader:
"""Build absolute-to-relative index mapping from loaded hf_dataset."""
self._absolute_to_relative_idx = None
if self.episodes is not None and self.hf_dataset is not None:
indices = self.hf_dataset.data.column("index").to_numpy()
self._absolute_to_relative_idx = dict(zip(indices.tolist(), range(len(indices)), strict=True))
self._absolute_to_relative_idx = {
abs_idx.item() if isinstance(abs_idx, torch.Tensor) else abs_idx: rel_idx
for rel_idx, abs_idx in enumerate(self.hf_dataset["index"])
}
@property
def num_frames(self) -> int:
@@ -236,30 +235,16 @@ class DatasetReader:
Segmentation Fault.
"""
ep = self._meta.episodes[ep_idx]
def _decode_single(vid_key: str, query_ts: list[float]) -> tuple[str, torch.Tensor]:
item = {}
for vid_key, query_ts in query_timestamps.items():
from_timestamp = ep[f"videos/{vid_key}/from_timestamp"]
shifted_query_ts = [from_timestamp + ts for ts in query_ts]
video_path = self.root / self._meta.get_video_file_path(ep_idx, vid_key)
frames = decode_video_frames(
video_path,
shifted_query_ts,
self._tolerance_s,
self._video_backend,
return_uint8=self._return_uint8,
)
return vid_key, frames.squeeze(0)
frames = decode_video_frames(video_path, shifted_query_ts, self._tolerance_s, self._video_backend)
item[vid_key] = frames.squeeze(0)
items = list(query_timestamps.items())
# Single camera: no threading overhead
if len(items) <= 1:
return {vid_key: _decode_single(vid_key, query_ts)[1] for vid_key, query_ts in items}
# Multi-camera: decode in parallel (video decoding releases the GIL)
with ThreadPoolExecutor(max_workers=len(items)) as pool:
futures = [pool.submit(_decode_single, k, ts) for k, ts in items]
return dict(f.result() for f in futures)
return item
def get_item(self, idx) -> dict:
"""Core __getitem__ logic. Assumes hf_dataset is loaded.
+1 -1
@@ -597,7 +597,7 @@ class DatasetWriter:
def cleanup_interrupted_episode(self, episode_index: int) -> None:
"""Remove temporary image directories for an interrupted episode."""
for key in self._meta.camera_keys:
for key in self._meta.video_keys:
img_dir = self._get_image_file_path(
episode_index=episode_index, image_key=key, frame_index=0
).parent
-2
@@ -92,7 +92,6 @@ def make_dataset(cfg: TrainPipelineConfig) -> LeRobotDataset | MultiLeRobotDatas
image_transforms=image_transforms,
revision=cfg.dataset.revision,
video_backend=cfg.dataset.video_backend,
return_uint8=True,
tolerance_s=cfg.tolerance_s,
)
else:
@@ -105,7 +104,6 @@ def make_dataset(cfg: TrainPipelineConfig) -> LeRobotDataset | MultiLeRobotDatas
revision=cfg.dataset.revision,
max_num_shards=cfg.num_workers,
tolerance_s=cfg.tolerance_s,
return_uint8=True,
)
else:
raise NotImplementedError("The MultiLeRobotDataset isn't supported for now.")
+2 -2
@@ -30,13 +30,13 @@ def safe_stop_image_writer(func):
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except BaseException:
except Exception as e:
dataset = kwargs.get("dataset")
writer = getattr(dataset, "writer", None) if dataset else None
if writer is not None and writer.image_writer is not None:
logger.warning("Waiting for image writer to terminate...")
writer.image_writer.stop()
raise
raise e
return wrapper
-6
@@ -56,7 +56,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
force_cache_sync: bool = False,
download_videos: bool = True,
video_backend: str | None = None,
return_uint8: bool = False,
batch_encoding_size: int = 1,
vcodec: str = "libsvtav1",
streaming_encoding: bool = False,
@@ -203,7 +202,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
self.tolerance_s = tolerance_s
self.revision = revision if revision else CODEBASE_VERSION
self._video_backend = video_backend if video_backend else get_safe_default_codec()
self._return_uint8 = return_uint8
self._batch_encoding_size = batch_encoding_size
self._vcodec = resolve_vcodec(vcodec)
self._encoder_threads = encoder_threads
@@ -227,7 +225,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
video_backend=self._video_backend,
delta_timestamps=delta_timestamps,
image_transforms=image_transforms,
return_uint8=self._return_uint8,
)
# Load actual data
@@ -291,7 +288,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
video_backend=self._video_backend,
delta_timestamps=self.delta_timestamps,
image_transforms=self.image_transforms,
return_uint8=self._return_uint8,
)
return self.reader
@@ -687,7 +683,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj.delta_timestamps = None
obj.episodes = None
obj._video_backend = video_backend if video_backend is not None else get_safe_default_codec()
obj._return_uint8 = False
obj._batch_encoding_size = batch_encoding_size
obj._vcodec = vcodec
obj._encoder_threads = encoder_threads
@@ -780,7 +775,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj.delta_timestamps = None
obj.episodes = None
obj._video_backend = video_backend if video_backend else get_safe_default_codec()
obj._return_uint8 = False
obj._batch_encoding_size = batch_encoding_size
obj._vcodec = vcodec
obj._encoder_threads = encoder_threads
+1 -7
@@ -251,7 +251,6 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
seed: int = 42,
rng: np.random.Generator | None = None,
shuffle: bool = True,
return_uint8: bool = False,
):
"""Initialize a StreamingLeRobotDataset.
@@ -289,7 +288,6 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
self.streaming = streaming
self.buffer_size = buffer_size
self._return_uint8 = return_uint8
# We cache the video decoders to avoid re-initializing them at each frame (avoiding a ~10x slowdown)
self.video_decoder_cache = None
@@ -555,11 +553,7 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
root = self.meta.url_root if self.streaming and not self.streaming_from_local else self.root
video_path = f"{root}/{self.meta.get_video_file_path(ep_idx, video_key)}"
frames = decode_video_frames_torchcodec(
video_path,
query_ts,
self.tolerance_s,
decoder_cache=self.video_decoder_cache,
return_uint8=self._return_uint8,
video_path, query_ts, self.tolerance_s, decoder_cache=self.video_decoder_cache
)
item[video_key] = frames.squeeze(0) if len(query_ts) == 1 else frames
+2 -2
@@ -71,8 +71,8 @@ class ForwardCompatibilityError(CompatibilityError):
DEFAULT_CHUNK_SIZE = 1000 # Max number of files per chunk
DEFAULT_DATA_FILE_SIZE_IN_MB = 50 # Max size per file
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 100 # Max size per file
DEFAULT_DATA_FILE_SIZE_IN_MB = 100 # Max size per file
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 200 # Max size per file
INFO_PATH = "meta/info.json"
STATS_PATH = "meta/stats.json"
+10 -22
@@ -123,7 +123,6 @@ def decode_video_frames(
timestamps: list[float],
tolerance_s: float,
backend: str | None = None,
return_uint8: bool = False,
) -> torch.Tensor:
"""
Decodes video frames using the specified backend.
@@ -132,23 +131,19 @@ def decode_video_frames(
video_path (Path): Path to the video file.
timestamps (list[float]): List of timestamps to extract frames.
tolerance_s (float): Allowed deviation in seconds for frame retrieval.
backend (str, optional): Backend to use for decoding. Defaults to "torchcodec" when available in the platform; otherwise, defaults to "pyav".
return_uint8 (bool): If True, return raw uint8 frames without float32 normalization.
This reduces memory for DataLoader IPC; normalization can be done on GPU afterward.
backend (str, optional): Backend to use for decoding. Defaults to "torchcodec" when available in the platform; otherwise, defaults to "pyav".
Returns:
torch.Tensor: Decoded frames (float32 in [0,1] by default, or uint8 if return_uint8=True).
torch.Tensor: Decoded frames.
Currently supports torchcodec on cpu and pyav.
"""
if backend is None:
backend = get_safe_default_codec()
if backend == "torchcodec":
return decode_video_frames_torchcodec(video_path, timestamps, tolerance_s, return_uint8=return_uint8)
return decode_video_frames_torchcodec(video_path, timestamps, tolerance_s)
elif backend in ["pyav", "video_reader"]:
return decode_video_frames_torchvision(
video_path, timestamps, tolerance_s, backend, return_uint8=return_uint8
)
return decode_video_frames_torchvision(video_path, timestamps, tolerance_s, backend)
else:
raise ValueError(f"Unsupported video backend: {backend}")
@@ -159,7 +154,6 @@ def decode_video_frames_torchvision(
tolerance_s: float,
backend: str = "pyav",
log_loaded_timestamps: bool = False,
return_uint8: bool = False,
) -> torch.Tensor:
"""Loads frames associated to the requested timestamps of a video
@@ -246,17 +240,14 @@ def decode_video_frames_torchvision(
if log_loaded_timestamps:
logger.info(f"{closest_ts=}")
# convert to the pytorch format which is float32 in [0,1] range (and channel first)
closest_frames = closest_frames.type(torch.float32) / 255
if len(timestamps) != len(closest_frames):
raise FrameTimestampError(
f"Number of retrieved frames ({len(closest_frames)}) does not match "
f"number of queried timestamps ({len(timestamps)})"
)
if return_uint8:
return closest_frames
# convert to the pytorch format which is float32 in [0,1] range (and channel first)
closest_frames = closest_frames.type(torch.float32) / 255
return closest_frames
@@ -315,7 +306,6 @@ def decode_video_frames_torchcodec(
tolerance_s: float,
log_loaded_timestamps: bool = False,
decoder_cache: VideoDecoderCache | None = None,
return_uint8: bool = False,
) -> torch.Tensor:
"""Loads frames associated with the requested timestamps of a video using torchcodec.
@@ -383,16 +373,14 @@ def decode_video_frames_torchcodec(
if log_loaded_timestamps:
logger.info(f"{closest_ts=}")
# convert to float32 in [0,1] range
closest_frames = (closest_frames / 255.0).type(torch.float32)
if not len(timestamps) == len(closest_frames):
raise FrameTimestampError(
f"Retrieved timestamps differ from queried {set(closest_frames) - set(timestamps)}"
)
if return_uint8:
return closest_frames
# convert to float32 in [0,1] range
closest_frames = (closest_frames / 255.0).type(torch.float32)
return closest_frames
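With return_uint8 gone, every decode path above converges back on a single contract: frames come back as float32 in [0, 1], channel-first, one per queried timestamp. A minimal call sketch under that assumption; the import path and the clip name are placeholders, since the file being diffed is not named here:
import torch
from lerobot.datasets.video_utils import decode_video_frames  # import path assumed
frames = decode_video_frames(
    "episode_000000.mp4",   # placeholder clip
    [0.0, 1 / 30, 2 / 30],  # timestamps in seconds
    1e-4,                   # tolerance_s
    None,                   # backend: None -> torchcodec when available, else pyav
)
# Post-change contract: float32 frames normalised to [0, 1].
assert frames.dtype == torch.float32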
+7 -12
@@ -12,19 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from typing import TYPE_CHECKING
import numpy as np
from lerobot.utils.import_utils import _placo_available, require_package
if TYPE_CHECKING or _placo_available:
import placo # type: ignore[import-not-found]
else:
placo = None
class RobotKinematics:
"""Robot kinematics using placo library for forward and inverse kinematics."""
@@ -43,7 +32,13 @@ class RobotKinematics:
target_frame_name (str): Name of the end-effector frame in the URDF
joint_names (list[str] | None): List of joint names to use for the kinematics solver
"""
require_package("placo", extra="placo-dep")
try:
import placo # type: ignore[import-not-found] # C++ library with Python bindings, no type stubs available. TODO: Create stub file or request upstream typing support.
except ImportError as e:
raise ImportError(
"placo is required for RobotKinematics. "
"Please install the optional dependencies of `kinematics` in the package."
) from e
self.robot = placo.RobotWrapper(urdf_path)
self.solver = placo.KinematicsSolver(self.robot)
+1 -2
@@ -24,7 +24,7 @@ from functools import cached_property
from typing import TYPE_CHECKING, Any, TypedDict
from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
from lerobot.utils.import_utils import _can_available, require_package
from lerobot.utils.import_utils import _can_available
if TYPE_CHECKING or _can_available:
import can
@@ -111,7 +111,6 @@ class DamiaoMotorsBus(MotorsBusBase):
bitrate: Nominal bitrate in bps (default: 1000000 = 1 Mbps)
data_bitrate: Data bitrate for CAN FD in bps (default: 5000000 = 5 Mbps), ignored if use_can_fd is False
"""
require_package("python-can", extra="damiao", import_name="can")
super().__init__(port, motors, calibration)
self.port = port
self.can_interface = can_interface
+2 -2
@@ -356,8 +356,8 @@ class SerialMotorsBus(MotorsBusBase):
motors: dict[str, Motor],
calibration: dict[str, MotorCalibration] | None = None,
):
require_package("pyserial", extra="pyserial-dep", import_name="serial")
require_package("deepdiff", extra="deepdiff-dep")
require_package("pyserial", extra="hardware", import_name="serial")
require_package("deepdiff", extra="hardware")
super().__init__(port, motors, calibration)
self.port_handler: PortHandler
+2 -3
@@ -23,12 +23,12 @@ from types import SimpleNamespace
from typing import TYPE_CHECKING, Any, TypedDict
from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
from lerobot.utils.import_utils import _can_available, require_package
from lerobot.utils.import_utils import _can_available
if TYPE_CHECKING or _can_available:
import can
else:
can = SimpleNamespace(Message=object, interface=None, BusABC=object)
can = SimpleNamespace(Message=object, interface=None)
import numpy as np
from lerobot.utils.errors import DeviceNotConnectedError
@@ -106,7 +106,6 @@ class RobstrideMotorsBus(MotorsBusBase):
bitrate: Nominal bitrate in bps (default: 1000000 = 1 Mbps)
data_bitrate: Data bitrate for CAN FD in bps (default: 5000000 = 5 Mbps), ignored if use_can_fd is False
"""
require_package("python-can", extra="robstride", import_name="can")
super().__init__(port, motors, calibration)
self.port = port
self.can_interface = can_interface
+3 -7
@@ -18,21 +18,14 @@ import logging
import math
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import TYPE_CHECKING
import draccus
from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR, LRScheduler
from lerobot.utils.constants import SCHEDULER_STATE
from lerobot.utils.import_utils import _diffusers_available, require_package
from lerobot.utils.io_utils import deserialize_json_into_object, write_json
if TYPE_CHECKING or _diffusers_available:
from diffusers.optimization import get_scheduler
else:
get_scheduler = None
@dataclass
class LRSchedulerConfig(draccus.ChoiceRegistry, abc.ABC):
@@ -54,7 +47,10 @@ class DiffuserSchedulerConfig(LRSchedulerConfig):
num_warmup_steps: int | None = None
def build(self, optimizer: Optimizer, num_training_steps: int) -> LambdaLR:
from lerobot.utils.import_utils import require_package
require_package("diffusers", extra="diffusion")
from diffusers.optimization import get_scheduler
kwargs = {**asdict(self), "num_training_steps": num_training_steps, "optimizer": optimizer}
return get_scheduler(**kwargs)
+8 -7
@@ -12,19 +12,20 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from lerobot.utils.action_interpolator import ActionInterpolator as ActionInterpolator
from .act.configuration_act import ACTConfig as ACTConfig
from .diffusion.configuration_diffusion import DiffusionConfig as DiffusionConfig
from .factory import get_policy_class, make_policy, make_policy_config, make_pre_post_processors
from .gaussian_actor.configuration_gaussian_actor import GaussianActorConfig as GaussianActorConfig
from .gaussian_actor.reward_model.configuration_classifier import (
RewardClassifierConfig as RewardClassifierConfig,
)
from .groot.configuration_groot import GrootConfig as GrootConfig
from .multi_task_dit.configuration_multi_task_dit import MultiTaskDiTConfig as MultiTaskDiTConfig
from .pi0.configuration_pi0 import PI0Config as PI0Config
from .pi0_fast.configuration_pi0_fast import PI0FastConfig as PI0FastConfig
from .pi05.configuration_pi05 import PI05Config as PI05Config
from .pretrained import PreTrainedPolicy as PreTrainedPolicy
from .sac.configuration_sac import SACConfig as SACConfig
from .sac.reward_model.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
from .rtc import ActionInterpolator as ActionInterpolator
from .sarm.configuration_sarm import SARMConfig as SARMConfig
from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
from .tdmpc.configuration_tdmpc import TDMPCConfig as TDMPCConfig
@@ -33,21 +34,21 @@ from .vqbet.configuration_vqbet import VQBeTConfig as VQBeTConfig
from .wall_x.configuration_wall_x import WallXConfig as WallXConfig
from .xvla.configuration_xvla import XVLAConfig as XVLAConfig
# NOTE: Policy modeling classes (e.g., SACPolicy) are intentionally NOT re-exported here.
# NOTE: Policy modeling classes (e.g., GaussianActorPolicy) are intentionally NOT re-exported here.
# They have heavy optional dependencies and are loaded lazily via get_policy_class().
# Import directly: ``from lerobot.policies.sac.modeling_sac import SACPolicy``
# Import directly: ``from lerobot.policies.gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy``
__all__ = [
# Configuration classes
"ACTConfig",
"DiffusionConfig",
"GaussianActorConfig",
"GrootConfig",
"MultiTaskDiTConfig",
"PI0Config",
"PI0FastConfig",
"PI05Config",
"RewardClassifierConfig",
"SACConfig",
"SARMConfig",
"SmolVLAConfig",
"TDMPCConfig",
@@ -23,7 +23,6 @@ TODO(alexander-soare):
import math
from collections import deque
from collections.abc import Callable
from typing import TYPE_CHECKING
import einops
import numpy as np
@@ -33,14 +32,6 @@ import torchvision
from torch import Tensor, nn
from lerobot.utils.constants import ACTION, OBS_ENV_STATE, OBS_IMAGES, OBS_STATE
from lerobot.utils.import_utils import _diffusers_available, require_package
if TYPE_CHECKING or _diffusers_available:
from diffusers.schedulers.scheduling_ddim import DDIMScheduler
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
else:
DDIMScheduler = None
DDPMScheduler = None
from ..pretrained import PreTrainedPolicy
from ..utils import (
@@ -73,7 +64,6 @@ class DiffusionPolicy(PreTrainedPolicy):
dataset_stats: Dataset statistics to be used for normalization. If not passed here, it is expected
that they will be passed with a call to `load_state_dict` before the policy is used.
"""
require_package("diffusers", extra="diffusion")
super().__init__(config)
config.validate_features()
self.config = config
@@ -165,7 +155,11 @@ def _make_noise_scheduler(name: str, **kwargs: dict):
Factory for noise scheduler instances of the requested type. All kwargs are passed
to the scheduler.
"""
from lerobot.utils.import_utils import require_package
require_package("diffusers", extra="diffusion")
from diffusers.schedulers.scheduling_ddim import DDIMScheduler
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
if name == "DDPM":
return DDPMScheduler(**kwargs)
+14 -14
@@ -46,13 +46,13 @@ from lerobot.utils.feature_utils import dataset_to_policy_features
from .act.configuration_act import ACTConfig
from .diffusion.configuration_diffusion import DiffusionConfig
from .gaussian_actor.configuration_gaussian_actor import GaussianActorConfig
from .gaussian_actor.reward_model.configuration_classifier import RewardClassifierConfig
from .groot.configuration_groot import GrootConfig
from .multi_task_dit.configuration_multi_task_dit import MultiTaskDiTConfig
from .pi0.configuration_pi0 import PI0Config
from .pi05.configuration_pi05 import PI05Config
from .pretrained import PreTrainedPolicy
from .sac.configuration_sac import SACConfig
from .sac.reward_model.configuration_classifier import RewardClassifierConfig
from .sarm.configuration_sarm import SARMConfig
from .smolvla.configuration_smolvla import SmolVLAConfig
from .tdmpc.configuration_tdmpc import TDMPCConfig
@@ -89,7 +89,7 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
Args:
name: The name of the policy. Supported names are "tdmpc", "diffusion", "act",
"multi_task_dit", "vqbet", "pi0", "pi05", "sac", "reward_classifier", "smolvla", "wall_x".
"multi_task_dit", "vqbet", "pi0", "pi05", "gaussian_actor", "reward_classifier", "smolvla", "wall_x".
Returns:
The policy class corresponding to the given name.
@@ -128,12 +128,12 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
from .pi05.modeling_pi05 import PI05Policy
return PI05Policy
elif name == "sac":
from .sac.modeling_sac import SACPolicy
elif name == "gaussian_actor":
from .gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy
return SACPolicy
return GaussianActorPolicy
elif name == "reward_classifier":
from .sac.reward_model.modeling_classifier import Classifier
from .gaussian_actor.reward_model.modeling_classifier import Classifier
return Classifier
elif name == "smolvla":
@@ -172,7 +172,7 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
Args:
policy_type: The type of the policy. Supported types include "tdmpc",
"multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "sac",
"multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "gaussian_actor",
"smolvla", "reward_classifier", "wall_x".
**kwargs: Keyword arguments to be passed to the configuration class constructor.
@@ -196,8 +196,8 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
return PI0Config(**kwargs)
elif policy_type == "pi05":
return PI05Config(**kwargs)
elif policy_type == "sac":
return SACConfig(**kwargs)
elif policy_type == "gaussian_actor":
return GaussianActorConfig(**kwargs)
elif policy_type == "smolvla":
return SmolVLAConfig(**kwargs)
elif policy_type == "reward_classifier":
@@ -370,16 +370,16 @@ def make_pre_post_processors(
dataset_stats=kwargs.get("dataset_stats"),
)
elif isinstance(policy_cfg, SACConfig):
from .sac.processor_sac import make_sac_pre_post_processors
elif isinstance(policy_cfg, GaussianActorConfig):
from .gaussian_actor.processor_gaussian_actor import make_gaussian_actor_pre_post_processors
processors = make_sac_pre_post_processors(
processors = make_gaussian_actor_pre_post_processors(
config=policy_cfg,
dataset_stats=kwargs.get("dataset_stats"),
)
elif isinstance(policy_cfg, RewardClassifierConfig):
from .sac.reward_model.processor_classifier import make_classifier_processor
from .gaussian_actor.reward_model.processor_classifier import make_classifier_processor
processors = make_classifier_processor(
config=policy_cfg,
@@ -12,8 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .configuration_sac import SACConfig
from .modeling_sac import SACPolicy
from .processor_sac import make_sac_pre_post_processors
from .configuration_gaussian_actor import GaussianActorConfig
from .modeling_gaussian_actor import GaussianActorPolicy
from .processor_gaussian_actor import make_gaussian_actor_pre_post_processors
__all__ = ["SACConfig", "SACPolicy", "make_sac_pre_post_processors"]
__all__ = ["GaussianActorConfig", "GaussianActorPolicy", "make_gaussian_actor_pre_post_processors"]
@@ -75,18 +75,19 @@ class PolicyConfig:
init_final: float = 0.05
@PreTrainedConfig.register_subclass("sac")
@PreTrainedConfig.register_subclass("gaussian_actor")
@dataclass
class SACConfig(PreTrainedConfig):
"""Soft Actor-Critic (SAC) configuration.
class GaussianActorConfig(PreTrainedConfig):
"""Gaussian actor configuration.
SAC is an off-policy actor-critic deep RL algorithm based on the maximum entropy
reinforcement learning framework. It learns a policy and a Q-function simultaneously
using experience collected from the environment.
This configures the policy-side (actor + observation encoder) of a Gaussian
policy, as used by SAC and related maximum-entropy continuous-control algorithms.
By default the actor output is a tanh-squashed diagonal Gaussian
(``TanhMultivariateNormalDiag``); the tanh squashing can be disabled via
``policy_kwargs.use_tanh_squash``. The critics, temperature, and Bellman-update
logic live on the algorithm side (see ``lerobot.rl.algorithms.sac``).
This configuration class contains all the parameters needed to define a SAC agent,
including network architectures, optimization settings, and algorithm-specific
hyperparameters.
CLI: ``--policy.type=gaussian_actor``.
"""
# Mapping of feature types to normalization modes
@@ -122,7 +123,7 @@ class SACConfig(PreTrainedConfig):
device: str = "cpu"
# Device to store the model on
storage_device: str = "cpu"
# Name of the vision encoder model (Set to "helper2424/resnet10" for hil serl resnet10)
# Name of the vision encoder model (set to "lerobot/resnet10" for the HIL-SERL ResNet-10)
vision_encoder_name: str | None = None
# Whether to freeze the vision encoder during training
freeze_vision_encoder: bool = True
@@ -135,78 +136,41 @@ class SACConfig(PreTrainedConfig):
# Dimension of the image embedding pooling
image_embedding_pooling_dim: int = 8
# Training parameter
# Number of steps for online training
online_steps: int = 1000000
# Capacity of the online replay buffer
online_buffer_capacity: int = 100000
# Capacity of the offline replay buffer
offline_buffer_capacity: int = 100000
# Whether to use asynchronous prefetching for the buffers
async_prefetch: bool = False
# Number of steps before learning starts
online_step_before_learning: int = 100
# Frequency of policy updates
policy_update_freq: int = 1
# SAC algorithm parameters
# Discount factor for the SAC algorithm
discount: float = 0.99
# Initial temperature value
temperature_init: float = 1.0
# Number of critics in the ensemble
num_critics: int = 2
# Number of subsampled critics for training
num_subsample_critics: int | None = None
# Learning rate for the critic network
critic_lr: float = 3e-4
# Learning rate for the actor network
actor_lr: float = 3e-4
# Learning rate for the temperature parameter
temperature_lr: float = 3e-4
# Weight for the critic target update
critic_target_update_weight: float = 0.005
# Update-to-data ratio for the UTD algorithm (If you want enable utd_ratio, you need to set it to >1)
utd_ratio: int = 1
# Encoder architecture
# Hidden dimension size for the state encoder
state_encoder_hidden_dim: int = 256
# Dimension of the latent space
latent_dim: int = 256
# Target entropy for the SAC algorithm
target_entropy: float | None = None
# Whether to use backup entropy for the SAC algorithm
use_backup_entropy: bool = True
# Gradient clipping norm for the SAC algorithm
grad_clip_norm: float = 40.0
# Network configuration
# Configuration for the critic network architecture
critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Configuration for the actor network architecture
actor_network_kwargs: ActorNetworkConfig = field(default_factory=ActorNetworkConfig)
# Configuration for the policy parameters
policy_kwargs: PolicyConfig = field(default_factory=PolicyConfig)
# Configuration for the discrete critic network
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Configuration for actor-learner architecture
# Online training (TODO(Khalil): relocate to TrainRLServerPipelineConfig)
online_steps: int = 1000000
online_buffer_capacity: int = 100000
offline_buffer_capacity: int = 100000
async_prefetch: bool = False
online_step_before_learning: int = 100
# Actor-learner transport (TODO(Khalil): relocate to TrainRLServerPipelineConfig).
actor_learner_config: ActorLearnerConfig = field(default_factory=ActorLearnerConfig)
# Configuration for concurrency settings (you can use threads or processes for the actor and learner)
concurrency: ConcurrencyConfig = field(default_factory=ConcurrencyConfig)
# Optimizations
use_torch_compile: bool = True
# Network architecture
# Actor network
actor_network_kwargs: ActorNetworkConfig = field(default_factory=ActorNetworkConfig)
# Gaussian head parameters
policy_kwargs: PolicyConfig = field(default_factory=PolicyConfig)
# Discrete critic
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
def __post_init__(self):
super().__post_init__()
# Any validation specific to SAC configuration
def get_optimizer_preset(self) -> MultiAdamConfig:
return MultiAdamConfig(
weight_decay=0.0,
optimizer_groups={
"actor": {"lr": self.actor_lr},
"critic": {"lr": self.critic_lr},
"temperature": {"lr": self.temperature_lr},
"actor": {"lr": 3e-4},
"critic": {"lr": 3e-4},
"temperature": {"lr": 3e-4},
},
)
@@ -15,16 +15,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from collections.abc import Callable
from dataclasses import asdict
from typing import Literal
from typing import Any
import einops
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F # noqa: N812
from torch import Tensor
from torch.distributions import MultivariateNormal, TanhTransform, Transform, TransformedDistribution
@@ -32,20 +28,29 @@ from lerobot.utils.constants import ACTION, OBS_ENV_STATE, OBS_STATE
from ..pretrained import PreTrainedPolicy
from ..utils import get_device_from_parameters
from .configuration_sac import SACConfig, is_image_feature
from .configuration_gaussian_actor import GaussianActorConfig, is_image_feature
DISCRETE_DIMENSION_INDEX = -1 # Gripper is always the last dimension
class SACPolicy(
class GaussianActorPolicy(
PreTrainedPolicy,
):
config_class = SACConfig
name = "sac"
"""Gaussian actor + observation encoder.
Policy-side ``nn.Module`` used by SAC and related maximum-entropy continuous
control algorithms. It owns the actor network (``Policy``) and the observation
encoder (``GaussianActorObservationEncoder``); the critics, temperature, and
Bellman-update logic live on the algorithm side
(see ``lerobot.rl.algorithms.sac``).
"""
config_class = GaussianActorConfig
name = "gaussian_actor"
def __init__(
self,
config: SACConfig | None = None,
config: GaussianActorConfig | None = None,
):
super().__init__(config)
config.validate_features()
@@ -54,9 +59,8 @@ class SACPolicy(
# Determine action dimension and initialize all components
continuous_action_dim = config.output_features[ACTION].shape[0]
self._init_encoders()
self._init_critics(continuous_action_dim)
self._init_actor(continuous_action_dim)
self._init_temperature()
self._init_discrete_critic()
def get_optim_params(self) -> dict:
optim_params = {
@@ -65,11 +69,7 @@ class SACPolicy(
for n, p in self.actor.named_parameters()
if not n.startswith("encoder") or not self.shared_encoder
],
"critic": self.critic_ensemble.parameters(),
"temperature": self.log_alpha,
}
if self.config.num_discrete_actions is not None:
optim_params["discrete_critic"] = self.discrete_critic.parameters()
return optim_params
def reset(self):
@@ -79,7 +79,9 @@ class SACPolicy(
@torch.no_grad()
def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
"""Predict a chunk of actions given environment observations."""
raise NotImplementedError("SACPolicy does not support action chunking. It returns single actions!")
raise NotImplementedError(
"GaussianActorPolicy does not support action chunking. It returns single actions!"
)
@torch.no_grad()
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
@@ -92,360 +94,55 @@ class SACPolicy(
actions, _, _ = self.actor(batch, observations_features)
if self.config.num_discrete_actions is not None:
discrete_action_value = self.discrete_critic(batch, observations_features)
discrete_action = torch.argmax(discrete_action_value, dim=-1, keepdim=True)
if self.discrete_critic is not None:
discrete_action_value = self.discrete_critic(batch, observations_features)
discrete_action = torch.argmax(discrete_action_value, dim=-1, keepdim=True)
else:
discrete_action = torch.ones(
(*actions.shape[:-1], 1), device=actions.device, dtype=actions.dtype
)
actions = torch.cat([actions, discrete_action], dim=-1)
return actions
def critic_forward(
self,
observations: dict[str, Tensor],
actions: Tensor,
use_target: bool = False,
observation_features: Tensor | None = None,
) -> Tensor:
"""Forward pass through a critic network ensemble
def forward(self, batch: dict[str, Tensor | dict[str, Tensor]]) -> dict[str, Tensor]:
"""Actor forward pass: sample actions and return log-probabilities.
Args:
observations: Dictionary of observations
actions: Action tensor
use_target: If True, use target critics, otherwise use ensemble critics
batch: A flat observation dict, or a training dict containing
``"state"`` (observations) and optionally ``"observation_feature"``
(pre-computed encoder features).
Returns:
Tensor of Q-values from all critics
Dict with ``"action"``, ``"log_prob"``, and ``"action_mean"`` tensors.
"""
observations = batch.get("state", batch)
observation_features = batch.get("observation_feature") if isinstance(batch, dict) else None
actions, log_probs, means = self.actor(observations, observation_features)
return {"action": actions, "log_prob": log_probs, "action_mean": means}
critics = self.critic_target if use_target else self.critic_ensemble
q_values = critics(observations, actions, observation_features)
return q_values
def load_actor_weights(self, state_dicts: dict[str, Any], device: str | torch.device = "cpu") -> None:
from lerobot.utils.transition import move_state_dict_to_device
def discrete_critic_forward(
self, observations, use_target=False, observation_features=None
) -> torch.Tensor:
"""Forward pass through a discrete critic network
actor_state_dict = move_state_dict_to_device(state_dicts["policy"], device=device)
self.actor.load_state_dict(actor_state_dict)
Args:
observations: Dictionary of observations
use_target: If True, use target critics, otherwise use ensemble critics
observation_features: Optional pre-computed observation features to avoid recomputing encoder output
Returns:
Tensor of Q-values from the discrete critic network
"""
discrete_critic = self.discrete_critic_target if use_target else self.discrete_critic
q_values = discrete_critic(observations, observation_features)
return q_values
def forward(
self,
batch: dict[str, Tensor | dict[str, Tensor]],
model: Literal["actor", "critic", "temperature", "discrete_critic"] = "critic",
) -> dict[str, Tensor]:
"""Compute the loss for the given model
Args:
batch: Dictionary containing:
- action: Action tensor
- reward: Reward tensor
- state: Observations tensor dict
- next_state: Next observations tensor dict
- done: Done mask tensor
- observation_feature: Optional pre-computed observation features
- next_observation_feature: Optional pre-computed next observation features
model: Which model to compute the loss for ("actor", "critic", "discrete_critic", or "temperature")
Returns:
The computed loss tensor
"""
# Extract common components from batch
actions: Tensor = batch[ACTION]
observations: dict[str, Tensor] = batch["state"]
observation_features: Tensor = batch.get("observation_feature")
if model == "critic":
# Extract critic-specific components
rewards: Tensor = batch["reward"]
next_observations: dict[str, Tensor] = batch["next_state"]
done: Tensor = batch["done"]
next_observation_features: Tensor = batch.get("next_observation_feature")
loss_critic = self.compute_loss_critic(
observations=observations,
actions=actions,
rewards=rewards,
next_observations=next_observations,
done=done,
observation_features=observation_features,
next_observation_features=next_observation_features,
if "discrete_critic" in state_dicts and self.discrete_critic is not None:
discrete_critic_state_dict = move_state_dict_to_device(
state_dicts["discrete_critic"], device=device
)
return {"loss_critic": loss_critic}
if model == "discrete_critic" and self.config.num_discrete_actions is not None:
# Extract critic-specific components
rewards: Tensor = batch["reward"]
next_observations: dict[str, Tensor] = batch["next_state"]
done: Tensor = batch["done"]
next_observation_features: Tensor = batch.get("next_observation_feature")
complementary_info = batch.get("complementary_info")
loss_discrete_critic = self.compute_loss_discrete_critic(
observations=observations,
actions=actions,
rewards=rewards,
next_observations=next_observations,
done=done,
observation_features=observation_features,
next_observation_features=next_observation_features,
complementary_info=complementary_info,
)
return {"loss_discrete_critic": loss_discrete_critic}
if model == "actor":
return {
"loss_actor": self.compute_loss_actor(
observations=observations,
observation_features=observation_features,
)
}
if model == "temperature":
return {
"loss_temperature": self.compute_loss_temperature(
observations=observations,
observation_features=observation_features,
)
}
raise ValueError(f"Unknown model type: {model}")
def update_target_networks(self):
"""Update target networks with exponential moving average"""
for target_param, param in zip(
self.critic_target.parameters(),
self.critic_ensemble.parameters(),
strict=True,
):
target_param.data.copy_(
param.data * self.config.critic_target_update_weight
+ target_param.data * (1.0 - self.config.critic_target_update_weight)
)
if self.config.num_discrete_actions is not None:
for target_param, param in zip(
self.discrete_critic_target.parameters(),
self.discrete_critic.parameters(),
strict=True,
):
target_param.data.copy_(
param.data * self.config.critic_target_update_weight
+ target_param.data * (1.0 - self.config.critic_target_update_weight)
)
@property
def temperature(self) -> float:
"""Return the current temperature value, always in sync with log_alpha."""
return self.log_alpha.exp().item()
def compute_loss_critic(
self,
observations,
actions,
rewards,
next_observations,
done,
observation_features: Tensor | None = None,
next_observation_features: Tensor | None = None,
) -> Tensor:
with torch.no_grad():
next_action_preds, next_log_probs, _ = self.actor(next_observations, next_observation_features)
# 2- compute q targets
q_targets = self.critic_forward(
observations=next_observations,
actions=next_action_preds,
use_target=True,
observation_features=next_observation_features,
)
# subsample critics to prevent overfitting if use high UTD (update to date)
# TODO: Get indices before forward pass to avoid unnecessary computation
if self.config.num_subsample_critics is not None:
indices = torch.randperm(self.config.num_critics)
indices = indices[: self.config.num_subsample_critics]
q_targets = q_targets[indices]
# critics subsample size
min_q, _ = q_targets.min(dim=0) # Get values from min operation
if self.config.use_backup_entropy:
min_q = min_q - (self.temperature * next_log_probs)
td_target = rewards + (1 - done) * self.config.discount * min_q
# 3- compute predicted qs
if self.config.num_discrete_actions is not None:
# NOTE: We only want to keep the continuous action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions: Tensor = actions[:, :DISCRETE_DIMENSION_INDEX]
q_preds = self.critic_forward(
observations=observations,
actions=actions,
use_target=False,
observation_features=observation_features,
)
# 4- Calculate loss
# Compute state-action value loss (TD loss) for all of the Q functions in the ensemble.
td_target_duplicate = einops.repeat(td_target, "b -> e b", e=q_preds.shape[0])
# Compute the mean loss over the batch for each critic, then sum across critics for the final loss
critics_loss = (
F.mse_loss(
input=q_preds,
target=td_target_duplicate,
reduction="none",
).mean(dim=1)
).sum()
return critics_loss
def compute_loss_discrete_critic(
self,
observations,
actions,
rewards,
next_observations,
done,
observation_features=None,
next_observation_features=None,
complementary_info=None,
):
# NOTE: We only want to keep the discrete action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions_discrete: Tensor = actions[:, DISCRETE_DIMENSION_INDEX:].clone()
actions_discrete = torch.round(actions_discrete)
actions_discrete = actions_discrete.long()
discrete_penalties: Tensor | None = None
if complementary_info is not None:
discrete_penalties = complementary_info.get("discrete_penalty")
with torch.no_grad():
# For DQN, select actions using online network, evaluate with target network
next_discrete_qs = self.discrete_critic_forward(
next_observations, use_target=False, observation_features=next_observation_features
)
best_next_discrete_action = torch.argmax(next_discrete_qs, dim=-1, keepdim=True)
# Get target Q-values from target network
target_next_discrete_qs = self.discrete_critic_forward(
observations=next_observations,
use_target=True,
observation_features=next_observation_features,
)
# Use gather to select Q-values for best actions
target_next_discrete_q = torch.gather(
target_next_discrete_qs, dim=1, index=best_next_discrete_action
).squeeze(-1)
# Compute target Q-value with Bellman equation
rewards_discrete = rewards
if discrete_penalties is not None:
rewards_discrete = rewards + discrete_penalties
target_discrete_q = rewards_discrete + (1 - done) * self.config.discount * target_next_discrete_q
# Get predicted Q-values for current observations
predicted_discrete_qs = self.discrete_critic_forward(
observations=observations, use_target=False, observation_features=observation_features
)
# Use gather to select Q-values for taken actions
predicted_discrete_q = torch.gather(predicted_discrete_qs, dim=1, index=actions_discrete).squeeze(-1)
# Compute MSE loss between predicted and target Q-values
discrete_critic_loss = F.mse_loss(input=predicted_discrete_q, target=target_discrete_q)
return discrete_critic_loss
def compute_loss_temperature(self, observations, observation_features: Tensor | None = None) -> Tensor:
"""Compute the temperature loss"""
# calculate temperature loss
with torch.no_grad():
_, log_probs, _ = self.actor(observations, observation_features)
temperature_loss = (-self.log_alpha.exp() * (log_probs + self.target_entropy)).mean()
return temperature_loss
def compute_loss_actor(
self,
observations,
observation_features: Tensor | None = None,
) -> Tensor:
actions_pi, log_probs, _ = self.actor(observations, observation_features)
q_preds = self.critic_forward(
observations=observations,
actions=actions_pi,
use_target=False,
observation_features=observation_features,
)
min_q_preds = q_preds.min(dim=0)[0]
actor_loss = ((self.temperature * log_probs) - min_q_preds).mean()
return actor_loss
if "discrete_critic" in state_dicts and self.discrete_critic is not None:
discrete_critic_state_dict = move_state_dict_to_device(
state_dicts["discrete_critic"], device=device
)
self.discrete_critic.load_state_dict(discrete_critic_state_dict)
def _init_encoders(self):
"""Initialize shared or separate encoders for actor and critic."""
self.shared_encoder = self.config.shared_encoder
self.encoder_critic = SACObservationEncoder(self.config)
self.encoder_critic = GaussianActorObservationEncoder(self.config)
self.encoder_actor = (
self.encoder_critic if self.shared_encoder else SACObservationEncoder(self.config)
self.encoder_critic if self.shared_encoder else GaussianActorObservationEncoder(self.config)
)
def _init_critics(self, continuous_action_dim):
"""Build critic ensemble, targets, and optional discrete critic."""
heads = [
CriticHead(
input_dim=self.encoder_critic.output_dim + continuous_action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_ensemble = CriticEnsemble(encoder=self.encoder_critic, ensemble=heads)
target_heads = [
CriticHead(
input_dim=self.encoder_critic.output_dim + continuous_action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_target = CriticEnsemble(encoder=self.encoder_critic, ensemble=target_heads)
self.critic_target.load_state_dict(self.critic_ensemble.state_dict())
if self.config.use_torch_compile:
self.critic_ensemble = torch.compile(self.critic_ensemble)
self.critic_target = torch.compile(self.critic_target)
if self.config.num_discrete_actions is not None:
self._init_discrete_critics()
def _init_discrete_critics(self):
"""Build discrete discrete critic ensemble and target networks."""
self.discrete_critic = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
self.discrete_critic_target = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
# TODO: (maractingi, azouitine) Compile the discrete critic
self.discrete_critic_target.load_state_dict(self.discrete_critic.state_dict())
def _init_actor(self, continuous_action_dim):
"""Initialize policy actor network and default target entropy."""
"""Initialize policy actor network."""
# NOTE: The actor select only the continuous action part
self.actor = Policy(
encoder=self.encoder_actor,
@@ -455,21 +152,25 @@ class SACPolicy(
**asdict(self.config.policy_kwargs),
)
self.target_entropy = self.config.target_entropy
if self.target_entropy is None:
dim = continuous_action_dim + (1 if self.config.num_discrete_actions is not None else 0)
self.target_entropy = -np.prod(dim) / 2
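To make the fallback above concrete, a tiny illustrative computation (not part of the diff): with three continuous end-effector deltas plus a discrete gripper head, the default target entropy comes out to -2.0.

# Illustrative only: the fallback used when `target_entropy` is None.
continuous_action_dim = 3                       # e.g. [dx, dy, dz]
num_discrete_actions = 3                        # gripper head present -> adds one dim
dim = continuous_action_dim + (1 if num_discrete_actions is not None else 0)
target_entropy = -dim / 2                       # -> -2.0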
def _init_discrete_critic(self) -> None:
"""Initialize discrete critic network."""
if self.config.num_discrete_actions is None:
self.discrete_critic = None
return
def _init_temperature(self) -> None:
"""Set up temperature parameter (log_alpha)."""
temp_init = self.config.temperature_init
self.log_alpha = nn.Parameter(torch.tensor([math.log(temp_init)]))
# TODO(Khalil): Compile the discrete critic
self.discrete_critic = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
class SACObservationEncoder(nn.Module):
class GaussianActorObservationEncoder(nn.Module):
"""Encode image and/or state vector observations."""
def __init__(self, config: SACConfig) -> None:
def __init__(self, config: GaussianActorConfig) -> None:
super().__init__()
self.config = config
self._init_image_layers()
@@ -677,84 +378,6 @@ class MLP(nn.Module):
return self.net(x)
class CriticHead(nn.Module):
def __init__(
self,
input_dim: int,
hidden_dims: list[int],
activations: Callable[[torch.Tensor], torch.Tensor] | str = nn.SiLU(),
activate_final: bool = False,
dropout_rate: float | None = None,
init_final: float | None = None,
final_activation: Callable[[torch.Tensor], torch.Tensor] | str | None = None,
):
super().__init__()
self.net = MLP(
input_dim=input_dim,
hidden_dims=hidden_dims,
activations=activations,
activate_final=activate_final,
dropout_rate=dropout_rate,
final_activation=final_activation,
)
self.output_layer = nn.Linear(in_features=hidden_dims[-1], out_features=1)
if init_final is not None:
nn.init.uniform_(self.output_layer.weight, -init_final, init_final)
nn.init.uniform_(self.output_layer.bias, -init_final, init_final)
else:
orthogonal_init()(self.output_layer.weight)
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self.output_layer(self.net(x))
class CriticEnsemble(nn.Module):
"""
CriticEnsemble wraps multiple CriticHead modules into an ensemble.
Args:
encoder (SACObservationEncoder): encoder for observations.
ensemble (List[CriticHead]): list of critic heads.
init_final (float | None): optional initializer scale for final layers.
Forward returns a tensor of shape (num_critics, batch_size) containing Q-values.
"""
def __init__(
self,
encoder: SACObservationEncoder,
ensemble: list[CriticHead],
init_final: float | None = None,
):
super().__init__()
self.encoder = encoder
self.init_final = init_final
self.critics = nn.ModuleList(ensemble)
def forward(
self,
observations: dict[str, torch.Tensor],
actions: torch.Tensor,
observation_features: torch.Tensor | None = None,
) -> torch.Tensor:
device = get_device_from_parameters(self)
# Move each tensor in observations to device
observations = {k: v.to(device) for k, v in observations.items()}
obs_enc = self.encoder(observations, cache=observation_features)
inputs = torch.cat([obs_enc, actions], dim=-1)
# Loop through critics and collect outputs
q_values = []
for critic in self.critics:
q_values.append(critic(inputs))
# Stack outputs to match expected shape [num_critics, batch_size]
q_values = torch.stack([q.squeeze(-1) for q in q_values], dim=0)
return q_values
class DiscreteCritic(nn.Module):
def __init__(
self,
@@ -800,7 +423,7 @@ class DiscreteCritic(nn.Module):
class Policy(nn.Module):
def __init__(
self,
encoder: SACObservationEncoder,
encoder: GaussianActorObservationEncoder,
network: nn.Module,
action_dim: int,
std_min: float = -5,
@@ -811,7 +434,7 @@ class Policy(nn.Module):
encoder_is_shared: bool = False,
):
super().__init__()
self.encoder: SACObservationEncoder = encoder
self.encoder: GaussianActorObservationEncoder = encoder
self.network = network
self.action_dim = action_dim
self.std_min = std_min
@@ -885,7 +508,7 @@ class Policy(nn.Module):
class DefaultImageEncoder(nn.Module):
def __init__(self, config: SACConfig):
def __init__(self, config: GaussianActorConfig):
super().__init__()
image_key = next(key for key in config.input_features if is_image_feature(key))
self.image_enc_layers = nn.Sequential(
@@ -931,12 +554,12 @@ def freeze_image_encoder(image_encoder: nn.Module):
class PretrainedImageEncoder(nn.Module):
def __init__(self, config: SACConfig):
def __init__(self, config: GaussianActorConfig):
super().__init__()
self.image_enc_layers, self.image_enc_out_shape = self._load_pretrained_vision_encoder(config)
def _load_pretrained_vision_encoder(self, config: SACConfig):
def _load_pretrained_vision_encoder(self, config: GaussianActorConfig):
"""Set up CNN encoder"""
from transformers import AutoModel
@@ -32,18 +32,18 @@ from lerobot.processor import (
)
from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME
from .configuration_sac import SACConfig
from .configuration_gaussian_actor import GaussianActorConfig
def make_sac_pre_post_processors(
config: SACConfig,
def make_gaussian_actor_pre_post_processors(
config: GaussianActorConfig,
dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None,
) -> tuple[
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
PolicyProcessorPipeline[PolicyAction, PolicyAction],
]:
"""
Constructs pre-processor and post-processor pipelines for the SAC policy.
Constructs pre-processor and post-processor pipelines for the Gaussian actor policy.
The pre-processing pipeline prepares input data for the model by:
1. Renaming features to match pretrained configurations.
@@ -56,7 +56,7 @@ def make_sac_pre_post_processors(
2. Unnormalizing the output features to their original scale.
Args:
config: The configuration object for the SAC policy.
config: The configuration object for the tanh-Gaussian policy.
dataset_stats: A dictionary of statistics for normalization.
Returns:
@@ -31,7 +31,7 @@ class RewardClassifierConfig(PreTrainedConfig):
latent_dim: int = 256
image_embedding_pooling_dim: int = 8
dropout_rate: float = 0.1
model_name: str = "helper2424/resnet10" # TODO: This needs to be updated. The model on the Hub doesn't call self.post_init() in its __init__, which is required by transformers v5 to set all_tied_weights_keys. The from_pretrained call fails when it tries to access this attribute during _finalize_model_loading.
model_name: str = "lerobot/resnet10"
device: str = "cpu"
model_type: str = "cnn" # "transformer" or "cnn"
num_cameras: int = 2
@@ -108,6 +108,7 @@ class Classifier(PreTrainedPolicy):
def __init__(
self,
config: RewardClassifierConfig,
**kwargs,
):
from transformers import AutoModel
@@ -269,10 +270,6 @@ class Classifier(PreTrainedPolicy):
def predict_reward(self, batch, threshold=0.5):
"""Eval method. Returns predicted reward with the decision threshold as argument."""
# Check for both OBS_IMAGE and OBS_IMAGES prefixes
batch = self.normalize_inputs(batch)
batch = self.normalize_targets(batch)
# Extract images from batch dict
images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]
@@ -43,7 +43,6 @@ from torch import Tensor
from lerobot.configs import FeatureType, PolicyFeature
from lerobot.utils.constants import ACTION, OBS_IMAGES
from lerobot.utils.import_utils import require_package
from ..pretrained import PreTrainedPolicy
from .configuration_groot import GrootConfig
@@ -60,7 +59,6 @@ class GrootPolicy(PreTrainedPolicy):
def __init__(self, config: GrootConfig, **kwargs):
"""Initialize Groot policy wrapper."""
require_package("transformers", extra="groot")
super().__init__(config)
config.validate_features()
self.config = config
@@ -36,7 +36,7 @@ import torch.nn.functional as F # noqa: N812
import torchvision
from torch import Tensor
from lerobot.utils.import_utils import _diffusers_available, _transformers_available, require_package
from lerobot.utils.import_utils import _transformers_available
from .configuration_multi_task_dit import MultiTaskDiTConfig
@@ -46,13 +46,6 @@ if TYPE_CHECKING or _transformers_available:
else:
CLIPTextModel = None
CLIPVisionModel = None
if TYPE_CHECKING or _diffusers_available:
from diffusers.schedulers.scheduling_ddim import DDIMScheduler
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
else:
DDIMScheduler = None
DDPMScheduler = None
from lerobot.utils.constants import (
ACTION,
OBS_IMAGES,
@@ -72,8 +65,6 @@ class MultiTaskDiTPolicy(PreTrainedPolicy):
name = "multi_task_dit"
def __init__(self, config: MultiTaskDiTConfig, **kwargs):
require_package("transformers", extra="multi_task_dit")
require_package("diffusers", extra="multi_task_dit")
super().__init__(config)
config.validate_features()
self.config = config
@@ -652,6 +643,12 @@ class DiffusionObjective(nn.Module):
"prediction_type": config.prediction_type,
}
from lerobot.utils.import_utils import require_package
require_package("diffusers", extra="multi_task_dit")
from diffusers.schedulers.scheduling_ddim import DDIMScheduler
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
if config.noise_scheduler_type == "DDPM":
self.noise_scheduler: DDPMScheduler | DDIMScheduler = DDPMScheduler(**scheduler_kwargs)
elif config.noise_scheduler_type == "DDIM":
+1 -2
@@ -26,7 +26,7 @@ import torch
import torch.nn.functional as F # noqa: N812
from torch import Tensor, nn
from lerobot.utils.import_utils import _transformers_available, require_package
from lerobot.utils.import_utils import _transformers_available
# Conditional import for type checking and lazy loading
if TYPE_CHECKING or _transformers_available:
@@ -947,7 +947,6 @@ class PI0Policy(PreTrainedPolicy):
Args:
config: Policy configuration class instance.
"""
require_package("transformers", extra="pi")
super().__init__(config)
config.validate_features()
self.config = config
+1 -2
@@ -26,7 +26,7 @@ import torch
import torch.nn.functional as F # noqa: N812
from torch import Tensor, nn
from lerobot.utils.import_utils import _transformers_available, require_package
from lerobot.utils.import_utils import _transformers_available
# Conditional import for type checking and lazy loading
if TYPE_CHECKING or _transformers_available:
@@ -918,7 +918,6 @@ class PI05Policy(PreTrainedPolicy):
Args:
config: Policy configuration class instance.
"""
require_package("transformers", extra="pi")
super().__init__(config)
config.validate_features()
self.config = config
@@ -26,7 +26,7 @@ import torch
import torch.nn.functional as F # noqa: N812
from torch import Tensor, nn
from lerobot.utils.import_utils import _scipy_available, _transformers_available, require_package
from lerobot.utils.import_utils import _scipy_available, _transformers_available
# Conditional import for type checking and lazy loading
if TYPE_CHECKING or _scipy_available:
@@ -35,7 +35,7 @@ else:
idct = None
if TYPE_CHECKING or _transformers_available:
from transformers import AutoProcessor, AutoTokenizer
from transformers import AutoTokenizer
from transformers.models.auto import CONFIG_MAPPING
from ..pi_gemma import (
@@ -44,7 +44,6 @@ if TYPE_CHECKING or _transformers_available:
)
else:
CONFIG_MAPPING = None
AutoProcessor = None
AutoTokenizer = None
PiGemmaModel = None
PaliGemmaForConditionalGenerationWithPiGemma = None
@@ -827,14 +826,14 @@ class PI0FastPolicy(PreTrainedPolicy):
Args:
config: Policy configuration class instance.
"""
require_package("transformers", extra="pi")
require_package("scipy", extra="pi")
super().__init__(config)
config.validate_features()
self.config = config
# Load tokenizers first
try:
from transformers import AutoProcessor, AutoTokenizer
# Load FAST tokenizer
self.action_tokenizer = AutoProcessor.from_pretrained(
config.action_tokenizer_name, trust_remote_code=True
+115 -3
@@ -1,4 +1,116 @@
# Moved to lerobot.utils.action_interpolator — re-exported for backwards compatibility.
from lerobot.utils.action_interpolator import ActionInterpolator
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__all__ = ["ActionInterpolator"]
"""Action interpolation for smoother robot control.
Provides configurable Nx control rate by interpolating between consecutive actions.
Useful with RTC and action-chunking policies to reduce jerkiness.
"""
from torch import Tensor
class ActionInterpolator:
"""Interpolates between consecutive actions for smoother control.
When enabled with multiplier N, produces N actions per policy action
by linearly interpolating between the previous and current action.
Example with multiplier=3:
prev_action -> [1/3 interpolated, 2/3 interpolated, current_action]
This effectively multiplies the control rate for smoother motion.
Usage:
interpolator = ActionInterpolator(multiplier=2) # 2x control rate
# In control loop:
if interpolator.needs_new_action():
new_action = queue.get()
if new_action:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action:
robot.send_action(action)
"""
def __init__(self, multiplier: int = 1):
"""Initialize the interpolator.
Args:
multiplier: Control rate multiplier (1 = no interpolation, 2 = 2x, 3 = 3x, etc.)
"""
if multiplier < 1:
raise ValueError(f"multiplier must be >= 1, got {multiplier}")
self.multiplier = multiplier
self._prev: Tensor | None = None
self._buffer: list[Tensor] = []
self._idx = 0
@property
def enabled(self) -> bool:
"""Whether interpolation is active (multiplier > 1)."""
return self.multiplier > 1
def reset(self):
"""Reset interpolation state (call between episodes)."""
self._prev = None
self._buffer = []
self._idx = 0
def needs_new_action(self) -> bool:
"""Check if a new action is needed from the queue."""
return self._idx >= len(self._buffer)
def add(self, action: Tensor) -> None:
"""Add a new action and compute interpolated sequence.
Args:
action: New action tensor from policy/queue (already on CPU).
"""
if self.multiplier > 1 and self._prev is not None:
self._buffer = []
for i in range(1, self.multiplier + 1):
t = i / self.multiplier
interp = self._prev + t * (action - self._prev)
self._buffer.append(interp)
else:
# First step: no previous action yet, so run at base FPS without interpolation.
self._buffer = [action.clone()]
self._prev = action.clone()
self._idx = 0
def get(self) -> Tensor | None:
"""Get the next interpolated action.
Returns:
Next action tensor, or None if buffer is exhausted.
"""
if self._idx >= len(self._buffer):
return None
action = self._buffer[self._idx]
self._idx += 1
return action
def get_control_interval(self, fps: float) -> float:
"""Get the control interval based on interpolation multiplier.
Args:
fps: Base frames per second.
Returns:
Control interval in seconds (divided by multiplier).
"""
return 1.0 / (fps * self.multiplier)
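A short usage sketch of the interpolator above (illustrative, not part of the diff; it assumes the `lerobot.utils.action_interpolator` path mentioned in the re-export note):

import torch

from lerobot.utils.action_interpolator import ActionInterpolator

interp = ActionInterpolator(multiplier=3)       # 3x control rate
interp.add(torch.tensor([0.0]))                 # first action: no previous action, no interpolation
print(interp.get())                             # tensor([0.])
interp.add(torch.tensor([3.0]))                 # next action: three interpolated steps toward it
print([interp.get() for _ in range(3)])         # [tensor([1.]), tensor([2.]), tensor([3.])]
print(interp.needs_new_action())                # True -> time to fetch the next action
print(interp.get_control_interval(fps=30))      # ~0.0111 s instead of 0.0333 s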
+10 -10
@@ -92,10 +92,10 @@ class ActionQueue:
Returns:
int: Number of unconsumed actions.
"""
with self.lock:
if self.queue is None:
return 0
return len(self.queue) - self.last_index
if self.queue is None:
return 0
length = len(self.queue)
return length - self.last_index
def empty(self) -> bool:
"""Check if the queue is empty.
@@ -103,10 +103,11 @@ class ActionQueue:
Returns:
bool: True if no actions remain, False otherwise.
"""
with self.lock:
if self.queue is None:
return True
return len(self.queue) - self.last_index <= 0
if self.queue is None:
return True
length = len(self.queue)
return length - self.last_index <= 0
def get_action_index(self) -> int:
"""Get the current action consumption index.
@@ -114,8 +115,7 @@ class ActionQueue:
Returns:
int: Index of the next action to be consumed.
"""
with self.lock:
return self.last_index
return self.last_index
def get_left_over(self) -> Tensor | None:
"""Get leftover original actions for RTC prev_chunk_left_over.
@@ -62,7 +62,6 @@ from torch import Tensor, nn
from lerobot.utils.constants import ACTION, OBS_LANGUAGE_ATTENTION_MASK, OBS_LANGUAGE_TOKENS, OBS_STATE
from lerobot.utils.device_utils import get_safe_dtype
from lerobot.utils.import_utils import require_package
from ..pretrained import PreTrainedPolicy
from ..rtc.modeling_rtc import RTCProcessor
@@ -240,7 +239,6 @@ class SmolVLAPolicy(PreTrainedPolicy):
the configuration class is used.
"""
require_package("transformers", extra="smolvla")
super().__init__(config)
config.validate_features()
self.config = config
+2 -2
@@ -27,7 +27,7 @@ import torch.distributed as distributed
import torch.nn.functional as F # noqa: N812
from einops import pack, rearrange, reduce, repeat, unpack
from torch import einsum, nn
from torch.amp import autocast
from torch.cuda.amp import autocast
from torch.optim import Optimizer
from .configuration_vqbet import VQBeTConfig
@@ -1370,7 +1370,7 @@ class EuclideanCodebook(nn.Module):
batch_samples = rearrange(batch_samples, "h ... d -> h (...) d")
self.replace(batch_samples, batch_mask=expired_codes)
@autocast("cuda", enabled=False)
@autocast(enabled=False)
def forward(self, x, sample_codebook_temp=None, mask=None, freeze_codebook=False):
needs_codebook_dim = x.ndim < 4
sample_codebook_temp = (
+2
@@ -61,6 +61,7 @@ from .hil_processor import (
RewardClassifierProcessorStep,
TimeLimitProcessorStep,
)
from .leader_follower_processor import LeaderArmInterventionStep
from .newline_task_processor import NewLineTaskProcessorStep
from .normalize_processor import NormalizerProcessorStep, UnnormalizerProcessorStep, hotswap_stats
from .observation_processor import VanillaObservationProcessorStep
@@ -122,6 +123,7 @@ __all__ = [
"ImageCropResizeProcessorStep",
"InfoProcessorStep",
"InterventionActionProcessorStep",
"LeaderArmInterventionStep",
"make_default_processors",
"make_default_teleop_action_processor",
"make_default_robot_action_processor",
+26 -9
@@ -321,6 +321,7 @@ class GymHILAdapterProcessorStep(ProcessorStep):
This step normalizes the `transition` object by:
1. Copying `teleop_action` from `info` to `complementary_data`.
2. Copying `is_intervention` from `info` (using the string key) to `info` (using the enum key).
3. Copying `discrete_penalty` from `info` to `complementary_data`.
"""
def __call__(self, transition: EnvTransition) -> EnvTransition:
@@ -330,6 +331,9 @@ class GymHILAdapterProcessorStep(ProcessorStep):
if TELEOP_ACTION_KEY in info:
complementary_data[TELEOP_ACTION_KEY] = info[TELEOP_ACTION_KEY]
if DISCRETE_PENALTY_KEY in info:
complementary_data[DISCRETE_PENALTY_KEY] = info[DISCRETE_PENALTY_KEY]
if "is_intervention" in info:
info[TeleopEvents.IS_INTERVENTION] = info["is_intervention"]
@@ -348,18 +352,24 @@ class GymHILAdapterProcessorStep(ProcessorStep):
@ProcessorStepRegistry.register("gripper_penalty_processor")
class GripperPenaltyProcessorStep(ProcessorStep):
"""
Applies a penalty for inefficient gripper usage.
Applies a small per-transition cost on the discrete gripper action.
This step penalizes actions that attempt to close an already closed gripper or
open an already open one, based on position thresholds.
Fires only when the commanded action would actually transition the gripper
from one extreme to the other (close-while-open or open-while-closed).
This discourages gripper oscillation while leaving "stay" and saturating-further
commands unpenalized.
Attributes:
penalty: The negative reward value to apply.
max_gripper_pos: The maximum position value for the gripper, used for normalization.
open_threshold: Normalized state below which the gripper is considered "open".
closed_threshold: Normalized state above which the gripper is considered "closed".
"""
penalty: float = -0.01
penalty: float = -0.02
max_gripper_pos: float = 30.0
open_threshold: float = 0.1
closed_threshold: float = 0.9
def __call__(self, transition: EnvTransition) -> EnvTransition:
"""
@@ -391,9 +401,13 @@ class GripperPenaltyProcessorStep(ProcessorStep):
gripper_state_normalized = current_gripper_pos / self.max_gripper_pos
# Calculate penalty boolean as in original
gripper_penalty_bool = (gripper_state_normalized < 0.5 and gripper_action_normalized > 0.5) or (
gripper_state_normalized > 0.75 and gripper_action_normalized < 0.5
)
# - currently open AND target is closed -> close transition
# - currently closed AND target is open -> open transition
is_open = gripper_state_normalized < self.open_threshold
is_closed = gripper_state_normalized > self.closed_threshold
cmd_close = gripper_action_normalized > self.closed_threshold
cmd_open = gripper_action_normalized < self.open_threshold
gripper_penalty_bool = (is_open and cmd_close) or (is_closed and cmd_open)
gripper_penalty = self.penalty * int(gripper_penalty_bool)
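A standalone sketch of the penalty condition above (illustrative; the helper name is made up for the example):

def gripper_penalty(state_norm: float, action_norm: float, penalty: float = -0.02,
                    open_threshold: float = 0.1, closed_threshold: float = 0.9) -> float:
    # Penalize only commands that would flip the gripper from one extreme to the other.
    is_open = state_norm < open_threshold
    is_closed = state_norm > closed_threshold
    cmd_close = action_norm > closed_threshold
    cmd_open = action_norm < open_threshold
    return penalty * int((is_open and cmd_close) or (is_closed and cmd_open))

assert gripper_penalty(0.05, 0.95) == -0.02     # open gripper asked to close -> penalized
assert gripper_penalty(0.95, 0.95) == 0.0       # already closed, "close further" -> free
assert gripper_penalty(0.50, 0.95) == 0.0       # mid-range state -> free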
@@ -409,11 +423,14 @@ class GripperPenaltyProcessorStep(ProcessorStep):
Returns the configuration of the step for serialization.
Returns:
A dictionary containing the penalty value and max gripper position.
A dictionary containing the penalty value, max gripper position,
and the open/closed thresholds.
"""
return {
"penalty": self.penalty,
"max_gripper_pos": self.max_gripper_pos,
"open_threshold": self.open_threshold,
"closed_threshold": self.closed_threshold,
}
def reset(self) -> None:
@@ -557,7 +574,7 @@ class RewardClassifierProcessorStep(ProcessorStep):
def __post_init__(self):
"""Initializes the reward classifier model after the dataclass is created."""
if self.pretrained_path is not None:
from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
from lerobot.policies.gaussian_actor.reward_model.modeling_classifier import Classifier
self.reward_classifier = Classifier.from_pretrained(self.pretrained_path)
self.reward_classifier.to(self.device)
@@ -0,0 +1,270 @@
#!/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Processor step for using a leader arm as the HIL-SERL intervention device.
Position-only port of the leader/follower control mode (no rotation): the leader
arm acts as a 4-D end-effector delta source ``[dx, dy, dz, gripper]`` for the
existing ``InterventionActionProcessorStep`` action-override pipeline.
The teleop_action returned by the leader is a flat dictionary of joint angles
(degrees) like ``{"shoulder_pan.pos": ..., ..., "gripper.pos": ...}``. This step
converts that into a normalised EE-delta dictionary by:
1. Running forward kinematics on the leader joints -> ``p_leader`` (xyz, m).
2. Running forward kinematics on the follower joints (read from the env
transition's observation / complementary data) -> ``p_follower`` (xyz, m).
3. Normalising ``p_leader - p_follower`` by ``end_effector_step_sizes`` and
clipping to ``[-1, 1]`` (matches the gamepad / keyboard EE convention).
4. Mapping the leader gripper position ``[0, 100]`` to the discrete
``{0=close, 1=stay, 2=open}`` action used by the SO follower.
The output is written back to ``complementary_data["teleop_action"]`` so the
rest of the action pipeline (``InterventionActionProcessorStep`` ->
``MapTensorToDeltaActionDictStep`` -> IK) is unchanged.
Additionally, when an optional ``teleop_device`` reference is provided, this
step also pushes the follower's raw joint positions back to the leader via
``teleop_device.send_action(follower_joints)`` every tick. Combined with
:class:`SOLeaderFollower.send_action`, this implements the **haptic follow**
behaviour from https://github.com/huggingface/lerobot/pull/2596: the leader
mimics the follower while the human is hands-off, then drops torque the
moment intervention is toggled so the user can grab and steer it.
"""
import logging
from dataclasses import dataclass, field
from typing import Any
import numpy as np
from lerobot.configs import PipelineFeatureType, PolicyFeature
from lerobot.model import RobotKinematics
from lerobot.types import EnvTransition, TransitionKey
from .pipeline import ProcessorStep, ProcessorStepRegistry
logger = logging.getLogger(__name__)
TELEOP_ACTION_KEY = "teleop_action"
RAW_JOINT_POSITIONS_KEY = "raw_joint_positions"
GRIPPER_KEY = "gripper"
# Leader gripper is in [0, 100] when calibrated.
LEADER_GRIPPER_OPEN_DEFAULT = 60.0
LEADER_GRIPPER_CLOSE_DEFAULT = 30.0
# Discrete gripper command convention (matches GripperVelocityToJoint).
GRIPPER_CLOSE = 0.0
GRIPPER_STAY = 1.0
GRIPPER_OPEN = 2.0
def _joint_dict_to_array(joint_dict: dict[str, float], motor_names: list[str]) -> np.ndarray | None:
"""Pull joint positions in ``motor_names`` order from a ``"<motor>.pos"`` dict.
Returns ``None`` if any motor is missing.
"""
out = np.zeros(len(motor_names), dtype=float)
for i, name in enumerate(motor_names):
v = joint_dict.get(f"{name}.pos")
if v is None:
return None
out[i] = float(v)
return out
@ProcessorStepRegistry.register("leader_arm_intervention")
@dataclass
class LeaderArmInterventionStep(ProcessorStep):
"""Convert leader joint positions in ``teleop_action`` into a 4-D EE-delta dict.
This step is intended to run **between** ``AddTeleopActionAsComplimentaryDataStep``
(which populates ``complementary_data["teleop_action"]`` with raw leader joint
angles) and ``InterventionActionProcessorStep`` (which expects a delta dict).
Attributes:
kinematics: Robot kinematic model shared with the follower; used for FK
on both the leader arm and the follower arm. Both arms must use the
same URDF joint order.
motor_names: Ordered joint names matching ``kinematics.joint_names``,
used to slice joint dicts.
end_effector_step_sizes: Per-axis normalisation in metres, e.g.
``{"x": 0.025, "y": 0.025, "z": 0.025}``. The clamped delta is
``(p_leader - p_follower) / step_size``.
use_gripper: When ``True``, append a discrete gripper command derived from
the leader gripper joint to the output dict.
leader_gripper_open: Threshold at or above which the leader gripper is
considered ``open`` -> command ``2``.
leader_gripper_close: Threshold at or below which the leader gripper is
considered ``closed`` -> command ``0``.
teleop_device: Optional reference to the leader teleoperator. When set
and the device implements ``send_action(action_dict)``, this step
pushes the follower's raw joints to it every tick to drive haptic
follow. The teleop is responsible for gating actual motor writes on
its own intervention state (see :class:`SOLeaderFollower`).
"""
kinematics: RobotKinematics
motor_names: list[str]
end_effector_step_sizes: dict[str, float]
use_gripper: bool = True
leader_gripper_open: float = LEADER_GRIPPER_OPEN_DEFAULT
leader_gripper_close: float = LEADER_GRIPPER_CLOSE_DEFAULT
teleop_device: Any = None
_initial_follower_joints: np.ndarray | None = field(default=None, init=False, repr=False)
def __call__(self, transition: EnvTransition) -> EnvTransition:
new_transition = transition.copy()
complementary_data = dict(new_transition.get(TransitionKey.COMPLEMENTARY_DATA, {}) or {})
# Haptic follow: push follower joints to the leader every step (whether
# or not we have a usable leader action this tick). The leader's own
# send_action gates writes on its intervention state.
follower_joints_dict = self._read_follower_joints_dict(transition, complementary_data)
if follower_joints_dict is not None:
self._push_haptic_follow(follower_joints_dict)
leader_joints_dict = complementary_data.get(TELEOP_ACTION_KEY)
if not isinstance(leader_joints_dict, dict):
# Nothing to convert (e.g. teleop disconnected). Leave transition untouched.
return new_transition
if not any(k.endswith(".pos") for k in leader_joints_dict):
# Already in EE-delta form (or unrecognised); skip.
return new_transition
follower_joints = (
_joint_dict_to_array(follower_joints_dict, self.motor_names)
if follower_joints_dict is not None
else None
)
leader_joints = _joint_dict_to_array(leader_joints_dict, self.motor_names)
if follower_joints is None or leader_joints is None:
# Cannot compute delta this step; expose a zero-action so downstream
# InterventionActionProcessorStep does not propagate stale joints.
complementary_data[TELEOP_ACTION_KEY] = self._zero_action()
new_transition[TransitionKey.COMPLEMENTARY_DATA] = complementary_data
return new_transition
p_leader = self.kinematics.forward_kinematics(leader_joints)[:3, 3]
p_follower = self.kinematics.forward_kinematics(follower_joints)[:3, 3]
delta = p_leader - p_follower
delta_norm = np.array(
[
delta[0] / max(self.end_effector_step_sizes.get("x", 1.0), 1e-6),
delta[1] / max(self.end_effector_step_sizes.get("y", 1.0), 1e-6),
delta[2] / max(self.end_effector_step_sizes.get("z", 1.0), 1e-6),
],
dtype=float,
)
delta_norm = np.clip(delta_norm, -1.0, 1.0)
teleop_action: dict[str, float] = {
"delta_x": float(delta_norm[0]),
"delta_y": float(delta_norm[1]),
"delta_z": float(delta_norm[2]),
}
if self.use_gripper:
leader_gripper = float(leader_joints_dict.get(f"{GRIPPER_KEY}.pos", 50.0))
teleop_action[GRIPPER_KEY] = self._discretise_gripper(leader_gripper)
complementary_data[TELEOP_ACTION_KEY] = teleop_action
new_transition[TransitionKey.COMPLEMENTARY_DATA] = complementary_data
return new_transition
def _read_follower_joints_dict(
self, transition: EnvTransition, complementary_data: dict[str, Any]
) -> dict[str, float] | None:
"""Best-effort read of the follower joints from the transition.
Tries (in order):
1. ``complementary_data["raw_joint_positions"]`` (set after env.step).
2. ``transition[OBSERVATION]`` if it is a flat ``"<motor>.pos"`` dict
(this is the convention used by ``step_env_and_process_transition``
when staging an action transition).
Returns the source dict if all expected motors are present, else
``None``. We return the *dict* (not the array) because we want to feed
it back to ``teleop_device.send_action`` for haptic follow.
"""
raw = complementary_data.get(RAW_JOINT_POSITIONS_KEY)
if isinstance(raw, dict) and all(f"{m}.pos" in raw for m in self.motor_names):
return raw # type: ignore[return-value]
observation = transition.get(TransitionKey.OBSERVATION)
if isinstance(observation, dict) and all(f"{m}.pos" in observation for m in self.motor_names):
return observation # type: ignore[return-value]
return None
def _push_haptic_follow(self, follower_joints_dict: dict[str, float]) -> None:
"""Send the follower's joints back to the leader for haptic follow.
Errors are logged and swallowed -- a failed haptic update must
never break the policy / learner loop.
"""
if self.teleop_device is None:
return
send_action = getattr(self.teleop_device, "send_action", None)
if send_action is None:
return
try:
send_action(follower_joints_dict)
except NotImplementedError:
# Plain SOLeader / unsupported teleop -- silently disable haptic follow.
self.teleop_device = None
except Exception as e: # pragma: no cover - hardware path
logger.warning(f"[LeaderArmInterventionStep] haptic follow failed: {e}")
def _discretise_gripper(self, leader_gripper_pos: float) -> float:
"""Map a leader gripper position in ``[0, 100]`` to ``{0, 1, 2}``."""
if leader_gripper_pos >= self.leader_gripper_open:
return GRIPPER_OPEN
if leader_gripper_pos <= self.leader_gripper_close:
return GRIPPER_CLOSE
return GRIPPER_STAY
def _zero_action(self) -> dict[str, float]:
out: dict[str, float] = {"delta_x": 0.0, "delta_y": 0.0, "delta_z": 0.0}
if self.use_gripper:
out[GRIPPER_KEY] = GRIPPER_STAY
return out
def get_config(self) -> dict[str, Any]:
# `kinematics` and `teleop_device` are runtime objects (not JSON-serializable)
# and are re-injected by `gym_manipulator.make_processors`, so they are
# intentionally omitted from the saved config.
return {
"motor_names": list(self.motor_names),
"end_effector_step_sizes": dict(self.end_effector_step_sizes),
"use_gripper": self.use_gripper,
"leader_gripper_open": self.leader_gripper_open,
"leader_gripper_close": self.leader_gripper_close,
}
def reset(self) -> None:
self._initial_follower_joints = None
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
return features
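For reference, a standalone numpy sketch of the leader-to-EE-delta conversion the docstring describes (illustrative only; `leader_to_ee_delta` is a made-up helper and the xyz values stand in for forward-kinematics outputs):

import numpy as np

def leader_to_ee_delta(leader_xyz, follower_xyz, step_sizes, leader_gripper_pos,
                       open_thr=60.0, close_thr=30.0):
    # Normalize the positional difference by the per-axis step sizes and clip to [-1, 1].
    delta = np.clip(
        (np.asarray(leader_xyz) - np.asarray(follower_xyz))
        / np.array([step_sizes["x"], step_sizes["y"], step_sizes["z"]]),
        -1.0, 1.0,
    )
    # Map the [0, 100] leader gripper position to the discrete {0=close, 1=stay, 2=open} command.
    gripper = 2.0 if leader_gripper_pos >= open_thr else 0.0 if leader_gripper_pos <= close_thr else 1.0
    return {"delta_x": float(delta[0]), "delta_y": float(delta[1]),
            "delta_z": float(delta[2]), "gripper": gripper}

action = leader_to_ee_delta(
    leader_xyz=[0.21, 0.00, 0.15], follower_xyz=[0.20, 0.00, 0.15],
    step_sizes={"x": 0.025, "y": 0.025, "z": 0.025}, leader_gripper_pos=70.0,
)
# -> delta_x ~= 0.4, delta_y = 0.0, delta_z = 0.0, gripper = 2.0 (open)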
@@ -134,6 +134,15 @@ class _NormalizationMixin:
if self.dtype is None:
self.dtype = torch.float32
self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype)
self._reshape_visual_stats()
def _reshape_visual_stats(self) -> None:
"""Reshape visual stats from ``[C]`` to ``[C, 1, 1]`` for image broadcasting."""
for key, feature in self.features.items():
if feature.type == FeatureType.VISUAL and key in self._tensor_stats:
for stat_name, stat_tensor in self._tensor_stats[key].items():
if isinstance(stat_tensor, Tensor) and stat_tensor.ndim == 1:
self._tensor_stats[key][stat_name] = stat_tensor.reshape(-1, 1, 1)
def to(
self, device: torch.device | str | None = None, dtype: torch.dtype | None = None
@@ -152,6 +161,7 @@ class _NormalizationMixin:
if dtype is not None:
self.dtype = dtype
self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype)
self._reshape_visual_stats()
return self
def state_dict(self) -> dict[str, Tensor]:
@@ -201,6 +211,7 @@ class _NormalizationMixin:
# Don't load from state_dict, keep the explicitly provided stats
# But ensure _tensor_stats is properly initialized
self._tensor_stats = to_tensor(self.stats, device=self.device, dtype=self.dtype) # type: ignore[assignment]
self._reshape_visual_stats()
return
# Normal behavior: load stats from state_dict
@@ -211,6 +222,7 @@ class _NormalizationMixin:
self._tensor_stats.setdefault(key, {})[stat_name] = tensor.to(
dtype=torch.float32, device=self.device
)
self._reshape_visual_stats()
# Reconstruct the original stats dict from tensor stats for compatibility with to() method
# and other functions that rely on self.stats
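A minimal illustration of why `_reshape_visual_stats` is needed (not part of the diff): per-channel image stats stored as `[C]` do not broadcast against `[C, H, W]` tensors, while `[C, 1, 1]` stats do.

import torch

img = torch.rand(3, 224, 224)                         # [C, H, W]
mean = torch.tensor([0.485, 0.456, 0.406])            # [C]
std = torch.tensor([0.229, 0.224, 0.225])             # [C]
# (img - mean) would raise: the trailing dims [224] and [3] cannot broadcast.
normalized = (img - mean.reshape(-1, 1, 1)) / std.reshape(-1, 1, 1)
assert normalized.shape == img.shape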
+26 -16
@@ -12,23 +12,33 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Reinforcement learning modules.
"""Reinforcement learning modules.
Requires: ``pip install 'lerobot[hilserl]'``
Available modules (import directly)::
from lerobot.rl.actor import ...
from lerobot.rl.learner import ...
from lerobot.rl.learner_service import ...
from lerobot.rl.buffer import ...
from lerobot.rl.eval_policy import ...
from lerobot.rl.gym_manipulator import ...
Distributed actor / learner entry points (``actor``, ``learner``,
``learner_service``) require ``pip install 'lerobot[hilserl]'``. Algorithms,
buffer, data sources and trainer are gRPC-free and usable standalone.
"""
from lerobot.utils.import_utils import require_package
from .algorithms.base import RLAlgorithm as RLAlgorithm
from .algorithms.configs import RLAlgorithmConfig as RLAlgorithmConfig, TrainingStats as TrainingStats
from .algorithms.factory import (
make_algorithm as make_algorithm,
make_algorithm_config as make_algorithm_config,
)
from .algorithms.sac.configuration_sac import SACAlgorithmConfig as SACAlgorithmConfig
from .buffer import ReplayBuffer as ReplayBuffer
from .data_sources import DataMixer as DataMixer, OnlineOfflineMixer as OnlineOfflineMixer
from .trainer import RLTrainer as RLTrainer
require_package("grpcio", extra="hilserl", import_name="grpc")
__all__: list[str] = []
__all__ = [
"RLAlgorithm",
"RLAlgorithmConfig",
"TrainingStats",
"make_algorithm",
"make_algorithm_config",
"SACAlgorithmConfig",
"RLTrainer",
"ReplayBuffer",
"DataMixer",
"OnlineOfflineMixer",
]
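As a quick check of the docstring's claim that these pieces are gRPC-free (illustrative, assuming the exports listed in `__all__` above):

# These imports should work without the `hilserl` (grpcio) extra.
from lerobot.rl import DataMixer, OnlineOfflineMixer, ReplayBuffer, RLTrainer  # noqa: F401

# The distributed entry points still require the extra:
# from lerobot.rl.actor import ...        # needs `pip install 'lerobot[hilserl]'`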
+52 -50
@@ -51,17 +51,20 @@ import os
import time
from functools import lru_cache
from queue import Empty
from typing import Any
import grpc
import torch
from torch import nn
from torch.multiprocessing import Event, Queue
from torch.multiprocessing import Queue
from lerobot.cameras import opencv # noqa: F401
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.policies import make_policy
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies import PreTrainedPolicy, make_policy, make_pre_post_processors
from lerobot.processor import TransitionKey
from lerobot.rl.process import ProcessSignalHandler
from lerobot.rl.queue import get_last_item_from_queue
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.robots import so_follower # noqa: F401
from lerobot.teleoperators import gamepad, so_leader # noqa: F401
from lerobot.teleoperators.utils import TeleopEvents
@@ -74,14 +77,11 @@ from lerobot.transport.utils import (
send_bytes_in_chunks,
transitions_to_bytes,
)
from lerobot.types import TransitionKey
from lerobot.utils.device_utils import get_safe_torch_device
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.random_utils import set_seed
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.transition import (
Transition,
move_state_dict_to_device,
move_transition_to_device,
)
from lerobot.utils.utils import (
@@ -90,12 +90,11 @@ from lerobot.utils.utils import (
)
from .gym_manipulator import (
create_transition,
make_processors,
make_robot_env,
reset_and_build_transition,
step_env_and_process_transition,
)
from .queue import get_last_item_from_queue
# Main entry point
@@ -212,7 +211,7 @@ def actor_cli(cfg: TrainRLServerPipelineConfig):
def act_with_policy(
cfg: TrainRLServerPipelineConfig,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
parameters_queue: Queue,
transitions_queue: Queue,
interactions_queue: Queue,
@@ -252,22 +251,21 @@ def act_with_policy(
logging.info("make_policy")
### Instantiate the policy in both the actor and learner processes
### To avoid sending a SACPolicy object through the port, we create a policy instance
### To avoid sending a policy object through the port, we create a policy instance
### on both sides, the learner sends the updated parameters every n steps to update the actor's parameters
policy: SACPolicy = make_policy(
policy = make_policy(
cfg=cfg.policy,
env_cfg=cfg.env,
)
policy = policy.eval()
policy = policy.to(device).eval()
assert isinstance(policy, nn.Module)
obs, info = online_env.reset()
env_processor.reset()
action_processor.reset()
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
dataset_stats=cfg.policy.dataset_stats,
)
# Process initial observation
transition = create_transition(observation=obs, info=info)
transition = env_processor(transition)
transition = reset_and_build_transition(online_env, env_processor, action_processor)
# NOTE: For the moment we will solely handle the case of a single environment
sum_reward_episode = 0
@@ -291,8 +289,21 @@ def act_with_policy(
# Time policy inference and check if it meets FPS requirement
with policy_timer:
# Extract observation from transition for policy
action = policy.select_action(batch=observation)
normalized_observation = preprocessor.process_observation(observation)
action = policy.select_action(batch=normalized_observation)
# Unnormalize only the continuous part. When `num_discrete_actions` is set,
# `select_action` concatenates an argmax index in env space at the last dim;
# action stats cover the continuous dims only, so feeding the full vector to
# the unnormalizer would shape-mismatch and would also corrupt the discrete
# index by treating it as a normalized value.
if cfg.policy.num_discrete_actions is not None:
continuous_action = postprocessor.process_action(action[..., :-1])
discrete_action = action[..., -1:].to(
device=continuous_action.device, dtype=continuous_action.dtype
)
action = torch.cat([continuous_action, discrete_action], dim=-1)
else:
action = postprocessor.process_action(action)
policy_fps = policy_timer.fps_last
log_policy_frequency_issue(policy_fps=policy_fps, cfg=cfg, interaction_step=interaction_step)
@@ -326,7 +337,8 @@ def act_with_policy(
# Check for intervention from transition info
intervention_info = new_transition[TransitionKey.INFO]
if intervention_info.get(TeleopEvents.IS_INTERVENTION, False):
is_intervention = bool(intervention_info.get(TeleopEvents.IS_INTERVENTION, False))
if is_intervention:
episode_intervention = True
episode_intervention_steps += 1
@@ -334,6 +346,10 @@ def act_with_policy(
"discrete_penalty": torch.tensor(
[new_transition[TransitionKey.COMPLEMENTARY_DATA].get("discrete_penalty", 0.0)]
),
# Forward the intervention flag so the learner can route this transition
# into the offline replay buffer (see `process_transitions` in learner.py).
# Use the plain string key so the payload survives torch.load(weights_only=True).
TeleopEvents.IS_INTERVENTION.value: is_intervention,
}
# Create transition for learner (convert to old format)
list_transition_to_send_to_learner.append(
@@ -390,14 +406,7 @@ def act_with_policy(
episode_intervention_steps = 0
episode_total_steps = 0
# Reset environment and processors
obs, info = online_env.reset()
env_processor.reset()
action_processor.reset()
# Process initial observation
transition = create_transition(observation=obs, info=info)
transition = env_processor(transition)
transition = reset_and_build_transition(online_env, env_processor, action_processor)
if cfg.env.fps is not None:
dt_time = time.perf_counter() - start_time
@@ -409,7 +418,7 @@ def act_with_policy(
def establish_learner_connection(
stub: services_pb2_grpc.LearnerServiceStub,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
attempts: int = 30,
):
"""Establish a connection with the learner.
@@ -461,7 +470,7 @@ def learner_service_client(
def receive_policy(
cfg: TrainRLServerPipelineConfig,
parameters_queue: Queue,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
):
@@ -513,7 +522,7 @@ def receive_policy(
def send_transitions(
cfg: TrainRLServerPipelineConfig,
transitions_queue: Queue,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
) -> services_pb2.Empty:
@@ -563,7 +572,7 @@ def send_transitions(
def send_interactions(
cfg: TrainRLServerPipelineConfig,
interactions_queue: Queue,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
) -> services_pb2.Empty:
@@ -613,7 +622,11 @@ def send_interactions(
logging.info("[ACTOR] Interactions process stopped")
def transitions_stream(shutdown_event: Event, transitions_queue: Queue, timeout: float) -> services_pb2.Empty: # type: ignore
def transitions_stream(
shutdown_event: Any, # Event
transitions_queue: Queue,
timeout: float,
) -> services_pb2.Empty:
while not shutdown_event.is_set():
try:
message = transitions_queue.get(block=True, timeout=timeout)
@@ -629,9 +642,9 @@ def transitions_stream(shutdown_event: Event, transitions_queue: Queue, timeout:
def interactions_stream(
shutdown_event: Event,
shutdown_event: Any, # Event
interactions_queue: Queue,
timeout: float, # type: ignore
timeout: float,
) -> services_pb2.Empty:
while not shutdown_event.is_set():
try:
@@ -652,7 +665,7 @@ def interactions_stream(
# Policy functions
def update_policy_parameters(policy: SACPolicy, parameters_queue: Queue, device):
def update_policy_parameters(policy: PreTrainedPolicy, parameters_queue: Queue, device):
bytes_state_dict = get_last_item_from_queue(parameters_queue, block=False)
if bytes_state_dict is not None:
logging.info("[ACTOR] Load new parameters from Learner.")
@@ -667,18 +680,7 @@ def update_policy_parameters(policy: SACPolicy, parameters_queue: Queue, device)
# - Send critic's encoder state when shared_encoder=True
# - Skip encoder params entirely when freeze_vision_encoder=True
# - Ensure discrete_critic gets correct encoder state (currently uses encoder_critic)
# Load actor state dict
actor_state_dict = move_state_dict_to_device(state_dicts["policy"], device=device)
policy.actor.load_state_dict(actor_state_dict)
# Load discrete critic if present
if hasattr(policy, "discrete_critic") and "discrete_critic" in state_dicts:
discrete_critic_state_dict = move_state_dict_to_device(
state_dicts["discrete_critic"], device=device
)
policy.discrete_critic.load_state_dict(discrete_critic_state_dict)
logging.info("[ACTOR] Loaded discrete critic parameters from Learner.")
policy.load_actor_weights(state_dicts, device=device)
# Utilities functions
+20
@@ -0,0 +1,20 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .sac import SACAlgorithm as SACAlgorithm, SACAlgorithmConfig as SACAlgorithmConfig
__all__ = [
"SACAlgorithm",
"SACAlgorithmConfig",
]
+106
@@ -0,0 +1,106 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from collections.abc import Iterator
from typing import TYPE_CHECKING, Any
import torch
from torch.optim import Optimizer
from lerobot.rl.algorithms.configs import RLAlgorithmConfig, TrainingStats
if TYPE_CHECKING:
from lerobot.rl.data_sources.data_mixer import DataMixer
BatchType = dict[str, Any]
class RLAlgorithm(abc.ABC):
"""Base for all RL algorithms."""
config_class: type[RLAlgorithmConfig] | None = None
name: str | None = None
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
if not getattr(cls, "config_class", None):
raise TypeError(f"Class {cls.__name__} must define 'config_class'")
if not getattr(cls, "name", None):
raise TypeError(f"Class {cls.__name__} must define 'name'")
@abc.abstractmethod
def update(self, batch_iterator: Iterator[BatchType]) -> TrainingStats:
"""One complete training step.
The algorithm calls ``next(batch_iterator)`` as many times as it
needs (e.g. ``utd_ratio`` times for SAC) to obtain fresh batches.
The iterator is owned by the trainer; the algorithm just consumes
from it.
"""
...
def configure_data_iterator(
self,
data_mixer: DataMixer,
batch_size: int,
*,
async_prefetch: bool = True,
queue_size: int = 2,
) -> Iterator[BatchType]:
"""Create the data iterator this algorithm needs.
The default implementation uses the standard ``data_mixer.get_iterator()``.
Algorithms that need specialised sampling should override this method.
"""
return data_mixer.get_iterator(
batch_size=batch_size,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
def make_optimizers_and_scheduler(self) -> dict[str, Optimizer]:
"""Create, store, and return the optimizers needed for training.
Called on the **learner** side after construction. Subclasses must
override this with algorithm-specific optimizer setup.
"""
return {}
def get_optimizers(self) -> dict[str, Optimizer]:
"""Return optimizers for checkpointing / external scheduling."""
return {}
@property
def optimization_step(self) -> int:
"""Current learner optimization step.
Part of the stable contract for checkpoint/resume. Algorithms can
either use this default storage or override for custom behavior.
"""
return getattr(self, "_optimization_step", 0)
@optimization_step.setter
def optimization_step(self, value: int) -> None:
self._optimization_step = int(value)
def get_weights(self) -> dict[str, Any]:
"""Policy state-dict to push to actors."""
return {}
@abc.abstractmethod
def load_weights(self, weights: dict[str, Any], device: str | torch.device = "cpu") -> None:
"""Load policy state-dict received from the learner."""
+76
@@ -0,0 +1,76 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any
import draccus
import torch
if TYPE_CHECKING:
from lerobot.rl.algorithms.base import RLAlgorithm
@dataclass
class TrainingStats:
"""Returned by ``algorithm.update()`` for logging and checkpointing."""
losses: dict[str, float] = field(default_factory=dict)
grad_norms: dict[str, float] = field(default_factory=dict)
extra: dict[str, float] = field(default_factory=dict)
def to_log_dict(self) -> dict[str, float]:
"""Flatten all stats into a single dict for logging."""
d: dict[str, float] = {}
for name, val in self.losses.items():
d[name] = val
for name, val in self.grad_norms.items():
d[f"{name}_grad_norm"] = val
for name, val in self.extra.items():
d[name] = val
return d
@dataclass
class RLAlgorithmConfig(draccus.ChoiceRegistry, abc.ABC):
"""Registry for algorithm configs."""
@property
def type(self) -> str:
"""Registered name of this algorithm config (e.g. ``"sac"``)."""
choice_name = self.get_choice_name(self.__class__)
if not isinstance(choice_name, str):
raise TypeError(f"Expected string from get_choice_name, got {type(choice_name)}")
return choice_name
@abc.abstractmethod
def build_algorithm(self, policy: torch.nn.Module) -> RLAlgorithm:
"""Construct the :class:`RLAlgorithm` for this config.
Must be overridden by every registered config subclass.
"""
raise NotImplementedError(f"{type(self).__name__} must implement build_algorithm()")
@classmethod
@abc.abstractmethod
def from_policy_config(cls, policy_cfg: Any) -> RLAlgorithmConfig:
"""Build an algorithm config from a policy config.
Must be overridden by every registered config subclass.
"""
raise NotImplementedError(f"{cls.__name__} must implement from_policy_config()")
+47
@@ -0,0 +1,47 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import torch
from lerobot.rl.algorithms.base import RLAlgorithm
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
def make_algorithm_config(algorithm_type: str, **kwargs) -> RLAlgorithmConfig:
"""Instantiate an :class:`RLAlgorithmConfig` from its registered type name.
Args:
algorithm_type: Registry key of the algorithm (e.g. ``"sac"``).
**kwargs: Keyword arguments forwarded to the config class constructor.
Returns:
An instance of the matching ``RLAlgorithmConfig`` subclass.
Raises:
ValueError: If ``algorithm_type`` is not registered.
"""
try:
cls = RLAlgorithmConfig.get_choice_class(algorithm_type)
except KeyError as err:
raise ValueError(
f"Algorithm type '{algorithm_type}' is not registered. "
f"Available: {list(RLAlgorithmConfig.get_known_choices().keys())}"
) from err
return cls(**kwargs)
def make_algorithm(cfg: RLAlgorithmConfig, policy: torch.nn.Module) -> RLAlgorithm:
return cfg.build_algorithm(policy)
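A hedged usage sketch: it assumes policy is an already-built GaussianActorPolicy and that its config is the object SACAlgorithmConfig expects in policy_config.
algo_cfg = make_algorithm_config("sac", actor_lr=1e-4, policy_config=policy.config)
algorithm = make_algorithm(algo_cfg, policy)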
+18
@@ -0,0 +1,18 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from lerobot.rl.algorithms.sac.configuration_sac import SACAlgorithmConfig
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
__all__ = ["SACAlgorithm", "SACAlgorithmConfig"]
@@ -0,0 +1,90 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
import torch
from lerobot.policies.gaussian_actor.configuration_gaussian_actor import (
CriticNetworkConfig,
GaussianActorConfig,
)
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
if TYPE_CHECKING:
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
@RLAlgorithmConfig.register_subclass("sac")
@dataclass
class SACAlgorithmConfig(RLAlgorithmConfig):
"""SAC algorithm hyperparameters."""
# Optimizer learning rates
actor_lr: float = 3e-4
critic_lr: float = 3e-4
temperature_lr: float = 3e-4
# Bellman update
discount: float = 0.99
use_backup_entropy: bool = True
critic_target_update_weight: float = 0.005
# Critic ensemble
num_critics: int = 2
num_subsample_critics: int | None = None
critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Temperature / entropy
temperature_init: float = 1.0
# Target entropy for automatic temperature tuning. If ``None``, defaults to
# ``-|A|/2`` where ``|A|`` is the total action dimension (continuous + 1 if
# there is a discrete action head).
target_entropy: float | None = None
# Update loop
utd_ratio: int = 1
policy_update_freq: int = 1
grad_clip_norm: float = 40.0
# Optimizations
# torch.compile is currently disabled by default
use_torch_compile: bool = False
# Policy config
policy_config: GaussianActorConfig | None = None
@classmethod
def from_policy_config(cls, policy_cfg: GaussianActorConfig) -> SACAlgorithmConfig:
"""Build an algorithm config with default hyperparameters for a given policy."""
return cls(
policy_config=policy_cfg,
discrete_critic_network_kwargs=policy_cfg.discrete_critic_network_kwargs,
)
def build_algorithm(self, policy: torch.nn.Module) -> SACAlgorithm:
if self.policy_config is None:
raise ValueError(
"SACAlgorithmConfig.policy_config is None. "
"It must be populated (typically by TrainRLServerPipelineConfig.validate) "
"before calling build_algorithm()."
)
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
return SACAlgorithm(policy=policy, config=self)
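A hedged sketch of overriding these hyperparameters; the field names come from the dataclass above, while the exact top-level key under which the nested config is exposed depends on the training pipeline config:
sac_cfg = SACAlgorithmConfig(actor_lr=1e-4, utd_ratio=2, num_subsample_critics=2)
# draccus-style JSON would typically address the same fields as, e.g.:
#   "algorithm": {"type": "sac", "actor_lr": 1e-4, "utd_ratio": 2}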
@@ -0,0 +1,595 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import math
from collections.abc import Callable, Iterator
from dataclasses import asdict
from typing import Any
import einops
import torch
import torch.nn as nn
import torch.nn.functional as F # noqa: N812
from torch import Tensor
from torch.optim import Optimizer
from lerobot.policies.gaussian_actor.modeling_gaussian_actor import (
DISCRETE_DIMENSION_INDEX,
MLP,
DiscreteCritic,
GaussianActorObservationEncoder,
GaussianActorPolicy,
orthogonal_init,
)
from lerobot.policies.utils import get_device_from_parameters
from lerobot.rl.algorithms.base import BatchType, RLAlgorithm
from lerobot.rl.algorithms.configs import TrainingStats
from lerobot.rl.algorithms.sac.configuration_sac import SACAlgorithmConfig
from lerobot.utils.constants import ACTION
from lerobot.utils.transition import move_state_dict_to_device
class SACAlgorithm(RLAlgorithm):
"""Soft Actor-Critic. Owns critics, targets, temperature, and loss computation."""
config_class = SACAlgorithmConfig
name = "sac"
def __init__(
self,
policy: GaussianActorPolicy,
config: SACAlgorithmConfig,
):
self.config = config
self.policy_config = config.policy_config
self.policy = policy
self.optimizers: dict[str, Optimizer] = {}
self._optimization_step: int = 0
action_dim = self.policy.config.output_features[ACTION].shape[0]
self._init_critics(action_dim)
self._init_temperature(action_dim)
self._device = torch.device(self.policy.config.device)
self._move_to_device()
def _init_critics(self, action_dim: int) -> None:
"""Build critic ensemble, targets."""
encoder = self.policy.encoder_critic
heads = [
CriticHead(
input_dim=encoder.output_dim + action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_ensemble = CriticEnsemble(encoder=encoder, ensemble=heads)
target_heads = [
CriticHead(
input_dim=encoder.output_dim + action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_target = CriticEnsemble(encoder=encoder, ensemble=target_heads)
self.critic_target.load_state_dict(self.critic_ensemble.state_dict())
# TODO(Khalil): Investigate and fix torch.compile
# NOTE: torch.compile is disabled, policy does not converge when enabled.
if self.config.use_torch_compile:
self.critic_ensemble = torch.compile(self.critic_ensemble)
self.critic_target = torch.compile(self.critic_target)
self.discrete_critic_target = None
if self.policy_config.num_discrete_actions is not None:
self.discrete_critic_target = self._init_discrete_critic_target(encoder)
def _init_discrete_critic_target(self, encoder: GaussianActorObservationEncoder) -> DiscreteCritic:
"""Build target discrete critic (main network is owned by the policy)."""
discrete_critic_target = DiscreteCritic(
encoder=encoder,
input_dim=encoder.output_dim,
output_dim=self.policy_config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
# TODO(Khalil): Compile the discrete critic
discrete_critic_target.load_state_dict(self.policy.discrete_critic.state_dict())
return discrete_critic_target
def _init_temperature(self, continuous_action_dim: int) -> None:
"""Set up temperature parameter (log_alpha) and target entropy."""
temp_init = self.config.temperature_init
self.log_alpha = nn.Parameter(torch.tensor([math.log(temp_init)]))
self.target_entropy = self.config.target_entropy
if self.target_entropy is None:
total_action_dim = continuous_action_dim + (
1 if self.policy_config.num_discrete_actions is not None else 0
)
self.target_entropy = -total_action_dim / 2
def _move_to_device(self) -> None:
self.policy.to(self._device)
self.critic_ensemble.to(self._device)
self.critic_target.to(self._device)
self.log_alpha = nn.Parameter(self.log_alpha.data.to(self._device))
if self.discrete_critic_target is not None:
self.discrete_critic_target.to(self._device)
@property
def temperature(self) -> float:
"""Return the current temperature value, always in sync with log_alpha."""
return self.log_alpha.exp().item()
def _critic_forward(
self,
observations: dict[str, Tensor],
actions: Tensor,
use_target: bool = False,
observation_features: Tensor | None = None,
) -> Tensor:
"""Forward pass through a critic network ensemble
Args:
observations: Dictionary of observations
actions: Action tensor
use_target: If True, use target critics, otherwise use ensemble critics
Returns:
Tensor of Q-values from all critics
"""
critics = self.critic_target if use_target else self.critic_ensemble
q_values = critics(observations, actions, observation_features)
return q_values
def _discrete_critic_forward(
self, observations, use_target=False, observation_features=None
) -> torch.Tensor:
"""Forward pass through a discrete critic network
Args:
observations: Dictionary of observations
use_target: If True, use target critics, otherwise use ensemble critics
observation_features: Optional pre-computed observation features to avoid recomputing encoder output
Returns:
Tensor of Q-values from the discrete critic network
"""
discrete_critic = self.discrete_critic_target if use_target else self.policy.discrete_critic
q_values = discrete_critic(observations, observation_features)
return q_values
def update(self, batch_iterator: Iterator[BatchType]) -> TrainingStats:
clip = self.config.grad_clip_norm
for _ in range(self.config.utd_ratio - 1):
batch = next(batch_iterator)
fb = self._prepare_forward_batch(batch, include_complementary_info=True)
loss_critic = self._compute_loss_critic(fb)
self.optimizers["critic"].zero_grad()
loss_critic.backward()
torch.nn.utils.clip_grad_norm_(self.critic_ensemble.parameters(), max_norm=clip)
self.optimizers["critic"].step()
if self.policy_config.num_discrete_actions is not None:
loss_dc = self._compute_loss_discrete_critic(fb)
self.optimizers["discrete_critic"].zero_grad()
loss_dc.backward()
torch.nn.utils.clip_grad_norm_(self.policy.discrete_critic.parameters(), max_norm=clip)
self.optimizers["discrete_critic"].step()
self._update_target_networks()
batch = next(batch_iterator)
fb = self._prepare_forward_batch(batch, include_complementary_info=False)
loss_critic = self._compute_loss_critic(fb)
self.optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad = torch.nn.utils.clip_grad_norm_(self.critic_ensemble.parameters(), max_norm=clip).item()
self.optimizers["critic"].step()
stats = TrainingStats(
losses={"loss_critic": loss_critic.item()},
grad_norms={"critic": critic_grad},
)
if self.policy_config.num_discrete_actions is not None:
loss_dc = self._compute_loss_discrete_critic(fb)
self.optimizers["discrete_critic"].zero_grad()
loss_dc.backward()
dc_grad = torch.nn.utils.clip_grad_norm_(
self.policy.discrete_critic.parameters(), max_norm=clip
).item()
self.optimizers["discrete_critic"].step()
stats.losses["loss_discrete_critic"] = loss_dc.item()
stats.grad_norms["discrete_critic"] = dc_grad
if self._optimization_step % self.config.policy_update_freq == 0:
for _ in range(self.config.policy_update_freq):
loss_actor = self._compute_loss_actor(fb)
self.optimizers["actor"].zero_grad()
loss_actor.backward()
actor_grad = torch.nn.utils.clip_grad_norm_(
self.policy.actor.parameters(), max_norm=clip
).item()
self.optimizers["actor"].step()
loss_temp = self._compute_loss_temperature(fb)
self.optimizers["temperature"].zero_grad()
loss_temp.backward()
temp_grad = torch.nn.utils.clip_grad_norm_([self.log_alpha], max_norm=clip).item()
self.optimizers["temperature"].step()
stats.losses["loss_actor"] = loss_actor.item()
stats.losses["loss_temperature"] = loss_temp.item()
stats.grad_norms["actor"] = actor_grad
stats.grad_norms["temperature"] = temp_grad
stats.extra["temperature"] = self.temperature
self._update_target_networks()
self._optimization_step += 1
return stats
def _compute_loss_critic(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
actions = batch[ACTION]
rewards = batch["reward"]
next_observations = batch["next_state"]
done = batch["done"]
observation_features = batch.get("observation_feature")
next_observation_features = batch.get("next_observation_feature")
with torch.no_grad():
next_action_preds, next_log_probs, _ = self.policy.actor(
next_observations, next_observation_features
)
# 2- compute q targets
q_targets = self._critic_forward(
observations=next_observations,
actions=next_action_preds,
use_target=True,
observation_features=next_observation_features,
)
# Subsample critics to prevent overfitting when using a high UTD (update-to-data) ratio
# TODO: Get indices before forward pass to avoid unnecessary computation
if self.config.num_subsample_critics is not None:
indices = torch.randperm(self.config.num_critics)
indices = indices[: self.config.num_subsample_critics]
q_targets = q_targets[indices]
# Take the minimum Q-value across the (possibly subsampled) critics
min_q, _ = q_targets.min(dim=0)
if self.config.use_backup_entropy:
min_q = min_q - (self.temperature * next_log_probs)
td_target = rewards + (1 - done) * self.config.discount * min_q
# 3- compute predicted qs
if self.policy_config.num_discrete_actions is not None:
# NOTE: We only want to keep the continuous action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions: Tensor = actions[:, :DISCRETE_DIMENSION_INDEX]
q_preds = self._critic_forward(
observations=observations,
actions=actions,
use_target=False,
observation_features=observation_features,
)
# 4- Calculate loss
# Compute state-action value loss (TD loss) for all of the Q functions in the ensemble.
td_target_duplicate = einops.repeat(td_target, "b -> e b", e=q_preds.shape[0])
# Compute the mean TD loss over the batch for each critic, then sum across critics to get the final loss
critics_loss = (
F.mse_loss(
input=q_preds,
target=td_target_duplicate,
reduction="none",
).mean(dim=1)
).sum()
return critics_loss
def _compute_loss_discrete_critic(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
actions = batch[ACTION]
rewards = batch["reward"]
next_observations = batch["next_state"]
done = batch["done"]
observation_features = batch.get("observation_feature")
next_observation_features = batch.get("next_observation_feature")
complementary_info = batch.get("complementary_info")
# NOTE: We only want to keep the discrete action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions_discrete: Tensor = actions[:, DISCRETE_DIMENSION_INDEX:].clone()
actions_discrete = torch.round(actions_discrete)
actions_discrete = actions_discrete.long()
discrete_penalties: Tensor | None = None
if complementary_info is not None:
discrete_penalties = complementary_info.get("discrete_penalty")
with torch.no_grad():
# For DQN, select actions using online network, evaluate with target network
next_discrete_qs = self._discrete_critic_forward(
next_observations, use_target=False, observation_features=next_observation_features
)
best_next_discrete_action = torch.argmax(next_discrete_qs, dim=-1, keepdim=True)
# Get target Q-values from target network
target_next_discrete_qs = self._discrete_critic_forward(
observations=next_observations,
use_target=True,
observation_features=next_observation_features,
)
# Use gather to select Q-values for best actions
target_next_discrete_q = torch.gather(
target_next_discrete_qs, dim=1, index=best_next_discrete_action
).squeeze(-1)
# Compute target Q-value with Bellman equation
rewards_discrete = rewards
if discrete_penalties is not None:
rewards_discrete = rewards + discrete_penalties
target_discrete_q = rewards_discrete + (1 - done) * self.config.discount * target_next_discrete_q
# Get predicted Q-values for current observations
predicted_discrete_qs = self._discrete_critic_forward(
observations=observations, use_target=False, observation_features=observation_features
)
# Use gather to select Q-values for taken actions
predicted_discrete_q = torch.gather(predicted_discrete_qs, dim=1, index=actions_discrete).squeeze(-1)
# Compute MSE loss between predicted and target Q-values
discrete_critic_loss = F.mse_loss(input=predicted_discrete_q, target=target_discrete_q)
return discrete_critic_loss
def _compute_loss_actor(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
observation_features = batch.get("observation_feature")
actions_pi, log_probs, _ = self.policy.actor(observations, observation_features)
q_preds = self._critic_forward(
observations=observations,
actions=actions_pi,
use_target=False,
observation_features=observation_features,
)
min_q_preds = q_preds.min(dim=0)[0]
actor_loss = ((self.temperature * log_probs) - min_q_preds).mean()
return actor_loss
def _compute_loss_temperature(self, batch: dict[str, Any]) -> Tensor:
"""Compute the temperature loss"""
observations = batch["state"]
observation_features = batch.get("observation_feature")
# calculate temperature loss
with torch.no_grad():
_, log_probs, _ = self.policy.actor(observations, observation_features)
temperature_loss = (-self.log_alpha.exp() * (log_probs + self.target_entropy)).mean()
return temperature_loss
def _update_target_networks(self) -> None:
"""Update target networks with exponential moving average"""
for target_p, p in zip(
self.critic_target.parameters(), self.critic_ensemble.parameters(), strict=True
):
target_p.data.copy_(
p.data * self.config.critic_target_update_weight
+ target_p.data * (1.0 - self.config.critic_target_update_weight)
)
if self.policy_config.num_discrete_actions is not None:
for target_p, p in zip(
self.discrete_critic_target.parameters(),
self.policy.discrete_critic.parameters(),
strict=True,
):
target_p.data.copy_(
p.data * self.config.critic_target_update_weight
+ target_p.data * (1.0 - self.config.critic_target_update_weight)
)
def _prepare_forward_batch(
self, batch: BatchType, *, include_complementary_info: bool = True
) -> dict[str, Any]:
observations = batch["state"]
next_observations = batch["next_state"]
observation_features, next_observation_features = self.get_observation_features(
observations, next_observations
)
forward_batch: dict[str, Any] = {
ACTION: batch[ACTION],
"reward": batch["reward"],
"state": observations,
"next_state": next_observations,
"done": batch["done"],
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
}
if include_complementary_info and "complementary_info" in batch:
forward_batch["complementary_info"] = batch["complementary_info"]
return forward_batch
def make_optimizers_and_scheduler(self) -> dict[str, Optimizer]:
"""
Create, store, and return Adam optimizers for the components of this SAC algorithm.
This method sets up Adam optimizers for:
- The **actor network**, using the parameter group exposed by ``policy.get_optim_params()``.
- The **critic ensemble**, which evaluates the value function.
- The **temperature parameter** (``log_alpha``), which controls the entropy bonus.
- The **discrete critic** (if the policy defines discrete actions), sharing the critic learning rate.
No learning-rate scheduler is created at the moment.
NOTE:
- If the encoder is shared, its parameters are excluded from the actor's parameter group.
- ``log_alpha`` is wrapped in a list so it is optimized as a standalone tensor.
Returns:
A dictionary mapping component names ("actor", "critic", "temperature", and optionally
"discrete_critic") to their respective Adam optimizers.
"""
actor_params = self.policy.get_optim_params()["actor"]
self.optimizers = {
"actor": torch.optim.Adam(actor_params, lr=self.config.actor_lr),
"critic": torch.optim.Adam(self.critic_ensemble.parameters(), lr=self.config.critic_lr),
"temperature": torch.optim.Adam([self.log_alpha], lr=self.config.temperature_lr),
}
if self.policy_config.num_discrete_actions is not None:
self.optimizers["discrete_critic"] = torch.optim.Adam(
self.policy.discrete_critic.parameters(), lr=self.config.critic_lr
)
return self.optimizers
def get_optimizers(self) -> dict[str, Optimizer]:
return self.optimizers
def get_weights(self) -> dict[str, Any]:
"""Send actor + discrete-critic state dicts."""
state_dicts: dict[str, Any] = {
"policy": move_state_dict_to_device(self.policy.actor.state_dict(), device="cpu"),
}
if self.policy_config.num_discrete_actions is not None:
state_dicts["discrete_critic"] = move_state_dict_to_device(
self.policy.discrete_critic.state_dict(), device="cpu"
)
return state_dicts
def load_weights(self, weights: dict[str, Any], device: str | torch.device = "cpu") -> None:
"""Load actor + discrete-critic weights into the policy."""
self.policy.load_actor_weights(weights, device=device)
def get_observation_features(
self, observations: Tensor, next_observations: Tensor
) -> tuple[Tensor | None, Tensor | None]:
"""
Get cached observation features from the policy encoder.
When the vision encoder is frozen its output never changes, so the features are cached to save compute.
Args:
observations: The current observations
next_observations: The next observations
Returns:
tuple: observation_features, next_observation_features
"""
if self.policy.config.vision_encoder_name is None or not self.policy.config.freeze_vision_encoder:
return None, None
with torch.no_grad():
observation_features = self.policy.actor.encoder.get_cached_image_features(observations)
next_observation_features = self.policy.actor.encoder.get_cached_image_features(next_observations)
return observation_features, next_observation_features
class CriticHead(nn.Module):
def __init__(
self,
input_dim: int,
hidden_dims: list[int],
activations: Callable[[torch.Tensor], torch.Tensor] | str = nn.SiLU(),
activate_final: bool = False,
dropout_rate: float | None = None,
init_final: float | None = None,
final_activation: Callable[[torch.Tensor], torch.Tensor] | str | None = None,
):
super().__init__()
self.net = MLP(
input_dim=input_dim,
hidden_dims=hidden_dims,
activations=activations,
activate_final=activate_final,
dropout_rate=dropout_rate,
final_activation=final_activation,
)
self.output_layer = nn.Linear(in_features=hidden_dims[-1], out_features=1)
if init_final is not None:
nn.init.uniform_(self.output_layer.weight, -init_final, init_final)
nn.init.uniform_(self.output_layer.bias, -init_final, init_final)
else:
orthogonal_init()(self.output_layer.weight)
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self.output_layer(self.net(x))
class CriticEnsemble(nn.Module):
"""
CriticEnsemble wraps multiple CriticHead modules into an ensemble.
Args:
encoder (GaussianActorObservationEncoder): encoder for observations.
ensemble (List[CriticHead]): list of critic heads.
init_final (float | None): optional initializer scale for final layers.
Forward returns a tensor of shape (num_critics, batch_size) containing Q-values.
"""
def __init__(
self,
encoder: GaussianActorObservationEncoder,
ensemble: list[CriticHead],
init_final: float | None = None,
):
super().__init__()
self.encoder = encoder
self.init_final = init_final
self.critics = nn.ModuleList(ensemble)
def forward(
self,
observations: dict[str, torch.Tensor],
actions: torch.Tensor,
observation_features: torch.Tensor | None = None,
) -> torch.Tensor:
device = get_device_from_parameters(self)
# Move each tensor in observations to device
observations = {k: v.to(device) for k, v in observations.items()}
obs_enc = self.encoder(observations, cache=observation_features)
inputs = torch.cat([obs_enc, actions], dim=-1)
# Loop through critics and collect outputs
q_values = []
for critic in self.critics:
q_values.append(critic(inputs))
# Stack outputs to match expected shape [num_critics, batch_size]
q_values = torch.stack([q.squeeze(-1) for q in q_values], dim=0)
return q_values
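For reference, the TD target implemented by _compute_loss_critic is y = r + (1 - done) * discount * (min_i Q_target_i(s', a') - temperature * log pi(a' | s')), with the entropy term included only when use_backup_entropy is True; the critic loss is the per-critic batch-mean MSE against y, summed over the ensemble. CriticEnsemble.forward returns Q-values of shape (num_critics, batch_size).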
+3 -3
@@ -97,8 +97,8 @@ class ReplayBuffer:
Args:
capacity (int): Maximum number of transitions to store in the buffer.
device (str): The device where the tensors will be moved when sampling ("cuda:0" or "cpu").
state_keys (List[str]): The list of keys that appear in `state` and `next_state`.
image_augmentation_function (Optional[Callable]): A function that takes a batch of images
state_keys (list[str]): The list of keys that appear in `state` and `next_state`.
image_augmentation_function (Callable | None): A function that takes a batch of images
and returns a batch of augmented images. If None, a default augmentation function is used.
use_drq (bool): Whether to use the default DRQ image augmentation style, when sampling in the buffer.
storage_device: The device (e.g. "cpu" or "cuda:0") where the data will be stored.
@@ -634,7 +634,7 @@ class ReplayBuffer:
If None, you must handle or define default keys.
Returns:
transitions (List[Transition]):
transitions (list[Transition]):
A list of Transition dictionaries with the same length as `dataset`.
"""
if state_keys is None:
+2 -2
@@ -176,11 +176,11 @@ def convert_lerobot_dataset_to_cropped_lerobot_dataset(
Args:
original_dataset (LeRobotDataset): The source dataset.
crop_params_dict (Dict[str, Tuple[int, int, int, int]]):
crop_params_dict (dict[str, Tuple[int, int, int, int]]):
A dictionary mapping observation keys to crop parameters (top, left, height, width).
new_repo_id (str): Repository id for the new dataset.
new_dataset_root (str): The root directory where the new dataset will be written.
resize_size (Tuple[int, int], optional): The target size (height, width) after cropping.
resize_size (tuple[int, int], optional): The target size (height, width) after cropping.
Defaults to (128, 128).
Returns:
+17
@@ -0,0 +1,17 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .data_mixer import BatchType, DataMixer, OnlineOfflineMixer
__all__ = ["BatchType", "DataMixer", "OnlineOfflineMixer"]
+112
@@ -0,0 +1,112 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from lerobot.rl.algorithms.base import BatchType
from lerobot.rl.buffer import ReplayBuffer, concatenate_batch_transitions
class DataMixer(abc.ABC):
"""Abstract interface for all data mixing strategies.
Subclasses must implement ``sample(batch_size)`` and may override
``get_iterator`` for specialised iteration.
"""
@abc.abstractmethod
def sample(self, batch_size: int) -> BatchType:
"""Draw one batch of ``batch_size`` transitions."""
...
def get_iterator(
self,
batch_size: int,
async_prefetch: bool = True,
queue_size: int = 2,
):
"""Infinite iterator that yields batches.
The default implementation repeatedly calls ``self.sample()``.
Subclasses with underlying buffer iterators (async prefetch)
should override this for better throughput.
"""
while True:
yield self.sample(batch_size)
class OnlineOfflineMixer(DataMixer):
"""Mixes transitions from an online and an optional offline replay buffer.
When both buffers are present, each batch is constructed by sampling
``ceil(batch_size * online_ratio)`` from the online buffer and the
remainder from the offline buffer, then concatenating.
This mixer assumes both online and offline buffers are present.
"""
def __init__(
self,
online_buffer: ReplayBuffer,
offline_buffer: ReplayBuffer | None = None,
online_ratio: float = 1.0,
):
if not 0.0 <= online_ratio <= 1.0:
raise ValueError(f"online_ratio must be in [0, 1], got {online_ratio}")
self.online_buffer = online_buffer
self.offline_buffer = offline_buffer
self.online_ratio = online_ratio
def sample(self, batch_size: int) -> BatchType:
if self.offline_buffer is None:
return self.online_buffer.sample(batch_size)
n_online = max(1, int(batch_size * self.online_ratio))
n_offline = batch_size - n_online
online_batch = self.online_buffer.sample(n_online)
offline_batch = self.offline_buffer.sample(n_offline)
return concatenate_batch_transitions(online_batch, offline_batch)
def get_iterator(
self,
batch_size: int,
async_prefetch: bool = True,
queue_size: int = 2,
):
"""Yield batches by composing buffer async iterators."""
n_online = max(1, int(batch_size * self.online_ratio))
online_iter = self.online_buffer.get_iterator(
batch_size=n_online,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
if self.offline_buffer is None:
yield from online_iter
return
n_offline = batch_size - n_online
offline_iter = self.offline_buffer.get_iterator(
batch_size=n_offline,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
while True:
yield concatenate_batch_transitions(next(online_iter), next(offline_iter))
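A hedged usage sketch, assuming online_rb and offline_rb are already-populated ReplayBuffer instances:
mixer = OnlineOfflineMixer(online_buffer=online_rb, offline_buffer=offline_rb, online_ratio=0.5)
batches = mixer.get_iterator(batch_size=256, async_prefetch=True)
batch = next(batches)  # 128 online + 128 offline transitions, concatenated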
+1 -1
@@ -17,9 +17,9 @@ import logging
from lerobot.cameras import opencv # noqa: F401
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.datasets import LeRobotDataset
from lerobot.policies import make_policy
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.robots import ( # noqa: F401
RobotConfig,
make_robot_from_config,
+143 -84
@@ -39,6 +39,7 @@ from lerobot.processor import (
GymHILAdapterProcessorStep,
ImageCropResizeProcessorStep,
InterventionActionProcessorStep,
LeaderArmInterventionStep,
MapDeltaActionToRobotActionStep,
MapTensorToDeltaActionDictStep,
Numpy2TorchActionProcessorStep,
@@ -383,10 +384,21 @@ def make_processors(
GymHILAdapterProcessorStep(),
Numpy2TorchActionProcessorStep(),
VanillaObservationProcessorStep(),
AddBatchDimensionProcessorStep(),
DeviceProcessorStep(device=device),
]
# Add time limit processor if reset config exists
if cfg.processor.reset is not None:
env_pipeline_steps.append(
TimeLimitProcessorStep(max_episode_steps=int(cfg.processor.reset.control_time_s * cfg.fps))
)
env_pipeline_steps.extend(
[
AddBatchDimensionProcessorStep(),
DeviceProcessorStep(device=device),
]
)
return DataProcessorPipeline(
steps=env_pipeline_steps, to_transition=identity_transition, to_output=identity_transition
), DataProcessorPipeline(
@@ -470,15 +482,41 @@ def make_processors(
env_pipeline_steps.append(AddBatchDimensionProcessorStep())
env_pipeline_steps.append(DeviceProcessorStep(device=device))
action_pipeline_steps = [
action_pipeline_steps: list = [
AddTeleopActionAsComplimentaryDataStep(teleop_device=teleop_device),
AddTeleopEventsAsInfoStep(teleop_device=teleop_device),
InterventionActionProcessorStep(
use_gripper=cfg.processor.gripper.use_gripper if cfg.processor.gripper is not None else False,
terminate_on_success=terminate_on_success,
),
]
use_gripper_for_intervention = (
cfg.processor.gripper.use_gripper if cfg.processor.gripper is not None else False
)
# Leader-arm intervention: convert raw leader joints in `teleop_action`
# into a 4-D EE-delta dict before the override step consumes it. The same
# step also drives haptic follow on the leader (when `teleop_device` is a
# `SOLeaderFollower`) by pushing the follower joints back via send_action.
if (
getattr(cfg.processor, "control_mode", "gamepad") == "leader"
and cfg.processor.inverse_kinematics is not None
and kinematics_solver is not None
):
action_pipeline_steps.append(
LeaderArmInterventionStep(
kinematics=kinematics_solver,
motor_names=motor_names,
end_effector_step_sizes=cfg.processor.inverse_kinematics.end_effector_step_sizes,
teleop_device=teleop_device,
use_gripper=use_gripper_for_intervention,
)
)
action_pipeline_steps.append(
InterventionActionProcessorStep(
use_gripper=use_gripper_for_intervention,
terminate_on_success=terminate_on_success,
)
)
# Replace InverseKinematicsProcessor with new kinematic processors
if cfg.processor.inverse_kinematics is not None and kinematics_solver is not None:
# Add EE bounds and safety processor
@@ -551,8 +589,19 @@ def step_env_and_process_transition(
terminated = terminated or processed_action_transition[TransitionKey.DONE]
truncated = truncated or processed_action_transition[TransitionKey.TRUNCATED]
complementary_data = processed_action_transition[TransitionKey.COMPLEMENTARY_DATA].copy()
if hasattr(env, "get_raw_joint_positions"):
raw_joint_positions = env.get_raw_joint_positions()
if raw_joint_positions is not None:
complementary_data["raw_joint_positions"] = raw_joint_positions
# Merge env and action-processor info: env wins for str keys, action-processor
# wins for `TeleopEvents` enum keys
action_info = processed_action_transition[TransitionKey.INFO]
new_info = info.copy()
new_info.update(processed_action_transition[TransitionKey.INFO])
for key, value in action_info.items():
if isinstance(key, TeleopEvents):
new_info[key] = value
new_transition = create_transition(
observation=obs,
@@ -568,6 +617,24 @@ def step_env_and_process_transition(
return new_transition
def reset_and_build_transition(
env: gym.Env,
env_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
action_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
) -> EnvTransition:
"""Reset env + processors and return the first env-processed transition."""
obs, info = env.reset()
env_processor.reset()
action_processor.reset()
complementary_data: dict[str, Any] = {}
if hasattr(env, "get_raw_joint_positions"):
raw_joint_positions = env.get_raw_joint_positions()
if raw_joint_positions is not None:
complementary_data["raw_joint_positions"] = raw_joint_positions
transition = create_transition(observation=obs, info=info, complementary_data=complementary_data)
return env_processor(data=transition)
def control_loop(
env: gym.Env,
env_processor: DataProcessorPipeline[EnvTransition, EnvTransition],
@@ -593,17 +660,7 @@ def control_loop(
print("- When not intervening, robot will stay still")
print("- Press Ctrl+C to exit")
# Reset environment and processors
obs, info = env.reset()
complementary_data = (
{"raw_joint_positions": info.pop("raw_joint_positions")} if "raw_joint_positions" in info else {}
)
env_processor.reset()
action_processor.reset()
# Process initial observation
transition = create_transition(observation=obs, info=info, complementary_data=complementary_data)
transition = env_processor(data=transition)
transition = reset_and_build_transition(env, env_processor, action_processor)
# Determine if gripper is used
use_gripper = cfg.env.processor.gripper.use_gripper if cfg.env.processor.gripper is not None else True
@@ -659,79 +716,81 @@ def control_loop(
episode_step = 0
episode_start_time = time.perf_counter()
while episode_idx < cfg.dataset.num_episodes_to_record:
step_start_time = time.perf_counter()
try:
while episode_idx < cfg.dataset.num_episodes_to_record:
step_start_time = time.perf_counter()
# Create a neutral action (no movement)
neutral_action = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32)
if use_gripper:
neutral_action = torch.cat([neutral_action, torch.tensor([0.0])]) # Gripper stay
# Use the new step function
transition = step_env_and_process_transition(
env=env,
transition=transition,
action=neutral_action,
env_processor=env_processor,
action_processor=action_processor,
)
terminated = transition.get(TransitionKey.DONE, False)
truncated = transition.get(TransitionKey.TRUNCATED, False)
if cfg.mode == "record":
observations = {
k: v.squeeze(0).cpu()
for k, v in transition[TransitionKey.OBSERVATION].items()
if isinstance(v, torch.Tensor)
}
# Use teleop_action if available, otherwise use the action from the transition
action_to_record = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"teleop_action", transition[TransitionKey.ACTION]
)
frame = {
**observations,
ACTION: action_to_record.cpu(),
REWARD: np.array([transition[TransitionKey.REWARD]], dtype=np.float32),
DONE: np.array([terminated or truncated], dtype=bool),
}
# Create a neutral action (no movement)
neutral_action = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32)
if use_gripper:
discrete_penalty = transition[TransitionKey.COMPLEMENTARY_DATA].get("discrete_penalty", 0.0)
frame["complementary_info.discrete_penalty"] = np.array([discrete_penalty], dtype=np.float32)
neutral_action = torch.cat([neutral_action, torch.tensor([1.0])]) # Gripper stay
if dataset is not None:
frame["task"] = cfg.dataset.task
dataset.add_frame(frame)
episode_step += 1
# Handle episode termination
if terminated or truncated:
episode_time = time.perf_counter() - episode_start_time
logging.info(
f"Episode ended after {episode_step} steps in {episode_time:.1f}s with reward {transition[TransitionKey.REWARD]}"
transition = step_env_and_process_transition(
env=env,
transition=transition,
action=neutral_action,
env_processor=env_processor,
action_processor=action_processor,
)
episode_step = 0
episode_idx += 1
terminated = transition.get(TransitionKey.DONE, False)
truncated = transition.get(TransitionKey.TRUNCATED, False)
if dataset is not None:
if transition[TransitionKey.INFO].get(TeleopEvents.RERECORD_EPISODE, False):
logging.info(f"Re-recording episode {episode_idx}")
dataset.clear_episode_buffer()
episode_idx -= 1
else:
logging.info(f"Saving episode {episode_idx}")
dataset.save_episode()
if cfg.mode == "record":
observations = {
k: v.squeeze(0).cpu()
for k, v in transition[TransitionKey.OBSERVATION].items()
if isinstance(v, torch.Tensor)
}
action_to_record = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"teleop_action", transition[TransitionKey.ACTION]
)
frame = {
**observations,
ACTION: action_to_record.cpu(),
REWARD: np.array([transition[TransitionKey.REWARD]], dtype=np.float32),
DONE: np.array([terminated or truncated], dtype=bool),
}
if use_gripper:
discrete_penalty = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"discrete_penalty", 0.0
)
frame["complementary_info.discrete_penalty"] = np.array(
[discrete_penalty], dtype=np.float32
)
# Reset for new episode
obs, info = env.reset()
env_processor.reset()
action_processor.reset()
if dataset is not None:
frame["task"] = cfg.dataset.task
dataset.add_frame(frame)
transition = create_transition(observation=obs, info=info)
transition = env_processor(transition)
episode_step += 1
# Maintain fps timing
precise_sleep(max(dt - (time.perf_counter() - step_start_time), 0.0))
# Handle episode termination
if terminated or truncated:
episode_time = time.perf_counter() - episode_start_time
logging.info(
f"Episode ended after {episode_step} steps in {episode_time:.1f}s with reward {transition[TransitionKey.REWARD]}"
)
episode_step = 0
episode_idx += 1
if dataset is not None:
if transition[TransitionKey.INFO].get(TeleopEvents.RERECORD_EPISODE, False):
logging.info(f"Re-recording episode {episode_idx}")
dataset.clear_episode_buffer()
episode_idx -= 1
else:
logging.info(f"Saving episode {episode_idx}")
dataset.save_episode()
# Reset for new episode
transition = reset_and_build_transition(env, env_processor, action_processor)
# Maintain fps timing
precise_sleep(max(dt - (time.perf_counter() - step_start_time), 0.0))
finally:
if dataset is not None and dataset.writer is not None and dataset.writer.image_writer is not None:
logging.info("Waiting for image writer to finish...")
dataset.writer.image_writer.stop()
if dataset is not None and cfg.dataset.push_to_hub:
logging.info("Finalizing dataset before pushing to hub")
+60 -290
@@ -51,6 +51,7 @@ import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from pprint import pformat
from typing import Any
import grpc
import torch
@@ -68,10 +69,15 @@ from lerobot.common.train_utils import (
)
from lerobot.common.wandb_utils import WandBLogger
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.datasets import LeRobotDataset, make_dataset
from lerobot.policies import make_policy
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies import make_policy, make_pre_post_processors
from lerobot.rl.algorithms.base import RLAlgorithm
from lerobot.rl.algorithms.factory import make_algorithm
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.data_sources import OnlineOfflineMixer
from lerobot.rl.process import ProcessSignalHandler
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.rl.trainer import RLTrainer
from lerobot.robots import so_follower # noqa: F401
from lerobot.teleoperators import gamepad, so_leader # noqa: F401
from lerobot.teleoperators.utils import TeleopEvents
@@ -90,15 +96,12 @@ from lerobot.utils.constants import (
TRAINING_STATE_DIR,
)
from lerobot.utils.device_utils import get_safe_torch_device
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.random_utils import set_seed
from lerobot.utils.transition import move_state_dict_to_device, move_transition_to_device
from lerobot.utils.utils import (
format_big_number,
init_logging,
)
from .buffer import ReplayBuffer, concatenate_batch_transitions
from .learner_service import MAX_WORKERS, SHUTDOWN_TIMEOUT, LearnerService
@@ -179,7 +182,7 @@ def train(cfg: TrainRLServerPipelineConfig, job_name: str | None = None):
def start_learner_threads(
cfg: TrainRLServerPipelineConfig,
wandb_logger: WandBLogger | None,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
) -> None:
"""
Start the learner threads for training.
@@ -253,7 +256,7 @@ def start_learner_threads(
def add_actor_information_and_train(
cfg: TrainRLServerPipelineConfig,
wandb_logger: WandBLogger | None,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
transition_queue: Queue,
interaction_message_queue: Queue,
parameters_queue: Queue,
@@ -266,8 +269,8 @@ def add_actor_information_and_train(
- Transfers transitions from the actor to the replay buffer.
- Logs received interaction messages.
- Ensures training begins only when the replay buffer has a sufficient number of transitions.
- Samples batches from the replay buffer and performs multiple critic updates.
- Periodically updates the actor, critic, and temperature optimizers.
- Delegates training updates to an ``RLAlgorithm``.
- Periodically pushes updated weights to actors.
- Logs training statistics, including loss values and optimization frequency.
NOTE: This function doesn't have a single responsibility, it should be split into multiple functions
@@ -286,17 +289,13 @@ def add_actor_information_and_train(
# of 7%
device = get_safe_torch_device(try_device=cfg.policy.device, log=True)
storage_device = get_safe_torch_device(try_device=cfg.policy.storage_device)
clip_grad_norm_value = cfg.policy.grad_clip_norm
online_step_before_learning = cfg.policy.online_step_before_learning
utd_ratio = cfg.policy.utd_ratio
fps = cfg.env.fps
log_freq = cfg.log_freq
save_freq = cfg.save_freq
policy_update_freq = cfg.policy.policy_update_freq
policy_parameters_push_frequency = cfg.policy.actor_learner_config.policy_parameters_push_frequency
saving_checkpoint = cfg.save_checkpoint
online_steps = cfg.policy.online_steps
async_prefetch = cfg.policy.async_prefetch
# Initialize logging for multiprocessing
if not use_threads(cfg):
@@ -308,7 +307,7 @@ def add_actor_information_and_train(
logging.info("Initializing policy")
policy: SACPolicy = make_policy(
policy = make_policy(
cfg=cfg.policy,
env_cfg=cfg.env,
)
@@ -317,15 +316,17 @@ def add_actor_information_and_train(
policy.train()
push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)
algorithm = make_algorithm(cfg=cfg.algorithm, policy=policy)
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
dataset_stats=cfg.policy.dataset_stats,
)
# Push initial policy weights to actors
push_actor_policy_to_queue(parameters_queue=parameters_queue, algorithm=algorithm)
last_time_policy_pushed = time.time()
optimizers, lr_scheduler = make_optimizers_and_scheduler(cfg=cfg, policy=policy)
# If we are resuming, we need to load the training state
resume_optimization_step, resume_interaction_step = load_training_state(cfg=cfg, optimizers=optimizers)
log_training_info(cfg=cfg, policy=policy)
replay_buffer = initialize_replay_buffer(cfg, device, storage_device)
@@ -338,21 +339,35 @@ def add_actor_information_and_train(
device=device,
storage_device=storage_device,
)
batch_size: int = batch_size // 2 # We will sample from both replay buffer
# DataMixer: online-only or online/offline 50-50 mix
data_mixer = OnlineOfflineMixer(
online_buffer=replay_buffer,
offline_buffer=offline_replay_buffer,
online_ratio=cfg.online_ratio,
)
# RLTrainer owns the iterator, preprocessor, and creates optimizers.
trainer = RLTrainer(
algorithm=algorithm,
data_mixer=data_mixer,
batch_size=batch_size,
preprocessor=preprocessor,
)
# If we are resuming, we need to load the training state
optimizers = algorithm.get_optimizers()
resume_optimization_step, resume_interaction_step = load_training_state(cfg=cfg, optimizers=optimizers)
logging.info("Starting learner thread")
interaction_message = None
optimization_step = resume_optimization_step if resume_optimization_step is not None else 0
algorithm.optimization_step = optimization_step
interaction_step_shift = resume_interaction_step if resume_interaction_step is not None else 0
dataset_repo_id = None
if cfg.dataset is not None:
dataset_repo_id = cfg.dataset.repo_id
# Initialize iterators
online_iterator = None
offline_iterator = None
# NOTE: THIS IS THE MAIN LOOP OF THE LEARNER
while True:
# Exit the training loop if shutdown is requested
@@ -365,7 +380,6 @@ def add_actor_information_and_train(
transition_queue=transition_queue,
replay_buffer=replay_buffer,
offline_replay_buffer=offline_replay_buffer,
device=device,
dataset_repo_id=dataset_repo_id,
shutdown_event=shutdown_event,
)
@@ -382,180 +396,20 @@ def add_actor_information_and_train(
if len(replay_buffer) < online_step_before_learning:
continue
if online_iterator is None:
online_iterator = replay_buffer.get_iterator(
batch_size=batch_size, async_prefetch=async_prefetch, queue_size=2
)
if offline_replay_buffer is not None and offline_iterator is None:
offline_iterator = offline_replay_buffer.get_iterator(
batch_size=batch_size, async_prefetch=async_prefetch, queue_size=2
)
time_for_one_optimization_step = time.time()
for _ in range(utd_ratio - 1):
# Sample from the iterators
batch = next(online_iterator)
if dataset_repo_id is not None:
batch_offline = next(offline_iterator)
batch = concatenate_batch_transitions(
left_batch_transitions=batch, right_batch_transition=batch_offline
)
actions = batch[ACTION]
rewards = batch["reward"]
observations = batch["state"]
next_observations = batch["next_state"]
done = batch["done"]
check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
observation_features, next_observation_features = get_observation_features(
policy=policy, observations=observations, next_observations=next_observations
)
# Create a batch dictionary with all required elements for the forward method
forward_batch = {
ACTION: actions,
"reward": rewards,
"state": observations,
"next_state": next_observations,
"done": done,
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
"complementary_info": batch["complementary_info"],
}
# Use the forward method for critic loss
critic_output = policy.forward(forward_batch, model="critic")
# Main critic optimization
loss_critic = critic_output["loss_critic"]
optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.critic_ensemble.parameters(), max_norm=clip_grad_norm_value
)
optimizers["critic"].step()
# Discrete critic optimization (if available)
if policy.config.num_discrete_actions is not None:
discrete_critic_output = policy.forward(forward_batch, model="discrete_critic")
loss_discrete_critic = discrete_critic_output["loss_discrete_critic"]
optimizers["discrete_critic"].zero_grad()
loss_discrete_critic.backward()
discrete_critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.discrete_critic.parameters(), max_norm=clip_grad_norm_value
)
optimizers["discrete_critic"].step()
# Update target networks (main and discrete)
policy.update_target_networks()
# Sample for the last update in the UTD ratio
batch = next(online_iterator)
if dataset_repo_id is not None:
batch_offline = next(offline_iterator)
batch = concatenate_batch_transitions(
left_batch_transitions=batch, right_batch_transition=batch_offline
)
actions = batch[ACTION]
rewards = batch["reward"]
observations = batch["state"]
next_observations = batch["next_state"]
done = batch["done"]
check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
observation_features, next_observation_features = get_observation_features(
policy=policy, observations=observations, next_observations=next_observations
)
# Create a batch dictionary with all required elements for the forward method
forward_batch = {
ACTION: actions,
"reward": rewards,
"state": observations,
"next_state": next_observations,
"done": done,
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
}
critic_output = policy.forward(forward_batch, model="critic")
loss_critic = critic_output["loss_critic"]
optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.critic_ensemble.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["critic"].step()
# Initialize training info dictionary
training_infos = {
"loss_critic": loss_critic.item(),
"critic_grad_norm": critic_grad_norm,
}
# Discrete critic optimization (if available)
if policy.config.num_discrete_actions is not None:
discrete_critic_output = policy.forward(forward_batch, model="discrete_critic")
loss_discrete_critic = discrete_critic_output["loss_discrete_critic"]
optimizers["discrete_critic"].zero_grad()
loss_discrete_critic.backward()
discrete_critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.discrete_critic.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["discrete_critic"].step()
# Add discrete critic info to training info
training_infos["loss_discrete_critic"] = loss_discrete_critic.item()
training_infos["discrete_critic_grad_norm"] = discrete_critic_grad_norm
# Actor and temperature optimization (at specified frequency)
if optimization_step % policy_update_freq == 0:
for _ in range(policy_update_freq):
# Actor optimization
actor_output = policy.forward(forward_batch, model="actor")
loss_actor = actor_output["loss_actor"]
optimizers["actor"].zero_grad()
loss_actor.backward()
actor_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.actor.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["actor"].step()
# Add actor info to training info
training_infos["loss_actor"] = loss_actor.item()
training_infos["actor_grad_norm"] = actor_grad_norm
# Temperature optimization
temperature_output = policy.forward(forward_batch, model="temperature")
loss_temperature = temperature_output["loss_temperature"]
optimizers["temperature"].zero_grad()
loss_temperature.backward()
temp_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=[policy.log_alpha], max_norm=clip_grad_norm_value
).item()
optimizers["temperature"].step()
# Add temperature info to training info
training_infos["loss_temperature"] = loss_temperature.item()
training_infos["temperature_grad_norm"] = temp_grad_norm
training_infos["temperature"] = policy.temperature
# One training step (trainer owns data_mixer iterator; algorithm owns UTD loop)
stats = trainer.training_step()
# Push policy to actors if needed
if time.time() - last_time_policy_pushed > policy_parameters_push_frequency:
push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)
push_actor_policy_to_queue(parameters_queue=parameters_queue, algorithm=algorithm)
last_time_policy_pushed = time.time()
# Update target networks (main and discrete)
policy.update_target_networks()
training_infos = stats.to_log_dict()
# Log training metrics at specified intervals
optimization_step = algorithm.optimization_step
if optimization_step % log_freq == 0:
training_infos["replay_buffer_size"] = len(replay_buffer)
if offline_replay_buffer is not None:
@@ -583,7 +437,6 @@ def add_actor_information_and_train(
custom_step_key="Optimization step",
)
optimization_step += 1
if optimization_step % log_freq == 0:
logging.info(f"[LEARNER] Number of optimization step: {optimization_step}")
@@ -600,6 +453,8 @@ def add_actor_information_and_train(
offline_replay_buffer=offline_replay_buffer,
dataset_repo_id=dataset_repo_id,
fps=fps,
preprocessor=preprocessor,
postprocessor=postprocessor,
)
@@ -607,7 +462,7 @@ def start_learner(
parameters_queue: Queue,
transition_queue: Queue,
interaction_message_queue: Queue,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
cfg: TrainRLServerPipelineConfig,
):
"""
@@ -684,6 +539,8 @@ def save_training_checkpoint(
offline_replay_buffer: ReplayBuffer | None = None,
dataset_repo_id: str | None = None,
fps: int = 30,
preprocessor=None,
postprocessor=None,
) -> None:
"""
Save training checkpoint and associated data.
@@ -707,6 +564,8 @@ def save_training_checkpoint(
offline_replay_buffer: Optional offline replay buffer to save
dataset_repo_id: Repository ID for dataset
fps: Frames per second for dataset
preprocessor: Optional preprocessor pipeline to save
postprocessor: Optional postprocessor pipeline to save
"""
logging.info(f"Checkpoint policy after step {optimization_step}")
_num_digits = max(6, len(str(online_steps)))
@@ -723,6 +582,8 @@ def save_training_checkpoint(
policy=policy,
optimizer=optimizers,
scheduler=None,
preprocessor=preprocessor,
postprocessor=postprocessor,
)
# Save interaction step manually
@@ -760,58 +621,6 @@ def save_training_checkpoint(
logging.info("Resume training")
def make_optimizers_and_scheduler(cfg: TrainRLServerPipelineConfig, policy: nn.Module):
"""
Creates and returns optimizers for the actor, critic, and temperature components of a reinforcement learning policy.
This function sets up Adam optimizers for:
- The **actor network**, ensuring that only relevant parameters are optimized.
- The **critic ensemble**, which evaluates the value function.
- The **temperature parameter**, which controls the entropy in soft actor-critic (SAC)-like methods.
It also initializes a learning rate scheduler, though it is currently set to `None`.
NOTE:
- If the encoder is shared, its parameters are excluded from the actor's optimization process.
- The policy's log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
Args:
cfg: Configuration object containing hyperparameters.
policy (nn.Module): The policy model containing the actor, critic, and temperature components.
Returns:
Tuple[Dict[str, torch.optim.Optimizer], Optional[torch.optim.lr_scheduler._LRScheduler]]:
A tuple containing:
- `optimizers`: A dictionary mapping component names ("actor", "critic", "temperature") to their respective Adam optimizers.
- `lr_scheduler`: Currently set to `None` but can be extended to support learning rate scheduling.
"""
optimizer_actor = torch.optim.Adam(
params=[
p
for n, p in policy.actor.named_parameters()
if not policy.config.shared_encoder or not n.startswith("encoder")
],
lr=cfg.policy.actor_lr,
)
optimizer_critic = torch.optim.Adam(params=policy.critic_ensemble.parameters(), lr=cfg.policy.critic_lr)
if cfg.policy.num_discrete_actions is not None:
optimizer_discrete_critic = torch.optim.Adam(
params=policy.discrete_critic.parameters(), lr=cfg.policy.critic_lr
)
optimizer_temperature = torch.optim.Adam(params=[policy.log_alpha], lr=cfg.policy.critic_lr)
lr_scheduler = None
optimizers = {
"actor": optimizer_actor,
"critic": optimizer_critic,
"temperature": optimizer_temperature,
}
if cfg.policy.num_discrete_actions is not None:
optimizers["discrete_critic"] = optimizer_discrete_critic
return optimizers, lr_scheduler
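For orientation, here is a hedged sketch of how one of these optimizers pairs with the gradient-clipped updates in the loop above; the "loss_critic" key and the clip value are assumptions mirroring the actor/temperature steps shown earlier, not code from this diff.
optimizers, lr_scheduler = make_optimizers_and_scheduler(cfg, policy)
clip_grad_norm_value = 10.0  # placeholder; the learner passes its own configured value

critic_output = policy.forward(forward_batch, model="critic")
loss_critic = critic_output["loss_critic"]  # assumed key, by analogy with "loss_actor"
optimizers["critic"].zero_grad()
loss_critic.backward()
torch.nn.utils.clip_grad_norm_(policy.critic_ensemble.parameters(), max_norm=clip_grad_norm_value)
optimizers["critic"].step()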
# Training setup functions
@@ -1016,33 +825,6 @@ def initialize_offline_replay_buffer(
# Utilities/Helpers functions
def get_observation_features(
policy: SACPolicy, observations: torch.Tensor, next_observations: torch.Tensor
) -> tuple[torch.Tensor | None, torch.Tensor | None]:
"""
Get observation features from the policy encoder. It acts as a cache for the observation features:
when the vision encoder is frozen, the features do not change between gradient steps,
so computing them once per batch saves compute.
Args:
policy: The policy model
observations: The current observations
next_observations: The next observations
Returns:
tuple: observation_features, next_observation_features
"""
if policy.config.vision_encoder_name is None or not policy.config.freeze_vision_encoder:
return None, None
with torch.no_grad():
observation_features = policy.actor.encoder.get_cached_image_features(observations)
next_observation_features = policy.actor.encoder.get_cached_image_features(next_observations)
return observation_features, next_observation_features
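A hedged sketch of the caching pattern this enables: compute the frozen-encoder features once per sampled batch and reuse them for every update that consumes that batch. The feature key names below are assumptions for illustration; the real forward signature may differ.
obs_feats, next_obs_feats = get_observation_features(policy, batch["state"], batch["next_state"])
forward_batch = {
    **batch,
    "observation_feature": obs_feats,        # assumed key names
    "next_observation_feature": next_obs_feats,
}
# Every subsequent policy.forward(..., model="critic"/"actor"/"temperature") on this batch
# can reuse the cached features instead of re-running the frozen vision encoder.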
def use_threads(cfg: TrainRLServerPipelineConfig) -> bool:
return cfg.policy.concurrency.learner == "threads"
@@ -1093,19 +875,11 @@ def check_nan_in_transition(
return nan_detected
def push_actor_policy_to_queue(parameters_queue: Queue, policy: nn.Module):
def push_actor_policy_to_queue(parameters_queue: Queue, algorithm: RLAlgorithm) -> None:
logging.debug("[LEARNER] Pushing actor policy to the queue")
# Create a dictionary to hold all the state dicts
state_dicts = {"policy": move_state_dict_to_device(policy.actor.state_dict(), device="cpu")}
# Add discrete critic if it exists
if hasattr(policy, "discrete_critic") and policy.discrete_critic is not None:
state_dicts["discrete_critic"] = move_state_dict_to_device(
policy.discrete_critic.state_dict(), device="cpu"
)
logging.debug("[LEARNER] Including discrete critic in state dict push")
state_dicts = algorithm.get_weights()
state_bytes = state_to_bytes(state_dicts)
parameters_queue.put(state_bytes)
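The `state_to_bytes` helper is not shown in this diff; as an assumption only, a plausible round trip uses plain `torch.save`/`torch.load` over an in-memory buffer, with a hypothetical `bytes_to_state` counterpart on the actor side.
import io
import torch

def state_to_bytes(state_dicts: dict) -> bytes:
    buffer = io.BytesIO()
    torch.save(state_dicts, buffer)  # serialize the CPU state dicts returned by algorithm.get_weights()
    return buffer.getvalue()

def bytes_to_state(payload: bytes) -> dict:
    return torch.load(io.BytesIO(payload), map_location="cpu")

# Actor side (sketch): payload = parameters_queue.get(); weights = bytes_to_state(payload)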
@@ -1129,9 +903,8 @@ def process_transitions(
transition_queue: Queue,
replay_buffer: ReplayBuffer,
offline_replay_buffer: ReplayBuffer,
device: str,
dataset_repo_id: str | None,
shutdown_event: any,
shutdown_event: Any, # Event
):
"""Process all available transitions from the queue.
@@ -1139,7 +912,6 @@ def process_transitions(
transition_queue: Queue for receiving transitions from the actor
replay_buffer: Replay buffer to add transitions to
offline_replay_buffer: Offline replay buffer to add transitions to
device: Device to move transitions to
dataset_repo_id: Repository ID for dataset
shutdown_event: Event to signal shutdown
"""
@@ -1148,8 +920,6 @@ def process_transitions(
transition_list = bytes_to_transitions(buffer=transition_list)
for transition in transition_list:
transition = move_transition_to_device(transition=transition, device=device)
# Skip transitions with NaN values
if check_nan_in_transition(
observations=transition["state"],
@@ -1163,7 +933,7 @@ def process_transitions(
# Add to offline buffer if it's an intervention
if dataset_repo_id is not None and transition.get("complementary_info", {}).get(
TeleopEvents.IS_INTERVENTION
TeleopEvents.IS_INTERVENTION.value
):
offline_replay_buffer.add(**transition)
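The switch from the enum member to `TeleopEvents.IS_INTERVENTION.value` matters when `complementary_info` is keyed by plain strings. A small sketch, assuming `TeleopEvents` is a plain `Enum` with a string value (neither is shown in this diff):
from enum import Enum

class TeleopEvents(Enum):  # assumption: plain Enum, not a str subclass
    IS_INTERVENTION = "is_intervention"  # assumed value

info = {"is_intervention": True}
info.get(TeleopEvents.IS_INTERVENTION)        # -> None: the member is not the string key
info.get(TeleopEvents.IS_INTERVENTION.value)  # -> True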
@@ -1172,7 +942,7 @@ def process_interaction_messages(
interaction_message_queue: Queue,
interaction_step_shift: int,
wandb_logger: WandBLogger | None,
shutdown_event: any,
shutdown_event: Any, # Event
) -> dict | None:
"""Process all available interaction messages from the queue.
+54
@@ -0,0 +1,54 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Top-level pipeline config for distributed RL training (actor / learner)."""
from __future__ import annotations
from dataclasses import dataclass
from lerobot.configs.default import DatasetConfig
from lerobot.configs.train import TrainPipelineConfig
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
from lerobot.rl.algorithms.factory import make_algorithm_config
from lerobot.rl.algorithms.sac import SACAlgorithmConfig # noqa: F401
@dataclass(kw_only=True)
class TrainRLServerPipelineConfig(TrainPipelineConfig):
# NOTE: In RL, we don't need an offline dataset
# TODO: Make `TrainPipelineConfig.dataset` optional
dataset: DatasetConfig | None = None # type: ignore[assignment] # because the parent class has made its type non-optional
# Algorithm config (a `draccus.ChoiceRegistry` subclass selected by `type`,
# e.g. ``"type": "sac"``). When omitted, defaults to a SAC config with
# default hyperparameters. The top-level `policy` is injected into
# ``algorithm.policy_config`` at validation time.
algorithm: RLAlgorithmConfig | None = None
# Data mixer strategy name. Currently supports "online_offline".
mixer: str = "online_offline"
# Fraction sampled from online replay when using OnlineOfflineMixer.
online_ratio: float = 0.5
def validate(self) -> None:
super().validate()
if self.algorithm is None:
self.algorithm = make_algorithm_config("sac")
# The pipeline owns the policy config; inject it so the algorithm can
# introspect policy architecture (e.g. ``num_discrete_actions``).
if getattr(self.algorithm, "policy_config", None) is None:
self.algorithm.policy_config = self.policy
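A minimal usage sketch of the defaulting behavior. The policy config is a placeholder and this glosses over whatever the parent `TrainPipelineConfig.validate()` additionally requires; the `type` assertion assumes the algorithm config exposes a choice name the same way the strategy configs below do.
cfg = TrainRLServerPipelineConfig(policy=some_policy_config)  # `some_policy_config` is a placeholder
cfg.validate()
assert cfg.algorithm is not None and cfg.algorithm.type == "sac"  # SAC config is the default
assert cfg.algorithm.policy_config is cfg.policy                  # injected at validation time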
+103
@@ -0,0 +1,103 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from collections.abc import Iterator
from typing import Any
from lerobot.rl.algorithms.base import BatchType, RLAlgorithm
from lerobot.rl.algorithms.configs import TrainingStats
from lerobot.rl.data_sources.data_mixer import DataMixer
class RLTrainer:
"""Unified training step orchestrator.
Holds the algorithm, a DataMixer, and an optional preprocessor.
"""
def __init__(
self,
algorithm: RLAlgorithm,
data_mixer: DataMixer,
batch_size: int,
*,
preprocessor: Any | None = None,
):
self.algorithm = algorithm
self.data_mixer = data_mixer
self.batch_size = batch_size
self._preprocessor = preprocessor
self._iterator: Iterator[BatchType] | None = None
self.algorithm.make_optimizers_and_scheduler()
def _build_data_iterator(self) -> Iterator[BatchType]:
"""Create a fresh algorithm-configured iterator (optionally preprocessed)."""
raw = self.algorithm.configure_data_iterator(
data_mixer=self.data_mixer,
batch_size=self.batch_size,
)
if self._preprocessor is not None:
return _PreprocessedIterator(raw, self._preprocessor)
return raw
def reset_data_iterator(self) -> None:
"""Discard the current iterator so it will be rebuilt lazily next step."""
self._iterator = None
def set_data_mixer(self, data_mixer: DataMixer, *, reset: bool = True) -> None:
"""Swap the active data mixer, optionally resetting the iterator."""
self.data_mixer = data_mixer
if reset:
self.reset_data_iterator()
def training_step(self) -> TrainingStats:
"""Run one training step (algorithm-agnostic)."""
if self._iterator is None:
self._iterator = self._build_data_iterator()
return self.algorithm.update(self._iterator)
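A usage sketch of the trainer loop as the learner would drive it; the batch size and the algorithm/mixer construction are placeholders, the method names match the classes in this file.
trainer = RLTrainer(algorithm=algorithm, data_mixer=data_mixer, batch_size=256, preprocessor=preprocessor)
for _ in range(num_steps):
    stats = trainer.training_step()   # the algorithm owns the UTD loop internally
    training_infos = stats.to_log_dict()
# Swapping in a different mixer rebuilds the data iterator lazily on the next step.
trainer.set_data_mixer(new_mixer)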
def preprocess_rl_batch(preprocessor: Any, batch: BatchType) -> BatchType:
"""Apply policy preprocessing to RL observations only.
This mirrors the pre-refactor SAC learner behavior where actions are left
unchanged and only state/next_state observations are normalized.
"""
observations = batch["state"]
next_observations = batch["next_state"]
batch["state"] = preprocessor.process_observation(observations)
batch["next_state"] = preprocessor.process_observation(next_observations)
return batch
class _PreprocessedIterator:
"""Iterator wrapper that preprocesses each sampled RL batch."""
__slots__ = ("_raw", "_preprocessor")
def __init__(self, raw_iterator: Iterator[BatchType], preprocessor: Any) -> None:
self._raw = raw_iterator
self._preprocessor = preprocessor
def __iter__(self) -> _PreprocessedIterator:
return self
def __next__(self) -> BatchType:
batch = next(self._raw)
return preprocess_rl_batch(self._preprocessor, batch)
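A self-contained sketch of the wrapper with a dummy preprocessor, to show where normalization lands relative to sampling: only "state"/"next_state" pass through `process_observation`, actions are untouched, per `preprocess_rl_batch`.
class _IdentityPreprocessor:
    def process_observation(self, obs):
        return obs  # real preprocessors normalize images/state here

raw_batches = iter([{"state": {"joints": [0.0]}, "next_state": {"joints": [0.1]}, "action": [1.0]}])
wrapped = _PreprocessedIterator(raw_batches, _IdentityPreprocessor())
batch = next(wrapped)  # observations preprocessed, action passed through unchanged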
+1 -2
@@ -20,7 +20,7 @@ from typing import TYPE_CHECKING, Any
from lerobot.cameras import make_cameras_from_configs
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.import_utils import _reachy2_sdk_available, require_package
from lerobot.utils.import_utils import _reachy2_sdk_available
from ..robot import Robot
from ..utils import ensure_safe_goal_position
@@ -81,7 +81,6 @@ class Reachy2Robot(Robot):
name = "reachy2"
def __init__(self, config: Reachy2RobotConfig):
require_package("reachy2_sdk", extra="reachy2")
super().__init__(config)
self.config = config
@@ -353,7 +353,8 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
speed_factor: A scaling factor to convert the normalized velocity command to a position change.
clip_min: The minimum allowed gripper joint position.
clip_max: The maximum allowed gripper joint position.
discrete_gripper: If True, treat the input action as discrete (0: open, 1: close, 2: stay).
discrete_gripper: If True, interpret the input as a discrete class index
{0 = close, 1 = stay, 2 = open}, matching `GamepadTeleop.GripperAction`.
"""
speed_factor: float = 20.0
@@ -377,10 +378,10 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
raise ValueError("Joints observation is required for computing robot kinematics")
if self.discrete_gripper:
# Discrete gripper actions are in [0, 1, 2]
# 0: open, 1: close, 2: stay
# We need to shift them to [-1, 0, 1] and then scale them to clip_max
gripper_vel = (gripper_vel - 1) * self.clip_max
# Map discrete command {0=close, 1=stay, 2=open} -> signed velocity.
# Negation accounts for SO100 sign (joint position increases on close).
# 0 -> +clip_max (close), 1 -> 0 (stay), 2 -> -clip_max (open)
gripper_vel = -(gripper_vel - 1) * self.clip_max
# Compute desired gripper position
delta = gripper_vel * float(self.speed_factor)
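Worked through numerically with illustrative values (clip_max = 1.0, speed_factor = 20.0), the new mapping gives:
clip_max, speed_factor = 1.0, 20.0
for action in (0, 1, 2):                    # close, stay, open
    gripper_vel = -(action - 1) * clip_max  # 0 -> +1.0, 1 -> 0.0, 2 -> -1.0
    delta = gripper_vel * speed_factor      # 0 -> +20.0 (close), 1 -> 0.0 (stay), 2 -> -20.0 (open)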
+1 -2
@@ -27,7 +27,7 @@ import numpy as np
from lerobot.cameras import make_cameras_from_configs
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.import_utils import _unitree_sdk_available, require_package
from lerobot.utils.import_utils import _unitree_sdk_available
from ..robot import Robot
from .config_unitree_g1 import UnitreeG1Config
@@ -111,7 +111,6 @@ class UnitreeG1(Robot):
name = "unitree_g1"
def __init__(self, config: UnitreeG1Config):
require_package("unitree-sdk2py", extra="unitree_g1", import_name="unitree_sdk2py")
super().__init__(config)
logger.info("Initialize UnitreeG1...")
-82
@@ -1,82 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Policy deployment engine with pluggable rollout strategies."""
from lerobot.utils.import_utils import require_package
require_package("datasets", extra="dataset")
from .configs import (
BaseStrategyConfig,
DAggerKeyboardConfig,
DAggerPedalConfig,
DAggerStrategyConfig,
DatasetRecordConfig,
HighlightStrategyConfig,
RolloutConfig,
RolloutStrategyConfig,
SentryStrategyConfig,
)
from .context import (
DatasetContext,
HardwareContext,
PolicyContext,
ProcessorContext,
RolloutContext,
RuntimeContext,
build_rollout_context,
)
from .inference import (
InferenceEngine,
InferenceEngineConfig,
RTCInferenceConfig,
RTCInferenceEngine,
SyncInferenceConfig,
SyncInferenceEngine,
create_inference_engine,
)
from .ring_buffer import RolloutRingBuffer
from .robot_wrapper import ThreadSafeRobot
from .strategies import RolloutStrategy, create_strategy
__all__ = [
"BaseStrategyConfig",
"DAggerKeyboardConfig",
"DAggerPedalConfig",
"DAggerStrategyConfig",
"DatasetContext",
"DatasetRecordConfig",
"HardwareContext",
"HighlightStrategyConfig",
"InferenceEngine",
"InferenceEngineConfig",
"PolicyContext",
"ProcessorContext",
"RTCInferenceConfig",
"RTCInferenceEngine",
"RolloutConfig",
"RolloutContext",
"RolloutRingBuffer",
"RolloutStrategy",
"RolloutStrategyConfig",
"RuntimeContext",
"SentryStrategyConfig",
"SyncInferenceConfig",
"SyncInferenceEngine",
"ThreadSafeRobot",
"build_rollout_context",
"create_inference_engine",
"create_strategy",
]
-270
@@ -1,270 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Configuration dataclasses for the rollout deployment engine."""
from __future__ import annotations
import abc
import logging
from dataclasses import dataclass, field
import draccus
from lerobot.configs import PreTrainedConfig, parser
from lerobot.configs.dataset import DatasetRecordConfig
from lerobot.robots.config import RobotConfig
from lerobot.teleoperators.config import TeleoperatorConfig
from .inference import InferenceEngineConfig, SyncInferenceConfig
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Strategy configs (polymorphic dispatch via draccus ChoiceRegistry)
# ---------------------------------------------------------------------------
@dataclass
class RolloutStrategyConfig(draccus.ChoiceRegistry, abc.ABC):
"""Abstract base for rollout strategy configurations.
Use ``--strategy.type=<name>`` on the CLI to select a strategy.
"""
@property
def type(self) -> str:
return self.get_choice_name(self.__class__)
@RolloutStrategyConfig.register_subclass("base")
@dataclass
class BaseStrategyConfig(RolloutStrategyConfig):
"""Autonomous rollout with no data recording."""
pass
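As a sketch of the extension point, registering a new strategy only requires another subclass; the name and field below are invented for illustration, and the snippet reuses the module's existing dataclass import.
@RolloutStrategyConfig.register_subclass("burst")
@dataclass
class BurstStrategyConfig(RolloutStrategyConfig):
    """Hypothetical strategy: record short bursts at a fixed interval."""
    burst_seconds: float = 5.0
    interval_seconds: float = 60.0
# Selectable on the CLI with --strategy.type=burst, per the base class docstring.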
@RolloutStrategyConfig.register_subclass("sentry")
@dataclass
class SentryStrategyConfig(RolloutStrategyConfig):
"""Continuous autonomous rollout with always-on recording.
Episode duration is derived from camera resolution, FPS, and
``target_video_file_size_mb`` so that each saved episode produces a
video file that has crossed the target size. This aligns episode
boundaries with the dataset's video file chunking, so each
``push_to_hub`` call uploads complete video files rather than
re-uploading a growing file that hasn't crossed the chunk boundary.
"""
upload_every_n_episodes: int = 5
# Target video file size in MB for episode rotation. Episodes are
# saved once the estimated video duration would exceed this limit.
# Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when set to None.
target_video_file_size_mb: float | None = None
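The exact duration estimator is not part of this diff; purely as a back-of-envelope illustration, with a compressed bytes-per-pixel figure that is an assumption and not the repo's formula:
def approx_episode_seconds(width, height, fps, target_mb, compressed_bytes_per_pixel=0.10):
    bytes_per_second = width * height * compressed_bytes_per_pixel * fps
    return target_mb * 1024 * 1024 / bytes_per_second

approx_episode_seconds(640, 480, fps=30, target_mb=500)  # ~570 s, i.e. roughly 9-10 minute episodes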
@RolloutStrategyConfig.register_subclass("highlight")
@dataclass
class HighlightStrategyConfig(RolloutStrategyConfig):
"""Autonomous rollout with on-demand recording via ring buffer.
A memory-bounded ring buffer continuously captures telemetry. When
the user presses the save key, the buffer contents are flushed to
the dataset and live recording continues until the key is pressed
again.
"""
ring_buffer_seconds: float = 30.0
ring_buffer_max_memory_mb: float = 2048.0
save_key: str = "s"
push_key: str = "h"
@dataclass
class DAggerKeyboardConfig:
"""Keyboard key bindings for DAgger controls.
Keys are specified as single characters (e.g. ``"c"``, ``"h"``) or
special key names (``"space"``).
"""
pause_resume: str = "space"
correction: str = "tab"
upload: str = "enter"
@dataclass
class DAggerPedalConfig:
"""Foot pedal configuration for DAgger controls.
Pedal codes are evdev key code strings (e.g. ``"KEY_A"``).
"""
device_path: str = "/dev/input/by-id/usb-PCsensor_FootSwitch-event-kbd"
pause_resume: str = "KEY_A"
correction: str = "KEY_B"
upload: str = "KEY_C"
@RolloutStrategyConfig.register_subclass("dagger")
@dataclass
class DAggerStrategyConfig(RolloutStrategyConfig):
"""Human-in-the-loop data collection (DAgger / RaC).
Alternates between autonomous policy execution and human intervention.
Intervention frames are tagged with ``intervention=True``.
Input is controlled via either a keyboard or foot pedal, selected by
``input_device``. Each device exposes three actions:
1. **pause_resume**: toggle policy execution on/off.
2. **correction**: toggle human correction recording.
3. **upload**: push the dataset to the hub on demand (corrections-only mode).
When ``record_autonomous=True`` (default) both autonomous and correction
frames are recorded with size-based episode rotation (same as Sentry)
and background uploading. ``push_to_hub`` is blocked while a correction
is in progress. Set to ``False`` to record only the human-correction
windows, where each correction becomes its own episode.
"""
num_episodes: int = 10
record_autonomous: bool = False
upload_every_n_episodes: int = 5
# Target video file size in MB for episode rotation (record_autonomous
# mode only). Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when None.
target_video_file_size_mb: float | None = None
input_device: str = "keyboard"
keyboard: DAggerKeyboardConfig = field(default_factory=DAggerKeyboardConfig)
pedal: DAggerPedalConfig = field(default_factory=DAggerPedalConfig)
def __post_init__(self):
if self.input_device not in ("keyboard", "pedal"):
raise ValueError(f"DAgger input_device must be 'keyboard' or 'pedal', got '{self.input_device}'")
# ---------------------------------------------------------------------------
# Top-level rollout config
# ---------------------------------------------------------------------------
@dataclass
class RolloutConfig:
"""Top-level configuration for the ``lerobot-rollout`` CLI.
Combines hardware, policy, strategy, and runtime settings. The
``__post_init__`` method performs fail-fast validation to reject
invalid flag combinations early.
"""
# Hardware
robot: RobotConfig | None = None
teleop: TeleoperatorConfig | None = None
# Policy (loaded from --policy.path via __post_init__)
policy: PreTrainedConfig | None = None
# Strategy (polymorphic: --strategy.type=base|sentry|highlight|dagger)
strategy: RolloutStrategyConfig = field(default_factory=BaseStrategyConfig)
# Inference backend (polymorphic: --inference.type=sync|rtc)
inference: InferenceEngineConfig = field(default_factory=SyncInferenceConfig)
# Dataset (required for sentry, highlight, dagger; None for base)
dataset: DatasetRecordConfig | None = None
# Runtime
fps: float = 30.0
duration: float = 0.0 # 0 = infinite (24/7 mode)
interpolation_multiplier: int = 1
device: str | None = None
task: str = ""
display_data: bool = False
# Display data on a remote Rerun server
display_ip: str | None = None
# Port of the remote Rerun server
display_port: int | None = None
# Whether to display compressed images in Rerun
display_compressed_images: bool = False
# Use vocal synthesis to read events
play_sounds: bool = True
resume: bool = False
# Torch compile
use_torch_compile: bool = False
torch_compile_backend: str = "inductor"
torch_compile_mode: str = "default"
compile_warmup_inferences: int = 2
def __post_init__(self):
"""Validate config invariants and load the policy config from ``--policy.path``."""
# --- Strategy-specific validation ---
if isinstance(self.strategy, DAggerStrategyConfig) and self.teleop is None:
raise ValueError("DAgger strategy requires --teleop.type to be set")
needs_dataset = isinstance(self.strategy, (SentryStrategyConfig, HighlightStrategyConfig))
if needs_dataset and (self.dataset is None or not self.dataset.repo_id):
raise ValueError(f"{self.strategy.type} strategy requires --dataset.repo_id to be set")
if isinstance(self.strategy, BaseStrategyConfig) and self.dataset is not None:
raise ValueError(
"Base strategy does not record data. Use sentry, highlight, or dagger for recording."
)
# Sentry MUST use streaming encoding to avoid disk I/O blocking the control loop
if (
isinstance(self.strategy, SentryStrategyConfig)
and self.dataset is not None
and not self.dataset.streaming_encoding
):
logger.warning("Sentry mode forces streaming_encoding=True")
self.dataset.streaming_encoding = True
# Highlight writes frames while the policy is still running, so streaming is mandatory.
if (
isinstance(self.strategy, HighlightStrategyConfig)
and self.dataset is not None
and not self.dataset.streaming_encoding
):
logger.warning("Highlight mode forces streaming_encoding=True")
self.dataset.streaming_encoding = True
# DAgger: streaming is mandatory only when the autonomous phase is also recorded.
if (
isinstance(self.strategy, DAggerStrategyConfig)
and self.strategy.record_autonomous
and self.dataset is not None
and not self.dataset.streaming_encoding
):
logger.warning("DAgger with record_autonomous=True forces streaming_encoding=True")
self.dataset.streaming_encoding = True
# --- Policy loading ---
if self.robot is None:
raise ValueError("--robot.type is required for rollout")
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
if self.policy is None:
raise ValueError("--policy.path is required for rollout")
@classmethod
def __get_path_fields__(cls) -> list[str]:
return ["policy"]
-429
@@ -1,429 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rollout context: shared state created once before strategy dispatch.
Grouped into five topical sub-contexts (:class:`RuntimeContext`,
:class:`HardwareContext`, :class:`PolicyContext`, :class:`ProcessorContext`,
and :class:`DatasetContext`), which are assembled into :class:`RolloutContext`.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from threading import Event
import torch
from lerobot.configs import FeatureType, PreTrainedConfig
from lerobot.datasets import (
LeRobotDataset,
aggregate_pipeline_dataset_features,
create_initial_features,
)
from lerobot.policies import get_policy_class, make_pre_post_processors
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.processor import (
PolicyProcessorPipeline,
RobotAction,
RobotObservation,
RobotProcessorPipeline,
make_default_processors,
rename_stats,
)
from lerobot.robots import make_robot_from_config
from lerobot.teleoperators import Teleoperator, make_teleoperator_from_config
from lerobot.utils.feature_utils import combine_feature_dicts, hw_to_dataset_features
from .configs import BaseStrategyConfig, DAggerStrategyConfig, RolloutConfig
from .inference import (
InferenceEngine,
RTCInferenceConfig,
create_inference_engine,
)
from .robot_wrapper import ThreadSafeRobot
logger = logging.getLogger(__name__)
def _resolve_action_key_order(
policy_action_names: list[str] | None, dataset_action_names: list[str]
) -> list[str]:
"""Choose action name ordering for mapping policy tensor outputs to robot action dicts."""
if not policy_action_names:
return dataset_action_names
policy_action_names = list(policy_action_names)
if len(policy_action_names) != len(dataset_action_names):
logger.warning(
"policy.action_feature_names length (%d) != dataset action dim (%d); using dataset order",
len(policy_action_names),
len(dataset_action_names),
)
return dataset_action_names
if set(dataset_action_names) != set(policy_action_names):
logger.warning("policy.action_feature_names keys don't match dataset; using dataset order")
return dataset_action_names
return policy_action_names
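A small usage example of the reconciliation rules; the joint names are illustrative only.
dataset_names = ["gripper.pos", "shoulder.pos", "elbow.pos"]

_resolve_action_key_order(["shoulder.pos", "elbow.pos", "gripper.pos"], dataset_names)
# -> policy order kept: same length and same set of names

_resolve_action_key_order(["shoulder.pos", "elbow.pos"], dataset_names)
# -> dataset order, with a warning: length mismatch

_resolve_action_key_order(None, dataset_names)
# -> dataset order: no policy-provided names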
# ---------------------------------------------------------------------------
# Sub-contexts
# ---------------------------------------------------------------------------
@dataclass
class RuntimeContext:
"""Runtime knobs shared with every strategy."""
cfg: RolloutConfig
shutdown_event: Event
@dataclass
class HardwareContext:
"""Connected hardware.
The raw robot is available via ``robot_wrapper.inner`` when needed
(e.g. for disconnect); strategies should otherwise go through the
thread-safe wrapper.
``initial_position`` stores the robot's joint positions at connect
time. Strategies use it to return the robot to a safe pose before
shutting down.
"""
robot_wrapper: ThreadSafeRobot
teleop: Teleoperator | None
initial_position: dict | None = None
@dataclass
class PolicyContext:
"""Loaded policy and its inference engine."""
policy: PreTrainedPolicy
preprocessor: PolicyProcessorPipeline
postprocessor: PolicyProcessorPipeline
inference: InferenceEngine
@dataclass
class ProcessorContext:
"""Robot-side pipelines (run outside the policy)."""
teleop_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
robot_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
robot_observation_processor: RobotProcessorPipeline[RobotObservation, RobotObservation]
@dataclass
class DatasetContext:
"""Dataset and feature bookkeeping."""
dataset: LeRobotDataset | None
dataset_features: dict = field(default_factory=dict)
hw_features: dict = field(default_factory=dict)
ordered_action_keys: list[str] = field(default_factory=list)
@dataclass
class RolloutContext:
"""Bundle of sub-contexts passed to every rollout strategy.
Built once by :func:`build_rollout_context` before strategy dispatch.
"""
runtime: RuntimeContext
hardware: HardwareContext
policy: PolicyContext
processors: ProcessorContext
data: DatasetContext
# ---------------------------------------------------------------------------
# Build
# ---------------------------------------------------------------------------
def build_rollout_context(
cfg: RolloutConfig,
shutdown_event: Event,
teleop_action_processor: RobotProcessorPipeline | None = None,
robot_action_processor: RobotProcessorPipeline | None = None,
robot_observation_processor: RobotProcessorPipeline | None = None,
) -> RolloutContext:
"""Wire up policy, processors, hardware, dataset, and inference engine.
The order is policy-first / hardware-last so a bad ``--policy.path``
fails fast without touching the robot.
"""
is_rtc = isinstance(cfg.inference, RTCInferenceConfig)
# --- 1. Policy (heavy I/O, but no hardware yet) -------------------
logger.info("Loading policy from '%s'...", cfg.policy.pretrained_path)
policy_config = cfg.policy
policy_class = get_policy_class(policy_config.type)
full_config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
for attr in ("device", "use_amp"):
if hasattr(cfg.policy, attr) and hasattr(full_config, attr):
cli_val = getattr(cfg.policy, attr)
if cli_val is not None:
setattr(full_config, attr, cli_val)
if hasattr(full_config, "compile_model"):
full_config.compile_model = cfg.use_torch_compile
if full_config.type == "vqbet" and cfg.device == "mps":
raise NotImplementedError(
"Current implementation of VQBeT does not support `mps` backend. "
"Please use `cpu` or `cuda` backend."
)
if full_config.use_peft:
from peft import PeftConfig, PeftModel
peft_path = cfg.policy.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_path)
policy = policy_class.from_pretrained(
pretrained_name_or_path=peft_config.base_model_name_or_path, config=full_config
)
policy = PeftModel.from_pretrained(policy, peft_path, config=peft_config)
else:
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=full_config)
if is_rtc:
policy.config.rtc_config = cfg.inference.rtc
if hasattr(policy, "init_rtc_processor"):
policy.init_rtc_processor()
policy = policy.to(cfg.device)
policy.eval()
logger.info("Policy loaded: type=%s, device=%s", policy_config.type, cfg.device)
if cfg.use_torch_compile and policy.type not in ("pi0", "pi05"):
try:
if hasattr(torch, "compile"):
compile_kwargs = {
"backend": cfg.torch_compile_backend,
"mode": cfg.torch_compile_mode,
"options": {"triton.cudagraphs": False},
}
policy.predict_action_chunk = torch.compile(policy.predict_action_chunk, **compile_kwargs)
logger.info("torch.compile applied to predict_action_chunk")
except Exception as e:
logger.warning("Failed to apply torch.compile: %s", e)
# --- 2. Robot-side processors (user-supplied or defaults) --------
if (
teleop_action_processor is None
or robot_action_processor is None
or robot_observation_processor is None
):
_t, _r, _o = make_default_processors()
teleop_action_processor = teleop_action_processor or _t
robot_action_processor = robot_action_processor or _r
robot_observation_processor = robot_observation_processor or _o
# --- 3. Hardware (heaviest side-effect, deferred) -----------------
logger.info("Connecting robot (%s)...", cfg.robot.type if cfg.robot else "?")
robot = make_robot_from_config(cfg.robot)
robot.connect()
logger.info("Robot connected: %s", robot.name)
# Store the initial joint positions so we can return to a safe pose on shutdown.
initial_obs = robot.get_observation()
initial_position = {k: v for k, v in initial_obs.items() if k.endswith(".pos")}
logger.info("Captured initial robot position (%d keys)", len(initial_position))
robot_wrapper = ThreadSafeRobot(robot)
teleop = None
if cfg.teleop is not None:
logger.info("Connecting teleoperator (%s)...", cfg.teleop.type if cfg.teleop else "?")
teleop = make_teleoperator_from_config(cfg.teleop)
teleop.connect()
logger.info("Teleoperator connected")
# DAgger requires teleop with motor control capabilities (enable_torque,
# disable_torque, write_goal_positions).
# TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
# user is responsible for moving the teleop to the same position as the robot when starting the correction.
# if isinstance(cfg.strategy, DAggerStrategyConfig) and teleop is not None:
# required_teleop_methods = ("enable_torque", "disable_torque", "write_goal_positions")
# missing = [m for m in required_teleop_methods if not callable(getattr(teleop, m, None))]
# if missing:
# teleop.disconnect()
# raise ValueError(
# f"DAgger strategy requires a teleoperator with motor control methods "
# f"{required_teleop_methods}. '{type(teleop).__name__}' is missing: {missing}"
# )
# --- 4. Features + action-key reconciliation ---------------------
all_obs_features = robot.observation_features
observation_features_hw = {
k: v for k, v in all_obs_features.items() if v is float or isinstance(v, tuple)
}
action_features_hw = robot.action_features
# The action side is always needed: sync inference reads action names from
# ``dataset_features[ACTION]`` to map policy tensors back to robot actions.
action_dataset_features = aggregate_pipeline_dataset_features(
pipeline=teleop_action_processor,
initial_features=create_initial_features(action=action_features_hw),
use_videos=cfg.dataset.video if cfg.dataset else True,
)
# Observation-side aggregation is needed because of build_dataset_frame
observation_dataset_features = aggregate_pipeline_dataset_features(
pipeline=robot_observation_processor,
initial_features=create_initial_features(observation=observation_features_hw),
use_videos=cfg.dataset.video if cfg.dataset else True,
)
dataset_features = combine_feature_dicts(action_dataset_features, observation_dataset_features)
hw_features = hw_to_dataset_features(observation_features_hw, "observation")
raw_action_keys = list(robot.action_features.keys())
policy_action_names = getattr(policy_config, "action_feature_names", None)
ordered_action_keys = _resolve_action_key_order(
list(policy_action_names) if policy_action_names else None,
raw_action_keys,
)
# Validate visual features if no rename_map is active
rename_map = cfg.dataset.rename_map if cfg.dataset else {}
if not rename_map:
expected_visuals = {k for k, v in full_config.input_features.items() if v.type == FeatureType.VISUAL}
provided_visuals = {
f"observation.{k}" for k, v in robot.observation_features.items() if isinstance(v, tuple)
}
policy_subset = expected_visuals.issubset(provided_visuals)
hw_subset = provided_visuals.issubset(expected_visuals)
if not (policy_subset or hw_subset):
raise ValueError(
f"Visual feature mismatch between policy and robot hardware.\n"
f"Policy expects: {expected_visuals}\n"
f"Robot provides: {provided_visuals}"
)
# --- 5. Dataset -------------
dataset = None
if cfg.dataset is not None and not isinstance(cfg.strategy, BaseStrategyConfig):
logger.info("Setting up dataset (repo_id=%s)...", cfg.dataset.repo_id)
if cfg.resume:
dataset = LeRobotDataset.resume(
cfg.dataset.repo_id,
root=cfg.dataset.root,
batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec,
streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads,
image_writer_processes=cfg.dataset.num_image_writer_processes,
image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
* len(robot.cameras if hasattr(robot, "cameras") else []),
)
else:
if isinstance(cfg.strategy, DAggerStrategyConfig):
dataset_features["intervention"] = {
"dtype": "bool",
"shape": (1,),
"names": None,
}
dataset = LeRobotDataset.create(
cfg.dataset.repo_id,
cfg.dataset.fps,
root=cfg.dataset.root,
robot_type=robot.name,
features=dataset_features,
use_videos=cfg.dataset.video,
image_writer_processes=cfg.dataset.num_image_writer_processes,
image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
* len(robot.cameras if hasattr(robot, "cameras") else []),
batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec,
streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads,
)
if dataset is not None:
logger.info("Dataset ready: %s (%d existing episodes)", dataset.repo_id, dataset.num_episodes)
# --- 6. Policy pre/post processors (needs dataset stats if any) ---
dataset_stats = None
if dataset is not None:
dataset_stats = rename_stats(
dataset.meta.stats,
cfg.dataset.rename_map if cfg.dataset else {},
)
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy_config,
pretrained_path=cfg.policy.pretrained_path,
dataset_stats=dataset_stats,
preprocessor_overrides={
"device_processor": {"device": cfg.device or getattr(policy_config, "device", "cpu")},
"rename_observations_processor": {"rename_map": cfg.dataset.rename_map if cfg.dataset else {}},
},
)
# --- 7. Inference strategy (needs policy + pre/post + hardware) --
logger.info(
"Creating inference engine (type=%s)...",
cfg.inference.type if hasattr(cfg.inference, "type") else "sync",
)
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
inference_strategy = create_inference_engine(
cfg.inference,
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
robot_wrapper=robot_wrapper,
hw_features=hw_features,
dataset_features=dataset_features,
ordered_action_keys=ordered_action_keys,
task=task_str,
fps=cfg.fps,
device=cfg.device,
use_torch_compile=cfg.use_torch_compile,
compile_warmup_inferences=cfg.compile_warmup_inferences,
shutdown_event=shutdown_event,
)
# --- 8. Assemble ---------------------------------------------------
logger.info("Rollout context assembled successfully")
return RolloutContext(
runtime=RuntimeContext(cfg=cfg, shutdown_event=shutdown_event),
hardware=HardwareContext(
robot_wrapper=robot_wrapper, teleop=teleop, initial_position=initial_position
),
policy=PolicyContext(
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
inference=inference_strategy,
),
processors=ProcessorContext(
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
),
data=DatasetContext(
dataset=dataset,
dataset_features=dataset_features,
hw_features=hw_features,
ordered_action_keys=ordered_action_keys,
),
)
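A hedged end-to-end sketch of consuming the assembled context; `cfg` construction and error handling are elided, and calling `disconnect` on the raw robot is an assumption based on the `robot_wrapper.inner` note in the HardwareContext docstring above.
from threading import Event

shutdown = Event()
ctx = build_rollout_context(cfg, shutdown)
ctx.policy.inference.start()
try:
    while not shutdown.is_set():
        action = ctx.policy.inference.get_action(obs_frame=None)  # sync backends need a real obs_frame
        if action is None:
            continue
        # ...map the action tensor back to a robot action dict via ctx.data.ordered_action_keys...
finally:
    ctx.policy.inference.stop()
    ctx.hardware.robot_wrapper.inner.disconnect()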
-39
@@ -1,39 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference engine package — backend-agnostic action production.
Concrete strategies (sync, RTC, ...) expose the same small interface so
rollout strategies never branch on the inference backend.
"""
from .base import InferenceEngine
from .factory import (
InferenceEngineConfig,
RTCInferenceConfig,
SyncInferenceConfig,
create_inference_engine,
)
from .rtc import RTCInferenceEngine
from .sync import SyncInferenceEngine
__all__ = [
"InferenceEngine",
"InferenceEngineConfig",
"RTCInferenceConfig",
"RTCInferenceEngine",
"SyncInferenceConfig",
"SyncInferenceEngine",
"create_inference_engine",
]
-88
@@ -1,88 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference engine ABC.
Rollout strategies consume actions through this small interface so they
do not need to know whether the inference engine is synchronous, runs in
a background thread (RTC), or comes from an external source.
"""
from __future__ import annotations
import abc
import torch
class InferenceEngine(abc.ABC):
"""Abstract backend for producing actions during rollout.
Subclasses decide whether inference happens inline, in a background
thread, or externally. The contract is minimal so new backends can
be added without touching rollout strategies.
Lifecycle
---------
``start`` prepare the backend (e.g. launch a background thread).
``stop`` shut the backend down cleanly.
``reset`` clear episode-scoped state (policy hidden state, queues).
Action production
-----------------
``get_action(obs_frame)`` return the next action tensor, or
``None`` if none is available (e.g. async queue empty). Sync
backends always compute from ``obs_frame``; async backends may
ignore it (they get observations via ``notify_observation``).
Optional hooks
--------------
``notify_observation`` / ``pause`` / ``resume`` have a no-op default
so rollout strategies can invoke them unconditionally.
"""
@abc.abstractmethod
def start(self) -> None:
"""Initialise the backend."""
@abc.abstractmethod
def stop(self) -> None:
"""Tear the backend down."""
@abc.abstractmethod
def reset(self) -> None:
"""Clear episode-scoped state."""
@abc.abstractmethod
def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
"""Return the next action tensor, or ``None`` if unavailable."""
def notify_observation(self, obs: dict) -> None: # noqa: B027
"""Publish the latest processed observation. Default: no-op."""
def pause(self) -> None: # noqa: B027
"""Pause background inference. Default: no-op."""
def resume(self) -> None: # noqa: B027
"""Resume background inference. Default: no-op."""
@property
def ready(self) -> bool:
"""True once the backend can produce actions (e.g. warmup done)."""
return True
@property
def failed(self) -> bool:
"""True if an unrecoverable error occurred in the backend."""
return False
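A minimal concrete backend, just to show the contract; the constant action is a placeholder, not the repo's sync engine.
import torch

class ConstantActionEngine(InferenceEngine):
    """Toy backend that always returns a zero action; illustrates the lifecycle hooks."""

    def __init__(self, action_dim: int):
        self._action_dim = action_dim
        self._started = False

    def start(self) -> None:
        self._started = True

    def stop(self) -> None:
        self._started = False

    def reset(self) -> None:
        pass  # no episode-scoped state to clear

    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
        if not self._started:
            return None
        return torch.zeros(self._action_dim)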

Some files were not shown because too many files have changed in this diff.