Compare commits

..

54 Commits

Author SHA1 Message Date
Khalil Meftah 5c444302c1 feat(so_follower): synchronize goal position with present position to prevent positional error during torque re-enablement 2026-04-28 18:40:48 +02:00
Khalil Meftah c868f874f1 feat(teleop): enhance leader-follower behavior and torque management in SO101 teleoperation 2026-04-28 17:46:06 +02:00
Khalil Meftah e228f0880f feat(teleop): add SO100/SO101 leader-follower teleoperation example
fix: update import for SO101Leader in so101_leader_follower.py
chore: include SO101LeaderFollower in exports
2026-04-28 17:28:15 +02:00
Khalil Meftah fe2c32d9e7 add so leader arm 2026-04-28 16:53:36 +02:00
Khalil Meftah 6ed80f5a59 Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor
# Conflicts:
#	src/lerobot/policies/__init__.py
#	src/lerobot/rl/actor.py
2026-04-28 12:04:13 +02:00
Khalil Meftah ef6b3b5b0f refactor: simplify docstrings for clarity and conciseness across multiple files 2026-04-28 11:11:02 +02:00
Steven Palma ca87ccd941 feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI (#3413)
* feat(scripts): lerobot-rollout

* fix(rollout) require dataset in dagger + use duration too

* fix(docs): dagger num_episodes

* test(rollout): fix expectations

* fix(rollout): features check

* fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config

* docs(rollout): edit rename_map instructions

* chore(rollout): multiple minor improvements

* chore(rollout): address comments + minor improvements

* fix(rollout): enable default

* fix(tests): default value RTCConfig

* fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): prevent relative actions with sync inference engine

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): rtc reanchor to non normalized state

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): fixing the episode length to use hwc (#3469)

also reducing default length to 5 minutes

* feat(rollout): go back to initial position is now a config

* fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470)

* chore(rollout): note about dagger correction stage

* chore(docs): update comments and docstring

* fix(test): move rtc relative out of rollout module

* fix(rollout): address the review comments

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>
2026-04-28 00:57:35 +02:00
Steven Palma 77352c495c chore(dependencies): update uv.lock (#3437)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-27 23:15:46 +02:00
Khalil Meftah e298474bf3 fix(tests): gate RL tests on the datasets extra 2026-04-27 16:53:34 +02:00
Khalil Meftah 577f14337a refactor(tests): remove grpc import checks from test files for cleaner code 2026-04-27 16:20:13 +02:00
Khalil Meftah 47be90f040 refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility 2026-04-27 15:59:59 +02:00
Khalil Meftah 47dd65347e refactor(rl): add type property to RLAlgorithmConfig for better clarity 2026-04-27 15:57:24 +02:00
Khalil Meftah fd5a788120 refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation 2026-04-27 15:55:16 +02:00
Khalil Meftah 9ce9e01469 refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable 2026-04-27 13:39:03 +02:00
Khalil Meftah 21c16a27f0 Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
2026-04-27 11:52:19 +02:00
Khalil Meftah b3164543f4 fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
2026-04-27 11:35:21 +02:00
Khalil Meftah f3993cbbb1 fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
2026-04-27 11:35:20 +02:00
Khalil Meftah c278cfa026 fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
2026-04-27 11:35:20 +02:00
Khalil Meftah 77d18659b1 fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
2026-04-27 11:35:19 +02:00
Khalil Meftah 6347edefb1 fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
2026-04-27 11:35:18 +02:00
Khalil Meftah eda47eca18 fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
2026-04-27 11:35:18 +02:00
Khalil Meftah a64e6f5070 fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
2026-04-27 11:35:17 +02:00
Khalil Meftah 3def86c2c3 fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
2026-04-27 11:35:17 +02:00
Khalil Meftah 356a64d8c4 fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
2026-04-27 11:35:16 +02:00
Steven Palma 05a5223885 fix(pi): avoid peak RAM in PiGemma construction by freeing replaced submodules (#3454)
Co-Authored-By: Daiki Kamata <daiki.kamata@access-company.com>
Co-Authored-By: Jack Vial <jackvial@users.noreply.github.com>
Co-Authored-By: Ajay Anubolu <AjAnubolu@users.noreply.github.com>
Co-Authored-By: Finn F. <F-Fer@users.noreply.github.com>
2026-04-24 17:50:12 +02:00
Khalil Meftah 38b88c414c perf(observation_processor): add CUDA support for image processing 2026-04-24 13:36:26 +02:00
Khalil Meftah 1ed32210c7 refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic 2026-04-24 13:18:33 +02:00
Khalil Meftah 06255996ea refactor(policies): rename policies/sac → policies/gaussian_actor 2026-04-23 19:13:18 +02:00
Khalil Meftah 8065bf15c7 fix test for flat dict structure 2026-04-21 12:06:25 +02:00
Khalil Meftah 8191d2d87f remove unused type alias 2026-04-21 11:56:27 +02:00
Khalil Meftah 6b93f31238 fix docstring 2026-04-21 11:55:17 +02:00
Khalil Meftah a4c0c9e358 update losses names in tests 2026-04-21 11:53:32 +02:00
Khalil Meftah a84b0e8132 refactor(sac): decouple algorithm hyperparameters from policy config 2026-04-18 16:40:56 +02:00
Khalil Meftah 2487a6ee6d perf(rl): use async iterators in OnlineOfflineMixer.get_iterator 2026-04-18 16:02:28 +02:00
Khalil Meftah 72fb0faf62 refactor(sac): simplify optimizer return structure 2026-04-18 15:45:22 +02:00
Khalil Meftah 2c97cb23c8 refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity 2026-04-18 15:39:32 +02:00
Khalil Meftah 87d4c9879c fix(sac): clarify torch.compile status 2026-04-18 15:19:35 +02:00
Khalil Meftah e4c1a8472d fix(config): update vision encoder model name to lerobot/resnet10 2026-04-18 15:15:59 +02:00
Khalil Meftah d7e25c8326 refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages 2026-04-16 15:46:34 +02:00
Khalil Meftah a5ad273b62 fix(tests): skip tests that require grpc if not available 2026-04-15 16:30:20 +02:00
Khalil Meftah 23bece96a4 fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests 2026-04-15 16:12:08 +02:00
Khalil Meftah 7a1c9e74c3 fix: skip tests that require grpc if not available 2026-04-15 15:18:04 +02:00
Khalil Meftah c88cf979f1 fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error 2026-04-15 11:49:38 +02:00
Khalil Meftah 79a9ebdaa6 fix: add try/finally to control_loop to ensure image writer cleanup on exit 2026-04-14 17:54:35 +02:00
Khalil Meftah da6e36fd03 Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor 2026-04-14 17:14:56 +02:00
Khalil Meftah 64dc08cb7b fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer 2026-04-14 16:35:08 +02:00
Khalil Meftah e6d282108d Fix: add kwargs in reward classifier __init__() 2026-04-14 11:13:43 +02:00
Khalil Meftah a8838c081b perf: remove redundant CPU→GPU→CPU transition move in learner 2026-04-13 19:06:28 +02:00
Khalil Meftah ee0814ef60 refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference 2026-04-13 18:31:17 +02:00
Khalil Meftah 7b0bdf2a98 fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample() 2026-04-13 18:27:24 +02:00
Khalil Meftah 9422dc98c2 fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
2026-04-13 13:30:50 +02:00
Khalil Meftah 11a0b0174f fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345

Co-authored-by: jpizarrom <jpizarrom@gmail.com>
2026-04-13 12:31:00 +02:00
Khalil Meftah 036b310a97 chore: clarify torch.compile disabled note in SACAlgorithm 2026-04-13 11:49:27 +02:00
Khalil Meftah e022207c75 refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring 2026-04-13 11:39:48 +02:00
122 changed files with 10070 additions and 4717 deletions
+2
@@ -61,6 +61,8 @@
title: SARM
title: "Reward Models"
- sections:
- local: inference
title: Policy Deployment (lerobot-rollout)
- local: async
title: Use Async Inference
- local: rtc
+19 -21
@@ -50,30 +50,30 @@ This process can be repeated iteratively: deploy, collect, fine-tune, repeat. Ea
### Teleoperator Requirements
The `examples/hil` HIL scripts require **teleoperators with active motors** that can:
The `lerobot-rollout --strategy.type=dagger` mode requires **teleoperators with active motors** that can:
- Enable/disable torque programmatically
- Move to target positions (to mirror the robot state when pausing)
**Compatible teleoperators in the current `examples/hil` scripts:**
**Compatible teleoperators:**
- `openarm_mini` - OpenArm Mini
- `so_leader` - SO100 / SO101 leader arm
> [!IMPORTANT]
> The provided `examples/hil` commands default to `bi_openarm_follower` + `openarm_mini`.
> The provided commands default to `bi_openarm_follower` + `openarm_mini`.
> `so_follower` + `so_leader` configs are also registered and can be used via CLI flags.
---
## Script
A single script handles both synchronous and RTC-based inference. Toggle RTC with `--rtc.enabled=true`:
Use `lerobot-rollout` with `--strategy.type=dagger` for HIL data collection. Select the inference backend with `--inference.type=sync|rtc`:
| Mode | Flag | Models |
| ------------------------ | -------------------- | --------------------- |
| Standard (default) | _(no flag needed)_ | ACT, Diffusion Policy |
| Real-Time Chunking (RTC) | `--rtc.enabled=true` | Pi0, Pi0.5, SmolVLA |
| Mode | Flag | Models |
| ------------------------ | ---------------------- | --------------------- |
| Standard (default) | _(no flag needed)_ | ACT, Diffusion Policy |
| Real-Time Chunking (RTC) | `--inference.type=rtc` | Pi0, Pi0.5, SmolVLA |
---
@@ -97,7 +97,7 @@ python src/lerobot/scripts/lerobot_train.py \
**Standard inference (ACT, Diffusion Policy):**
```bash
python examples/hil/hil_data_collection.py \
lerobot-rollout --strategy.type=dagger \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
@@ -108,11 +108,10 @@ python examples/hil/hil_data_collection.py \
--teleop.port_left=/dev/ttyACM0 \
--teleop.port_right=/dev/ttyACM1 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--dataset.repo_id=your-username/hil-dataset \
--dataset.repo_id=your-username/rollout_hil_dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--strategy.num_episodes=50 \
--interpolation_multiplier=2
```
@@ -121,11 +120,11 @@ python examples/hil/hil_data_collection.py \
For models with high inference latency, enable RTC for smooth execution:
```bash
python examples/hil/hil_data_collection.py \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
--rtc.prefix_attention_schedule=LINEAR \
lerobot-rollout --strategy.type=dagger \
--inference.type=rtc \
--inference.rtc.execution_horizon=20 \
--inference.rtc.max_guidance_weight=5.0 \
--inference.rtc.prefix_attention_schedule=LINEAR \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
@@ -136,11 +135,10 @@ python examples/hil/hil_data_collection.py \
--teleop.port_left=/dev/ttyACM0 \
--teleop.port_right=/dev/ttyACM1 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--dataset.repo_id=your-username/hil-rtc-dataset \
--dataset.repo_id=your-username/rollout_hil_rtc_dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--strategy.num_episodes=50 \
--interpolation_multiplier=3
```
@@ -235,7 +233,7 @@ This HIL data collection approach builds on ideas from interactive imitation lea
- **HG-DAgger** (Kelly et al., 2019) made this practical for robotics: a human expert monitors the robot and only intervenes when needed, rather than labeling every state. The gating between autonomous and human control is exactly the pause → takeover → return-to-policy loop used in the scripts here.
- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the HIL scripts in `examples/hil`.
- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the DAgger strategy in `lerobot-rollout`.
- **π0.6/RECAP** (Physical Intelligence, 2025) applies the same iterative collect-and-finetune loop at scale with VLA models, showing that even large pretrained policies benefit substantially from targeted human corrections on their own failure modes. π0.6 is trained using RECAP.
+3 -3
@@ -820,10 +820,10 @@ The LeRobot system uses a distributed actor-learner architecture for training. T
Create a training configuration file (example available [here](https://huggingface.co/datasets/lerobot/config_examples/resolve/main/rl/train_config.json)). The training config is based on the main `TrainRLServerPipelineConfig` class in `lerobot/configs/train.py`.
1. Configure the policy settings (`type="sac"`, `device`, etc.)
1. Configure the policy settings (`type="gaussian_actor"`, `device`, etc.)
2. Set `dataset` to your cropped dataset
3. Configure environment settings with crop parameters
4. Check the other parameters related to SAC in [configuration_sac.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/sac/configuration_sac.py#L79).
4. Check the other parameters related to the Gaussian Actor in [configuration_gaussian_actor.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/gaussian_actor/configuration_gaussian_actor.py#L79).
5. Verify that the `policy` config is correct with the right `input_features` and `output_features` for your task.
**Starting the Learner**
@@ -926,7 +926,7 @@ The ideal behaviour is that your intervention rate should drop gradually during
Some configuration values have a disproportionate impact on training stability and speed:
- **`temperature_init`** (`policy.temperature_init`) initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
- **`temperature_init`** (`algorithm.temperature_init`) initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
- **`policy_parameters_push_frequency`** (`policy.actor_learner_config.policy_parameters_push_frequency`) interval in _seconds_ between two weight pushes from the learner to the actor. The default is `4 s`. Decrease to **1-2 s** to provide fresher weights (at the cost of more network traffic); increase only if your connection is slow, as this will reduce sample efficiency.
- **`storage_device`** (`policy.storage_device`) device on which the learner keeps the policy parameters. If you have spare GPU memory, set this to `"cuda"` (instead of the default `"cpu"`). Keeping the weights on-GPU removes CPU→GPU transfer overhead and can significantly increase the number of learner updates per second.
+26 -105
@@ -509,121 +509,42 @@ hf upload ${HF_USER}/act_so101_test${CKPT} \
## Run inference and evaluate your policy
You can use the `record` script from [`lerobot-record`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input, to run inference and evaluate your policy. For instance, run this command or API example to run inference and record 10 evaluation episodes:
Use `lerobot-rollout` to deploy a trained policy on your robot. You can choose different strategies depending on your needs:
<hfoptions id="eval">
<hfoption id="Command">
<hfoption id="Base mode (no recording)">
```bash
lerobot-record \
lerobot-rollout \
--strategy.type=base \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--dataset.streaming_encoding=true \
--dataset.encoder_threads=2 \
# --dataset.vcodec=auto \
# <- Teleop optional if you want to teleoperate in between episodes \
# --teleop.type=so100_leader \
# --teleop.port=/dev/ttyACM0 \
# --teleop.id=my_awesome_leader_arm \
--policy.path=${HF_USER}/my_policy
--task="Put lego brick into the transparent box" \
--duration=60
```
</hfoption>
<hfoption id="API example">
<!-- prettier-ignore-start -->
```python
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.datasets import LeRobotDataset
from lerobot.utils.feature_utils import hw_to_dataset_features
from lerobot.policies.act import ACTPolicy
from lerobot.policies import make_pre_post_processors
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.scripts.lerobot_record import record_loop
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
NUM_EPISODES = 5
FPS = 30
EPISODE_TIME_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
# Create the robot configuration
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471", id="my_awesome_follower_arm", cameras=camera_config
)
# Initialize the robot
robot = SO100Follower(robot_config)
# Initialize the policy
policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
# Configure the dataset features
action_features = hw_to_dataset_features(robot.action_features, "action")
obs_features = hw_to_dataset_features(robot.observation_features, "observation")
dataset_features = {**action_features, **obs_features}
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=dataset_features,
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Initialize the keyboard listener and rerun visualization
_, events = init_keyboard_listener()
init_rerun(session_name="recording")
# Connect the robot
robot.connect()
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
)
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Run the policy inference loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
)
dataset.save_episode()
# Clean up
robot.disconnect()
dataset.push_to_hub()
<hfoption id="Sentry mode (with recording)">
```bash
lerobot-rollout \
--strategy.type=sentry \
--strategy.upload_every_n_episodes=5 \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--duration=600
```
<!-- prettier-ignore-end -->
</hfoption>
</hfoptions>
As you can see, it's almost the same command as previously used to record your training dataset. Two things changed:
The `--strategy.type` flag selects the execution mode:
1. There is an additional `--control.policy.path` argument which indicates the path to your policy checkpoint with (e.g. `outputs/train/eval_act_so101_test/checkpoints/last/pretrained_model`). You can also use the model repository if you uploaded a model checkpoint to the hub (e.g. `${HF_USER}/act_so101_test`).
2. The name of dataset begins by `eval` to reflect that you are running inference (e.g. `${HF_USER}/eval_act_so101_test`).
- `base`: Autonomous rollout with no data recording (useful for quick evaluation)
- `sentry`: Continuous recording with auto-upload (useful for large-scale evaluation)
- `highlight`: Ring buffer recording with keystroke save (useful for capturing interesting events)
- `dagger`: Human-in-the-loop data collection (see [HIL Data Collection](./hil_data_collection))
All strategies support `--inference.type=rtc` for smooth execution with slow VLA models (Pi0, Pi0.5, SmolVLA).
+261
@@ -0,0 +1,261 @@
# Policy Deployment (lerobot-rollout)
`lerobot-rollout` is the single CLI for deploying trained policies on real robots. It supports multiple execution strategies and inference backends, from quick evaluation to continuous recording and human-in-the-loop data collection.
## Quick Start
No extra dependencies are needed beyond your robot and policy extras.
```bash
lerobot-rollout \
--strategy.type=base \
--policy.path=lerobot/act_koch_real \
--robot.type=koch_follower \
--robot.port=/dev/ttyACM0 \
--task="pick up cube" \
--duration=30
```
This runs the policy for 30 seconds with no recording.
---
## Strategies
Select a strategy with `--strategy.type=<name>`. Each strategy defines a different control loop with its own recording and interaction semantics.
### Base (`--strategy.type=base`)
Autonomous policy execution with no data recording. Use this for quick evaluation, demos, or when you only need to observe the robot.
```bash
lerobot-rollout \
--strategy.type=base \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Put lego brick into the box" \
--duration=60
```
| Flag | Description |
| ---------------- | ------------------------------------------------------ |
| `--duration` | Run time in seconds (0 = infinite) |
| `--task` | Task description passed to the policy |
| `--display_data` | Stream observations/actions to Rerun for visualization |
### Sentry (`--strategy.type=sentry`)
Continuous autonomous recording with periodic upload to the Hugging Face Hub. Episode boundaries are auto-computed from camera resolution and FPS so each saved episode produces a complete video file, keeping uploads efficient.
Policy state (hidden state, RTC queue) persists across episode boundaries: the robot does not reset between episodes.
```bash
lerobot-rollout \
--strategy.type=sentry \
--strategy.upload_every_n_episodes=5 \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--dataset.repo_id=${HF_USER}/rollout_eval_data \
--dataset.single_task="Put lego brick into the box" \
--duration=3600
```
| Flag | Description |
| -------------------------------------- | ----------------------------------------------------------- |
| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5) |
| `--strategy.target_video_file_size_mb` | Target video file size for episode rotation (default: auto) |
| `--dataset.repo_id` | **Required.** Hub repository for the recorded dataset |
| `--dataset.push_to_hub` | Whether to push to Hub on teardown (default: true) |
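
For intuition, here is a minimal sketch of how an episode boundary could be derived from camera resolution, FPS, and the target video file size. The compressed bits-per-pixel constant is an assumption for illustration only; the actual computation is internal to the strategy.

```python
# Illustrative estimate only — the real computation lives inside the sentry
# strategy; the compressed bits-per-pixel value below is an assumption.
def estimate_episode_seconds(width, height, fps, target_file_size_mb, bits_per_pixel=0.1):
    """Roughly how long an episode can run before its video reaches the target size."""
    bytes_per_frame = width * height * bits_per_pixel / 8  # rough compressed frame size
    return target_file_size_mb * 1024 * 1024 / (bytes_per_frame * fps)

# Example: a 640x480 camera at 30 FPS with a 100 MB target file -> roughly 15 minutes
print(f"{estimate_episode_seconds(640, 480, 30, 100):.0f} s per episode")
```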
### Highlight (`--strategy.type=highlight`)
Autonomous rollout with on-demand recording via a memory-bounded ring buffer. The robot runs continuously while the buffer captures the last N seconds of telemetry. Press the save key to flush the buffer and start live recording; press it again to save the episode.
```bash
lerobot-rollout \
--strategy.type=highlight \
--strategy.ring_buffer_seconds=30 \
--strategy.save_key=s \
--strategy.push_key=h \
--policy.path=${HF_USER}/my_policy \
--robot.type=koch_follower \
--robot.port=/dev/ttyACM0 \
--dataset.repo_id=${HF_USER}/rollout_highlight_data \
--dataset.single_task="Pick up the red cube"
```
**Keyboard controls:**
| Key | Action |
| ------------------ | -------------------------------------------------------- |
| `s` (configurable) | Start recording (flushes buffer) / stop and save episode |
| `h` (configurable) | Push dataset to Hub |
| `ESC` | Stop the session |
| Flag | Description |
| -------------------------------------- | ---------------------------------------------- |
| `--strategy.ring_buffer_seconds` | Duration of buffered telemetry (default: 30) |
| `--strategy.ring_buffer_max_memory_mb` | Memory cap for the ring buffer (default: 2048) |
| `--strategy.save_key` | Key to toggle recording (default: `s`) |
| `--strategy.push_key` | Key to push to Hub (default: `h`) |
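
Conceptually, the ring buffer works like the minimal sketch below (placeholder names, not the actual strategy implementation): old frames are discarded automatically, and pressing the save key flushes the buffered frames into the episode before live recording continues.

```python
# Hedged sketch of the ring-buffer idea: keep the last `seconds * fps` frames
# in memory, then flush them into a live recording when the save key is pressed.
from collections import deque


class FrameRingBuffer:
    def __init__(self, seconds: float, fps: int):
        # deque drops the oldest frame automatically once maxlen is reached
        self.buffer = deque(maxlen=int(seconds * fps))

    def add(self, frame: dict) -> None:
        self.buffer.append(frame)

    def flush(self) -> list[dict]:
        """Return the buffered frames (oldest first) and start over empty."""
        frames = list(self.buffer)
        self.buffer.clear()
        return frames


ring = FrameRingBuffer(seconds=30, fps=30)
# per control tick: ring.add({"observation": obs, "action": action})
# on save key:      episode_frames = ring.flush()  # prepended to the live recording
```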
### DAgger (`--strategy.type=dagger`)
Human-in-the-loop data collection. Alternates between autonomous policy execution and human intervention via a teleoperator. Intervention frames are tagged with `intervention=True`. Requires a teleoperator (`--teleop.type`).
See the [Human-In-the-Loop Data Collection](./hil_data_collection) guide for a detailed walkthrough.
**Corrections-only mode** (default): Only human correction windows are recorded. Each correction becomes one episode.
```bash
lerobot-rollout \
--strategy.type=dagger \
--strategy.num_episodes=20 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--robot.type=bi_openarm_follower \
--teleop.type=openarm_mini \
--dataset.repo_id=${HF_USER}/rollout_hil_data \
--dataset.single_task="Fold the T-shirt"
```
**Continuous recording mode** (`--strategy.record_autonomous=true`): Both autonomous and correction frames are recorded with time-based episode rotation (same as Sentry).
```bash
lerobot-rollout \
--strategy.type=dagger \
--strategy.record_autonomous=true \
--strategy.num_episodes=50 \
--policy.path=${HF_USER}/my_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--dataset.repo_id=${HF_USER}/rollout_dagger_data \
--dataset.single_task="Grasp the block"
```
**Keyboard controls** (default input device):
| Key | Action |
| ------- | ------------------------------------------- |
| `Space` | Pause / resume policy execution |
| `Tab` | Start / stop human correction |
| `Enter` | Push dataset to Hub (corrections-only mode) |
| `ESC` | Stop the session |
Foot pedal input is also supported via `--strategy.input_device=pedal`. Configure pedal codes with `--strategy.pedal.*` flags.
| Flag | Description |
| ------------------------------------ | ------------------------------------------------------- |
| `--strategy.num_episodes` | Number of correction episodes to record (default: 10) |
| `--strategy.record_autonomous` | Record autonomous frames too (default: false) |
| `--strategy.upload_every_n_episodes` | Push to Hub every N episodes (default: 5) |
| `--strategy.input_device` | Input device: `keyboard` or `pedal` (default: keyboard) |
| `--teleop.type` | **Required.** Teleoperator type |
---
## Inference Backends
Select a backend with `--inference.type=<name>`. All strategies work with both backends.
### Sync (default)
One policy call per control tick. The main loop blocks until the action is computed.
Works with all policies. No extra flags needed.
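
A minimal sketch of this loop, with placeholder names (`robot`, `policy`) standing in for the objects built by the rollout context:

```python
# Hedged sketch of the synchronous backend: one blocking policy call per tick.
import time


def sync_loop(robot, policy, fps=30, duration_s=60):
    tick = 1.0 / fps
    t_end = time.perf_counter() + duration_s
    while time.perf_counter() < t_end:
        t0 = time.perf_counter()
        obs = robot.get_observation()
        action = policy.select_action(obs)  # blocks until the action is computed
        robot.send_action(action)
        time.sleep(max(0.0, tick - (time.perf_counter() - t0)))  # keep a fixed loop rate
```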
### Real-Time Chunking (`--inference.type=rtc`)
A background thread produces action chunks asynchronously. The main control loop polls for the next ready action while the policy computes the next chunk in parallel.
Use RTC with large, slow VLA models (Pi0, Pi0.5, SmolVLA) for smooth, continuous motion despite high inference latency.
```bash
lerobot-rollout \
--strategy.type=base \
--inference.type=rtc \
--inference.rtc.execution_horizon=10 \
--inference.rtc.max_guidance_weight=10.0 \
--policy.path=${HF_USER}/pi0_policy \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Pick up the cube" \
--duration=60 \
--device=cuda
```
| Flag | Description |
| ------------------------------------------- | -------------------------------------------------------------- |
| `--inference.rtc.execution_horizon` | Steps to blend with previous chunk (default: varies by policy) |
| `--inference.rtc.max_guidance_weight` | Consistency enforcement strength (default: varies by policy) |
| `--inference.rtc.prefix_attention_schedule` | Blend schedule: `LINEAR`, `EXP`, `ONES`, `ZEROS` |
| `--inference.queue_threshold` | Max queue size before backpressure (default: 30) |
See the [Real-Time Chunking](./rtc) guide for details on tuning RTC parameters.
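
A minimal sketch of the producer/consumer pattern described above, using placeholder names rather than the rollout module's actual classes:

```python
# Hedged sketch of the RTC pattern: a background thread keeps producing action
# chunks while the control loop polls for the next ready action each tick.
import queue
import threading
import time


def chunk_producer(policy_chunk_fn, get_observation, action_queue, stop_event):
    while not stop_event.is_set():
        for action in policy_chunk_fn(get_observation()):  # slow VLA call, off the control loop
            action_queue.put(action)  # blocks when the queue is full (backpressure)


def control_loop(send_action, action_queue, stop_event, fps=30):
    tick = 1.0 / fps
    while not stop_event.is_set():
        t0 = time.perf_counter()
        try:
            send_action(action_queue.get(timeout=tick))  # use the next ready action
        except queue.Empty:
            pass  # nothing ready yet; keep the loop period fixed
        time.sleep(max(0.0, tick - (time.perf_counter() - t0)))

# wiring: start chunk_producer in a threading.Thread(daemon=True), run control_loop in the main thread
```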
---
## Common Flags
| Flag | Description | Default |
| --------------------------------- | ----------------------------------------------------------------- | ------- |
| `--policy.path` | **Required.** HF Hub model ID or local checkpoint path | -- |
| `--robot.type` | **Required.** Robot type (e.g. `so100_follower`, `koch_follower`) | -- |
| `--robot.port` | Serial port for the robot | -- |
| `--robot.cameras` | Camera configuration (JSON dict) | -- |
| `--fps` | Control loop frequency | 30 |
| `--duration` | Run time in seconds (0 = infinite) | 0 |
| `--device` | Torch device (`cpu`, `cuda`, `mps`) | auto |
| `--task` | Task description (used when no dataset is provided) | -- |
| `--display_data` | Stream telemetry to Rerun visualization | false |
| `--display_ip` / `--display_port` | Remote Rerun server address | -- |
| `--interpolation_multiplier` | Action interpolation factor | 1 |
| `--use_torch_compile` | Enable `torch.compile` for inference | false |
| `--resume` | Resume a previous recording session | false |
| `--play_sounds` | Vocal synthesis for events | true |
---
## Programmatic Usage
For custom deployments (e.g. with kinematics processors), use the rollout module API directly:
```python
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.utils.process import ProcessSignalHandler
cfg = RolloutConfig(
robot=my_robot_config,
policy=my_policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=30,
duration=60,
task="my task",
)
signal_handler = ProcessSignalHandler(use_threads=True)
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=my_custom_action_processor, # optional
robot_observation_processor=my_custom_obs_processor, # optional
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
```
See `examples/so100_to_so100_EE/rollout.py` and `examples/phone_to_so100/rollout.py` for full examples with kinematics processors.
+7 -18
@@ -61,17 +61,6 @@ lerobot-eval \
--rename_map='{"observation.images.image": "observation.images.base_0_rgb", "observation.images.image2": "observation.images.left_wrist_0_rgb"}'
```
### Recording
`lerobot-record` also supports rename maps, nested under the dataset config:
```bash
lerobot-record \ # When running inference
--policy.path="<user>/smolVLA_finetuned" \
... \
--dataset.rename_map='{"observation.images.glove2": "observation.images.image"}'
```
## Alternative: edit the policy config directly
If you always use the same dataset or environment, you can **edit the policy's `config.json`** so its observation keys match your data source. Then no rename map is needed.
@@ -105,10 +94,10 @@ XVLA-base has three visual inputs and `empty_cameras=0` by default. Your dataset
## Quick reference
| Goal | What to do |
| ----------------------------------------- | --------------------------------------------------------------------------- |
| Dataset keys ≠ policy keys | `--rename_map='{"dataset_key": "policy_key", ...}'` |
| Env keys ≠ policy keys (eval) | `--rename_map='{"env_key": "policy_key", ...}'` |
| Recording with different keys (inference) | `--dataset.rename_map='{"source_key": "policy_key", ...}'`. |
| Fewer cameras than policy expects | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
| Avoid passing a rename map | Edit the policy's `config.json` so its keys match your data source |
| Goal | What to do |
| --------------------------------------- | --------------------------------------------------------------------------- |
| Dataset keys ≠ policy keys | `--rename_map='{"dataset_key": "policy_key", ...}'` |
| Env keys ≠ policy keys (eval) | `--rename_map='{"env_key": "policy_key", ...}'` |
| Rollout with different keys (inference) | `--rename_map='{"source_key": "policy_key", ...}'`. |
| Fewer cameras than policy expects | `--policy.empty_cameras=N` (supported by PI0, PI05, PI0Fast, SmolVLA, XVLA) |
| Avoid passing a rename map | Edit the policy's `config.json` so its keys match your data source |
+7 -3
@@ -34,7 +34,7 @@ pip install -e ".[smolvla]"
### Using RTC with Pi0
You can find a complete reference implementation in [eval_with_real_robot.py](examples/rtc/eval_with_real_robot.py).
You can use `lerobot-rollout --strategy.type=base --inference.type=rtc` for RTC deployment on real robots.
The snippet below provides a simplified pseudo-example of how RTC operates with Pi0 in your pipeline:
```python
@@ -137,8 +137,12 @@ The script generates a visualization of the denoising process, comparing standar
## Testing RTC with a Real Robot
```bash
python examples/rtc/eval_with_real_robot.py \
lerobot-rollout \
--strategy.type=base \
--policy.path=${HF_USERNAME}/policy_repo_id \
--inference.type=rtc \
--inference.rtc.execution_horizon=10 \
--inference.rtc.max_guidance_weight=10.0 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
@@ -178,7 +182,7 @@ visualizer = RTCDebugVisualizer()
# ... create plots
```
See `examples/rtc/eval_dataset.py` for a complete example of visualization.
See `examples/rtc/eval_dataset.py` for a complete example of offline RTC visualization.
## References
+3 -2
@@ -274,7 +274,8 @@ python src/lerobot/scripts/lerobot_train.py \
Once trained, we recommend deploying policies using inference-time RTC:
```bash
python examples/rtc/eval_with_real_robot.py \
lerobot-rollout \
--strategy.type=base \
--policy.path=your-username/your-repo-id \
--policy.device=cuda \
--robot.type=unitree_g1 \
@@ -284,7 +285,7 @@ python examples/rtc/eval_with_real_robot.py \
--task="task_description" \
--duration=1000 \
--fps=30 \
--rtc.enabled=true
--inference.type=rtc
```
---
File diff suppressed because it is too large.
-226
@@ -1,226 +0,0 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Shared utilities for Human-in-the-Loop data collection scripts."""
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path
from lerobot.common.control_utils import is_headless
from lerobot.processor import (
IdentityProcessorStep,
RobotAction,
RobotObservation,
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots import Robot
from lerobot.teleoperators import Teleoperator
from lerobot.utils.robot_utils import precise_sleep
logger = logging.getLogger(__name__)
@dataclass
class HILDatasetConfig:
repo_id: str
single_task: str
root: str | Path | None = None
fps: int = 30
episode_time_s: float = 120
num_episodes: int = 50
video: bool = True
push_to_hub: bool = True
private: bool = False
tags: list[str] | None = None
num_image_writer_processes: int = 0
num_image_writer_threads_per_camera: int = 4
video_encoding_batch_size: int = 1
vcodec: str = "auto"
streaming_encoding: bool = True
encoder_queue_maxsize: int = 30
encoder_threads: int | None = None
rename_map: dict[str, str] = field(default_factory=dict)
def teleop_has_motor_control(teleop: Teleoperator) -> bool:
"""Check if teleoperator has motor control capabilities."""
return all(hasattr(teleop, attr) for attr in ("enable_torque", "disable_torque", "write_goal_positions"))
def teleop_disable_torque(teleop: Teleoperator) -> None:
"""Disable teleop torque if supported."""
if hasattr(teleop, "disable_torque"):
teleop.disable_torque()
def teleop_enable_torque(teleop: Teleoperator) -> None:
"""Enable teleop torque if supported."""
if hasattr(teleop, "enable_torque"):
teleop.enable_torque()
def teleop_smooth_move_to(teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50):
"""Smoothly move teleop to target position if motor control is available."""
if not teleop_has_motor_control(teleop):
logger.warning("Teleop does not support motor control - cannot mirror robot position")
return
teleop_enable_torque(teleop)
current = teleop.get_action()
steps = max(int(duration_s * fps), 1)
for step in range(steps + 1):
t = step / steps
interp = {}
for k in current:
if k in target_pos:
interp[k] = current[k] * (1 - t) + target_pos[k] * t
else:
interp[k] = current[k]
teleop.write_goal_positions(interp)
time.sleep(1 / fps)
def init_keyboard_listener():
"""Initialize keyboard listener with HIL controls."""
events = {
"exit_early": False,
"rerecord_episode": False,
"stop_recording": False,
"policy_paused": False,
"correction_active": False,
"resume_policy": False,
"in_reset": False,
"start_next_episode": False,
}
if is_headless():
logger.warning("Headless environment - keyboard controls unavailable")
return None, events
from pynput import keyboard
def on_press(key):
try:
if events["in_reset"]:
if key in [keyboard.Key.space, keyboard.Key.right]:
logger.info("[HIL] Starting next episode...")
events["start_next_episode"] = True
elif hasattr(key, "char") and key.char == "c":
events["start_next_episode"] = True
elif key == keyboard.Key.esc:
logger.info("[HIL] ESC - Stop recording, pushing to hub...")
events["stop_recording"] = True
events["start_next_episode"] = True
else:
if key == keyboard.Key.space:
if not events["policy_paused"] and not events["correction_active"]:
logger.info("[HIL] PAUSED - Press 'c' to take control or 'p' to resume policy")
events["policy_paused"] = True
elif hasattr(key, "char") and key.char == "c":
if events["policy_paused"] and not events["correction_active"]:
logger.info("[HIL] Taking control...")
events["start_next_episode"] = True
elif hasattr(key, "char") and key.char == "p":
if events["policy_paused"] or events["correction_active"]:
logger.info("[HIL] Resuming policy...")
events["resume_policy"] = True
elif key == keyboard.Key.right:
logger.info("[HIL] End episode")
events["exit_early"] = True
elif key == keyboard.Key.left:
logger.info("[HIL] Re-record episode")
events["rerecord_episode"] = True
events["exit_early"] = True
elif key == keyboard.Key.esc:
logger.info("[HIL] ESC - Stop recording...")
events["stop_recording"] = True
events["exit_early"] = True
except Exception as e:
logger.info(f"Key error: {e}")
listener = keyboard.Listener(on_press=on_press)
listener.start()
return listener, events
def make_identity_processors():
"""Create identity processors for recording."""
teleop_proc = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[IdentityProcessorStep()],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
obs_proc = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[IdentityProcessorStep()],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
return teleop_proc, obs_proc
def reset_loop(robot: Robot, teleop: Teleoperator, events: dict, fps: int):
"""Reset period where human repositions environment."""
logger.info("[HIL] RESET")
events["in_reset"] = True
events["start_next_episode"] = False
obs = robot.get_observation()
robot_pos = {k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features}
teleop_smooth_move_to(teleop, robot_pos, duration_s=2.0, fps=50)
logger.info("Press any key to enable teleoperation")
while not events["start_next_episode"] and not events["stop_recording"]:
precise_sleep(0.05)
if events["stop_recording"]:
return
events["start_next_episode"] = False
teleop_disable_torque(teleop)
logger.info("Teleop enabled - press any key to start episode")
while not events["start_next_episode"] and not events["stop_recording"]:
loop_start = time.perf_counter()
action = teleop.get_action()
robot.send_action(action)
precise_sleep(1 / fps - (time.perf_counter() - loop_start))
events["in_reset"] = False
events["start_next_episode"] = False
events["exit_early"] = False
events["policy_paused"] = False
events["correction_active"] = False
events["resume_policy"] = False
def print_controls(rtc: bool = False):
"""Print control instructions."""
mode = "Human-in-the-Loop Data Collection" + (" (RTC)" if rtc else "")
logger.info(
"%s\n Controls:\n"
" SPACE - Pause policy\n"
" c - Take control\n"
" p - Resume policy after pause/correction\n"
" → - End episode\n"
" ESC - Stop and push to hub",
mode,
)
+62 -31
@@ -14,17 +14,21 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from lerobot.common.control_utils import init_keyboard_listener
import logging
import time
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.datasets import LeRobotDataset
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import make_default_processors
from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import hw_to_dataset_features
from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
NUM_EPISODES = 2
FPS = 30
@@ -35,6 +39,9 @@ HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
@@ -83,43 +90,67 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Running inference, recording eval episode {recorded_episodes} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_observation_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot
robot_action_to_send = robot_action_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
log_say("Waiting for environment reset, press right arrow key when ready...")
if events["rerecord_episode"]:
log_say("Re-record episode")
+10 -9
@@ -45,9 +45,6 @@ def main():
leader_arm = SO100Leader(leader_arm_config)
keyboard = KeyboardTeleop(keyboard_config)
# TODO(Steven): Update this example to use pipelines
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
# Configure the dataset features
action_features = hw_to_dataset_features(robot.action_features, ACTION)
obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
@@ -77,6 +74,10 @@ def main():
if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
raise ValueError("Robot or teleop is not connected!")
teleop_action_processor, robot_action_processor, robot_observation_processor = (
make_default_processors()
)
print("Starting record loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
@@ -87,14 +88,14 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
dataset=dataset,
teleop=[leader_arm, keyboard],
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Reset the environment if not stopping or re-recording
@@ -106,13 +107,13 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
teleop=[leader_arm, keyboard],
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
if events["rerecord_episode"]:
+77
@@ -0,0 +1,77 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained policy on LeKiwi without recording (base rollout).
Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
no dataset) with :class:`SyncInferenceConfig` (inline policy call per
control tick). For a CLI entry point with the same capabilities plus
recording, upload, and human-in-the-loop variants, see ``lerobot-rollout``.
"""
from lerobot.configs import PreTrainedConfig
from lerobot.robots.lekiwi import LeKiwiClientConfig
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
# Robot: LeKiwi client — make sure lekiwi_host is already running on the robot.
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
# Policy: load the pretrained config. ``pretrained_path`` is read downstream
# by ``build_rollout_context`` to reload the full model.
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
# Assemble the rollout config: base strategy (no recording) + sync inference.
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
# Graceful Ctrl-C: the strategy loop exits when shutdown_event is set.
signal_handler = ProcessSignalHandler(use_threads=True)
# Build the context (connects robot, loads policy, wires the inference strategy).
# No custom processors here — LeKiwi runs on raw joint features.
ctx = build_rollout_context(cfg, signal_handler.shutdown_event)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
+63 -32
@@ -14,13 +14,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.configs import FeatureType, PolicyFeature
from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
from lerobot.model.kinematics import RobotKinematics
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import (
RobotProcessorPipeline,
make_default_teleop_action_processor,
@@ -34,11 +38,12 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.scripts.lerobot_record import record_loop
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.feature_utils import combine_feature_dicts
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
NUM_EPISODES = 5
FPS = 30
@@ -49,6 +54,9 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
@@ -143,43 +151,67 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_joints_to_ee_pose_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot (EE -> joints via IK)
robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
log_say("Waiting for environment reset, press right arrow key when ready...")
if events["rerecord_episode"]:
log_say("Re-record episode")
@@ -190,7 +222,6 @@ def main():
# Save episode
dataset.save_episode()
episode_idx += 1
finally:
# Clean up
log_say("Stop recording")
+13 -13
@@ -65,14 +65,15 @@ def main():
robot = SO100Follower(robot_config)
phone = Phone(teleop_config)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert phone action to EE action
# Build pipeline to convert phone action to EE action (with gripper velocity mapped to joint).
phone_to_robot_ee_pose_processor = RobotProcessorPipeline[
tuple[RobotAction, RobotObservation], RobotAction
](
@@ -94,7 +95,7 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to joints action
# Build pipeline to convert EE action to joints action (IK).
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
@@ -107,7 +108,7 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert joint observation to EE observation
# Build pipeline to convert joint observation to EE observation (FK).
robot_joints_to_ee_pose = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
@@ -118,13 +119,12 @@ def main():
to_output=transition_to_observation,
)
# Create the dataset
# Create the dataset, deriving features from the pipelines so the on-disk schema
# matches exactly what the pipelines produce at runtime.
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=phone_to_robot_ee_pose_processor,
initial_features=create_initial_features(action=phone.action_features),
@@ -163,14 +163,14 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
teleop=phone,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
)
# Reset the environment if not stopping or re-recording
@@ -182,13 +182,13 @@ def main():
robot=robot,
events=events,
fps=FPS,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
teleop=phone,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
)
if events["rerecord_episode"]:
+126
@@ -0,0 +1,126 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained EE-space policy on SO100 (phone-trained) without recording.
Mirrors ``examples/so100_to_so100_EE/rollout.py`` — the model was trained
with phone teleoperation in EE space, so at deployment we only need the
joint↔EE conversion on the robot side; the phone is not used.
Uses :class:`BaseStrategy` (no recording) + :class:`SyncInferenceConfig`
(inline policy call). For recording during rollout, switch to Sentry,
Highlight, or DAgger via ``lerobot-rollout --strategy.type=...``.
"""
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.configs import PreTrainedConfig
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor import (
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
# Peek at motor names once to build the kinematic solver.
temp_robot = SO100Follower(robot_config)
motor_names = list(temp_robot.bus.motors.keys())
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=motor_names,
)
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=motor_names,
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
signal_handler = ProcessSignalHandler(use_threads=True)
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
-673
@@ -1,673 +0,0 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Demo script showing how to use Real-Time Chunking (RTC) with action chunking policies on real robots.
This script demonstrates:
1. Creating a robot and policy (SmolVLA, Pi0, etc.) with RTC
2. Consuming actions from the policy while the robot executes
3. Periodically requesting new action chunks in the background using threads
4. Managing action buffers and timing for real-time operation
For simulation environments, see eval_with_simulation.py
Usage:
# Run on a real robot with RTC enabled
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/smolvla_check_rtc_last3 \
--policy.device=mps \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run on a real robot with RTC disabled
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/smolvla_check_rtc_last3 \
--policy.device=mps \
--rtc.enabled=false \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run on a real robot with the pi0.5 policy and RTC enabled
uv run examples/rtc/eval_with_real_robot.py \
--policy.path=<USER>/pi05_check_rtc \
--policy.device=mps \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58FA0834591 \
--robot.id=so100_follower \
--robot.cameras="{ gripper: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}}" \
--task="Move green small object into the purple platform" \
--duration=120
# Run RTC with bi_openarm_follower (dual-arm OpenArms) and pi0.5 policy
python examples/rtc/eval_with_real_robot.py \
--policy.path=lerobot-data-collection/folding_final \
--robot.type=bi_openarm_follower \
--robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}}' \
--robot.left_arm_config.port=can0 \
--robot.left_arm_config.side=left \
--robot.left_arm_config.can_interface=socketcan \
--robot.left_arm_config.disable_torque_on_disconnect=true \
--robot.left_arm_config.max_relative_target=8.0 \
--robot.right_arm_config.port=can1 \
--robot.right_arm_config.side=right \
--robot.right_arm_config.can_interface=socketcan \
--robot.right_arm_config.disable_torque_on_disconnect=true \
--robot.right_arm_config.max_relative_target=8.0 \
--task="Fold the T-shirt properly" \
--fps=30 \
--duration=2000 \
--interpolation_multiplier=3 \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
--rtc.prefix_attention_schedule=LINEAR \
--device=cuda
"""
import logging
import math
import sys
import time
import traceback
from dataclasses import dataclass, field
from threading import Event, Lock, Thread
import torch
from torch import Tensor
from lerobot.cameras.opencv import OpenCVCameraConfig # noqa: F401
from lerobot.cameras.realsense import RealSenseCameraConfig # noqa: F401
from lerobot.cameras.zmq import ZMQCameraConfig # noqa: F401
from lerobot.configs import PreTrainedConfig, RTCAttentionSchedule, parser
from lerobot.policies import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc import ActionInterpolator, ActionQueue, LatencyTracker, RTCConfig
from lerobot.processor import (
NormalizerProcessorStep,
RelativeActionsProcessorStep,
TransitionKey,
create_transition,
make_default_robot_action_processor,
make_default_robot_observation_processor,
to_relative_actions,
)
from lerobot.rl.process import ProcessSignalHandler
from lerobot.robots import ( # noqa: F401
Robot,
RobotConfig,
bi_openarm_follower,
bi_so_follower,
koch_follower,
so_follower,
unitree_g1,
)
from lerobot.robots.utils import make_robot_from_config
from lerobot.utils.constants import OBS_IMAGES, OBS_STATE
from lerobot.utils.feature_utils import build_dataset_frame, hw_to_dataset_features
from lerobot.utils.hub import HubMixin
from lerobot.utils.utils import init_logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class RobotWrapper:
def __init__(self, robot: Robot):
self.robot = robot
self.lock = Lock()
def get_observation(self) -> dict[str, Tensor]:
with self.lock:
return self.robot.get_observation()
def send_action(self, action: Tensor):
with self.lock:
self.robot.send_action(action)
def observation_features(self) -> list[str]:
with self.lock:
return self.robot.observation_features
def action_features(self) -> list[str]:
with self.lock:
return self.robot.action_features
@dataclass
class RTCDemoConfig(HubMixin):
"""Configuration for RTC demo with action chunking policies and real robots."""
# Policy configuration
policy: PreTrainedConfig | None = None
# Robot configuration
robot: RobotConfig | None = None
# RTC configuration
rtc: RTCConfig = field(
default_factory=lambda: RTCConfig(
execution_horizon=10,
max_guidance_weight=1.0,
prefix_attention_schedule=RTCAttentionSchedule.EXP,
)
)
# Demo parameters
duration: float = 30.0 # Duration to run the demo (seconds)
fps: float = 10.0 # Action execution frequency (Hz)
interpolation_multiplier: int = 1 # Control rate multiplier (1=off, 2=2x, 3=3x)
# Compute device
device: str | None = None # Device to run on (cuda, cpu, auto)
# Queue-size threshold for requesting new actions: a new chunk is requested once the number of
# queued actions drops to this value. It should be larger than the inference delay plus the execution horizon.
action_queue_size_to_get_new_actions: int = 30
# Task to execute
task: str = field(default="", metadata={"help": "Task to execute"})
# Torch compile configuration
use_torch_compile: bool = field(
default=False,
metadata={"help": "Use torch.compile for faster inference (PyTorch 2.0+)"},
)
torch_compile_backend: str = field(
default="inductor",
metadata={"help": "Backend for torch.compile (inductor, aot_eager, cudagraphs)"},
)
torch_compile_mode: str = field(
default="default",
metadata={"help": "Compilation mode (default, reduce-overhead, max-autotune)"},
)
torch_compile_disable_cudagraphs: bool = field(
default=True,
metadata={
"help": "Disable CUDA graphs in torch.compile. Required due to in-place tensor "
"operations in denoising loop (x_t += dt * v_t) which cause tensor aliasing issues."
},
)
def __post_init__(self):
# HACK: We parse again the cli args here to get the pretrained path if there was one.
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
else:
raise ValueError("Policy path is required")
# Validate that robot configuration is provided
if self.robot is None:
raise ValueError("Robot configuration must be provided")
@classmethod
def __get_path_fields__(cls) -> list[str]:
"""This enables the parser to load config from the policy using `--policy.path=local/dir`"""
return ["policy"]
def is_image_key(k: str) -> bool:
return k.startswith(OBS_IMAGES)
def _reanchor_relative_rtc_prefix(
prev_actions_absolute: Tensor,
current_state: Tensor,
relative_step: RelativeActionsProcessorStep,
normalizer_step: NormalizerProcessorStep | None,
policy_device: torch.device | str,
) -> Tensor:
"""Convert absolute leftovers into model-space for relative-action RTC policies.
When a policy uses relative actions, the RTC prefix (leftover actions from
the previous chunk) is stored in absolute space. Before feeding it back to
the policy we need to re-express it relative to the *current* robot state
and then re-normalize.
"""
state = current_state.detach().cpu()
if state.dim() == 1:
state = state.unsqueeze(0)
action_cpu = prev_actions_absolute.detach().cpu()
mask = relative_step._build_mask(action_cpu.shape[-1])
relative_actions = to_relative_actions(action_cpu, state, mask)
transition = create_transition(action=relative_actions)
if normalizer_step is not None:
transition = normalizer_step(transition)
return transition[TransitionKey.ACTION].to(policy_device)
def get_actions(
policy,
robot: RobotWrapper,
robot_observation_processor,
action_queue: ActionQueue,
shutdown_event: Event,
cfg: RTCDemoConfig,
):
"""Thread function to request action chunks from the policy.
Args:
policy: The policy instance (SmolVLA, Pi0, etc.)
robot: The robot instance for getting observations
robot_observation_processor: Processor for raw robot observations
action_queue: Queue to put new action chunks
shutdown_event: Event to signal shutdown
cfg: Demo configuration
"""
try:
logger.info("[GET_ACTIONS] Starting get actions thread")
latency_tracker = LatencyTracker() # Track latency of action chunks
fps = cfg.fps
time_per_chunk = 1.0 / fps
# Keep only the .pos joints and camera streams, since the policy was trained on positions
# rather than the full pos/vel/torque state the robot exposes.
observation_features_hw = {
key: value
for key, value in robot.observation_features().items()
if key.endswith(".pos") or isinstance(value, tuple)
}
dataset_features = hw_to_dataset_features(observation_features_hw, "observation")
policy_device = policy.config.device
# Load preprocessor and postprocessor from pretrained files
# The stats are embedded in the processor .safetensors files
logger.info(f"[GET_ACTIONS] Loading preprocessor/postprocessor from {cfg.policy.pretrained_path}")
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
pretrained_path=cfg.policy.pretrained_path,
dataset_stats=None, # Will load from pretrained processor files
preprocessor_overrides={
"device_processor": {"device": cfg.policy.device},
},
)
logger.info("[GET_ACTIONS] Preprocessor/postprocessor loaded successfully with embedded stats")
relative_step = next(
(s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
None,
)
normalizer_step = next(
(s for s in preprocessor.steps if isinstance(s, NormalizerProcessorStep)),
None,
)
if relative_step is not None:
if relative_step.action_names is None:
cfg_names = getattr(cfg.policy, "action_feature_names", None)
if cfg_names:
relative_step.action_names = list(cfg_names)
else:
relative_step.action_names = [
k for k in robot.robot.action_features if k.endswith(".pos")
]
logger.info("[GET_ACTIONS] Relative actions enabled: will re-anchor RTC prefix")
get_actions_threshold = cfg.action_queue_size_to_get_new_actions
if not cfg.rtc.enabled:
get_actions_threshold = 0
while not shutdown_event.is_set():
if action_queue.qsize() <= get_actions_threshold:
current_time = time.perf_counter()
action_index_before_inference = action_queue.get_action_index()
prev_actions = action_queue.get_left_over()
inference_latency = latency_tracker.max()
inference_delay = math.ceil(inference_latency / time_per_chunk)
obs = robot.get_observation()
# Apply robot observation processor
obs_processed = robot_observation_processor(obs)
obs_with_policy_features = build_dataset_frame(
dataset_features, obs_processed, prefix="observation"
)
for name in obs_with_policy_features:
obs_with_policy_features[name] = torch.from_numpy(obs_with_policy_features[name])
if "image" in name:
obs_with_policy_features[name] = (
obs_with_policy_features[name].type(torch.float32) / 255
)
obs_with_policy_features[name] = (
obs_with_policy_features[name].permute(2, 0, 1).contiguous()
)
obs_with_policy_features[name] = obs_with_policy_features[name].unsqueeze(0)
obs_with_policy_features[name] = obs_with_policy_features[name].to(policy_device)
obs_with_policy_features["task"] = [cfg.task] # Task should be a list, not a string!
obs_with_policy_features["robot_type"] = (
robot.robot.name if hasattr(robot.robot, "name") else ""
)
preprocessed_obs = preprocessor(obs_with_policy_features)
# Re-anchor leftover actions for relative-action policies.
# We need the *postprocessed* (absolute) leftover, not the original
# (normalized/relative) one that get_left_over() returns.
if (
prev_actions is not None
and relative_step is not None
and OBS_STATE in obs_with_policy_features
):
with action_queue.lock:
if action_queue.queue is not None:
prev_actions_abs = action_queue.queue[action_queue.last_index :].clone()
else:
prev_actions_abs = None
if prev_actions_abs is not None and prev_actions_abs.numel() > 0:
prev_actions = _reanchor_relative_rtc_prefix(
prev_actions_absolute=prev_actions_abs,
current_state=obs_with_policy_features[OBS_STATE],
relative_step=relative_step,
normalizer_step=normalizer_step,
policy_device=policy_device,
)
# Generate actions WITH RTC
actions = policy.predict_action_chunk(
preprocessed_obs,
inference_delay=inference_delay,
prev_chunk_left_over=prev_actions,
)
# Store original actions (before postprocessing) for RTC
original_actions = actions.squeeze(0).clone()
postprocessed_actions = postprocessor(actions)
postprocessed_actions = postprocessed_actions.squeeze(0)
new_latency = time.perf_counter() - current_time
new_delay = math.ceil(new_latency / time_per_chunk)
latency_tracker.add(new_latency)
if cfg.action_queue_size_to_get_new_actions < cfg.rtc.execution_horizon + new_delay:
logger.warning(
"[GET_ACTIONS] cfg.action_queue_size_to_get_new_actions Too small, It should be higher than inference delay + execution horizon."
)
action_queue.merge(
original_actions, postprocessed_actions, new_delay, action_index_before_inference
)
else:
# Small sleep to prevent busy waiting
time.sleep(0.1)
logger.info("[GET_ACTIONS] get actions thread shutting down")
except Exception as e:
logger.error(f"[GET_ACTIONS] Fatal exception in get_actions thread: {e}")
logger.error(traceback.format_exc())
sys.exit(1)
def actor_control(
robot: RobotWrapper,
robot_action_processor,
action_queue: ActionQueue,
shutdown_event: Event,
cfg: RTCDemoConfig,
):
"""Thread function to execute actions on the robot.
Args:
robot: The robot instance
action_queue: Queue to get actions from
shutdown_event: Event to signal shutdown
cfg: Demo configuration
"""
try:
logger.info("[ACTOR] Starting actor thread")
action_keys = [k for k in robot.action_features() if k.endswith(".pos")]
action_count = 0
interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
action_interval = interpolator.get_control_interval(cfg.fps)
while not shutdown_event.is_set():
start_time = time.perf_counter()
if interpolator.needs_new_action():
new_action = action_queue.get()
if new_action is not None:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action is not None:
action = action.cpu()
action_dict = {key: action[i].item() for i, key in enumerate(action_keys)}
action_processed = robot_action_processor((action_dict, None))
robot.send_action(action_processed)
action_count += 1
dt_s = time.perf_counter() - start_time
time.sleep(max(0, (action_interval - dt_s) - 0.001))
logger.info(f"[ACTOR] Actor thread shutting down. Total actions executed: {action_count}")
except Exception as e:
logger.error(f"[ACTOR] Fatal exception in actor_control thread: {e}")
logger.error(traceback.format_exc())
sys.exit(1)
def _apply_torch_compile(policy, cfg: RTCDemoConfig):
"""Apply torch.compile to the policy's predict_action_chunk method.
Args:
policy: Policy instance to compile
cfg: Configuration containing torch compile settings
Returns:
Policy with compiled predict_action_chunk method
"""
# PI models handle their own compilation
if policy.type == "pi05" or policy.type == "pi0":
return policy
try:
# Check if torch.compile is available (PyTorch 2.0+)
if not hasattr(torch, "compile"):
logger.warning(
f"torch.compile is not available. Requires PyTorch 2.0+. "
f"Current version: {torch.__version__}. Skipping compilation."
)
return policy
logger.info("Applying torch.compile to predict_action_chunk...")
logger.info(f" Backend: {cfg.torch_compile_backend}")
logger.info(f" Mode: {cfg.torch_compile_mode}")
logger.info(f" Disable CUDA graphs: {cfg.torch_compile_disable_cudagraphs}")
# Compile the predict_action_chunk method
# - CUDA graphs disabled to prevent tensor aliasing from in-place ops (x_t += dt * v_t)
compile_kwargs = {
"backend": cfg.torch_compile_backend,
"mode": cfg.torch_compile_mode,
}
# Disable CUDA graphs if requested (prevents tensor aliasing issues)
if cfg.torch_compile_disable_cudagraphs:
compile_kwargs["options"] = {"triton.cudagraphs": False}
original_method = policy.predict_action_chunk
compiled_method = torch.compile(original_method, **compile_kwargs)
policy.predict_action_chunk = compiled_method
logger.info("✓ Successfully compiled predict_action_chunk")
except Exception as e:
logger.error(f"Failed to apply torch.compile: {e}")
logger.warning("Continuing without torch.compile")
return policy
@parser.wrap()
def demo_cli(cfg: RTCDemoConfig):
"""Main entry point for RTC demo with draccus configuration."""
# Initialize logging
init_logging()
logger.info(f"Using device: {cfg.device}")
# Setup signal handler for graceful shutdown
signal_handler = ProcessSignalHandler(use_threads=True, display_pid=False)
shutdown_event = signal_handler.shutdown_event
policy = None
robot = None
get_actions_thread = None
actor_thread = None
policy_class = get_policy_class(cfg.policy.type)
# Load config and set compile_model for pi0/pi05 models
config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
if cfg.policy.type == "pi05" or cfg.policy.type == "pi0":
config.compile_model = cfg.use_torch_compile
if config.use_peft:
from peft import PeftConfig, PeftModel
peft_pretrained_path = cfg.policy.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
policy = policy_class.from_pretrained(
pretrained_name_or_path=peft_config.base_model_name_or_path, config=config
)
policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
else:
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)
# Turn on RTC
policy.config.rtc_config = cfg.rtc
# Initialize the RTC processor explicitly: by default it is not created
# when RTC is disabled in the config.
policy.init_rtc_processor()
assert policy.name in ["smolvla", "pi05", "pi0"], "Only smolvla, pi05, and pi0 are supported for RTC"
policy = policy.to(cfg.device)
policy.eval()
# Apply torch.compile to predict_action_chunk method if enabled
if cfg.use_torch_compile:
policy = _apply_torch_compile(policy, cfg)
# Create robot
logger.info(f"Initializing robot: {cfg.robot.type}")
robot = make_robot_from_config(cfg.robot)
robot.connect()
robot_wrapper = RobotWrapper(robot)
# Create robot observation processor
robot_observation_processor = make_default_robot_observation_processor()
robot_action_processor = make_default_robot_action_processor()
# Create action queue for communication between threads
action_queue = ActionQueue(cfg.rtc)
# Start chunk requester thread
get_actions_thread = Thread(
target=get_actions,
args=(policy, robot_wrapper, robot_observation_processor, action_queue, shutdown_event, cfg),
daemon=True,
name="GetActions",
)
get_actions_thread.start()
logger.info("Started get actions thread")
# Start action executor thread
actor_thread = Thread(
target=actor_control,
args=(robot_wrapper, robot_action_processor, action_queue, shutdown_event, cfg),
daemon=True,
name="Actor",
)
actor_thread.start()
logger.info("Started actor thread")
logger.info("Started stop by duration thread")
# Main thread monitors for duration or shutdown
logger.info(f"Running demo for {cfg.duration} seconds...")
start_time = time.time()
while not shutdown_event.is_set() and (time.time() - start_time) < cfg.duration:
time.sleep(10)
# Log queue status periodically
if int(time.time() - start_time) % 5 == 0:
logger.info(f"[MAIN] Action queue size: {action_queue.qsize()}")
if time.time() - start_time > cfg.duration:
break
logger.info("Demo duration reached or shutdown requested")
# Signal shutdown
shutdown_event.set()
# Wait for threads to finish
if get_actions_thread and get_actions_thread.is_alive():
logger.info("Waiting for chunk requester thread to finish...")
get_actions_thread.join()
if actor_thread and actor_thread.is_alive():
logger.info("Waiting for action executor thread to finish...")
actor_thread.join()
# Cleanup robot
if robot:
robot.disconnect()
logger.info("Robot disconnected")
logger.info("Cleanup completed")
if __name__ == "__main__":
demo_cli()
logging.info("RTC demo finished")
+175
@@ -0,0 +1,175 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Simple SO100/SO101 leader-follower teleoperation with spacebar intervention toggle.
Modes:
- Default (not intervening): follower holds its current position.
The leader arm has torque ENABLED and mirrors the follower so there is no
large position jump when intervention starts.
- Intervention (SPACE pressed): leader torque DISABLED, human moves the leader
freely, and the follower mirrors the leader joint-by-joint.
Usage:
uv run python examples/so100_teleop/teleop.py
Controls:
SPACE — toggle intervention on/off
Ctrl+C — exit
"""
import logging
import os
import sys
import time
from threading import Event, Thread
from lerobot.robots.so_follower import SO101Follower, SO101FollowerConfig
from lerobot.teleoperators.so_leader import SO101Leader
from lerobot.teleoperators.so_leader.config_so_leader import SOLeaderTeleopConfig
from lerobot.utils.robot_utils import precise_sleep
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ── pynput keyboard listener ─────────────────────────────────────────────────
PYNPUT_AVAILABLE = True
try:
if "DISPLAY" not in os.environ and "linux" in sys.platform:
raise ImportError("No DISPLAY set, pynput skipped.")
from pynput import keyboard as pynput_keyboard
except Exception:
pynput_keyboard = None
PYNPUT_AVAILABLE = False
# ── Configure ports ──────────────────────────────────────────────────────────
FOLLOWER_PORT = "/dev/ttyUSB0" # ← change to your follower port
LEADER_PORT = "/dev/ttyUSB1" # ← change to your leader port
FPS = 30
def hold_position(robot) -> dict:
"""Read current joint positions and write them back as the goal.
This prevents the motors from snapping to a stale Goal_Position register
value (which can happen when torque is re-enabled after calibration).
Returns the current position dict for reuse.
"""
current = robot.bus.sync_read("Present_Position")
robot.bus.sync_write("Goal_Position", current)
return {f"{motor}.pos": val for motor, val in current.items()}
# ── Connect ───────────────────────────────────────────────────────────────────
follower_config = SO101FollowerConfig(
port=FOLLOWER_PORT,
id="follower_arm",
use_degrees=True,
)
leader_config = SOLeaderTeleopConfig(
port=LEADER_PORT,
id="leader_arm",
use_degrees=True,
)
follower = SO101Follower(follower_config)
leader = SO101Leader(leader_config)
follower.connect()
leader.connect()
# ── CRITICAL: hold both arms at their current position before doing anything ─
# configure() enables follower torque, and the Goal_Position register may contain
# a stale value from a previous session. Writing current→goal prevents sudden motion.
follower_current = hold_position(follower)
leader_current = hold_position(leader) # leader torque is still off here, but sets the register
# ── Intervention state + keyboard listener ───────────────────────────────────
is_intervening = False
stop_event = Event()
def _start_keyboard_listener():
if not PYNPUT_AVAILABLE:
logger.warning("pynput not available — spacebar toggle disabled.")
return None
def on_press(key):
global is_intervening
if key == pynput_keyboard.Key.space:
is_intervening = not is_intervening
state = "INTERVENTION (leader → follower)" if is_intervening else "IDLE (follower holds)"
print(f"\n[SPACE] {state}\n")
def listen():
with pynput_keyboard.Listener(on_press=on_press) as listener:
while not stop_event.is_set():
time.sleep(0.05)
listener.stop()
t = Thread(target=listen, daemon=True)
t.start()
return t
kbd_thread = _start_keyboard_listener()
# Enable leader torque AFTER writing its goal to current position, so it holds in place.
leader.bus.sync_write("Torque_Enable", 1)
leader_torque_on = True
print("\nTeleoperation ready.")
print(" SPACE → toggle intervention (leader controls follower)")
print(" Ctrl+C → exit\n")
try:
while True:
t0 = time.perf_counter()
if is_intervening:
# ── Intervention: leader torque OFF, follower mirrors leader ──────
if leader_torque_on:
leader.bus.sync_write("Torque_Enable", 0)
leader_torque_on = False
leader_action = leader.get_action() # reads present leader joints
follower.send_action(leader_action) # follower tracks leader
else:
# ── Idle: leader torque ON, leader mirrors follower, follower holds
if not leader_torque_on:
# Before re-enabling torque, set the leader's goal to its current
# position so it doesn't snap to the follower position suddenly.
hold_position(leader)
leader.bus.sync_write("Torque_Enable", 1)
leader_torque_on = True
follower_obs = follower.get_observation()
# Command leader to match follower (so next intervention has no jump)
goal_pos = {motor: follower_obs[f"{motor}.pos"] for motor in leader.bus.motors}
leader.bus.sync_write("Goal_Position", goal_pos)
# Follower holds — no send_action call
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
except KeyboardInterrupt:
print("\nExiting...")
finally:
stop_event.set()
leader.bus.sync_write("Torque_Enable", 0)
follower.disconnect()
leader.disconnect()
+63 -32
@@ -14,13 +14,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.common.control_utils import init_keyboard_listener
from lerobot.common.control_utils import init_keyboard_listener, predict_action
from lerobot.configs import FeatureType, PolicyFeature
from lerobot.datasets import LeRobotDataset, aggregate_pipeline_dataset_features, create_initial_features
from lerobot.model.kinematics import RobotKinematics
from lerobot.policies import make_pre_post_processors
from lerobot.policies.act import ACTPolicy
from lerobot.policies.utils import make_robot_action
from lerobot.processor import (
RobotProcessorPipeline,
make_default_teleop_action_processor,
@@ -34,11 +38,12 @@ from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.scripts.lerobot_record import record_loop
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.feature_utils import combine_feature_dicts
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
NUM_EPISODES = 5
FPS = 30
@@ -49,6 +54,9 @@ HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
def main():
# NOTE: For production policy deployment, use `lerobot-rollout` CLI instead.
# This script provides a self-contained example for educational purposes.
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
@@ -143,43 +151,67 @@ def main():
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
control_interval = 1 / FPS
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Inline evaluation loop: predict actions and send to robot
timestamp = 0
start_episode_t = time.perf_counter()
while timestamp < EPISODE_TIME_SEC:
start_loop_t = time.perf_counter()
if events["exit_early"]:
events["exit_early"] = False
break
# Get robot observation
obs = robot.get_observation()
obs_processed = robot_joints_to_ee_pose_processor(obs)
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Predict action using the policy
action_tensor = predict_action(
observation=observation_frame,
policy=policy,
device=policy.config.device,
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.device.type == "cuda",
task=TASK_DESCRIPTION,
robot_type=robot.name,
)
# Convert policy output to robot action dict
action_values = make_robot_action(action_tensor, dataset.features)
# Process and send action to robot (EE -> joints via IK)
robot_action_to_send = robot_ee_to_joints_processor((action_values, obs))
robot.send_action(robot_action_to_send)
# Write to dataset
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": TASK_DESCRIPTION}
dataset.add_frame(frame)
log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t
sleep_time_s = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Evaluate loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({FPS} Hz)."
)
precise_sleep(max(sleep_time_s, 0.0))
timestamp = time.perf_counter() - start_episode_t
# Reset the environment if not stopping or re-recording
if not events["stop_recording"] and (
(episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]
):
log_say("Reset the environment")
record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
log_say("Waiting for environment reset, press right arrow key when ready...")
if events["rerecord_episode"]:
log_say("Re-record episode")
@@ -190,7 +222,6 @@ def main():
# Save episode
dataset.save_episode()
episode_idx += 1
finally:
# Clean up
log_say("Stop recording")
+15 -17
@@ -62,21 +62,20 @@ def main():
follower = SO100Follower(follower_config)
leader = SO100Leader(leader_config)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
follower_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(follower.bus.motors.keys()),
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()),
)
# Build pipeline to convert follower joints to EE observation
# Build pipeline to convert follower joints to EE observation.
follower_joints_to_ee = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
@@ -87,7 +86,7 @@ def main():
to_output=transition_to_observation,
)
# Build pipeline to convert leader joints to EE action
# Build pipeline to convert leader joints to EE action.
leader_joints_to_ee = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
ForwardKinematicsJointsToEE(
@@ -98,9 +97,9 @@ def main():
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to follower joints
# Build pipeline to convert EE action to follower joints (with safety bounds).
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
[
steps=[
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
@@ -115,13 +114,12 @@ def main():
to_output=transition_to_robot_action,
)
# Create the dataset
# Create the dataset, deriving features from the pipelines so the on-disk schema
# matches exactly what the pipelines produce at runtime.
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=leader_joints_to_ee,
initial_features=create_initial_features(action=leader.action_features),
@@ -144,7 +142,7 @@ def main():
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="recording_phone")
init_rerun(session_name="recording_so100_ee")
try:
if not leader.is_connected or not follower.is_connected:
@@ -160,14 +158,14 @@ def main():
robot=follower,
events=events,
fps=FPS,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
teleop=leader,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
)
# Reset the environment if not stopping or re-recording
@@ -179,13 +177,13 @@ def main():
robot=follower,
events=events,
fps=FPS,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
teleop=leader,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
)
if events["rerecord_episode"]:
+134
@@ -0,0 +1,134 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run a trained EE-space policy on SO100 without recording (base rollout).
Uses the rollout engine's :class:`BaseStrategy` (autonomous execution,
no dataset) with :class:`SyncInferenceConfig` (inline policy call per
control tick). The custom observation/action processors convert between
joint space (robot hardware) and end-effector space (policy I/O) via
forward/inverse kinematics.
"""
from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.configs import PreTrainedConfig
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor import (
RobotProcessorPipeline,
observation_to_transition,
robot_action_observation_to_transition,
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.rollout import BaseStrategyConfig, RolloutConfig, build_rollout_context
from lerobot.rollout.inference import SyncInferenceConfig
from lerobot.rollout.strategies import BaseStrategy
from lerobot.types import RobotAction, RobotObservation
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
FPS = 30
DURATION_SEC = 60
TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>"
def main():
init_logging()
# Robot configuration — the rollout engine will connect it inside build_rollout_context.
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
# Kinematic solver: we need the motor-name list, so peek at the robot once.
# (The rollout engine owns the connected instance; we only use this for introspection.)
temp_robot = SO100Follower(robot_config)
motor_names = list(temp_robot.bus.motors.keys())
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo:
# https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=motor_names,
)
# Joint-space observation → EE-space observation (consumed by the policy).
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=motor_names)],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# EE-space action (produced by the policy) → joint-space action (sent to robot).
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=motor_names,
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Policy config (full model is loaded inside build_rollout_context).
policy_config = PreTrainedConfig.from_pretrained(HF_MODEL_ID)
policy_config.pretrained_path = HF_MODEL_ID
cfg = RolloutConfig(
robot=robot_config,
policy=policy_config,
strategy=BaseStrategyConfig(),
inference=SyncInferenceConfig(),
fps=FPS,
duration=DURATION_SEC,
task=TASK_DESCRIPTION,
)
signal_handler = ProcessSignalHandler(use_threads=True)
# Pass the EE kinematic processors via kwargs; the defaults (identity) would
# otherwise skip the joint↔EE conversion and the policy would receive the
# wrong observation/action space.
ctx = build_rollout_context(
cfg,
signal_handler.shutdown_event,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
strategy = BaseStrategy(cfg.strategy)
try:
strategy.setup(ctx)
strategy.run(ctx)
finally:
strategy.teardown(ctx)
if __name__ == "__main__":
main()
@@ -0,0 +1,365 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from dataclasses import dataclass
import numpy as np
import torch
from lerobot.configs.types import PipelineFeatureType, PolicyFeature
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor import (
ProcessorStepRegistry,
RobotAction,
RobotActionProcessorStep,
RobotObservation,
RobotProcessorPipeline,
TransitionKey,
)
from lerobot.processor.converters import (
create_transition,
identity_transition,
)
from lerobot.robots.robot import Robot
from lerobot.robots.so100_follower.robot_kinematic_processor import (
EEBoundsAndSafety,
EEReferenceAndDelta,
GripperVelocityToJoint,
InverseKinematicsRLStep,
)
from lerobot.robots.so101_follower.config_so101_follower import SO101FollowerConfig
from lerobot.robots.so101_follower.so101_follower import SO101Follower
from lerobot.teleoperators.so101_leader.config_so101_leader import SO101LeaderConfig
from lerobot.teleoperators.so101_leader.so101_leader import SO101Leader
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.rotation import Rotation
def reset_follower_position(robot_arm: Robot, target_position: np.ndarray) -> None:
"""Reset robot arm to target position using smooth trajectory."""
current_position_dict = robot_arm.bus.sync_read("Present_Position")
current_position = np.array(
[current_position_dict[name] for name in current_position_dict],
dtype=np.float32,
)
trajectory = torch.from_numpy(
np.linspace(current_position, target_position, 50)
) # NOTE: 50 interpolation steps is an arbitrary choice
for pose in trajectory:
action_dict = dict(zip(current_position_dict, pose, strict=False))
robot_arm.bus.sync_write("Goal_Position", action_dict)
precise_sleep(0.015)
@dataclass
class LogRobotAction(RobotActionProcessorStep):
def action(self, action: RobotAction) -> RobotAction:
print(f"Robot action: {action}")
return action
def transform_features(self, features):
# features[PipelineFeatureType.ACTION][ACTION] = PolicyFeature(
# type=FeatureType.ACTION, shape=(len(self.motor_names),)
# )
return features
@ProcessorStepRegistry.register("forward_kinematics_joints_to_ee_target_action")
@dataclass
class ForwardKinematicsJointsToEETargetAction(RobotActionProcessorStep):
"""
Converts a leader joint-space action into a normalized delta end-effector action.
Forward kinematics (FK) is applied to the leader action and to the follower's current joints
(or the previous IK solution when available); the position, rotation, and gripper deltas are
normalized by ``end_effector_step_sizes`` and returned as target_x/y/z, target_wx/wy/wz, gripper_vel.
Attributes:
kinematics: The robot's kinematic model.
motor_names: Ordered motor names used to build the joint vectors.
end_effector_step_sizes: Per-axis normalization factors for the EE deltas.
max_gripper_pos: Clipping bound for the gripper delta.
use_ik_solution: Reuse the previous IK solution from complementary data as the follower pose.
"""
kinematics: RobotKinematics
motor_names: list[str]
end_effector_step_sizes: dict
max_gripper_pos: float
use_ik_solution: bool = False
def action(self, action: RobotAction) -> RobotAction:
# return compute_forward_kinematics_joints_to_ee(action, self.kinematics, self.motor_names)
teleop_action = action
raw_joint_pos = self.transition.get(TransitionKey.OBSERVATION)
leader_pos = np.array([teleop_action[f"{motor}.pos"] for motor in self.motor_names])
leader_ee = self.kinematics.forward_kinematics(leader_pos)
if self.use_ik_solution and "IK_solution" in self.transition.get(TransitionKey.COMPLEMENTARY_DATA):
follower_pos = self.transition.get(TransitionKey.COMPLEMENTARY_DATA)["IK_solution"]
else:
follower_pos = np.array([raw_joint_pos[f"{motor}.pos"] for motor in self.motor_names])
follower_ee = self.kinematics.forward_kinematics(follower_pos)
follower_ee_pos = follower_ee[:3, 3]
follower_ee_rvec = Rotation.from_matrix(follower_ee[:3, :3]).as_rotvec()
# follower_gripper_pos = raw_joint_pos["gripper.pos"]
follower_gripper_pos = follower_pos[-1] # assuming gripper is the last motor
leader_ee_pos = leader_ee[:3, 3]
leader_ee_rvec = Rotation.from_matrix(leader_ee[:3, :3]).as_rotvec()
leader_gripper_pos = np.clip(
teleop_action["gripper.pos"], -self.max_gripper_pos, self.max_gripper_pos
)
print("f pos:", follower_ee_pos)
print("l pos:", leader_ee_pos)
print("f rvec:", follower_ee_rvec)
print("l rvec:", leader_ee_rvec)
# follower_ee_pos = follower_ee[:3, 3]
# follower_ee_rvec = Rotation.from_matrix(follower_ee[:3, :3]).as_rotvec()
delta_pos = leader_ee_pos - follower_ee_pos
# For rotation: compute relative rotation from follower to leader
# R_leader = R_follower * R_delta => R_delta = R_follower^T * R_leader
r_delta = follower_ee[:3, :3].T @ leader_ee[:3, :3]
delta_rvec = Rotation.from_matrix(r_delta).as_rotvec()
delta_gripper = leader_gripper_pos - follower_gripper_pos
desired = np.eye(4, dtype=float)
desired[:3, :3] = follower_ee[:3, :3] @ r_delta
desired[:3, 3] = follower_ee[:3, 3] + delta_pos
pos = desired[:3, 3]
tw = Rotation.from_matrix(desired[:3, :3]).as_rotvec()
assert np.allclose(pos, leader_ee_pos), "Position delta computation error"
assert np.allclose(tw, leader_ee_rvec), "Orientation delta computation error"
assert np.isclose(follower_gripper_pos + delta_gripper, leader_gripper_pos), (
"Gripper delta computation error"
)
# Normalize the action to the range [-1, 1]
delta_pos = delta_pos / np.array(
[
self.end_effector_step_sizes["x"],
self.end_effector_step_sizes["y"],
self.end_effector_step_sizes["z"],
]
)
delta_rvec = delta_rvec / np.array(
[
self.end_effector_step_sizes["wx"],
self.end_effector_step_sizes["wy"],
self.end_effector_step_sizes["wz"],
]
)
# Check if any of the normalized deltas exceed 1.0
max_normalized_pos = max(
abs(delta_pos[0]),
abs(delta_pos[1]),
abs(delta_pos[2]),
)
max_normalized_rot = max(
abs(delta_rvec[0]),
abs(delta_rvec[1]),
abs(delta_rvec[2]),
)
# Use the same scaling factor for both position and rotation
max_normalized = max(max_normalized_pos, max_normalized_rot)
if max_normalized > 1.0:
print(f"Warning: EE delta too large, scaling. Max normalized delta: {max_normalized_pos}")
print(f"Original delta_pos: {delta_pos}, delta_rvec: {delta_rvec}")
# Scale proportionally
delta_pos = delta_pos / max_normalized
delta_rvec = delta_rvec / max_normalized
new_action = {}
new_action["enabled"] = True
new_action["target_x"] = float(delta_pos[0])
new_action["target_y"] = float(delta_pos[1])
new_action["target_z"] = float(delta_pos[2])
new_action["target_wx"] = float(delta_rvec[0])
new_action["target_wy"] = float(delta_rvec[1])
new_action["target_wz"] = float(delta_rvec[2])
new_action["gripper_vel"] = float(
np.clip(delta_gripper, -self.max_gripper_pos, self.max_gripper_pos) / self.max_gripper_pos
)
return new_action
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
# TODO: implement feature transformation
return features
FPS = 20
# Initialize the robot and teleoperator config
follower_config = SO101FollowerConfig(port="/dev/usb_follower_arm_a", id="follower_arm_a", use_degrees=True)
leader_config = SO101LeaderConfig(port="/dev/usb_leader_arm_a", id="leader_arm_a", use_degrees=True)
# Initialize the robot and teleoperator
follower = SO101Follower(follower_config)
leader = SO101Leader(leader_config)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
follower_kinematics_solver = RobotKinematics(
urdf_path="../SO-ARM100/Simulation/SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(follower.bus.motors.keys()),
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_kinematics_solver = RobotKinematics(
urdf_path="../SO-ARM100/Simulation/SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()),
)
end_effector_step_sizes = {
"x": 0.004,
"y": 0.004,
"z": 0.004,
"wx": 5 * np.pi / 180,
"wy": 5 * np.pi / 180,
"wz": 5 * np.pi / 180,
}
# Build pipeline to convert teleop joints to EE action
leader_to_ee = RobotProcessorPipeline[RobotAction, RobotAction](
steps=[
LogRobotAction(),
ForwardKinematicsJointsToEETargetAction(
kinematics=leader_kinematics_solver,
motor_names=list(leader.bus.motors.keys()),
end_effector_step_sizes=end_effector_step_sizes,
max_gripper_pos=30.0,
use_ik_solution=True,
),
LogRobotAction(),
],
to_transition=identity_transition,
to_output=identity_transition,
)
# Build pipeline to convert EE action to robot joints
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
[
LogRobotAction(),
EEReferenceAndDelta(
kinematics=follower_kinematics_solver,
# end_effector_step_sizes={"x": 0.006, "y": 0.01, "z": 0.005},
end_effector_step_sizes=end_effector_step_sizes,
motor_names=list(follower.bus.motors.keys()),
use_latched_reference=False,
use_ik_solution=True,
),
LogRobotAction(),
EEBoundsAndSafety(
end_effector_bounds={
"min": [-0.05, -0.55, -0.0075],
"max": [0.55, 0.55, 0.55],
},
# end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.05,
),
LogRobotAction(),
GripperVelocityToJoint(
clip_max=30.0,
speed_factor=0.2,
discrete_gripper=False,
scale_velocity=True,
use_ik_solution=True,
),
LogRobotAction(),
InverseKinematicsRLStep(
kinematics=follower_kinematics_solver,
motor_names=list(follower.bus.motors.keys()),
initial_guess_current_joints=False,
),
LogRobotAction(),
],
to_transition=identity_transition,
to_output=identity_transition,
)
# Connect to the robot and teleoperator
follower.connect()
leader.connect()
reset_pose = [0.0, 10.0, 20.0, 60.0, 90.0, 10.0]
start_time = time.perf_counter()
reset_follower_position(follower, np.array(reset_pose))
reset_follower_position(leader, np.array(reset_pose))
precise_sleep(5.0 - (time.perf_counter() - start_time))
# time.sleep(10)
leader.bus.sync_write("Torque_Enable", 0)
# Init rerun viewer
# init_rerun(session_name="so100_so100_EE_teleop")
transition = None
print("Starting teleop loop...")
while True:
print("New loop iteration")
t0 = time.perf_counter()
# Get robot observation
robot_obs = follower.get_observation()
# Get teleop observation
leader_joints_obs = leader.get_action()
# teleop joints -> teleop EE action
if transition is None:
transition = create_transition(action=leader_joints_obs, observation=robot_obs)
else:
transition = create_transition(
action=leader_joints_obs,
observation=robot_obs,
complementary_data=transition.get(TransitionKey.COMPLEMENTARY_DATA),
)
transition = leader_to_ee(transition)
leader_ee_act = transition[TransitionKey.ACTION]
# teleop EE -> robot joints
transition = create_transition(
action=leader_ee_act,
observation=robot_obs,
complementary_data=transition.get(TransitionKey.COMPLEMENTARY_DATA),
)
transition = ee_to_follower_joints(transition)
follower_joints_act = transition[TransitionKey.ACTION]
# Send action to robot
_ = follower.send_action(follower_joints_act)
# Visualize
# log_rerun_data(observation=leader_ee_act, action=follower_joints_act)
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
+25 -22
@@ -4,13 +4,13 @@ from pathlib import Path
from queue import Empty, Full
import torch
import torch.optim as optim
from lerobot.datasets import LeRobotDataset
from lerobot.envs.configs import HILSerlProcessorConfig, HILSerlRobotEnvConfig
from lerobot.policies import SACConfig
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
from lerobot.policies import GaussianActorConfig
from lerobot.policies.gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy
from lerobot.policies.gaussian_actor.reward_model.modeling_classifier import Classifier
from lerobot.rl.algorithms.sac import SACAlgorithm, SACAlgorithmConfig
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.gym_manipulator import make_robot_env
from lerobot.robots.so_follower import SO100FollowerConfig
@@ -28,7 +28,7 @@ def run_learner(
transitions_queue: mp.Queue,
parameters_queue: mp.Queue,
shutdown_event: mp.Event,
policy_learner: SACPolicy,
policy_learner: GaussianActorPolicy,
online_buffer: ReplayBuffer,
offline_buffer: ReplayBuffer,
lr: float = 3e-4,
@@ -40,8 +40,9 @@ def run_learner(
policy_learner.train()
policy_learner.to(device)
# Create Adam optimizer from scratch - simple and clean
optimizer = optim.Adam(policy_learner.parameters(), lr=lr)
algo_config = SACAlgorithmConfig.from_policy_config(policy_learner.config)
algorithm = SACAlgorithm(policy=policy_learner, config=algo_config)
algorithm.make_optimizers_and_scheduler()
print(f"[LEARNER] Online buffer capacity: {online_buffer.capacity}")
print(f"[LEARNER] Offline buffer capacity: {offline_buffer.capacity}")
@@ -83,24 +84,26 @@ def run_learner(
else:
batch[key] = online_batch[key]
loss, _ = policy_learner.forward(batch)
def batch_iter(b=batch):
while True:
yield b
optimizer.zero_grad()
loss.backward()
optimizer.step()
stats = algorithm.update(batch_iter())
training_step += 1
if training_step % LOG_EVERY == 0:
log_dict = stats.to_log_dict()
print(
f"[LEARNER] Training step {training_step}, Loss: {loss.item():.4f}, "
f"[LEARNER] Training step {training_step}, "
f"critic_loss: {log_dict.get('critic', 'N/A'):.4f}, "
f"Buffers: Online={len(online_buffer)}, Offline={len(offline_buffer)}"
)
# Send updated parameters to actor every 10 training steps
if training_step % SEND_EVERY == 0:
try:
state_dict = {k: v.cpu() for k, v in policy_learner.state_dict().items()}
parameters_queue.put_nowait(state_dict)
weights = algorithm.get_weights()
parameters_queue.put_nowait(weights)
print("[LEARNER] Sent updated parameters to actor")
except Full:
# Missing write due to queue not being consumed (should happen rarely)
@@ -113,7 +116,7 @@ def run_actor(
transitions_queue: mp.Queue,
parameters_queue: mp.Queue,
shutdown_event: mp.Event,
policy_actor: SACPolicy,
policy_actor: GaussianActorPolicy,
reward_classifier: Classifier,
env_cfg: HILSerlRobotEnvConfig,
device: torch.device = "mps",
@@ -144,15 +147,15 @@ def run_actor(
while step < MAX_STEPS_PER_EPISODE and not shutdown_event.is_set():
try:
new_params = parameters_queue.get_nowait()
policy_actor.load_state_dict(new_params)
new_weights = parameters_queue.get_nowait()
policy_actor.load_state_dict(new_weights)
print("[ACTOR] Updated policy parameters from learner")
except Empty: # No new updated parameters available from learner, waiting
pass
# Get action from policy
# Get action from policy (returns full action: continuous + discrete)
policy_obs = make_policy_obs(obs, device=device)
action_tensor = policy_actor.select_action(policy_obs) # predicts a single action
action_tensor = policy_actor.select_action(policy_obs)
action = action_tensor.squeeze(0).cpu().numpy()
# Step environment
@@ -261,14 +264,14 @@ def main():
action_features = hw_to_dataset_features(env.robot.action_features, "action")
# Create SAC policy for action selection
policy_cfg = SACConfig(
policy_cfg = GaussianActorConfig(
device=device,
input_features=obs_features,
output_features=action_features,
)
policy_actor = SACPolicy(policy_cfg)
policy_learner = SACPolicy(policy_cfg)
policy_actor = GaussianActorPolicy(policy_cfg)
policy_learner = GaussianActorPolicy(policy_cfg)
demonstrations_repo_id = "lerobot/example_hil_serl_dataset"
offline_dataset = LeRobotDataset(repo_id=demonstrations_repo_id)
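The learner-side change above swaps the hand-rolled Adam step for the new SACAlgorithm wrapper, which owns its optimizers and pulls batches from an iterator. A minimal sketch of the resulting update pattern, reusing the policy_learner and batch from this example (logging shown is illustrative):

algo_config = SACAlgorithmConfig.from_policy_config(policy_learner.config)
algorithm = SACAlgorithm(policy=policy_learner, config=algo_config)
algorithm.make_optimizers_and_scheduler()

def repeat_batch(b):
    while True:  # the algorithm draws as many batches as its update needs
        yield b

stats = algorithm.update(repeat_batch(batch))  # returns training statistics
print(stats.to_log_dict().get("critic"))       # e.g. the critic loss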
+1
@@ -289,6 +289,7 @@ lerobot-find-joint-limits="lerobot.scripts.lerobot_find_joint_limits:main"
lerobot-imgtransform-viz="lerobot.scripts.lerobot_imgtransform_viz:main"
lerobot-edit-dataset="lerobot.scripts.lerobot_edit_dataset:main"
lerobot-setup-can="lerobot.scripts.lerobot_setup_can:main"
lerobot-rollout="lerobot.scripts.lerobot_rollout:main"
# ---------------- Tool Configurations ----------------
[tool.setuptools.package-data]
+1
@@ -99,6 +99,7 @@ def save_checkpoint(
optimizer (Optimizer | None, optional): The optimizer to save the state from. Defaults to None.
scheduler (LRScheduler | None, optional): The scheduler to save the state from. Defaults to None.
preprocessor: The preprocessor/pipeline to save. Defaults to None.
postprocessor: The postprocessor/pipeline to save. Defaults to None.
"""
pretrained_dir = checkpoint_dir / PRETRAINED_MODEL_DIR
policy.save_pretrained(pretrained_dir)
+2
@@ -21,6 +21,7 @@ are intentionally NOT re-exported here to avoid circular dependencies
Import them directly: ``from lerobot.configs.train import TrainPipelineConfig``
"""
from .dataset import DatasetRecordConfig
from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
from .policies import PreTrainedConfig
from .types import (
@@ -39,6 +40,7 @@ __all__ = [
"PolicyFeature",
"RTCAttentionSchedule",
# Config classes
"DatasetRecordConfig",
"DatasetConfig",
"EvalConfig",
"PeftConfig",
+80
@@ -0,0 +1,80 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Shared dataset recording configuration used by both ``lerobot-record`` and ``lerobot-rollout``."""
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
@dataclass
class DatasetRecordConfig:
# Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
repo_id: str = ""
# A short but accurate description of the task performed during the recording (e.g. "Pick the Lego block and drop it in the box on the right.")
single_task: str = ""
# Root directory where the dataset will be stored (e.g. 'dataset/path'). If None, defaults to $HF_LEROBOT_HOME/repo_id.
root: str | Path | None = None
# Limit the frames per second.
fps: int = 30
# Number of seconds for data recording for each episode.
episode_time_s: int | float = 60
# Number of seconds for resetting the environment after each episode.
reset_time_s: int | float = 60
# Number of episodes to record.
num_episodes: int = 50
# Encode frames in the dataset into video
video: bool = True
# Upload dataset to Hugging Face hub.
push_to_hub: bool = True
# Upload on private repository on the Hugging Face hub.
private: bool = False
# Add tags to your dataset on the hub.
tags: list[str] | None = None
# Number of subprocesses handling the saving of frames as PNG. Set to 0 to use threads only;
# set to ≥1 to use subprocesses, each using threads to write images. The best number of processes
# and threads depends on your system. We recommend 4 threads per camera with 0 processes.
# If fps is unstable, adjust the thread count. If still unstable, try using 1 or more subprocesses.
num_image_writer_processes: int = 0
# Number of threads writing the frames as png images on disk, per camera.
# Too many threads might cause unstable teleoperation fps due to main thread being blocked.
# Not enough threads might cause low camera fps.
num_image_writer_threads_per_camera: int = 4
# Number of episodes to record before batch encoding videos
# Set to 1 for immediate encoding (default behavior), or higher for batched encoding
video_encoding_batch_size: int = 1
# Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto',
# or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'.
# Use 'auto' to auto-detect the best available hardware encoder.
vcodec: str = "libsvtav1"
# Enable streaming video encoding: encode frames in real-time during capture instead
# of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
streaming_encoding: bool = False
# Maximum number of frames to buffer per camera when using streaming encoding.
# ~1s buffer at 30fps. Provides backpressure if the encoder can't keep up.
encoder_queue_maxsize: int = 30
# Number of threads per encoder instance. None = auto (codec default).
# Lower values reduce CPU usage; maps to 'lp' (via svtav1-params) for libsvtav1 and 'threads' for h264/hevc.
encoder_threads: int | None = None
def stamp_repo_id(self) -> None:
"""Append a date-time tag to ``repo_id`` so each recording session gets a unique name.
Must be called explicitly at dataset *creation* time — not on resume,
where the existing ``repo_id`` (already stamped) must be preserved.
"""
if self.repo_id:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
self.repo_id = f"{self.repo_id}_{timestamp}"
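A minimal usage sketch of this new config (values are illustrative; the `lerobot.configs` re-export comes from the `__init__` change above):

from lerobot.configs import DatasetRecordConfig

cfg = DatasetRecordConfig(
    repo_id="lerobot/test",
    single_task="Pick the Lego block and drop it in the box on the right.",
    fps=30,
    num_episodes=10,
)
cfg.stamp_repo_id()
print(cfg.repo_id)  # e.g. "lerobot/test_20260428_184048"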
-7
@@ -209,10 +209,3 @@ class TrainPipelineConfig(HubMixin):
cli_args = kwargs.pop("cli_args", [])
with draccus.config_type("json"):
return draccus.parse(cls, config_file, args=cli_args)
@dataclass(kw_only=True)
class TrainRLServerPipelineConfig(TrainPipelineConfig):
# NOTE: In RL, we don't need an offline dataset
# TODO: Make `TrainPipelineConfig.dataset` optional
dataset: DatasetConfig | None = None # type: ignore[assignment] # because the parent class has made it's type non-optional
+4
@@ -630,6 +630,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_encoding: bool = False,
encoder_queue_maxsize: int = 30,
encoder_threads: int | None = None,
video_files_size_in_mb: int | None = None,
data_files_size_in_mb: int | None = None,
) -> "LeRobotDataset":
"""Create a new LeRobotDataset from scratch for recording data.
@@ -677,6 +679,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
root=root,
use_videos=use_videos,
metadata_buffer_size=metadata_buffer_size,
video_files_size_in_mb=video_files_size_in_mb,
data_files_size_in_mb=data_files_size_in_mb,
)
obj.repo_id = obj.meta.repo_id
obj._requested_root = obj.meta.root
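A hedged sketch of how the new size caps might be passed when creating a dataset (the other arguments shown are the usual ones and are illustrative):

dataset = LeRobotDataset.create(
    repo_id="lerobot/test",
    fps=30,
    features=features,            # feature schema built from the robot, as usual
    video_files_size_in_mb=500,   # new: cap the size of concatenated video files
    data_files_size_in_mb=100,    # new: cap the size of parquet data files
)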
+1
@@ -299,6 +299,7 @@ class HILSerlProcessorConfig:
inverse_kinematics: InverseKinematicsConfig | None = None
reward_classifier: RewardClassifierConfig | None = None
max_gripper_pos: float | None = 100.0
gripper_speed_factor: float | None = None
@EnvConfig.register_subclass(name="gym_manipulator")
+9 -6
@@ -12,18 +12,21 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from lerobot.utils.action_interpolator import ActionInterpolator as ActionInterpolator
from .act.configuration_act import ACTConfig as ACTConfig
from .diffusion.configuration_diffusion import DiffusionConfig as DiffusionConfig
from .factory import get_policy_class, make_policy, make_policy_config, make_pre_post_processors
from .gaussian_actor.configuration_gaussian_actor import GaussianActorConfig as GaussianActorConfig
from .gaussian_actor.reward_model.configuration_classifier import (
RewardClassifierConfig as RewardClassifierConfig,
)
from .groot.configuration_groot import GrootConfig as GrootConfig
from .multi_task_dit.configuration_multi_task_dit import MultiTaskDiTConfig as MultiTaskDiTConfig
from .pi0.configuration_pi0 import PI0Config as PI0Config
from .pi0_fast.configuration_pi0_fast import PI0FastConfig as PI0FastConfig
from .pi05.configuration_pi05 import PI05Config as PI05Config
from .pretrained import PreTrainedPolicy as PreTrainedPolicy
from .rtc import ActionInterpolator as ActionInterpolator
from .sac.configuration_sac import SACConfig as SACConfig
from .sac.reward_model.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
from .sarm.configuration_sarm import SARMConfig as SARMConfig
from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
from .tdmpc.configuration_tdmpc import TDMPCConfig as TDMPCConfig
@@ -32,21 +35,21 @@ from .vqbet.configuration_vqbet import VQBeTConfig as VQBeTConfig
from .wall_x.configuration_wall_x import WallXConfig as WallXConfig
from .xvla.configuration_xvla import XVLAConfig as XVLAConfig
# NOTE: Policy modeling classes (e.g., SACPolicy) are intentionally NOT re-exported here.
# NOTE: Policy modeling classes (e.g., GaussianActorPolicy) are intentionally NOT re-exported here.
# They have heavy optional dependencies and are loaded lazily via get_policy_class().
# Import directly: ``from lerobot.policies.sac.modeling_sac import SACPolicy``
# Import directly: ``from lerobot.policies.gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy``
__all__ = [
# Configuration classes
"ACTConfig",
"DiffusionConfig",
"GaussianActorConfig",
"GrootConfig",
"MultiTaskDiTConfig",
"PI0Config",
"PI0FastConfig",
"PI05Config",
"RewardClassifierConfig",
"SACConfig",
"SARMConfig",
"SmolVLAConfig",
"TDMPCConfig",
+14 -14
@@ -46,13 +46,13 @@ from lerobot.utils.feature_utils import dataset_to_policy_features
from .act.configuration_act import ACTConfig
from .diffusion.configuration_diffusion import DiffusionConfig
from .gaussian_actor.configuration_gaussian_actor import GaussianActorConfig
from .gaussian_actor.reward_model.configuration_classifier import RewardClassifierConfig
from .groot.configuration_groot import GrootConfig
from .multi_task_dit.configuration_multi_task_dit import MultiTaskDiTConfig
from .pi0.configuration_pi0 import PI0Config
from .pi05.configuration_pi05 import PI05Config
from .pretrained import PreTrainedPolicy
from .sac.configuration_sac import SACConfig
from .sac.reward_model.configuration_classifier import RewardClassifierConfig
from .sarm.configuration_sarm import SARMConfig
from .smolvla.configuration_smolvla import SmolVLAConfig
from .tdmpc.configuration_tdmpc import TDMPCConfig
@@ -89,7 +89,7 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
Args:
name: The name of the policy. Supported names are "tdmpc", "diffusion", "act",
"multi_task_dit", "vqbet", "pi0", "pi05", "sac", "reward_classifier", "smolvla", "wall_x".
"multi_task_dit", "vqbet", "pi0", "pi05", "gaussian_actor", "reward_classifier", "smolvla", "wall_x".
Returns:
The policy class corresponding to the given name.
@@ -128,12 +128,12 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
from .pi05.modeling_pi05 import PI05Policy
return PI05Policy
elif name == "sac":
from .sac.modeling_sac import SACPolicy
elif name == "gaussian_actor":
from .gaussian_actor.modeling_gaussian_actor import GaussianActorPolicy
return SACPolicy
return GaussianActorPolicy
elif name == "reward_classifier":
from .sac.reward_model.modeling_classifier import Classifier
from .gaussian_actor.reward_model.modeling_classifier import Classifier
return Classifier
elif name == "smolvla":
@@ -172,7 +172,7 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
Args:
policy_type: The type of the policy. Supported types include "tdmpc",
"multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "sac",
"multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "gaussian_actor",
"smolvla", "reward_classifier", "wall_x".
**kwargs: Keyword arguments to be passed to the configuration class constructor.
@@ -196,8 +196,8 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
return PI0Config(**kwargs)
elif policy_type == "pi05":
return PI05Config(**kwargs)
elif policy_type == "sac":
return SACConfig(**kwargs)
elif policy_type == "gaussian_actor":
return GaussianActorConfig(**kwargs)
elif policy_type == "smolvla":
return SmolVLAConfig(**kwargs)
elif policy_type == "reward_classifier":
@@ -370,16 +370,16 @@ def make_pre_post_processors(
dataset_stats=kwargs.get("dataset_stats"),
)
elif isinstance(policy_cfg, SACConfig):
from .sac.processor_sac import make_sac_pre_post_processors
elif isinstance(policy_cfg, GaussianActorConfig):
from .gaussian_actor.processor_gaussian_actor import make_gaussian_actor_pre_post_processors
processors = make_sac_pre_post_processors(
processors = make_gaussian_actor_pre_post_processors(
config=policy_cfg,
dataset_stats=kwargs.get("dataset_stats"),
)
elif isinstance(policy_cfg, RewardClassifierConfig):
from .sac.reward_model.processor_classifier import make_classifier_processor
from .gaussian_actor.reward_model.processor_classifier import make_classifier_processor
processors = make_classifier_processor(
config=policy_cfg,
@@ -12,8 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .configuration_sac import SACConfig
from .modeling_sac import SACPolicy
from .processor_sac import make_sac_pre_post_processors
from .configuration_gaussian_actor import GaussianActorConfig
from .modeling_gaussian_actor import GaussianActorPolicy
from .processor_gaussian_actor import make_gaussian_actor_pre_post_processors
__all__ = ["SACConfig", "SACPolicy", "make_sac_pre_post_processors"]
__all__ = ["GaussianActorConfig", "GaussianActorPolicy", "make_gaussian_actor_pre_post_processors"]
@@ -75,18 +75,19 @@ class PolicyConfig:
init_final: float = 0.05
@PreTrainedConfig.register_subclass("sac")
@PreTrainedConfig.register_subclass("gaussian_actor")
@dataclass
class SACConfig(PreTrainedConfig):
"""Soft Actor-Critic (SAC) configuration.
class GaussianActorConfig(PreTrainedConfig):
"""Gaussian actor configuration.
SAC is an off-policy actor-critic deep RL algorithm based on the maximum entropy
reinforcement learning framework. It learns a policy and a Q-function simultaneously
using experience collected from the environment.
This configures the policy-side (actor + observation encoder) of a Gaussian
policy, as used by SAC and related maximum-entropy continuous-control algorithms.
By default the actor output is a tanh-squashed diagonal Gaussian
(``TanhMultivariateNormalDiag``); the tanh squashing can be disabled via
``policy_kwargs.use_tanh_squash``. The critics, temperature, and Bellman-update
logic live on the algorithm side (see ``lerobot.rl.algorithms.sac``).
This configuration class contains all the parameters needed to define a SAC agent,
including network architectures, optimization settings, and algorithm-specific
hyperparameters.
CLI: ``--policy.type=gaussian_actor``.
"""
# Mapping of feature types to normalization modes
@@ -122,7 +123,7 @@ class SACConfig(PreTrainedConfig):
device: str = "cpu"
# Device to store the model on
storage_device: str = "cpu"
# Name of the vision encoder model (Set to "helper2424/resnet10" for hil serl resnet10)
# Name of the vision encoder model (Set to "lerobot/resnet10" for hil serl resnet10)
vision_encoder_name: str | None = None
# Whether to freeze the vision encoder during training
freeze_vision_encoder: bool = True
@@ -135,78 +136,41 @@ class SACConfig(PreTrainedConfig):
# Dimension of the image embedding pooling
image_embedding_pooling_dim: int = 8
# Training parameter
# Number of steps for online training
online_steps: int = 1000000
# Capacity of the online replay buffer
online_buffer_capacity: int = 100000
# Capacity of the offline replay buffer
offline_buffer_capacity: int = 100000
# Whether to use asynchronous prefetching for the buffers
async_prefetch: bool = False
# Number of steps before learning starts
online_step_before_learning: int = 100
# Frequency of policy updates
policy_update_freq: int = 1
# SAC algorithm parameters
# Discount factor for the SAC algorithm
discount: float = 0.99
# Initial temperature value
temperature_init: float = 1.0
# Number of critics in the ensemble
num_critics: int = 2
# Number of subsampled critics for training
num_subsample_critics: int | None = None
# Learning rate for the critic network
critic_lr: float = 3e-4
# Learning rate for the actor network
actor_lr: float = 3e-4
# Learning rate for the temperature parameter
temperature_lr: float = 3e-4
# Weight for the critic target update
critic_target_update_weight: float = 0.005
# Update-to-data ratio for the UTD algorithm (If you want enable utd_ratio, you need to set it to >1)
utd_ratio: int = 1
# Encoder architecture
# Hidden dimension size for the state encoder
state_encoder_hidden_dim: int = 256
# Dimension of the latent space
latent_dim: int = 256
# Target entropy for the SAC algorithm
target_entropy: float | None = None
# Whether to use backup entropy for the SAC algorithm
use_backup_entropy: bool = True
# Gradient clipping norm for the SAC algorithm
grad_clip_norm: float = 40.0
# Network configuration
# Configuration for the critic network architecture
critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Configuration for the actor network architecture
actor_network_kwargs: ActorNetworkConfig = field(default_factory=ActorNetworkConfig)
# Configuration for the policy parameters
policy_kwargs: PolicyConfig = field(default_factory=PolicyConfig)
# Configuration for the discrete critic network
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Configuration for actor-learner architecture
# Online training (TODO(Khalil): relocate to TrainRLServerPipelineConfig)
online_steps: int = 1000000
online_buffer_capacity: int = 100000
offline_buffer_capacity: int = 100000
async_prefetch: bool = False
online_step_before_learning: int = 100
# Actor-learner transport (TODO(Khalil): relocate to TrainRLServerPipelineConfig).
actor_learner_config: ActorLearnerConfig = field(default_factory=ActorLearnerConfig)
# Configuration for concurrency settings (you can use threads or processes for the actor and learner)
concurrency: ConcurrencyConfig = field(default_factory=ConcurrencyConfig)
# Optimizations
use_torch_compile: bool = True
# Network architecture
# Actor network
actor_network_kwargs: ActorNetworkConfig = field(default_factory=ActorNetworkConfig)
# Gaussian head parameters
policy_kwargs: PolicyConfig = field(default_factory=PolicyConfig)
# Discrete critic
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
def __post_init__(self):
super().__post_init__()
# Any validation specific to SAC configuration
def get_optimizer_preset(self) -> MultiAdamConfig:
return MultiAdamConfig(
weight_decay=0.0,
optimizer_groups={
"actor": {"lr": self.actor_lr},
"critic": {"lr": self.critic_lr},
"temperature": {"lr": self.temperature_lr},
"actor": {"lr": 3e-4},
"critic": {"lr": 3e-4},
"temperature": {"lr": 3e-4},
},
)
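Since the config is now registered as ``gaussian_actor``, a hedged sketch of creating it through the policy factory (extra kwargs are illustrative):

from lerobot.policies import make_policy_config

cfg = make_policy_config("gaussian_actor", device="cpu")
print(type(cfg).__name__)  # GaussianActorConfig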
@@ -15,16 +15,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from collections.abc import Callable
from dataclasses import asdict
from typing import Literal
from typing import Any
import einops
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F # noqa: N812
from torch import Tensor
from torch.distributions import MultivariateNormal, TanhTransform, Transform, TransformedDistribution
@@ -32,20 +28,20 @@ from lerobot.utils.constants import ACTION, OBS_ENV_STATE, OBS_STATE
from ..pretrained import PreTrainedPolicy
from ..utils import get_device_from_parameters
from .configuration_sac import SACConfig, is_image_feature
from .configuration_gaussian_actor import GaussianActorConfig, is_image_feature
DISCRETE_DIMENSION_INDEX = -1 # Gripper is always the last dimension
class SACPolicy(
class GaussianActorPolicy(
PreTrainedPolicy,
):
config_class = SACConfig
name = "sac"
config_class = GaussianActorConfig
name = "gaussian_actor"
def __init__(
self,
config: SACConfig | None = None,
config: GaussianActorConfig | None = None,
):
super().__init__(config)
config.validate_features()
@@ -54,9 +50,8 @@ class SACPolicy(
# Determine action dimension and initialize all components
continuous_action_dim = config.output_features[ACTION].shape[0]
self._init_encoders()
self._init_critics(continuous_action_dim)
self._init_actor(continuous_action_dim)
self._init_temperature()
self._init_discrete_critic()
def get_optim_params(self) -> dict:
optim_params = {
@@ -65,11 +60,7 @@ class SACPolicy(
for n, p in self.actor.named_parameters()
if not n.startswith("encoder") or not self.shared_encoder
],
"critic": self.critic_ensemble.parameters(),
"temperature": self.log_alpha,
}
if self.config.num_discrete_actions is not None:
optim_params["discrete_critic"] = self.discrete_critic.parameters()
return optim_params
def reset(self):
@@ -79,7 +70,9 @@ class SACPolicy(
@torch.no_grad()
def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
"""Predict a chunk of actions given environment observations."""
raise NotImplementedError("SACPolicy does not support action chunking. It returns single actions!")
raise NotImplementedError(
"GaussianActorPolicy does not support action chunking. It returns single actions!"
)
@torch.no_grad()
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
@@ -92,360 +85,55 @@ class SACPolicy(
actions, _, _ = self.actor(batch, observations_features)
if self.config.num_discrete_actions is not None:
discrete_action_value = self.discrete_critic(batch, observations_features)
discrete_action = torch.argmax(discrete_action_value, dim=-1, keepdim=True)
if self.discrete_critic is not None:
discrete_action_value = self.discrete_critic(batch, observations_features)
discrete_action = torch.argmax(discrete_action_value, dim=-1, keepdim=True)
else:
discrete_action = torch.ones(
(*actions.shape[:-1], 1), device=actions.device, dtype=actions.dtype
)
actions = torch.cat([actions, discrete_action], dim=-1)
return actions
def critic_forward(
self,
observations: dict[str, Tensor],
actions: Tensor,
use_target: bool = False,
observation_features: Tensor | None = None,
) -> Tensor:
"""Forward pass through a critic network ensemble
def forward(self, batch: dict[str, Tensor | dict[str, Tensor]]) -> dict[str, Tensor]:
"""Actor forward pass: sample actions and return log-probabilities.
Args:
observations: Dictionary of observations
actions: Action tensor
use_target: If True, use target critics, otherwise use ensemble critics
batch: A flat observation dict, or a training dict containing
``"state"`` (observations) and optionally ``"observation_feature"``
(pre-computed encoder features).
Returns:
Tensor of Q-values from all critics
Dict with ``"action"``, ``"log_prob"``, and ``"action_mean"`` tensors.
"""
observations = batch.get("state", batch)
observation_features = batch.get("observation_feature") if isinstance(batch, dict) else None
actions, log_probs, means = self.actor(observations, observation_features)
return {"action": actions, "log_prob": log_probs, "action_mean": means}
critics = self.critic_target if use_target else self.critic_ensemble
q_values = critics(observations, actions, observation_features)
return q_values
def load_actor_weights(self, state_dicts: dict[str, Any], device: str | torch.device = "cpu") -> None:
from lerobot.utils.transition import move_state_dict_to_device
def discrete_critic_forward(
self, observations, use_target=False, observation_features=None
) -> torch.Tensor:
"""Forward pass through a discrete critic network
actor_state_dict = move_state_dict_to_device(state_dicts["policy"], device=device)
self.actor.load_state_dict(actor_state_dict)
Args:
observations: Dictionary of observations
use_target: If True, use target critics, otherwise use ensemble critics
observation_features: Optional pre-computed observation features to avoid recomputing encoder output
Returns:
Tensor of Q-values from the discrete critic network
"""
discrete_critic = self.discrete_critic_target if use_target else self.discrete_critic
q_values = discrete_critic(observations, observation_features)
return q_values
def forward(
self,
batch: dict[str, Tensor | dict[str, Tensor]],
model: Literal["actor", "critic", "temperature", "discrete_critic"] = "critic",
) -> dict[str, Tensor]:
"""Compute the loss for the given model
Args:
batch: Dictionary containing:
- action: Action tensor
- reward: Reward tensor
- state: Observations tensor dict
- next_state: Next observations tensor dict
- done: Done mask tensor
- observation_feature: Optional pre-computed observation features
- next_observation_feature: Optional pre-computed next observation features
model: Which model to compute the loss for ("actor", "critic", "discrete_critic", or "temperature")
Returns:
The computed loss tensor
"""
# Extract common components from batch
actions: Tensor = batch[ACTION]
observations: dict[str, Tensor] = batch["state"]
observation_features: Tensor = batch.get("observation_feature")
if model == "critic":
# Extract critic-specific components
rewards: Tensor = batch["reward"]
next_observations: dict[str, Tensor] = batch["next_state"]
done: Tensor = batch["done"]
next_observation_features: Tensor = batch.get("next_observation_feature")
loss_critic = self.compute_loss_critic(
observations=observations,
actions=actions,
rewards=rewards,
next_observations=next_observations,
done=done,
observation_features=observation_features,
next_observation_features=next_observation_features,
if "discrete_critic" in state_dicts and self.discrete_critic is not None:
discrete_critic_state_dict = move_state_dict_to_device(
state_dicts["discrete_critic"], device=device
)
return {"loss_critic": loss_critic}
if model == "discrete_critic" and self.config.num_discrete_actions is not None:
# Extract critic-specific components
rewards: Tensor = batch["reward"]
next_observations: dict[str, Tensor] = batch["next_state"]
done: Tensor = batch["done"]
next_observation_features: Tensor = batch.get("next_observation_feature")
complementary_info = batch.get("complementary_info")
loss_discrete_critic = self.compute_loss_discrete_critic(
observations=observations,
actions=actions,
rewards=rewards,
next_observations=next_observations,
done=done,
observation_features=observation_features,
next_observation_features=next_observation_features,
complementary_info=complementary_info,
)
return {"loss_discrete_critic": loss_discrete_critic}
if model == "actor":
return {
"loss_actor": self.compute_loss_actor(
observations=observations,
observation_features=observation_features,
)
}
if model == "temperature":
return {
"loss_temperature": self.compute_loss_temperature(
observations=observations,
observation_features=observation_features,
)
}
raise ValueError(f"Unknown model type: {model}")
def update_target_networks(self):
"""Update target networks with exponential moving average"""
for target_param, param in zip(
self.critic_target.parameters(),
self.critic_ensemble.parameters(),
strict=True,
):
target_param.data.copy_(
param.data * self.config.critic_target_update_weight
+ target_param.data * (1.0 - self.config.critic_target_update_weight)
)
if self.config.num_discrete_actions is not None:
for target_param, param in zip(
self.discrete_critic_target.parameters(),
self.discrete_critic.parameters(),
strict=True,
):
target_param.data.copy_(
param.data * self.config.critic_target_update_weight
+ target_param.data * (1.0 - self.config.critic_target_update_weight)
)
@property
def temperature(self) -> float:
"""Return the current temperature value, always in sync with log_alpha."""
return self.log_alpha.exp().item()
def compute_loss_critic(
self,
observations,
actions,
rewards,
next_observations,
done,
observation_features: Tensor | None = None,
next_observation_features: Tensor | None = None,
) -> Tensor:
with torch.no_grad():
next_action_preds, next_log_probs, _ = self.actor(next_observations, next_observation_features)
# 2- compute q targets
q_targets = self.critic_forward(
observations=next_observations,
actions=next_action_preds,
use_target=True,
observation_features=next_observation_features,
)
# subsample critics to prevent overfitting if use high UTD (update to date)
# TODO: Get indices before forward pass to avoid unnecessary computation
if self.config.num_subsample_critics is not None:
indices = torch.randperm(self.config.num_critics)
indices = indices[: self.config.num_subsample_critics]
q_targets = q_targets[indices]
# critics subsample size
min_q, _ = q_targets.min(dim=0) # Get values from min operation
if self.config.use_backup_entropy:
min_q = min_q - (self.temperature * next_log_probs)
td_target = rewards + (1 - done) * self.config.discount * min_q
# 3- compute predicted qs
if self.config.num_discrete_actions is not None:
# NOTE: We only want to keep the continuous action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions: Tensor = actions[:, :DISCRETE_DIMENSION_INDEX]
q_preds = self.critic_forward(
observations=observations,
actions=actions,
use_target=False,
observation_features=observation_features,
)
# 4- Calculate loss
# Compute state-action value loss (TD loss) for all of the Q functions in the ensemble.
td_target_duplicate = einops.repeat(td_target, "b -> e b", e=q_preds.shape[0])
# You compute the mean loss of the batch for each critic and then to compute the final loss you sum them up
critics_loss = (
F.mse_loss(
input=q_preds,
target=td_target_duplicate,
reduction="none",
).mean(dim=1)
).sum()
return critics_loss
def compute_loss_discrete_critic(
self,
observations,
actions,
rewards,
next_observations,
done,
observation_features=None,
next_observation_features=None,
complementary_info=None,
):
# NOTE: We only want to keep the discrete action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions_discrete: Tensor = actions[:, DISCRETE_DIMENSION_INDEX:].clone()
actions_discrete = torch.round(actions_discrete)
actions_discrete = actions_discrete.long()
discrete_penalties: Tensor | None = None
if complementary_info is not None:
discrete_penalties: Tensor | None = complementary_info.get("discrete_penalty")
with torch.no_grad():
# For DQN, select actions using online network, evaluate with target network
next_discrete_qs = self.discrete_critic_forward(
next_observations, use_target=False, observation_features=next_observation_features
)
best_next_discrete_action = torch.argmax(next_discrete_qs, dim=-1, keepdim=True)
# Get target Q-values from target network
target_next_discrete_qs = self.discrete_critic_forward(
observations=next_observations,
use_target=True,
observation_features=next_observation_features,
)
# Use gather to select Q-values for best actions
target_next_discrete_q = torch.gather(
target_next_discrete_qs, dim=1, index=best_next_discrete_action
).squeeze(-1)
# Compute target Q-value with Bellman equation
rewards_discrete = rewards
if discrete_penalties is not None:
rewards_discrete = rewards + discrete_penalties
target_discrete_q = rewards_discrete + (1 - done) * self.config.discount * target_next_discrete_q
# Get predicted Q-values for current observations
predicted_discrete_qs = self.discrete_critic_forward(
observations=observations, use_target=False, observation_features=observation_features
)
# Use gather to select Q-values for taken actions
predicted_discrete_q = torch.gather(predicted_discrete_qs, dim=1, index=actions_discrete).squeeze(-1)
# Compute MSE loss between predicted and target Q-values
discrete_critic_loss = F.mse_loss(input=predicted_discrete_q, target=target_discrete_q)
return discrete_critic_loss
def compute_loss_temperature(self, observations, observation_features: Tensor | None = None) -> Tensor:
"""Compute the temperature loss"""
# calculate temperature loss
with torch.no_grad():
_, log_probs, _ = self.actor(observations, observation_features)
temperature_loss = (-self.log_alpha.exp() * (log_probs + self.target_entropy)).mean()
return temperature_loss
def compute_loss_actor(
self,
observations,
observation_features: Tensor | None = None,
) -> Tensor:
actions_pi, log_probs, _ = self.actor(observations, observation_features)
q_preds = self.critic_forward(
observations=observations,
actions=actions_pi,
use_target=False,
observation_features=observation_features,
)
min_q_preds = q_preds.min(dim=0)[0]
actor_loss = ((self.temperature * log_probs) - min_q_preds).mean()
return actor_loss
self.discrete_critic.load_state_dict(discrete_critic_state_dict)
def _init_encoders(self):
"""Initialize shared or separate encoders for actor and critic."""
self.shared_encoder = self.config.shared_encoder
self.encoder_critic = SACObservationEncoder(self.config)
self.encoder_critic = GaussianActorObservationEncoder(self.config)
self.encoder_actor = (
self.encoder_critic if self.shared_encoder else SACObservationEncoder(self.config)
self.encoder_critic if self.shared_encoder else GaussianActorObservationEncoder(self.config)
)
def _init_critics(self, continuous_action_dim):
"""Build critic ensemble, targets, and optional discrete critic."""
heads = [
CriticHead(
input_dim=self.encoder_critic.output_dim + continuous_action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_ensemble = CriticEnsemble(encoder=self.encoder_critic, ensemble=heads)
target_heads = [
CriticHead(
input_dim=self.encoder_critic.output_dim + continuous_action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_target = CriticEnsemble(encoder=self.encoder_critic, ensemble=target_heads)
self.critic_target.load_state_dict(self.critic_ensemble.state_dict())
if self.config.use_torch_compile:
self.critic_ensemble = torch.compile(self.critic_ensemble)
self.critic_target = torch.compile(self.critic_target)
if self.config.num_discrete_actions is not None:
self._init_discrete_critics()
def _init_discrete_critics(self):
"""Build discrete discrete critic ensemble and target networks."""
self.discrete_critic = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
self.discrete_critic_target = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
# TODO: (maractingi, azouitine) Compile the discrete critic
self.discrete_critic_target.load_state_dict(self.discrete_critic.state_dict())
def _init_actor(self, continuous_action_dim):
"""Initialize policy actor network and default target entropy."""
"""Initialize policy actor network."""
# NOTE: The actor select only the continuous action part
self.actor = Policy(
encoder=self.encoder_actor,
@@ -455,21 +143,25 @@ class SACPolicy(
**asdict(self.config.policy_kwargs),
)
self.target_entropy = self.config.target_entropy
if self.target_entropy is None:
dim = continuous_action_dim + (1 if self.config.num_discrete_actions is not None else 0)
self.target_entropy = -np.prod(dim) / 2
def _init_discrete_critic(self) -> None:
"""Initialize discrete critic network."""
if self.config.num_discrete_actions is None:
self.discrete_critic = None
return
def _init_temperature(self) -> None:
"""Set up temperature parameter (log_alpha)."""
temp_init = self.config.temperature_init
self.log_alpha = nn.Parameter(torch.tensor([math.log(temp_init)]))
# TODO(Khalil): Compile the discrete critic
self.discrete_critic = DiscreteCritic(
encoder=self.encoder_critic,
input_dim=self.encoder_critic.output_dim,
output_dim=self.config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
class SACObservationEncoder(nn.Module):
class GaussianActorObservationEncoder(nn.Module):
"""Encode image and/or state vector observations."""
def __init__(self, config: SACConfig) -> None:
def __init__(self, config: GaussianActorConfig) -> None:
super().__init__()
self.config = config
self._init_image_layers()
@@ -677,84 +369,6 @@ class MLP(nn.Module):
return self.net(x)
class CriticHead(nn.Module):
def __init__(
self,
input_dim: int,
hidden_dims: list[int],
activations: Callable[[torch.Tensor], torch.Tensor] | str = nn.SiLU(),
activate_final: bool = False,
dropout_rate: float | None = None,
init_final: float | None = None,
final_activation: Callable[[torch.Tensor], torch.Tensor] | str | None = None,
):
super().__init__()
self.net = MLP(
input_dim=input_dim,
hidden_dims=hidden_dims,
activations=activations,
activate_final=activate_final,
dropout_rate=dropout_rate,
final_activation=final_activation,
)
self.output_layer = nn.Linear(in_features=hidden_dims[-1], out_features=1)
if init_final is not None:
nn.init.uniform_(self.output_layer.weight, -init_final, init_final)
nn.init.uniform_(self.output_layer.bias, -init_final, init_final)
else:
orthogonal_init()(self.output_layer.weight)
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self.output_layer(self.net(x))
class CriticEnsemble(nn.Module):
"""
CriticEnsemble wraps multiple CriticHead modules into an ensemble.
Args:
encoder (SACObservationEncoder): encoder for observations.
ensemble (List[CriticHead]): list of critic heads.
init_final (float | None): optional initializer scale for final layers.
Forward returns a tensor of shape (num_critics, batch_size) containing Q-values.
"""
def __init__(
self,
encoder: SACObservationEncoder,
ensemble: list[CriticHead],
init_final: float | None = None,
):
super().__init__()
self.encoder = encoder
self.init_final = init_final
self.critics = nn.ModuleList(ensemble)
def forward(
self,
observations: dict[str, torch.Tensor],
actions: torch.Tensor,
observation_features: torch.Tensor | None = None,
) -> torch.Tensor:
device = get_device_from_parameters(self)
# Move each tensor in observations to device
observations = {k: v.to(device) for k, v in observations.items()}
obs_enc = self.encoder(observations, cache=observation_features)
inputs = torch.cat([obs_enc, actions], dim=-1)
# Loop through critics and collect outputs
q_values = []
for critic in self.critics:
q_values.append(critic(inputs))
# Stack outputs to match expected shape [num_critics, batch_size]
q_values = torch.stack([q.squeeze(-1) for q in q_values], dim=0)
return q_values
class DiscreteCritic(nn.Module):
def __init__(
self,
@@ -800,7 +414,7 @@ class DiscreteCritic(nn.Module):
class Policy(nn.Module):
def __init__(
self,
encoder: SACObservationEncoder,
encoder: GaussianActorObservationEncoder,
network: nn.Module,
action_dim: int,
std_min: float = -5,
@@ -811,7 +425,7 @@ class Policy(nn.Module):
encoder_is_shared: bool = False,
):
super().__init__()
self.encoder: SACObservationEncoder = encoder
self.encoder: GaussianActorObservationEncoder = encoder
self.network = network
self.action_dim = action_dim
self.std_min = std_min
@@ -885,7 +499,7 @@ class Policy(nn.Module):
class DefaultImageEncoder(nn.Module):
def __init__(self, config: SACConfig):
def __init__(self, config: GaussianActorConfig):
super().__init__()
image_key = next(key for key in config.input_features if is_image_feature(key))
self.image_enc_layers = nn.Sequential(
@@ -931,12 +545,12 @@ def freeze_image_encoder(image_encoder: nn.Module):
class PretrainedImageEncoder(nn.Module):
def __init__(self, config: SACConfig):
def __init__(self, config: GaussianActorConfig):
super().__init__()
self.image_enc_layers, self.image_enc_out_shape = self._load_pretrained_vision_encoder(config)
def _load_pretrained_vision_encoder(self, config: SACConfig):
def _load_pretrained_vision_encoder(self, config: GaussianActorConfig):
"""Set up CNN encoder"""
from transformers import AutoModel
@@ -32,18 +32,18 @@ from lerobot.processor import (
)
from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME
from .configuration_sac import SACConfig
from .configuration_gaussian_actor import GaussianActorConfig
def make_sac_pre_post_processors(
config: SACConfig,
def make_gaussian_actor_pre_post_processors(
config: GaussianActorConfig,
dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None,
) -> tuple[
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
PolicyProcessorPipeline[PolicyAction, PolicyAction],
]:
"""
Constructs pre-processor and post-processor pipelines for the SAC policy.
Constructs pre-processor and post-processor pipelines for the Gaussian actor policy.
The pre-processing pipeline prepares input data for the model by:
1. Renaming features to match pretrained configurations.
@@ -56,7 +56,7 @@ def make_sac_pre_post_processors(
2. Unnormalizing the output features to their original scale.
Args:
config: The configuration object for the SAC policy.
config: The configuration object for the tanh-Gaussian policy.
dataset_stats: A dictionary of statistics for normalization.
Returns:
@@ -31,7 +31,7 @@ class RewardClassifierConfig(PreTrainedConfig):
latent_dim: int = 256
image_embedding_pooling_dim: int = 8
dropout_rate: float = 0.1
model_name: str = "helper2424/resnet10" # TODO: This needs to be updated. The model on the Hub doesn't call self.post_init() in its __init__, which is required by transformers v5 to set all_tied_weights_keys. The from_pretrained call fails when it tries to access this attribute during _finalize_model_loading.
model_name: str = "lerobot/resnet10"
device: str = "cpu"
model_type: str = "cnn" # "transformer" or "cnn"
num_cameras: int = 2
@@ -108,6 +108,7 @@ class Classifier(PreTrainedPolicy):
def __init__(
self,
config: RewardClassifierConfig,
**kwargs,
):
from transformers import AutoModel
@@ -269,10 +270,6 @@ class Classifier(PreTrainedPolicy):
def predict_reward(self, batch, threshold=0.5):
"""Eval method. Returns predicted reward with the decision threshold as argument."""
# Check for both OBS_IMAGE and OBS_IMAGES prefixes
batch = self.normalize_inputs(batch)
batch = self.normalize_targets(batch)
# Extract images from batch dict
images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]
@@ -227,6 +227,7 @@ class PI0FastPaliGemma(nn.Module):
# forward(..., adarms_cond=...) is supported (same as pi0/pi05).
if use_adarms[0]:
text_config = self.paligemma.config.text_config
del self.paligemma.model.language_model
self.paligemma.model.language_model = PiGemmaModel(text_config)
self.to_bfloat16_for_selected_params(precision)
+6
@@ -197,6 +197,9 @@ class PiGemmaModel(GemmaModel): # type: ignore[misc]
def __init__(self, config: GemmaConfig, **kwargs):
super().__init__(config, **kwargs)
# Free parent-allocated layers/norm before replacing to avoid ~2x peak memory.
del self.layers
del self.norm
# if not getattr(config, "use_adarms", False):
# return
cond_dim = getattr(config, "adarms_cond_dim", None)
@@ -328,6 +331,7 @@ class PiGemmaForCausalLM(GemmaForCausalLM): # type: ignore[misc]
def __init__(self, config: GemmaConfig, **kwargs):
super().__init__(config, **kwargs)
del self.model
self.model = PiGemmaModel(config)
@@ -336,6 +340,7 @@ class PaliGemmaModelWithPiGemma(PaliGemmaModel):
def __init__(self, config):
super().__init__(config)
del self.language_model
self.language_model = PiGemmaModel(config.text_config)
@@ -344,6 +349,7 @@ class PaliGemmaForConditionalGenerationWithPiGemma(PaliGemmaForConditionalGenera
def __init__(self, config):
super().__init__(config)
del self.model
self.model = PaliGemmaModelWithPiGemma(config)
# Make modules available through conditional class for BC
+2
@@ -19,6 +19,7 @@ from .action_queue import ActionQueue
from .configuration_rtc import RTCConfig
from .latency_tracker import LatencyTracker
from .modeling_rtc import RTCProcessor
from .relative import reanchor_relative_rtc_prefix
__all__ = [
"ActionInterpolator",
@@ -26,4 +27,5 @@ __all__ = [
"LatencyTracker",
"RTCConfig",
"RTCProcessor",
"reanchor_relative_rtc_prefix",
]
+3 -115
@@ -1,116 +1,4 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Moved to lerobot.utils.action_interpolator — re-exported for backwards compatibility.
from lerobot.utils.action_interpolator import ActionInterpolator
"""Action interpolation for smoother robot control.
Provides configurable Nx control rate by interpolating between consecutive actions.
Useful with RTC and action-chunking policies to reduce jerkiness.
"""
from torch import Tensor
class ActionInterpolator:
"""Interpolates between consecutive actions for smoother control.
When enabled with multiplier N, produces N actions per policy action
by linearly interpolating between the previous and current action.
Example with multiplier=3:
prev_action -> [1/3 interpolated, 2/3 interpolated, current_action]
This effectively multiplies the control rate for smoother motion.
Usage:
interpolator = ActionInterpolator(multiplier=2) # 2x control rate
# In control loop:
if interpolator.needs_new_action():
new_action = queue.get()
if new_action:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action:
robot.send_action(action)
"""
def __init__(self, multiplier: int = 1):
"""Initialize the interpolator.
Args:
multiplier: Control rate multiplier (1 = no interpolation, 2 = 2x, 3 = 3x, etc.)
"""
if multiplier < 1:
raise ValueError(f"multiplier must be >= 1, got {multiplier}")
self.multiplier = multiplier
self._prev: Tensor | None = None
self._buffer: list[Tensor] = []
self._idx = 0
@property
def enabled(self) -> bool:
"""Whether interpolation is active (multiplier > 1)."""
return self.multiplier > 1
def reset(self):
"""Reset interpolation state (call between episodes)."""
self._prev = None
self._buffer = []
self._idx = 0
def needs_new_action(self) -> bool:
"""Check if a new action is needed from the queue."""
return self._idx >= len(self._buffer)
def add(self, action: Tensor) -> None:
"""Add a new action and compute interpolated sequence.
Args:
action: New action tensor from policy/queue (already on CPU).
"""
if self.multiplier > 1 and self._prev is not None:
self._buffer = []
for i in range(1, self.multiplier + 1):
t = i / self.multiplier
interp = self._prev + t * (action - self._prev)
self._buffer.append(interp)
else:
# First step: no previous action yet, so run at base FPS without interpolation.
self._buffer = [action.clone()]
self._prev = action.clone()
self._idx = 0
def get(self) -> Tensor | None:
"""Get the next interpolated action.
Returns:
Next action tensor, or None if buffer is exhausted.
"""
if self._idx >= len(self._buffer):
return None
action = self._buffer[self._idx]
self._idx += 1
return action
def get_control_interval(self, fps: float) -> float:
"""Get the control interval based on interpolation multiplier.
Args:
fps: Base frames per second.
Returns:
Control interval in seconds (divided by multiplier).
"""
return 1.0 / (fps * self.multiplier)
__all__ = ["ActionInterpolator"]
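Although the implementation moves to ``lerobot.utils.action_interpolator``, its behaviour is unchanged; a small numeric sketch (values illustrative, assuming the relocated class matches the code removed here):

import torch
from lerobot.utils.action_interpolator import ActionInterpolator

interp = ActionInterpolator(multiplier=2)  # 2x control rate
interp.add(torch.tensor([0.0, 0.0]))
interp.get()                               # first action passes through: tensor([0., 0.])
interp.add(torch.tensor([1.0, 2.0]))
interp.get()                               # midpoint: tensor([0.5, 1.0])
interp.get()                               # endpoint: tensor([1., 2.])
interp.get_control_interval(fps=30)        # 1 / (30 * 2) seconds per interpolated step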
+10 -10
@@ -92,10 +92,10 @@ class ActionQueue:
Returns:
int: Number of unconsumed actions.
"""
if self.queue is None:
return 0
length = len(self.queue)
return length - self.last_index
with self.lock:
if self.queue is None:
return 0
return len(self.queue) - self.last_index
def empty(self) -> bool:
"""Check if the queue is empty.
@@ -103,11 +103,10 @@ class ActionQueue:
Returns:
bool: True if no actions remain, False otherwise.
"""
if self.queue is None:
return True
length = len(self.queue)
return length - self.last_index <= 0
with self.lock:
if self.queue is None:
return True
return len(self.queue) - self.last_index <= 0
def get_action_index(self) -> int:
"""Get the current action consumption index.
@@ -115,7 +114,8 @@ class ActionQueue:
Returns:
int: Index of the next action to be consumed.
"""
return self.last_index
with self.lock:
return self.last_index
def get_left_over(self) -> Tensor | None:
"""Get leftover original actions for RTC prev_chunk_left_over.
@@ -35,7 +35,7 @@ class RTCConfig:
"""
# Infrastructure
enabled: bool = False
enabled: bool = True
# Core RTC settings
# Todo change to exp
+58
@@ -0,0 +1,58 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Relative-action helpers for Real-Time Chunking (RTC)."""
from __future__ import annotations
import torch
from lerobot.processor import (
NormalizerProcessorStep,
RelativeActionsProcessorStep,
TransitionKey,
create_transition,
to_relative_actions,
)
def reanchor_relative_rtc_prefix(
prev_actions_absolute: torch.Tensor,
current_state: torch.Tensor,
relative_step: RelativeActionsProcessorStep,
normalizer_step: NormalizerProcessorStep | None,
policy_device: torch.device | str,
) -> torch.Tensor:
"""Convert absolute leftover actions into model-space for relative-action RTC policies.
When using relative actions, the RTC prefix (previous chunk's unexecuted tail)
is stored in absolute coordinates. Before feeding it back to the policy, this
helper re-expresses those actions relative to the robot's current joint state
and optionally normalizes them so the policy receives correctly scaled inputs.
"""
state = current_state.detach().cpu()
if state.dim() == 1:
state = state.unsqueeze(0)
action_cpu = prev_actions_absolute.detach().cpu()
mask = relative_step._build_mask(action_cpu.shape[-1])
relative_actions = to_relative_actions(action_cpu, state, mask)
transition = create_transition(action=relative_actions)
if normalizer_step is not None:
transition = normalizer_step(transition)
return transition[TransitionKey.ACTION].to(policy_device)
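A hedged numeric sketch of the two stages this helper chains: re-express the leftover absolute actions relative to the current state, then normalize them. The subtraction semantics and the min/max stats below are illustrative assumptions, not the library's exact code.

import torch

prev_abs = torch.tensor([[10.0, 20.0], [11.0, 21.0]])   # unexecuted tail of the last chunk
state = torch.tensor([[9.0, 19.0]])                      # current joint state (batched)

relative = prev_abs - state                              # stage 1: relative to the current state
stats_min, stats_max = torch.tensor([-5.0, -5.0]), torch.tensor([5.0, 5.0])
normalized = (relative - stats_min) / (stats_max - stats_min) * 2 - 1   # stage 2: scale to [-1, 1]
print(normalized)                                        # ready to be moved to the policy device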
+2
@@ -61,6 +61,7 @@ from .hil_processor import (
RewardClassifierProcessorStep,
TimeLimitProcessorStep,
)
from .leader_follower_processor import LeaderFollowerProcessor
from .newline_task_processor import NewLineTaskProcessorStep
from .normalize_processor import NormalizerProcessorStep, UnnormalizerProcessorStep, hotswap_stats
from .observation_processor import VanillaObservationProcessorStep
@@ -122,6 +123,7 @@ __all__ = [
"ImageCropResizeProcessorStep",
"InfoProcessorStep",
"InterventionActionProcessorStep",
"LeaderFollowerProcessor",
"make_default_processors",
"make_default_teleop_action_processor",
"make_default_robot_action_processor",
@@ -38,6 +38,7 @@ class MapTensorToDeltaActionDictStep(ActionProcessorStep):
"""
use_gripper: bool = True
use_rotation: bool = False
def action(self, action: PolicyAction) -> RobotAction:
if not isinstance(action, PolicyAction):
@@ -52,7 +53,13 @@ class MapTensorToDeltaActionDictStep(ActionProcessorStep):
"delta_y": action[1].item(),
"delta_z": action[2].item(),
}
if self.use_gripper:
if self.use_rotation:
delta_action["delta_wx"] = action[3].item()
delta_action["delta_wy"] = action[4].item()
delta_action["delta_wz"] = action[5].item()
if self.use_gripper:
delta_action["gripper"] = action[6].item()
elif self.use_gripper:
delta_action["gripper"] = action[3].item()
return delta_action
@@ -64,6 +71,12 @@ class MapTensorToDeltaActionDictStep(ActionProcessorStep):
type=FeatureType.ACTION, shape=(1,)
)
if self.use_rotation:
for axis in ["wx", "wy", "wz"]:
features[PipelineFeatureType.ACTION][f"delta_{axis}"] = PolicyFeature(
type=FeatureType.ACTION, shape=(1,)
)
if self.use_gripper:
features[PipelineFeatureType.ACTION]["gripper"] = PolicyFeature(
type=FeatureType.ACTION, shape=(1,)
@@ -90,6 +103,8 @@ class MapDeltaActionToRobotActionStep(RobotActionProcessorStep):
# Scale factors for delta movements
position_scale: float = 1.0
noise_threshold: float = 1e-3 # 1 mm threshold to filter out noise
use_rotation: bool = False
rotation_scale: float = 1.0
def action(self, action: RobotAction) -> RobotAction:
# NOTE (maractingi): Action can be a dict from the teleop_devices or a tensor from the policy
@@ -97,23 +112,34 @@ class MapDeltaActionToRobotActionStep(RobotActionProcessorStep):
delta_x = action.pop("delta_x")
delta_y = action.pop("delta_y")
delta_z = action.pop("delta_z")
if self.use_rotation:
delta_wx = action.pop("delta_wx")
delta_wy = action.pop("delta_wy")
delta_wz = action.pop("delta_wz")
else:
delta_wx = 0.0
delta_wy = 0.0
delta_wz = 0.0
gripper = action.pop("gripper")
# Determine if the teleoperator is actively providing input
# Consider enabled if any significant movement delta is detected
position_magnitude = (delta_x**2 + delta_y**2 + delta_z**2) ** 0.5 # Use Euclidean norm for position
enabled = position_magnitude > self.noise_threshold # Small threshold to avoid noise
rotation_magnitude = (
delta_wx**2 + delta_wy**2 + delta_wz**2
) ** 0.5  # TODO: use a proper rotation magnitude metric
enabled = (
position_magnitude > self.noise_threshold or rotation_magnitude > self.noise_threshold
) # Small threshold to avoid noise
# Scale the deltas appropriately
scaled_delta_x = delta_x * self.position_scale
scaled_delta_y = delta_y * self.position_scale
scaled_delta_z = delta_z * self.position_scale
# For gamepad/keyboard, we don't have rotation input, so set to 0
# These could be extended in the future for more sophisticated teleoperators
target_wx = 0.0
target_wy = 0.0
target_wz = 0.0
target_wx = delta_wx * self.rotation_scale
target_wy = delta_wy * self.rotation_scale
target_wz = delta_wz * self.rotation_scale
# Update action with robot target format
action = {
@@ -132,9 +158,15 @@ class MapDeltaActionToRobotActionStep(RobotActionProcessorStep):
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
for axis in ["x", "y", "z", "gripper"]:
for axis in ["x", "y", "z"]:
features[PipelineFeatureType.ACTION].pop(f"delta_{axis}", None)
if self.use_rotation:
for axis in ["wx", "wy", "wz"]:
features[PipelineFeatureType.ACTION].pop(f"delta_{axis}", None)
features[PipelineFeatureType.ACTION].pop("delta_gripper", None)
for feat in ["enabled", "target_x", "target_y", "target_z", "target_wx", "target_wy", "target_wz"]:
features[PipelineFeatureType.ACTION][f"{feat}"] = PolicyFeature(
type=FeatureType.ACTION, shape=(1,)
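Taken together, the two processor hunks above define the following 7-dim action layout when use_rotation and use_gripper are both enabled; a standalone sketch with illustrative values:

import torch

action = torch.tensor([0.01, 0.0, -0.02, 0.1, 0.0, 0.0, 1.0])  # [dx, dy, dz, wx, wy, wz, gripper]

delta_action = {
    "delta_x": action[0].item(),
    "delta_y": action[1].item(),
    "delta_z": action[2].item(),
    "delta_wx": action[3].item(),
    "delta_wy": action[4].item(),
    "delta_wz": action[5].item(),
    "gripper": action[6].item(),
}
# With use_rotation=False the gripper falls back to index 3, matching the elif branch above.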
+10 -1
@@ -461,6 +461,7 @@ class InterventionActionProcessorStep(ProcessorStep):
use_gripper: bool = False
terminate_on_success: bool = True
use_rotation: bool = False
def __call__(self, transition: EnvTransition) -> EnvTransition:
"""
@@ -497,6 +498,14 @@ class InterventionActionProcessorStep(ProcessorStep):
teleop_action.get("delta_y", 0.0),
teleop_action.get("delta_z", 0.0),
]
if self.use_rotation:
action_list.extend(
[
teleop_action.get("delta_wx", 0.0),
teleop_action.get("delta_wy", 0.0),
teleop_action.get("delta_wz", 0.0),
]
)
if self.use_gripper:
action_list.append(teleop_action.get(GRIPPER_KEY, 1.0))
elif isinstance(teleop_action, np.ndarray):
@@ -574,7 +583,7 @@ class RewardClassifierProcessorStep(ProcessorStep):
def __post_init__(self):
"""Initializes the reward classifier model after the dataclass is created."""
if self.pretrained_path is not None:
from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
from lerobot.policies.gaussian_actor.reward_model.modeling_classifier import Classifier
self.reward_classifier = Classifier.from_pretrained(self.pretrained_path)
self.reward_classifier.to(self.device)
@@ -0,0 +1,243 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
import numpy as np
import torch
from lerobot.configs.types import PipelineFeatureType, PolicyFeature
from lerobot.model.kinematics import RobotKinematics
from lerobot.processor.pipeline import EnvTransition, ProcessorStepRegistry, TransitionKey
from lerobot.robots import Robot
from lerobot.teleoperators import Teleoperator
from lerobot.teleoperators.utils import TeleopEvents
from lerobot.utils.rotation import Rotation
from .pipeline import ProcessorStep
@ProcessorStepRegistry.register("leader_follower_processor")
@dataclass
class LeaderFollowerProcessor(ProcessorStep):
"""
Processor for leader-follower teleoperation mode.
This processor:
1. Sends follower positions to leader arm when not intervening
2. Computes EE delta actions from leader when intervening
3. Handles teleop events from the leader device
"""
leader_device: Teleoperator
motor_names: list[str]
robot: Robot
kinematics: RobotKinematics
end_effector_step_sizes: dict[str, float] | None = None
use_gripper: bool = True
# prev_leader_gripper: float | None = None
max_gripper_pos: float = 100.0
use_ik_solution: bool = False
def __call__(self, transition: EnvTransition) -> EnvTransition:
"""Process transition with leader-follower logic."""
# Get current follower position from complementary data
# raw_joint_pos = transition.get(TransitionKey.COMPLEMENTARY_DATA, {}).get("raw_joint_positions")
raw_joint_pos = transition.get(TransitionKey.OBSERVATION)
if raw_joint_pos is not None:
# Send follower position to leader (for follow mode)
# follower_action = {
# f"{motor}.pos": float(raw_joint_pos[motor])
# for motor in self.motor_names
# }
self.leader_device.send_action(raw_joint_pos)
# Only compute EE action if intervention is active
# (AddTeleopEventsAsInfo already added IS_INTERVENTION to info)
info = transition.get(TransitionKey.INFO, {})
if info.get(TeleopEvents.IS_INTERVENTION, False):
# Get leader joint positions from teleop_action
# (AddTeleopActionAsComplimentaryData already got the action)
complementary = transition.get(TransitionKey.COMPLEMENTARY_DATA, {})
teleop_action = complementary.get("teleop_action", {})
if isinstance(teleop_action, dict) and raw_joint_pos is not None:
leader_pos = np.array([teleop_action[f"{motor}.pos"] for motor in self.motor_names])
leader_ee = self.kinematics.forward_kinematics(leader_pos)
if self.use_ik_solution and "IK_solution" in transition.get(TransitionKey.COMPLEMENTARY_DATA):
follower_pos = transition.get(TransitionKey.COMPLEMENTARY_DATA)["IK_solution"]
else:
follower_pos = np.array([raw_joint_pos[f"{motor}.pos"] for motor in self.motor_names])
follower_ee = self.kinematics.forward_kinematics(follower_pos)
# follower_gripper_pos = raw_joint_pos["gripper.pos"]
follower_gripper_pos = follower_pos[-1] # assuming gripper is the last motor
leader_ee_pos = leader_ee[:3, 3]
leader_ee_rvec = Rotation.from_matrix(leader_ee[:3, :3]).as_rotvec()
leader_gripper_pos = np.clip(
teleop_action["gripper.pos"], -self.max_gripper_pos, self.max_gripper_pos
)
follower_ee_pos = follower_ee[:3, 3]
# follower_ee_rvec = Rotation.from_matrix(follower_ee[:3, :3]).as_rotvec()
delta_pos = leader_ee_pos - follower_ee_pos
# For rotation: compute relative rotation from follower to leader
# R_leader = R_follower * R_delta => R_delta = R_follower^T * R_leader
r_delta = follower_ee[:3, :3].T @ leader_ee[:3, :3]
delta_rvec = Rotation.from_matrix(r_delta).as_rotvec()
delta_gripper = leader_gripper_pos - follower_gripper_pos
desired = np.eye(4, dtype=float)
desired[:3, :3] = follower_ee[:3, :3] @ r_delta
desired[:3, 3] = follower_ee[:3, 3] + delta_pos
pos = desired[:3, 3]
tw = Rotation.from_matrix(desired[:3, :3]).as_rotvec()
assert np.allclose(pos, leader_ee_pos), "Position delta computation error"
assert np.allclose(tw, leader_ee_rvec), "Orientation delta computation error"
assert np.isclose(follower_gripper_pos + delta_gripper, leader_gripper_pos), (
"Gripper delta computation error"
)
# Normalize the action to the range [-1, 1]
delta_pos = delta_pos / np.array(
[
self.end_effector_step_sizes["x"],
self.end_effector_step_sizes["y"],
self.end_effector_step_sizes["z"],
]
)
delta_rvec = delta_rvec / np.array(
[
self.end_effector_step_sizes["wx"],
self.end_effector_step_sizes["wy"],
self.end_effector_step_sizes["wz"],
]
)
max_normalized_pos = max(
abs(delta_pos[0]),
abs(delta_pos[1]),
abs(delta_pos[2]),
)
normalized_rot = max(abs(delta_rvec[0]), abs(delta_rvec[1]), abs(delta_rvec[2]))
max_normalized = max(max_normalized_pos, normalized_rot)
if max_normalized > 1.0:
# Scale proportionally
delta_pos = delta_pos / max_normalized
delta_rvec = delta_rvec / max_normalized
intervention_action = np.array(
[
delta_pos[0],
delta_pos[1],
delta_pos[2],
delta_rvec[0],
delta_rvec[1],
delta_rvec[2],
np.clip(delta_gripper, -self.max_gripper_pos, self.max_gripper_pos)
/ self.max_gripper_pos,
],
dtype=float,
)
# # Extract leader positions from teleop action dict
# # leader_pos = np.array([teleop_action.get(f"{motor}.pos", 0) for motor in self.motor_names])
# # follower_pos = np.array([raw_joint_pos[f"{motor}.pos"] for motor in self.motor_names])
# teleop_action = self.leader_device.bus.sync_read("Present_Position")
# raw_joint_pos = self.robot.bus.sync_read("Present_Position")
# leader_pos = np.array([teleop_action.get(f"{motor}", 0) for motor in self.motor_names])
# follower_pos = np.array([raw_joint_pos[f"{motor}"] for motor in self.motor_names])
# # Compute EE positions
# leader_ee_fi = self.kinematics.forward_kinematics(leader_pos)
# leader_ee_pos = leader_ee_fi[:3, 3]
# # leader_ee_rot = Rotation.from_matrix(leader_ee_fi[:3, :3]).as_rotvec()
# leader_ee = np.concat([leader_ee_pos, [0,0,0]])
# if "IK_solution" in transition.get(TransitionKey.COMPLEMENTARY_DATA):
# follower_ee = transition.get(TransitionKey.COMPLEMENTARY_DATA)["IK_solution"]
# else:
# follower_pos = np.array([raw_joint_pos[f"{motor}.pos"] for motor in self.motor_names])
# follower_ee_fi = self.kinematics.forward_kinematics(follower_pos)
# follower_ee_pos = follower_ee_fi[:3, 3]
# # follower_ee_rot = Rotation.from_matrix(follower_ee_fi[:3, :3]).as_rotvec()
# follower_ee = np.concat([follower_ee_pos, [0,0,0]])
# # Compute normalized EE delta
# if self.end_effector_step_sizes is not None:
# ee_delta = np.clip(
# leader_ee - follower_ee,
# -self.end_effector_step_sizes,
# self.end_effector_step_sizes
# )
# ee_delta_normalized = ee_delta / self.end_effector_step_sizes
# else:
# ee_delta_normalized = leader_ee - follower_ee
# # Handle gripper
# if self.use_gripper and len(leader_pos) > 3:
# if self.prev_leader_gripper is None:
# self.prev_leader_gripper = np.clip(
# leader_pos[-1], 0, self.max_gripper_pos
# )
# leader_gripper = leader_pos[-1]
# gripper_delta = leader_gripper - self.prev_leader_gripper
# normalized_delta = gripper_delta / self.max_gripper_pos
# # Quantize gripper action
# if normalized_delta >= 0.3:
# gripper_action = 2
# elif normalized_delta <= -0.1:
# gripper_action = 0
# else:
# gripper_action = 1
# self.prev_leader_gripper = leader_gripper
# # Create intervention action
# intervention_action = np.append(ee_delta_normalized, gripper_action)
# else:
# intervention_action = ee_delta_normalized
# # Override teleop_action with computed EE action
complementary["teleop_action"] = torch.from_numpy(intervention_action).float()
transition[TransitionKey.COMPLEMENTARY_DATA] = complementary # type: ignore[misc]
return transition
def reset(self) -> None:
"""Reset leader-follower state."""
# self.prev_leader_gripper = None
if hasattr(self.leader_device, "reset"):
self.leader_device.reset()
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
return features
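The rotation handling above relies on the identity R_leader = R_follower · R_delta, which is exactly what the processor's asserts verify. A small self-contained numeric check of that identity (rotations about z, numpy only):

import numpy as np

def rot_z(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

r_follower = rot_z(0.2)
r_leader = rot_z(0.5)
r_delta = r_follower.T @ r_leader                      # relative rotation, as in the processor
assert np.allclose(r_follower @ r_delta, r_leader)     # re-applying the delta recovers the leader pose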
@@ -142,6 +142,10 @@ class RelativeActionsProcessorStep(ProcessorStep):
new_transition[TransitionKey.ACTION] = to_relative_actions(action, state, mask)
return new_transition
def get_cached_state(self) -> torch.Tensor | None:
"""Return the cached ``observation.state`` used as the reference point for relative/absolute action conversions."""
return self._last_state
def get_config(self) -> dict[str, Any]:
return {
"enabled": self.enabled,
@@ -182,7 +186,8 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
"but relative_step is None. Ensure relative_step is set when constructing the postprocessor."
)
if self.relative_step._last_state is None:
cached_state = self.relative_step.get_cached_state()
if cached_state is None:
raise RuntimeError(
"AbsoluteActionsProcessorStep requires state from RelativeActionsProcessorStep "
"but no state has been cached. Ensure the preprocessor runs before the postprocessor."
@@ -194,9 +199,7 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
return new_transition
mask = self.relative_step._build_mask(action.shape[-1])
new_transition[TransitionKey.ACTION] = to_absolute_actions(
action, self.relative_step._last_state, mask
)
new_transition[TransitionKey.ACTION] = to_absolute_actions(action, cached_state, mask)
return new_transition
def get_config(self) -> dict[str, Any]:
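A minimal sketch of the cached-state round trip these hunks rely on, assuming the relative/absolute helpers are state-offset operations on the masked dimensions and that the cached state is the one saved by the preprocessor:

import torch

state = torch.tensor([1.0, 2.0, 3.0])            # what get_cached_state() would return
absolute = torch.tensor([[1.5, 2.5, 3.5]])
mask = torch.tensor([True, True, False])         # last dim stays absolute

relative = absolute.clone()
relative[:, mask] = absolute[:, mask] - state[mask]     # preprocessor direction

restored = relative.clone()
restored[:, mask] = relative[:, mask] + state[mask]     # postprocessor direction
assert torch.allclose(restored, absolute)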
+26 -16
@@ -12,23 +12,33 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Reinforcement learning modules.
"""Reinforcement learning modules.
Requires: ``pip install 'lerobot[hilserl]'``
Available modules (import directly)::
from lerobot.rl.actor import ...
from lerobot.rl.learner import ...
from lerobot.rl.learner_service import ...
from lerobot.rl.buffer import ...
from lerobot.rl.eval_policy import ...
from lerobot.rl.gym_manipulator import ...
Distributed actor / learner entry points (``actor``, ``learner``,
``learner_service``) require ``pip install 'lerobot[hilserl]'``. Algorithms,
buffer, data sources and trainer are gRPC-free and usable standalone.
"""
from lerobot.utils.import_utils import require_package
from .algorithms.base import RLAlgorithm as RLAlgorithm
from .algorithms.configs import RLAlgorithmConfig as RLAlgorithmConfig, TrainingStats as TrainingStats
from .algorithms.factory import (
make_algorithm as make_algorithm,
make_algorithm_config as make_algorithm_config,
)
from .algorithms.sac.configuration_sac import SACAlgorithmConfig as SACAlgorithmConfig
from .buffer import ReplayBuffer as ReplayBuffer
from .data_sources import DataMixer as DataMixer, OnlineOfflineMixer as OnlineOfflineMixer
from .trainer import RLTrainer as RLTrainer
require_package("grpcio", extra="hilserl", import_name="grpc")
__all__: list[str] = []
__all__ = [
"RLAlgorithm",
"RLAlgorithmConfig",
"TrainingStats",
"make_algorithm",
"make_algorithm_config",
"SACAlgorithmConfig",
"RLTrainer",
"ReplayBuffer",
"DataMixer",
"OnlineOfflineMixer",
]
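A short import sketch of the surface re-exported above; it assumes the hilserl extra is installed, since this __init__ still runs the grpcio check.

from lerobot.rl import (
    DataMixer,
    OnlineOfflineMixer,
    ReplayBuffer,
    RLAlgorithm,
    RLTrainer,
    SACAlgorithmConfig,
    TrainingStats,
)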
+25 -40
@@ -51,17 +51,19 @@ import os
import time
from functools import lru_cache
from queue import Empty
from typing import Any
import grpc
import torch
from torch import nn
from torch.multiprocessing import Event, Queue
from torch.multiprocessing import Queue
from lerobot.cameras import opencv # noqa: F401
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.policies import make_policy, make_pre_post_processors
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies import PreTrainedPolicy, make_policy, make_pre_post_processors
from lerobot.processor import TransitionKey
from lerobot.rl.queue import get_last_item_from_queue
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.robots import so_follower # noqa: F401
from lerobot.teleoperators import gamepad, so_leader # noqa: F401
from lerobot.teleoperators.utils import TeleopEvents
@@ -74,13 +76,12 @@ from lerobot.transport.utils import (
send_bytes_in_chunks,
transitions_to_bytes,
)
from lerobot.types import TransitionKey
from lerobot.utils.device_utils import get_safe_torch_device
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.random_utils import set_seed
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.transition import (
Transition,
move_state_dict_to_device,
move_transition_to_device,
)
from lerobot.utils.utils import (
@@ -94,8 +95,6 @@ from .gym_manipulator import (
reset_and_build_transition,
step_env_and_process_transition,
)
from .process import ProcessSignalHandler
from .queue import get_last_item_from_queue
# Main entry point
@@ -212,7 +211,7 @@ def actor_cli(cfg: TrainRLServerPipelineConfig):
def act_with_policy(
cfg: TrainRLServerPipelineConfig,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
parameters_queue: Queue,
transitions_queue: Queue,
interactions_queue: Queue,
@@ -252,13 +251,13 @@ def act_with_policy(
logging.info("make_policy")
### Instantiate the policy in both the actor and learner processes
### To avoid sending a SACPolicy object through the port, we create a policy instance
### To avoid sending a policy object through the port, we create a policy instance
### on both sides, the learner sends the updated parameters every n steps to update the actor's parameters
policy: SACPolicy = make_policy(
policy = make_policy(
cfg=cfg.policy,
env_cfg=cfg.env,
)
policy = policy.eval()
policy = policy.to(device).eval()
assert isinstance(policy, nn.Module)
preprocessor, postprocessor = make_pre_post_processors(
@@ -292,11 +291,7 @@ def act_with_policy(
with policy_timer:
normalized_observation = preprocessor.process_observation(observation)
action = policy.select_action(batch=normalized_observation)
# Unnormalize only the continuous part. When `num_discrete_actions` is set,
# `select_action` concatenates an argmax index in env space at the last dim;
# action stats cover the continuous dims only, so feeding the full vector to
# the unnormalizer would shape-mismatch and would also corrupt the discrete
# index by treating it as a normalized value.
# Unnormalize only the continuous part.
if cfg.policy.num_discrete_actions is not None:
continuous_action = postprocessor.process_action(action[..., :-1])
discrete_action = action[..., -1:].to(
@@ -347,9 +342,6 @@ def act_with_policy(
"discrete_penalty": torch.tensor(
[new_transition[TransitionKey.COMPLEMENTARY_DATA].get("discrete_penalty", 0.0)]
),
# Forward the intervention flag so the learner can route this transition
# into the offline replay buffer (see `process_transitions` in learner.py).
# Use the plain string key so the payload survives torch.load(weights_only=True).
TeleopEvents.IS_INTERVENTION.value: is_intervention,
}
# Create transition for learner (convert to old format)
@@ -419,7 +411,7 @@ def act_with_policy(
def establish_learner_connection(
stub: services_pb2_grpc.LearnerServiceStub,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
attempts: int = 30,
):
"""Establish a connection with the learner.
@@ -471,7 +463,7 @@ def learner_service_client(
def receive_policy(
cfg: TrainRLServerPipelineConfig,
parameters_queue: Queue,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
):
@@ -523,7 +515,7 @@ def receive_policy(
def send_transitions(
cfg: TrainRLServerPipelineConfig,
transitions_queue: Queue,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
) -> services_pb2.Empty:
@@ -573,7 +565,7 @@ def send_transitions(
def send_interactions(
cfg: TrainRLServerPipelineConfig,
interactions_queue: Queue,
shutdown_event: Event, # type: ignore
shutdown_event: Any, # Event
learner_client: services_pb2_grpc.LearnerServiceStub | None = None,
grpc_channel: grpc.Channel | None = None,
) -> services_pb2.Empty:
@@ -623,7 +615,11 @@ def send_interactions(
logging.info("[ACTOR] Interactions process stopped")
def transitions_stream(shutdown_event: Event, transitions_queue: Queue, timeout: float) -> services_pb2.Empty: # type: ignore
def transitions_stream(
shutdown_event: Any, # Event
transitions_queue: Queue,
timeout: float,
) -> services_pb2.Empty:
while not shutdown_event.is_set():
try:
message = transitions_queue.get(block=True, timeout=timeout)
@@ -639,9 +635,9 @@ def transitions_stream(shutdown_event: Event, transitions_queue: Queue, timeout:
def interactions_stream(
shutdown_event: Event,
shutdown_event: Any, # Event
interactions_queue: Queue,
timeout: float, # type: ignore
timeout: float,
) -> services_pb2.Empty:
while not shutdown_event.is_set():
try:
@@ -662,7 +658,7 @@ def interactions_stream(
# Policy functions
def update_policy_parameters(policy: SACPolicy, parameters_queue: Queue, device):
def update_policy_parameters(policy: PreTrainedPolicy, parameters_queue: Queue, device):
bytes_state_dict = get_last_item_from_queue(parameters_queue, block=False)
if bytes_state_dict is not None:
logging.info("[ACTOR] Load new parameters from Learner.")
@@ -677,18 +673,7 @@ def update_policy_parameters(policy: SACPolicy, parameters_queue: Queue, device)
# - Send critic's encoder state when shared_encoder=True
# - Skip encoder params entirely when freeze_vision_encoder=True
# - Ensure discrete_critic gets correct encoder state (currently uses encoder_critic)
# Load actor state dict
actor_state_dict = move_state_dict_to_device(state_dicts["policy"], device=device)
policy.actor.load_state_dict(actor_state_dict)
# Load discrete critic if present
if hasattr(policy, "discrete_critic") and "discrete_critic" in state_dicts:
discrete_critic_state_dict = move_state_dict_to_device(
state_dicts["discrete_critic"], device=device
)
policy.discrete_critic.load_state_dict(discrete_critic_state_dict)
logging.info("[ACTOR] Loaded discrete critic parameters from Learner.")
policy.load_actor_weights(state_dicts, device=device)
# Utilities functions
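For context, a tiny standalone sketch of the continuous/discrete split the actor keeps at unnormalization time when num_discrete_actions is set (tensor values are illustrative):

import torch

action = torch.tensor([0.2, -0.5, 0.9, 1.0])    # last dim is the discrete action index
continuous_action = action[..., :-1]             # only this part would go through the unnormalizer
discrete_action = action[..., -1:]               # env-space index, forwarded untouched
action = torch.cat([continuous_action, discrete_action], dim=-1)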
+20
@@ -0,0 +1,20 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .sac import SACAlgorithm as SACAlgorithm, SACAlgorithmConfig as SACAlgorithmConfig
__all__ = [
"SACAlgorithm",
"SACAlgorithmConfig",
]
+106
@@ -0,0 +1,106 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from collections.abc import Iterator
from typing import TYPE_CHECKING, Any
import torch
from torch.optim import Optimizer
from lerobot.rl.algorithms.configs import RLAlgorithmConfig, TrainingStats
if TYPE_CHECKING:
from lerobot.rl.data_sources.data_mixer import DataMixer
BatchType = dict[str, Any]
class RLAlgorithm(abc.ABC):
"""Base for all RL algorithms."""
config_class: type[RLAlgorithmConfig] | None = None
name: str | None = None
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
if not getattr(cls, "config_class", None):
raise TypeError(f"Class {cls.__name__} must define 'config_class'")
if not getattr(cls, "name", None):
raise TypeError(f"Class {cls.__name__} must define 'name'")
@abc.abstractmethod
def update(self, batch_iterator: Iterator[BatchType]) -> TrainingStats:
"""One complete training step.
The algorithm calls ``next(batch_iterator)`` as many times as it
needs (e.g. ``utd_ratio`` times for SAC) to obtain fresh batches.
The iterator is owned by the trainer; the algorithm just consumes
from it.
"""
...
def configure_data_iterator(
self,
data_mixer: DataMixer,
batch_size: int,
*,
async_prefetch: bool = True,
queue_size: int = 2,
) -> Iterator[BatchType]:
"""Create the data iterator this algorithm needs.
The default implementation uses the standard ``data_mixer.get_iterator()``.
Algorithms that need specialised sampling should override this method.
"""
return data_mixer.get_iterator(
batch_size=batch_size,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
def make_optimizers_and_scheduler(self) -> dict[str, Optimizer]:
"""Create, store, and return the optimizers needed for training.
Called on the **learner** side after construction. Subclasses must
override this with algorithm-specific optimizer setup.
"""
return {}
def get_optimizers(self) -> dict[str, Optimizer]:
"""Return optimizers for checkpointing / external scheduling."""
return {}
@property
def optimization_step(self) -> int:
"""Current learner optimization step.
Part of the stable contract for checkpoint/resume. Algorithms can
either use this default storage or override for custom behavior.
"""
return getattr(self, "_optimization_step", 0)
@optimization_step.setter
def optimization_step(self, value: int) -> None:
self._optimization_step = int(value)
def get_weights(self) -> dict[str, Any]:
"""Policy state-dict to push to actors."""
return {}
@abc.abstractmethod
def load_weights(self, weights: dict[str, Any], device: str | torch.device = "cpu") -> None:
"""Load policy state-dict received from the learner."""
+76
@@ -0,0 +1,76 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any
import draccus
import torch
if TYPE_CHECKING:
from lerobot.rl.algorithms.base import RLAlgorithm
@dataclass
class TrainingStats:
"""Returned by ``algorithm.update()`` for logging and checkpointing."""
losses: dict[str, float] = field(default_factory=dict)
grad_norms: dict[str, float] = field(default_factory=dict)
extra: dict[str, float] = field(default_factory=dict)
def to_log_dict(self) -> dict[str, float]:
"""Flatten all stats into a single dict for logging."""
d: dict[str, float] = {}
for name, val in self.losses.items():
d[name] = val
for name, val in self.grad_norms.items():
d[f"{name}_grad_norm"] = val
for name, val in self.extra.items():
d[name] = val
return d
@dataclass
class RLAlgorithmConfig(draccus.ChoiceRegistry, abc.ABC):
"""Registry for algorithm configs."""
@property
def type(self) -> str:
"""Registered name of this algorithm config (e.g. ``"sac"``)."""
choice_name = self.get_choice_name(self.__class__)
if not isinstance(choice_name, str):
raise TypeError(f"Expected string from get_choice_name, got {type(choice_name)}")
return choice_name
@abc.abstractmethod
def build_algorithm(self, policy: torch.nn.Module) -> RLAlgorithm:
"""Construct the :class:`RLAlgorithm` for this config.
Must be overridden by every registered config subclass.
"""
raise NotImplementedError(f"{type(self).__name__} must implement build_algorithm()")
@classmethod
@abc.abstractmethod
def from_policy_config(cls, policy_cfg: Any) -> RLAlgorithmConfig:
"""Build an algorithm config from a policy config.
Must be overridden by every registered config subclass.
"""
raise NotImplementedError(f"{cls.__name__} must implement from_policy_config()")
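A small usage sketch of TrainingStats.to_log_dict (module path taken from this diff): grad norms get a "_grad_norm" suffix, losses and extras keep their names.

from lerobot.rl.algorithms.configs import TrainingStats

stats = TrainingStats(
    losses={"loss_critic": 0.42},
    grad_norms={"critic": 1.3},
    extra={"temperature": 0.05},
)
assert stats.to_log_dict() == {"loss_critic": 0.42, "critic_grad_norm": 1.3, "temperature": 0.05}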
+47
@@ -0,0 +1,47 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import torch
from lerobot.rl.algorithms.base import RLAlgorithm
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
def make_algorithm_config(algorithm_type: str, **kwargs) -> RLAlgorithmConfig:
"""Instantiate an `RLAlgorithmConfig` from its registered type name.
Args:
algorithm_type: Registry key of the algorithm (e.g. ``"sac"``).
**kwargs: Keyword arguments forwarded to the config class constructor.
Returns:
An instance of the matching ``RLAlgorithmConfig`` subclass.
Raises:
ValueError: If ``algorithm_type`` is not registered.
"""
try:
cls = RLAlgorithmConfig.get_choice_class(algorithm_type)
except KeyError as err:
raise ValueError(
f"Algorithm type '{algorithm_type}' is not registered. "
f"Available: {list(RLAlgorithmConfig.get_known_choices().keys())}"
) from err
return cls(**kwargs)
def make_algorithm(cfg: RLAlgorithmConfig, policy: torch.nn.Module) -> RLAlgorithm:
return cfg.build_algorithm(policy)
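A hedged usage sketch of the factory; "sac" is the only choice registered in this diff, field names come from SACAlgorithmConfig further below, and unknown names raise the ValueError above.

from lerobot.rl.algorithms.factory import make_algorithm_config

cfg = make_algorithm_config("sac", actor_lr=1e-4, utd_ratio=2)
print(cfg.type)        # -> "sac" (registered choice name)
print(cfg.actor_lr)    # -> 0.0001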
+18
@@ -0,0 +1,18 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from lerobot.rl.algorithms.sac.configuration_sac import SACAlgorithmConfig
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
__all__ = ["SACAlgorithm", "SACAlgorithmConfig"]
@@ -0,0 +1,90 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
import torch
from lerobot.policies.gaussian_actor.configuration_gaussian_actor import (
CriticNetworkConfig,
GaussianActorConfig,
)
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
if TYPE_CHECKING:
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
@RLAlgorithmConfig.register_subclass("sac")
@dataclass
class SACAlgorithmConfig(RLAlgorithmConfig):
"""SAC algorithm hyperparameters."""
# Optimizer learning rates
actor_lr: float = 3e-4
critic_lr: float = 3e-4
temperature_lr: float = 3e-4
# Bellman update
discount: float = 0.99
use_backup_entropy: bool = True
critic_target_update_weight: float = 0.005
# Critic ensemble
num_critics: int = 2
num_subsample_critics: int | None = None
critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
discrete_critic_network_kwargs: CriticNetworkConfig = field(default_factory=CriticNetworkConfig)
# Temperature / entropy
temperature_init: float = 1.0
# Target entropy for automatic temperature tuning. If ``None``, defaults to
# ``-|A|/2`` where ``|A|`` is the total action dimension (continuous + 1 if
# there is a discrete action head).
target_entropy: float | None = None
# Update loop
utd_ratio: int = 1
policy_update_freq: int = 1
grad_clip_norm: float = 40.0
# Optimizations
# torch.compile is currently disabled by default
use_torch_compile: bool = False
# Policy config
policy_config: GaussianActorConfig | None = None
@classmethod
def from_policy_config(cls, policy_cfg: GaussianActorConfig) -> SACAlgorithmConfig:
"""Build an algorithm config with default hyperparameters for a given policy."""
return cls(
policy_config=policy_cfg,
discrete_critic_network_kwargs=policy_cfg.discrete_critic_network_kwargs,
)
def build_algorithm(self, policy: torch.nn.Module) -> SACAlgorithm:
if self.policy_config is None:
raise ValueError(
"SACAlgorithmConfig.policy_config is None. "
"It must be populated (typically by TrainRLServerPipelineConfig.validate) "
"before calling build_algorithm()."
)
from lerobot.rl.algorithms.sac.sac_algorithm import SACAlgorithm
return SACAlgorithm(policy=policy, config=self)
@@ -0,0 +1,595 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import math
from collections.abc import Callable, Iterator
from dataclasses import asdict
from typing import Any
import einops
import torch
import torch.nn as nn
import torch.nn.functional as F # noqa: N812
from torch import Tensor
from torch.optim import Optimizer
from lerobot.policies.gaussian_actor.modeling_gaussian_actor import (
DISCRETE_DIMENSION_INDEX,
MLP,
DiscreteCritic,
GaussianActorObservationEncoder,
GaussianActorPolicy,
orthogonal_init,
)
from lerobot.policies.utils import get_device_from_parameters
from lerobot.rl.algorithms.base import BatchType, RLAlgorithm
from lerobot.rl.algorithms.configs import TrainingStats
from lerobot.rl.algorithms.sac.configuration_sac import SACAlgorithmConfig
from lerobot.utils.constants import ACTION
from lerobot.utils.transition import move_state_dict_to_device
class SACAlgorithm(RLAlgorithm):
"""Soft Actor-Critic. Owns critics, targets, temperature, and loss computation."""
config_class = SACAlgorithmConfig
name = "sac"
def __init__(
self,
policy: GaussianActorPolicy,
config: SACAlgorithmConfig,
):
self.config = config
self.policy_config = config.policy_config
self.policy = policy
self.optimizers: dict[str, Optimizer] = {}
self._optimization_step: int = 0
action_dim = self.policy.config.output_features[ACTION].shape[0]
self._init_critics(action_dim)
self._init_temperature(action_dim)
self._device = torch.device(self.policy.config.device)
self._move_to_device()
def _init_critics(self, action_dim) -> None:
"""Build critic ensemble, targets."""
encoder = self.policy.encoder_critic
heads = [
CriticHead(
input_dim=encoder.output_dim + action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_ensemble = CriticEnsemble(encoder=encoder, ensemble=heads)
target_heads = [
CriticHead(
input_dim=encoder.output_dim + action_dim,
**asdict(self.config.critic_network_kwargs),
)
for _ in range(self.config.num_critics)
]
self.critic_target = CriticEnsemble(encoder=encoder, ensemble=target_heads)
self.critic_target.load_state_dict(self.critic_ensemble.state_dict())
# TODO(Khalil): Investigate and fix torch.compile
# NOTE: torch.compile is disabled, policy does not converge when enabled.
if self.config.use_torch_compile:
self.critic_ensemble = torch.compile(self.critic_ensemble)
self.critic_target = torch.compile(self.critic_target)
self.discrete_critic_target = None
if self.policy_config.num_discrete_actions is not None:
self.discrete_critic_target = self._init_discrete_critic_target(encoder)
def _init_discrete_critic_target(self, encoder: GaussianActorObservationEncoder) -> DiscreteCritic:
"""Build target discrete critic (main network is owned by the policy)."""
discrete_critic_target = DiscreteCritic(
encoder=encoder,
input_dim=encoder.output_dim,
output_dim=self.policy_config.num_discrete_actions,
**asdict(self.config.discrete_critic_network_kwargs),
)
# TODO(Khalil): Compile the discrete critic
discrete_critic_target.load_state_dict(self.policy.discrete_critic.state_dict())
return discrete_critic_target
def _init_temperature(self, continuous_action_dim: int) -> None:
"""Set up temperature parameter (log_alpha) and target entropy."""
temp_init = self.config.temperature_init
self.log_alpha = nn.Parameter(torch.tensor([math.log(temp_init)]))
self.target_entropy = self.config.target_entropy
if self.target_entropy is None:
total_action_dim = continuous_action_dim + (
1 if self.policy_config.num_discrete_actions is not None else 0
)
self.target_entropy = -total_action_dim / 2
def _move_to_device(self) -> None:
self.policy.to(self._device)
self.critic_ensemble.to(self._device)
self.critic_target.to(self._device)
self.log_alpha = nn.Parameter(self.log_alpha.data.to(self._device))
if self.discrete_critic_target is not None:
self.discrete_critic_target.to(self._device)
@property
def temperature(self) -> float:
"""Return the current temperature value, always in sync with log_alpha."""
return self.log_alpha.exp().item()
def _critic_forward(
self,
observations: dict[str, Tensor],
actions: Tensor,
use_target: bool = False,
observation_features: Tensor | None = None,
) -> Tensor:
"""Forward pass through a critic network ensemble
Args:
observations: Dictionary of observations
actions: Action tensor
use_target: If True, use target critics, otherwise use ensemble critics
Returns:
Tensor of Q-values from all critics
"""
critics = self.critic_target if use_target else self.critic_ensemble
q_values = critics(observations, actions, observation_features)
return q_values
def _discrete_critic_forward(
self, observations, use_target=False, observation_features=None
) -> torch.Tensor:
"""Forward pass through a discrete critic network
Args:
observations: Dictionary of observations
use_target: If True, use target critics, otherwise use ensemble critics
observation_features: Optional pre-computed observation features to avoid recomputing encoder output
Returns:
Tensor of Q-values from the discrete critic network
"""
discrete_critic = self.discrete_critic_target if use_target else self.policy.discrete_critic
q_values = discrete_critic(observations, observation_features)
return q_values
def update(self, batch_iterator: Iterator[BatchType]) -> TrainingStats:
clip = self.config.grad_clip_norm
for _ in range(self.config.utd_ratio - 1):
batch = next(batch_iterator)
fb = self._prepare_forward_batch(batch, include_complementary_info=True)
loss_critic = self._compute_loss_critic(fb)
self.optimizers["critic"].zero_grad()
loss_critic.backward()
torch.nn.utils.clip_grad_norm_(self.critic_ensemble.parameters(), max_norm=clip)
self.optimizers["critic"].step()
if self.policy_config.num_discrete_actions is not None:
loss_dc = self._compute_loss_discrete_critic(fb)
self.optimizers["discrete_critic"].zero_grad()
loss_dc.backward()
torch.nn.utils.clip_grad_norm_(self.policy.discrete_critic.parameters(), max_norm=clip)
self.optimizers["discrete_critic"].step()
self._update_target_networks()
batch = next(batch_iterator)
fb = self._prepare_forward_batch(batch, include_complementary_info=False)
loss_critic = self._compute_loss_critic(fb)
self.optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad = torch.nn.utils.clip_grad_norm_(self.critic_ensemble.parameters(), max_norm=clip).item()
self.optimizers["critic"].step()
stats = TrainingStats(
losses={"loss_critic": loss_critic.item()},
grad_norms={"critic": critic_grad},
)
if self.policy_config.num_discrete_actions is not None:
loss_dc = self._compute_loss_discrete_critic(fb)
self.optimizers["discrete_critic"].zero_grad()
loss_dc.backward()
dc_grad = torch.nn.utils.clip_grad_norm_(
self.policy.discrete_critic.parameters(), max_norm=clip
).item()
self.optimizers["discrete_critic"].step()
stats.losses["loss_discrete_critic"] = loss_dc.item()
stats.grad_norms["discrete_critic"] = dc_grad
if self._optimization_step % self.config.policy_update_freq == 0:
for _ in range(self.config.policy_update_freq):
loss_actor = self._compute_loss_actor(fb)
self.optimizers["actor"].zero_grad()
loss_actor.backward()
actor_grad = torch.nn.utils.clip_grad_norm_(
self.policy.actor.parameters(), max_norm=clip
).item()
self.optimizers["actor"].step()
loss_temp = self._compute_loss_temperature(fb)
self.optimizers["temperature"].zero_grad()
loss_temp.backward()
temp_grad = torch.nn.utils.clip_grad_norm_([self.log_alpha], max_norm=clip).item()
self.optimizers["temperature"].step()
stats.losses["loss_actor"] = loss_actor.item()
stats.losses["loss_temperature"] = loss_temp.item()
stats.grad_norms["actor"] = actor_grad
stats.grad_norms["temperature"] = temp_grad
stats.extra["temperature"] = self.temperature
self._update_target_networks()
self._optimization_step += 1
return stats
def _compute_loss_critic(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
actions = batch[ACTION]
rewards = batch["reward"]
next_observations = batch["next_state"]
done = batch["done"]
observation_features = batch.get("observation_feature")
next_observation_features = batch.get("next_observation_feature")
with torch.no_grad():
next_action_preds, next_log_probs, _ = self.policy.actor(
next_observations, next_observation_features
)
# 2- compute q targets
q_targets = self._critic_forward(
observations=next_observations,
actions=next_action_preds,
use_target=True,
observation_features=next_observation_features,
)
# Subsample critics to prevent overfitting when using a high UTD (update-to-data) ratio
# TODO: Get indices before forward pass to avoid unnecessary computation
if self.config.num_subsample_critics is not None:
indices = torch.randperm(self.config.num_critics)
indices = indices[: self.config.num_subsample_critics]
q_targets = q_targets[indices]
# critics subsample size
min_q, _ = q_targets.min(dim=0) # Get values from min operation
if self.config.use_backup_entropy:
min_q = min_q - (self.temperature * next_log_probs)
td_target = rewards + (1 - done) * self.config.discount * min_q
# 3- compute predicted qs
if self.policy_config.num_discrete_actions is not None:
# NOTE: We only want to keep the continuous action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions: Tensor = actions[:, :DISCRETE_DIMENSION_INDEX]
q_preds = self._critic_forward(
observations=observations,
actions=actions,
use_target=False,
observation_features=observation_features,
)
# 4- Calculate loss
# Compute state-action value loss (TD loss) for all of the Q functions in the ensemble.
td_target_duplicate = einops.repeat(td_target, "b -> e b", e=q_preds.shape[0])
# Compute the mean batch loss for each critic, then sum across critics to obtain the final loss
critics_loss = (
F.mse_loss(
input=q_preds,
target=td_target_duplicate,
reduction="none",
).mean(dim=1)
).sum()
return critics_loss
def _compute_loss_discrete_critic(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
actions = batch[ACTION]
rewards = batch["reward"]
next_observations = batch["next_state"]
done = batch["done"]
observation_features = batch.get("observation_feature")
next_observation_features = batch.get("next_observation_feature")
complementary_info = batch.get("complementary_info")
# NOTE: We only want to keep the discrete action part
# In the buffer we have the full action space (continuous + discrete)
# We need to split them before concatenating them in the critic forward
actions_discrete: Tensor = actions[:, DISCRETE_DIMENSION_INDEX:].clone()
actions_discrete = torch.round(actions_discrete)
actions_discrete = actions_discrete.long()
discrete_penalties: Tensor | None = None
if complementary_info is not None:
discrete_penalties = complementary_info.get("discrete_penalty")
with torch.no_grad():
# For DQN, select actions using online network, evaluate with target network
next_discrete_qs = self._discrete_critic_forward(
next_observations, use_target=False, observation_features=next_observation_features
)
best_next_discrete_action = torch.argmax(next_discrete_qs, dim=-1, keepdim=True)
# Get target Q-values from target network
target_next_discrete_qs = self._discrete_critic_forward(
observations=next_observations,
use_target=True,
observation_features=next_observation_features,
)
# Use gather to select Q-values for best actions
target_next_discrete_q = torch.gather(
target_next_discrete_qs, dim=1, index=best_next_discrete_action
).squeeze(-1)
# Compute target Q-value with Bellman equation
rewards_discrete = rewards
if discrete_penalties is not None:
rewards_discrete = rewards + discrete_penalties
target_discrete_q = rewards_discrete + (1 - done) * self.config.discount * target_next_discrete_q
# Get predicted Q-values for current observations
predicted_discrete_qs = self._discrete_critic_forward(
observations=observations, use_target=False, observation_features=observation_features
)
# Use gather to select Q-values for taken actions
predicted_discrete_q = torch.gather(predicted_discrete_qs, dim=1, index=actions_discrete).squeeze(-1)
# Compute MSE loss between predicted and target Q-values
discrete_critic_loss = F.mse_loss(input=predicted_discrete_q, target=target_discrete_q)
return discrete_critic_loss
def _compute_loss_actor(self, batch: dict[str, Any]) -> Tensor:
observations = batch["state"]
observation_features = batch.get("observation_feature")
actions_pi, log_probs, _ = self.policy.actor(observations, observation_features)
q_preds = self._critic_forward(
observations=observations,
actions=actions_pi,
use_target=False,
observation_features=observation_features,
)
min_q_preds = q_preds.min(dim=0)[0]
actor_loss = ((self.temperature * log_probs) - min_q_preds).mean()
return actor_loss
def _compute_loss_temperature(self, batch: dict[str, Any]) -> Tensor:
"""Compute the temperature loss"""
observations = batch["state"]
observation_features = batch.get("observation_feature")
# calculate temperature loss
with torch.no_grad():
_, log_probs, _ = self.policy.actor(observations, observation_features)
temperature_loss = (-self.log_alpha.exp() * (log_probs + self.target_entropy)).mean()
return temperature_loss
def _update_target_networks(self) -> None:
"""Update target networks with exponential moving average"""
for target_p, p in zip(
self.critic_target.parameters(), self.critic_ensemble.parameters(), strict=True
):
target_p.data.copy_(
p.data * self.config.critic_target_update_weight
+ target_p.data * (1.0 - self.config.critic_target_update_weight)
)
if self.policy_config.num_discrete_actions is not None:
for target_p, p in zip(
self.discrete_critic_target.parameters(),
self.policy.discrete_critic.parameters(),
strict=True,
):
target_p.data.copy_(
p.data * self.config.critic_target_update_weight
+ target_p.data * (1.0 - self.config.critic_target_update_weight)
)
def _prepare_forward_batch(
self, batch: BatchType, *, include_complementary_info: bool = True
) -> dict[str, Any]:
observations = batch["state"]
next_observations = batch["next_state"]
observation_features, next_observation_features = self.get_observation_features(
observations, next_observations
)
forward_batch: dict[str, Any] = {
ACTION: batch[ACTION],
"reward": batch["reward"],
"state": observations,
"next_state": next_observations,
"done": batch["done"],
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
}
if include_complementary_info and "complementary_info" in batch:
forward_batch["complementary_info"] = batch["complementary_info"]
return forward_batch
def make_optimizers_and_scheduler(self) -> dict[str, Optimizer]:
"""
Creates and returns optimizers for the actor, critic, and temperature components of a reinforcement learning policy.
This function sets up Adam optimizers for:
- The **actor network**, ensuring that only relevant parameters are optimized.
- The **critic ensemble**, which evaluates the value function.
- The **temperature parameter**, which controls the entropy in soft actor-critic (SAC)-like methods.
It also initializes a learning rate scheduler, though currently it is set to `None`.
NOTE:
- If the encoder is shared, its parameters are excluded from the actor's optimization process.
- The log temperature (`log_alpha`) is wrapped in a list so it is optimized as a standalone tensor.
Returns:
A dictionary mapping component names ("actor", "critic", "temperature")
to their respective Adam optimizers.
"""
actor_params = self.policy.get_optim_params()["actor"]
self.optimizers = {
"actor": torch.optim.Adam(actor_params, lr=self.config.actor_lr),
"critic": torch.optim.Adam(self.critic_ensemble.parameters(), lr=self.config.critic_lr),
"temperature": torch.optim.Adam([self.log_alpha], lr=self.config.temperature_lr),
}
if self.policy_config.num_discrete_actions is not None:
self.optimizers["discrete_critic"] = torch.optim.Adam(
self.policy.discrete_critic.parameters(), lr=self.config.critic_lr
)
return self.optimizers
def get_optimizers(self) -> dict[str, Optimizer]:
return self.optimizers
def get_weights(self) -> dict[str, Any]:
"""Send actor + discrete-critic state dicts."""
state_dicts: dict[str, Any] = {
"policy": move_state_dict_to_device(self.policy.actor.state_dict(), device="cpu"),
}
if self.policy_config.num_discrete_actions is not None:
state_dicts["discrete_critic"] = move_state_dict_to_device(
self.policy.discrete_critic.state_dict(), device="cpu"
)
return state_dicts
def load_weights(self, weights: dict[str, Any], device: str | torch.device = "cpu") -> None:
"""Load actor + discrete-critic weights into the policy."""
self.policy.load_actor_weights(weights, device=device)
def get_observation_features(
self, observations: Tensor, next_observations: Tensor
) -> tuple[Tensor | None, Tensor | None]:
"""
Get observation features from the policy encoder. This acts as a cache: when the vision
encoder is frozen, the features do not change between updates, so caching them saves compute.
Args:
observations: The current observations
next_observations: The next observations
Returns:
tuple: observation_features, next_observation_features
"""
if self.policy.config.vision_encoder_name is None or not self.policy.config.freeze_vision_encoder:
return None, None
with torch.no_grad():
observation_features = self.policy.actor.encoder.get_cached_image_features(observations)
next_observation_features = self.policy.actor.encoder.get_cached_image_features(next_observations)
return observation_features, next_observation_features
class CriticHead(nn.Module):
def __init__(
self,
input_dim: int,
hidden_dims: list[int],
activations: Callable[[torch.Tensor], torch.Tensor] | str = nn.SiLU(),
activate_final: bool = False,
dropout_rate: float | None = None,
init_final: float | None = None,
final_activation: Callable[[torch.Tensor], torch.Tensor] | str | None = None,
):
super().__init__()
self.net = MLP(
input_dim=input_dim,
hidden_dims=hidden_dims,
activations=activations,
activate_final=activate_final,
dropout_rate=dropout_rate,
final_activation=final_activation,
)
self.output_layer = nn.Linear(in_features=hidden_dims[-1], out_features=1)
if init_final is not None:
nn.init.uniform_(self.output_layer.weight, -init_final, init_final)
nn.init.uniform_(self.output_layer.bias, -init_final, init_final)
else:
orthogonal_init()(self.output_layer.weight)
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self.output_layer(self.net(x))
class CriticEnsemble(nn.Module):
"""
CriticEnsemble wraps multiple CriticHead modules into an ensemble.
Args:
encoder (GaussianActorObservationEncoder): encoder for observations.
ensemble (List[CriticHead]): list of critic heads.
init_final (float | None): optional initializer scale for final layers.
Forward returns a tensor of shape (num_critics, batch_size) containing Q-values.
"""
def __init__(
self,
encoder: GaussianActorObservationEncoder,
ensemble: list[CriticHead],
init_final: float | None = None,
):
super().__init__()
self.encoder = encoder
self.init_final = init_final
self.critics = nn.ModuleList(ensemble)
def forward(
self,
observations: dict[str, torch.Tensor],
actions: torch.Tensor,
observation_features: torch.Tensor | None = None,
) -> torch.Tensor:
device = get_device_from_parameters(self)
# Move each tensor in observations to device
observations = {k: v.to(device) for k, v in observations.items()}
obs_enc = self.encoder(observations, cache=observation_features)
inputs = torch.cat([obs_enc, actions], dim=-1)
# Loop through critics and collect outputs
q_values = []
for critic in self.critics:
q_values.append(critic(inputs))
# Stack outputs to match expected shape [num_critics, batch_size]
q_values = torch.stack([q.squeeze(-1) for q in q_values], dim=0)
return q_values
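To make the ensemble's output contract concrete, here is a minimal sketch that wires a stub encoder into `CriticEnsemble`. It assumes the definitions above (`CriticHead`, `CriticEnsemble`, and module helpers such as `MLP`, `orthogonal_init`, and `get_device_from_parameters`) are importable; `_StubEncoder` is a made-up stand-in for `GaussianActorObservationEncoder`.

```python
# Sketch only: _StubEncoder is a hypothetical stand-in for GaussianActorObservationEncoder.
import torch
from torch import nn


class _StubEncoder(nn.Module):
    """Projects a 4-dim flat state to an 8-dim embedding; accepts the same (observations, cache) call."""

    def __init__(self, out_dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(4, out_dim)

    def forward(self, observations: dict[str, torch.Tensor], cache=None) -> torch.Tensor:
        return self.proj(observations["observation.state"])


num_critics, batch_size, action_dim = 2, 5, 3
heads = [CriticHead(input_dim=8 + action_dim, hidden_dims=[32, 32]) for _ in range(num_critics)]
ensemble = CriticEnsemble(encoder=_StubEncoder(), ensemble=heads)

observations = {"observation.state": torch.randn(batch_size, 4)}
actions = torch.randn(batch_size, action_dim)
q_values = ensemble(observations, actions)
assert q_values.shape == (num_critics, batch_size)  # one Q-value per critic per sample
```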
+3 -3
@@ -97,8 +97,8 @@ class ReplayBuffer:
Args:
capacity (int): Maximum number of transitions to store in the buffer.
device (str): The device where the tensors will be moved when sampling ("cuda:0" or "cpu").
state_keys (List[str]): The list of keys that appear in `state` and `next_state`.
image_augmentation_function (Optional[Callable]): A function that takes a batch of images
state_keys (list[str]): The list of keys that appear in `state` and `next_state`.
image_augmentation_function (Callable | None): A function that takes a batch of images
and returns a batch of augmented images. If None, a default augmentation function is used.
use_drq (bool): Whether to use the default DRQ image augmentation style, when sampling in the buffer.
storage_device: The device (e.g. "cpu" or "cuda:0") where the data will be stored.
@@ -634,7 +634,7 @@ class ReplayBuffer:
If None, you must handle or define default keys.
Returns:
transitions (List[Transition]):
transitions (list[Transition]):
A list of Transition dictionaries with the same length as `dataset`.
"""
if state_keys is None:
+2 -2
@@ -176,11 +176,11 @@ def convert_lerobot_dataset_to_cropped_lerobot_dataset(
Args:
original_dataset (LeRobotDataset): The source dataset.
crop_params_dict (Dict[str, Tuple[int, int, int, int]]):
crop_params_dict (dict[str, Tuple[int, int, int, int]]):
A dictionary mapping observation keys to crop parameters (top, left, height, width).
new_repo_id (str): Repository id for the new dataset.
new_dataset_root (str): The root directory where the new dataset will be written.
resize_size (Tuple[int, int], optional): The target size (height, width) after cropping.
resize_size (tuple[int, int], optional): The target size (height, width) after cropping.
Defaults to (128, 128).
Returns:
+17
@@ -0,0 +1,17 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .data_mixer import BatchType, DataMixer, OnlineOfflineMixer
__all__ = ["BatchType", "DataMixer", "OnlineOfflineMixer"]
+96
@@ -0,0 +1,96 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import abc
from lerobot.rl.algorithms.base import BatchType
from lerobot.rl.buffer import ReplayBuffer, concatenate_batch_transitions
class DataMixer(abc.ABC):
"""Abstract interface for all data mixing strategies."""
@abc.abstractmethod
def sample(self, batch_size: int) -> BatchType:
"""Draw one batch of ``batch_size`` transitions."""
...
def get_iterator(
self,
batch_size: int,
async_prefetch: bool = True,
queue_size: int = 2,
):
"""Infinite iterator that yields batches."""
while True:
yield self.sample(batch_size)
class OnlineOfflineMixer(DataMixer):
"""Mixes transitions from an online and an offline replay buffer."""
def __init__(
self,
online_buffer: ReplayBuffer,
offline_buffer: ReplayBuffer | None = None,
online_ratio: float = 1.0,
):
if not 0.0 <= online_ratio <= 1.0:
raise ValueError(f"online_ratio must be in [0, 1], got {online_ratio}")
self.online_buffer = online_buffer
self.offline_buffer = offline_buffer
self.online_ratio = online_ratio
def sample(self, batch_size: int) -> BatchType:
if self.offline_buffer is None:
return self.online_buffer.sample(batch_size)
n_online = max(1, int(batch_size * self.online_ratio))
n_offline = batch_size - n_online
online_batch = self.online_buffer.sample(n_online)
offline_batch = self.offline_buffer.sample(n_offline)
return concatenate_batch_transitions(online_batch, offline_batch)
def get_iterator(
self,
batch_size: int,
async_prefetch: bool = True,
queue_size: int = 2,
):
"""Yield batches by composing buffer async iterators."""
n_online = max(1, int(batch_size * self.online_ratio))
online_iter = self.online_buffer.get_iterator(
batch_size=n_online,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
if self.offline_buffer is None:
yield from online_iter
return
n_offline = batch_size - n_online
offline_iter = self.offline_buffer.get_iterator(
batch_size=n_offline,
async_prefetch=async_prefetch,
queue_size=queue_size,
)
while True:
yield concatenate_batch_transitions(next(online_iter), next(offline_iter))
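For context, a hedged wiring sketch of how the mixer might be used. The `OnlineOfflineMixer` and `ReplayBuffer` import paths come from the learner changes further down, but the `ReplayBuffer` constructor arguments are assumptions based on its docstring, not copied from this diff.

```python
# Assumption-laden sketch: ReplayBuffer kwargs are guessed from its docstring.
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.data_sources import OnlineOfflineMixer

online_buffer = ReplayBuffer(capacity=100_000, device="cpu", state_keys=["observation.state"])
offline_buffer = ReplayBuffer(capacity=100_000, device="cpu", state_keys=["observation.state"])

mixer = OnlineOfflineMixer(online_buffer=online_buffer, offline_buffer=offline_buffer, online_ratio=0.5)

# ...after transitions have been pushed into both buffers:
batch = mixer.sample(batch_size=256)  # 128 online + 128 offline transitions, concatenated
iterator = mixer.get_iterator(batch_size=256, async_prefetch=False)
first_batch = next(iterator)  # infinite stream of mixed batches
```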
+1 -1
@@ -17,9 +17,9 @@ import logging
from lerobot.cameras import opencv # noqa: F401
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.datasets import LeRobotDataset
from lerobot.policies import make_policy
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.robots import ( # noqa: F401
RobotConfig,
make_robot_from_config,
+68 -61
@@ -689,74 +689,81 @@ def control_loop(
episode_step = 0
episode_start_time = time.perf_counter()
while episode_idx < cfg.dataset.num_episodes_to_record:
step_start_time = time.perf_counter()
try:
while episode_idx < cfg.dataset.num_episodes_to_record:
step_start_time = time.perf_counter()
# Create a neutral action (no movement)
neutral_action = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32)
if use_gripper:
neutral_action = torch.cat([neutral_action, torch.tensor([1.0])]) # Gripper stay
# Use the new step function
transition = step_env_and_process_transition(
env=env,
transition=transition,
action=neutral_action,
env_processor=env_processor,
action_processor=action_processor,
)
terminated = transition.get(TransitionKey.DONE, False)
truncated = transition.get(TransitionKey.TRUNCATED, False)
if cfg.mode == "record":
observations = {
k: v.squeeze(0).cpu()
for k, v in transition[TransitionKey.OBSERVATION].items()
if isinstance(v, torch.Tensor)
}
# Use teleop_action if available, otherwise use the action from the transition
action_to_record = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"teleop_action", transition[TransitionKey.ACTION]
)
frame = {
**observations,
ACTION: action_to_record.cpu(),
REWARD: np.array([transition[TransitionKey.REWARD]], dtype=np.float32),
DONE: np.array([terminated or truncated], dtype=bool),
}
# Create a neutral action (no movement)
neutral_action = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32)
if use_gripper:
discrete_penalty = transition[TransitionKey.COMPLEMENTARY_DATA].get("discrete_penalty", 0.0)
frame["complementary_info.discrete_penalty"] = np.array([discrete_penalty], dtype=np.float32)
neutral_action = torch.cat([neutral_action, torch.tensor([1.0])]) # Gripper stay
if dataset is not None:
frame["task"] = cfg.dataset.task
dataset.add_frame(frame)
episode_step += 1
# Handle episode termination
if terminated or truncated:
episode_time = time.perf_counter() - episode_start_time
logging.info(
f"Episode ended after {episode_step} steps in {episode_time:.1f}s with reward {transition[TransitionKey.REWARD]}"
transition = step_env_and_process_transition(
env=env,
transition=transition,
action=neutral_action,
env_processor=env_processor,
action_processor=action_processor,
)
episode_step = 0
episode_idx += 1
terminated = transition.get(TransitionKey.DONE, False)
truncated = transition.get(TransitionKey.TRUNCATED, False)
if dataset is not None:
if transition[TransitionKey.INFO].get(TeleopEvents.RERECORD_EPISODE, False):
logging.info(f"Re-recording episode {episode_idx}")
dataset.clear_episode_buffer()
episode_idx -= 1
else:
logging.info(f"Saving episode {episode_idx}")
dataset.save_episode()
if cfg.mode == "record":
observations = {
k: v.squeeze(0).cpu()
for k, v in transition[TransitionKey.OBSERVATION].items()
if isinstance(v, torch.Tensor)
}
action_to_record = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"teleop_action", transition[TransitionKey.ACTION]
)
frame = {
**observations,
ACTION: action_to_record.cpu(),
REWARD: np.array([transition[TransitionKey.REWARD]], dtype=np.float32),
DONE: np.array([terminated or truncated], dtype=bool),
}
if use_gripper:
discrete_penalty = transition[TransitionKey.COMPLEMENTARY_DATA].get(
"discrete_penalty", 0.0
)
frame["complementary_info.discrete_penalty"] = np.array(
[discrete_penalty], dtype=np.float32
)
# Reset for new episode
transition = reset_and_build_transition(env, env_processor, action_processor)
if dataset is not None:
frame["task"] = cfg.dataset.task
dataset.add_frame(frame)
# Maintain fps timing
precise_sleep(max(dt - (time.perf_counter() - step_start_time), 0.0))
episode_step += 1
# Handle episode termination
if terminated or truncated:
episode_time = time.perf_counter() - episode_start_time
logging.info(
f"Episode ended after {episode_step} steps in {episode_time:.1f}s with reward {transition[TransitionKey.REWARD]}"
)
episode_step = 0
episode_idx += 1
if dataset is not None:
if transition[TransitionKey.INFO].get(TeleopEvents.RERECORD_EPISODE, False):
logging.info(f"Re-recording episode {episode_idx}")
dataset.clear_episode_buffer()
episode_idx -= 1
else:
logging.info(f"Saving episode {episode_idx}")
dataset.save_episode()
# Reset for new episode
transition = reset_and_build_transition(env, env_processor, action_processor)
# Maintain fps timing
precise_sleep(max(dt - (time.perf_counter() - step_start_time), 0.0))
finally:
if dataset is not None and dataset.writer is not None and dataset.writer.image_writer is not None:
logging.info("Waiting for image writer to finish...")
dataset.writer.image_writer.stop()
if dataset is not None and cfg.dataset.push_to_hub:
logging.info("Finalizing dataset before pushing to hub")
+55 -290
@@ -51,6 +51,7 @@ import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from pprint import pformat
from typing import Any
import grpc
import torch
@@ -68,10 +69,14 @@ from lerobot.common.train_utils import (
)
from lerobot.common.wandb_utils import WandBLogger
from lerobot.configs import parser
from lerobot.configs.train import TrainRLServerPipelineConfig
from lerobot.datasets import LeRobotDataset, make_dataset
from lerobot.policies import make_policy, make_pre_post_processors
from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.rl.algorithms.base import RLAlgorithm
from lerobot.rl.algorithms.factory import make_algorithm
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.data_sources import OnlineOfflineMixer
from lerobot.rl.train_rl import TrainRLServerPipelineConfig
from lerobot.rl.trainer import RLTrainer
from lerobot.robots import so_follower # noqa: F401
from lerobot.teleoperators import gamepad, so_leader # noqa: F401
from lerobot.teleoperators.utils import TeleopEvents
@@ -90,16 +95,14 @@ from lerobot.utils.constants import (
TRAINING_STATE_DIR,
)
from lerobot.utils.device_utils import get_safe_torch_device
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.random_utils import set_seed
from lerobot.utils.transition import move_state_dict_to_device, move_transition_to_device
from lerobot.utils.utils import (
format_big_number,
init_logging,
)
from .buffer import ReplayBuffer, concatenate_batch_transitions
from .learner_service import MAX_WORKERS, SHUTDOWN_TIMEOUT, LearnerService
from .process import ProcessSignalHandler
@parser.wrap()
@@ -179,7 +182,7 @@ def train(cfg: TrainRLServerPipelineConfig, job_name: str | None = None):
def start_learner_threads(
cfg: TrainRLServerPipelineConfig,
wandb_logger: WandBLogger | None,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
) -> None:
"""
Start the learner threads for training.
@@ -253,7 +256,7 @@ def start_learner_threads(
def add_actor_information_and_train(
cfg: TrainRLServerPipelineConfig,
wandb_logger: WandBLogger | None,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
transition_queue: Queue,
interaction_message_queue: Queue,
parameters_queue: Queue,
@@ -266,8 +269,8 @@ def add_actor_information_and_train(
- Transfers transitions from the actor to the replay buffer.
- Logs received interaction messages.
- Ensures training begins only when the replay buffer has a sufficient number of transitions.
- Samples batches from the replay buffer and performs multiple critic updates.
- Periodically updates the actor, critic, and temperature optimizers.
- Delegates training updates to an ``RLAlgorithm``.
- Periodically pushes updated weights to actors.
- Logs training statistics, including loss values and optimization frequency.
NOTE: This function doesn't have a single responsibility; it should be split into multiple functions
@@ -286,17 +289,13 @@ def add_actor_information_and_train(
# of 7%
device = get_safe_torch_device(try_device=cfg.policy.device, log=True)
storage_device = get_safe_torch_device(try_device=cfg.policy.storage_device)
clip_grad_norm_value = cfg.policy.grad_clip_norm
online_step_before_learning = cfg.policy.online_step_before_learning
utd_ratio = cfg.policy.utd_ratio
fps = cfg.env.fps
log_freq = cfg.log_freq
save_freq = cfg.save_freq
policy_update_freq = cfg.policy.policy_update_freq
policy_parameters_push_frequency = cfg.policy.actor_learner_config.policy_parameters_push_frequency
saving_checkpoint = cfg.save_checkpoint
online_steps = cfg.policy.online_steps
async_prefetch = cfg.policy.async_prefetch
# Initialize logging for multiprocessing
if not use_threads(cfg):
@@ -308,7 +307,7 @@ def add_actor_information_and_train(
logging.info("Initializing policy")
policy: SACPolicy = make_policy(
policy = make_policy(
cfg=cfg.policy,
env_cfg=cfg.env,
)
@@ -317,20 +316,17 @@ def add_actor_information_and_train(
policy.train()
preprocessor, _postprocessor = make_pre_post_processors(
algorithm = make_algorithm(cfg=cfg.algorithm, policy=policy)
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
dataset_stats=cfg.policy.dataset_stats,
)
push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)
# Push initial policy weights to actors
push_actor_policy_to_queue(parameters_queue=parameters_queue, algorithm=algorithm)
last_time_policy_pushed = time.time()
optimizers, lr_scheduler = make_optimizers_and_scheduler(cfg=cfg, policy=policy)
# If we are resuming, we need to load the training state
resume_optimization_step, resume_interaction_step = load_training_state(cfg=cfg, optimizers=optimizers)
log_training_info(cfg=cfg, policy=policy)
replay_buffer = initialize_replay_buffer(cfg, device, storage_device)
@@ -343,21 +339,35 @@ def add_actor_information_and_train(
device=device,
storage_device=storage_device,
)
batch_size: int = batch_size // 2 # We will sample from both replay buffer
# DataMixer: online-only or online/offline 50-50 mix
data_mixer = OnlineOfflineMixer(
online_buffer=replay_buffer,
offline_buffer=offline_replay_buffer,
online_ratio=cfg.online_ratio,
)
# RLTrainer owns the iterator, preprocessor, and creates optimizers.
trainer = RLTrainer(
algorithm=algorithm,
data_mixer=data_mixer,
batch_size=batch_size,
preprocessor=preprocessor,
)
# If we are resuming, we need to load the training state
optimizers = algorithm.get_optimizers()
resume_optimization_step, resume_interaction_step = load_training_state(cfg=cfg, optimizers=optimizers)
logging.info("Starting learner thread")
interaction_message = None
optimization_step = resume_optimization_step if resume_optimization_step is not None else 0
algorithm.optimization_step = optimization_step
interaction_step_shift = resume_interaction_step if resume_interaction_step is not None else 0
dataset_repo_id = None
if cfg.dataset is not None:
dataset_repo_id = cfg.dataset.repo_id
# Initialize iterators
online_iterator = None
offline_iterator = None
# NOTE: THIS IS THE MAIN LOOP OF THE LEARNER
while True:
# Exit the training loop if shutdown is requested
@@ -370,7 +380,6 @@ def add_actor_information_and_train(
transition_queue=transition_queue,
replay_buffer=replay_buffer,
offline_replay_buffer=offline_replay_buffer,
device=device,
dataset_repo_id=dataset_repo_id,
shutdown_event=shutdown_event,
)
@@ -387,180 +396,20 @@ def add_actor_information_and_train(
if len(replay_buffer) < online_step_before_learning:
continue
if online_iterator is None:
online_iterator = replay_buffer.get_iterator(
batch_size=batch_size, async_prefetch=async_prefetch, queue_size=2
)
if offline_replay_buffer is not None and offline_iterator is None:
offline_iterator = offline_replay_buffer.get_iterator(
batch_size=batch_size, async_prefetch=async_prefetch, queue_size=2
)
time_for_one_optimization_step = time.time()
for _ in range(utd_ratio - 1):
# Sample from the iterators
batch = next(online_iterator)
if dataset_repo_id is not None:
batch_offline = next(offline_iterator)
batch = concatenate_batch_transitions(
left_batch_transitions=batch, right_batch_transition=batch_offline
)
actions = batch[ACTION]
rewards = batch["reward"]
observations = preprocessor.process_observation(batch["state"])
next_observations = preprocessor.process_observation(batch["next_state"])
done = batch["done"]
check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
observation_features, next_observation_features = get_observation_features(
policy=policy, observations=observations, next_observations=next_observations
)
# Create a batch dictionary with all required elements for the forward method
forward_batch = {
ACTION: actions,
"reward": rewards,
"state": observations,
"next_state": next_observations,
"done": done,
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
"complementary_info": batch["complementary_info"],
}
# Use the forward method for critic loss
critic_output = policy.forward(forward_batch, model="critic")
# Main critic optimization
loss_critic = critic_output["loss_critic"]
optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.critic_ensemble.parameters(), max_norm=clip_grad_norm_value
)
optimizers["critic"].step()
# Discrete critic optimization (if available)
if policy.config.num_discrete_actions is not None:
discrete_critic_output = policy.forward(forward_batch, model="discrete_critic")
loss_discrete_critic = discrete_critic_output["loss_discrete_critic"]
optimizers["discrete_critic"].zero_grad()
loss_discrete_critic.backward()
discrete_critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.discrete_critic.parameters(), max_norm=clip_grad_norm_value
)
optimizers["discrete_critic"].step()
# Update target networks (main and discrete)
policy.update_target_networks()
# Sample for the last update in the UTD ratio
batch = next(online_iterator)
if dataset_repo_id is not None:
batch_offline = next(offline_iterator)
batch = concatenate_batch_transitions(
left_batch_transitions=batch, right_batch_transition=batch_offline
)
actions = batch[ACTION]
rewards = batch["reward"]
observations = preprocessor.process_observation(batch["state"])
next_observations = preprocessor.process_observation(batch["next_state"])
done = batch["done"]
check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
observation_features, next_observation_features = get_observation_features(
policy=policy, observations=observations, next_observations=next_observations
)
# Create a batch dictionary with all required elements for the forward method
forward_batch = {
ACTION: actions,
"reward": rewards,
"state": observations,
"next_state": next_observations,
"done": done,
"observation_feature": observation_features,
"next_observation_feature": next_observation_features,
}
critic_output = policy.forward(forward_batch, model="critic")
loss_critic = critic_output["loss_critic"]
optimizers["critic"].zero_grad()
loss_critic.backward()
critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.critic_ensemble.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["critic"].step()
# Initialize training info dictionary
training_infos = {
"loss_critic": loss_critic.item(),
"critic_grad_norm": critic_grad_norm,
}
# Discrete critic optimization (if available)
if policy.config.num_discrete_actions is not None:
discrete_critic_output = policy.forward(forward_batch, model="discrete_critic")
loss_discrete_critic = discrete_critic_output["loss_discrete_critic"]
optimizers["discrete_critic"].zero_grad()
loss_discrete_critic.backward()
discrete_critic_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.discrete_critic.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["discrete_critic"].step()
# Add discrete critic info to training info
training_infos["loss_discrete_critic"] = loss_discrete_critic.item()
training_infos["discrete_critic_grad_norm"] = discrete_critic_grad_norm
# Actor and temperature optimization (at specified frequency)
if optimization_step % policy_update_freq == 0:
for _ in range(policy_update_freq):
# Actor optimization
actor_output = policy.forward(forward_batch, model="actor")
loss_actor = actor_output["loss_actor"]
optimizers["actor"].zero_grad()
loss_actor.backward()
actor_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=policy.actor.parameters(), max_norm=clip_grad_norm_value
).item()
optimizers["actor"].step()
# Add actor info to training info
training_infos["loss_actor"] = loss_actor.item()
training_infos["actor_grad_norm"] = actor_grad_norm
# Temperature optimization
temperature_output = policy.forward(forward_batch, model="temperature")
loss_temperature = temperature_output["loss_temperature"]
optimizers["temperature"].zero_grad()
loss_temperature.backward()
temp_grad_norm = torch.nn.utils.clip_grad_norm_(
parameters=[policy.log_alpha], max_norm=clip_grad_norm_value
).item()
optimizers["temperature"].step()
# Add temperature info to training info
training_infos["loss_temperature"] = loss_temperature.item()
training_infos["temperature_grad_norm"] = temp_grad_norm
training_infos["temperature"] = policy.temperature
# One training step (trainer owns data_mixer iterator; algorithm owns UTD loop)
stats = trainer.training_step()
# Push policy to actors if needed
if time.time() - last_time_policy_pushed > policy_parameters_push_frequency:
push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)
push_actor_policy_to_queue(parameters_queue=parameters_queue, algorithm=algorithm)
last_time_policy_pushed = time.time()
# Update target networks (main and discrete)
policy.update_target_networks()
training_infos = stats.to_log_dict()
# Log training metrics at specified intervals
optimization_step = algorithm.optimization_step
if optimization_step % log_freq == 0:
training_infos["replay_buffer_size"] = len(replay_buffer)
if offline_replay_buffer is not None:
@@ -588,7 +437,6 @@ def add_actor_information_and_train(
custom_step_key="Optimization step",
)
optimization_step += 1
if optimization_step % log_freq == 0:
logging.info(f"[LEARNER] Number of optimization step: {optimization_step}")
@@ -605,6 +453,8 @@ def add_actor_information_and_train(
offline_replay_buffer=offline_replay_buffer,
dataset_repo_id=dataset_repo_id,
fps=fps,
preprocessor=preprocessor,
postprocessor=postprocessor,
)
@@ -612,7 +462,7 @@ def start_learner(
parameters_queue: Queue,
transition_queue: Queue,
interaction_message_queue: Queue,
shutdown_event: any, # Event,
shutdown_event: Any, # Event
cfg: TrainRLServerPipelineConfig,
):
"""
@@ -689,6 +539,8 @@ def save_training_checkpoint(
offline_replay_buffer: ReplayBuffer | None = None,
dataset_repo_id: str | None = None,
fps: int = 30,
preprocessor=None,
postprocessor=None,
) -> None:
"""
Save training checkpoint and associated data.
@@ -712,6 +564,8 @@ def save_training_checkpoint(
offline_replay_buffer: Optional offline replay buffer to save
dataset_repo_id: Repository ID for dataset
fps: Frames per second for dataset
preprocessor: Optional preprocessor pipeline to save
postprocessor: Optional postprocessor pipeline to save
"""
logging.info(f"Checkpoint policy after step {optimization_step}")
_num_digits = max(6, len(str(online_steps)))
@@ -728,6 +582,8 @@ def save_training_checkpoint(
policy=policy,
optimizer=optimizers,
scheduler=None,
preprocessor=preprocessor,
postprocessor=postprocessor,
)
# Save interaction step manually
@@ -765,58 +621,6 @@ def save_training_checkpoint(
logging.info("Resume training")
def make_optimizers_and_scheduler(cfg: TrainRLServerPipelineConfig, policy: nn.Module):
"""
Creates and returns optimizers for the actor, critic, and temperature components of a reinforcement learning policy.
This function sets up Adam optimizers for:
- The **actor network**, ensuring that only relevant parameters are optimized.
- The **critic ensemble**, which evaluates the value function.
- The **temperature parameter**, which controls the entropy in soft actor-critic (SAC)-like methods.
It also initializes a learning rate scheduler, though currently, it is set to `None`.
NOTE:
- If the encoder is shared, its parameters are excluded from the actor's optimization process.
- The policy's log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
Args:
cfg: Configuration object containing hyperparameters.
policy (nn.Module): The policy model containing the actor, critic, and temperature components.
Returns:
Tuple[Dict[str, torch.optim.Optimizer], Optional[torch.optim.lr_scheduler._LRScheduler]]:
A tuple containing:
- `optimizers`: A dictionary mapping component names ("actor", "critic", "temperature") to their respective Adam optimizers.
- `lr_scheduler`: Currently set to `None` but can be extended to support learning rate scheduling.
"""
optimizer_actor = torch.optim.Adam(
params=[
p
for n, p in policy.actor.named_parameters()
if not policy.config.shared_encoder or not n.startswith("encoder")
],
lr=cfg.policy.actor_lr,
)
optimizer_critic = torch.optim.Adam(params=policy.critic_ensemble.parameters(), lr=cfg.policy.critic_lr)
if cfg.policy.num_discrete_actions is not None:
optimizer_discrete_critic = torch.optim.Adam(
params=policy.discrete_critic.parameters(), lr=cfg.policy.critic_lr
)
optimizer_temperature = torch.optim.Adam(params=[policy.log_alpha], lr=cfg.policy.critic_lr)
lr_scheduler = None
optimizers = {
"actor": optimizer_actor,
"critic": optimizer_critic,
"temperature": optimizer_temperature,
}
if cfg.policy.num_discrete_actions is not None:
optimizers["discrete_critic"] = optimizer_discrete_critic
return optimizers, lr_scheduler
# Training setup functions
@@ -1021,33 +825,6 @@ def initialize_offline_replay_buffer(
# Utilities/Helpers functions
def get_observation_features(
policy: SACPolicy, observations: torch.Tensor, next_observations: torch.Tensor
) -> tuple[torch.Tensor | None, torch.Tensor | None]:
"""
Get observation features from the policy encoder. It act as cache for the observation features.
when the encoder is frozen, the observation features are not updated.
We can save compute by caching the observation features.
Args:
policy: The policy model
observations: The current observations
next_observations: The next observations
Returns:
tuple: observation_features, next_observation_features
"""
if policy.config.vision_encoder_name is None or not policy.config.freeze_vision_encoder:
return None, None
with torch.no_grad():
observation_features = policy.actor.encoder.get_cached_image_features(observations)
next_observation_features = policy.actor.encoder.get_cached_image_features(next_observations)
return observation_features, next_observation_features
def use_threads(cfg: TrainRLServerPipelineConfig) -> bool:
return cfg.policy.concurrency.learner == "threads"
@@ -1098,19 +875,11 @@ def check_nan_in_transition(
return nan_detected
def push_actor_policy_to_queue(parameters_queue: Queue, policy: nn.Module):
def push_actor_policy_to_queue(parameters_queue: Queue, algorithm: RLAlgorithm) -> None:
logging.debug("[LEARNER] Pushing actor policy to the queue")
# Create a dictionary to hold all the state dicts
state_dicts = {"policy": move_state_dict_to_device(policy.actor.state_dict(), device="cpu")}
# Add discrete critic if it exists
if hasattr(policy, "discrete_critic") and policy.discrete_critic is not None:
state_dicts["discrete_critic"] = move_state_dict_to_device(
policy.discrete_critic.state_dict(), device="cpu"
)
logging.debug("[LEARNER] Including discrete critic in state dict push")
state_dicts = algorithm.get_weights()
state_bytes = state_to_bytes(state_dicts)
parameters_queue.put(state_bytes)
@@ -1134,9 +903,8 @@ def process_transitions(
transition_queue: Queue,
replay_buffer: ReplayBuffer,
offline_replay_buffer: ReplayBuffer,
device: str,
dataset_repo_id: str | None,
shutdown_event: any,
shutdown_event: Any, # Event
):
"""Process all available transitions from the queue.
@@ -1144,7 +912,6 @@ def process_transitions(
transition_queue: Queue for receiving transitions from the actor
replay_buffer: Replay buffer to add transitions to
offline_replay_buffer: Offline replay buffer to add transitions to
device: Device to move transitions to
dataset_repo_id: Repository ID for dataset
shutdown_event: Event to signal shutdown
"""
@@ -1153,8 +920,6 @@ def process_transitions(
transition_list = bytes_to_transitions(buffer=transition_list)
for transition in transition_list:
transition = move_transition_to_device(transition=transition, device=device)
# Skip transitions with NaN values
if check_nan_in_transition(
observations=transition["state"],
@@ -1177,7 +942,7 @@ def process_interaction_messages(
interaction_message_queue: Queue,
interaction_step_shift: int,
wandb_logger: WandBLogger | None,
shutdown_event: any,
shutdown_event: Any, # Event
) -> dict | None:
"""Process all available interaction messages from the queue.
+49
@@ -0,0 +1,49 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Top-level pipeline config for distributed RL training (actor / learner)."""
from __future__ import annotations
from dataclasses import dataclass
from lerobot.configs.default import DatasetConfig
from lerobot.configs.train import TrainPipelineConfig
from lerobot.rl.algorithms.configs import RLAlgorithmConfig
from lerobot.rl.algorithms.factory import make_algorithm_config
from lerobot.rl.algorithms.sac import SACAlgorithmConfig # noqa: F401
@dataclass(kw_only=True)
class TrainRLServerPipelineConfig(TrainPipelineConfig):
# NOTE: In RL, we don't need an offline dataset
# TODO: Make `TrainPipelineConfig.dataset` optional
dataset: DatasetConfig | None = None # type: ignore[assignment] # because the parent class made its type non-optional
# Algorithm config.
algorithm: RLAlgorithmConfig | None = None
# Data mixer strategy name. Currently supports "online_offline".
mixer: str = "online_offline"
# Fraction sampled from online replay when using OnlineOfflineMixer.
online_ratio: float = 0.5
def validate(self) -> None:
super().validate()
if self.algorithm is None:
self.algorithm = make_algorithm_config("sac")
if getattr(self.algorithm, "policy_config", None) is None:
self.algorithm.policy_config = self.policy
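A small illustration of the fallback in ``validate()``: when no algorithm is supplied, the factory resolves the ``"sac"`` name to its config class. This is a sketch; it only assumes that ``make_algorithm_config("sac")`` returns an instance of the ``SACAlgorithmConfig`` imported above.

```python
# Sketch of the default resolution performed in validate() above.
from lerobot.rl.algorithms.factory import make_algorithm_config
from lerobot.rl.algorithms.sac import SACAlgorithmConfig

algo_cfg = make_algorithm_config("sac")
assert isinstance(algo_cfg, SACAlgorithmConfig)  # assumed mapping of the "sac" name
```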
+99
@@ -0,0 +1,99 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from collections.abc import Iterator
from typing import Any
from lerobot.rl.algorithms.base import BatchType, RLAlgorithm
from lerobot.rl.algorithms.configs import TrainingStats
from lerobot.rl.data_sources.data_mixer import DataMixer
class RLTrainer:
"""Unified training step orchestrator.
Holds the algorithm, a DataMixer, and an optional preprocessor.
"""
def __init__(
self,
algorithm: RLAlgorithm,
data_mixer: DataMixer,
batch_size: int,
*,
preprocessor: Any | None = None,
):
self.algorithm = algorithm
self.data_mixer = data_mixer
self.batch_size = batch_size
self._preprocessor = preprocessor
self._iterator: Iterator[BatchType] | None = None
self.algorithm.make_optimizers_and_scheduler()
def _build_data_iterator(self) -> Iterator[BatchType]:
"""Create a fresh algorithm-configured iterator (optionally preprocessed)."""
raw = self.algorithm.configure_data_iterator(
data_mixer=self.data_mixer,
batch_size=self.batch_size,
)
if self._preprocessor is not None:
return _PreprocessedIterator(raw, self._preprocessor)
return raw
def reset_data_iterator(self) -> None:
"""Discard the current iterator so it will be rebuilt lazily next step."""
self._iterator = None
def set_data_mixer(self, data_mixer: DataMixer, *, reset: bool = True) -> None:
"""Swap the active data mixer, optionally resetting the iterator."""
self.data_mixer = data_mixer
if reset:
self.reset_data_iterator()
def training_step(self) -> TrainingStats:
"""Run one training step (algorithm-agnostic)."""
if self._iterator is None:
self._iterator = self._build_data_iterator()
return self.algorithm.update(self._iterator)
def preprocess_rl_batch(preprocessor: Any, batch: BatchType) -> BatchType:
"""Apply policy preprocessing to RL observations only."""
observations = batch["state"]
next_observations = batch["next_state"]
batch["state"] = preprocessor.process_observation(observations)
batch["next_state"] = preprocessor.process_observation(next_observations)
return batch
class _PreprocessedIterator:
"""Iterator wrapper that preprocesses each sampled RL batch."""
__slots__ = ("_raw", "_preprocessor")
def __init__(self, raw_iterator: Iterator[BatchType], preprocessor: Any) -> None:
self._raw = raw_iterator
self._preprocessor = preprocessor
def __iter__(self) -> _PreprocessedIterator:
return self
def __next__(self) -> BatchType:
batch = next(self._raw)
return preprocess_rl_batch(self._preprocessor, batch)
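To make the learner-side wiring concrete, here is a hedged sketch of a training loop built from the same pieces the learner now uses (``make_algorithm``, ``OnlineOfflineMixer``, ``RLTrainer``); the ``cfg``, ``policy``, buffers, and ``preprocessor`` are assumed to already exist in the calling code.

```python
# Sketch mirroring the learner changes; cfg, policy, replay_buffer,
# offline_replay_buffer, and preprocessor are assumed to be in scope.
from lerobot.rl.algorithms.factory import make_algorithm
from lerobot.rl.data_sources import OnlineOfflineMixer
from lerobot.rl.trainer import RLTrainer

algorithm = make_algorithm(cfg=cfg.algorithm, policy=policy)
data_mixer = OnlineOfflineMixer(
    online_buffer=replay_buffer,
    offline_buffer=offline_replay_buffer,  # may be None for online-only training
    online_ratio=cfg.online_ratio,
)
trainer = RLTrainer(algorithm=algorithm, data_mixer=data_mixer, batch_size=256, preprocessor=preprocessor)

stats = trainer.training_step()  # one step; the algorithm runs its own UTD loop internally
print(stats.to_log_dict())       # same dict the learner forwards to its logger
```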
@@ -18,6 +18,7 @@ from dataclasses import dataclass, field
from typing import Any
import numpy as np
import torch
from lerobot.configs import FeatureType, PipelineFeatureType, PolicyFeature
from lerobot.model import RobotKinematics
@@ -31,6 +32,7 @@ from lerobot.processor import (
RobotObservation,
TransitionKey,
)
from lerobot.utils.constants import OBS_STATE
from lerobot.utils.rotation import Rotation
@@ -126,9 +128,18 @@ class EEReferenceAndDelta(RobotActionProcessorStep):
],
dtype=float,
)
r_abs = Rotation.from_rotvec([wx, wy, wz]).as_matrix()
delta_r = np.array(
[
wx * self.end_effector_step_sizes.get("wx", 1),
wy * self.end_effector_step_sizes.get("wy", 1),
wz * self.end_effector_step_sizes.get("wz", 1),
],
dtype=float,
)
r_mat = Rotation.from_rotvec(delta_r).as_matrix()
desired = np.eye(4, dtype=float)
desired[:3, :3] = ref[:3, :3] @ r_abs
desired[:3, :3] = ref[:3, :3] @ r_mat
desired[:3, 3] = ref[:3, 3] + delta_p
self._command_when_disabled = desired.copy()
@@ -361,6 +372,8 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
clip_min: float = 0.0
clip_max: float = 100.0
discrete_gripper: bool = False
scale_velocity: bool = False
use_ik_solution: bool = False
def action(self, action: RobotAction) -> RobotAction:
observation = self.transition.get(TransitionKey.OBSERVATION).copy()
@@ -370,14 +383,17 @@ class GripperVelocityToJoint(RobotActionProcessorStep):
if observation is None:
raise ValueError("Joints observation is require for computing robot kinematics")
q_raw = np.array(
[float(v) for k, v in observation.items() if isinstance(k, str) and k.endswith(".pos")],
dtype=float,
)
if self.use_ik_solution and "IK_solution" in self.transition.get(TransitionKey.COMPLEMENTARY_DATA):
q_raw = self.transition.get(TransitionKey.COMPLEMENTARY_DATA)["IK_solution"]
else:
q_raw = np.array(
[float(v) for k, v in observation.items() if isinstance(k, str) and k.endswith(".pos")],
dtype=float,
)
if q_raw is None:
raise ValueError("Joints observation is require for computing robot kinematics")
if self.discrete_gripper:
if self.discrete_gripper or self.scale_velocity:
# Map discrete command {0=close, 1=stay, 2=open} -> signed velocity.
# Negation accounts for SO100 sign (joint position increases on close).
# 0 -> +clip_max (close), 1 -> 0 (stay), 2 -> -clip_max (open)
@@ -579,6 +595,7 @@ class InverseKinematicsRLStep(ProcessorStep):
# Compute inverse kinematics
q_target = self.kinematics.inverse_kinematics(self.q_curr, t_des)
q_target[-1] = gripper_pos # Set gripper position
self.q_curr = q_target
# TODO: This is sensitive to the order of the motor_names = q_target mapping
@@ -610,3 +627,50 @@ class InverseKinematicsRLStep(ProcessorStep):
def reset(self):
"""Resets the initial guess for the IK solver."""
self.q_curr = None
@dataclass
@ProcessorStepRegistry.register("ee_observation")
class EEObservationStep(ObservationProcessorStep):
use_rotation: bool = False
def observation(self, observation: dict) -> dict:
ee_pose_list = [
observation["ee.x"],
observation["ee.y"],
observation["ee.z"],
]
if self.use_rotation:
ee_pose_list.extend(
[
observation["ee.wx"],
observation["ee.wy"],
observation["ee.wz"],
]
)
# gripper_pos = action.pop("ee.gripper_pos")
ee_pose = torch.tensor(ee_pose_list, dtype=torch.float32).unsqueeze(0)
current_state = observation.get(OBS_STATE)
if current_state is None:
return observation
extended_state = torch.cat([current_state, ee_pose], dim=-1)
# Create new observation dict
new_observation = dict(observation)
new_observation[OBS_STATE] = extended_state
return new_observation
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
if OBS_STATE in features[PipelineFeatureType.OBSERVATION]:
original_feature = features[PipelineFeatureType.OBSERVATION][OBS_STATE]
ee_dims = 6 if self.use_rotation else 3 # (x, y, z) plus optional (wx, wy, wz)
new_shape = (original_feature.shape[0] + ee_dims,) + original_feature.shape[1:]
features[PipelineFeatureType.OBSERVATION][OBS_STATE] = PolicyFeature(
type=original_feature.type, shape=new_shape
)
return features
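A minimal sketch of what the step does to the flat state, assuming the base ``ObservationProcessorStep`` needs no extra constructor arguments and that ``OBS_STATE`` keys the flat proprioceptive tensor as above; the numbers are illustrative.

```python
# Illustration only: the ee.* values and the 6-dim state are made up.
import torch

step = EEObservationStep(use_rotation=False)
obs = {
    "ee.x": 0.21,
    "ee.y": -0.05,
    "ee.z": 0.13,
    OBS_STATE: torch.zeros(1, 6),  # e.g. six joint positions
}
new_obs = step.observation(obs)
assert new_obs[OBS_STATE].shape == (1, 9)  # state extended with (x, y, z)
```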
@@ -168,6 +168,12 @@ class SOFollower(Robot):
self.bus.write("Protection_Current", motor, 250) # 50% of max current to avoid burnout
self.bus.write("Overload_Torque", motor, 25) # 25% torque when overloaded
# Set Goal_Position = Present_Position while torque is still disabled so
# that when torque is re-enabled at the end of this block the motors have
# zero positional error and do not snap to a stale register value.
present = self.bus.sync_read("Present_Position")
self.bus.sync_write("Goal_Position", present)
def setup_motors(self) -> None:
for motor in reversed(self.bus.motors):
input(f"Connect the controller board to the '{motor}' motor only and press enter.")
+87
@@ -0,0 +1,87 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Policy deployment engine with pluggable rollout strategies."""
from lerobot.utils.import_utils import require_package
require_package("datasets", extra="dataset")
from .configs import (
BaseStrategyConfig,
DAggerKeyboardConfig,
DAggerPedalConfig,
DAggerStrategyConfig,
HighlightStrategyConfig,
RolloutConfig,
RolloutStrategyConfig,
SentryStrategyConfig,
)
from .context import (
DatasetContext,
HardwareContext,
PolicyContext,
ProcessorContext,
RolloutContext,
RuntimeContext,
build_rollout_context,
)
from .inference import (
InferenceEngine,
InferenceEngineConfig,
RTCInferenceConfig,
RTCInferenceEngine,
SyncInferenceConfig,
SyncInferenceEngine,
create_inference_engine,
)
from .strategies import (
BaseStrategy,
DAggerStrategy,
HighlightStrategy,
RolloutStrategy,
SentryStrategy,
create_strategy,
)
__all__ = [
"BaseStrategy",
"BaseStrategyConfig",
"DAggerKeyboardConfig",
"DAggerPedalConfig",
"DAggerStrategy",
"DAggerStrategyConfig",
"DatasetContext",
"HardwareContext",
"HighlightStrategy",
"HighlightStrategyConfig",
"InferenceEngine",
"InferenceEngineConfig",
"PolicyContext",
"ProcessorContext",
"RTCInferenceConfig",
"RTCInferenceEngine",
"RolloutConfig",
"RolloutContext",
"RolloutStrategy",
"RolloutStrategyConfig",
"RuntimeContext",
"SentryStrategy",
"SentryStrategyConfig",
"SyncInferenceConfig",
"SyncInferenceEngine",
"build_rollout_context",
"create_inference_engine",
"create_strategy",
]
+323
@@ -0,0 +1,323 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Configuration dataclasses for the rollout deployment engine."""
from __future__ import annotations
import abc
import logging
from dataclasses import dataclass, field
import draccus
from lerobot.configs import PreTrainedConfig, parser
from lerobot.configs.dataset import DatasetRecordConfig
from lerobot.robots.config import RobotConfig
from lerobot.teleoperators.config import TeleoperatorConfig
from lerobot.utils.device_utils import auto_select_torch_device, is_torch_device_available
from .inference import InferenceEngineConfig, SyncInferenceConfig
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Strategy configs (polymorphic dispatch via draccus ChoiceRegistry)
# ---------------------------------------------------------------------------
@dataclass
class RolloutStrategyConfig(draccus.ChoiceRegistry, abc.ABC):
"""Abstract base for rollout strategy configurations.
Use ``--strategy.type=<name>`` on the CLI to select a strategy.
"""
@property
def type(self) -> str:
return self.get_choice_name(self.__class__)
@RolloutStrategyConfig.register_subclass("base")
@dataclass
class BaseStrategyConfig(RolloutStrategyConfig):
"""Autonomous rollout with no data recording."""
pass
@RolloutStrategyConfig.register_subclass("sentry")
@dataclass
class SentryStrategyConfig(RolloutStrategyConfig):
"""Continuous autonomous rollout with always-on recording.
Episode duration is derived from camera resolution, FPS, and
``target_video_file_size_mb`` so that each saved episode produces a
video file that has crossed the target size. This aligns episode
boundaries with the dataset's video file chunking, so each
``push_to_hub`` call uploads complete video files rather than
re-uploading a growing file that hasn't crossed the chunk boundary.
"""
upload_every_n_episodes: int = 5
# Target video file size in MB for episode rotation. Episodes are
# saved once the estimated video duration would exceed this limit.
# Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when set to None.
target_video_file_size_mb: int | None = None
@RolloutStrategyConfig.register_subclass("highlight")
@dataclass
class HighlightStrategyConfig(RolloutStrategyConfig):
"""Autonomous rollout with on-demand recording via ring buffer.
A memory-bounded ring buffer continuously captures telemetry. When
the user presses the save key, the buffer contents are flushed to
the dataset and live recording continues until the key is pressed
again.
"""
ring_buffer_seconds: float = 10.0
ring_buffer_max_memory_mb: int = 1024
save_key: str = "s"
push_key: str = "h"
@dataclass
class DAggerKeyboardConfig:
"""Keyboard key bindings for DAgger controls.
Keys are specified as single characters (e.g. ``"c"``, ``"h"``) or
special key names (``"space"``).
"""
pause_resume: str = "space"
correction: str = "tab"
upload: str = "enter"
@dataclass
class DAggerPedalConfig:
"""Foot pedal configuration for DAgger controls.
Pedal codes are evdev key code strings (e.g. ``"KEY_A"``).
"""
device_path: str = "/dev/input/by-id/usb-PCsensor_FootSwitch-event-kbd"
pause_resume: str = "KEY_A"
correction: str = "KEY_B"
upload: str = "KEY_C"
@RolloutStrategyConfig.register_subclass("dagger")
@dataclass
class DAggerStrategyConfig(RolloutStrategyConfig):
"""Human-in-the-loop data collection (DAgger / RaC).
Alternates between autonomous policy execution and human intervention.
Intervention frames are tagged with ``intervention=True``.
Input is controlled via either a keyboard or foot pedal, selected by
``input_device``. Each device exposes three actions:
1. **pause_resume**: toggle policy execution on/off.
2. **correction**: toggle human correction recording.
3. **upload**: push dataset to hub on demand (corrections-only mode).
When ``record_autonomous=False`` (default), only human-correction windows
are recorded, and each correction becomes its own episode. Set to ``True``
to record both autonomous and correction frames with size-based episode
rotation (same as Sentry) and background uploading. ``push_to_hub`` is
blocked while a correction is in progress.
"""
# Number of correction episodes to collect (corrections-only mode).
# When None, falls back to ``--dataset.num_episodes``.
num_episodes: int | None = None
record_autonomous: bool = False
upload_every_n_episodes: int = 5
# Target video file size in MB for episode rotation (record_autonomous
# mode only). Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when None.
target_video_file_size_mb: int | None = None
input_device: str = "keyboard"
keyboard: DAggerKeyboardConfig = field(default_factory=DAggerKeyboardConfig)
pedal: DAggerPedalConfig = field(default_factory=DAggerPedalConfig)
def __post_init__(self):
if self.input_device not in ("keyboard", "pedal"):
raise ValueError(f"DAgger input_device must be 'keyboard' or 'pedal', got '{self.input_device}'")
# ---------------------------------------------------------------------------
# Top-level rollout config
# ---------------------------------------------------------------------------
@dataclass
class RolloutConfig:
"""Top-level configuration for the ``lerobot-rollout`` CLI.
Combines hardware, policy, strategy, and runtime settings. The
``__post_init__`` method performs fail-fast validation to reject
invalid flag combinations early.
"""
# Hardware
robot: RobotConfig | None = None
teleop: TeleoperatorConfig | None = None
# Policy (loaded from --policy.path via __post_init__)
policy: PreTrainedConfig | None = None
# Strategy (polymorphic: --strategy.type=base|sentry|highlight|dagger)
strategy: RolloutStrategyConfig = field(default_factory=BaseStrategyConfig)
# Inference backend (polymorphic: --inference.type=sync|rtc)
inference: InferenceEngineConfig = field(default_factory=SyncInferenceConfig)
# Dataset (required for sentry, highlight, dagger; None for base)
dataset: DatasetRecordConfig | None = None
# Runtime
fps: float = 30.0
duration: float = 0.0 # 0 = infinite (24/7 mode)
interpolation_multiplier: int = 1
device: str | None = None
task: str = ""
display_data: bool = False
# Display data on a remote Rerun server
display_ip: str | None = None
# Port of the remote Rerun server
display_port: int | None = None
# Whether to display compressed images in Rerun
display_compressed_images: bool = False
# Use vocal synthesis to read events
play_sounds: bool = True
resume: bool = False
# Rename map for mapping robot/dataset observation keys to policy keys
rename_map: dict[str, str] = field(default_factory=dict)
# Hardware teardown
# When True (default), smoothly interpolate the robot back to the joint
# positions captured at startup before disconnecting. Set to False to
# leave the robot in its final achieved pose at shutdown.
return_to_initial_position: bool = True
# Torch compile
use_torch_compile: bool = False
torch_compile_backend: str = "inductor"
torch_compile_mode: str = "default"
compile_warmup_inferences: int = 2
def __post_init__(self):
"""Validate config invariants and load the policy config from ``--policy.path``."""
# --- Strategy-specific validation ---
if isinstance(self.strategy, DAggerStrategyConfig) and self.teleop is None:
raise ValueError("DAgger strategy requires --teleop.type to be set")
# TODO(Steven): DAgger shouldn't require a dataset (user may want to just rollout+intervene without recording), but for now we require it to simplify the implementation.
needs_dataset = isinstance(
self.strategy, (SentryStrategyConfig, HighlightStrategyConfig, DAggerStrategyConfig)
)
if needs_dataset and (self.dataset is None or not self.dataset.repo_id):
raise ValueError(f"{self.strategy.type} strategy requires --dataset.repo_id to be set")
if isinstance(self.strategy, BaseStrategyConfig) and self.dataset is not None:
raise ValueError(
"Base strategy does not record data. Use sentry, highlight, or dagger for recording."
)
# Sentry MUST use streaming encoding to avoid disk I/O blocking the control loop
if (
isinstance(self.strategy, SentryStrategyConfig)
and self.dataset is not None
and not self.dataset.streaming_encoding
):
logger.warning("Sentry mode forces streaming_encoding=True")
self.dataset.streaming_encoding = True
# Highlight writes frames while the policy is still running, so streaming is mandatory.
if (
isinstance(self.strategy, HighlightStrategyConfig)
and self.dataset is not None
and not self.dataset.streaming_encoding
):
logger.warning("Highlight mode forces streaming_encoding=True")
self.dataset.streaming_encoding = True
# DAgger: streaming is mandatory only when the autonomous phase is also recorded.
if isinstance(self.strategy, DAggerStrategyConfig) and self.dataset is not None:
if self.strategy.record_autonomous and not self.dataset.streaming_encoding:
logger.warning("DAgger with record_autonomous=True forces streaming_encoding=True")
self.dataset.streaming_encoding = True
elif not self.strategy.record_autonomous and not self.dataset.streaming_encoding:
logger.info(
"Streaming encoding is disabled for DAgger corrections-only mode. "
"Consider enabling it for faster episode saving: "
"--dataset.streaming_encoding=true --dataset.encoder_threads=2"
)
# DAgger: resolve num_episodes from dataset config when not explicitly set.
if isinstance(self.strategy, DAggerStrategyConfig) and self.strategy.num_episodes is None:
if self.dataset is not None:
self.strategy.num_episodes = self.dataset.num_episodes
logger.info(
"DAgger num_episodes not set — using --dataset.num_episodes=%d",
self.strategy.num_episodes,
)
else:
raise ValueError(
"DAgger num_episodes must be set either via --strategy.num_episodes or --dataset.num_episodes"
)
# --- Policy loading ---
if self.robot is None:
raise ValueError("--robot.type is required for rollout")
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
if self.policy is None:
raise ValueError("--policy.path is required for rollout")
# --- Task resolution ---
# When any --dataset.* flag is passed, draccus creates a DatasetRecordConfig with single_task="".
# If the user set the task via the top-level --task flag, propagate it so that all
# downstream consumers (inference engine, dataset frame builders) see it.
if self.dataset is not None and not self.dataset.single_task and self.task:
logger.info("Propagating top-level task '%s' to dataset config", self.task)
self.dataset.single_task = self.task
elif self.dataset is not None and self.dataset.single_task and not self.task:
logger.info("Propagating dataset single_task '%s' to top-level task", self.dataset.single_task)
self.task = self.dataset.single_task
# --- Device resolution ---
# Resolve device from the policy config when not explicitly set so all
# components (policy.to, preprocessor, inference engine) use the same
# device string instead of inconsistent fallbacks.
if self.device is None or not is_torch_device_available(self.device):
resolved = self.policy.device
if resolved:
self.device = resolved
logger.info("Resolved device from policy config: %s", self.device)
else:
self.device = auto_select_torch_device().type
logger.info("No policy config to resolve device from; auto-selected device: %s", self.device)
@classmethod
def __get_path_fields__(cls) -> list[str]:
return ["policy"]
+459
@@ -0,0 +1,459 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rollout context: shared state created once before strategy dispatch.
Grouped into five topical sub-contexts (:class:`RuntimeContext`,
:class:`HardwareContext`, :class:`PolicyContext`, :class:`ProcessorContext`,
and :class:`DatasetContext`), assembled into :class:`RolloutContext`.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from threading import Event
import torch
from lerobot.configs import FeatureType, PreTrainedConfig
from lerobot.datasets import (
LeRobotDataset,
aggregate_pipeline_dataset_features,
create_initial_features,
)
from lerobot.policies import get_policy_class, make_pre_post_processors
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.processor import (
PolicyProcessorPipeline,
RobotAction,
RobotObservation,
RobotProcessorPipeline,
make_default_processors,
rename_stats,
)
from lerobot.processor.relative_action_processor import RelativeActionsProcessorStep
from lerobot.robots import make_robot_from_config
from lerobot.teleoperators import Teleoperator, make_teleoperator_from_config
from lerobot.utils.feature_utils import combine_feature_dicts, hw_to_dataset_features
from .configs import BaseStrategyConfig, DAggerStrategyConfig, RolloutConfig
from .inference import (
InferenceEngine,
RTCInferenceConfig,
SyncInferenceConfig,
create_inference_engine,
)
from .robot_wrapper import ThreadSafeRobot
logger = logging.getLogger(__name__)
def _resolve_action_key_order(
policy_action_names: list[str] | None, dataset_action_names: list[str]
) -> list[str]:
"""Choose action name ordering for mapping policy tensor outputs to robot action dicts."""
if not policy_action_names:
return dataset_action_names
policy_action_names = list(policy_action_names)
if len(policy_action_names) != len(dataset_action_names):
logger.warning(
"policy.action_feature_names length (%d) != dataset action dim (%d); using dataset order",
len(policy_action_names),
len(dataset_action_names),
)
return dataset_action_names
if set(dataset_action_names) != set(policy_action_names):
logger.warning("policy.action_feature_names keys don't match dataset; using dataset order")
return dataset_action_names
return policy_action_names
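# Hedged usage sketch of the helper above (hypothetical joint names; not part of
# the rollout code itself):
def _example_resolve_action_key_order() -> None:
    dataset_names = ["shoulder.pos", "elbow.pos", "gripper.pos"]
    # A policy-provided permutation of the dataset names is kept as-is:
    assert _resolve_action_key_order(
        ["elbow.pos", "shoulder.pos", "gripper.pos"], dataset_names
    ) == ["elbow.pos", "shoulder.pos", "gripper.pos"]
    # Length or key mismatches (and a missing policy list) fall back to dataset order:
    assert _resolve_action_key_order(["elbow.pos", "shoulder.pos"], dataset_names) == dataset_names
    assert _resolve_action_key_order(None, dataset_names) == dataset_names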
# ---------------------------------------------------------------------------
# Sub-contexts
# ---------------------------------------------------------------------------
@dataclass
class RuntimeContext:
"""Runtime knobs shared with every strategy."""
cfg: RolloutConfig
shutdown_event: Event
@dataclass
class HardwareContext:
"""Connected hardware.
The raw robot is available via ``robot_wrapper.inner`` when needed
(e.g. for disconnect); strategies should otherwise go through the
thread-safe wrapper.
``initial_position`` stores the robot's joint positions at connect
time. Strategies use it to return the robot to a safe pose before
shutting down.
"""
robot_wrapper: ThreadSafeRobot
teleop: Teleoperator | None
initial_position: dict | None = None
@dataclass
class PolicyContext:
"""Loaded policy and its inference engine."""
policy: PreTrainedPolicy
preprocessor: PolicyProcessorPipeline
postprocessor: PolicyProcessorPipeline
inference: InferenceEngine
@dataclass
class ProcessorContext:
"""Robot-side pipelines (run outside the policy)."""
teleop_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
robot_action_processor: RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]
robot_observation_processor: RobotProcessorPipeline[RobotObservation, RobotObservation]
@dataclass
class DatasetContext:
"""Dataset and feature bookkeeping."""
dataset: LeRobotDataset | None
dataset_features: dict = field(default_factory=dict)
hw_features: dict = field(default_factory=dict)
ordered_action_keys: list[str] = field(default_factory=list)
@dataclass
class RolloutContext:
"""Bundle of sub-contexts passed to every rollout strategy.
Built once by :func:`build_rollout_context` before strategy dispatch.
"""
runtime: RuntimeContext
hardware: HardwareContext
policy: PolicyContext
processors: ProcessorContext
data: DatasetContext
# ---------------------------------------------------------------------------
# Build
# ---------------------------------------------------------------------------
def build_rollout_context(
cfg: RolloutConfig,
shutdown_event: Event,
teleop_action_processor: RobotProcessorPipeline | None = None,
robot_action_processor: RobotProcessorPipeline | None = None,
robot_observation_processor: RobotProcessorPipeline | None = None,
) -> RolloutContext:
"""Wire up policy, processors, hardware, dataset, and inference engine.
The order is policy-first / hardware-last so a bad ``--policy.path``
fails fast without touching the robot.
"""
is_rtc = isinstance(cfg.inference, RTCInferenceConfig)
# --- 1. Policy (heavy I/O, but no hardware yet) -------------------
logger.info("Loading policy from '%s'...", cfg.policy.pretrained_path)
policy_config = cfg.policy
policy_class = get_policy_class(policy_config.type)
full_config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
for attr in ("device", "use_amp"):
if hasattr(cfg.policy, attr) and hasattr(full_config, attr):
cli_val = getattr(cfg.policy, attr)
if cli_val is not None:
setattr(full_config, attr, cli_val)
if hasattr(full_config, "compile_model"):
full_config.compile_model = cfg.use_torch_compile
if full_config.type == "vqbet" and cfg.device == "mps":
raise NotImplementedError(
"Current implementation of VQBeT does not support `mps` backend. "
"Please use `cpu` or `cuda` backend."
)
if full_config.use_peft:
from peft import PeftConfig, PeftModel
peft_path = cfg.policy.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_path)
policy = policy_class.from_pretrained(
pretrained_name_or_path=peft_config.base_model_name_or_path, config=full_config
)
policy = PeftModel.from_pretrained(policy, peft_path, config=peft_config)
else:
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=full_config)
if is_rtc:
policy.config.rtc_config = cfg.inference.rtc
if hasattr(policy, "init_rtc_processor"):
policy.init_rtc_processor()
policy = policy.to(cfg.device)
policy.eval()
logger.info("Policy loaded: type=%s, device=%s", policy_config.type, cfg.device)
if cfg.use_torch_compile and policy.type not in ("pi0", "pi05"):
try:
if hasattr(torch, "compile"):
compile_kwargs = {
"backend": cfg.torch_compile_backend,
"mode": cfg.torch_compile_mode,
"options": {"triton.cudagraphs": False},
}
policy.predict_action_chunk = torch.compile(policy.predict_action_chunk, **compile_kwargs)
logger.info("torch.compile applied to predict_action_chunk")
except Exception as e:
logger.warning("Failed to apply torch.compile: %s", e)
# --- 2. Robot-side processors (user-supplied or defaults) --------
if (
teleop_action_processor is None
or robot_action_processor is None
or robot_observation_processor is None
):
_t, _r, _o = make_default_processors()
teleop_action_processor = teleop_action_processor or _t
robot_action_processor = robot_action_processor or _r
robot_observation_processor = robot_observation_processor or _o
# --- 3. Hardware (heaviest side-effect, deferred) -----------------
logger.info("Connecting robot (%s)...", cfg.robot.type if cfg.robot else "?")
robot = make_robot_from_config(cfg.robot)
robot.connect()
logger.info("Robot connected: %s", robot.name)
# Store the initial joint positions so we can return to a safe pose on shutdown.
initial_obs = robot.get_observation()
initial_position = {k: v for k, v in initial_obs.items() if k.endswith(".pos")}
logger.info("Captured initial robot position (%d keys)", len(initial_position))
robot_wrapper = ThreadSafeRobot(robot)
teleop = None
if cfg.teleop is not None:
logger.info("Connecting teleoperator (%s)...", cfg.teleop.type if cfg.teleop else "?")
teleop = make_teleoperator_from_config(cfg.teleop)
teleop.connect()
logger.info("Teleoperator connected")
# TODO(Steven): once Teleoperator motor-control methods are standardised
# (``enable_torque`` / ``disable_torque`` / ``write_goal_positions``), gate
# the DAgger strategy on their presence here and fail fast with a helpful
# message instead of relying on the operator to pre-align the leader by
# hand. See :func:`DAggerStrategy._apply_transition` for the matching
# disabled call sites.
# if isinstance(cfg.strategy, DAggerStrategyConfig) and teleop is not None:
# required_teleop_methods = ("enable_torque", "disable_torque", "write_goal_positions")
# missing = [m for m in required_teleop_methods if not callable(getattr(teleop, m, None))]
# if missing:
# teleop.disconnect()
# raise ValueError(
# f"DAgger strategy requires a teleoperator with motor control methods "
# f"{required_teleop_methods}. '{type(teleop).__name__}' is missing: {missing}"
# )
# --- 4. Features + action-key reconciliation ---------------------
# TODO(Steven): Only ``.pos`` joint features are routed to the policy as state and as the
# action target; velocity and torque channels (when present) are kept in
# the raw observation but excluded from the policy-facing tensors.
all_obs_features = robot.observation_features
# ``observation_features`` values are either a tuple (camera shape) or the
# ``float`` type itself used as a sentinel for scalar motor features —
# see ``dict[str, type | tuple]`` annotation on ``Robot.observation_features``.
observation_features_hw = {
k: v
for k, v in all_obs_features.items()
if isinstance(v, tuple) or (v is float and k.endswith(".pos"))
}
action_features_hw = {k: v for k, v in robot.action_features.items() if k.endswith(".pos")}
# The action side is always needed: sync inference reads action names from
# ``dataset_features[ACTION]`` to map policy tensors back to robot actions.
action_dataset_features = aggregate_pipeline_dataset_features(
pipeline=teleop_action_processor,
initial_features=create_initial_features(action=action_features_hw),
use_videos=cfg.dataset.video if cfg.dataset else True,
)
# Observation-side aggregation is needed because of build_dataset_frame
observation_dataset_features = aggregate_pipeline_dataset_features(
pipeline=robot_observation_processor,
initial_features=create_initial_features(observation=observation_features_hw),
use_videos=cfg.dataset.video if cfg.dataset else True,
)
dataset_features = combine_feature_dicts(action_dataset_features, observation_dataset_features)
hw_features = hw_to_dataset_features(observation_features_hw, "observation")
raw_action_keys = list(action_features_hw.keys())
policy_action_names = getattr(policy_config, "action_feature_names", None)
ordered_action_keys = _resolve_action_key_order(
list(policy_action_names) if policy_action_names else None,
raw_action_keys,
)
# Validate visual features if no rename_map is active
rename_map = cfg.rename_map
if not rename_map:
expected_visuals = {k for k, v in full_config.input_features.items() if v.type == FeatureType.VISUAL}
provided_visuals = {
f"observation.images.{k}" for k, v in robot.observation_features.items() if isinstance(v, tuple)
}
policy_subset = expected_visuals.issubset(provided_visuals)
hw_subset = provided_visuals.issubset(expected_visuals)
if not (policy_subset or hw_subset):
raise ValueError(
f"Visual feature mismatch between policy and robot hardware.\n"
f"Policy expects: {expected_visuals}\n"
f"Robot provides: {provided_visuals}"
)
# --- 5. Dataset -------------
dataset = None
if cfg.dataset is not None and not isinstance(cfg.strategy, BaseStrategyConfig):
logger.info("Setting up dataset (repo_id=%s)...", cfg.dataset.repo_id)
if cfg.resume:
dataset = LeRobotDataset.resume(
cfg.dataset.repo_id,
root=cfg.dataset.root,
batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec,
streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads,
image_writer_processes=cfg.dataset.num_image_writer_processes,
image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
* len(robot.cameras if hasattr(robot, "cameras") else []),
)
else:
if isinstance(cfg.strategy, DAggerStrategyConfig):
dataset_features["intervention"] = {
"dtype": "bool",
"shape": (1,),
"names": None,
}
repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
if not repo_name.startswith("rollout_"):
raise ValueError(
"Dataset names for rollout must start with 'rollout_'. "
"Use --dataset.repo_id=<user>/rollout_<name> for policy deployment datasets."
)
cfg.dataset.stamp_repo_id()
target_video_mb = getattr(cfg.strategy, "target_video_file_size_mb", None)
dataset = LeRobotDataset.create(
cfg.dataset.repo_id,
cfg.dataset.fps,
root=cfg.dataset.root,
robot_type=robot.name,
features=dataset_features,
use_videos=cfg.dataset.video,
image_writer_processes=cfg.dataset.num_image_writer_processes,
image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
* len(robot.cameras if hasattr(robot, "cameras") else []),
batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec,
streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads,
video_files_size_in_mb=target_video_mb,
)
if dataset is not None:
logger.info("Dataset ready: %s (%d existing episodes)", dataset.repo_id, dataset.num_episodes)
# --- 6. Policy pre/post processors (needs dataset stats if any) ---
dataset_stats = None
if dataset is not None:
dataset_stats = rename_stats(
dataset.meta.stats,
cfg.rename_map,
)
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy_config,
pretrained_path=cfg.policy.pretrained_path,
dataset_stats=dataset_stats,
preprocessor_overrides={
"device_processor": {"device": cfg.device},
"rename_observations_processor": {"rename_map": cfg.rename_map},
},
)
if isinstance(cfg.inference, SyncInferenceConfig) and any(
isinstance(step, RelativeActionsProcessorStep) and step.enabled
for step in getattr(preprocessor, "steps", ())
):
raise NotImplementedError(
"SyncInferenceEngine does not support policies with relative actions for now."
"Use --inference.type=rtc or remove relative action processor steps from the policy pipeline."
)
# --- 7. Inference strategy (needs policy + pre/post + hardware) --
logger.info(
"Creating inference engine (type=%s)...",
cfg.inference.type if hasattr(cfg.inference, "type") else "sync",
)
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
inference_strategy = create_inference_engine(
cfg.inference,
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
robot_wrapper=robot_wrapper,
hw_features=hw_features,
dataset_features=dataset_features,
ordered_action_keys=ordered_action_keys,
task=task_str,
fps=cfg.fps,
device=cfg.device,
use_torch_compile=cfg.use_torch_compile,
compile_warmup_inferences=cfg.compile_warmup_inferences,
shutdown_event=shutdown_event,
)
# --- 8. Assemble ---------------------------------------------------
logger.info("Rollout context assembled successfully")
return RolloutContext(
runtime=RuntimeContext(cfg=cfg, shutdown_event=shutdown_event),
hardware=HardwareContext(
robot_wrapper=robot_wrapper, teleop=teleop, initial_position=initial_position
),
policy=PolicyContext(
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
inference=inference_strategy,
),
processors=ProcessorContext(
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
),
data=DatasetContext(
dataset=dataset,
dataset_features=dataset_features,
hw_features=hw_features,
ordered_action_keys=ordered_action_keys,
),
)
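As a rough usage sketch: once the context is built, strategies (shown later in this diff) read everything they need from the five sub-contexts. The driver wiring below is illustrative only; ``cfg`` and ``shutdown_event`` are assumed to exist.

ctx = build_rollout_context(cfg, shutdown_event)
engine = ctx.policy.inference
engine.start()
try:
    while not ctx.runtime.shutdown_event.is_set():
        obs = ctx.hardware.robot_wrapper.get_observation()
        obs_processed = ctx.processors.robot_observation_processor(obs)
        engine.notify_observation(obs_processed)
        # ...build a dataset frame, call engine.get_action(...), and send the
        # resulting action through ctx.processors.robot_action_processor...
finally:
    engine.stop()
    ctx.hardware.robot_wrapper.inner.disconnect()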
+39
@@ -0,0 +1,39 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference engine package — backend-agnostic action production.
Concrete backends (``sync``, ``rtc``, ...) expose the same small interface so
rollout strategies never branch on which backend is in use.
"""
from .base import InferenceEngine
from .factory import (
InferenceEngineConfig,
RTCInferenceConfig,
SyncInferenceConfig,
create_inference_engine,
)
from .rtc import RTCInferenceEngine
from .sync import SyncInferenceEngine
__all__ = [
"InferenceEngine",
"InferenceEngineConfig",
"RTCInferenceConfig",
"RTCInferenceEngine",
"SyncInferenceConfig",
"SyncInferenceEngine",
"create_inference_engine",
]
+89
@@ -0,0 +1,89 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference engine ABC.
Rollout strategies consume actions through this small interface so they
do not need to know whether inference happens inline on the control thread
or asynchronously in a background thread (RTC).
"""
from __future__ import annotations
import abc
import torch
class InferenceEngine(abc.ABC):
"""Abstract backend for producing actions during rollout.
Subclasses decide whether inference happens inline on the control
thread or asynchronously in a background thread. The contract is
minimal so additional backends can be plugged in without touching
rollout strategies.
Lifecycle
---------
``start``: prepare the backend (e.g. launch a background thread).
``stop``: shut the backend down cleanly.
``reset``: clear episode-scoped state (policy hidden state, queues).
Action production
-----------------
``get_action(obs_frame)``: return the next action tensor, or
``None`` if none is available (e.g. async queue empty). Sync
backends always compute from ``obs_frame``; async backends ignore
it (they receive observations via ``notify_observation``).
Optional hooks
--------------
``notify_observation`` / ``pause`` / ``resume`` have a no-op default
so rollout strategies can invoke them unconditionally.
"""
@abc.abstractmethod
def start(self) -> None:
"""Initialise the backend."""
@abc.abstractmethod
def stop(self) -> None:
"""Tear the backend down."""
@abc.abstractmethod
def reset(self) -> None:
"""Clear episode-scoped state."""
@abc.abstractmethod
def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
"""Return the next action tensor, or ``None`` if unavailable."""
def notify_observation(self, obs: dict) -> None: # noqa: B027
"""Publish the latest processed observation. Default: no-op."""
def pause(self) -> None: # noqa: B027
"""Pause background inference. Default: no-op."""
def resume(self) -> None: # noqa: B027
"""Resume background inference. Default: no-op."""
@property
def ready(self) -> bool:
"""True once the backend can produce actions (e.g. warmup done)."""
return True
@property
def failed(self) -> bool:
"""True if an unrecoverable error occurred in the backend."""
return False
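As an illustration of the contract, a hypothetical minimal backend only needs the four abstract methods; the optional hooks and the ``ready``/``failed`` properties keep their defaults:

class ConstantActionEngine(InferenceEngine):
    """Toy illustrative backend (not part of this diff): always returns one action."""

    def __init__(self, action: torch.Tensor) -> None:
        self._action = action

    def start(self) -> None:  # nothing to launch
        pass

    def stop(self) -> None:  # nothing to tear down
        pass

    def reset(self) -> None:  # no episode-scoped state
        pass

    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
        return self._action.clone()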
+128
@@ -0,0 +1,128 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference engine configs and factory.
Selection is explicit via ``--inference.type=sync|rtc``. Adding a new
backend requires registering its config subclass and dispatching it in
:func:`create_inference_engine`.
"""
from __future__ import annotations
import abc
import logging
from dataclasses import dataclass, field
from threading import Event
import draccus
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.processor import PolicyProcessorPipeline
from ..robot_wrapper import ThreadSafeRobot
from .base import InferenceEngine
from .rtc import RTCInferenceEngine
from .sync import SyncInferenceEngine
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configs
# ---------------------------------------------------------------------------
@dataclass
class InferenceEngineConfig(draccus.ChoiceRegistry, abc.ABC):
"""Abstract base for inference backend configuration.
Use ``--inference.type=<name>`` on the CLI to select a backend.
"""
@property
def type(self) -> str:
return self.get_choice_name(self.__class__)
@InferenceEngineConfig.register_subclass("sync")
@dataclass
class SyncInferenceConfig(InferenceEngineConfig):
"""Inline synchronous inference (one policy call per control tick)."""
@InferenceEngineConfig.register_subclass("rtc")
@dataclass
class RTCInferenceConfig(InferenceEngineConfig):
"""Real-Time Chunking: async policy inference in a background thread."""
# Eagerly constructed so draccus exposes nested fields directly on the CLI
# (e.g. ``--inference.rtc.execution_horizon=...``).
rtc: RTCConfig = field(default_factory=RTCConfig)
queue_threshold: int = 30
# ---------------------------------------------------------------------------
# Factory
# ---------------------------------------------------------------------------
def create_inference_engine(
config: InferenceEngineConfig,
*,
policy: PreTrainedPolicy,
preprocessor: PolicyProcessorPipeline,
postprocessor: PolicyProcessorPipeline,
robot_wrapper: ThreadSafeRobot,
hw_features: dict,
dataset_features: dict,
ordered_action_keys: list[str],
task: str,
fps: float,
device: str | None,
use_torch_compile: bool = False,
compile_warmup_inferences: int = 2,
shutdown_event: Event | None = None,
) -> InferenceEngine:
"""Instantiate the appropriate inference engine from a config object."""
logger.info("Creating inference engine: %s", config.type)
if isinstance(config, SyncInferenceConfig):
return SyncInferenceEngine(
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
dataset_features=dataset_features,
ordered_action_keys=ordered_action_keys,
task=task,
device=device,
robot_type=robot_wrapper.robot_type,
)
if isinstance(config, RTCInferenceConfig):
return RTCInferenceEngine(
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
robot_wrapper=robot_wrapper,
rtc_config=config.rtc,
hw_features=hw_features,
task=task,
fps=fps,
device=device,
use_torch_compile=use_torch_compile,
compile_warmup_inferences=compile_warmup_inferences,
rtc_queue_threshold=config.queue_threshold,
shutdown_event=shutdown_event,
)
raise ValueError(f"Unknown inference engine type: {type(config).__name__}")
+360
@@ -0,0 +1,360 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Real-Time Chunking inference engine.
A background thread produces action chunks asynchronously via
:meth:`policy.predict_action_chunk`. The main control loop polls
``get_action`` for the next ready action; observations flow the other
way via ``notify_observation``.
"""
from __future__ import annotations
import logging
import math
import time
import traceback
from threading import Event, Lock, Thread
from typing import Any
import torch
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.policies.rtc import ActionQueue, LatencyTracker, reanchor_relative_rtc_prefix
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.utils import prepare_observation_for_inference
from lerobot.processor import (
NormalizerProcessorStep,
PolicyProcessorPipeline,
RelativeActionsProcessorStep,
)
from lerobot.utils.feature_utils import build_dataset_frame
from ..robot_wrapper import ThreadSafeRobot
from .base import InferenceEngine
logger = logging.getLogger(__name__)
# How long the RTC loop sleeps when paused, idle, or backpressured by a full queue.
_RTC_IDLE_SLEEP_S: float = 0.01
# Backoff between transient inference errors (per consecutive failure).
_RTC_ERROR_RETRY_DELAY_S: float = 0.5
# Consecutive transient errors tolerated before giving up and propagating shutdown.
_RTC_MAX_CONSECUTIVE_ERRORS: int = 10
# Hard timeout for joining the RTC thread on stop().
_RTC_JOIN_TIMEOUT_S: float = 3.0
# ---------------------------------------------------------------------------
# RTC helpers
# ---------------------------------------------------------------------------
def _normalize_prev_actions_length(prev_actions: torch.Tensor, target_steps: int) -> torch.Tensor:
"""Pad or truncate RTC prefix actions to a fixed length for stable compiled inference."""
if prev_actions.ndim != 2:
raise ValueError(f"Expected 2D [T, A] tensor, got shape={tuple(prev_actions.shape)}")
steps, action_dim = prev_actions.shape
if steps == target_steps:
return prev_actions
if steps > target_steps:
return prev_actions[:target_steps]
padded = torch.zeros((target_steps, action_dim), dtype=prev_actions.dtype, device=prev_actions.device)
padded[:steps] = prev_actions
return padded
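# Hedged behaviour sketch for the helper above (hypothetical shapes; in the real
# loop below it is only called with the RTC leftover tail):
def _example_normalize_prev_actions_length() -> None:
    prev = torch.zeros(10, 6)
    assert _normalize_prev_actions_length(prev, target_steps=10).shape == (10, 6)  # unchanged
    assert _normalize_prev_actions_length(prev, target_steps=4).shape == (4, 6)  # truncated
    assert _normalize_prev_actions_length(prev, target_steps=16).shape == (16, 6)  # zero-padded tail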
# ---------------------------------------------------------------------------
# RTCInferenceEngine
# ---------------------------------------------------------------------------
class RTCInferenceEngine(InferenceEngine):
"""Async RTC inference: a background thread produces action chunks.
``get_action`` pops the next action from the shared queue (or
returns ``None`` if the queue is empty). The main loop should call
``notify_observation`` every tick and ``pause``/``resume`` around
human-intervention phases.
"""
def __init__(
self,
policy: PreTrainedPolicy,
preprocessor: PolicyProcessorPipeline,
postprocessor: PolicyProcessorPipeline,
robot_wrapper: ThreadSafeRobot,
rtc_config: RTCConfig,
hw_features: dict,
task: str,
fps: float,
device: str | None,
use_torch_compile: bool = False,
compile_warmup_inferences: int = 2,
rtc_queue_threshold: int = 30,
shutdown_event: Event | None = None,
) -> None:
self._policy = policy
self._preprocessor = preprocessor
self._postprocessor = postprocessor
self._robot = robot_wrapper
self._rtc_config = rtc_config
self._hw_features = hw_features
self._task = task
self._fps = fps
self._device = device or "cpu"
self._use_torch_compile = use_torch_compile
self._compile_warmup_inferences = compile_warmup_inferences
self._rtc_queue_threshold = rtc_queue_threshold
self._action_queue: ActionQueue | None = None
self._obs_holder: dict[str, Any] = {}
self._obs_lock = Lock()
self._policy_active = Event()
self._compile_warmup_done = Event()
self._shutdown_event = Event()
self._rtc_error = Event()
self._global_shutdown_event = shutdown_event
self._rtc_thread: Thread | None = None
if not self._use_torch_compile:
self._compile_warmup_done.set()
logger.info("RTCInferenceEngine initialized (torch.compile disabled, no warmup needed)")
else:
logger.info(
"RTCInferenceEngine initialized (torch.compile enabled, %d warmup inferences)",
compile_warmup_inferences,
)
# Processor introspection for relative-action re-anchoring.
self._relative_step = next(
(s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
None,
)
self._normalizer_step = next(
(s for s in preprocessor.steps if isinstance(s, NormalizerProcessorStep)),
None,
)
if self._relative_step is not None:
if self._relative_step.action_names is None:
cfg_names = getattr(policy.config, "action_feature_names", None)
if cfg_names:
self._relative_step.action_names = list(cfg_names)
else:
self._relative_step.action_names = [
k for k in robot_wrapper.action_features if k.endswith(".pos")
]
logger.info("Relative actions enabled: RTC prefix will be re-anchored")
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
@property
def ready(self) -> bool:
"""True once torch.compile warmup is complete (or immediately if compile is disabled)."""
return self._compile_warmup_done.is_set()
@property
def failed(self) -> bool:
"""True if the RTC background thread exited due to an unrecoverable error."""
return self._rtc_error.is_set()
@property
def action_queue(self) -> ActionQueue | None:
"""The shared action queue between the RTC thread and the main loop."""
return self._action_queue
def start(self) -> None:
"""Launch the RTC background thread."""
self._action_queue = ActionQueue(self._rtc_config)
self._obs_holder = {
"obs": None,
"robot_type": self._robot.robot_type,
}
self._shutdown_event.clear()
self._rtc_thread = Thread(
target=self._rtc_loop,
daemon=True,
name="RTCInference",
)
self._rtc_thread.start()
logger.info("RTC inference thread started")
def stop(self) -> None:
"""Signal the RTC thread to stop and wait for it."""
logger.info("Stopping RTC inference thread...")
self._shutdown_event.set()
self._policy_active.clear()
if self._rtc_thread is not None and self._rtc_thread.is_alive():
self._rtc_thread.join(timeout=_RTC_JOIN_TIMEOUT_S)
if self._rtc_thread.is_alive():
logger.warning("RTC thread did not join within %.1fs", _RTC_JOIN_TIMEOUT_S)
else:
logger.info("RTC inference thread stopped")
self._rtc_thread = None
def pause(self) -> None:
"""Pause the RTC background thread."""
logger.info("Pausing RTC inference thread")
self._policy_active.clear()
def resume(self) -> None:
"""Resume the RTC background thread."""
logger.info("Resuming RTC inference thread")
self._policy_active.set()
def reset(self) -> None:
"""Reset the policy, processors, and action queue."""
logger.info("Resetting RTC inference state (policy + processors + queue)")
self._policy.reset()
self._preprocessor.reset()
self._postprocessor.reset()
if self._action_queue is not None:
self._action_queue.clear()
# ------------------------------------------------------------------
# Action production (called from main thread)
# ------------------------------------------------------------------
def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
"""Pop the next action from the RTC queue (ignores ``obs_frame``)."""
if self._action_queue is None:
return None
return self._action_queue.get()
def notify_observation(self, obs: dict) -> None:
"""Publish the latest observation for the RTC thread to consume."""
with self._obs_lock:
self._obs_holder["obs"] = obs
# ------------------------------------------------------------------
# RTC: background inference thread
# ------------------------------------------------------------------
def _rtc_loop(self) -> None:
"""Background thread that generates action chunks via RTC."""
try:
latency_tracker = LatencyTracker()
time_per_chunk = 1.0 / self._fps
policy_device = torch.device(self._device)
warmup_required = max(1, self._compile_warmup_inferences) if self._use_torch_compile else 0
inference_count = 0
consecutive_errors = 0
while not self._shutdown_event.is_set():
if not self._policy_active.is_set():
time.sleep(_RTC_IDLE_SLEEP_S)
continue
queue = self._action_queue
with self._obs_lock:
obs = self._obs_holder.get("obs")
if queue is None or obs is None:
time.sleep(_RTC_IDLE_SLEEP_S)
continue
if queue.qsize() <= self._rtc_queue_threshold:
try:
current_time = time.perf_counter()
idx_before = queue.get_action_index()
prev_actions = queue.get_left_over()
latency = latency_tracker.max()
delay = math.ceil(latency / time_per_chunk) if latency else 0
obs_batch = build_dataset_frame(self._hw_features, obs, prefix="observation")
obs_batch = prepare_observation_for_inference(
obs_batch, policy_device, self._task, self._robot.robot_type
)
obs_batch["task"] = [self._task]
preprocessed = self._preprocessor(obs_batch)
if prev_actions is not None and self._relative_step is not None:
# Rebase against the raw cached state so the leftover tail stays in
# the training-time coordinate frame.
raw_state = self._relative_step.get_cached_state()
if raw_state is not None:
prev_abs = queue.get_processed_left_over()
if prev_abs is not None and prev_abs.numel() > 0:
prev_actions = reanchor_relative_rtc_prefix(
prev_actions_absolute=prev_abs,
current_state=raw_state,
relative_step=self._relative_step,
normalizer_step=self._normalizer_step,
policy_device=policy_device,
)
if prev_actions is not None:
prev_actions = _normalize_prev_actions_length(
prev_actions, target_steps=self._rtc_config.execution_horizon
)
actions = self._policy.predict_action_chunk(
preprocessed, inference_delay=delay, prev_chunk_left_over=prev_actions
)
original = actions.squeeze(0).clone()
processed = self._postprocessor(actions).squeeze(0)
new_latency = time.perf_counter() - current_time
new_delay = math.ceil(new_latency / time_per_chunk)
inference_count += 1
consecutive_errors = 0
is_warmup = self._use_torch_compile and inference_count <= warmup_required
if is_warmup:
latency_tracker.reset()
else:
latency_tracker.add(new_latency)
queue.merge(original, processed, new_delay, idx_before)
if (
is_warmup
and inference_count >= warmup_required
and not self._compile_warmup_done.is_set()
):
self._compile_warmup_done.set()
logger.info("Compile warmup complete (%d inferences)", inference_count)
logger.debug("RTC inference latency=%.2fs, queue=%d", new_latency, queue.qsize())
except Exception as e:
consecutive_errors += 1
logger.error(
"RTC inference error (%d/%d): %s",
consecutive_errors,
_RTC_MAX_CONSECUTIVE_ERRORS,
e,
)
logger.debug(traceback.format_exc())
if consecutive_errors >= _RTC_MAX_CONSECUTIVE_ERRORS:
# Persistent failure: stop retrying and propagate shutdown.
raise
time.sleep(_RTC_ERROR_RETRY_DELAY_S)
else:
time.sleep(_RTC_IDLE_SLEEP_S)
except Exception as e:
logger.error("Fatal error in RTC thread: %s", e)
logger.error(traceback.format_exc())
self._rtc_error.set()
# Unblock any warmup waiters so the main loop doesn't spin forever
self._compile_warmup_done.set()
# Signal the top-level shutdown so strategies exit their control loops
if self._global_shutdown_event is not None:
self._global_shutdown_event.set()
+122
@@ -0,0 +1,122 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Synchronous inference engine: inline policy call per control tick."""
from __future__ import annotations
import logging
from contextlib import nullcontext
from copy import copy
import torch
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.policies.utils import make_robot_action, prepare_observation_for_inference
from lerobot.processor import PolicyProcessorPipeline
from .base import InferenceEngine
logger = logging.getLogger(__name__)
# TODO(Steven): support relative-action policies. The per-tick flow refreshes
# ``RelativeActionsProcessorStep._last_state`` every call, so cached chunk
# actions popped on later ticks get reanchored to the *current* robot state and
# absolute targets drift through the chunk. Relative-action policies are
# rejected at context-build time today; RTC postprocesses the whole chunk and
# is unaffected.
#
# Candidate fix: drive the policy via ``predict_action_chunk`` and serve a
# local FIFO of postprocessed actions. Eliminates drift by construction and
# saves per-tick pre/post work, but bypasses ``select_action`` — needs
# fallbacks for SAC (raises), ACT temporal ensembling (ensembler lives in
# ``select_action``), and Diffusion-family (obs-history queues populated as a
# side effect of ``select_action``).
class SyncInferenceEngine(InferenceEngine):
"""Inline synchronous inference: compute one action per call.
``get_action`` runs the full policy pipeline (pre/post-processor +
``select_action``) on the given observation frame and returns a
CPU action tensor reordered to match the dataset action keys.
"""
def __init__(
self,
policy: PreTrainedPolicy,
preprocessor: PolicyProcessorPipeline,
postprocessor: PolicyProcessorPipeline,
dataset_features: dict,
ordered_action_keys: list[str],
task: str,
device: str | None,
robot_type: str,
) -> None:
self._policy = policy
self._preprocessor = preprocessor
self._postprocessor = postprocessor
self._dataset_features = dataset_features
self._ordered_action_keys = ordered_action_keys
self._task = task
self._device = torch.device(device or "cpu")
self._robot_type = robot_type
logger.info(
"SyncInferenceEngine initialized (device=%s, action_keys=%d)",
self._device,
len(ordered_action_keys),
)
def start(self) -> None:
"""No background resources to start."""
logger.info("SyncInferenceEngine started (inline mode — no background thread)")
def stop(self) -> None:
"""No background resources to stop."""
logger.info("SyncInferenceEngine stopped")
def reset(self) -> None:
"""Reset the policy and pre/post-processors."""
logger.info("Resetting sync inference state (policy + processors)")
self._policy.reset()
self._preprocessor.reset()
self._postprocessor.reset()
def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
"""Run the full inference pipeline on ``obs_frame`` and return an action tensor."""
if obs_frame is None:
return None
# Shallow copy is intentional: the caller (`send_next_action`) builds
# ``obs_frame`` fresh per tick via ``build_dataset_frame``, so the
# tensor/array values are not shared with any other reader.
observation = copy(obs_frame)
autocast_ctx = (
torch.autocast(device_type=self._device.type)
if self._device.type == "cuda" and self._policy.config.use_amp
else nullcontext()
)
with torch.inference_mode(), autocast_ctx:
observation = prepare_observation_for_inference(
observation, self._device, self._task, self._robot_type
)
observation = self._preprocessor(observation)
action = self._policy.select_action(observation)
action = self._postprocessor(action)
action_tensor = action.squeeze(0).cpu()
# Reorder to match dataset action ordering so the caller can treat
# the returned tensor uniformly across backends.
action_dict = make_robot_action(action_tensor, self._dataset_features)
return torch.tensor([action_dict[k] for k in self._ordered_action_keys])
+112
@@ -0,0 +1,112 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Memory-bounded ring buffer for the Highlight Reel rollout strategy."""
from __future__ import annotations
from collections import deque
import numpy as np
import torch
class RolloutRingBuffer:
"""Fixed-capacity circular buffer for observation/action frames.
Stores the last *N* seconds of telemetry in memory, bounded by both
time (``max_frames``) and memory (``max_memory_bytes``). When either
limit is reached the oldest frames are evicted.
.. note::
This class is **single-threaded**. ``append``/``drain``/``clear``
must all be called from the same thread (the rollout main loop).
Concurrent access from a background thread will corrupt
``_current_bytes`` accounting.
Parameters
----------
max_seconds:
Maximum duration of buffered telemetry.
max_memory_mb:
Hard memory cap in MiB. Frames are evicted when the estimated
total size exceeds this.
fps:
Frames per second used to convert ``max_seconds`` to a frame
count.
"""
def __init__(self, max_seconds: float = 30.0, max_memory_mb: int = 2048, fps: float = 30.0) -> None:
self._max_frames = int(max_seconds * fps)
self._max_bytes = int(max_memory_mb * 1024 * 1024)
self._buffer: deque[dict] = deque(maxlen=self._max_frames)
self._current_bytes: int = 0
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def append(self, frame: dict) -> None:
    """Add *frame* to the buffer, evicting the oldest if at capacity."""
    frame_bytes = _estimate_frame_bytes(frame)
    # Evict oldest frames until we are under the memory cap
    while self._current_bytes + frame_bytes > self._max_bytes and self._buffer:
        evicted = self._buffer.popleft()
        self._current_bytes -= _estimate_frame_bytes(evicted)
    # Keep the byte accounting in sync when the deque's maxlen (the time bound)
    # is what evicts the oldest frame on this append.
    if self._max_frames and len(self._buffer) == self._max_frames:
        evicted = self._buffer.popleft()
        self._current_bytes -= _estimate_frame_bytes(evicted)
    self._buffer.append(frame)
    self._current_bytes += frame_bytes
def drain(self) -> list[dict]:
"""Return all buffered frames and clear the buffer."""
frames = list(self._buffer)
self._buffer.clear()
self._current_bytes = 0
return frames
def clear(self) -> None:
"""Discard all buffered frames."""
self._buffer.clear()
self._current_bytes = 0
def __len__(self) -> int:
return len(self._buffer)
@property
def estimated_bytes(self) -> int:
"""Estimated total byte size of all buffered frames."""
return self._current_bytes
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
def _estimate_frame_bytes(frame: dict) -> int:
"""Rough byte estimate for a single frame dictionary."""
total = 0
for v in frame.values():
if isinstance(v, torch.Tensor):
# Compute tensor size explicitly (``nelement() * element_size()``) so the
# memory cap is honoured even when frames hold unconverted tensors.
total += v.nelement() * v.element_size()
elif isinstance(v, np.ndarray) or hasattr(v, "nbytes"):
total += v.nbytes
elif isinstance(v, (int, float)):
total += 8
elif isinstance(v, (str, bytes)):
total += len(v)
return max(total, 1) # avoid zero-size frames
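A short usage sketch (hypothetical frame contents), showing both bounds in action: the deque's ``maxlen`` enforces the time bound, and the byte accounting enforces the memory cap.

import numpy as np  # already imported in the module above

buf = RolloutRingBuffer(max_seconds=30.0, max_memory_mb=64, fps=30.0)
for i in range(1000):  # more frames than 30 s * 30 fps = 900
    buf.append({"step": i, "state": np.zeros(6, dtype=np.float32)})
assert len(buf) <= 900  # time bound
print(buf.estimated_bytes)  # tracked size of what is currently buffered
frames = buf.drain()  # hand the surviving frames to the caller and empty the buffer
assert len(buf) == 0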
+79
@@ -0,0 +1,79 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Thread-safe robot wrapper for concurrent observation/action access."""
from __future__ import annotations
from threading import Lock
from typing import Any
from lerobot.robots import Robot
class ThreadSafeRobot:
"""Lock-protected wrapper around a :class:`Robot` for use with background threads.
When RTC inference runs in a background thread while the main loop
executes actions, both threads may access the robot concurrently.
This wrapper serialises ``get_observation`` and ``send_action`` calls.
Read-only properties are proxied without the lock since they don't
mutate hardware state.
"""
def __init__(self, robot: Robot) -> None:
self._robot = robot
self._lock = Lock()
# -- Lock-protected I/O --------------------------------------------------
def get_observation(self) -> dict[str, Any]:
with self._lock:
return self._robot.get_observation()
def send_action(self, action: dict[str, Any] | Any) -> Any:
with self._lock:
return self._robot.send_action(action)
# -- Read-only proxies (no lock needed) -----------------------------------
@property
def observation_features(self) -> dict:
return self._robot.observation_features
@property
def action_features(self) -> dict:
return self._robot.action_features
@property
def name(self) -> str:
return self._robot.name
@property
def robot_type(self) -> str:
return self._robot.robot_type
@property
def cameras(self):
return getattr(self._robot, "cameras", {})
@property
def is_connected(self) -> bool:
return self._robot.is_connected
@property
def inner(self) -> Robot:
"""Access the underlying robot (e.g. for connect/disconnect)."""
return self._robot
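A minimal usage sketch (assuming an already-connected ``robot`` and a valid ``action`` dict; in the real code the RTC thread plays the role of the background reader):

from threading import Thread

def _example_thread_safe_robot(robot: Robot, action: dict) -> None:
    safe = ThreadSafeRobot(robot)

    def reader() -> None:
        for _ in range(100):
            safe.get_observation()  # serialised with send_action via the shared lock

    t = Thread(target=reader, daemon=True)
    t.start()
    for _ in range(100):
        safe.send_action(action)  # safe to call concurrently from the main thread
    t.join()
    safe.inner.disconnect()  # lifecycle calls go through the raw robot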
@@ -0,0 +1,36 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rollout strategies — public API re-exports."""
from .base import BaseStrategy
from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
from .dagger import DAggerEvents, DAggerPhase, DAggerStrategy
from .factory import create_strategy
from .highlight import HighlightStrategy
from .sentry import SentryStrategy
__all__ = [
"BaseStrategy",
"DAggerEvents",
"DAggerPhase",
"DAggerStrategy",
"HighlightStrategy",
"RolloutStrategy",
"SentryStrategy",
"create_strategy",
"estimate_max_episode_seconds",
"safe_push_to_hub",
"send_next_action",
]
+85
@@ -0,0 +1,85 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Base rollout strategy: autonomous policy execution with no data recording."""
from __future__ import annotations
import logging
import time
from lerobot.utils.robot_utils import precise_sleep
from ..context import RolloutContext
from .core import RolloutStrategy, send_next_action
logger = logging.getLogger(__name__)
class BaseStrategy(RolloutStrategy):
"""Autonomous policy rollout with no data recording.
All actions flow through the ``robot_action_processor`` pipeline
before reaching the robot.
"""
def setup(self, ctx: RolloutContext) -> None:
"""Initialise the inference engine."""
self._init_engine(ctx)
logger.info("Base strategy ready")
def run(self, ctx: RolloutContext) -> None:
"""Run the autonomous control loop until shutdown or duration expires."""
engine = self._engine
cfg = ctx.runtime.cfg
robot = ctx.hardware.robot_wrapper
interpolator = self._interpolator
control_interval = interpolator.get_control_interval(cfg.fps)
start_time = time.perf_counter()
engine.resume()
logger.info("Base strategy control loop started")
while not ctx.runtime.shutdown_event.is_set():
loop_start = time.perf_counter()
if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
logger.info("Duration limit reached (%.0fs)", cfg.duration)
break
obs = robot.get_observation()
obs_processed = self._process_observation_and_notify(ctx.processors, obs)
if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
continue
action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
self._log_telemetry(obs_processed, action_dict, ctx.runtime)
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
else:
logger.warning(
f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
)
def teardown(self, ctx: RolloutContext) -> None:
"""Disconnect hardware and stop inference."""
self._teardown_hardware(
ctx.hardware,
return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
)
logger.info("Base strategy teardown complete")
+304
@@ -0,0 +1,304 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rollout strategy ABC and shared action-dispatch helper."""
from __future__ import annotations
import abc
import logging
import time
from typing import TYPE_CHECKING
from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
from lerobot.utils.action_interpolator import ActionInterpolator
from lerobot.utils.constants import OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import log_rerun_data
from ..inference import InferenceEngine
if TYPE_CHECKING:
from ..configs import RolloutStrategyConfig
from ..context import HardwareContext, ProcessorContext, RolloutContext, RuntimeContext
logger = logging.getLogger(__name__)
class RolloutStrategy(abc.ABC):
"""Abstract base for rollout execution strategies.
Each concrete strategy implements a self-contained control loop with
its own recording/interaction semantics. Strategies are mutually
exclusive only one runs per session.
"""
def __init__(self, config: RolloutStrategyConfig) -> None:
self.config = config
self._engine: InferenceEngine | None = None
self._interpolator: ActionInterpolator | None = None
self._warmup_flushed: bool = False
self._cached_obs_processed: dict | None = None
def _init_engine(self, ctx: RolloutContext) -> None:
"""Attach the inference engine and action interpolator, then start the backend.
Creates an :class:`ActionInterpolator` from the config's
``interpolation_multiplier`` and starts the inference engine.
Call this from ``setup()`` so strategies share identical
initialisation without duplicating code.
"""
self._interpolator = ActionInterpolator(multiplier=ctx.runtime.cfg.interpolation_multiplier)
self._engine = ctx.policy.inference
logger.info("Starting inference engine...")
self._engine.reset()
self._engine.start()
self._warmup_flushed = False
self._cached_obs_processed = None
logger.info("Inference engine started")
def _process_observation_and_notify(self, processors: ProcessorContext, obs_raw: dict) -> dict:
"""Run the observation processor and notify the engine — throttled to policy ticks.
Callers are responsible for calling ``robot.get_observation()`` every loop
iteration so ``obs_raw`` stays fresh for the action post-processor. This
helper gates only the comparatively expensive bits the processor pipeline
and ``engine.notify_observation`` to fire when the interpolator signals
it needs a new action (once per ``interpolation_multiplier`` ticks). On
interpolated ticks the cached ``obs_processed`` is reused.
With ``interpolation_multiplier == 1`` this is equivalent to the unthrottled
path: ``needs_new_action()`` is True every tick.
The cache is implicitly invalidated whenever ``interpolator.reset()`` is
called (warmup completion, DAgger phase transitions back to AUTONOMOUS),
because reset makes ``needs_new_action()`` return True on the next call.
"""
if self._cached_obs_processed is None or self._interpolator.needs_new_action():
obs_processed = processors.robot_observation_processor(obs_raw)
self._engine.notify_observation(obs_processed)
self._cached_obs_processed = obs_processed
return self._cached_obs_processed
def _handle_warmup(self, use_torch_compile: bool, loop_start: float, control_interval: float) -> bool:
"""Handle torch.compile warmup phase.
Returns ``True`` if the caller should ``continue`` (still warming
up). On the first post-warmup iteration the engine and
interpolator are reset so stale warmup state is discarded.
"""
engine = self._engine
interpolator = self._interpolator
if not use_torch_compile:
return False
if not engine.ready:
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
return True
if not self._warmup_flushed:
logger.info("Warmup complete — flushing stale state and resuming engine")
engine.reset()
interpolator.reset()
self._warmup_flushed = True
engine.resume()
return False
def _teardown_hardware(self, hw: HardwareContext, return_to_initial_position: bool = True) -> None:
"""Stop the inference engine, optionally return robot to initial position, and disconnect hardware."""
if self._engine is not None:
logger.info("Stopping inference engine...")
self._engine.stop()
robot = hw.robot_wrapper.inner
if robot.is_connected:
if return_to_initial_position and hw.initial_position:
logger.info("Returning robot to initial position before shutdown...")
self._return_to_initial_position(hw)
elif not return_to_initial_position:
logger.info(
"Skipping return-to-initial-position (disabled by config); leaving robot in final pose."
)
logger.info("Disconnecting robot...")
robot.disconnect()
teleop = hw.teleop
if teleop is not None and teleop.is_connected:
logger.info("Disconnecting teleoperator...")
teleop.disconnect()
@staticmethod
def _return_to_initial_position(hw: HardwareContext, duration_s: float = 3.0, fps: int = 50) -> None:
"""Smoothly interpolate the robot back to its initial position."""
robot = hw.robot_wrapper
target = hw.initial_position
try:
current_obs = robot.get_observation()
current_pos = {k: v for k, v in current_obs.items() if k in target}
steps = max(int(duration_s * fps), 1)
for step in range(1, steps + 1):
t = step / steps
interp = {}
for k in current_pos:
interp[k] = current_pos[k] * (1 - t) + target[k] * t
robot.send_action(interp)
precise_sleep(1 / fps)
except Exception as e:
logger.warning("Could not return to initial position: %s", e)
@staticmethod
def _log_telemetry(
obs_processed: dict | None,
action_dict: dict | None,
runtime_ctx: RuntimeContext,
) -> None:
"""Log observation/action telemetry to Rerun if display_data is enabled."""
cfg = runtime_ctx.cfg
if not cfg.display_data:
return
log_rerun_data(
observation=obs_processed,
action=action_dict,
compress_images=cfg.display_compressed_images,
)
@abc.abstractmethod
def setup(self, ctx: RolloutContext) -> None:
"""Strategy-specific initialisation (keyboard listeners, buffers, etc.)."""
@abc.abstractmethod
def run(self, ctx: RolloutContext) -> None:
"""Main rollout loop. Returns when shutdown is requested or duration expires."""
@abc.abstractmethod
def teardown(self, ctx: RolloutContext) -> None:
"""Cleanup: save dataset, stop threads, disconnect hardware."""
# ---------------------------------------------------------------------------
# Shared helpers
# ---------------------------------------------------------------------------
def safe_push_to_hub(dataset, tags=None, private=False) -> bool:
"""Push dataset to hub, skipping if no episodes have been saved.
Returns ``True`` if the push was attempted, ``False`` if skipped.
"""
if dataset.num_episodes == 0:
logger.warning("No episodes saved — skipping push to hub")
return False
dataset.push_to_hub(tags=tags, private=private)
return True
def estimate_max_episode_seconds(
dataset_features: dict,
fps: float,
target_size_mb: float = DEFAULT_VIDEO_FILE_SIZE_IN_MB,
) -> float:
"""Conservatively estimate how many seconds of video will exceed *target_size_mb*.
Each camera produces its own video file, so the episode duration is
driven by the **slowest** camera to fill ``target_size_mb``, i.e.
the one with the fewest pixels per frame (lowest bitrate).
Uses a deliberately **low** bits-per-pixel estimate so the computed
duration is *longer* than reality. By the time the timer fires the
actual video file is guaranteed to have crossed the target size,
which aligns episode boundaries with the dataset's video-file
chunking: each ``push_to_hub`` uploads complete files rather than
re-uploading a still-growing one.
The estimate ignores codec-specific settings (CRF, preset) on purpose:
we only need a rough lower bound on bitrate, not a precise prediction.
Falls back to 300 s (5 min) when no video features are present.
"""
# 0.1 bits-per-pixel is a *low* estimate for CRF-30 streaming video of
# robot footage (real-world is typically 0.1 to 0.3 bpp). Under-estimating
# the bitrate over-estimates the time → the episode will be
# *larger* than target_size_mb when we save, which is what we want.
conservative_bpp = 0.1
# Collect per-camera pixel counts — each camera has its own video file.
camera_pixels = []
for feat in dataset_features.values():
if feat.get("dtype") == "video":
shape = feat.get("shape", ())
# (H, W, C) — bits-per-pixel is a per-spatial-pixel metric,
# so we exclude the channel dimension from the count.
if len(shape) == 3:
pixels = shape[0] * shape[1]
camera_pixels.append(pixels)
else:
raise ValueError(f"Unexpected video feature shape: {shape}")
if not camera_pixels:
return 300.0
# Use the smallest camera: it produces the lowest bitrate and therefore
# takes the longest to reach the target — the conservative choice.
min_pixels = min(camera_pixels)
bits_per_frame = min_pixels * conservative_bpp
bytes_per_second = (bits_per_frame * fps) / 8
# Guard against division by zero just in case
if bytes_per_second <= 0:
return 300.0
return (target_size_mb * 1024 * 1024) / bytes_per_second
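# Hedged worked example of the arithmetic above (illustrative values, not
# codebase defaults): a single hypothetical 480x640 camera at 30 fps with a
# 100 MB target. The feature key below is an assumption for the example only.
def _example_estimate() -> float:
    # 480 * 640 px * 0.1 bpp = 30_720 bits/frame -> 115_200 bytes/s at 30 fps,
    # so 100 MB is reached after 100 * 1024 * 1024 / 115_200 ≈ 910 s (~15 min).
    features = {"observation.images.cam": {"dtype": "video", "shape": (480, 640, 3)}}
    return estimate_max_episode_seconds(features, fps=30, target_size_mb=100)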
# ---------------------------------------------------------------------------
# Shared action-dispatch helper
# ---------------------------------------------------------------------------
def send_next_action(
obs_processed: dict,
obs_raw: dict,
ctx: RolloutContext,
interpolator: ActionInterpolator,
) -> dict | None:
"""Dispatch the next action to the robot.
Pulls the next action tensor from the inference engine, feeds the
interpolator, and sends the interpolated action through the
``robot_action_processor`` to the robot. Works identically for
sync and async backends; the rollout strategy never needs to branch.
Returns the action dict that was sent, or ``None`` if no action was
ready (e.g. empty async queue, interpolator not yet primed).
"""
engine = ctx.policy.inference
features = ctx.data.dataset_features
ordered_keys = ctx.data.ordered_action_keys
if interpolator.needs_new_action():
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_tensor = engine.get_action(obs_frame)
if action_tensor is not None:
interpolator.add(action_tensor.cpu())
interp = interpolator.get()
if interp is None:
return None
if len(interp) != len(ordered_keys):
raise ValueError(f"Interpolated tensor length ({len(interp)}) != action keys ({len(ordered_keys)})")
action_dict = {k: interp[i].item() for i, k in enumerate(ordered_keys)}
processed = ctx.processors.robot_action_processor((action_dict, obs_raw))
ctx.hardware.robot_wrapper.send_action(processed)
return action_dict
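To make the calling convention concrete, here is a hedged sketch of a single control tick in the shape the strategies further down use; when the helper returns ``None`` the tick simply keeps its cadence and tries again next iteration. Names such as ``ctx`` and ``interpolator`` are assumed to be set up as in those strategies.

```python
import time

from lerobot.utils.robot_utils import precise_sleep


def one_control_tick(ctx, interpolator, control_interval: float) -> None:
    # Illustrative only: mirrors the per-iteration body of the strategies below.
    loop_start = time.perf_counter()
    obs = ctx.hardware.robot_wrapper.get_observation()
    obs_processed = ctx.processors.robot_observation_processor(obs)
    action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
    if action_dict is None:
        # Nothing ready yet (empty async queue / unprimed interpolator): skip this tick.
        pass
    dt = time.perf_counter() - loop_start
    if (sleep_t := control_interval - dt) > 0:
        precise_sleep(sleep_t)
```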
+767
@@ -0,0 +1,767 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""DAgger rollout strategy: Human-in-the-Loop data collection.
Implements the RaC paradigm (Recovery and Correction) for interactive
imitation learning. Alternates between autonomous policy execution and
human intervention via teleoperator.
Input is controlled via either a keyboard or foot pedal, selected by
the ``input_device`` config field. Each device exposes three actions:
1. **pause_resume**: Toggle policy execution (AUTONOMOUS <-> PAUSED).
2. **correction**: Toggle correction recording (PAUSED <-> CORRECTING).
3. **upload**: Push dataset to hub on demand (corrections-only mode).
ESC (keyboard only): Stop session.
Recording modes:
``record_autonomous=True``: Sentry-like continuous recording with
time-based episode rotation. Both autonomous and correction
frames are recorded; corrections tagged ``intervention=True``.
``record_autonomous=False``: Only correction windows are recorded.
Each correction (start to stop) becomes one episode.
Teleoperator expectations:
The user is responsible for keeping the leader arm aligned with the
follower arm at the moment a correction begins. Programmatic motor
handover (``enable_torque`` / ``disable_torque`` / ``write_goal_positions``)
is intentionally not invoked here; see the TODO in
:func:`DAggerStrategy._apply_transition` for the open design decision.
"""
from __future__ import annotations
import contextlib
import enum
import logging
import os
import sys
import time
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Event, Lock
from typing import Any
import numpy as np
from lerobot.common.control_utils import is_headless
from lerobot.datasets import VideoEncodingManager
from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
from lerobot.teleoperators import Teleoperator
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame
from lerobot.utils.import_utils import _pynput_available
from lerobot.utils.pedal import start_pedal_listener
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from ..configs import DAggerKeyboardConfig, DAggerPedalConfig, DAggerStrategyConfig
from ..context import RolloutContext
from ..robot_wrapper import ThreadSafeRobot
from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
PYNPUT_AVAILABLE = _pynput_available
keyboard = None
if PYNPUT_AVAILABLE:
try:
if ("DISPLAY" not in os.environ) and ("linux" in sys.platform):
logging.info("No DISPLAY set. Skipping pynput import.")
PYNPUT_AVAILABLE = False
else:
from pynput import keyboard
except Exception as e:
PYNPUT_AVAILABLE = False
logging.info(f"Could not import pynput: {e}")
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# DAgger state machine
# ---------------------------------------------------------------------------
class DAggerPhase(enum.Enum):
"""Observable phases of a DAgger episode."""
AUTONOMOUS = "autonomous" # Policy driving
PAUSED = "paused" # Engine paused, teleop aligned, awaiting input
CORRECTING = "correcting" # Human driving via teleop, recording interventions
# Valid (current_phase, event) -> next_phase
_DAGGER_TRANSITIONS: dict[tuple[DAggerPhase, str], DAggerPhase] = {
(DAggerPhase.AUTONOMOUS, "pause_resume"): DAggerPhase.PAUSED,
(DAggerPhase.PAUSED, "pause_resume"): DAggerPhase.AUTONOMOUS,
(DAggerPhase.PAUSED, "correction"): DAggerPhase.CORRECTING,
(DAggerPhase.CORRECTING, "correction"): DAggerPhase.PAUSED,
}
class DAggerEvents:
"""Thread-safe container for DAgger input device events.
The keyboard/pedal threads write transition requests; the main loop
consumes them.
"""
def __init__(self) -> None:
self._lock = Lock()
self._phase = DAggerPhase.AUTONOMOUS
self._pending_transition: str | None = None
# Session-level flags
self.stop_recording = Event()
self.upload_requested = Event()
# -- Thread-safe phase access ------------------------------------------
@property
def phase(self) -> DAggerPhase:
"""Current phase of the DAgger state machine."""
with self._lock:
return self._phase
@phase.setter
def phase(self, value: DAggerPhase) -> None:
with self._lock:
self._phase = value
def request_transition(self, event: str) -> None:
"""Request a phase transition (called from keyboard/pedal threads).
Only enqueues the request if it corresponds to a valid transition
from the current phase, preventing impossible state changes.
"""
with self._lock:
if (self._phase, event) in _DAGGER_TRANSITIONS:
self._pending_transition = event
def consume_transition(self) -> tuple[DAggerPhase, DAggerPhase] | None:
"""Consume a pending transition (called from main loop)."""
with self._lock:
if self._pending_transition is None:
return None
key = (self._phase, self._pending_transition)
self._pending_transition = None
new_phase = _DAGGER_TRANSITIONS.get(key)
if new_phase is None:
return None
old_phase = self._phase
self._phase = new_phase
return old_phase, new_phase
def reset(self) -> None:
"""Reset all transient state for a fresh session."""
with self._lock:
self._phase = DAggerPhase.AUTONOMOUS
self._pending_transition = None
self.upload_requested.clear()
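# Hedged walkthrough of the producer/consumer contract (illustrative only):
# the input-device thread calls request_transition(); the main loop polls
# consume_transition(); a request with no entry in _DAGGER_TRANSITIONS
# (e.g. "correction" while AUTONOMOUS) is dropped silently.
def _example_dagger_events() -> None:
    events = DAggerEvents()
    events.request_transition("pause_resume")  # from the input-device thread
    assert events.consume_transition() == (DAggerPhase.AUTONOMOUS, DAggerPhase.PAUSED)
    events.request_transition("correction")
    events.consume_transition()                # PAUSED -> CORRECTING
    events.request_transition("correction")
    assert events.consume_transition() == (DAggerPhase.CORRECTING, DAggerPhase.PAUSED)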
# ---------------------------------------------------------------------------
# Teleoperator helpers
# ---------------------------------------------------------------------------
# TODO(Steven): re-enable programmatic teleop alignment once we decide whether
# to enforce motor-control methods on every Teleoperator. Until then the user
# is responsible for moving the leader arm to the follower's pose at the moment
# a correction begins.
def _teleop_smooth_move_to(
teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50
) -> None:
"""Smoothly move teleop to target position via linear interpolation.
Requires the teleoperator to support motor control methods
(``enable_torque``, ``write_goal_positions``, ``get_action``).
"""
teleop.enable_torque()
current = teleop.get_action()
steps = max(int(duration_s * fps), 1)
for step in range(steps + 1):
t = step / steps
interp = {}
for k in current:
if k in target_pos:
interp[k] = current[k] * (1 - t) + target_pos[k] * t
else:
interp[k] = current[k]
teleop.write_goal_positions(interp)
time.sleep(1 / fps)
# ---------------------------------------------------------------------------
# Input device handlers
# ---------------------------------------------------------------------------
def _init_dagger_keyboard(events: DAggerEvents, cfg: DAggerKeyboardConfig):
"""Initialise keyboard listener with DAgger 3-key controls.
Returns the pynput Listener (or ``None`` in headless mode or when
pynput is unavailable).
"""
if not PYNPUT_AVAILABLE or is_headless():
logger.warning("Headless environment or pynput unavailable — keyboard controls disabled")
return None
# Map config key names to pynput Key objects for special keys
special_keys = {
"space": keyboard.Key.space,
"tab": keyboard.Key.tab,
"enter": keyboard.Key.enter,
}
def _resolve_key(key) -> str | None:
"""Resolve a pynput key event to a config-comparable string."""
if key == keyboard.Key.esc:
return "esc"
for name, pynput_key in special_keys.items():
if key == pynput_key:
return name
if hasattr(key, "char") and key.char:
return key.char
return None
# Build mapping: resolved key string -> DAgger event name
key_to_event = {
cfg.pause_resume: "pause_resume",
cfg.correction: "correction",
}
def on_press(key):
try:
resolved = _resolve_key(key)
if resolved is None:
return
if resolved == "esc":
logger.info("Stop recording...")
events.stop_recording.set()
return
if resolved in key_to_event:
events.request_transition(key_to_event[resolved])
if resolved == cfg.upload:
events.upload_requested.set()
except Exception as e:
logger.debug("Key error: %s", e)
listener = keyboard.Listener(on_press=on_press)
listener.start()
logger.info(
"DAgger keyboard listener started (pause_resume='%s', correction='%s', upload='%s', ESC=stop)",
cfg.pause_resume,
cfg.correction,
cfg.upload,
)
return listener
def _init_dagger_pedal(events: DAggerEvents, cfg: DAggerPedalConfig):
"""Initialise foot pedal listener with DAgger 3-pedal controls.
Returns the pedal listener thread (or ``None`` if evdev is unavailable).
"""
code_to_event = {
cfg.pause_resume: "pause_resume",
cfg.correction: "correction",
}
def on_press(code: str) -> None:
if code in code_to_event:
events.request_transition(code_to_event[code])
if code == cfg.upload:
events.upload_requested.set()
logger.info("Initializing DAgger foot pedal listener (device=%s)", cfg.device_path)
return start_pedal_listener(on_press, device_path=cfg.device_path)
# ---------------------------------------------------------------------------
# DAgger Strategy
# ---------------------------------------------------------------------------
class DAggerStrategy(RolloutStrategy):
"""Human-in-the-Loop data collection with intervention tagging.
State machine::
AUTONOMOUS --(key1)--> PAUSED --(key2)--> CORRECTING --(key2)--> PAUSED
PAUSED --(key1)--> AUTONOMOUS
Recording modes:
``record_autonomous=True``: Sentry-like continuous recording with
time-based episode rotation. Intervention frames tagged True.
``record_autonomous=False``: Only correction windows recorded.
Each correction = one episode. Upload on demand via key3.
"""
config: DAggerStrategyConfig
def __init__(self, config: DAggerStrategyConfig):
super().__init__(config)
self._listener = None
self._pedal_thread = None
self._events = DAggerEvents()
self._push_executor: ThreadPoolExecutor | None = None
self._pending_push: Future | None = None
self._needs_push = Event()
self._episode_lock = Lock()
def setup(self, ctx: RolloutContext) -> None:
"""Initialise the inference engine and input device listener."""
self._init_engine(ctx)
self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dagger-push")
target_mb = self.config.target_video_file_size_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB
self._episode_duration_s = estimate_max_episode_seconds(
ctx.data.dataset_features, ctx.runtime.cfg.fps, target_size_mb=target_mb
)
if self.config.input_device == "keyboard":
self._listener = _init_dagger_keyboard(self._events, self.config.keyboard)
else:
self._pedal_thread = _init_dagger_pedal(self._events, self.config.pedal)
record_mode = "all frames (sentry-like)" if self.config.record_autonomous else "corrections only"
logger.info(
"DAgger strategy ready (input=%s, episodes=%d, record=%s, episode_duration=%.0fs)",
self.config.input_device,
self.config.num_episodes,
record_mode,
self._episode_duration_s,
)
def run(self, ctx: RolloutContext) -> None:
"""Run DAgger episodes with human-in-the-loop intervention."""
if self.config.record_autonomous:
self._run_continuous(ctx)
else:
self._run_corrections_only(ctx)
def teardown(self, ctx: RolloutContext) -> None:
"""Stop listeners, finalise the dataset, and disconnect hardware."""
play_sounds = ctx.runtime.cfg.play_sounds
logger.info("Stopping DAgger recording")
log_say("Stopping DAgger recording", play_sounds)
if self._listener is not None and not is_headless():
logger.info("Stopping keyboard listener")
self._listener.stop()
# Flush any queued/running push cleanly
if self._push_executor is not None:
logger.info("Shutting down push executor (waiting for pending pushes)...")
self._push_executor.shutdown(wait=True)
self._push_executor = None
if ctx.data.dataset is not None:
logger.info("Finalizing dataset...")
ctx.data.dataset.finalize()
if self._needs_push.is_set() and ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
logger.info("Pushing final dataset to hub...")
if safe_push_to_hub(
ctx.data.dataset,
tags=ctx.runtime.cfg.dataset.tags,
private=ctx.runtime.cfg.dataset.private,
):
logger.info("Dataset uploaded to hub")
log_say("Dataset uploaded to hub", play_sounds)
self._teardown_hardware(
ctx.hardware,
return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
)
logger.info("DAgger strategy teardown complete")
# ------------------------------------------------------------------
# Continuous recording mode (record_autonomous=True)
# ------------------------------------------------------------------
def _run_continuous(self, ctx: RolloutContext) -> None:
"""Sentry-like continuous recording with intervention tagging.
Episodes are auto-rotated every ``episode_time_s`` seconds and
uploaded in the background every ``upload_every_n_episodes`` episodes.
Both autonomous and correction frames are recorded; corrections are
tagged with ``intervention=True``.
"""
engine = self._engine
cfg = ctx.runtime.cfg
robot = ctx.hardware.robot_wrapper
teleop = ctx.hardware.teleop
dataset = ctx.data.dataset
events = self._events
interpolator = self._interpolator
features = ctx.data.dataset_features
control_interval = interpolator.get_control_interval(cfg.fps)
record_stride = max(1, cfg.interpolation_multiplier)
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
play_sounds = cfg.play_sounds
engine.reset()
interpolator.reset()
events.reset()
# TODO(Steven): re-enable once Teleoperator motor-control methods are
# standardised; until then the user pre-aligns the leader by hand.
# teleop.disable_torque()
engine.resume()
last_action: dict[str, Any] | None = None
record_tick = 0
start_time = time.perf_counter()
episode_start = time.perf_counter()
episodes_since_push = 0
episode_duration_s = self._episode_duration_s
logger.info("DAgger continuous recording started (episode_duration=%.0fs)", episode_duration_s)
with VideoEncodingManager(dataset):
try:
while not events.stop_recording.is_set() and not ctx.runtime.shutdown_event.is_set():
loop_start = time.perf_counter()
if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
logger.info("Duration limit reached (%.0fs)", cfg.duration)
break
# Process transitions
transition = events.consume_transition()
if transition is not None:
old_phase, new_phase = transition
self._apply_transition(old_phase, new_phase, engine, interpolator, robot, teleop)
last_action = None
phase = events.phase
obs = robot.get_observation()
# --- CORRECTING: human teleop control ---
# TODO(Steven): teleop runs at the same FPS as the policy. To
# decouple the two, sample teleop at its native rate and
# interpolate to the control loop's tick rate.
if phase == DAggerPhase.CORRECTING:
obs_processed = ctx.processors.robot_observation_processor(obs)
teleop_action = teleop.get_action()
processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
robot.send_action(robot_action_to_send)
last_action = robot_action_to_send
self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)
if record_tick % record_stride == 0:
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
frame = {
**obs_frame,
**action_frame,
"task": task_str,
"intervention": np.array([True], dtype=bool),
}
dataset.add_frame(frame)
record_tick += 1
# --- PAUSED: hold position ---
elif phase == DAggerPhase.PAUSED:
if last_action:
robot.send_action(last_action)
# --- AUTONOMOUS: policy control ---
else:
obs_processed = self._process_observation_and_notify(ctx.processors, obs)
if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
continue
action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
if action_dict is not None:
self._log_telemetry(obs_processed, action_dict, ctx.runtime)
last_action = ctx.processors.robot_action_processor((action_dict, obs))
if record_tick % record_stride == 0:
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
frame = {
**obs_frame,
**action_frame,
"task": task_str,
"intervention": np.array([False], dtype=bool),
}
dataset.add_frame(frame)
record_tick += 1
# Episode rotation derived from the video file-size target.
# Saving is deferred while a correction is ongoing so the
# episode boundary lands on a clean autonomous frame.
elapsed = time.perf_counter() - episode_start
if elapsed >= episode_duration_s and phase != DAggerPhase.CORRECTING:
with self._episode_lock:
dataset.save_episode()
episodes_since_push += 1
self._needs_push.set()
logger.info(
"Episode saved (total: %d, elapsed: %.1fs)",
dataset.num_episodes,
elapsed,
)
log_say(f"Episode {dataset.num_episodes} saved", play_sounds)
if episodes_since_push >= self.config.upload_every_n_episodes:
self._background_push(dataset, cfg)
episodes_since_push = 0
episode_start = time.perf_counter()
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
else:
logger.warning(
f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
)
finally:
logger.info("DAgger continuous control loop ended — pausing engine")
engine.pause()
# TODO(Steven): re-enable once Teleoperator motor-control methods
# are standardised across all teleop implementations.
# teleop.disable_torque()
with contextlib.suppress(Exception):
with self._episode_lock:
dataset.save_episode()
self._needs_push.set()
logger.info("Final in-progress episode saved")
# ------------------------------------------------------------------
# Corrections-only mode (record_autonomous=False)
# ------------------------------------------------------------------
def _run_corrections_only(self, ctx: RolloutContext) -> None:
"""Record only human correction windows. Each correction = one episode.
The policy runs autonomously without recording. When the user
pauses and starts a correction, frames are recorded with
``intervention=True``. Stopping the correction saves the episode.
The dataset can be uploaded on demand via the upload key/pedal.
"""
engine = self._engine
cfg = ctx.runtime.cfg
robot = ctx.hardware.robot_wrapper
teleop = ctx.hardware.teleop
dataset = ctx.data.dataset
events = self._events
interpolator = self._interpolator
features = ctx.data.dataset_features
control_interval = interpolator.get_control_interval(cfg.fps)
record_stride = max(1, cfg.interpolation_multiplier)
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
play_sounds = cfg.play_sounds
engine.reset()
interpolator.reset()
events.reset()
# TODO(Steven): re-enable once Teleoperator motor-control methods are
# standardised; until then the user pre-aligns the leader by hand.
# teleop.disable_torque()
engine.resume()
last_action: dict[str, Any] | None = None
start_time = time.perf_counter()
record_tick = 0
recorded = 0
logger.info(
"DAgger corrections-only recording started (target: %d episodes)", self.config.num_episodes
)
with VideoEncodingManager(dataset):
try:
while (
recorded < self.config.num_episodes
and not events.stop_recording.is_set()
and not ctx.runtime.shutdown_event.is_set()
):
loop_start = time.perf_counter()
if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
logger.info("Duration limit reached (%.0fs)", cfg.duration)
break
# Process transitions
transition = events.consume_transition()
if transition is not None:
old_phase, new_phase = transition
self._apply_transition(old_phase, new_phase, engine, interpolator, robot, teleop)
last_action = None
# Correction ended -> save episode (blocking if not streaming)
if old_phase == DAggerPhase.CORRECTING and new_phase == DAggerPhase.PAUSED:
with self._episode_lock:
dataset.save_episode()
recorded += 1
self._needs_push.set()
logger.info(
"Correction %d/%d saved",
recorded,
self.config.num_episodes,
)
log_say(f"Correction {recorded} saved", play_sounds)
# On-demand upload
if events.upload_requested.is_set():
events.upload_requested.clear()
logger.info("Upload requested by user")
self._background_push(dataset, cfg)
phase = events.phase
obs = robot.get_observation()
# --- CORRECTING: human teleop control + recording ---
# TODO(Steven): teleop runs at the same FPS as the policy. To
# decouple the two, sample teleop at its native rate and
# interpolate to the control loop's tick rate.
if phase == DAggerPhase.CORRECTING:
obs_processed = ctx.processors.robot_observation_processor(obs)
teleop_action = teleop.get_action()
processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
robot.send_action(robot_action_to_send)
last_action = robot_action_to_send
self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)
if record_tick % record_stride == 0:
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
dataset.add_frame(
{
**obs_frame,
**action_frame,
"task": task_str,
"intervention": np.array([True], dtype=bool),
}
)
record_tick += 1
# --- PAUSED: hold position ---
elif phase == DAggerPhase.PAUSED:
if last_action:
robot.send_action(last_action)
# --- AUTONOMOUS: policy control (no recording) ---
else:
obs_processed = self._process_observation_and_notify(ctx.processors, obs)
if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
continue
action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
if action_dict is not None:
self._log_telemetry(obs_processed, action_dict, ctx.runtime)
last_action = ctx.processors.robot_action_processor((action_dict, obs))
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
else:
logger.warning(
f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
)
finally:
logger.info("DAgger corrections-only loop ended — pausing engine")
engine.pause()
# TODO(Steven): re-enable once Teleoperator motor-control methods
# are standardised across all teleop implementations.
# teleop.disable_torque()
with contextlib.suppress(Exception):
with self._episode_lock:
dataset.save_episode()
self._needs_push.set()
logger.info("Final in-progress episode saved")
# ------------------------------------------------------------------
# State-machine transition side-effects
# ------------------------------------------------------------------
@staticmethod
def _apply_transition(
old_phase: DAggerPhase,
new_phase: DAggerPhase,
engine,
interpolator,
robot: ThreadSafeRobot,
teleop: Teleoperator,
) -> None:
"""Execute side-effects for a validated phase transition."""
logger.info("Phase transition: %s -> %s", old_phase.value, new_phase.value)
if old_phase == DAggerPhase.AUTONOMOUS and new_phase == DAggerPhase.PAUSED:
logger.info("Pausing engine — robot holds position")
engine.pause()
obs = robot.get_observation()
_robot_pos = {
k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features
}
# TODO(Steven): once Teleoperator motor-control methods are
# standardised, drive the leader to the follower's pose here so the
# operator does not need to pre-align the arm by hand. Until then
# the user is responsible for the alignment.
# _teleop_smooth_move_to(teleop, _robot_pos, duration_s=2.0, fps=50)
elif new_phase == DAggerPhase.CORRECTING:
logger.info("Entering correction mode — human teleop control")
# TODO(Steven): re-enable once Teleoperator motor-control methods
# are standardised across all teleop implementations.
# teleop.disable_torque()
elif new_phase == DAggerPhase.AUTONOMOUS:
logger.info("Resuming autonomous mode — resetting engine and interpolator")
interpolator.reset()
engine.reset()
engine.resume()
# ------------------------------------------------------------------
# Background push (shared by both modes)
# ------------------------------------------------------------------
def _background_push(self, dataset, cfg) -> None:
"""Queue a Hub push on the single-worker executor.
The executor's max_workers=1 guarantees at most one push runs at
a time; submitted tasks are queued rather than dropped. Pushes
are blocked while the operator is mid-correction to avoid
uploading a partially-recorded episode.
"""
if self._push_executor is None:
return
if self._events.phase == DAggerPhase.CORRECTING:
logger.info("Skipping push — correction in progress")
return
if self._pending_push is not None and not self._pending_push.done():
logger.info("Previous push still in progress; queueing next")
def _push():
try:
with self._episode_lock:
if safe_push_to_hub(
dataset,
tags=cfg.dataset.tags if cfg.dataset else None,
private=cfg.dataset.private if cfg.dataset else False,
):
self._needs_push.clear()
logger.info("Background push to hub complete")
except Exception as e:
logger.error("Background push failed: %s", e)
self._pending_push = self._push_executor.submit(_push)
logger.info("Background push task submitted")
+45
@@ -0,0 +1,45 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Strategy factory: config type-name → strategy class dispatch."""
from __future__ import annotations
from typing import TYPE_CHECKING
from .base import BaseStrategy
from .core import RolloutStrategy
from .dagger import DAggerStrategy
from .highlight import HighlightStrategy
from .sentry import SentryStrategy
if TYPE_CHECKING:
from ..configs import RolloutStrategyConfig
def create_strategy(config: RolloutStrategyConfig) -> RolloutStrategy:
"""Instantiate the appropriate strategy from a config object.
Dispatches on ``config.type`` (the name registered via
``draccus.ChoiceRegistry``).
"""
if config.type == "base":
return BaseStrategy(config)
if config.type == "sentry":
return SentryStrategy(config)
if config.type == "highlight":
return HighlightStrategy(config)
if config.type == "dagger":
return DAggerStrategy(config)
raise ValueError(f"Unknown strategy type '{config.type}'. Available: base, sentry, highlight, dagger")
+283
@@ -0,0 +1,283 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Highlight Reel strategy: on-demand recording via ring buffer."""
from __future__ import annotations
import contextlib
import logging
import os
import sys
import time
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Event as ThreadingEvent, Lock
from lerobot.common.control_utils import is_headless
from lerobot.datasets import VideoEncodingManager
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame
from lerobot.utils.import_utils import _pynput_available, require_package
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from ..configs import HighlightStrategyConfig
from ..context import RolloutContext
from ..ring_buffer import RolloutRingBuffer
from .core import RolloutStrategy, safe_push_to_hub, send_next_action
PYNPUT_AVAILABLE = _pynput_available
keyboard = None
if PYNPUT_AVAILABLE:
try:
if ("DISPLAY" not in os.environ) and ("linux" in sys.platform):
logging.info("No DISPLAY set. Skipping pynput import.")
PYNPUT_AVAILABLE = False
else:
from pynput import keyboard
except Exception as e:
PYNPUT_AVAILABLE = False
logging.info(f"Could not import pynput: {e}")
logger = logging.getLogger(__name__)
class HighlightStrategy(RolloutStrategy):
"""Autonomous rollout with on-demand recording via ring buffer.
The robot runs autonomously while a memory-bounded ring buffer
captures continuous telemetry. When the user presses the save key:
1. The ring buffer is flushed to the dataset (last *Z* seconds).
2. Live recording continues until the save key is pressed again.
3. The episode is saved and the ring buffer resumes capturing.
Requires ``streaming_encoding=True`` (enforced in config validation)
so that ``dataset.add_frame`` is a non-blocking queue put; flushing
the entire ring buffer in one tick must not stall the control loop.
"""
config: HighlightStrategyConfig
def __init__(self, config: HighlightStrategyConfig):
super().__init__(config)
require_package("pynput", extra="pynput-dep")
self._ring: RolloutRingBuffer | None = None
self._listener = None
self._save_requested = ThreadingEvent()
self._recording_live = ThreadingEvent()
self._push_requested = ThreadingEvent()
self._push_executor: ThreadPoolExecutor | None = None
self._pending_push: Future | None = None
self._episode_lock = Lock()
def setup(self, ctx: RolloutContext) -> None:
"""Initialise the inference engine, ring buffer, and keyboard listener."""
self._init_engine(ctx)
self._ring = RolloutRingBuffer(
max_seconds=self.config.ring_buffer_seconds,
max_memory_mb=self.config.ring_buffer_max_memory_mb,
fps=ctx.runtime.cfg.fps,
)
self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="highlight-push")
logger.info(
"Ring buffer initialized (max_seconds=%.0f, max_memory=%.0fMB)",
self.config.ring_buffer_seconds,
self.config.ring_buffer_max_memory_mb,
)
self._setup_keyboard(ctx.runtime.shutdown_event)
logger.info(
"Highlight strategy ready (buffer=%.0fs, save='%s', push='%s')",
self.config.ring_buffer_seconds,
self.config.save_key,
self.config.push_key,
)
def run(self, ctx: RolloutContext) -> None:
"""Run the autonomous loop, buffering frames and recording on demand."""
engine = self._engine
cfg = ctx.runtime.cfg
robot = ctx.hardware.robot_wrapper
dataset = ctx.data.dataset
ring = self._ring
interpolator = self._interpolator
features = ctx.data.dataset_features
control_interval = interpolator.get_control_interval(cfg.fps)
engine.resume()
play_sounds = cfg.play_sounds
start_time = time.perf_counter()
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
logger.info("Highlight strategy recording started (press '%s' to save)", self.config.save_key)
with VideoEncodingManager(dataset):
try:
while not ctx.runtime.shutdown_event.is_set():
loop_start = time.perf_counter()
if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
logger.info("Duration limit reached (%.0fs)", cfg.duration)
break
obs = robot.get_observation()
obs_processed = self._process_observation_and_notify(ctx.processors, obs)
if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
continue
action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
if action_dict is not None:
self._log_telemetry(obs_processed, action_dict, ctx.runtime)
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
frame = {**obs_frame, **action_frame, "task": task_str}
# NOTE: ``is_set()`` then ``clear()`` is not atomic
# against the keyboard thread setting the flag again
# in between — but that is benign: we lose at most one
# toggle, processed on the next iteration.
if self._save_requested.is_set():
self._save_requested.clear()
if not self._recording_live.is_set():
logger.info(
"Flushing ring buffer (%d frames) + starting live recording",
len(ring),
)
for buffered_frame in ring.drain():
dataset.add_frame(buffered_frame)
self._recording_live.set()
else:
dataset.add_frame(frame)
with self._episode_lock:
dataset.save_episode()
logger.info("Episode saved (total: %d)", dataset.num_episodes)
log_say(
f"Episode {dataset.num_episodes} saved",
play_sounds,
)
self._recording_live.clear()
continue # frame already consumed — skip ring.append
if self._push_requested.is_set():
self._push_requested.clear()
logger.info("Push requested by user")
self._background_push(dataset, cfg)
if self._recording_live.is_set():
dataset.add_frame(frame)
else:
ring.append(frame)
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
else:
logger.warning(
f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
)
finally:
logger.info("Highlight control loop ended")
if self._recording_live.is_set():
logger.info("Saving in-progress live episode")
with contextlib.suppress(Exception), self._episode_lock:
dataset.save_episode()
def teardown(self, ctx: RolloutContext) -> None:
"""Stop listeners, finalise the dataset, and disconnect hardware."""
play_sounds = ctx.runtime.cfg.play_sounds
logger.info("Stopping highlight recording")
log_say("Stopping highlight recording", play_sounds)
if self._listener is not None:
logger.info("Stopping keyboard listener")
self._listener.stop()
if self._push_executor is not None:
logger.info("Shutting down push executor (waiting for pending pushes)...")
self._push_executor.shutdown(wait=True)
self._push_executor = None
if ctx.data.dataset is not None:
logger.info("Finalizing dataset...")
ctx.data.dataset.finalize()
if ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
logger.info("Pushing final dataset to hub...")
if safe_push_to_hub(
ctx.data.dataset,
tags=ctx.runtime.cfg.dataset.tags,
private=ctx.runtime.cfg.dataset.private,
):
logger.info("Dataset uploaded to hub")
log_say("Dataset uploaded to hub", play_sounds)
self._teardown_hardware(
ctx.hardware,
return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
)
logger.info("Highlight strategy teardown complete")
def _setup_keyboard(self, shutdown_event: ThreadingEvent) -> None:
"""Set up keyboard listener for save and push keys."""
if is_headless():
logger.warning("Headless environment — highlight keys unavailable")
return
try:
save_key = self.config.save_key
push_key = self.config.push_key
def on_press(key):
with contextlib.suppress(Exception):
if hasattr(key, "char") and key.char == save_key:
self._save_requested.set()
elif hasattr(key, "char") and key.char == push_key:
self._push_requested.set()
elif key == keyboard.Key.esc:
self._save_requested.clear()
shutdown_event.set()
self._listener = keyboard.Listener(on_press=on_press)
self._listener.start()
logger.info("Keyboard listener started (save='%s', push='%s', ESC=stop)", save_key, push_key)
except ImportError:
logger.warning("pynput not available — keyboard listener disabled")
def _background_push(self, dataset, cfg) -> None:
"""Queue a Hub push on the single-worker executor."""
if self._push_executor is None:
return
if self._pending_push is not None and not self._pending_push.done():
logger.info("Previous push still in progress; queueing next")
def _push():
try:
with self._episode_lock:
if safe_push_to_hub(
dataset,
tags=cfg.dataset.tags if cfg.dataset else None,
private=cfg.dataset.private if cfg.dataset else False,
):
logger.info("Background push to hub complete")
except Exception as e:
logger.error("Background push failed: %s", e)
self._pending_push = self._push_executor.submit(_push)
logger.info("Background push task submitted")
+231
@@ -0,0 +1,231 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Sentry rollout strategy: continuous autonomous recording with auto-upload."""
from __future__ import annotations
import contextlib
import logging
import time
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Event, Lock
from lerobot.datasets import VideoEncodingManager
from lerobot.datasets.utils import DEFAULT_VIDEO_FILE_SIZE_IN_MB
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.feature_utils import build_dataset_frame
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
from ..configs import SentryStrategyConfig
from ..context import RolloutContext
from .core import RolloutStrategy, estimate_max_episode_seconds, safe_push_to_hub, send_next_action
logger = logging.getLogger(__name__)
class SentryStrategy(RolloutStrategy):
"""Continuous autonomous rollout with always-on recording.
Episode duration is derived from camera resolution, FPS, and
``DEFAULT_VIDEO_FILE_SIZE_IN_MB`` so that each saved episode
produces a video file that has crossed the chunk-size boundary.
This keeps ``push_to_hub`` efficient: it uploads complete video
files rather than re-uploading a still-growing one.
The dataset is pushed to the Hub via a bounded single-worker executor
so no push is ever silently dropped and exactly one push runs at a
time.
Policy state (hidden state, RTC queue) intentionally persists across
episode boundaries: Sentry slices one continuous rollout; the robot
does not reset between slices.
Requires ``streaming_encoding=True`` (enforced in config validation)
to prevent disk I/O from blocking the control loop.
"""
config: SentryStrategyConfig
def __init__(self, config: SentryStrategyConfig):
super().__init__(config)
self._push_executor: ThreadPoolExecutor | None = None
self._pending_push: Future | None = None
self._needs_push = Event()
self._episode_lock = Lock()
def setup(self, ctx: RolloutContext) -> None:
"""Initialise the inference engine and background push executor."""
self._init_engine(ctx)
self._push_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="sentry-push")
target_mb = self.config.target_video_file_size_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB
self._episode_duration_s = estimate_max_episode_seconds(
ctx.data.dataset_features, ctx.runtime.cfg.fps, target_size_mb=target_mb
)
logger.info(
"Sentry strategy ready (episode_duration=%.0fs, upload_every=%d eps)",
self._episode_duration_s,
self.config.upload_every_n_episodes,
)
def run(self, ctx: RolloutContext) -> None:
"""Run the continuous recording loop with automatic episode rotation."""
engine = self._engine
cfg = ctx.runtime.cfg
robot = ctx.hardware.robot_wrapper
dataset = ctx.data.dataset
interpolator = self._interpolator
features = ctx.data.dataset_features
control_interval = interpolator.get_control_interval(cfg.fps)
engine.resume()
play_sounds = cfg.play_sounds
episode_duration_s = self._episode_duration_s
start_time = time.perf_counter()
episode_start = time.perf_counter()
episodes_since_push = 0
task_str = cfg.dataset.single_task if cfg.dataset else cfg.task
logger.info("Sentry recording started (episode_duration=%.0fs)", episode_duration_s)
with VideoEncodingManager(dataset):
try:
while not ctx.runtime.shutdown_event.is_set():
loop_start = time.perf_counter()
if cfg.duration > 0 and (time.perf_counter() - start_time) >= cfg.duration:
logger.info("Duration limit reached (%.0fs)", cfg.duration)
break
obs = robot.get_observation()
obs_processed = self._process_observation_and_notify(ctx.processors, obs)
if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
continue
action_dict = send_next_action(obs_processed, obs, ctx, interpolator)
if action_dict is not None:
self._log_telemetry(obs_processed, action_dict, ctx.runtime)
obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
frame = {**obs_frame, **action_frame, "task": task_str}
# ``add_frame`` writes to the in-progress episode buffer; the
# background pusher only ever touches *finalised* episode
# artifacts on disk. The two operate on disjoint state, so
# ``add_frame`` does not need ``_episode_lock``.
dataset.add_frame(frame)
# Episode rotation derived from video file-size target.
# The duration is a conservative estimate so the actual
# video has crossed DEFAULT_VIDEO_FILE_SIZE_IN_MB by now,
# keeping push_to_hub efficient (uploads complete files).
elapsed = time.perf_counter() - episode_start
if elapsed >= episode_duration_s:
# ``save_episode`` finalises the in-progress episode and
# flushes it to disk; ``_episode_lock`` serialises this with
# ``push_to_hub`` (run in the background executor) so the
# pusher never reads a half-written episode.
with self._episode_lock:
dataset.save_episode()
episodes_since_push += 1
self._needs_push.set()
logger.info(
"Episode saved (total: %d, elapsed: %.1fs)",
dataset.num_episodes,
elapsed,
)
log_say(f"Episode {dataset.num_episodes} saved", play_sounds)
if episodes_since_push >= self.config.upload_every_n_episodes:
self._background_push(dataset, cfg)
episodes_since_push = 0
episode_start = time.perf_counter()
dt = time.perf_counter() - loop_start
if (sleep_t := control_interval - dt) > 0:
precise_sleep(sleep_t)
else:
logger.warning(
f"Record loop is running slower ({1 / dt:.1f} Hz) than the target FPS ({cfg.fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
)
finally:
logger.info("Sentry control loop ended — saving final episode")
with contextlib.suppress(Exception):
with self._episode_lock:
dataset.save_episode()
self._needs_push.set()
def teardown(self, ctx: RolloutContext) -> None:
"""Flush pending pushes, finalise the dataset, and disconnect hardware."""
play_sounds = ctx.runtime.cfg.play_sounds
logger.info("Stopping sentry recording")
log_say("Stopping sentry recording", play_sounds)
# Flush any queued/running push cleanly.
if self._push_executor is not None:
logger.info("Shutting down push executor (waiting for pending pushes)...")
self._push_executor.shutdown(wait=True)
self._push_executor = None
if ctx.data.dataset is not None:
logger.info("Finalizing dataset...")
ctx.data.dataset.finalize()
if self._needs_push.is_set() and ctx.runtime.cfg.dataset and ctx.runtime.cfg.dataset.push_to_hub:
logger.info("Pushing final dataset to hub...")
if safe_push_to_hub(
ctx.data.dataset,
tags=ctx.runtime.cfg.dataset.tags,
private=ctx.runtime.cfg.dataset.private,
):
logger.info("Dataset uploaded to hub")
log_say("Dataset uploaded to hub", play_sounds)
self._teardown_hardware(
ctx.hardware,
return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
)
logger.info("Sentry strategy teardown complete")
def _background_push(self, dataset, cfg) -> None:
"""Queue a Hub push on the single-worker executor.
The executor's max_workers=1 guarantees at most one push runs at
a time; submitted tasks are queued rather than dropped.
"""
if self._push_executor is None:
return
if self._pending_push is not None and not self._pending_push.done():
logger.info("Previous push still in progress; queueing next")
def _push():
try:
with self._episode_lock:
if safe_push_to_hub(
dataset,
tags=cfg.dataset.tags if cfg.dataset else None,
private=cfg.dataset.private if cfg.dataset else False,
):
self._needs_push.clear()
logger.info("Background push to hub complete")
except Exception as e:
logger.error("Background push failed: %s", e)
self._pending_push = self._push_executor.submit(_push)
logger.info("Background push task submitted")
+85 -256
@@ -13,70 +13,62 @@
# limitations under the License.
"""
Records a dataset. Actions for the robot can be either generated by teleoperation or by a policy.
Records a dataset via teleoperation. This is a pure data-collection
tool; no policy inference. For deploying trained policies, use
``lerobot-rollout`` instead.
Requires: pip install 'lerobot[core_scripts]' (includes dataset + hardware + viz extras)
Example:
```shell
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58760431541 \
--robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--robot.id=black \
--dataset.repo_id=<my_username>/<my_dataset_name> \
--dataset.num_episodes=2 \
--dataset.single_task="Grab the cube" \
--dataset.streaming_encoding=true \
--dataset.encoder_threads=2 \
lerobot-record \\
--robot.type=so100_follower \\
--robot.port=/dev/tty.usbmodem58760431541 \\
--robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \\
--robot.id=black \\
--teleop.type=so100_leader \\
--teleop.port=/dev/tty.usbmodem58760431551 \\
--teleop.id=blue \\
--dataset.repo_id=<my_username>/<my_dataset_name> \\
--dataset.num_episodes=2 \\
--dataset.single_task="Grab the cube" \\
--dataset.streaming_encoding=true \\
--dataset.encoder_threads=2 \\
--display_data=true
# <- Optional: specify video codec (auto, h264, hevc, libsvtav1). Default is libsvtav1. \
# --dataset.vcodec=h264 \
# <- Teleop optional if you want to teleoperate to record or in between episodes with a policy \
# --teleop.type=so100_leader \
# --teleop.port=/dev/tty.usbmodem58760431551 \
# --teleop.id=blue \
# <- Policy optional if you want to record with a policy \
# --policy.path=${HF_USER}/my_policy \
```
Example recording with bimanual so100:
```shell
lerobot-record \
--robot.type=bi_so_follower \
--robot.left_arm_config.port=/dev/tty.usbmodem5A460822851 \
--robot.right_arm_config.port=/dev/tty.usbmodem5A460814411 \
--robot.id=bimanual_follower \
lerobot-record \\
--robot.type=bi_so_follower \\
--robot.left_arm_config.port=/dev/tty.usbmodem5A460822851 \\
--robot.right_arm_config.port=/dev/tty.usbmodem5A460814411 \\
--robot.id=bimanual_follower \\
--robot.left_arm_config.cameras='{
wrist: {"type": "opencv", "index_or_path": 1, "width": 640, "height": 480, "fps": 30},
top: {"type": "opencv", "index_or_path": 3, "width": 640, "height": 480, "fps": 30},
}' --robot.right_arm_config.cameras='{
wrist: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
front: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
}' \
--teleop.type=bi_so_leader \
--teleop.left_arm_config.port=/dev/tty.usbmodem5A460852721 \
--teleop.right_arm_config.port=/dev/tty.usbmodem5A460819811 \
--teleop.id=bimanual_leader \
--display_data=true \
--dataset.repo_id=${HF_USER}/bimanual-so-handover-cube \
--dataset.num_episodes=25 \
--dataset.single_task="Grab and handover the red cube to the other arm" \
--dataset.streaming_encoding=true \
# --dataset.vcodec=auto \
}' \\
--teleop.type=bi_so_leader \\
--teleop.left_arm_config.port=/dev/tty.usbmodem5A460852721 \\
--teleop.right_arm_config.port=/dev/tty.usbmodem5A460819811 \\
--teleop.id=bimanual_leader \\
--display_data=true \\
--dataset.repo_id=${HF_USER}/bimanual-so-handover-cube \\
--dataset.num_episodes=25 \\
--dataset.single_task="Grab and handover the red cube to the other arm" \\
--dataset.streaming_encoding=true \\
--dataset.encoder_threads=2
```
"""
import logging
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from dataclasses import asdict, dataclass
from pprint import pformat
from typing import Any
import torch
from lerobot.cameras import CameraConfig # noqa: F401
from lerobot.cameras.opencv import OpenCVCameraConfig # noqa: F401
@@ -86,11 +78,10 @@ from lerobot.cameras.zmq import ZMQCameraConfig # noqa: F401
from lerobot.common.control_utils import (
init_keyboard_listener,
is_headless,
predict_action,
sanity_check_dataset_name,
sanity_check_dataset_robot_compatibility,
)
from lerobot.configs import PreTrainedConfig, parser
from lerobot.configs import parser
from lerobot.configs.dataset import DatasetRecordConfig
from lerobot.datasets import (
LeRobotDataset,
VideoEncodingManager,
@@ -98,21 +89,11 @@ from lerobot.datasets import (
create_initial_features,
safe_stop_image_writer,
)
from lerobot.policies import (
ActionInterpolator,
PreTrainedPolicy,
make_policy,
make_pre_post_processors,
make_robot_action,
)
from lerobot.processor import (
PolicyAction,
PolicyProcessorPipeline,
RobotAction,
RobotObservation,
RobotProcessorPipeline,
make_default_processors,
rename_stats,
)
from lerobot.robots import ( # noqa: F401
Robot,
@@ -146,7 +127,6 @@ from lerobot.teleoperators import ( # noqa: F401
)
from lerobot.teleoperators.keyboard import KeyboardTeleop
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.device_utils import get_safe_torch_device
from lerobot.utils.feature_utils import build_dataset_frame, combine_feature_dicts
from lerobot.utils.import_utils import register_third_party_plugins
from lerobot.utils.robot_utils import precise_sleep
@@ -157,71 +137,12 @@ from lerobot.utils.utils import (
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
@dataclass
class DatasetRecordConfig:
# Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
repo_id: str
# A short but accurate description of the task performed during the recording (e.g. "Pick the Lego block and drop it in the box on the right.")
single_task: str
# Root directory where the dataset will be stored (e.g. 'dataset/path'). If None, defaults to $HF_LEROBOT_HOME/repo_id.
root: str | Path | None = None
# Limit the frames per second.
fps: int = 30
# Number of seconds for data recording for each episode.
episode_time_s: int | float = 60
# Number of seconds for resetting the environment after each episode.
reset_time_s: int | float = 60
# Number of episodes to record.
num_episodes: int = 50
# Encode frames in the dataset into video
video: bool = True
# Upload dataset to Hugging Face hub.
push_to_hub: bool = True
# Upload on private repository on the Hugging Face hub.
private: bool = False
# Add tags to your dataset on the hub.
tags: list[str] | None = None
# Number of subprocesses handling the saving of frames as PNG. Set to 0 to use threads only;
# set to ≥1 to use subprocesses, each using threads to write images. The best number of processes
# and threads depends on your system. We recommend 4 threads per camera with 0 processes.
# If fps is unstable, adjust the thread count. If still unstable, try using 1 or more subprocesses.
num_image_writer_processes: int = 0
# Number of threads writing the frames as png images on disk, per camera.
# Too many threads might cause unstable teleoperation fps due to main thread being blocked.
# Not enough threads might cause low camera fps.
num_image_writer_threads_per_camera: int = 4
# Number of episodes to record before batch encoding videos
# Set to 1 for immediate encoding (default behavior), or higher for batched encoding
video_encoding_batch_size: int = 1
# Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto',
# or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'.
# Use 'auto' to auto-detect the best available hardware encoder.
vcodec: str = "libsvtav1"
# Enable streaming video encoding: encode frames in real-time during capture instead
# of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
streaming_encoding: bool = False
# Maximum number of frames to buffer per camera when using streaming encoding.
# ~1s buffer at 30fps. Provides backpressure if the encoder can't keep up.
encoder_queue_maxsize: int = 30
# Number of threads per encoder instance. None = auto (codec default).
# Lower values reduce CPU usage, maps to 'lp' (via svtav1-params) for libsvtav1 and 'threads' for h264/hevc..
encoder_threads: int | None = None
# Rename map for the observation to override the image and state keys
rename_map: dict[str, str] = field(default_factory=dict)
def __post_init__(self):
if self.single_task is None:
raise ValueError("You need to provide a task as argument in `single_task`.")
@dataclass
class RecordConfig:
robot: RobotConfig
dataset: DatasetRecordConfig
# Whether to control the robot with a teleoperator
# Teleoperator to control the robot (required)
teleop: TeleoperatorConfig | None = None
# Whether to control the robot with a policy
policy: PreTrainedConfig | None = None
# Display all cameras on screen
display_data: bool = False
# Display data on a remote Rerun server
@@ -234,27 +155,14 @@ class RecordConfig:
play_sounds: bool = True
# Resume recording on an existing dataset.
resume: bool = False
# Action interpolation multiplier for smoother policy control (1=off, 2=2x, 3=3x)
# Only applies when using a policy (not teleop)
interpolation_multiplier: int = 1
def __post_init__(self):
# HACK: We parse again the cli args here to get the pretrained path if there was one.
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
if self.teleop is None and self.policy is None:
raise ValueError("Choose a policy, a teleoperator or both to control the robot")
@classmethod
def __get_path_fields__(cls) -> list[str]:
"""This enables the parser to load config from the policy using `--policy.path=local/dir`"""
return ["policy"]
if self.teleop is None:
raise ValueError(
"A teleoperator is required for recording. "
"Use --teleop.type=... to specify one. "
"For policy-based deployment, use lerobot-rollout instead."
)
""" --------------- record_loop() data flow --------------------------
@@ -264,18 +172,14 @@ class RecordConfig:
V
[ robot_observation_processor ] ---> processed_obs
V
.-----( ACTION LOGIC )------------------.
V V
[ From Teleoperator ] [ From Policy ]
| |
| [teleop.get_action] -> raw_action | [predict_action]
| | | |
| V | V
| [teleop_action_processor] | |
| | | |
'---> processed_teleop_action '---> processed_policy_action
| |
'-------------------------.-------------'
[ Teleoperator ]
|
| [teleop.get_action] -> raw_action
| |
| V
| [teleop_action_processor]
| |
'---> processed_teleop_action
V
[ robot_action_processor ] --> robot_action_to_send
V
@@ -303,13 +207,9 @@ def record_loop(
], # runs after robot
dataset: LeRobotDataset | None = None,
teleop: Teleoperator | list[Teleoperator] | None = None,
policy: PreTrainedPolicy | None = None,
preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]] | None = None,
postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction] | None = None,
control_time_s: int | None = None,
single_task: str | None = None,
display_data: bool = False,
interpolator: ActionInterpolator | None = None,
display_compressed_images: bool = False,
):
if dataset is not None and dataset.fps != fps:
@@ -340,21 +240,7 @@ def record_loop(
"For multi-teleop, the list must contain exactly one KeyboardTeleop and one arm teleoperator. Currently only supported for LeKiwi robot."
)
# Reset policy and processor if they are provided
if policy is not None and preprocessor is not None and postprocessor is not None:
policy.reset()
preprocessor.reset()
postprocessor.reset()
# Reset interpolator if provided
if interpolator is not None:
interpolator.reset()
# Calculate control interval based on interpolation
use_interpolation = interpolator is not None and interpolator.enabled and policy is not None
control_interval = interpolator.get_control_interval(fps) if interpolator else 1 / fps
# Pre-compute action key order outside the hot loop — it won't change mid-episode.
action_keys = sorted(robot.action_features) if use_interpolation else []
control_interval = 1 / fps
no_action_count = 0
timestamp = 0
@@ -372,63 +258,11 @@ def record_loop(
# Applies a pipeline to the raw robot observation, default is IdentityProcessor
obs_processed = robot_observation_processor(obs)
if policy is not None or dataset is not None:
if dataset is not None:
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Track whether this iteration should be recorded to the dataset.
# Interpolated-only iterations send actions to the robot but don't record frames,
# keeping the dataset at the original fps while the robot moves at the higher rate.
is_record_frame = True
# Get action from either policy or teleop
if policy is not None and preprocessor is not None and postprocessor is not None:
# With interpolation: only call policy when interpolator needs new action
if use_interpolation:
ran_inference = False
if interpolator.needs_new_action():
action_values = predict_action(
observation=observation_frame,
policy=policy,
device=get_safe_torch_device(policy.config.device),
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.use_amp,
task=single_task,
robot_type=robot.robot_type,
)
act_processed_policy = make_robot_action(action_values, dataset.features)
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
action_tensor = torch.tensor([robot_action_to_send[k] for k in action_keys])
interpolator.add(action_tensor)
ran_inference = True
interp_action = interpolator.get()
if interp_action is not None:
robot_action_to_send = {k: interp_action[i].item() for i, k in enumerate(action_keys)}
action_values = robot_action_to_send
else:
continue
is_record_frame = ran_inference
else:
action_values = predict_action(
observation=observation_frame,
policy=policy,
device=get_safe_torch_device(policy.config.device),
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.use_amp,
task=single_task,
robot_type=robot.robot_type,
)
act_processed_policy: RobotAction = make_robot_action(action_values, dataset.features)
# Applies a pipeline to the action, default is IdentityProcessor
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
action_values = robot_action_to_send
elif policy is None and isinstance(teleop, Teleoperator):
# Get action from teleop
if isinstance(teleop, Teleoperator):
act = teleop.get_action()
if robot.name == "unitree_g1":
teleop.send_feedback(obs)
@@ -438,7 +272,7 @@ def record_loop(
action_values = act_processed_teleop
robot_action_to_send = robot_action_processor((act_processed_teleop, obs))
elif policy is None and isinstance(teleop, list):
elif isinstance(teleop, list):
arm_action = teleop_arm.get_action()
arm_action = {f"arm_{k}": v for k, v in arm_action.items()}
keyboard_action = teleop_keyboard.get_action()
@@ -451,7 +285,7 @@ def record_loop(
no_action_count += 1
if no_action_count == 1 or no_action_count % 10 == 0:
logging.warning(
"No policy or teleoperator provided, skipping action generation. "
"No teleoperator provided, skipping action generation. "
"This is likely to happen when resetting the environment without a teleop device. "
"The robot won't be at its rest position at the start of the next episode."
)
@@ -463,8 +297,8 @@ def record_loop(
# TODO(steven, pepijn, adil): we should use a pipeline step to clip the action, so the sent action is the action that we input to the robot.
_sent_action = robot.send_action(robot_action_to_send)
# Write to dataset (only on real policy frames, not interpolated-only iterations)
if dataset is not None and is_record_frame:
# Write to dataset
if dataset is not None:
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": single_task}
dataset.add_frame(frame)
@@ -488,7 +322,12 @@ def record_loop(
@parser.wrap()
def record(cfg: RecordConfig) -> LeRobotDataset:
def record(
cfg: RecordConfig,
teleop_action_processor: RobotProcessorPipeline | None = None,
robot_action_processor: RobotProcessorPipeline | None = None,
robot_observation_processor: RobotProcessorPipeline | None = None,
) -> LeRobotDataset:
init_logging()
logging.info(pformat(asdict(cfg)))
if cfg.display_data:
@@ -502,7 +341,16 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
robot = make_robot_from_config(cfg.robot)
teleop = make_teleoperator_from_config(cfg.teleop) if cfg.teleop is not None else None
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
# Fall back to identity pipelines when the caller doesn't supply processors.
if (
teleop_action_processor is None
or robot_action_processor is None
or robot_observation_processor is None
):
_t, _r, _o = make_default_processors()
teleop_action_processor = teleop_action_processor or _t
robot_action_processor = robot_action_processor or _r
robot_observation_processor = robot_observation_processor or _o
dataset_features = combine_feature_dicts(
aggregate_pipeline_dataset_features(
@@ -540,8 +388,14 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
)
sanity_check_dataset_robot_compatibility(dataset, robot, cfg.dataset.fps, dataset_features)
else:
# Create empty dataset or load existing saved episodes
sanity_check_dataset_name(cfg.dataset.repo_id, cfg.policy)
# Reject eval_ prefix — for policy evaluation use lerobot-rollout
repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
if repo_name.startswith("eval_"):
raise ValueError(
"Dataset names starting with 'eval_' are reserved for policy evaluation. "
"lerobot-record is for data collection only. Use lerobot-rollout for policy deployment."
)
cfg.dataset.stamp_repo_id()
dataset = LeRobotDataset.create(
cfg.dataset.repo_id,
cfg.dataset.fps,
@@ -558,30 +412,6 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
encoder_threads=cfg.dataset.encoder_threads,
)
# Load pretrained policy
policy = (
None
if cfg.policy is None
else make_policy(cfg.policy, ds_meta=dataset.meta, rename_map=cfg.dataset.rename_map)
)
preprocessor = None
postprocessor = None
interpolator = None
if cfg.policy is not None:
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
pretrained_path=cfg.policy.pretrained_path,
dataset_stats=rename_stats(dataset.meta.stats, cfg.dataset.rename_map),
preprocessor_overrides={
"device_processor": {"device": cfg.policy.device},
"rename_observations_processor": {"rename_map": cfg.dataset.rename_map},
},
)
# Create interpolator for smoother policy control
if cfg.interpolation_multiplier > 1:
interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
logging.info(f"Action interpolation enabled: {cfg.interpolation_multiplier}x control rate")
robot.connect()
if teleop is not None:
teleop.connect()
@@ -605,14 +435,10 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
teleop=teleop,
policy=policy,
preprocessor=preprocessor,
postprocessor=postprocessor,
dataset=dataset,
control_time_s=cfg.dataset.episode_time_s,
single_task=cfg.dataset.single_task,
display_data=cfg.display_data,
interpolator=interpolator,
display_compressed_images=display_compressed_images,
)
@@ -660,7 +486,10 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
listener.stop()
if cfg.dataset.push_to_hub:
dataset.push_to_hub(tags=cfg.dataset.tags, private=cfg.dataset.private)
if dataset and dataset.num_episodes > 0:
dataset.push_to_hub(tags=cfg.dataset.tags, private=cfg.dataset.private)
else:
logging.warning("No episodes saved — skipping push to hub")
log_say("Exiting", cfg.play_sounds)
return dataset
@@ -0,0 +1,211 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Policy deployment engine with pluggable rollout strategies.
``lerobot-rollout`` is the single CLI for running trained policies on
real robots.
Strategies
----------
--strategy.type=base Autonomous rollout, no recording
--strategy.type=sentry Continuous recording with auto-upload
--strategy.type=highlight Ring buffer + keystroke save
--strategy.type=dagger Human-in-the-loop (DAgger / RaC)
Inference backends
------------------
--inference.type=sync One policy call per control tick (default)
--inference.type=rtc Real-Time Chunking for slow VLA models
Usage examples
--------------
::
# Base mode — quick evaluation with sync inference
lerobot-rollout \\
--strategy.type=base \\
--policy.path=lerobot/act_koch_real \\
--robot.type=koch_follower \\
--robot.port=/dev/ttyACM0 \\
--task="pick up cube" --duration=30
# Base mode — RTC inference for slow VLAs (Pi0, Pi0.5, SmolVLA)
lerobot-rollout \\
--strategy.type=base \\
--policy.path=lerobot/pi0_base \\
--inference.type=rtc \\
--inference.rtc.execution_horizon=10 \\
--inference.rtc.max_guidance_weight=10.0 \\
--robot.type=so100_follower \\
--robot.port=/dev/ttyACM0 \\
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \\
--task="pick up cube" --duration=60
# Sentry mode — continuous recording with periodic upload
lerobot-rollout \\
--strategy.type=sentry \\
--strategy.upload_every_n_episodes=5 \\
--policy.path=lerobot/pi0_base \\
--inference.type=rtc \\
--robot.type=so100_follower \\
--robot.port=/dev/ttyACM0 \\
--dataset.repo_id=user/rollout_sentry_data \\
--dataset.single_task="patrol" --duration=3600
# Highlight mode — ring buffer, press 's' to save, 'h' to push
lerobot-rollout \\
--strategy.type=highlight \\
--strategy.ring_buffer_seconds=30 \\
--policy.path=lerobot/act_koch_real \\
--robot.type=koch_follower \\
--robot.port=/dev/ttyACM0 \\
--dataset.repo_id=user/rollout_highlight_data \\
--dataset.single_task="pick up cube"
# DAgger mode — human-in-the-loop corrections only
lerobot-rollout \\
--strategy.type=dagger \\
--strategy.num_episodes=20 \\
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \\
--robot.type=bi_openarm_follower \\
--teleop.type=openarm_mini \\
--dataset.repo_id=user/rollout_hil_data \\
--dataset.single_task="Fold the T-shirt"
# DAgger mode — continuous recording with RTC inference
lerobot-rollout \\
--strategy.type=dagger \\
--strategy.record_autonomous=true \\
--strategy.num_episodes=50 \\
--inference.type=rtc \\
--inference.rtc.execution_horizon=10 \\
--policy.path=user/my_pi0_policy \\
--robot.type=so100_follower \\
--robot.port=/dev/ttyACM0 \\
--teleop.type=so101_leader \\
--teleop.port=/dev/ttyACM1 \\
--dataset.repo_id=user/rollout_dagger_rtc_data \\
--dataset.single_task="Grasp the block"
# With Rerun visualization and torch.compile
lerobot-rollout \\
--strategy.type=base \\
--policy.path=lerobot/act_koch_real \\
--robot.type=koch_follower \\
--robot.port=/dev/ttyACM0 \\
--task="pick up cube" --duration=60 \\
--display_data=true \\
--use_torch_compile=true
# Resume a previous sentry recording session
lerobot-rollout \\
--strategy.type=sentry \\
--policy.path=user/my_policy \\
--robot.type=so100_follower \\
--robot.port=/dev/ttyACM0 \\
--dataset.repo_id=user/rollout_sentry_data \\
--dataset.single_task="patrol" \\
--resume=true
"""
import logging
from lerobot.cameras.opencv import OpenCVCameraConfig # noqa: F401
from lerobot.cameras.realsense import RealSenseCameraConfig # noqa: F401
from lerobot.cameras.zmq import ZMQCameraConfig # noqa: F401
from lerobot.configs import parser
from lerobot.robots import ( # noqa: F401
Robot,
RobotConfig,
bi_openarm_follower,
bi_so_follower,
earthrover_mini_plus,
hope_jr,
koch_follower,
omx_follower,
openarm_follower,
reachy2,
so_follower,
unitree_g1 as unitree_g1_robot,
)
from lerobot.rollout import RolloutConfig, build_rollout_context, create_strategy
from lerobot.teleoperators import ( # noqa: F401
Teleoperator,
TeleoperatorConfig,
bi_openarm_leader,
bi_so_leader,
homunculus,
koch_leader,
omx_leader,
openarm_leader,
openarm_mini,
reachy2_teleoperator,
so_leader,
unitree_g1,
)
from lerobot.utils.import_utils import register_third_party_plugins
from lerobot.utils.process import ProcessSignalHandler
from lerobot.utils.utils import init_logging
from lerobot.utils.visualization_utils import init_rerun
logger = logging.getLogger(__name__)
@parser.wrap()
def rollout(cfg: RolloutConfig):
"""Main entry point for policy deployment."""
init_logging()
if cfg.display_data:
logger.info("Initializing Rerun visualization (ip=%s, port=%s)", cfg.display_ip, cfg.display_port)
init_rerun(session_name="rollout", ip=cfg.display_ip, port=cfg.display_port)
signal_handler = ProcessSignalHandler(use_threads=True, display_pid=False)
shutdown_event = signal_handler.shutdown_event
logger.info("Building rollout context...")
ctx = build_rollout_context(cfg, shutdown_event)
strategy = create_strategy(cfg.strategy)
logger.info("Rollout strategy: %s", cfg.strategy.type)
logger.info(
"Robot: %s | FPS: %.0f | Duration: %s",
cfg.robot.type if cfg.robot else "?",
cfg.fps,
f"{cfg.duration}s" if cfg.duration > 0 else "infinite",
)
try:
strategy.setup(ctx)
logger.info("Rollout setup complete, starting rollout...")
strategy.run(ctx)
except KeyboardInterrupt:
logger.info("Interrupted by user")
finally:
strategy.teardown(ctx)
logger.info("Rollout finished")
def main():
"""CLI entry point for ``lerobot-rollout``."""
register_third_party_plugins()
rollout()
if __name__ == "__main__":
main()
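A minimal sketch (not part of this diff) of the strategy surface that rollout() above relies on: whatever create_strategy() returns only needs setup(ctx), run(ctx) and teardown(ctx). The class below is hypothetical; real strategies ship with lerobot.rollout and are selected via --strategy.type.
class NoOpStrategy:
    """Hypothetical strategy illustrating the setup/run/teardown lifecycle used above."""

    def setup(self, ctx) -> None:
        # ctx is the rollout context built by build_rollout_context(cfg, shutdown_event).
        logger.info("setup: connect devices, load policy state")

    def run(self, ctx) -> None:
        logger.info("run: control loop executes here until duration elapses or shutdown is requested")

    def teardown(self, ctx) -> None:
        logger.info("teardown: disconnect robot, flush or upload any recorded data")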
@@ -104,11 +104,14 @@ class KeyboardTeleop(Teleoperator):
def _on_press(self, key):
if hasattr(key, "char"):
self.event_queue.put((key.char, True))
key = key.char
self.event_queue.put((key, True))
def _on_release(self, key):
if hasattr(key, "char"):
self.event_queue.put((key.char, False))
key = key.char
self.event_queue.put((key, False))
if key == keyboard.Key.esc:
logging.info("ESC pressed, disconnecting.")
self.disconnect()
@@ -204,8 +207,6 @@ class KeyboardEndEffectorTeleop(KeyboardTeleop):
# this is useful for retrieving other events like interventions for RL, episode success, etc.
self.misc_keys_queue.put(key)
self.current_pressed.clear()
action_dict = {
"delta_x": delta_x,
"delta_y": delta_y,
@@ -256,6 +257,8 @@ class KeyboardEndEffectorTeleop(KeyboardTeleop):
]
is_intervention = any(self.current_pressed.get(key, False) for key in movement_keys)
self.current_pressed.clear()
# Check for episode control commands from misc_keys_queue
terminate_episode = False
success = False
@@ -20,6 +20,7 @@ from .config_so_leader import (
SOLeaderConfig,
SOLeaderTeleopConfig,
)
from .so101_leader_follower import SO101LeaderFollower
from .so_leader import SO100Leader, SO101Leader, SOLeader
__all__ = [
@@ -27,6 +28,7 @@ __all__ = [
"SO100LeaderConfig",
"SO101Leader",
"SO101LeaderConfig",
"SO101LeaderFollower",
"SOLeader",
"SOLeaderConfig",
"SOLeaderTeleopConfig",
@@ -29,6 +29,11 @@ class SOLeaderConfig:
# Whether to use degrees for angles
use_degrees: bool = True
# Enable leader-follower mode where leader can both lead and follow
leader_follower_mode: bool = False
use_gripper: bool = True
@TeleoperatorConfig.register_subclass("so101_leader")
@TeleoperatorConfig.register_subclass("so100_leader")
@@ -0,0 +1,261 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import sys
import time
from collections import deque
from threading import Event, Thread
import numpy as np
from lerobot.teleoperators.so_leader.so_leader import SOLeader as SO101Leader
from lerobot.teleoperators.utils import TeleopEvents
PYNPUT_AVAILABLE = True
try:
if ("DISPLAY" not in os.environ) and ("linux" in sys.platform):
logging.info("No DISPLAY set. Skipping pynput import.")
raise ImportError("pynput blocked intentionally due to no display.")
from pynput import keyboard
except ImportError:
keyboard = None
PYNPUT_AVAILABLE = False
except Exception as e:
keyboard = None
PYNPUT_AVAILABLE = False
logging.info(f"Could not import pynput: {e}")
logger = logging.getLogger(__name__)
class SO101LeaderFollower(SO101Leader):
"""
Extended SO101 Leader that can both lead (human control) and follow (mimic follower).
This class adds leader-follower functionality where:
- In follow mode: The leader arm mimics the follower's position (torque enabled)
- In lead mode: Human controls the leader (torque disabled) and provides actions
"""
def __init__(self, config):
super().__init__(config)
# Leader-follower state
self.is_intervening = False
# Initialize as False because configure() disables torque at connect time;
# send_action() will re-enable it on the first call when not intervening.
self.leader_torque_enabled = False
# Tracking error for automatic intervention detection
self.leader_tracking_error_queue = deque(maxlen=4)
# Keyboard event handling
self.keyboard_events = {
"intervention": False,
"success": False,
"failure": False,
"rerecord": False,
}
self.keyboard_thread = None
self.stop_event = Event()
# Store last follower position for action computation
self.last_follower_pos = None
@property
def action_features(self) -> dict:
if self.config.use_gripper:
return {
"dtype": "float32",
"shape": (7,),
"names": {
"delta_x": 0,
"delta_y": 1,
"delta_z": 2,
"delta_wx": 3,
"delta_wy": 4,
"delta_wz": 5,
"gripper": 6,
},
}
else:
return {
"dtype": "float32",
"shape": (6,),
"names": {
"delta_x": 0,
"delta_y": 1,
"delta_z": 2,
"delta_wx": 3,
"delta_wy": 4,
"delta_wz": 5,
},
}
def connect(self, calibrate: bool = True) -> None:
"""Connect and configure for leader-follower mode."""
super().connect(calibrate)
# Configure for leader-follower mode with lower gains
# Lower gains allow manual intervention without injury risk
# self.bus.sync_write("Torque_Enable", 1)
for motor in self.bus.motors:
self.bus.write("P_Coefficient", motor, 16)
self.bus.write("I_Coefficient", motor, 0)
self.bus.write("D_Coefficient", motor, 16)
# Start keyboard listener
self._start_keyboard_listener()
print("- Leader-Follower Mode:")
print(" - Press SPACE to toggle intervention (leader control)")
print(" - When not intervening, leader follows follower position")
print(" - When intervening, follower follows leader in end-effector space")
print(" - Press 's' to mark episode as success")
print(" - Press ESC to end episode as failure")
print(" - Press 'r' to re-record episode")
def _start_keyboard_listener(self):
"""Start keyboard listener thread for intervention control."""
def on_press(key):
try:
if key == keyboard.Key.space:
self.keyboard_events["intervention"] = not self.keyboard_events["intervention"]
self.is_intervening = self.keyboard_events["intervention"]
state = "INTERVENTION MODE" if self.is_intervening else "FOLLOWING MODE"
logger.info(f"Toggled to {state}")
elif key == keyboard.Key.esc:
self.keyboard_events["failure"] = True
elif hasattr(key, "char"):
if key.char == "s":
self.keyboard_events["success"] = True
elif key.char == "r":
self.keyboard_events["rerecord"] = True
except Exception as e:
logger.error(f"Error handling key press: {e}")
def listen():
with keyboard.Listener(on_press=on_press) as listener:
while not self.stop_event.is_set():
time.sleep(0.1)
listener.stop()
self.keyboard_thread = Thread(target=listen, daemon=True)
self.keyboard_thread.start()
def send_action(self, action: dict[str, float]) -> None:
"""
Send position commands to leader arm (follow mode).
Args:
action: Dictionary of motor positions to command
"""
# Store follower position for later use
self.last_follower_pos = np.array([action.get(f"{motor}.pos", 0) for motor in self.bus.motors])
if not self.is_intervening:
# Follow mode: enable torque and track follower
if not self.leader_torque_enabled:
self.bus.sync_write("Torque_Enable", 1)
self.leader_torque_enabled = True
# Send follower positions to leader
goal_pos = {motor: action[f"{motor}.pos"] for motor in self.bus.motors}
self.bus.sync_write("Goal_Position", goal_pos)
# Track error for automatic intervention detection
current_pos = self.bus.sync_read("Present_Position")
current_array = np.array([current_pos[motor] for motor in self.bus.motors])
error = np.linalg.norm(self.last_follower_pos[:-1] - current_array[:-1])
self.leader_tracking_error_queue.append(error)
def get_action(self) -> dict[str, float]:
"""
Get action from leader arm.
In follow mode: Returns neutral/current positions
In lead mode: Returns actual leader positions for follower to track
"""
start = time.perf_counter()
if self.is_intervening:
# Lead mode: disable torque if needed and return leader positions
if self.leader_torque_enabled:
self.bus.sync_write("Torque_Enable", 0)
self.leader_torque_enabled = False
# Get current leader position
action = self.bus.sync_read("Present_Position")
action = {f"{motor}.pos": val for motor, val in action.items()}
# Track error
if self.last_follower_pos is not None:
current_array = np.array([action[f"{motor}.pos"] for motor in self.bus.motors])
error = np.linalg.norm(self.last_follower_pos[:-1] - current_array[:-1])
self.leader_tracking_error_queue.append(error)
else:
# Follow mode: return current/neutral positions
action = self.bus.sync_read("Present_Position")
action = {f"{motor}.pos": val for motor, val in action.items()}
dt_ms = (time.perf_counter() - start) * 1e3
logger.debug(f"{self} read action: {dt_ms:.1f}ms")
return action
def get_teleop_events(self) -> dict[TeleopEvents, bool]:
"""Get current keyboard events."""
events = {}
# Map keyboard events to TeleopEvents
if self.keyboard_events["success"]:
events[TeleopEvents.SUCCESS] = True
self.keyboard_events["success"] = False
if self.keyboard_events["failure"]:
events[TeleopEvents.FAILURE] = True
events[TeleopEvents.TERMINATE_EPISODE] = True
self.keyboard_events["failure"] = False
if self.keyboard_events["rerecord"]:
events[TeleopEvents.RERECORD_EPISODE] = True
events[TeleopEvents.TERMINATE_EPISODE] = True
self.keyboard_events["rerecord"] = False
# Always report intervention state
events[TeleopEvents.IS_INTERVENTION] = self.is_intervening
return events
def disconnect(self) -> None:
"""Disconnect and cleanup."""
self.stop_event.set()
if self.keyboard_thread:
self.keyboard_thread.join(timeout=1.0)
super().disconnect()
def reset(self) -> None:
"""Reset leader-follower state."""
self.is_intervening = False
self.leader_torque_enabled = True
self.leader_tracking_error_queue.clear()
self.keyboard_events = {
"intervention": False,
"success": False,
"failure": False,
"rerecord": False,
}
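An illustrative control-loop sketch for the class above (not part of this diff). The config construction and the follower robot are assumptions; the grounded part is the send_action()/get_action() lead-follow split and get_teleop_events() defined above.
# config: an SOLeaderConfig with leader_follower_mode=True (port and other fields omitted here)
teleop = SO101LeaderFollower(config)
teleop.connect()
try:
    while True:
        # Leader mirrors the follower while not intervening (SPACE toggles modes).
        follower_obs = robot.get_observation()  # robot: a connected SO follower (assumed)
        teleop.send_action({k: v for k, v in follower_obs.items() if k.endswith(".pos")})

        # While intervening, the human-driven leader positions are sent to the follower.
        robot.send_action(teleop.get_action())

        events = teleop.get_teleop_events()
        if events.get(TeleopEvents.TERMINATE_EPISODE):
            break
finally:
    teleop.disconnect()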
@@ -52,9 +52,10 @@ def make_teleoperator_from_config(config: TeleoperatorConfig) -> "Teleoperator":
return SO100Leader(config)
elif config.type == "so101_leader":
from .so_leader import SO101Leader
from .so_leader import SO101LeaderFollower
return SO101Leader(config)
if getattr(config, "leader_follower_mode", False):
return SO101LeaderFollower(config)
elif config.type == "mock_teleop":
from tests.mocks.mock_teleop import MockTeleop
@@ -39,8 +39,8 @@ For more details, see the [Physical Intelligence π₀ blog post](https://www.ph
π₀.₅ represents a significant evolution from π₀, developed by Physical Intelligence to address a big challenge in robotics: open-world generalization. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
For more details, see the [Physical Intelligence π₀.₅ blog post](https://www.physicalintelligence.company/blog/pi05).
{% elif model_name == "sac" %}
[Soft Actor-Critic (SAC)](https://huggingface.co/papers/1801.01290) is an entropy-regularised actor-critic algorithm offering stable, sample-efficient learning in continuous-control environments.
{% elif model_name == "gaussian_actor" %}
This is a Gaussian Actor policy (Gaussian policy with a tanh squash) — the policy-side component used by [Soft Actor-Critic (SAC)](https://huggingface.co/papers/1801.01290) and related maximum-entropy continuous-control algorithms.
{% elif model_name == "reward_classifier" %}
A reward classifier is a lightweight neural network that scores observations or trajectories for task success, providing a learned reward signal or offline evaluation when explicit rewards are unavailable.
{% else %}
@@ -0,0 +1,116 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Action interpolation for smoother robot control.
Provides configurable Nx control rate by interpolating between consecutive actions.
Useful with RTC and action-chunking policies to reduce jerkiness.
"""
from torch import Tensor
class ActionInterpolator:
"""Interpolates between consecutive actions for smoother control.
When enabled with multiplier N, produces N actions per policy action
by linearly interpolating between the previous and current action.
Example with multiplier=3:
prev_action -> [1/3 interpolated, 2/3 interpolated, current_action]
This effectively multiplies the control rate for smoother motion.
Usage:
interpolator = ActionInterpolator(multiplier=2) # 2x control rate
# In control loop:
if interpolator.needs_new_action():
new_action = queue.get()
if new_action is not None:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action is not None:
robot.send_action(action)
"""
def __init__(self, multiplier: int = 1):
"""Initialize the interpolator.
Args:
multiplier: Control rate multiplier (1 = no interpolation, 2 = 2x, 3 = 3x, etc.)
"""
if multiplier < 1:
raise ValueError(f"multiplier must be >= 1, got {multiplier}")
self.multiplier = multiplier
self._prev: Tensor | None = None
self._buffer: list[Tensor] = []
self._idx = 0
@property
def enabled(self) -> bool:
"""Whether interpolation is active (multiplier > 1)."""
return self.multiplier > 1
def reset(self):
"""Reset interpolation state (call between episodes)."""
self._prev = None
self._buffer = []
self._idx = 0
def needs_new_action(self) -> bool:
"""Check if a new action is needed from the queue."""
return self._idx >= len(self._buffer)
def add(self, action: Tensor) -> None:
"""Add a new action and compute interpolated sequence.
Args:
action: New action tensor from policy/queue (already on CPU).
"""
if self.multiplier > 1 and self._prev is not None:
self._buffer = []
for i in range(1, self.multiplier + 1):
t = i / self.multiplier
interp = self._prev + t * (action - self._prev)
self._buffer.append(interp)
else:
# First step: no previous action yet, so run at base FPS without interpolation.
self._buffer = [action.clone()]
self._prev = action.clone()
self._idx = 0
def get(self) -> Tensor | None:
"""Get the next interpolated action.
Returns:
Next action tensor, or None if buffer is exhausted.
"""
if self._idx >= len(self._buffer):
return None
action = self._buffer[self._idx]
self._idx += 1
return action
def get_control_interval(self, fps: float) -> float:
"""Get the control interval based on interpolation multiplier.
Args:
fps: Base frames per second.
Returns:
Control interval in seconds (divided by multiplier).
"""
return 1.0 / (fps * self.multiplier)
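A quick worked example of the class above (illustrative, not part of this diff): with multiplier=3 the second action expands into three evenly spaced steps toward it.
import torch

interp = ActionInterpolator(multiplier=3)

interp.add(torch.tensor([0.0, 0.0]))     # first action: no previous one, so the buffer is just [action]
print(interp.get())                      # tensor([0., 0.])
print(interp.needs_new_action())         # True -> time to query the policy again

interp.add(torch.tensor([3.0, 6.0]))     # interpolates at t = 1/3, 2/3, 1 from the previous action
print([interp.get() for _ in range(3)])  # approximately [1., 2.], [2., 4.], [3., 6.]
print(interp.get())                      # None -> buffer exhausted

print(interp.get_control_interval(30))   # 1 / (30 * 3) ≈ 0.0111 s between sends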
@@ -0,0 +1,83 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Generic foot pedal listener using evdev.
Callers supply a callback receiving the pressed key code (e.g. ``"KEY_A"``)
and an optional device path. The listener runs in a daemon thread and
silently no-ops when :mod:`evdev` is not installed or the device is
unavailable. Strategy-specific key mapping logic lives in the caller.
"""
from __future__ import annotations
import logging
import threading
from collections.abc import Callable
logger = logging.getLogger(__name__)
DEFAULT_PEDAL_DEVICE = "/dev/input/by-id/usb-PCsensor_FootSwitch-event-kbd"
def start_pedal_listener(
on_press: Callable[[str], None],
device_path: str = DEFAULT_PEDAL_DEVICE,
) -> threading.Thread | None:
"""Spawn a daemon thread that forwards pedal key-press codes to ``on_press``.
Parameters
----------
on_press:
Callback invoked with the pressed key code string (e.g. ``"KEY_A"``)
on each pedal press event. The callback runs in the listener thread
and must be thread-safe.
device_path:
Linux input device path (e.g. ``/dev/input/by-id/...``).
Returns
-------
The started daemon :class:`threading.Thread`, or ``None`` when
:mod:`evdev` is not installed (optional dependency; silent no-op).
"""
try:
from evdev import InputDevice, categorize, ecodes
except ImportError:
return None
def pedal_reader() -> None:
try:
dev = InputDevice(device_path)
logger.info("Pedal connected: %s", dev.name)
for ev in dev.read_loop():
if ev.type != ecodes.EV_KEY:
continue
key = categorize(ev)
code = key.keycode
if isinstance(code, (list, tuple)):
code = code[0]
if key.keystate != 1: # only key-down events
continue
try:
on_press(code)
except Exception as cb_err: # pragma: no cover - defensive
logger.warning("Pedal callback error: %s", cb_err)
except (FileNotFoundError, PermissionError):
pass
except Exception as e:
logger.warning("Pedal error: %s", e)
thread = threading.Thread(target=pedal_reader, daemon=True, name="PedalListener")
thread.start()
return thread
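A small usage sketch for the helper above (illustrative; the "KEY_A" code and the save semantics are assumptions, since the real key mapping lives in the calling strategy).
import threading

save_requested = threading.Event()

def on_pedal(code: str) -> None:
    # Runs in the listener thread, so only do thread-safe work here.
    if code == "KEY_A":
        save_requested.set()

listener = start_pedal_listener(on_pedal)  # default device path; returns None if evdev is missing
if listener is None:
    logger.info("evdev not available, pedal disabled")

# elsewhere, in the control loop:
if save_requested.is_set():
    save_requested.clear()
    # ... save the current episode / highlight ...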