feat(depth maps writer): adding support for raw depth maps recording with image writer

fix(viz): anchor rerun DepthImage colormap to encoder depth range
feat(viz): render depth observations as rr.DepthImage in Viridis
2026-06-16 15:57:03 +00:00 · 2026-05-01 00:49:09 +02:00
127 changed files with 6020 additions and 1731 deletions
@@ -33,7 +33,7 @@ jobs:
      github.event.workflow_run.event == 'pull_request' &&
      github.event.workflow_run.conclusion == 'success' &&
      github.repository == 'huggingface/lerobot'
-    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@9ad2de8582b56c017cb530c1165116d40433f1c6  # main
+    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
    with:
      package_name: lerobot
    secrets:
@@ -55,7 +55,7 @@ jobs:
      github.repository == 'huggingface/lerobot'
    permissions:
      contents: read
-    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3  # main
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
    with:
      commit_sha: ${{ github.sha }}
      package: lerobot
@@ -78,7 +78,7 @@ jobs:
    permissions:
      contents: read
      pull-requests: write
-    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3  # main
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
@@ -1,3 +1,4 @@
 include src/lerobot/templates/lerobot_modelcard_template.md
+include src/lerobot/templates/lerobot_rewardmodel_modelcard_template.md
 include src/lerobot/datasets/card_template.md
 include src/lerobot/envs/metaworld_config.json
@@ -39,6 +39,7 @@ from tqdm import tqdm

 from lerobot.datasets.lerobot_dataset import LeRobotDataset
 from lerobot.datasets.video_utils import (
+    VideoEncoderConfig,
    decode_video_frames,
    encode_video_frames,
 )
@@ -251,10 +252,13 @@ def benchmark_encoding_decoding(
            imgs_dir=imgs_dir,
            video_path=video_path,
            fps=fps,
-            vcodec=encoding_cfg["vcodec"],
-            pix_fmt=encoding_cfg["pix_fmt"],
-            g=encoding_cfg.get("g"),
-            crf=encoding_cfg.get("crf"),
+            camera_encoder_config=VideoEncoderConfig(
+                vcodec=encoding_cfg["vcodec"],
+                pix_fmt=encoding_cfg["pix_fmt"],
+                g=encoding_cfg.get("g"),
+                crf=encoding_cfg.get("crf"),
+                preset=encoding_cfg.get("preset"),
+            ),
            # fast_decode=encoding_cfg.get("fastdecode"),
            overwrite=True,
        )
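
Review note: the hunk above replaces four loose keyword arguments with one config object. Here is a minimal sketch of that grouping pattern. `EncoderConfigSketch` is a hypothetical stand-in, not the actual `VideoEncoderConfig`. Its field names and defaults are copied from the parameter table in the streaming-encoding docs changed later in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncoderConfigSketch:
    """Hypothetical stand-in for VideoEncoderConfig (defaults taken from the docs table)."""

    vcodec: str = "libsvtav1"
    pix_fmt: str = "yuv420p"
    g: Optional[int] = 2
    crf: Optional[int] = 30
    preset: Optional[int] = 12
    fast_decode: int = 0


def config_from_benchmark_dict(encoding_cfg: dict) -> EncoderConfigSketch:
    # Mirrors the call site in the hunk: required keys are indexed directly,
    # optional keys go through .get() and become None when absent.
    return EncoderConfigSketch(
        vcodec=encoding_cfg["vcodec"],
        pix_fmt=encoding_cfg["pix_fmt"],
        g=encoding_cfg.get("g"),
        crf=encoding_cfg.get("crf"),
        preset=encoding_cfg.get("preset"),
    )


cfg = config_from_benchmark_dict({"vcodec": "h264", "pix_fmt": "yuv420p", "crf": 23})
print(cfg)
```

In this sketch a key missing from `encoding_cfg` is passed as `None`, which overrides the dataclass default instead of falling back to it. Whether the real class treats `None` as "let the codec decide" is not visible in this diff.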
@@ -90,6 +90,6 @@ lerobot-record \
  --dataset.single_task="Your task description" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
-  # --dataset.vcodec=auto \
+  # --dataset.camera_encoder_config.vcodec=auto \
  --policy.path=${HF_USER}/act_policy
 ```
@@ -194,7 +194,7 @@ lerobot-record \
    --dataset.single_task="Navigate around obstacles" \
    --dataset.streaming_encoding=true \
    --dataset.encoder_threads=2 \
-    # --dataset.vcodec=auto \
+    # --dataset.camera_encoder_config.vcodec=auto \
    --display_data=true
 ```

@@ -123,7 +123,7 @@ lerobot-record \
  --dataset.single_task="Grab and handover the red cube to the other arm" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
-  # --dataset.vcodec=auto \
+  # --dataset.camera_encoder_config.vcodec=auto \
  --policy.path=<user>/groot-bimanual \ # your trained model
  --dataset.episode_time_s=30 \
  --dataset.reset_time_s=10
@@ -232,7 +232,7 @@ lerobot-record \
    --dataset.private=true \
    --dataset.streaming_encoding=true \
    --dataset.encoder_threads=2 \
-    # --dataset.vcodec=auto \
+    # --dataset.camera_encoder_config.vcodec=auto \
    --display_data=true
 ```

@@ -278,6 +278,6 @@ lerobot-record \
  --dataset.num_episodes=10 \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
-  # --dataset.vcodec=auto \
+  # --dataset.camera_encoder_config.vcodec=auto \
  --policy.path=outputs/train/hopejr_hand/checkpoints/last/pretrained_model
 ```
@@ -193,7 +193,7 @@ lerobot-record \
    --dataset.num_episodes=5 \
    --dataset.single_task="Grab the black cube" \
    --dataset.streaming_encoding=true \
-    # --dataset.vcodec=auto \
+    # --dataset.camera_encoder_config.vcodec=auto \
    --dataset.encoder_threads=2
 ```
 </hfoption>
@@ -43,7 +43,7 @@ lerobot-record \
  --dataset.num_episodes=5 \
  --dataset.single_task="Grab the black cube" \
  --dataset.streaming_encoding=true \
-  # --dataset.vcodec=auto \
+  # --dataset.camera_encoder_config.vcodec=auto \
  --dataset.encoder_threads=2
 ```

@@ -161,7 +161,7 @@ lerobot-record \
    --dataset.private=true \
    --dataset.streaming_encoding=true \
    --dataset.encoder_threads=2 \
-    # --dataset.vcodec=auto \
+    # --dataset.camera_encoder_config.vcodec=auto \
    --display_data=true
 ```

@@ -203,7 +203,7 @@ lerobot-record \
    --dataset.private=true \
    --dataset.streaming_encoding=true \
    --dataset.encoder_threads=2 \
-    # --dataset.vcodec=auto \
+    # --dataset.camera_encoder_config.vcodec=auto \
    --display_data=true
 ```

@@ -46,7 +46,7 @@ This ensures identical task states map to consistent progress values, even acros

 ## Inputs and Targets (What the new code expects)

-SARM is trained through its processor (`src/lerobot/policies/sarm/processor_sarm.py`), which:
+SARM is trained through its processor (`src/lerobot/rewards/sarm/processor_sarm.py`), which:

 - **Encodes** images and task text with CLIP (ViT-B/32) into `video_features` and `text_features`
 - **Pads/truncates** robot state into `state_features` (up to `max_state_dim`)
@@ -347,7 +347,7 @@ Use `compute_rabc_weights.py` with `--visualize-only` to visualize model predict
 <hfoption id="single_stage">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -360,7 +360,7 @@ python src/lerobot/policies/sarm/compute_rabc_weights.py \
 <hfoption id="dense_only">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -373,7 +373,7 @@ python src/lerobot/policies/sarm/compute_rabc_weights.py \
 <hfoption id="dual">

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --visualize-only \
@@ -429,7 +429,7 @@ The weighting follows **Equations 8-9** from the paper:
 First, run the SARM model on all frames in your dataset to compute progress values:

 ```bash
-python src/lerobot/policies/sarm/compute_rabc_weights.py \
+python -m lerobot.rewards.sarm.compute_rabc_weights \
  --dataset-repo-id your-username/your-dataset \
  --reward-model-path your-username/sarm-model \
  --head-mode sparse \
@@ -465,15 +465,15 @@ This script:

 ### Step 5b: Train Policy with RA-BC

-Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`). Currently PI0, PI0.5 and SmolVLA are supported with RA-BC:
+Once you have the progress file, train your policy with RA-BC weighting. The progress file is auto-detected from the dataset path (`sarm_progress.parquet`) if not explicitly provided. Currently PI0, PI0.5 and SmolVLA are supported with RA-BC:

 ```bash
 lerobot-train \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --use_rabc=true \
-  --rabc_head_mode=sparse \
-  --rabc_kappa=0.01 \
+  --sample_weighting.type=rabc \
+  --sample_weighting.head_mode=sparse \
+  --sample_weighting.kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -488,12 +488,13 @@ The training script automatically:

 **RA-BC Arguments:**

-| Argument               | Description                                                | Default                            |
-| ---------------------- | ---------------------------------------------------------- | ---------------------------------- |
-| `--use_rabc`           | Enable RA-BC sample weighting                              | `false`                            |
-| `--rabc_progress_path` | Path to progress parquet file (auto-detected from dataset) | `sarm_progress.parquet` in dataset |
-| `--rabc_head_mode`     | Which SARM head's progress to use: `sparse` or `dense`     | `sparse`                           |
-| `--rabc_kappa`         | Threshold κ for high-quality samples                       | `0.01`                             |
+| Argument                           | Description                                            | Default                 |
+| ---------------------------------- | ------------------------------------------------------ | ----------------------- |
+| `--sample_weighting.type`          | Weighting strategy type (`rabc` or `uniform`)          | `rabc`                  |
+| `--sample_weighting.progress_path` | Path to progress parquet file                          | `sarm_progress.parquet` |
+| `--sample_weighting.head_mode`     | Which SARM head's progress to use: `sparse` or `dense` | `sparse`                |
+| `--sample_weighting.kappa`         | Threshold κ for high-quality samples                   | `0.01`                  |
+| `--sample_weighting.epsilon`       | Small constant for numerical stability                 | `1e-6`                  |
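
Review note: existing launch scripts that still pass the removed `--rabc_*` flags will need updating. The sketch below rewrites an argument list using only the renames visible in this hunk. It assumes flags are written as `--name=value`. Mapping `--use_rabc=false` to the `uniform` strategy is an assumption, not something this diff confirms.

```python
# Old flag -> new flag, as they appear in the removed/added table rows above.
RENAMED_FLAGS = {
    "--rabc_progress_path": "--sample_weighting.progress_path",
    "--rabc_head_mode": "--sample_weighting.head_mode",
    "--rabc_kappa": "--sample_weighting.kappa",
}


def migrate_rabc_args(args: list[str]) -> list[str]:
    """Rewrite legacy RA-BC flags; every other argument is passed through unchanged."""
    migrated = []
    for arg in args:
        name, sep, value = arg.partition("=")
        if name == "--use_rabc":
            # Assumption: a disabled RA-BC run corresponds to `uniform` weighting.
            strategy = "rabc" if value.lower() == "true" else "uniform"
            migrated.append(f"--sample_weighting.type={strategy}")
        elif name in RENAMED_FLAGS:
            migrated.append(f"{RENAMED_FLAGS[name]}{sep}{value}")
        else:
            migrated.append(arg)
    return migrated


old = ["--policy.type=pi0", "--use_rabc=true", "--rabc_head_mode=sparse", "--rabc_kappa=0.01"]
print(migrate_rabc_args(old))
```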

 ### Tuning RA-BC Kappa

@@ -511,30 +512,30 @@ The `kappa` parameter is the threshold that determines which samples get full we

 Monitor these WandB metrics during training:

-| Metric             | Healthy Range | Problem Indicator         |
-| ------------------ | ------------- | ------------------------- |
-| `rabc_mean_weight` | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
-| `rabc_delta_mean`  | > 0           | Should be positive        |
-| `rabc_delta_std`   | > 0           | Variance in data quality  |
+| Metric                        | Healthy Range | Problem Indicator         |
+| ----------------------------- | ------------- | ------------------------- |
+| `sample_weight_mean_weight`   | 0.3 - 0.8     | ≈ 1.0 means kappa too low |
+| `sample_weighting/delta_mean` | > 0           | Should be positive        |
+| `sample_weighting/delta_std`  | > 0           | Variance in data quality  |

-**If `rabc_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.
+**If `sample_weight_mean_weight ≈ 1.0`:** Your kappa is too low. Most samples have `delta > kappa` and bypass the soft-weighting entirely. RA-BC becomes equivalent to vanilla BC.

 **Setting kappa based on your data:**

-The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `rabc_delta_mean` and `rabc_delta_std`:
+The default `kappa=0.01` was tuned for the paper's T-shirt folding task (~90s episodes at 30fps). For your dataset, check the logged `sample_weighting/delta_mean` and `sample_weighting/delta_std`:

 ```
 # If delta_mean ≈ 0.03 and delta_std ≈ 0.02:
 # Most deltas fall in range [0.01, 0.05]

 # Option 1: Set kappa = delta_mean (medium selectivity)
--rabc_kappa=0.03
+--sample_weighting.kappa=0.03

 # Option 2: Set kappa = delta_mean + delta_std (high selectivity)
--rabc_kappa=0.05
+--sample_weighting.kappa=0.05

 # Option 3: Set kappa = delta_mean + 2*delta_std (very selective)
--rabc_kappa=0.07
+--sample_weighting.kappa=0.07
 ```
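
The three options above are simple arithmetic on the logged statistics. A small sketch follows; the function name and selectivity labels are illustrative, not part of LeRobot.

```python
def kappa_candidates(delta_mean: float, delta_std: float) -> dict[str, float]:
    """Return the three kappa choices described above, keyed by selectivity."""
    return {
        "medium": delta_mean,                     # Option 1
        "high": delta_mean + delta_std,           # Option 2
        "very_high": delta_mean + 2 * delta_std,  # Option 3
    }


# Values from the worked example: delta_mean ≈ 0.03, delta_std ≈ 0.02
for label, kappa in kappa_candidates(0.03, 0.02).items():
    print(f"{label}: --sample_weighting.kappa={kappa:.2f}")
```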

 **When RA-BC may not help:**
@@ -550,8 +551,8 @@ accelerate launch \
  src/lerobot/scripts/lerobot_train.py \
  --dataset.repo_id=your-username/your-dataset \
  --policy.type=pi0 \
-  --use_rabc=true \
-  --rabc_kappa=0.01 \
+  --sample_weighting.type=rabc \
+  --sample_weighting.kappa=0.01 \
  --output_dir=outputs/train/policy_rabc \
  --batch_size=32 \
  --steps=40000
@@ -576,7 +577,7 @@ accelerate launch \
 ### RA-BC

 1. **Train SARM first**: RA-BC quality depends entirely on SARM quality
-2. **Monitor `rabc_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))
+2. **Monitor `sample_weight_mean_weight`**: If it's ≈ 1.0, increase kappa (see [Tuning RA-BC Kappa](#tuning-ra-bc-kappa))

 ---

@@ -108,7 +108,7 @@ lerobot-record \
  --dataset.num_episodes=10 \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
-  # --dataset.vcodec=auto \
+  # --dataset.camera_encoder_config.vcodec=auto \
  # <- Teleop optional if you want to teleoperate in between episodes \
  # --teleop.type=so100_leader \
  # --teleop.port=/dev/ttyACM0 \
@@ -14,12 +14,22 @@ This makes `save_episode()` near-instant (the video is already encoded by the ti

 ## 2. Tuning Parameters

-| Parameter               | CLI Flag                          | Type          | Default       | Description                                                       |
-| ----------------------- | --------------------------------- | ------------- | ------------- | ----------------------------------------------------------------- |
-| `streaming_encoding`    | `--dataset.streaming_encoding`    | `bool`        | `True`        | Enable real-time encoding during capture                          |
-| `vcodec`                | `--dataset.vcodec`                | `str`         | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder                     |
-| `encoder_threads`       | `--dataset.encoder_threads`       | `int \| None` | `None` (auto) | Threads per encoder instance. `None` will leave the vcoded decide |
-| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int`         | `60`          | Max buffered frames per camera (~2s at 30fps). Consumes RAM       |
+All encoding parameters are grouped under `camera_encoder_config` (a `VideoEncoderConfig` dataclass), accessible from the CLI via `--dataset.camera_encoder_config.<field>`.
+
+| Parameter               | CLI Flag                                      | Type          | Default       | Description                                                         |
+| ----------------------- | --------------------------------------------- | ------------- | ------------- | ------------------------------------------------------------------- |
+| `streaming_encoding`    | `--dataset.streaming_encoding`                | `bool`        | `True`        | Enable real-time encoding during capture                            |
+| `vcodec`                | `--dataset.camera_encoder_config.vcodec`      | `str`         | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder                       |
+| `pix_fmt`               | `--dataset.camera_encoder_config.pix_fmt`     | `str`         | `"yuv420p"`   | Pixel format                                                        |
+| `g`                     | `--dataset.camera_encoder_config.g`           | `int \| None` | `2`           | GOP size (keyframe interval)                                        |
+| `crf`                   | `--dataset.camera_encoder_config.crf`         | `int \| None` | `30`          | Quality level (mapped to codec-specific parameter)                  |
+| `preset`                | `--dataset.camera_encoder_config.preset`      | `int \| None` | `12`          | Speed preset (libsvtav1 only, 0 = slowest … 13 = fastest)           |
+| `fast_decode`           | `--dataset.camera_encoder_config.fast_decode` | `int`         | `0`           | Fast-decode tuning level                                            |
+| `encoder_threads`       | `--dataset.encoder_threads`                   | `int \| None` | `None` (auto) | Threads per encoder instance (global). `None` lets the codec decide |
+| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize`             | `int`         | `60`          | Max buffered frames per camera (~2s at 30fps). Consumes RAM         |
+
+> [!TIP]
+> Not all parameters apply to every codec. `VideoEncoderConfig` will warn at startup if you set a parameter that your chosen codec ignores (e.g. `preset` with `h264_nvenc`).
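
Review note: the tip describes a startup warning for parameters the chosen codec ignores. The actual check lives in `VideoEncoderConfig` and is not shown in this excerpt. The sketch below only illustrates the idea with a hypothetical helper. It models a single rule, `preset` being `libsvtav1`-only, taken from the table above.

```python
import warnings

# Assumption: only `preset` is modelled, and only libsvtav1 honours it
# (per the parameter table). The real validation may cover more fields.
CODEC_SPECIFIC_FIELDS = {"preset": {"libsvtav1"}}


def warn_on_ignored_fields(vcodec: str, **fields) -> list[str]:
    """Warn about (and return) fields that are set but that `vcodec` would ignore."""
    ignored = [
        name
        for name, value in fields.items()
        if value is not None
        and name in CODEC_SPECIFIC_FIELDS
        and vcodec not in CODEC_SPECIFIC_FIELDS[name]
    ]
    for name in ignored:
        warnings.warn(f"'{name}' is ignored by codec '{vcodec}'", stacklevel=2)
    return ignored


print(warn_on_ignored_fields("h264_nvenc", preset=12, crf=30))
```

A value of `auto` would have to be resolved to a concrete codec before a check like this is meaningful. The sketch does not attempt that.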

 ## 3. Performance Considerations

@@ -40,7 +50,7 @@ Streaming encoding means the CPU is encoding video **during** the capture loop,

 ### `encoder_threads` Tuning

-This parameter controls how many threads each encoder instance uses internally:
+This parameter (`--dataset.encoder_threads`) controls how many threads each encoder instance uses internally:

 - **Higher values** (e.g., 4-5): Faster encoding, but uses more CPU cores per camera. Good for high-end systems with many cores.
 - **Lower values** (e.g., 1-2): Less CPU per camera, freeing cores for capture and visualization. Good for low-res images and capable CPUs.
@@ -82,15 +92,15 @@ Use HW encoding when:

 ### Available HW Encoders

-| Encoder             | Platform      | Hardware                                                                                         | CLI Value                            |
-| ------------------- | ------------- | ------------------------------------------------------------------------------------------------ | ------------------------------------ |
-| `h264_videotoolbox` | macOS         | Apple Silicon / Intel                                                                            | `--dataset.vcodec=h264_videotoolbox` |
-| `hevc_videotoolbox` | macOS         | Apple Silicon / Intel                                                                            | `--dataset.vcodec=hevc_videotoolbox` |
-| `h264_nvenc`        | Linux/Windows | NVIDIA GPU                                                                                       | `--dataset.vcodec=h264_nvenc`        |
-| `hevc_nvenc`        | Linux/Windows | NVIDIA GPU                                                                                       | `--dataset.vcodec=hevc_nvenc`        |
-| `h264_vaapi`        | Linux         | Intel/AMD GPU                                                                                    | `--dataset.vcodec=h264_vaapi`        |
-| `h264_qsv`          | Linux/Windows | Intel Quick Sync                                                                                 | `--dataset.vcodec=h264_qsv`          |
-| `auto`              | Any           | Probes the system for available HW encoders. Falls back to `libsvtav1` if no HW encoder is found | `--dataset.vcodec=auto`              |
+| Encoder             | Platform      | Hardware                                                                                         | CLI Value                                                  |
+| ------------------- | ------------- | ------------------------------------------------------------------------------------------------ | ---------------------------------------------------------- |
+| `h264_videotoolbox` | macOS         | Apple Silicon / Intel                                                                            | `--dataset.camera_encoder_config.vcodec=h264_videotoolbox` |
+| `hevc_videotoolbox` | macOS         | Apple Silicon / Intel                                                                            | `--dataset.camera_encoder_config.vcodec=hevc_videotoolbox` |
+| `h264_nvenc`        | Linux/Windows | NVIDIA GPU                                                                                       | `--dataset.camera_encoder_config.vcodec=h264_nvenc`        |
+| `hevc_nvenc`        | Linux/Windows | NVIDIA GPU                                                                                       | `--dataset.camera_encoder_config.vcodec=hevc_nvenc`        |
+| `h264_vaapi`        | Linux         | Intel/AMD GPU                                                                                    | `--dataset.camera_encoder_config.vcodec=h264_vaapi`        |
+| `h264_qsv`          | Linux/Windows | Intel Quick Sync                                                                                 | `--dataset.camera_encoder_config.vcodec=h264_qsv`          |
+| `auto`              | Any           | Probes the system for available HW encoders. Falls back to `libsvtav1` if no HW encoder is found | `--dataset.camera_encoder_config.vcodec=auto`              |

 > [!NOTE]
 > In order to use the HW accelerated encoders you might need to upgrade your GPU drivers.
@@ -100,15 +110,15 @@ Use HW encoding when:

 ## 5. Troubleshooting

-| Symptom                                                            | Likely Cause                                 | Fix                                                                                                                                                                                                                                                                                  |
-| ------------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| System freezes or choppy robot movement or Rerun visualization lag | CPU starved (100% load usage)                | Close other apps, reduce encoding throughput, lower `encoder_threads`, use `h264`, use `display_data=False`. If the CPU continues to be at 100% then it might be insufficient for your setup, consider `--dataset.streaming_encoding=false` or HW encoding (`--dataset.vcodec=auto`) |
-| "Encoder queue full" warnings or dropped frames in dataset         | Encoder can't keep up (Queue overflow)       | If CPU is not at 100%: Increase `encoder_threads`, increase `encoder_queue_maxsize` or use HW encoding (`--dataset.vcodec=auto`).                                                                                                                                                    |
-| High RAM usage                                                     | Queue filling faster than encoding           | `encoder_threads` too low or CPU insufficient. Reduce `encoder_queue_maxsize` or use HW encoding                                                                                                                                                                                     |
-| Large video files                                                  | Using HW encoder or H.264                    | Expected trade-off. Switch to `libsvtav1` if CPU allows                                                                                                                                                                                                                              |
-| `save_episode()` still slow                                        | `streaming_encoding` is `False`              | Set `--dataset.streaming_encoding=true`                                                                                                                                                                                                                                              |
-| Encoder thread crash                                               | Codec not available or invalid settings      | Check `vcodec` is installed, try `--dataset.vcodec=auto`                                                                                                                                                                                                                             |
-| Recorded dataset is missing frames                                 | CPU/GPU starvation or occasional load spikes | If ~5% of frames are missing, your system is likely overloaded — follow the recommendations above. If fewer frames are missing (~2%), they are probably due to occasional transient load spikes (often at startup) and can be considered expected.                                   |
+| Symptom                                                            | Likely Cause                                 | Fix                                                                                                                                                                                                                                                                                                        |
+| ------------------------------------------------------------------ | -------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| System freezes or choppy robot movement or Rerun visualization lag | CPU starved (100% load usage)                | Close other apps, reduce encoding throughput, lower `encoder_threads`, use `h264`, use `display_data=False`. If the CPU continues to be at 100% then it might be insufficient for your setup, consider `--dataset.streaming_encoding=false` or HW encoding (`--dataset.camera_encoder_config.vcodec=auto`) |
+| "Encoder queue full" warnings or dropped frames in dataset         | Encoder can't keep up (Queue overflow)       | If CPU is not at 100%: Increase `encoder_threads`, increase `encoder_queue_maxsize` or use HW encoding (`--dataset.camera_encoder_config.vcodec=auto`).                                                                                                                                                    |
+| High RAM usage                                                     | Queue filling faster than encoding           | `encoder_threads` too low or CPU insufficient. Reduce `encoder_queue_maxsize` or use HW encoding                                                                                                                                                                                                           |
+| Large video files                                                  | Using HW encoder or H.264                    | Expected trade-off. Switch to `libsvtav1` if CPU allows                                                                                                                                                                                                                                                    |
+| `save_episode()` still slow                                        | `streaming_encoding` is `False`              | Set `--dataset.streaming_encoding=true`                                                                                                                                                                                                                                                                    |
+| Encoder thread crash                                               | Codec not available or invalid settings      | Check `vcodec` is installed, try `--dataset.camera_encoder_config.vcodec=auto`                                                                                                                                                                                                                             |
+| Recorded dataset is missing frames                                 | CPU/GPU starvation or occasional load spikes | If ~5% of frames are missing, your system is likely overloaded — follow the recommendations above. If fewer frames are missing (~2%), they are probably due to occasional transient load spikes (often at startup) and can be considered expected.                                                         |
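
Review note: to get a feel for the "High RAM usage" row, here is a rough estimate of what a full encoder queue holds. It assumes frames are buffered as raw `uint8` arrays of shape height × width × channels. That assumption is not confirmed by this diff, so treat the numbers as an order-of-magnitude guide.

```python
def queue_seconds(queue_maxsize: int = 60, fps: float = 30.0) -> float:
    """How many seconds of video a full queue buffers."""
    return queue_maxsize / fps


def queue_memory_mib(
    width: int,
    height: int,
    channels: int = 3,
    queue_maxsize: int = 60,
    num_cameras: int = 1,
) -> float:
    """Worst-case RAM (MiB) held by full queues, assuming raw uint8 frames."""
    bytes_per_frame = width * height * channels  # 1 byte per uint8 sample
    return bytes_per_frame * queue_maxsize * num_cameras / 2**20


print(queue_seconds())                            # 60 frames at 30 fps
print(queue_memory_mib(640, 480))                 # one 640x480x3 camera
print(queue_memory_mib(640, 480, num_cameras=3))  # three cameras
```

Under that assumption, one 640×480×3 camera with the default `encoder_queue_maxsize=60` tops out at roughly 53 MiB, and three cameras at roughly 158 MiB.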

 ## 6. Recommended Configurations

@@ -146,10 +156,10 @@ On very constrained systems, streaming encoding may compete too heavily with the
 # 2camsx 640x480x3 @30fps: Requires some tuning.

 # Use H.264, disable streaming, consider batching encoding
-lerobot-record --dataset.vcodec=h264 --dataset.streaming_encoding=false ...
+lerobot-record --dataset.camera_encoder_config.vcodec=h264 --dataset.streaming_encoding=false ...
 ```

 ## 7. Closing note

 Performance ultimately depends on your exact setup — frames-per-second, resolution, CPU cores and load, available memory, episode length, and the encoder you choose. Always test with your target workload, be mindful about your CPU & system capabilities and tune `encoder_threads`, `encoder_queue_maxsize`, and
-`vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration.
+`camera_encoder_config.vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration.
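
Review note: the health check in the closing note (row count equals FPS × CLI duration) can be scripted. The sketch below takes the row count as a plain integer rather than reading a dataset. The 5% cut-off comes from the troubleshooting table. Treating everything below it as a transient spike is an assumption made to keep the example simple.

```python
def missing_frame_fraction(row_count: int, fps: float, duration_s: float) -> float:
    """Fraction of expected frames (fps * duration) that are absent from the dataset."""
    expected = round(fps * duration_s)
    if expected <= 0:
        raise ValueError("fps * duration_s must be positive")
    return max(0.0, (expected - row_count) / expected)


def classify(fraction: float, overload_threshold: float = 0.05) -> str:
    # Assumption: anything below the ~5% figure is treated as a transient spike.
    if fraction == 0.0:
        return "ok"
    return "overloaded" if fraction >= overload_threshold else "transient"


# A 30 s episode at 30 fps should contain 900 rows.
for rows in (900, 882, 855):
    frac = missing_frame_fraction(rows, fps=30, duration_s=30)
    print(rows, f"{frac:.1%}", classify(frac))
```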
@@ -117,10 +117,10 @@ lerobot-edit-dataset \
    --repo_id lerobot/pusht_image \
    --operation.type convert_image_to_video \
    --operation.output_dir outputs/pusht_video \
-    --operation.vcodec libsvtav1 \
-    --operation.pix_fmt yuv420p \
-    --operation.g 2 \
-    --operation.crf 30
+    --operation.camera_encoder_config.vcodec libsvtav1 \
+    --operation.camera_encoder_config.pix_fmt yuv420p \
+    --operation.camera_encoder_config.g 2 \
+    --operation.camera_encoder_config.crf 30

 # Convert only specific episodes
 lerobot-edit-dataset \
@@ -147,11 +147,14 @@ lerobot-edit-dataset \
 **Parameters:**

 - `output_dir`: Custom output directory (optional - by default uses `new_repo_id` or `{repo_id}_video`)
- `vcodec`: Video codec to use - options: `h264`, `hevc`, `libsvtav1` (default: `libsvtav1`)
- `pix_fmt`: Pixel format - options: `yuv420p`, `yuv444p` (default: `yuv420p`)
- `g`: Group of pictures (GOP) size - lower values give better quality but larger files (default: 2)
- `crf`: Constant rate factor - lower values give better quality but larger files, 0 is lossless (default: 30)
- `fast_decode`: Fast decode tuning option (default: 0)
+- `camera_encoder_config`: Video encoder settings — all sub-fields accessible via `--operation.camera_encoder_config.<field>`:
+  - `vcodec`: Video codec — `h264`, `hevc`, `libsvtav1`, `auto`, or hardware codecs (default: `libsvtav1`)
+  - `pix_fmt`: Pixel format — `yuv420p`, `yuv444p` (default: `yuv420p`)
+  - `g`: GOP size — lower values give better quality but larger files (default: 2)
+  - `crf`: Quality level — lower is better, 0 is lossless (default: 30)
+  - `preset`: Speed preset, libsvtav1 only (default: 12)
+  - `fast_decode`: Fast-decode tuning (default: 0)
+  - `encoder_threads`: Threads per encoder instance — global setting, separate from `camera_encoder_config` (default: None)
 - `episode_indices`: List of specific episodes to convert (default: all episodes)
 - `num_workers`: Number of parallel workers for processing (default: 4)

@@ -220,7 +220,7 @@ REAL_DIM = 12
 # Postprocessing: Trim 20D predictions to 12D for deployment
 ```

-See the [action_hub.py](/home/jade_choghari/robot/lerobot/src/lerobot/policies/xvla/action_hub.py) implementation for details.
+See the [action_hub.py](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/action_hub.py) implementation for details.

 #### Auto Action Mode (Recommended)

@@ -519,9 +519,9 @@ If you use X-VLA in your research, please cite:

 - [X-VLA Paper](https://arxiv.org/pdf/2510.10274)
 - [LeRobot Documentation](https://github.com/huggingface/lerobot)
- [Action Registry Implementation](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/action_hub.py)
- [Processor Implementation](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/processor_xvla.py)
- [Model Configuration](https://github.com/huggingface/lerobot/src/lerobot/policies/xvla/configuration_xvla.py)
+- [Action Registry Implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/action_hub.py)
+- [Processor Implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/processor_xvla.py)
+- [Model Configuration](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/xvla/configuration_xvla.py)

 ## Contributing

@@ -69,7 +69,7 @@ class ComputeProgressShards(PipelineStep):
        import torch
        from tqdm import tqdm

-        from lerobot.policies.sarm.compute_rabc_weights import (
+        from lerobot.rewards.sarm.compute_rabc_weights import (
            generate_all_frame_indices,
            interpolate_progress,
            load_sarm_resources,
@@ -10,7 +10,7 @@ from lerobot.datasets import LeRobotDataset
 from lerobot.envs.configs import HILSerlProcessorConfig, HILSerlRobotEnvConfig
 from lerobot.policies import SACConfig
 from lerobot.policies.sac.modeling_sac import SACPolicy
-from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
+from lerobot.rewards.classifier.modeling_classifier import Classifier
 from lerobot.rl.buffer import ReplayBuffer
 from lerobot.rl.gym_manipulator import make_robot_env
 from lerobot.robots.so_follower import SO100FollowerConfig
@@ -1,7 +1,7 @@
 import torch

 from lerobot.datasets import LeRobotDataset
-from lerobot.policies import RewardClassifierConfig, make_policy, make_pre_post_processors
+from lerobot.rewards import RewardClassifierConfig, make_reward_model, make_reward_pre_post_processors


 def main():
@@ -22,10 +22,10 @@ def main():
        model_name="microsoft/resnet-18",
    )

-    # Make policy, preprocessor, and optimizer
-    policy = make_policy(config, ds_meta=dataset.meta)
-    optimizer = config.get_optimizer_preset().build(policy.parameters())
-    preprocessor, _ = make_pre_post_processors(policy_cfg=config, dataset_stats=dataset.meta.stats)
+    # Make reward model, preprocessor, and optimizer
+    reward_model = make_reward_model(config, dataset_stats=dataset.meta.stats)
+    optimizer = config.get_optimizer_preset().build(reward_model.parameters())
+    preprocessor, _ = make_reward_pre_post_processors(config, dataset_stats=dataset.meta.stats)

    classifier_id = "<user>/reward_classifier_hil_serl_example"

@@ -42,7 +42,7 @@ def main():
            batch = preprocessor(batch)

            # Forward pass
-            loss, output_dict = policy.forward(batch)
+            loss, output_dict = reward_model.forward(batch)

            # Backward pass and optimization
            optimizer.zero_grad()
@@ -58,8 +58,8 @@ def main():

    print("Training finished!")

-    # You can now save the trained policy.
-    policy.push_to_hub(classifier_id)
+    # You can now save the trained reward model.
+    reward_model.push_to_hub(classifier_id)


 if __name__ == "__main__":
@@ -17,6 +17,7 @@ Provides the RealSenseCamera class for capturing frames from Intel RealSense cam
 """

 import logging
+import sys
 import time
 from threading import Event, Lock, Thread
 from typing import TYPE_CHECKING, Any
@@ -41,6 +42,7 @@ from ..utils import get_cv2_rotation
 from .configuration_realsense import RealSenseCameraConfig

 logger = logging.getLogger(__name__)
+pkg_name = "pyrealsense2-macosx" if sys.platform == "darwin" else "pyrealsense2"


 class RealSenseCamera(Camera):
@@ -114,7 +116,7 @@ class RealSenseCamera(Camera):
        Args:
            config: The configuration settings for the camera.
        """
-        require_package("pyrealsense2", extra="intelrealsense")
+        require_package(pkg_name, extra="intelrealsense", import_name="pyrealsense2")
        super().__init__(config)

        self.config = config
@@ -131,6 +133,9 @@ class RealSenseCamera(Camera):

        self.rs_pipeline: rs.pipeline | None = None
        self.rs_profile: rs.pipeline_profile | None = None
+        # Meters per uint16 unit on the depth stream. Queried from the device
+        # at connect() time. Typical D-series value is 0.001 (= 1 mm/unit).
+        self.depth_scale: float | None = None

        self.thread: Thread | None = None
        self.stop_event: Event | None = None
@@ -188,6 +193,17 @@ class RealSenseCamera(Camera):
            ) from e

        self._configure_capture_settings()
+
+        # Query depth scale (meters per uint16 unit) when depth is enabled so
+        # consumers can convert the raw z16 stream to metric distances.
+        if self.use_depth and self.rs_profile is not None:
+            try:
+                depth_sensor = self.rs_profile.get_device().first_depth_sensor()
+                self.depth_scale = float(depth_sensor.get_depth_scale())
+            except RuntimeError as e:
+                logger.warning(f"{self}: failed to query depth scale ({e}); falling back to 0.001 m/unit.")
+                self.depth_scale = 0.001
+
        self._start_read_thread()

        # NOTE(Steven/Caroline): Enforcing at least one second of warmup as RS cameras need a bit of time before the first read. If we don't wait, the first read from the warmup will raise.
@@ -530,7 +546,6 @@ class RealSenseCamera(Camera):
            self.latest_timestamp = None
            self.new_frame_event.clear()

-    # NOTE(Steven): Missing implementation for depth for now
    @check_if_not_connected
    def async_read(self, timeout_ms: float = 200) -> NDArray[Any]:
        """
@@ -573,7 +588,6 @@ class RealSenseCamera(Camera):

        return frame

-    # NOTE(Steven): Missing implementation for depth for now
    @check_if_not_connected
    def read_latest(self, max_age_ms: int = 500) -> NDArray[Any]:
        """Return the most recent (color) frame captured immediately (Peeking).
@@ -609,6 +623,78 @@ class RealSenseCamera(Camera):

        return frame

+
+    @check_if_not_connected
+    def async_read_depth(self, timeout_ms: float = 200) -> NDArray[Any]:
+        """Read the latest raw depth frame asynchronously (multiply by ``depth_scale`` for meters).
+
+        Mirrors :meth:`async_read` but returns the depth stream rather than the
+        color stream. Output is raw ``np.uint16`` (z16) of shape ``(H, W)``.
+
+        Raises:
+            DeviceNotConnectedError: If the camera is not connected.
+            RuntimeError: If ``use_depth`` is ``False`` for this camera, or if
+                the background read thread is not running.
+            TimeoutError: If no frame becomes available within ``timeout_ms``.
+        """
+        if not self.use_depth:
+            raise RuntimeError(
+                f"{self}: cannot read depth — camera was configured with use_depth=False."
+            )
+
+        if self.thread is None or not self.thread.is_alive():
+            raise RuntimeError(f"{self} read thread is not running.")
+
+        if not self.new_frame_event.wait(timeout=timeout_ms / 1000.0):
+            raise TimeoutError(
+                f"Timed out waiting for depth frame from camera {self} after {timeout_ms} ms."
+            )
+
+        with self.frame_lock:
+            depth_frame = self.latest_depth_frame
+            self.new_frame_event.clear()
+
+        if depth_frame is None:
+            raise RuntimeError(f"Internal error: Event set but no depth frame available for {self}.")
+
+        return depth_frame
+
+    @check_if_not_connected
+    def read_latest_depth(self, max_age_ms: int = 500) -> NDArray[Any]:
+        """Return the most recent raw depth frame (peeking).
+
+        Non-blocking counterpart of :meth:`read_latest` for the depth stream.
+        Output is raw ``np.uint16`` (z16) of shape ``(H, W)``; multiply by ``depth_scale`` for meters.
+
+        Raises:
+            DeviceNotConnectedError: If the camera is not connected.
+            RuntimeError: If ``use_depth`` is ``False`` for this camera, or if
+                no depth frame has been captured yet.
+            TimeoutError: If the latest depth frame is older than ``max_age_ms``.
+        """
+        if not self.use_depth:
+            raise RuntimeError(
+                f"{self}: cannot read depth — camera was configured with use_depth=False."
+            )
+
+        if self.thread is None or not self.thread.is_alive():
+            raise RuntimeError(f"{self} read thread is not running.")
+
+        with self.frame_lock:
+            depth_frame = self.latest_depth_frame
+            timestamp = self.latest_timestamp
+
+        if depth_frame is None or timestamp is None:
+            raise RuntimeError(f"{self} has not captured any depth frames yet.")
+
+        age_ms = (time.perf_counter() - timestamp) * 1e3
+        if age_ms > max_age_ms:
+            raise TimeoutError(
+                f"{self} latest depth frame is too old: {age_ms:.1f} ms (max allowed: {max_age_ms} ms)."
+            )
+
+        return depth_frame
+
    def disconnect(self) -> None:
        """
        Disconnects from the camera, stops the pipeline, and cleans up resources.
@@ -632,6 +718,8 @@ class RealSenseCamera(Camera):
            self.rs_pipeline = None
            self.rs_profile = None

+        self.depth_scale = None
+
        with self.frame_lock:
            self.latest_color_frame = None
            self.latest_depth_frame = None
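Since `depth_scale` is documented as meters per uint16 unit, a consumer can turn a depth frame into metric distances with a single multiplication. The sketch below assumes the frames are the raw uint16 z16 values that the `depth_scale` comment describes; `depth_to_meters` is a hypothetical helper, not part of `RealSenseCamera`, and the sample array is made up.

```python
import numpy as np


def depth_to_meters(raw_depth: np.ndarray, depth_scale: float = 0.001) -> np.ndarray:
    """Convert a raw uint16 depth map of shape (H, W) to float32 meters."""
    if raw_depth.dtype != np.uint16:
        raise TypeError(f"expected uint16 depth, got {raw_depth.dtype}")
    # depth_scale is meters per raw unit; 0.001 means one raw unit is 1 mm.
    return raw_depth.astype(np.float32) * np.float32(depth_scale)


raw = np.array([[0, 500], [1000, 65535]], dtype=np.uint16)
meters = depth_to_meters(raw, depth_scale=0.001)
print(meters.dtype, meters.shape)
```

Keeping the stored frames as uint16 and converting only at the point of use preserves the sensor's full precision and halves the memory footprint compared with float32.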
@@ -41,8 +41,12 @@ def cfg_to_group(
            return tag
        return tag[:max_tag_length]

+    if cfg.is_reward_model_training:
+        trainable_tag = f"reward_model:{cfg.reward_model.type}"
+    else:
+        trainable_tag = f"policy:{cfg.policy.type}"
    lst = [
-        f"policy:{cfg.policy.type}",
+        trainable_tag,
        f"seed:{cfg.seed}",
    ]
    if cfg.dataset is not None:
@@ -17,7 +17,7 @@
 from dataclasses import dataclass, field

 from lerobot.transforms import ImageTransformsConfig
-from lerobot.utils.import_utils import get_safe_default_codec
+from lerobot.utils.import_utils import get_safe_default_video_backend


 @dataclass
@@ -34,7 +34,7 @@ class DatasetConfig:
    image_transforms: ImageTransformsConfig = field(default_factory=ImageTransformsConfig)
    revision: str | None = None
    use_imagenet_stats: bool = True
-    video_backend: str = field(default_factory=get_safe_default_codec)
+    video_backend: str = field(default_factory=get_safe_default_video_backend)
    # When True, video frames are returned as uint8 tensors (0-255) instead of float32 (0.0-1.0).
    # This reduces memory and speeds up DataLoader IPC. The training pipeline handles the conversion.
    return_uint8: bool = False
@@ -0,0 +1,163 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import abc
+import builtins
+import json
+import logging
+import os
+import tempfile
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, TypeVar
+
+import draccus
+from huggingface_hub import hf_hub_download
+from huggingface_hub.constants import CONFIG_NAME
+from huggingface_hub.errors import HfHubHTTPError
+
+from lerobot.configs.types import PolicyFeature
+from lerobot.optim.optimizers import OptimizerConfig
+from lerobot.optim.schedulers import LRSchedulerConfig
+from lerobot.utils.device_utils import auto_select_torch_device, is_torch_device_available
+from lerobot.utils.hub import HubMixin
+
+T = TypeVar("T", bound="RewardModelConfig")
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class RewardModelConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC):
+    """Base configuration for reward models.
+
+    Args:
+    input_features: A dictionary defining the PolicyFeature of the input data for the reward. The key represents
+        the input data name, and the value is PolicyFeature, which consists of FeatureType and shape attributes.
+    output_features: A dictionary defining the PolicyFeature of the output data for the reward. The key represents
+        the output data name, and the value is PolicyFeature, which consists of FeatureType and shape attributes.
+    """
+
+    # Reuses PolicyFeature
+    input_features: dict[str, PolicyFeature] = field(default_factory=dict)
+    output_features: dict[str, PolicyFeature] = field(default_factory=dict)
+
+    device: str | None = None
+
+    pretrained_path: str | None = None
+
+    push_to_hub: bool = False
+    repo_id: str | None = None
+
+    # Hub metadata
+    license: str | None = None
+    tags: list[str] | None = None
+    private: bool | None = None
+
+    def __post_init__(self) -> None:
+        if not self.device or not is_torch_device_available(self.device):
+            auto_device = auto_select_torch_device()
+            logger.warning(f"Device '{self.device}' is not available. Switching to '{auto_device}'.")
+            self.device = auto_device.type
+
+    @property
+    def type(self) -> str:
+        choice_name = self.get_choice_name(self.__class__)
+        if not isinstance(choice_name, str):
+            raise TypeError(f"Expected string from get_choice_name, got {type(choice_name)}")
+        return choice_name
+
+    @property
+    def observation_delta_indices(self) -> list | None:  # type: ignore[type-arg]
+        return None
+
+    @property
+    def action_delta_indices(self) -> list | None:  # type: ignore[type-arg]
+        return None
+
+    @property
+    def reward_delta_indices(self) -> list | None:  # type: ignore[type-arg]
+        return None
+
+    @abc.abstractmethod
+    def get_optimizer_preset(self) -> OptimizerConfig:
+        raise NotImplementedError
+
+    def get_scheduler_preset(self) -> LRSchedulerConfig | None:
+        return None
+
+    def validate_features(self) -> None:
+        pass
+
+    def _save_pretrained(self, save_directory: Path) -> None:
+        with open(save_directory / CONFIG_NAME, "w") as f, draccus.config_type("json"):
+            draccus.dump(self, f, indent=4)
+
+    @classmethod
+    def from_pretrained(
+        cls: builtins.type[T],
+        pretrained_name_or_path: str | Path,
+        *,
+        force_download: bool = False,
+        resume_download: bool | None = None,
+        proxies: dict[Any, Any] | None = None,
+        token: str | bool | None = None,
+        cache_dir: str | Path | None = None,
+        local_files_only: bool = False,
+        revision: str | None = None,
+        **reward_kwargs: Any,
+    ) -> T:
+        model_id = str(pretrained_name_or_path)
+        config_file: str | None = None
+        if Path(model_id).is_dir():
+            if CONFIG_NAME in os.listdir(model_id):
+                config_file = os.path.join(model_id, CONFIG_NAME)
+            else:
+                logger.error(f"{CONFIG_NAME} not found in {Path(model_id).resolve()}")
+        else:
+            try:
+                config_file = hf_hub_download(
+                    repo_id=model_id,
+                    filename=CONFIG_NAME,
+                    revision=revision,
+                    cache_dir=cache_dir,
+                    force_download=force_download,
+                    proxies=proxies,
+                    resume_download=resume_download,
+                    token=token,
+                    local_files_only=local_files_only,
+                )
+            except HfHubHTTPError as e:
+                raise FileNotFoundError(
+                    f"{CONFIG_NAME} not found on the HuggingFace Hub in {model_id}"
+                ) from e
+
+        if config_file is None:
+            raise FileNotFoundError(f"{CONFIG_NAME} not found in {model_id}")
+
+        # HACK: Parse the original config to get the config subclass, so that we can
+        # apply cli overrides.
+        with draccus.config_type("json"):
+            orig_config = draccus.parse(cls, config_file, args=[])
+
+        with open(config_file) as f:
+            config = json.load(f)
+
+        config.pop("type", None)
+        with tempfile.NamedTemporaryFile("w+", delete=False, suffix=".json") as f:
+            json.dump(config, f)
+            config_file = f.name
+
+        cli_overrides = reward_kwargs.pop("cli_overrides", [])
+        with draccus.config_type("json"):
+            return draccus.parse(orig_config.__class__, config_file, args=cli_overrides)
@@ -13,7 +13,9 @@
 # limitations under the License.
 import builtins
 import datetime as dt
+import json
 import os
+import tempfile
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any
@@ -26,18 +28,57 @@ from lerobot import envs
 from lerobot.configs import parser
 from lerobot.optim import LRSchedulerConfig, OptimizerConfig
 from lerobot.utils.hub import HubMixin
+from lerobot.utils.sample_weighting import SampleWeightingConfig

 from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
 from .policies import PreTrainedConfig
+from .rewards import RewardModelConfig

 TRAIN_CONFIG_NAME = "train_config.json"


+def _migrate_legacy_rabc_fields(config: dict[str, Any]) -> dict[str, Any] | None:
+    """Return migrated payload for legacy RA-BC fields, or None when no migration is needed."""
+    legacy_fields = (
+        "use_rabc",
+        "rabc_progress_path",
+        "rabc_kappa",
+        "rabc_epsilon",
+        "rabc_head_mode",
+    )
+    if not any(key in config for key in legacy_fields):
+        return None
+
+    migrated_config = dict(config)
+    use_rabc = bool(migrated_config.pop("use_rabc", False))
+    rabc_progress_path = migrated_config.pop("rabc_progress_path", None)
+    rabc_kappa = migrated_config.pop("rabc_kappa", None)
+    rabc_epsilon = migrated_config.pop("rabc_epsilon", None)
+    rabc_head_mode = migrated_config.pop("rabc_head_mode", None)
+
+    # New configs may already define sample_weighting explicitly. In that case,
+    # legacy fields are ignored after being stripped from the payload.
+    if migrated_config.get("sample_weighting") is None and use_rabc:
+        sample_weighting: dict[str, Any] = {"type": "rabc"}
+        if rabc_progress_path is not None:
+            sample_weighting["progress_path"] = rabc_progress_path
+        if rabc_kappa is not None:
+            sample_weighting["kappa"] = rabc_kappa
+        if rabc_epsilon is not None:
+            sample_weighting["epsilon"] = rabc_epsilon
+        if rabc_head_mode is not None:
+            sample_weighting["head_mode"] = rabc_head_mode
+        migrated_config["sample_weighting"] = sample_weighting
+
+    return migrated_config
+
+
 @dataclass
 class TrainPipelineConfig(HubMixin):
    dataset: DatasetConfig
    env: envs.EnvConfig | None = None
    policy: PreTrainedConfig | None = None
+    reward_model: RewardModelConfig | None = None
    # Set `dir` to where you would like to save all of the run outputs. If you run another training session
    # with the same value for `dir` its contents will be overwritten unless you set `resume` to true.
    output_dir: Path | None = None
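The behaviour of `_migrate_legacy_rabc_fields` can be checked in isolation. The sketch below is a standalone copy of the same logic, condensed with a mapping so that it runs without LeRobot installed; the name `migrate_legacy_rabc_fields`, the `LEGACY_TO_NEW` table, and the sample payloads are illustrative only.

```python
from typing import Any, Optional

# Legacy top-level key -> key inside the new `sample_weighting` block.
LEGACY_TO_NEW = {
    "rabc_progress_path": "progress_path",
    "rabc_kappa": "kappa",
    "rabc_epsilon": "epsilon",
    "rabc_head_mode": "head_mode",
}


def migrate_legacy_rabc_fields(config: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the migrated payload, or None when no legacy field is present."""
    if "use_rabc" not in config and not any(key in config for key in LEGACY_TO_NEW):
        return None

    migrated = dict(config)
    use_rabc = bool(migrated.pop("use_rabc", False))
    # Legacy keys are always stripped, even when they end up unused.
    legacy_values = {new: migrated.pop(old, None) for old, new in LEGACY_TO_NEW.items()}

    # An explicit `sample_weighting` entry takes precedence over legacy fields.
    if migrated.get("sample_weighting") is None and use_rabc:
        sample_weighting: dict[str, Any] = {"type": "rabc"}
        sample_weighting.update({k: v for k, v in legacy_values.items() if v is not None})
        migrated["sample_weighting"] = sample_weighting
    return migrated


legacy = {"seed": 1000, "use_rabc": True, "rabc_kappa": 0.01, "rabc_head_mode": "sparse"}
print(migrate_legacy_rabc_fields(legacy))
print(migrate_legacy_rabc_fields({"seed": 1000}))
```

Note that a payload with `use_rabc` set to false still has its legacy keys removed, so older `train_config.json` files parse cleanly against the new dataclass even when RA-BC was disabled.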
@@ -72,27 +113,41 @@ class TrainPipelineConfig(HubMixin):
    wandb: WandBConfig = field(default_factory=WandBConfig)
    peft: PeftConfig | None = None

-    # RA-BC (Reward-Aligned Behavior Cloning) parameters
-    use_rabc: bool = False  # Enable reward-weighted training
-    rabc_progress_path: str | None = None  # Path to precomputed SARM progress parquet file
-    rabc_kappa: float = 0.01  # Hard threshold for high-quality samples
-    rabc_epsilon: float = 1e-6  # Small constant for numerical stability
-    rabc_head_mode: str | None = "sparse"  # For dual-head models: "sparse" or "dense"
+    # Sample weighting configuration (e.g., for RA-BC training)
+    sample_weighting: SampleWeightingConfig | None = None

    # Rename map for the observation to override the image and state keys
    rename_map: dict[str, str] = field(default_factory=dict)
    checkpoint_path: Path | None = field(init=False, default=None)

+    @property
+    def is_reward_model_training(self) -> bool:
+        """True when the config targets a reward model rather than a policy."""
+        return self.reward_model is not None
+
+    @property
+    def trainable_config(self) -> PreTrainedConfig | RewardModelConfig:
+        """Return whichever config (policy or reward_model) is active."""
+        if self.is_reward_model_training:
+            return self.reward_model  # type: ignore[return-value]
+        return self.policy  # type: ignore[return-value]
+
    def validate(self) -> None:
        # HACK: We parse again the cli args here to get the pretrained paths if there was some.
        policy_path = parser.get_path_arg("policy")
-        if policy_path:
-            # Only load the policy config
+        reward_model_path = parser.get_path_arg("reward_model")
+
+        if reward_model_path:
+            cli_overrides = parser.get_cli_overrides("reward_model")
+            self.reward_model = RewardModelConfig.from_pretrained(
+                reward_model_path, cli_overrides=cli_overrides
+            )
+            self.reward_model.pretrained_path = str(Path(reward_model_path))
+        elif policy_path:
            cli_overrides = parser.get_cli_overrides("policy")
            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
            self.policy.pretrained_path = Path(policy_path)
        elif self.resume:
-            # The entire train config is already loaded, we just need to get the checkpoint dir
            config_path = parser.parse_arg("config_path")
            if not config_path:
                raise ValueError(
@@ -108,18 +163,22 @@ class TrainPipelineConfig(HubMixin):
            policy_dir = Path(config_path).parent
            if self.policy is not None:
                self.policy.pretrained_path = policy_dir
+            if self.reward_model is not None:
+                self.reward_model.pretrained_path = str(policy_dir)
            self.checkpoint_path = policy_dir.parent

-        if self.policy is None:
+        if self.policy is None and self.reward_model is None:
            raise ValueError(
-                "Policy is not configured. Please specify a pretrained policy with `--policy.path`."
+                "Neither policy nor reward_model is configured. "
+                "Please specify one with `--policy.path` or `--reward_model.path`."
            )

+        active_cfg = self.trainable_config
        if not self.job_name:
            if self.env is None:
-                self.job_name = f"{self.policy.type}"
+                self.job_name = f"{active_cfg.type}"
            else:
-                self.job_name = f"{self.env.type}_{self.policy.type}"
+                self.job_name = f"{self.env.type}_{active_cfg.type}"

        if not self.resume and isinstance(self.output_dir, Path) and self.output_dir.is_dir():
            raise FileExistsError(
@@ -137,26 +196,16 @@ class TrainPipelineConfig(HubMixin):
        if not self.use_policy_training_preset and (self.optimizer is None or self.scheduler is None):
            raise ValueError("Optimizer and Scheduler must be set when the policy presets are not used.")
        elif self.use_policy_training_preset and not self.resume:
-            self.optimizer = self.policy.get_optimizer_preset()
-            self.scheduler = self.policy.get_scheduler_preset()
+            self.optimizer = active_cfg.get_optimizer_preset()
+            self.scheduler = active_cfg.get_scheduler_preset()

-        if self.policy.push_to_hub and not self.policy.repo_id:
-            raise ValueError(
-                "'policy.repo_id' argument missing. Please specify it to push the model to the hub."
-            )
-
-        if self.use_rabc and not self.rabc_progress_path:
-            # Auto-detect from dataset path
-            repo_id = self.dataset.repo_id
-            if self.dataset.root:
-                self.rabc_progress_path = str(Path(self.dataset.root) / "sarm_progress.parquet")
-            else:
-                self.rabc_progress_path = f"hf://datasets/{repo_id}/sarm_progress.parquet"
+        if hasattr(active_cfg, "push_to_hub") and active_cfg.push_to_hub and not active_cfg.repo_id:
+            raise ValueError("'repo_id' argument missing. Please specify it to push the model to the hub.")

    @classmethod
    def __get_path_fields__(cls) -> list[str]:
-        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
-        return ["policy"]
+        """Keys for draccus pretrained-path loading."""
+        return ["policy", "reward_model"]

    def to_dict(self) -> dict[str, Any]:
        return draccus.encode(self)  # type: ignore[no-any-return]  # because of the third-party library draccus uses Any as the return type
@@ -207,6 +256,15 @@ class TrainPipelineConfig(HubMixin):
                ) from e

        cli_args = kwargs.pop("cli_args", [])
+        if config_file is not None:
+            with open(config_file) as f:
+                config = json.load(f)
+            migrated_config = _migrate_legacy_rabc_fields(config)
+            if migrated_config is not None:
+                with tempfile.NamedTemporaryFile("w+", delete=False, suffix=".json") as f:
+                    json.dump(migrated_config, f)
+                    config_file = f.name
+
        with draccus.config_type("json"):
            return draccus.parse(cls, config_file, args=cli_args)

@@ -40,10 +40,21 @@ from .io_utils import load_episodes, write_stats
 from .lerobot_dataset import LeRobotDataset
 from .multi_dataset import MultiLeRobotDataset
 from .pipeline_features import aggregate_pipeline_dataset_features, create_initial_features
+from .pyav_utils import (
+    check_video_encoder_config_pyav,
+    detect_available_encoders_pyav,
+    get_codec,
+)
 from .sampler import EpisodeAwareSampler
 from .streaming_dataset import StreamingLeRobotDataset
 from .utils import DEFAULT_EPISODES_PATH, create_lerobot_dataset_card
-from .video_utils import VideoEncodingManager
+from .video_utils import (
+    DepthEncoderConfig,
+    VideoEncoderConfig,
+    VideoEncodingManager,
+    camera_encoder_defaults,
+    depth_encoder_defaults,
+)

 # NOTE: Low-level I/O functions (cast_stats_to_numpy, get_parquet_file_size_in_mb, etc.)
 # and legacy migration constants are intentionally NOT re-exported here.
@@ -58,15 +69,22 @@ __all__ = [
    "LeRobotDatasetMetadata",
    "MultiLeRobotDataset",
    "StreamingLeRobotDataset",
+    "DepthEncoderConfig",
+    "VideoEncoderConfig",
    "VideoEncodingManager",
+    "camera_encoder_defaults",
+    "depth_encoder_defaults",
    "add_features",
    "aggregate_datasets",
    "aggregate_pipeline_dataset_features",
    "aggregate_stats",
+    "check_video_encoder_config_pyav",
    "convert_image_to_video_dataset",
    "create_initial_features",
    "create_lerobot_dataset_card",
    "delete_episodes",
+    "detect_available_encoders_pyav",
+    "get_codec",
    "get_feature_stats",
    "load_episodes",
    "make_dataset",
@@ -97,8 +97,8 @@ def update_data_df(df, src_meta, dst_meta):
        pd.DataFrame: Updated DataFrame with adjusted indices.
    """

-    df["episode_index"] = df["episode_index"] + dst_meta.info["total_episodes"]
-    df["index"] = df["index"] + dst_meta.info["total_frames"]
+    df["episode_index"] = df["episode_index"] + dst_meta.info.total_episodes
+    df["index"] = df["index"] + dst_meta.info.total_frames

    src_task_names = src_meta.tasks.index.take(df["task_index"].to_numpy())
    df["task_index"] = dst_meta.tasks.loc[src_task_names, "task_index"].to_numpy()
@@ -225,9 +225,9 @@ def update_meta_data(
        # Clean up temporary columns
        df = df.drop(columns=["_orig_chunk", "_orig_file"])

-    df["dataset_from_index"] = df["dataset_from_index"] + dst_meta.info["total_frames"]
-    df["dataset_to_index"] = df["dataset_to_index"] + dst_meta.info["total_frames"]
-    df["episode_index"] = df["episode_index"] + dst_meta.info["total_episodes"]
+    df["dataset_from_index"] = df["dataset_from_index"] + dst_meta.info.total_frames
+    df["dataset_to_index"] = df["dataset_to_index"] + dst_meta.info.total_frames
+    df["episode_index"] = df["episode_index"] + dst_meta.info.total_episodes

    return df

@@ -237,8 +237,8 @@ def aggregate_datasets(
    aggr_repo_id: str,
    roots: list[Path] | None = None,
    aggr_root: Path | None = None,
-    data_files_size_in_mb: float | None = None,
-    video_files_size_in_mb: float | None = None,
+    data_files_size_in_mb: int | None = None,
+    video_files_size_in_mb: int | None = None,
    chunk_size: int | None = None,
 ):
    """Aggregates multiple LeRobot datasets into a single unified dataset.
@@ -313,8 +313,8 @@ def aggregate_datasets(
        # to avoid interference between different source datasets
        data_idx.pop("src_to_dst", None)

-        dst_meta.info["total_episodes"] += src_meta.total_episodes
-        dst_meta.info["total_frames"] += src_meta.total_frames
+        dst_meta.info.total_episodes += src_meta.total_episodes
+        dst_meta.info.total_frames += src_meta.total_frames

    finalize_aggregation(dst_meta, all_metadata)
    logging.info("Aggregation complete.")
@@ -332,7 +332,6 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
        videos_idx: Dictionary tracking video chunk and file indices.
        video_files_size_in_mb: Maximum size for video files in MB (defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB)
        chunk_size: Maximum number of files per chunk (defaults to DEFAULT_CHUNK_SIZE)
-
    Returns:
        dict: Updated videos_idx with current chunk and file indices.
    """
@@ -417,6 +416,7 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
                concatenate_video_files(
                    [dst_path, src_path],
                    dst_path,
+                    compatibility_check=True,
                )
                # Update duration of this destination file
                dst_file_durations[dst_key] = current_dst_duration + src_duration
@@ -640,14 +640,10 @@ def finalize_aggregation(aggr_meta, all_metadata):
    write_tasks(aggr_meta.tasks, aggr_meta.root)

    logging.info("write info")
-    aggr_meta.info.update(
-        {
-            "total_tasks": len(aggr_meta.tasks),
-            "total_episodes": sum(m.total_episodes for m in all_metadata),
-            "total_frames": sum(m.total_frames for m in all_metadata),
-            "splits": {"train": f"0:{sum(m.total_episodes for m in all_metadata)}"},
-        }
-    )
+    aggr_meta.info.total_tasks = len(aggr_meta.tasks)
+    aggr_meta.info.total_episodes = sum(m.total_episodes for m in all_metadata)
+    aggr_meta.info.total_frames = sum(m.total_frames for m in all_metadata)
+    aggr_meta.info.splits = {"train": f"0:{sum(m.total_episodes for m in all_metadata)}"}
    write_info(aggr_meta.info, aggr_meta.root)

    logging.info("write stats")
@@ -37,20 +37,18 @@ from .io_utils import (
    load_subtasks,
    load_tasks,
    write_info,
-    write_json,
    write_stats,
    write_tasks,
 )
 from .utils import (
    DEFAULT_EPISODES_PATH,
-    INFO_PATH,
    check_version_compatibility,
    get_safe_version,
    has_legacy_hub_download_metadata,
    is_valid_version,
    update_chunk_file_indices,
 )
-from .video_utils import get_video_info
+from .video_utils import VideoEncoderConfig, get_video_info

 CODEBASE_VERSION = "v3.0"

@@ -228,7 +226,7 @@ class LeRobotDatasetMetadata:
    @property
    def _version(self) -> packaging.version.Version:
        """Codebase version used to create this dataset."""
-        return packaging.version.parse(self.info["codebase_version"])
+        return packaging.version.parse(self.info.codebase_version)

    def get_data_file_path(self, ep_index: int) -> Path:
        """Return the relative parquet file path for the given episode index.
@@ -283,27 +281,27 @@ class LeRobotDatasetMetadata:
    @property
    def data_path(self) -> str:
        """Formattable string for the parquet files."""
-        return self.info["data_path"]
+        return self.info.data_path

    @property
    def video_path(self) -> str | None:
        """Formattable string for the video files."""
-        return self.info["video_path"]
+        return self.info.video_path

    @property
    def robot_type(self) -> str | None:
        """Robot type used in recording this dataset."""
-        return self.info["robot_type"]
+        return self.info.robot_type

    @property
    def fps(self) -> int:
        """Frames per second used during data collection."""
-        return self.info["fps"]
+        return self.info.fps

    @property
    def features(self) -> dict[str, dict]:
        """All features contained in the dataset."""
-        return self.info["features"]
+        return self.info.features

    @property
    def image_keys(self) -> list[str]:
@@ -315,6 +313,20 @@ class LeRobotDatasetMetadata:
        """Keys to access visual modalities stored as videos."""
        return [key for key, ft in self.features.items() if ft["dtype"] == "video"]

+    @property
+    def depth_keys(self) -> list[str]:
+        """Keys to access depth-map modalities stored as videos.
+
+        A depth video key is a feature whose ``info`` dict carries
+        ``"video.is_depth_map": True`` (set either at creation time by the user
+        or after the first encoded episode by :meth:`update_video_info`).
+        """
+        return [
+            key
+            for key, ft in self.features.items()
+            if ft["dtype"] == "video" and ft.get("info", {}).get("video.is_depth_map", False)
+        ]
+
    @property
    def camera_keys(self) -> list[str]:
        """Keys to access visual modalities (regardless of their storage method)."""
@@ -333,32 +345,32 @@ class LeRobotDatasetMetadata:
    @property
    def total_episodes(self) -> int:
        """Total number of episodes available."""
-        return self.info["total_episodes"]
+        return self.info.total_episodes

    @property
    def total_frames(self) -> int:
        """Total number of frames saved in this dataset."""
-        return self.info["total_frames"]
+        return self.info.total_frames

    @property
    def total_tasks(self) -> int:
        """Total number of different tasks performed in this dataset."""
-        return self.info["total_tasks"]
+        return self.info.total_tasks

    @property
    def chunks_size(self) -> int:
        """Max number of files per chunk."""
-        return self.info["chunks_size"]
+        return self.info.chunks_size

    @property
    def data_files_size_in_mb(self) -> int:
        """Max size of data file in mega bytes."""
-        return self.info["data_files_size_in_mb"]
+        return self.info.data_files_size_in_mb

    @property
    def video_files_size_in_mb(self) -> int:
        """Max size of video file in mega bytes."""
-        return self.info["video_files_size_in_mb"]
+        return self.info.video_files_size_in_mb

    def get_task_index(self, task: str) -> int | None:
        """
@@ -502,29 +514,48 @@ class LeRobotDatasetMetadata:
        self._save_episode_metadata(episode_dict)

        # Update info
-        self.info["total_episodes"] += 1
-        self.info["total_frames"] += episode_length
-        self.info["total_tasks"] = len(self.tasks)
-        self.info["splits"] = {"train": f"0:{self.info['total_episodes']}"}
+        self.info.total_episodes += 1
+        self.info.total_frames += episode_length
+        self.info.total_tasks = len(self.tasks)
+        self.info.splits = {"train": f"0:{self.info.total_episodes}"}

        write_info(self.info, self.root)

        self.stats = aggregate_stats([self.stats, episode_stats]) if self.stats is not None else episode_stats
        write_stats(self.stats, self.root)

-    def update_video_info(self, video_key: str | None = None) -> None:
-        """
+    def update_video_info(
+        self,
+        video_key: str | None = None,
+        camera_encoder_config: VideoEncoderConfig | None = None,
+    ) -> None:
+        """Populate per-feature video info in ``info.json``.
+
        Warning: this function writes info from first episode videos, implicitly assuming that all videos have
        been encoded the same way. Also, this means it assumes the first episode exists.
+
+        Args:
+            video_key: If provided, only update this video key. Otherwise update
+                all video keys in the dataset.
+            camera_encoder_config: Encoder configuration used to produce the
+                videos. When provided, its fields are recorded as
+                ``video.<field>`` entries alongside the stream-derived
+                ``video.*`` entries (see :func:`get_video_info`).
        """
        if video_key is not None and video_key not in self.video_keys:
            raise ValueError(f"Video key {video_key} not found in dataset")

        video_keys = [video_key] if video_key is not None else self.video_keys
        for key in video_keys:
-            if not self.features[key].get("info", None):
+            existing = self.features[key].get("info") or {}
+            # Repopulate when codec metadata is missing. Merging preserves
+            # user-provided markers like ``video.is_depth_map`` while still
+            # recording stream info on the first episode.
+            if "video.codec" not in existing:
                video_path = self.root / self.video_path.format(video_key=key, chunk_index=0, file_index=0)
-                self.info["features"][key]["info"] = get_video_info(video_path)
+                stream_info = get_video_info(video_path, camera_encoder_config=camera_encoder_config)
+                merged = {**existing, **stream_info}
+                self.info.features[key]["info"] = merged

    def update_chunk_settings(
        self,
@@ -546,17 +577,17 @@ class LeRobotDatasetMetadata:
        if chunks_size is not None:
            if chunks_size <= 0:
                raise ValueError(f"chunks_size must be positive, got {chunks_size}")
-            self.info["chunks_size"] = chunks_size
+            self.info.chunks_size = chunks_size

        if data_files_size_in_mb is not None:
            if data_files_size_in_mb <= 0:
                raise ValueError(f"data_files_size_in_mb must be positive, got {data_files_size_in_mb}")
-            self.info["data_files_size_in_mb"] = data_files_size_in_mb
+            self.info.data_files_size_in_mb = data_files_size_in_mb

        if video_files_size_in_mb is not None:
            if video_files_size_in_mb <= 0:
                raise ValueError(f"video_files_size_in_mb must be positive, got {video_files_size_in_mb}")
-            self.info["video_files_size_in_mb"] = video_files_size_in_mb
+            self.info.video_files_size_in_mb = video_files_size_in_mb

        # Update the info file on disk
        write_info(self.info, self.root)
@@ -653,7 +684,7 @@ class LeRobotDatasetMetadata:
                f"Features contain video keys {obj.video_keys}, but 'use_videos' is set to False. "
                "Either remove video features from the features dict, or set 'use_videos=True'."
            )
-        write_json(obj.info, obj.root / INFO_PATH)
+        write_info(obj.info, obj.root)
        obj.revision = None
        obj._pq_writer = None
        obj.latest_episode = None
@@ -32,7 +32,13 @@ from .io_utils import (
    hf_transform_to_torch,
    load_nested_dataset,
 )
-from .video_utils import decode_video_frames
+from .video_utils import decode_depth_frames, decode_video_frames
+from .depth_utils import (
+    DEFAULT_DEPTH_MIN,
+    DEFAULT_DEPTH_MAX,
+    DEFAULT_DEPTH_SHIFT,
+    DEFAULT_DEPTH_USE_LOG,
+)


 class DatasetReader:
@@ -237,17 +243,31 @@ class DatasetReader:
        """
        ep = self._meta.episodes[ep_idx]

+        depth_keys = set(self._meta.depth_keys)
+
        def _decode_single(vid_key: str, query_ts: list[float]) -> tuple[str, torch.Tensor]:
            from_timestamp = ep[f"videos/{vid_key}/from_timestamp"]
            shifted_query_ts = [from_timestamp + ts for ts in query_ts]
            video_path = self.root / self._meta.get_video_file_path(ep_idx, vid_key)
-            frames = decode_video_frames(
-                video_path,
-                shifted_query_ts,
-                self._tolerance_s,
-                self._video_backend,
-                return_uint8=self._return_uint8,
-            )
+            if vid_key in depth_keys:
+                feature_info = self._meta.features[vid_key].get("info") or {}
+                frames = decode_depth_frames(
+                    video_path,
+                    shifted_query_ts,
+                    self._tolerance_s,
+                    depth_min=feature_info.get("video.depth_min", DEFAULT_DEPTH_MIN),
+                    depth_max=feature_info.get("video.depth_max", DEFAULT_DEPTH_MAX),
+                    shift=feature_info.get("video.shift", DEFAULT_DEPTH_SHIFT),
+                    use_log=feature_info.get("video.use_log", DEFAULT_DEPTH_USE_LOG),
+                )
+            else:
+                frames = decode_video_frames(
+                    video_path,
+                    shifted_query_ts,
+                    self._tolerance_s,
+                    self._video_backend,
+                    return_uint8=self._return_uint8,
+                )
            return vid_key, frames.squeeze(0)

        items = list(query_timestamps.items())
@@ -62,7 +62,7 @@ from .utils import (
    DEFAULT_EPISODES_PATH,
    update_chunk_file_indices,
 )
-from .video_utils import encode_video_frames, get_video_info
+from .video_utils import VideoEncoderConfig, encode_video_frames, get_video_info


 def _load_episode_with_stats(src_dataset: LeRobotDataset, episode_idx: int) -> dict:
@@ -92,6 +92,7 @@ def delete_episodes(
    episode_indices: list[int],
    output_dir: str | Path | None = None,
    repo_id: str | None = None,
+    camera_encoder_config: VideoEncoderConfig | None = None,
 ) -> LeRobotDataset:
    """Delete episodes from a LeRobotDataset and create a new dataset.

@@ -100,6 +101,7 @@ def delete_episodes(
        episode_indices: List of episode indices to delete.
        output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig.
        repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig.
+        camera_encoder_config: Video encoder settings used when re-encoding video segments. If ``None``, defaults to ``VideoEncoderConfig()``.
    """
    if not episode_indices:
        raise ValueError("No episodes to delete")
@@ -132,7 +134,7 @@ def delete_episodes(

    video_metadata = None
    if dataset.meta.video_keys:
-        video_metadata = _copy_and_reindex_videos(dataset, new_meta, episode_mapping)
+        video_metadata = _copy_and_reindex_videos(dataset, new_meta, episode_mapping, camera_encoder_config)

    data_metadata = _copy_and_reindex_data(dataset, new_meta, episode_mapping)

@@ -154,6 +156,7 @@ def split_dataset(
    dataset: LeRobotDataset,
    splits: dict[str, float | list[int]],
    output_dir: str | Path | None = None,
+    camera_encoder_config: VideoEncoderConfig | None = None,
 ) -> dict[str, LeRobotDataset]:
    """Split a LeRobotDataset into multiple smaller datasets.

@@ -162,6 +165,7 @@ def split_dataset(
        splits: Either a dict mapping split names to episode indices, or a dict mapping
                split names to fractions (must sum to <= 1.0).
        output_dir: Root directory where the split datasets will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id.
+        camera_encoder_config: Video encoder settings used when re-encoding video segments. If ``None``, defaults to ``VideoEncoderConfig()``.

    Examples:
      Split by specific episodes
@@ -222,7 +226,9 @@ def split_dataset(

        video_metadata = None
        if dataset.meta.video_keys:
-            video_metadata = _copy_and_reindex_videos(dataset, new_meta, episode_mapping)
+            video_metadata = _copy_and_reindex_videos(
+                dataset, new_meta, episode_mapping, camera_encoder_config
+            )

        data_metadata = _copy_and_reindex_data(dataset, new_meta, episode_mapping)

@@ -578,8 +584,7 @@ def _keep_episodes_from_video_with_av(
    output_path: Path,
    episodes_to_keep: list[tuple[int, int]],
    fps: float,
-    vcodec: str = "libsvtav1",
-    pix_fmt: str = "yuv420p",
+    camera_encoder_config: VideoEncoderConfig | None = None,
 ) -> None:
    """Keep only specified episodes from a video file using PyAV.

@@ -593,9 +598,10 @@ def _keep_episodes_from_video_with_av(
            Ranges are half-open intervals: [start_frame, end_frame), where start_frame
            is inclusive and end_frame is exclusive.
        fps: Frame rate of the video.
-        vcodec: Video codec to use for encoding.
-        pix_fmt: Pixel format for output video.
+        camera_encoder_config: Video encoder settings. If ``None``, defaults to ``VideoEncoderConfig()``.
    """
+    if camera_encoder_config is None:
+        camera_encoder_config = VideoEncoderConfig()
    from fractions import Fraction

    import av
@@ -619,12 +625,12 @@ def _keep_episodes_from_video_with_av(

    # Convert fps to Fraction for PyAV compatibility.
    fps_fraction = Fraction(fps).limit_denominator(1000)
-    v_out = out.add_stream(vcodec, rate=fps_fraction)
+    v_out = out.add_stream(camera_encoder_config.vcodec, rate=fps_fraction)

    # PyAV type stubs don't distinguish video streams from audio/subtitle streams.
    v_out.width = v_in.codec_context.width
    v_out.height = v_in.codec_context.height
-    v_out.pix_fmt = pix_fmt
+    v_out.pix_fmt = camera_encoder_config.pix_fmt

    # Set time_base to match the frame rate for proper timestamp handling.
    v_out.time_base = Fraction(1, int(fps))
@@ -687,8 +693,7 @@ def _copy_and_reindex_videos(
    src_dataset: LeRobotDataset,
    dst_meta: LeRobotDatasetMetadata,
    episode_mapping: dict[int, int],
-    vcodec: str = "libsvtav1",
-    pix_fmt: str = "yuv420p",
+    camera_encoder_config: VideoEncoderConfig | None = None,
 ) -> dict[int, dict]:
    """Copy and filter video files, only re-encoding files with deleted episodes.

@@ -700,10 +705,13 @@ def _copy_and_reindex_videos(
        src_dataset: Source dataset to copy from
        dst_meta: Destination metadata object
        episode_mapping: Mapping from old episode indices to new indices
+        camera_encoder_config: Video encoder settings used when re-encoding segments. If ``None``, defaults to ``VideoEncoderConfig()``.

    Returns:
        dict mapping episode index to its video metadata (chunk_index, file_index, timestamps)
    """
+    if camera_encoder_config is None:
+        camera_encoder_config = VideoEncoderConfig()
    if src_dataset.meta.episodes is None:
        src_dataset.meta.episodes = load_episodes(src_dataset.meta.root)

@@ -792,8 +800,7 @@ def _copy_and_reindex_videos(
                    dst_video_path,
                    episodes_to_keep_ranges,
                    src_dataset.meta.fps,
-                    vcodec,
-                    pix_fmt,
+                    camera_encoder_config,
                )

                cumulative_ts = 0.0
@@ -897,14 +904,10 @@ def _copy_and_reindex_episodes_metadata(

    dst_meta.finalize()

-    dst_meta.info.update(
-        {
-            "total_episodes": len(episode_mapping),
-            "total_frames": total_frames,
-            "total_tasks": len(dst_meta.tasks) if dst_meta.tasks is not None else 0,
-            "splits": {"train": f"0:{len(episode_mapping)}"},
-        }
-    )
+    dst_meta.info.total_episodes = len(episode_mapping)
+    dst_meta.info.total_frames = total_frames
+    dst_meta.info.total_tasks = len(dst_meta.tasks) if dst_meta.tasks is not None else 0
+    dst_meta.info.splits = {"train": f"0:{len(episode_mapping)}"}
    write_info(dst_meta.info, dst_meta.root)

    if not all_stats:
@@ -1069,21 +1072,20 @@ def _copy_episodes_metadata_and_stats(
    if episodes_dir.exists():
        shutil.copytree(episodes_dir, dst_episodes_dir, dirs_exist_ok=True)

-    dst_meta.info.update(
-        {
-            "total_episodes": src_dataset.meta.total_episodes,
-            "total_frames": src_dataset.meta.total_frames,
-            "total_tasks": src_dataset.meta.total_tasks,
-            "splits": src_dataset.meta.info.get("splits", {"train": f"0:{src_dataset.meta.total_episodes}"}),
-        }
+    dst_meta.info.total_episodes = src_dataset.meta.total_episodes
+    dst_meta.info.total_frames = src_dataset.meta.total_frames
+    dst_meta.info.total_tasks = src_dataset.meta.total_tasks
+    # Preserve original splits if available, otherwise create default
+    dst_meta.info.splits = (
+        src_dataset.meta.info.splits
+        if src_dataset.meta.info.splits
+        else {"train": f"0:{src_dataset.meta.total_episodes}"}
    )

    if dst_meta.video_keys and src_dataset.meta.video_keys:
        for key in dst_meta.video_keys:
            if key in src_dataset.meta.features:
-                dst_meta.info["features"][key]["info"] = src_dataset.meta.info["features"][key].get(
-                    "info", {}
-                )
+                dst_meta.info.features[key]["info"] = src_dataset.meta.info.features[key].get("info", {})

    write_info(dst_meta.info, dst_meta.root)

@@ -1269,11 +1271,7 @@ def _estimate_frame_size_via_calibration(
    episode_indices: list[int],
    temp_dir: Path,
    fps: int,
-    vcodec: str,
-    pix_fmt: str,
-    g: int,
-    crf: int,
-    fast_decode: int,
+    camera_encoder_config: VideoEncoderConfig,
    num_calibration_frames: int = 30,
 ) -> float:
    """Estimate MB per frame by encoding a small calibration sample.
@@ -1287,11 +1285,7 @@ def _estimate_frame_size_via_calibration(
        episode_indices: List of episode indices being processed.
        temp_dir: Temporary directory for calibration files.
        fps: Frames per second for video encoding.
-        vcodec: Video codec (libsvtav1, h264, hevc).
-        pix_fmt: Pixel format (yuv420p, etc.).
-        g: GOP size (group of pictures).
-        crf: Constant Rate Factor (quality).
-        fast_decode: Fast decode tuning parameter.
+        camera_encoder_config: Video encoder settings used for calibration encoding.
        num_calibration_frames: Number of frames to use for calibration (default: 30).

    Returns:
@@ -1327,11 +1321,7 @@ def _estimate_frame_size_via_calibration(
            imgs_dir=calibration_dir,
            video_path=calibration_video_path,
            fps=fps,
-            vcodec=vcodec,
-            pix_fmt=pix_fmt,
-            g=g,
-            crf=crf,
-            fast_decode=fast_decode,
+            camera_encoder_config=camera_encoder_config,
            overwrite=True,
        )

@@ -1525,7 +1515,7 @@ def modify_tasks(
    write_tasks(new_task_df, root)

    # Update info.json
-    dataset.meta.info["total_tasks"] = len(unique_tasks)
+    dataset.meta.info.total_tasks = len(unique_tasks)
    write_info(dataset.meta.info, root)

    # Reload metadata to reflect changes
@@ -1649,11 +1639,7 @@ def convert_image_to_video_dataset(
    dataset: LeRobotDataset,
    output_dir: Path | None = None,
    repo_id: str | None = None,
-    vcodec: str = "libsvtav1",
-    pix_fmt: str = "yuv420p",
-    g: int = 2,
-    crf: int = 30,
-    fast_decode: int = 0,
+    camera_encoder_config: VideoEncoderConfig | None = None,
    episode_indices: list[int] | None = None,
    num_workers: int = 4,
    max_episodes_per_batch: int | None = None,
@@ -1668,11 +1654,7 @@ def convert_image_to_video_dataset(
        dataset: The source LeRobot dataset with images
        output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig.
        repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig.
-        vcodec: Video codec (default: libsvtav1)
-        pix_fmt: Pixel format (default: yuv420p)
-        g: Group of pictures size (default: 2)
-        crf: Constant rate factor (default: 30)
-        fast_decode: Fast decode tuning (default: 0)
+        camera_encoder_config: Video encoder settings. If ``None``, defaults to ``VideoEncoderConfig()``.
        episode_indices: List of episode indices to convert (None = all episodes)
        num_workers: Number of threads for parallel processing (default: 4)
        max_episodes_per_batch: Maximum episodes per video batch to avoid memory issues (None = no limit)
@@ -1681,6 +1663,9 @@ def convert_image_to_video_dataset(
    Returns:
        New LeRobotDataset with images encoded as videos
    """
+    if camera_encoder_config is None:
+        camera_encoder_config = VideoEncoderConfig()
+
    # Check that it's an image dataset
    if len(dataset.meta.video_keys) > 0:
        raise ValueError(
@@ -1704,7 +1689,10 @@ def convert_image_to_video_dataset(
    logging.info(
        f"Converting {len(episode_indices)} episodes with {len(img_keys)} cameras from {dataset.repo_id}"
    )
-    logging.info(f"Video codec: {vcodec}, pixel format: {pix_fmt}, GOP: {g}, CRF: {crf}")
+    logging.info(
+        f"Video codec: {camera_encoder_config.vcodec}, pixel format: {camera_encoder_config.pix_fmt}, "
+        f"GOP: {camera_encoder_config.g}, CRF: {camera_encoder_config.crf}"
+    )

    # Create new features dict, converting image features to video features
    new_features = {}
@@ -1774,11 +1762,7 @@ def convert_image_to_video_dataset(
                episode_indices=episode_indices,
                temp_dir=temp_dir,
                fps=fps,
-                vcodec=vcodec,
-                pix_fmt=pix_fmt,
-                g=g,
-                crf=crf,
-                fast_decode=fast_decode,
+                camera_encoder_config=camera_encoder_config,
            )

            logging.info(f"Processing camera: {img_key}")
@@ -1820,11 +1804,7 @@ def convert_image_to_video_dataset(
                    imgs_dir=imgs_dir,
                    video_path=video_path,
                    fps=fps,
-                    vcodec=vcodec,
-                    pix_fmt=pix_fmt,
-                    g=g,
-                    crf=crf,
-                    fast_decode=fast_decode,
+                    camera_encoder_config=camera_encoder_config,
                    overwrite=True,
                )

@@ -1858,10 +1838,10 @@ def convert_image_to_video_dataset(
        episodes_df.to_parquet(episodes_path, index=False)

        # Update metadata info
-        new_meta.info["total_episodes"] = len(episode_indices)
-        new_meta.info["total_frames"] = sum(ep["length"] for ep in all_episode_metadata.values())
-        new_meta.info["total_tasks"] = dataset.meta.total_tasks
-        new_meta.info["splits"] = {"train": f"0:{len(episode_indices)}"}
+        new_meta.info.total_episodes = len(episode_indices)
+        new_meta.info.total_frames = sum(ep["length"] for ep in all_episode_metadata.values())
+        new_meta.info.total_tasks = dataset.meta.total_tasks
+        new_meta.info.splits = {"train": f"0:{len(episode_indices)}"}

        # Update video info for all image keys (now videos)
        # We need to manually set video info since update_video_info() checks video_keys first
@@ -1870,7 +1850,9 @@ def convert_image_to_video_dataset(
                video_path = new_meta.root / new_meta.video_path.format(
                    video_key=img_key, chunk_index=0, file_index=0
                )
-                new_meta.info["features"][img_key]["info"] = get_video_info(video_path)
+                new_meta.info.features[img_key]["info"] = get_video_info(
+                    video_path, camera_encoder_config=camera_encoder_config
+                )

        write_info(new_meta.info, new_meta.root)

@@ -46,15 +46,19 @@ from .io_utils import (
    write_info,
 )
 from .utils import (
+    DEFAULT_DEPTH_PATH,
    DEFAULT_EPISODES_PATH,
    DEFAULT_IMAGE_PATH,
    update_chunk_file_indices,
 )
 from .video_utils import (
+    DepthEncoderConfig,
    StreamingVideoEncoder,
+    VideoEncoderConfig,
    concatenate_video_files,
    encode_video_frames,
    get_video_duration_in_s,
+    is_depth_feature,
 )

 logger = logging.getLogger(__name__)
@@ -65,14 +69,19 @@ def _encode_video_worker(
    episode_index: int,
    root: Path,
    fps: int,
-    vcodec: str = "libsvtav1",
+    camera_encoder_config: VideoEncoderConfig | None = None,
    encoder_threads: int | None = None,
 ) -> Path:
    temp_path = Path(tempfile.mkdtemp(dir=root)) / f"{video_key}_{episode_index:03d}.mp4"
    fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=episode_index, frame_index=0)
    img_dir = (root / fpath).parent
    encode_video_frames(
-        img_dir, temp_path, fps, vcodec=vcodec, overwrite=True, encoder_threads=encoder_threads
+        img_dir,
+        temp_path,
+        fps,
+        camera_encoder_config=camera_encoder_config,
+        encoder_threads=encoder_threads,
+        overwrite=True,
    )
    shutil.rmtree(img_dir)
    return temp_path
@@ -89,33 +98,40 @@ class DatasetWriter:
        self,
        meta: LeRobotDatasetMetadata,
        root: Path,
-        vcodec: str,
+        camera_encoder_config: VideoEncoderConfig,
        encoder_threads: int | None,
        batch_encoding_size: int,
        streaming_encoder: StreamingVideoEncoder | None = None,
        initial_frames: int = 0,
+        depth_encoder_config: DepthEncoderConfig | None = None,
    ):
-        """Initialize the writer with metadata, codec, and encoding config.
+        """Initialize the writer with metadata and encoder configuration.

        Args:
            meta: Dataset metadata instance (used for feature schema, chunk
                settings, and episode persistence).
            root: Local dataset root directory.
-            vcodec: Video codec for encoding (e.g. ``'libsvtav1'``, ``'h264'``).
-            encoder_threads: Threads per encoder instance. ``None`` for auto.
+            camera_encoder_config: Video encoder settings applied to all cameras.
+            encoder_threads: Number of encoder threads (global). ``None``
+                lets the codec decide.
            batch_encoding_size: Number of episodes to accumulate before
                batch-encoding videos.
            streaming_encoder: Optional pre-built :class:`StreamingVideoEncoder`
                for real-time encoding. ``None`` disables streaming mode.
            initial_frames: Starting frame count (non-zero when resuming).
+            depth_encoder_config: Optional depth-map encoder config used in
+                place of ``camera_encoder_config`` for keys present in
+                ``meta.depth_keys``.
        """
        self._meta = meta
        self._root = root
-        self._vcodec = vcodec
+        self._camera_encoder_config = camera_encoder_config
+        self._depth_encoder_config = depth_encoder_config
        self._encoder_threads = encoder_threads
        self._batch_encoding_size = batch_encoding_size
        self._streaming_encoder = streaming_encoder

+
        # Writer state
        self.image_writer: AsyncImageWriter | None = None
        self.episode_buffer: dict = self._create_episode_buffer()
@@ -135,8 +151,16 @@ class DatasetWriter:
            ep_buffer[key] = current_ep_idx if key == "episode_index" else []
        return ep_buffer

+    def _is_depth_image_key(self, image_key: str) -> bool:
+        """Whether *image_key* is a depth feature stored as per-frame images."""
+        ft = self._meta.features.get(image_key)
+        if ft is None or ft.get("dtype") != "image":
+            return False
+        return is_depth_feature(ft.get("info") or {})
+
    def _get_image_file_path(self, episode_index: int, image_key: str, frame_index: int) -> Path:
-        fpath = DEFAULT_IMAGE_PATH.format(
+        path_template = DEFAULT_DEPTH_PATH if self._is_depth_image_key(image_key) else DEFAULT_IMAGE_PATH
+        fpath = path_template.format(
            image_key=image_key, episode_index=episode_index, frame_index=frame_index
        )
        return self._root / fpath
@@ -284,7 +308,7 @@ class DatasetWriter:
                            episode_index,
                            self._root,
                            self._meta.fps,
-                            self._vcodec,
+                            self._camera_encoder_config,
                            self._encoder_threads,
                        ): video_key
                        for video_key in self._meta.video_keys
@@ -495,7 +519,13 @@ class DatasetWriter:

        # Update video info (only needed when first episode is encoded)
        if episode_index == 0:
-            self._meta.update_video_info(video_key)
+            is_depth_key = video_key in self._meta.depth_keys
+            cfg_for_info = (
+                self._depth_encoder_config
+                if is_depth_key and self._depth_encoder_config is not None
+                else self._camera_encoder_config
+            )
+            self._meta.update_video_info(video_key, camera_encoder_config=cfg_for_info)
            write_info(self._meta.info, self._meta.root)

        metadata = {
@@ -564,7 +594,12 @@ class DatasetWriter:
    def _encode_temporary_episode_video(self, video_key: str, episode_index: int) -> Path:
        """Use ffmpeg to convert frames stored as png into mp4 videos."""
        return _encode_video_worker(
-            video_key, episode_index, self._root, self._meta.fps, self._vcodec, self._encoder_threads
+            video_key,
+            episode_index,
+            self._root,
+            self._meta.fps,
+            self._camera_encoder_config,
+            self._encoder_threads,
        )

    def close_writer(self) -> None:
@@ -0,0 +1,189 @@
+#!/usr/bin/env python
+
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Depth encoding/decoding helpers for :class:`VideoEncoderConfig`.
+"""
+
+import math
+from typing import Literal
+
+import numpy as np
+import torch
+from numpy.typing import NDArray
+
+DEPTH_QUANT_BITS: int = 12
+DEPTH_QMAX: int = (1 << DEPTH_QUANT_BITS) - 1  # 4095
+_MM_PER_METRE: float = 1000.0
+_UINT16_MAX: int = 65535
+
+DEFAULT_DEPTH_MIN: float = 0.01
+DEFAULT_DEPTH_MAX: float = 10.0
+DEFAULT_DEPTH_SHIFT: float = 3.5
+DEFAULT_DEPTH_USE_LOG: bool = True
+
+
+def _validate_log_quant_params(depth_min: float, shift: float) -> None:
+    """Ensure ``log(depth_min + shift)`` is finite."""
+    if depth_min + shift <= 0:
+        raise ValueError(
+            f"depth_min + shift must be positive for logarithmic quantization, "
+            f"got depth_min={depth_min} + shift={shift} = {depth_min + shift}"
+        )
+
+
+def _depth_input_to_float32_and_unit(
+    depth: NDArray[np.uint16] | NDArray[np.floating] | torch.Tensor,
+    input_unit: Literal["auto", "m", "mm"],
+) -> tuple[NDArray[np.float32], Literal["m", "mm"]]:
+    """Return depth cast to float32 (values are not rescaled) plus the resolved unit."""
+    if isinstance(depth, torch.Tensor):
+        t = depth.detach().cpu()
+        arr = t.numpy()
+        is_floating = t.is_floating_point()
+    else:
+        arr = np.asarray(depth)
+        is_floating = np.issubdtype(arr.dtype, np.floating)
+
+    resolved_unit: Literal["m", "mm"]
+    if input_unit == "auto":
+        resolved_unit = "m" if is_floating else "mm"
+    else:
+        resolved_unit = input_unit
+
+    # Cast to float32 so the quantization math always runs on a single dtype
+    return np.asarray(arr, dtype=np.float32, order="K"), resolved_unit
+
+
+def quantize_depth(
+    depth: NDArray[np.uint16] | NDArray[np.floating] | torch.Tensor,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    *,
+    input_unit: Literal["auto", "m", "mm"] = "auto",
+) -> NDArray[np.uint16]:
+    """Quantize depth to 12-bit codes (``uint16``, values ``0…DEPTH_QMAX``).
+
+    Depth maps are packed into 12-bit integer frames so they fit in standard
+    high-bit-depth pixel formats (e.g. ``yuv420p12le`` / ``gray12le``)
+    and can be encoded by widely supported video codecs (HEVC Main 12, ffv1).
+    Logarithmic quantization is the default because it allocates more quanta
+    to near-range depth, which matches the (1/depth) error profile of typical
+    depth sensors. Math is ported from BEHAVIOR-1K's ``obs_utils.py``.
+
+    **Input units**:
+
+    - ``input_unit="auto"`` (default): infer from dtype (floating = m, non-floating = mm).
+    - ``input_unit="mm"``: interpret input values as millimetres.
+    - ``input_unit="m"``: interpret input values as metres.
+
+    Quantization math runs in the **resolved input unit**;
+    ``depth_min``, ``depth_max``, and ``shift`` are always given in **metres**
+    and are converted to that unit internally.
+
+    Args:
+        depth: Depth map; ``torch.Tensor`` is moved to CPU for conversion.
+        depth_min: Depth (metres) at quantum ``0``.
+        depth_max: Depth (metres) at quantum :data:`DEPTH_QMAX`.
+        shift: Depth shift (metres); used in log mode. Must satisfy ``depth_min + shift > 0``.
+        use_log: If ``True`` (default), quantize in log space.
+        input_unit: Input unit policy (``"auto"``, ``"mm"``, ``"m"``).
+
+    Returns:
+        ``numpy.ndarray``, ``dtype=uint16``, same shape as ``depth``, values in
+        ``[0, DEPTH_QMAX]``.
+
+    Raises:
+        ValueError: If ``input_unit`` is not ``"auto"``, ``"mm"``, or ``"m"``.
+        ValueError: If ``use_log=True`` and ``depth_min + shift <= 0``.
+    """
+    if input_unit not in ("auto", "m", "mm"):
+        raise ValueError(f"input_unit must be 'auto', 'm', or 'mm', got {input_unit!r}")
+
+    depth_f, resolved_unit = _depth_input_to_float32_and_unit(depth, input_unit=input_unit)
+    depth_min_u = np.float32(depth_min) if resolved_unit == "m" else np.float32(depth_min * _MM_PER_METRE)
+    depth_max_u = np.float32(depth_max) if resolved_unit == "m" else np.float32(depth_max * _MM_PER_METRE)
+    shift_u = np.float32(shift) if resolved_unit == "m" else np.float32(shift * _MM_PER_METRE)
+
+    if use_log:
+        _validate_log_quant_params(depth_min, shift)
+        log_min = math.log(float(depth_min_u + shift_u))
+        log_max = math.log(float(depth_max_u + shift_u))
+        norm = (np.log(depth_f + shift_u) - log_min) / (log_max - log_min)
+    else:
+        norm = (depth_f - depth_min_u) / (depth_max_u - depth_min_u)
+
+    out = np.rint(norm * DEPTH_QMAX).clip(0, DEPTH_QMAX)
+    return out.astype(np.uint16, copy=False)
+
+
+def dequantize_depth(
+    quantized: NDArray[np.uint16] | torch.Tensor,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    *,
+    output_unit: Literal["m", "mm"] = "mm",
+) -> NDArray[np.uint16] | NDArray[np.float32]:
+    """Inverse of :func:`quantize_depth`.
+
+    Tuning arguments **must match** :func:`quantize_depth`.
+
+    Decoding inverts the same normalized code mapping as :func:`quantize_depth`
+    using ``depth_min`` / ``depth_max`` / ``shift`` (in metres), then returns
+    the requested output unit.
+
+    Args:
+        quantized: 12-bit codes ``[0, DEPTH_QMAX]``, ``dtype=uint16``.
+        depth_min, depth_max, shift, use_log: Same as :func:`quantize_depth` (metres).
+        output_unit: ``"mm"`` (default) returns ``uint16`` millimetres (``rint``, clip
+            ``[0, 65535]``). ``"m"`` returns ``float32`` metres in
+            ``[depth_min, depth_max]``.
+
+    Returns:
+        Depth map in the requested unit and dtype.
+
+    Raises:
+        ValueError: If ``use_log=True`` and ``depth_min + shift <= 0``.
+        ValueError: If ``output_unit`` is not ``"m"`` or ``"mm"``.
+    """
+    if output_unit not in ("m", "mm"):
+        raise ValueError(f"output_unit must be 'm' or 'mm', got {output_unit!r}")
+
+    if isinstance(quantized, torch.Tensor):
+        quantized = quantized.detach().cpu().numpy()
+    q = np.asarray(quantized, dtype=np.uint16, order="K")
+    norm = q.astype(np.float32, copy=False) / DEPTH_QMAX
+
+    depth_min_mm = np.float32(depth_min * _MM_PER_METRE)
+    depth_max_mm = np.float32(depth_max * _MM_PER_METRE)
+    shift_mm = np.float32(shift * _MM_PER_METRE)
+
+    if use_log:
+        _validate_log_quant_params(depth_min, shift)
+        log_min = math.log(float(depth_min_mm + shift_mm))
+        log_max = math.log(float(depth_max_mm + shift_mm))
+        depth_mm = np.exp(norm * (log_max - log_min) + log_min) - shift_mm
+    else:
+        depth_mm = norm * (depth_max_mm - depth_min_mm) + depth_min_mm
+
+    depth_mm = np.clip(depth_mm, depth_min_mm, depth_max_mm).astype(np.float32, copy=False)
+    if output_unit == "m":
+        return (depth_mm / np.float32(_MM_PER_METRE)).astype(np.float32, copy=False)
+    mm = np.rint(depth_mm).clip(0, _UINT16_MAX)
+    return mm.astype(np.uint16, copy=False)
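
Review note on the log mapping above, as a minimal scalar sketch rather than the patch's implementation. It restates the same formulas in plain Python for depths in metres, to show that the step between adjacent codes grows with depth, so near-range depth gets finer resolution. The names `quantize_log` and `dequantize_log` are hypothetical; the defaults are copied from the `DEFAULT_DEPTH_*` constants in the diff. The sketch assumes `depth_m + shift > 0`.

```python
import math

QMAX = (1 << 12) - 1  # 4095, same value as DEPTH_QMAX in the diff


def quantize_log(depth_m: float, depth_min: float = 0.01,
                 depth_max: float = 10.0, shift: float = 3.5) -> int:
    """Map a depth in metres to a 12-bit code using the log mapping."""
    log_min = math.log(depth_min + shift)
    log_max = math.log(depth_max + shift)
    norm = (math.log(depth_m + shift) - log_min) / (log_max - log_min)
    # round() rounds half to even, like np.rint; then clip to [0, QMAX].
    return min(max(round(norm * QMAX), 0), QMAX)


def dequantize_log(code: int, depth_min: float = 0.01,
                   depth_max: float = 10.0, shift: float = 3.5) -> float:
    """Invert quantize_log; the parameters must match the ones used to encode."""
    log_min = math.log(depth_min + shift)
    log_max = math.log(depth_max + shift)
    depth = math.exp(code / QMAX * (log_max - log_min) + log_min) - shift
    return min(max(depth, depth_min), depth_max)


if __name__ == "__main__":
    for d in (0.5, 1.0, 5.0, 9.0):
        code = quantize_log(d)
        step = dequantize_log(code + 1) - dequantize_log(code)
        print(f"depth={d} m  code={code}  step to next code={step * 1000:.2f} mm")
```

Depths outside `[depth_min, depth_max]` saturate at code `0` or `QMAX`, so the range should be chosen to cover the sensor's working distance.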
@@ -19,6 +19,7 @@ from pprint import pformat
 import torch

 from lerobot.configs import PreTrainedConfig
+from lerobot.configs.rewards import RewardModelConfig
 from lerobot.configs.train import TrainPipelineConfig
 from lerobot.transforms import ImageTransforms
 from lerobot.utils.constants import ACTION, IMAGENET_STATS, OBS_PREFIX, REWARD
@@ -30,12 +31,14 @@ from .streaming_dataset import StreamingLeRobotDataset


 def resolve_delta_timestamps(
-    cfg: PreTrainedConfig, ds_meta: LeRobotDatasetMetadata
+    cfg: PreTrainedConfig | RewardModelConfig, ds_meta: LeRobotDatasetMetadata
 ) -> dict[str, list] | None:
-    """Resolves delta_timestamps by reading from the 'delta_indices' properties of the PreTrainedConfig.
+    """Resolves delta_timestamps by reading from the 'delta_indices' properties of the config.

    Args:
-        cfg (PreTrainedConfig): The PreTrainedConfig to read delta_indices from.
+        cfg (PreTrainedConfig | RewardModelConfig): The config to read delta_indices from. Both
+            ``PreTrainedConfig`` and concrete ``RewardModelConfig`` subclasses expose the
+            ``{observation,action,reward}_delta_indices`` properties used below.
        ds_meta (LeRobotDatasetMetadata): The dataset from which features and fps are used to build
            delta_timestamps against.

@@ -82,7 +85,7 @@ def make_dataset(cfg: TrainPipelineConfig) -> LeRobotDataset | MultiLeRobotDatas
        ds_meta = LeRobotDatasetMetadata(
            cfg.dataset.repo_id, root=cfg.dataset.root, revision=cfg.dataset.revision
        )
-        delta_timestamps = resolve_delta_timestamps(cfg.policy, ds_meta)
+        delta_timestamps = resolve_delta_timestamps(cfg.trainable_config, ds_meta)
        if not cfg.dataset.streaming:
            dataset = LeRobotDataset(
                cfg.dataset.repo_id,
@@ -28,6 +28,7 @@ from .utils import (
    DEFAULT_DATA_PATH,
    DEFAULT_VIDEO_FILE_SIZE_IN_MB,
    DEFAULT_VIDEO_PATH,
+    DatasetInfo,
 )


@@ -78,8 +79,8 @@ def create_empty_dataset_info(
    chunks_size: int | None = None,
    data_files_size_in_mb: int | None = None,
    video_files_size_in_mb: int | None = None,
-) -> dict:
-    """Create a template dictionary for a new dataset's `info.json`.
+) -> DatasetInfo:
+    """Create a template ``DatasetInfo`` object for a new dataset's ``meta/info.json``.

    Args:
        codebase_version (str): The version of the LeRobot codebase.
@@ -87,25 +88,24 @@ def create_empty_dataset_info(
        features (dict): The LeRobot features dictionary for the dataset.
        use_videos (bool): Whether the dataset will store videos.
        robot_type (str | None): The type of robot used, if any.
+        chunks_size (int | None): Max files per chunk directory. Defaults to ``DEFAULT_CHUNK_SIZE``.
+        data_files_size_in_mb (int | None): Max parquet file size in MB. Defaults to ``DEFAULT_DATA_FILE_SIZE_IN_MB``.
+        video_files_size_in_mb (int | None): Max video file size in MB. Defaults to ``DEFAULT_VIDEO_FILE_SIZE_IN_MB``.

    Returns:
-        dict: A dictionary with the initial dataset metadata.
+        DatasetInfo: A typed dataset information object with initial metadata.
    """
-    return {
-        "codebase_version": codebase_version,
-        "robot_type": robot_type,
-        "total_episodes": 0,
-        "total_frames": 0,
-        "total_tasks": 0,
-        "chunks_size": chunks_size or DEFAULT_CHUNK_SIZE,
-        "data_files_size_in_mb": data_files_size_in_mb or DEFAULT_DATA_FILE_SIZE_IN_MB,
-        "video_files_size_in_mb": video_files_size_in_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB,
-        "fps": fps,
-        "splits": {},
-        "data_path": DEFAULT_DATA_PATH,
-        "video_path": DEFAULT_VIDEO_PATH if use_videos else None,
-        "features": features,
-    }
+    return DatasetInfo(
+        codebase_version=codebase_version,
+        fps=fps,
+        features=features,
+        robot_type=robot_type,
+        chunks_size=chunks_size or DEFAULT_CHUNK_SIZE,
+        data_files_size_in_mb=data_files_size_in_mb or DEFAULT_DATA_FILE_SIZE_IN_MB,
+        video_files_size_in_mb=video_files_size_in_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB,
+        data_path=DEFAULT_DATA_PATH,
+        video_path=DEFAULT_VIDEO_PATH if use_videos else None,
+    )


 def check_delta_timestamps(
@@ -294,10 +294,20 @@ def validate_feature_image_or_video(
    # Note: The check of pixels range ([0,1] for float and [0,255] for uint8) is done by the image writer threads.
    error_message = ""
    if isinstance(value, np.ndarray):
-        actual_shape = value.shape
-        c, h, w = expected_shape
-        if len(actual_shape) != 3 or (actual_shape != (c, h, w) and actual_shape != (h, w, c)):
-            error_message += f"The feature '{name}' of shape '{actual_shape}' does not have the expected shape '{(c, h, w)}' or '{(h, w, c)}'.\n"
+        actual_shape = tuple(value.shape)
+        expected = tuple(expected_shape)
+        if len(expected) == 2:
+            # Single-channel features (e.g. depth maps) — accept (H,W), (1,H,W), (H,W,1)
+            h, w = expected
+            valid = actual_shape in {(h, w), (1, h, w), (h, w, 1)}
+            if not valid:
+                error_message += f"The feature '{name}' of shape '{actual_shape}' does not have the expected shape '{(h, w)}', '{(1, h, w)}', or '{(h, w, 1)}'.\n"
+        elif len(expected) == 3:
+            c, h, w = expected
+            if len(actual_shape) != 3 or (actual_shape != (c, h, w) and actual_shape != (h, w, c)):
+                error_message += f"The feature '{name}' of shape '{actual_shape}' does not have the expected shape '{(c, h, w)}' or '{(h, w, c)}'.\n"
+        else:
+            error_message += f"The feature '{name}' has an unsupported expected_shape '{expected}'.\n"
    elif isinstance(value, PILImage.Image):
        pass
    else:
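
Review note on the shape check above: the accepted layouts can be summarised as a small lookup. This is a sketch for illustration only; `accepted_array_shapes` is a hypothetical helper, not a function from the patch.

```python
def accepted_array_shapes(expected_shape):
    """Return the set of ndarray shapes accepted for a given expected_shape."""
    expected = tuple(expected_shape)
    if len(expected) == 2:
        # Single-channel feature such as a depth map.
        h, w = expected
        return {(h, w), (1, h, w), (h, w, 1)}
    if len(expected) == 3:
        # Multi-channel image, channels-first or channels-last.
        c, h, w = expected
        return {(c, h, w), (h, w, c)}
    raise ValueError(f"unsupported expected_shape {expected}")


if __name__ == "__main__":
    print(sorted(accepted_array_shapes((480, 640))))
    print(sorted(accepted_array_shapes((3, 480, 640))))
```

Note that a feature declared with a 3-element `expected_shape` still rejects a bare `(H, W)` array, so depth features need the 2-element form to take the single-channel path.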
@@ -41,15 +41,56 @@ def safe_stop_image_writer(func):
    return wrapper


-def image_array_to_pil_image(image_array: np.ndarray, range_check: bool = True) -> PIL.Image.Image:
-    # TODO(aliberts): handle 1 channel and 4 for depth images
-    if image_array.ndim != 3:
-        raise ValueError(f"The array has {image_array.ndim} dimensions, but 3 is expected for an image.")
+# Single-channel dtypes that PIL natively maps to the matching mode
+# (``uint8`` → ``L``, ``uint16`` → ``I;16``, ``float32`` → ``F``).
+GRAYSCALE_DTYPES: tuple[np.dtype, ...] = (
+    np.dtype("uint8"),
+    np.dtype("uint16"),
+    np.dtype("float32"),
+)

+
+def image_array_to_pil_image(image_array: np.ndarray, range_check: bool = True) -> PIL.Image.Image:
+    """Convert a NumPy array to a PIL Image, preserving precision for grayscale.
+
+    Behaviour by shape:
+
+    - ``(H, W)``, ``(1, H, W)``, or ``(H, W, 1)``: single-channel grayscale.
+      The native dtype is preserved via the matching PIL mode (``L`` / ``I;16`` / ``F``).
+      This is the path used for raw depth maps: no rescaling, clamping, or downcasting.
+    - ``(3, H, W)`` / ``(H, W, 3)``: RGB. Channels-first inputs are transposed
+      to channels-last. Float inputs in ``[0, 1]`` are scaled to ``uint8``
+      (existing behaviour, gated by ``range_check``).
+
+    Other shapes / channel counts raise ``NotImplementedError`` or
+    ``ValueError``.
+    """
+    if image_array.ndim not in (2, 3):
+        raise ValueError(
+            f"The array has {image_array.ndim} dimensions, but 2 or 3 is expected for an image."
+        )
+
+    # Squeeze 3D single-channel inputs to 2D so depth maps work whether the
+    # caller emits (H, W), (1, H, W), or (H, W, 1).
+    if image_array.ndim == 3:
+        if image_array.shape[0] == 1:
+            image_array = image_array[0]
+        elif image_array.shape[-1] == 1:
+            image_array = image_array[..., 0]
+
+    if image_array.ndim == 2:
+        if image_array.dtype not in GRAYSCALE_DTYPES:
+            raise ValueError(
+                f"Unsupported single-channel image dtype: {image_array.dtype}. "
+                f"Supported dtypes: {sorted(str(d) for d in GRAYSCALE_DTYPES)}."
+            )
+
+        return PIL.Image.fromarray(np.ascontiguousarray(image_array))
+
+    # 3D path: must be RGB (3 channels), channels-first or channels-last.
    if image_array.shape[0] == 3:
        # Transpose from pytorch convention (C, H, W) to (H, W, C)
        image_array = image_array.transpose(1, 2, 0)
-
    elif image_array.shape[-1] != 3:
        raise NotImplementedError(
            f"The image has {image_array.shape[-1]} channels, but 3 is required for now."
@@ -71,13 +112,28 @@ def image_array_to_pil_image(image_array: np.ndarray, range_check: bool = True)
    return PIL.Image.fromarray(image_array)


+def save_kwargs_for_path(fpath: Path, compress_level: int) -> dict:
+    """Pick the right format-specific kwargs for :meth:`PIL.Image.Image.save`.
+
+    PNG takes ``compress_level`` (0–9, zlib); TIFF takes ``compression="raw"`` (uncompressed, hence lossless); other suffixes get none.
+    """
+    suffix = Path(fpath).suffix.lower()
+    if suffix == ".png":
+        return {"compress_level": compress_level}
+    if suffix in (".tif", ".tiff"):
+        return {"compression": "raw"}
+    return {}
+
+
 def write_image(image: np.ndarray | PIL.Image.Image, fpath: Path, compress_level: int = 1):
    """
    Saves a NumPy array or PIL Image to a file.

    This function handles both NumPy arrays and PIL Image objects, converting
    the former to a PIL Image before saving. It includes error handling for
-    the save operation.
+    the save operation. The output format is inferred from the *fpath*
+    extension: ``.png`` → PNG with ``compress_level``; ``.tiff`` / ``.tif``
+    → uncompressed TIFF, used for lossless raw depth maps.

    Args:
        image (np.ndarray | PIL.Image.Image): The image data to save.
@@ -101,7 +157,7 @@ def write_image(image: np.ndarray | PIL.Image.Image, fpath: Path, compress_level
            img = image
        else:
            raise TypeError(f"Unsupported image type: {type(image)}")
-        img.save(fpath, compress_level=compress_level)
+        img.save(fpath, **save_kwargs_for_path(Path(fpath), compress_level))
    except Exception as e:
        logger.error("Error writing image %s: %s", fpath, e)

@@ -39,6 +39,7 @@ from .utils import (
    EPISODES_DIR,
    INFO_PATH,
    STATS_PATH,
+    DatasetInfo,
    serialize_dict,
 )

@@ -115,25 +116,21 @@ def embed_images(dataset: datasets.Dataset) -> datasets.Dataset:
    return dataset


-def write_info(info: dict, local_dir: Path) -> None:
-    write_json(info, local_dir / INFO_PATH)
+def write_info(info: DatasetInfo, local_dir: Path) -> None:
+    write_json(info.to_dict(), local_dir / INFO_PATH)


-def load_info(local_dir: Path) -> dict:
+def load_info(local_dir: Path) -> DatasetInfo:
    """Load dataset info metadata from its standard file path.

-    Also converts shape lists to tuples for consistency.
-
    Args:
        local_dir (Path): The root directory of the dataset.

    Returns:
-        dict: The dataset information dictionary.
+        DatasetInfo: The typed dataset information object.
    """
-    info = load_json(local_dir / INFO_PATH)
-    for ft in info["features"].values():
-        ft["shape"] = tuple(ft["shape"])
-    return info
+    raw = load_json(local_dir / INFO_PATH)
+    return DatasetInfo.from_dict(raw)


 def write_stats(stats: dict, local_dir: Path) -> None:
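
Review note on the `write_info` / `load_info` change above: the diff does not show `DatasetInfo` itself, so the following is a hedged sketch of the typed round trip using a hypothetical stand-in class, `InfoSketch`. The shape-to-tuple conversion inside `from_dict` is an assumption, based on the lines removed from `load_info`; the real class may differ. The version string and feature values are placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InfoSketch:
    """Hypothetical stand-in for DatasetInfo (not the real class)."""

    codebase_version: str
    fps: int
    features: dict
    total_episodes: int = 0

    def to_dict(self) -> dict:
        # Plain dict, ready for JSON serialization.
        return asdict(self)

    @classmethod
    def from_dict(cls, raw: dict) -> "InfoSketch":
        # JSON turns tuples into lists; restore tuple shapes on load
        # (assumed to mirror what load_info used to do inline).
        features = {
            name: {**ft, "shape": tuple(ft["shape"])}
            for name, ft in raw["features"].items()
        }
        return cls(**{**raw, "features": features})


if __name__ == "__main__":
    info = InfoSketch(
        codebase_version="v0.0-placeholder",
        fps=30,
        features={"observation.depth": {"dtype": "video", "shape": (480, 640)}},
    )
    raw = json.loads(json.dumps(info.to_dict()))
    restored = InfoSketch.from_dict(raw)
    print(restored == info)
```

Keeping the conversion inside the typed object is what lets callers switch from `info["fps"]` to `info.fps`, as the `MultiLeRobotDataset` hunk does.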
@@ -35,9 +35,11 @@ from .utils import (
    is_valid_version,
 )
 from .video_utils import (
+    DepthEncoderConfig,
    StreamingVideoEncoder,
-    get_safe_default_codec,
-    resolve_vcodec,
+    VideoEncoderConfig,
+    get_safe_default_video_backend,
+    seed_depth_feature_info,
 )

 logger = logging.getLogger(__name__)
@@ -58,10 +60,11 @@ class LeRobotDataset(torch.utils.data.Dataset):
        video_backend: str | None = None,
        return_uint8: bool = False,
        batch_encoding_size: int = 1,
-        vcodec: str = "libsvtav1",
+        camera_encoder_config: VideoEncoderConfig | None = None,
+        depth_encoder_config: DepthEncoderConfig | None = None,
+        encoder_threads: int | None = None,
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
-        encoder_threads: int | None = None,
    ):
        """
        2 modes are available for instantiating this class, depending on 2 different use cases:
@@ -177,16 +180,15 @@ class LeRobotDataset(torch.utils.data.Dataset):
                You can also use the 'pyav' decoder used by Torchvision, which used to be the default option, or 'video_reader' which is another decoder of Torchvision.
            batch_encoding_size (int, optional): Number of episodes to accumulate before batch encoding videos.
                Set to 1 for immediate encoding (default), or higher for batched encoding. Defaults to 1.
-            vcodec (str, optional): Video codec for encoding videos during recording. Options: 'h264', 'hevc',
-                'libsvtav1', 'auto', or hardware-specific codecs like 'h264_videotoolbox', 'h264_nvenc'.
-                Defaults to 'libsvtav1'. Use 'auto' to auto-detect the best available hardware encoder.
+            camera_encoder_config (VideoEncoderConfig | None, optional): Video encoder settings for cameras
+                (codec, quality, etc.). Defaults to
+                :class:`~lerobot.datasets.video_utils.VideoEncoderConfig` defaults when ``None``.
+            encoder_threads (int | None, optional): Number of encoder threads (global). ``None`` lets the
+                codec decide.
            streaming_encoding (bool, optional): If True, encode video frames in real-time during capture
                instead of writing PNG images first. This makes save_episode() near-instant. Defaults to False.
            encoder_queue_maxsize (int, optional): Maximum number of frames to buffer per camera when using
                streaming encoding. Defaults to 30 (~1s at 30fps).
-            encoder_threads (int | None, optional): Number of threads per encoder instance. None lets the
-                codec auto-detect (default). Lower values reduce CPU usage per encoder. Maps to 'lp' (via svtav1-params) for
-                libsvtav1 and 'threads' for h264/hevc.

        Note:
            Write-mode parameters (``streaming_encoding``, ``batch_encoding_size``) passed to
@@ -202,10 +204,13 @@ class LeRobotDataset(torch.utils.data.Dataset):
        self.episodes = episodes
        self.tolerance_s = tolerance_s
        self.revision = revision if revision else CODEBASE_VERSION
-        self._video_backend = video_backend if video_backend else get_safe_default_codec()
+        self._video_backend = video_backend if video_backend else get_safe_default_video_backend()
        self._return_uint8 = return_uint8
        self._batch_encoding_size = batch_encoding_size
-        self._vcodec = resolve_vcodec(vcodec)
+        if camera_encoder_config is None:
+            camera_encoder_config = VideoEncoderConfig()
+        self._camera_encoder_config = camera_encoder_config
+        self._depth_encoder_config = depth_encoder_config
        self._encoder_threads = encoder_threads

        if self._requested_root is not None:
@@ -248,16 +253,23 @@ class LeRobotDataset(torch.utils.data.Dataset):
                DeprecationWarning,
                stacklevel=2,
            )
+            seed_depth_feature_info(self.meta.features, self._depth_encoder_config)
            streaming_enc = None
            if streaming_encoding and len(self.meta.video_keys) > 0:
                streaming_enc = self._build_streaming_encoder(
-                    self.meta.fps, self._vcodec, encoder_queue_maxsize, encoder_threads
+                    self.meta.fps,
+                    self._camera_encoder_config,
+                    self._encoder_threads,
+                    encoder_queue_maxsize,
+                    depth_encoder_config=self._depth_encoder_config,
+                    depth_keys=self.meta.depth_keys,
                )
            self.writer = DatasetWriter(
                meta=self.meta,
                root=self.root,
-                vcodec=self._vcodec,
-                encoder_threads=encoder_threads,
+                camera_encoder_config=self._camera_encoder_config,
+                depth_encoder_config=self._depth_encoder_config,
+                encoder_threads=self._encoder_threads,
                batch_encoding_size=batch_encoding_size,
                streaming_encoder=streaming_enc,
                initial_frames=self.meta.total_frames,
@@ -298,19 +310,20 @@ class LeRobotDataset(torch.utils.data.Dataset):
    @staticmethod
    def _build_streaming_encoder(
        fps: int,
-        vcodec: str,
-        encoder_queue_maxsize: int,
+        camera_encoder_config: VideoEncoderConfig,
        encoder_threads: int | None,
+        encoder_queue_maxsize: int,
+        *,
+        depth_encoder_config: DepthEncoderConfig | None = None,
+        depth_keys: list[str] | None = None,
    ) -> StreamingVideoEncoder:
        return StreamingVideoEncoder(
            fps=fps,
-            vcodec=vcodec,
-            pix_fmt="yuv420p",
-            g=2,
-            crf=30,
-            preset=None,
-            queue_maxsize=encoder_queue_maxsize,
+            camera_encoder_config=camera_encoder_config,
            encoder_threads=encoder_threads,
+            queue_maxsize=encoder_queue_maxsize,
+            depth_encoder_config=depth_encoder_config,
+            depth_keys=depth_keys,
        )

    # ── Metadata properties ───────────────────────────────────────────
@@ -625,11 +638,14 @@ class LeRobotDataset(torch.utils.data.Dataset):
        image_writer_threads: int = 0,
        video_backend: str | None = None,
        batch_encoding_size: int = 1,
-        vcodec: str = "libsvtav1",
+        camera_encoder_config: VideoEncoderConfig | None = None,
+        depth_encoder_config: DepthEncoderConfig | None = None,
        metadata_buffer_size: int = 10,
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
        encoder_threads: int | None = None,
+        video_files_size_in_mb: int | None = None,
+        data_files_size_in_mb: int | None = None,
    ) -> "LeRobotDataset":
        """Create a new LeRobotDataset from scratch for recording data.

@@ -654,20 +670,23 @@ class LeRobotDataset(torch.utils.data.Dataset):
            video_backend: Video decoding backend (used when reading back).
            batch_encoding_size: Number of episodes to accumulate before
                batch-encoding videos. ``1`` means encode immediately.
-            vcodec: Video codec for encoding. Options include ``'libsvtav1'``,
-                ``'h264'``, ``'hevc'``, ``'auto'``.
+            camera_encoder_config: Video encoder settings for cameras; defaults
+                match :class:`~lerobot.datasets.video_utils.VideoEncoderConfig`
+                when ``None``.
+            encoder_threads: Number of encoder threads (global). ``None``
+                lets the codec decide.
            metadata_buffer_size: Number of episode metadata records to buffer
                before flushing to parquet.
            streaming_encoding: If ``True``, encode video frames in real-time
                during capture instead of writing images first.
            encoder_queue_maxsize: Max buffered frames per camera when using
                streaming encoding.
-            encoder_threads: Threads per encoder instance. ``None`` for auto.

        Returns:
            A new :class:`LeRobotDataset` in write mode.
        """
-        vcodec = resolve_vcodec(vcodec)
+        if camera_encoder_config is None:
+            camera_encoder_config = VideoEncoderConfig()
        obj = cls.__new__(cls)
        obj.meta = LeRobotDatasetMetadata.create(
            repo_id=repo_id,
@@ -677,6 +696,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
            root=root,
            use_videos=use_videos,
            metadata_buffer_size=metadata_buffer_size,
+            video_files_size_in_mb=video_files_size_in_mb,
+            data_files_size_in_mb=data_files_size_in_mb,
        )
        obj.repo_id = obj.meta.repo_id
        obj._requested_root = obj.meta.root
@@ -686,23 +707,32 @@ class LeRobotDataset(torch.utils.data.Dataset):
        obj.image_transforms = None
        obj.delta_timestamps = None
        obj.episodes = None
-        obj._video_backend = video_backend if video_backend is not None else get_safe_default_codec()
+        obj._video_backend = video_backend if video_backend is not None else get_safe_default_video_backend()
        obj._return_uint8 = False
        obj._batch_encoding_size = batch_encoding_size
-        obj._vcodec = vcodec
+        obj._camera_encoder_config = camera_encoder_config
+        obj._depth_encoder_config = depth_encoder_config
        obj._encoder_threads = encoder_threads
+        seed_depth_feature_info(obj.meta.features, depth_encoder_config)

        # Reader is lazily created on first access (write-only mode)
        obj.reader = None

-        # Create writer
        streaming_enc = None
        if streaming_encoding and len(obj.meta.video_keys) > 0:
-            streaming_enc = cls._build_streaming_encoder(fps, vcodec, encoder_queue_maxsize, encoder_threads)
+            streaming_enc = cls._build_streaming_encoder(
+                fps,
+                camera_encoder_config,
+                encoder_threads,
+                encoder_queue_maxsize,
+                depth_encoder_config=depth_encoder_config,
+                depth_keys=obj.meta.depth_keys,
+            )
        obj.writer = DatasetWriter(
            meta=obj.meta,
            root=obj.root,
-            vcodec=vcodec,
+            camera_encoder_config=camera_encoder_config,
+            depth_encoder_config=depth_encoder_config,
            encoder_threads=encoder_threads,
            batch_encoding_size=batch_encoding_size,
            streaming_encoder=streaming_enc,
@@ -725,12 +755,13 @@ class LeRobotDataset(torch.utils.data.Dataset):
        force_cache_sync: bool = False,
        video_backend: str | None = None,
        batch_encoding_size: int = 1,
-        vcodec: str = "libsvtav1",
+        camera_encoder_config: VideoEncoderConfig | None = None,
+        depth_encoder_config: DepthEncoderConfig | None = None,
+        encoder_threads: int | None = None,
        image_writer_processes: int = 0,
        image_writer_threads: int = 0,
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
-        encoder_threads: int | None = None,
    ) -> "LeRobotDataset":
        """Resume recording on an existing dataset.

@@ -753,13 +784,16 @@ class LeRobotDataset(torch.utils.data.Dataset):
            video_backend: Video decoding backend for reading back data.
            batch_encoding_size: Number of episodes to accumulate before
                batch-encoding videos.
-            vcodec: Video codec for encoding.
+            camera_encoder_config: Video encoder settings for cameras; defaults
+                match :class:`~lerobot.datasets.video_utils.VideoEncoderConfig`
+                when ``None``.
+            encoder_threads: Number of encoder threads (global). ``None``
+                lets the codec decide.
            image_writer_processes: Subprocesses for async image writing.
            image_writer_threads: Threads for async image writing.
            streaming_encoding: If ``True``, encode video in real-time during
                capture.
            encoder_queue_maxsize: Max buffered frames per camera for streaming.
-            encoder_threads: Threads per encoder instance. ``None`` for auto.

        Returns:
            A :class:`LeRobotDataset` in write mode, ready to append episodes.
@@ -770,7 +804,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
                "Writing into the revision-safe Hub snapshot cache (used when root=None) would corrupt "
                "the shared cache. Please provide a local directory path."
            )
-        vcodec = resolve_vcodec(vcodec)
        obj = cls.__new__(cls)
        obj.repo_id = repo_id
        obj._requested_root = Path(root)
@@ -779,11 +812,9 @@ class LeRobotDataset(torch.utils.data.Dataset):
        obj.image_transforms = None
        obj.delta_timestamps = None
        obj.episodes = None
-        obj._video_backend = video_backend if video_backend else get_safe_default_codec()
+        obj._video_backend = video_backend if video_backend else get_safe_default_video_backend()
        obj._return_uint8 = False
        obj._batch_encoding_size = batch_encoding_size
-        obj._vcodec = vcodec
-        obj._encoder_threads = encoder_threads

        if obj._requested_root is not None:
            obj._requested_root.mkdir(exist_ok=True, parents=True)
@@ -792,21 +823,33 @@ class LeRobotDataset(torch.utils.data.Dataset):
        obj.meta = LeRobotDatasetMetadata(
            obj.repo_id, obj._requested_root, obj.revision, force_cache_sync=force_cache_sync
        )
+
+        if camera_encoder_config is None:
+            camera_encoder_config = VideoEncoderConfig()
+        obj._camera_encoder_config = camera_encoder_config
+        obj._depth_encoder_config = depth_encoder_config
+        obj._encoder_threads = encoder_threads
        obj.root = obj.meta.root
+        seed_depth_feature_info(obj.meta.features, depth_encoder_config)

        # Reader is lazily created on first access (write-only mode)
        obj.reader = None

-        # Create writer for appending
        streaming_enc = None
        if streaming_encoding and len(obj.meta.video_keys) > 0:
            streaming_enc = cls._build_streaming_encoder(
-                obj.meta.fps, vcodec, encoder_queue_maxsize, encoder_threads
+                obj.meta.fps,
+                camera_encoder_config,
+                encoder_threads,
+                encoder_queue_maxsize,
+                depth_encoder_config=depth_encoder_config,
+                depth_keys=obj.meta.depth_keys,
            )
        obj.writer = DatasetWriter(
            meta=obj.meta,
            root=obj.root,
-            vcodec=vcodec,
+            camera_encoder_config=camera_encoder_config,
+            depth_encoder_config=depth_encoder_config,
            encoder_threads=encoder_threads,
            batch_encoding_size=batch_encoding_size,
            streaming_encoder=streaming_enc,
@@ -123,7 +123,7 @@ class MultiLeRobotDataset(torch.utils.data.Dataset):

        NOTE: For now, this relies on a check in __init__ to make sure all sub-datasets have the same info.
        """
-        return self._datasets[0].meta.info["fps"]
+        return self._datasets[0].meta.info.fps

    @property
    def video(self) -> bool:
@@ -133,7 +133,7 @@ class MultiLeRobotDataset(torch.utils.data.Dataset):

        NOTE: For now, this relies on a check in __init__ to make sure all sub-datasets have the same info.
        """
-        return self._datasets[0].meta.info.get("video", False)
+        return len(self._datasets[0].meta.video_keys) > 0

    @property
    def features(self) -> datasets.Features:
@@ -0,0 +1,311 @@
+#!/usr/bin/env python
+
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""PyAV-based compatibility checks for :class:`VideoEncoderConfig`.
+
+Centralises all :mod:`av` introspection of the bundled FFmpeg build.
+Checks degrade to a no-op when the target codec isn't available locally.
+"""
+
+from __future__ import annotations
+
+import functools
+import logging
+from typing import TYPE_CHECKING, Any, Literal
+
+import av
+import numpy as np
+import torch
+
+from lerobot.datasets.depth_utils import (
+    DEFAULT_DEPTH_MAX,
+    DEFAULT_DEPTH_MIN,
+    DEFAULT_DEPTH_SHIFT,
+    DEFAULT_DEPTH_USE_LOG,
+    quantize_depth,
+    dequantize_depth,
+)
+
+if TYPE_CHECKING:
+    from lerobot.datasets.video_utils import VideoEncoderConfig
+
+logger = logging.getLogger(__name__)
+
+# Pixel formats supported by the depth encode/decode helpers below. Both are
+# 16-bit-word formats that carry 12 significant bits per sample, matching the
+# ``DEPTH_QMAX = 4095`` quantization range.
+DEPTH_PIX_FMTS: tuple[str, ...] = ("yuv420p12le", "gray12le")
+
+# Neutral chroma for 12-bit YUV (the midpoint of [0, 4095]). Filling the U/V
+# planes with this value keeps the encoder from spending bits on chroma noise
+# when only the Y plane carries information.
+_NEUTRAL_CHROMA_12BIT: int = 2048
+
+FFMPEG_NUMERIC_OPTION_TYPES = ("INT", "INT64", "UINT64", "FLOAT", "DOUBLE")
+FFMPEG_INTEGER_OPTION_TYPES = ("INT", "INT64", "UINT64")
+
+
+def _write_u16_plane(plane: av.video.plane.VideoPlane, src: np.ndarray, fill_value: int | None = None) -> None:
+    """Copy ``src`` into a uint16 plane respecting FFmpeg line padding."""
+    height, width = src.shape
+    stride_u16 = plane.line_size // np.dtype(np.uint16).itemsize
+    dst = np.frombuffer(plane, dtype=np.uint16).reshape(height, stride_u16)
+    if fill_value is not None:
+        dst.fill(fill_value)
+    dst[:, :width] = src
+
+
+def encode_depth_frame_pyav(
+    depth: np.ndarray | torch.Tensor,
+    *,
+    pix_fmt: str = "yuv420p12le",
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    input_unit: Literal["auto", "m", "mm"] = "auto",
+) -> av.VideoFrame:
+    """Quantize depth and pack it into a 12-bit PyAV video frame.
+
+    Args:
+        depth: Depth frame to encode (H, W). Unit handling follows
+            :func:`lerobot.datasets.depth_utils.quantize_depth`.
+        pix_fmt: Target pixel format. Must be one of :data:`DEPTH_PIX_FMTS`.
+        depth_min, depth_max, shift, use_log, input_unit: Forwarded to
+            :func:`quantize_depth`.
+
+    Returns:
+        An :class:`av.VideoFrame` in ``pix_fmt`` with quantized depth in the
+        luminance plane.
+    """
+    if pix_fmt not in DEPTH_PIX_FMTS:
+        raise ValueError(f"Unsupported depth pix_fmt={pix_fmt!r}; expected one of {DEPTH_PIX_FMTS}")
+
+    quantized_depth = quantize_depth(
+        depth,
+        depth_min=depth_min,
+        depth_max=depth_max,
+        shift=shift,
+        use_log=use_log,
+        input_unit=input_unit,
+    )
+    if quantized_depth.ndim != 2:
+        raise ValueError(f"depth must be a 2D frame; got shape {quantized_depth.shape}")
+
+    quantized_depth = np.ascontiguousarray(quantized_depth, dtype=np.uint16)
+    height, width = quantized_depth.shape
+
+    if pix_fmt == "gray12le":
+        frame = av.VideoFrame(width=width, height=height, format="gray12le")
+        _write_u16_plane(frame.planes[0], quantized_depth)
+        return frame
+
+    if height % 2 != 0 or width % 2 != 0:
+        raise ValueError("yuv420p12le requires even H and W")
+
+    frame = av.VideoFrame(width=width, height=height, format="yuv420p12le")
+    _write_u16_plane(frame.planes[0], quantized_depth)
+    neutral_chroma = np.full((height // 2, width // 2), _NEUTRAL_CHROMA_12BIT, dtype=np.uint16)
+    _write_u16_plane(frame.planes[1], neutral_chroma, fill_value=_NEUTRAL_CHROMA_12BIT)
+    _write_u16_plane(frame.planes[2], neutral_chroma, fill_value=_NEUTRAL_CHROMA_12BIT)
+    return frame
+
+
+def decode_depth_frame_pyav(
+    frame: av.VideoFrame | list[av.VideoFrame],
+    *,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    return_quantized: bool = False,
+    output_unit: Literal["m", "mm"] = "m",
+) -> np.ndarray:
+    """Decode one or many depth video frames to quantized or metric depth.
+
+    Args:
+        frame: A single depth frame or a list of depth frames.
+        depth_min, depth_max, shift, use_log: Forwarded to
+            :func:`dequantize_depth`.
+        return_quantized: If ``True``, return raw 12-bit quanta as ``uint16``.
+        output_unit: Unit for dequantized output (``"m"`` or ``"mm"``).
+
+    Returns:
+        ``(H, W)`` array for a single frame, or ``(N, H, W)`` for a list.
+    """
+    frames = frame if isinstance(frame, list) else [frame]
+    quantized = np.stack([f.reformat(format="gray12le").to_ndarray() for f in frames]).astype(np.uint16, copy=False)
+    if return_quantized:
+        return quantized[0] if len(frames) == 1 else quantized
+
+    decoded = dequantize_depth(
+        quantized,
+        depth_min=depth_min,
+        depth_max=depth_max,
+        shift=shift,
+        use_log=use_log,
+        output_unit=output_unit,
+    )
+    return decoded[0] if len(frames) == 1 else decoded
+
+
+@functools.cache
+def get_codec(vcodec: str) -> av.codec.Codec | None:
+    """PyAV write-mode ``Codec`` for *vcodec*, or ``None`` if unavailable."""
+    try:
+        return av.codec.Codec(vcodec, "w")
+    except Exception:
+        return None
+
+
+@functools.cache
+def _get_codec_options_by_name(vcodec: str) -> dict[str, av.option.Option]:
+    """Private-option name → PyAV ``Option`` for *vcodec* (empty if unavailable)."""
+    codec = get_codec(vcodec)
+    if codec is None:
+        return {}
+    return {opt.name: opt for opt in codec.descriptor.options}
+
+
+@functools.cache
+def _get_codec_video_formats(vcodec: str) -> tuple[str, ...]:
+    """Pixel formats accepted by *vcodec* in PyAV's preferred order (empty if unknown)."""
+    codec = get_codec(vcodec)
+    if codec is None:
+        return ()
+    return tuple(fmt.name for fmt in (codec.video_formats or []))
+
+
+def detect_available_encoders_pyav(encoders: list[str] | str) -> list[str]:
+    """Return the subset of *encoders* available as video encoders in the local FFmpeg build.
+
+    Each name is probed directly via :func:`get_codec`; input order is preserved.
+    """
+    if isinstance(encoders, str):
+        encoders = [encoders]
+
+    available: list[str] = []
+    for name in encoders:
+        codec = get_codec(name)
+        if codec is not None and codec.type == "video":
+            available.append(name)
+        else:
+            logger.debug("encoder '%s' not available as video encoder", name)
+    return available
+
+
+def _check_option_value(vcodec: str, label: str, value: Any, opt: av.option.Option) -> None:
+    """Range-check numeric *value* and choice-check string *value* against *opt*."""
+    type_name = opt.type.name
+    if type_name in FFMPEG_NUMERIC_OPTION_TYPES:
+        if isinstance(value, bool):
+            raise ValueError(
+                f"{label}={value!r} is not numeric; codec {vcodec!r} expects a number for this option."
+            )
+        elif isinstance(value, str):
+            try:
+                num_val = float(value)
+            except ValueError as e:
+                raise ValueError(
+                    f"{label}={value!r} is not numeric; codec {vcodec!r} expects a number for this option."
+                ) from e
+        elif isinstance(value, (float, int)):
+            num_val = value
+        else:
+            raise ValueError(
+                f"{label}={value!r} is not numeric; codec {vcodec!r} expects a number for this option."
+            )
+
+        # Check integer type compatibility
+        if type_name in FFMPEG_INTEGER_OPTION_TYPES and not float(num_val).is_integer():
+            raise ValueError(
+                f"{label}={num_val!r} must be an integer for codec {vcodec!r} "
+                f"(FFmpeg option {opt.name!r} is {type_name}); float values are not allowed."
+            )
+
+        # Check numeric range compatibility
+        lo, hi = float(opt.min), float(opt.max)
+        if lo < hi and not (lo <= num_val <= hi):
+            raise ValueError(
+                f"{label}={num_val} is out of range for codec {vcodec!r}; must be in [{lo}, {hi}]"
+            )
+
+    elif type_name == "STRING":
+        if isinstance(value, bool):
+            raise ValueError(f"{label}={value!r} is not a valid string value for codec {vcodec!r}.")
+        if isinstance(value, str):
+            str_val = value
+        elif isinstance(value, (int, float)):
+            str_val = str(value)
+        else:
+            raise ValueError(f"{label}={value!r} has unsupported type for STRING option on codec {vcodec!r}")
+
+        # Check string choice compatibility
+        choices = [c.name for c in (opt.choices or [])]
+        if choices and str_val not in choices:
+            raise ValueError(
+                f"{label}={str_val!r} is not a supported choice for codec "
+                f"{vcodec!r}; valid choices: {choices}"
+            )
+    else:
+        return
+
+
+def _check_pixel_format(vcodec: str, pix_fmt: str) -> None:
+    formats = _get_codec_video_formats(vcodec)
+    if formats and pix_fmt not in formats:
+        raise ValueError(
+            f"pix_fmt={pix_fmt!r} is not supported by codec {vcodec!r}; "
+            f"supported pixel formats: {list(formats)}"
+        )
+
+
+def _check_codec_options(vcodec: str, codec_options: dict[str, Any], config: VideoEncoderConfig) -> None:
+    """Validate merged encoder options (typed) against the codec's published AVOptions."""
+    supported_options = _get_codec_options_by_name(vcodec)
+    for key, value in codec_options.items():
+        # GOP size is not a codec-specific option, it has to be validated separately.
+        if key == "g":
+            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
+                raise ValueError(f"g={value!r} must be a positive integer for codec {vcodec!r}")
+            continue
+        if key not in supported_options:
+            continue
+        opt = supported_options[key]
+        label = f"extra_options[{key!r}]" if key in config.extra_options else key
+        _check_option_value(vcodec, label, value, opt)
+
+
+def check_video_encoder_config_pyav(config: VideoEncoderConfig) -> None:
+    """Verify *config* is compatible with the bundled FFmpeg build.
+
+    Checks pixel format, abstract tuning-field compatibility, and each merged
+    encoder option from :meth:`~lerobot.datasets.video_utils.VideoEncoderConfig.get_codec_options`
+    against PyAV (including numeric ``extra_options`` present in that dict).
+    No-op when ``config.vcodec`` isn't in the local FFmpeg build.
+
+    Raises:
+        ValueError: on the first incompatibility encountered.
+    """
+    vcodec = config.vcodec
+    options = _get_codec_options_by_name(vcodec)
+    if not options:
+        logger.warning(
+            "Codec %r is not available in the bundled FFmpeg build; skipping compatibility checks.",
+            vcodec,
+        )
+        return
+    _check_pixel_format(config.vcodec, config.pix_fmt)
+    _check_codec_options(config.vcodec, config.get_codec_options(), config)
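The numeric branch of `_check_option_value` above can be sketched without PyAV. This is a minimal illustration, not the module's code: `check_numeric_option` and its `integer_only`, `lo` and `hi` parameters are hypothetical stand-ins for the values the real function reads from `opt.type.name`, `opt.min` and `opt.max`.

```python
from typing import Any


def check_numeric_option(label: str, value: Any, *, integer_only: bool, lo: float, hi: float) -> float:
    """Hypothetical stand-in for the numeric branch of ``_check_option_value``."""
    # bool is a subclass of int, so it has to be rejected before the int/float test.
    if isinstance(value, bool):
        raise ValueError(f"{label}={value!r} is not numeric")
    if isinstance(value, str):
        try:
            num = float(value)
        except ValueError as e:
            raise ValueError(f"{label}={value!r} is not numeric") from e
    elif isinstance(value, (int, float)):
        # Normalise to float so ``is_integer`` is available on every Python version.
        num = float(value)
    else:
        raise ValueError(f"{label}={value!r} is not numeric")

    # Integer-typed options reject fractional values.
    if integer_only and not num.is_integer():
        raise ValueError(f"{label}={num!r} must be an integer")

    # As in the function above, a degenerate range (lo >= hi) means "no range published".
    if lo < hi and not (lo <= num <= hi):
        raise ValueError(f"{label}={num} is out of range; must be in [{lo}, {hi}]")
    return num


# Numeric strings and plain ints are both accepted.
crf_from_str = check_numeric_option("crf", "30", integer_only=True, lo=0.0, hi=63.0)
crf_from_int = check_numeric_option("crf", 30, integer_only=True, lo=0.0, hi=63.0)
```

The range `[0, 63]` is only an example bound chosen for the sketch; the real bounds come from whatever the local FFmpeg build reports for each option.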
@@ -434,7 +434,7 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):

    def _make_padding_camera_frame(self, camera_key: str):
        """Variable-shape padding frame for given camera keys, given in (H, W, C)"""
-        return torch.zeros(self.meta.info["features"][camera_key]["shape"]).permute(-1, 0, 1)
+        return torch.zeros(self.meta.info.features[camera_key]["shape"]).permute(-1, 0, 1)

    def _get_video_frame_padding_mask(
        self,
@@ -14,9 +14,11 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import contextlib
+import dataclasses
 import importlib.resources
 import json
 import logging
+from dataclasses import dataclass, field
 from pathlib import Path

 import datasets
@@ -70,9 +72,12 @@ class ForwardCompatibilityError(CompatibilityError):
        super().__init__(message)


+logger = logging.getLogger(__name__)
+
+
 DEFAULT_CHUNK_SIZE = 1000  # Max number of files per chunk
-DEFAULT_DATA_FILE_SIZE_IN_MB = 50  # Max size per file
-DEFAULT_VIDEO_FILE_SIZE_IN_MB = 100  # Max size per file
+DEFAULT_DATA_FILE_SIZE_IN_MB = 100  # Max size per file
+DEFAULT_VIDEO_FILE_SIZE_IN_MB = 200  # Max size per file

 INFO_PATH = "meta/info.json"
 STATS_PATH = "meta/stats.json"
@@ -88,12 +93,133 @@ DEFAULT_EPISODES_PATH = EPISODES_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
 DEFAULT_DATA_PATH = DATA_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
 DEFAULT_VIDEO_PATH = VIDEO_DIR + "/{video_key}/" + CHUNK_FILE_PATTERN + ".mp4"
 DEFAULT_IMAGE_PATH = "images/{image_key}/episode-{episode_index:06d}/frame-{frame_index:06d}.png"
+# Depth maps live alongside images on disk but use TIFF instead of PNG: PNG
+# cannot natively round-trip float32, and several common loaders silently
+# downcast 16-bit grayscale.
+DEFAULT_DEPTH_PATH = "images/{image_key}/episode-{episode_index:06d}/frame-{frame_index:06d}.tiff"

 LEGACY_EPISODES_PATH = "meta/episodes.jsonl"
 LEGACY_EPISODES_STATS_PATH = "meta/episodes_stats.jsonl"
 LEGACY_TASKS_PATH = "meta/tasks.jsonl"


+@dataclass
+class DatasetInfo:
+    """Typed representation of the ``meta/info.json`` file for a LeRobot dataset.
+
+    Replaces the previously untyped ``dict`` returned by ``load_info()`` and
+    created by ``create_empty_dataset_info()``.  Using a dataclass provides
+    explicit field definitions, IDE auto-completion, and validation at
+    construction time.
+    """
+
+    codebase_version: str
+    fps: int
+    features: dict[str, dict]
+
+    # Episode / frame counters — start at zero for new datasets
+    total_episodes: int = 0
+    total_frames: int = 0
+    total_tasks: int = 0
+
+    # Storage settings
+    chunks_size: int = field(default=DEFAULT_CHUNK_SIZE)
+    data_files_size_in_mb: int = field(default=DEFAULT_DATA_FILE_SIZE_IN_MB)
+    video_files_size_in_mb: int = field(default=DEFAULT_VIDEO_FILE_SIZE_IN_MB)
+
+    # File path templates
+    data_path: str = field(default=DEFAULT_DATA_PATH)
+    video_path: str | None = field(default=DEFAULT_VIDEO_PATH)
+
+    # Optional metadata
+    robot_type: str | None = None
+    splits: dict[str, str] = field(default_factory=dict)
+
+    def __post_init__(self) -> None:
+        # Coerce feature shapes from list to tuple — JSON deserialisation
+        # returns lists, but the rest of the codebase expects tuples.
+        for ft in self.features.values():
+            if isinstance(ft.get("shape"), list):
+                ft["shape"] = tuple(ft["shape"])
+
+        if self.fps <= 0:
+            raise ValueError(f"fps must be positive, got {self.fps}")
+        if self.chunks_size <= 0:
+            raise ValueError(f"chunks_size must be positive, got {self.chunks_size}")
+        if self.data_files_size_in_mb <= 0:
+            raise ValueError(f"data_files_size_in_mb must be positive, got {self.data_files_size_in_mb}")
+        if self.video_files_size_in_mb <= 0:
+            raise ValueError(f"video_files_size_in_mb must be positive, got {self.video_files_size_in_mb}")
+
+    def to_dict(self) -> dict:
+        """Return a JSON-serialisable dict.
+
+        Converts tuple shapes back to lists so ``json.dump`` can handle them.
+        """
+        d = dataclasses.asdict(self)
+        for ft in d["features"].values():
+            if isinstance(ft.get("shape"), tuple):
+                ft["shape"] = list(ft["shape"])
+        return d
+
+    @classmethod
+    def from_dict(cls, data: dict) -> "DatasetInfo":
+        """Construct from a raw dict (e.g. loaded directly from JSON).
+
+        Unknown keys are ignored for forward compatibility with datasets that
+        carry additional fields (e.g. ``total_videos`` from v2.x). A warning is
+        logged when such fields are present.
+        """
+        known = {f.name for f in dataclasses.fields(cls)}
+        unknown = sorted(k for k in data if k not in known)
+        if unknown:
+            logger.warning(f"Unknown fields in DatasetInfo: {unknown}. These will be ignored.")
+        return cls(**{k: v for k, v in data.items() if k in known})
+
+    # ---------------------------------------------------------------------------
+    # Temporary dict-style compatibility layer
+    # Allows existing ``info["key"]`` call-sites to keep working without changes.
+    # Once all callers have been migrated to attribute access, remove these.
+    # ---------------------------------------------------------------------------
+    def __getitem__(self, key: str):
+        import warnings
+
+        warnings.warn(
+            f"Accessing DatasetInfo with dict-style syntax info['{key}'] is deprecated. "
+            f"Use attribute access info.{key} instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        try:
+            return getattr(self, key)
+        except AttributeError as err:
+            raise KeyError(key) from err
+
+    def __setitem__(self, key: str, value) -> None:
+        import warnings
+
+        warnings.warn(
+            f"Setting DatasetInfo with dict-style syntax info['{key}'] = ... is deprecated. "
+            f"Use attribute assignment info.{key} = ... instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        if not hasattr(self, key):
+            raise KeyError(f"DatasetInfo has no field '{key}'")
+        setattr(self, key, value)
+
+    def __contains__(self, key: str) -> bool:
+        """Check if a field exists (dict-like interface)."""
+        return hasattr(self, key)
+
+    def get(self, key: str, default=None):
+        """Get attribute value with default fallback (dict-like interface)."""
+        try:
+            return getattr(self, key)
+        except AttributeError:
+            return default
+
+
 def has_legacy_hub_download_metadata(root: Path) -> bool:
    """Return ``True`` when *root* looks like a legacy Hub ``local_dir`` mirror.

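The shape coercion and unknown-key filtering in `DatasetInfo` can be tried in isolation with a trimmed-down dataclass. The sketch below is an assumption-laden miniature: `MiniInfo` is a hypothetical class that keeps only `fps` and `features`, and the JSON payload is invented for illustration.

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class MiniInfo:
    """Hypothetical, trimmed-down stand-in for ``DatasetInfo``."""

    fps: int
    features: dict[str, dict] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # JSON gives lists; the in-memory representation uses tuples.
        for ft in self.features.values():
            if isinstance(ft.get("shape"), list):
                ft["shape"] = tuple(ft["shape"])
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")

    def to_dict(self) -> dict:
        # Convert tuple shapes back to lists for serialisation.
        d = dataclasses.asdict(self)
        for ft in d["features"].values():
            if isinstance(ft.get("shape"), tuple):
                ft["shape"] = list(ft["shape"])
        return d

    @classmethod
    def from_dict(cls, data: dict) -> "MiniInfo":
        # Drop keys that are not dataclass fields instead of failing on them.
        known = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Invented payload: "total_videos" plays the role of an unknown legacy key.
raw = json.loads(
    '{"fps": 30, "total_videos": 4, "features": {"observation.depth": {"shape": [480, 640, 1]}}}'
)
info = MiniInfo.from_dict(raw)
round_tripped = json.loads(json.dumps(info.to_dict()))
```

One detail worth noting: `__post_init__` mutates the feature dicts it is handed, so the `raw` dict passed to `from_dict` ends up with tuple shapes as well.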
@@ -294,7 +420,7 @@ def create_branch(repo_id: str, *, branch: str, repo_type: str | None = None) ->

 def create_lerobot_dataset_card(
    tags: list | None = None,
-    dataset_info: dict | None = None,
+    dataset_info: DatasetInfo | None = None,
    **kwargs,
 ) -> DatasetCard:
    """Create a `DatasetCard` for a LeRobot dataset.
@@ -305,7 +431,7 @@ def create_lerobot_dataset_card(

    Args:
        tags (list | None): A list of tags to add to the dataset card.
-        dataset_info (dict | None): The dataset's info dictionary, which will
+        dataset_info (DatasetInfo | None): The dataset's info object, which will
            be displayed on the card.
        **kwargs: Additional keyword arguments to populate the card template.

@@ -318,7 +444,7 @@ def create_lerobot_dataset_card(
        card_tags += tags
    if dataset_info:
        dataset_structure = "[meta/info.json](meta/info.json):\n"
-        dataset_structure += f"```json\n{json.dumps(dataset_info, indent=4)}\n```\n"
+        dataset_structure += f"```json\n{json.dumps(dataset_info.to_dict(), indent=4)}\n```\n"
        kwargs = {**kwargs, "dataset_structure": dataset_structure}
    card_data = DatasetCardData(
        license=kwargs.get("license"),
@@ -17,12 +17,13 @@ import contextlib
 import glob
 import importlib
 import logging
+import math
 import queue
 import shutil
 import tempfile
 import threading
 import warnings
-from dataclasses import dataclass, field
+from dataclasses import asdict, dataclass, field
 from fractions import Fraction
 from pathlib import Path
 from threading import Lock
@@ -37,7 +38,23 @@ import torchvision
 from datasets.features.features import register_feature
 from PIL import Image

-from lerobot.utils.import_utils import get_safe_default_codec
+from lerobot.datasets.pyav_utils import (
+    check_video_encoder_config_pyav,
+    decode_depth_frame_pyav,
+    detect_available_encoders_pyav,
+    encode_depth_frame_pyav,
+)
+from lerobot.datasets.depth_utils import (
+    quantize_depth,
+    dequantize_depth,
+    DEFAULT_DEPTH_MIN,
+    DEFAULT_DEPTH_MAX,
+    DEFAULT_DEPTH_SHIFT,
+    DEFAULT_DEPTH_USE_LOG,
+)
+from lerobot.utils.import_utils import get_safe_default_video_backend

 logger = logging.getLogger(__name__)

@@ -52,70 +69,226 @@ HW_ENCODERS = [
    "h264_qsv",  # Intel Quick Sync
 ]

-VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1", "auto"} | set(HW_ENCODERS)
+VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1", "ffv1", "auto"} | set(HW_ENCODERS)
+
+LIBSVTAV1_DEFAULT_PRESET: int = 12


-def _get_codec_options(
-    vcodec: str,
-    g: int | None = 2,
-    crf: int | None = 30,
-    preset: int | None = None,
-) -> dict:
-    """Build codec-specific options dict for video encoding."""
-    options = {}
+@dataclass
+class VideoEncoderConfig:
+    """Video encoder configuration.

-    # GOP size (keyframe interval) - supported by VideoToolbox and software encoders
-    if g is not None and (vcodec in ("h264_videotoolbox", "hevc_videotoolbox") or vcodec not in HW_ENCODERS):
-        options["g"] = str(g)
+    Attributes:
+        vcodec: FFmpeg encoder name. ``"auto"`` is resolved during
+            construction (HW encoder if available, else ``libsvtav1``).
+        pix_fmt: Pixel format (e.g. ``"yuv420p"``).
+        g: GOP size (keyframe interval).
+        crf: Quality level — mapped to the native quality parameter of the
+            codec (``crf`` for software, ``qp`` for NVENC/VAAPI,
+            ``q:v`` for VideoToolbox, ``global_quality`` for QSV).
+        preset: Speed/quality preset. Accepted type is per-codec.
+        fast_decode: Fast-decode tuning. For ``libsvtav1`` this is a level (0-2)
+            embedded in ``svtav1-params``. For ``h264`` and ``hevc`` non-zero values
+            set ``tune=fastdecode``. Ignored for other codecs.
+        video_backend: Python library driving FFmpeg for encoding. Only ``"pyav"``
+            is currently supported.
+        extra_options: Free-form dictionary of additional FFmpeg options
+            (e.g. ``{"tune": "film", "profile:v": "high", "bf": 2}``).
+    """

-    # Quality control (codec-specific parameter names)
-    if crf is not None:
-        if vcodec in ("h264", "hevc", "libsvtav1"):
-            options["crf"] = str(crf)
-        elif vcodec in ("h264_videotoolbox", "hevc_videotoolbox"):
-            quality = max(1, min(100, int(100 - crf * 2)))
-            options["q:v"] = str(quality)
-        elif vcodec in ("h264_nvenc", "hevc_nvenc"):
-            options["rc"] = "constqp"
-            options["qp"] = str(crf)
-        elif vcodec in ("h264_vaapi",):
-            options["qp"] = str(crf)
-        elif vcodec in ("h264_qsv",):
-            options["global_quality"] = str(crf)
+    vcodec: str = "libsvtav1"
+    pix_fmt: str = "yuv420p"
+    g: int | None = 2
+    crf: int | None = 30
+    preset: int | str | None = None
+    fast_decode: int = 0
+    # TODO(CarolinePascal): add torchcodec support + find a way to unify the
+    # two backends (encoding and decoding).
+    video_backend: str = "pyav"
+    extra_options: dict[str, Any] = field(default_factory=dict)

-    # Preset (only for libsvtav1)
-    if vcodec == "libsvtav1":
-        options["preset"] = str(preset) if preset is not None else "12"
+    # Class-level marker persisted to ``info.json`` (via ``asdict``) so the
+    # reader can tell depth datasets from RGB ones without a separate dispatch
+    # path. ``init=False`` keeps it out of CLI/constructor surface; subclasses
+    # flip the default (see :class:`DepthEncoderConfig`).
+    is_depth_map: bool = field(default=False, init=False)

-    return options
+    def __post_init__(self) -> None:
+        self.resolve_vcodec()
+
+        # Empty-constructor ergonomics: ``VideoEncoderConfig()`` must "just work".
+        if self.preset is None and self.vcodec == "libsvtav1":
+            self.preset = LIBSVTAV1_DEFAULT_PRESET
+
+        self.validate()
+
+    def detect_available_encoders(self, encoders: list[str] | str) -> list[str]:
+        """Detect available encoders based on the video backend."""
+        if self.video_backend == "pyav":
+            return detect_available_encoders_pyav(encoders)
+        else:
+            return []
+
+    def validate(self) -> None:
+        """Validate the video encoder config."""
+        if self.video_backend == "pyav":
+            check_video_encoder_config_pyav(self)
+
+    def resolve_vcodec(self) -> None:
+        """Validate ``vcodec`` and resolve ``"auto"`` to the best available HW encoder, else ``libsvtav1``.
+
+        An explicitly requested codec that is not available in the local
+        FFmpeg build is not rewritten: :class:`ValueError` is raised instead,
+        so a missing encoder fails at construction rather than at encode time.
+        """
+        # Backward compatibility: older datasets persist ``vcodec="av1"`` in
+        # ``info.json``. Rewrite to the canonical encoder name *before* the
+        # validation check below so loading those datasets keeps working.
+        if self.vcodec == "av1":
+            self.vcodec = "libsvtav1"
+
+        if self.vcodec not in VALID_VIDEO_CODECS:
+            raise ValueError(f"Invalid vcodec '{self.vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
+        if self.vcodec == "auto":
+            available = self.detect_available_encoders(HW_ENCODERS)
+            for encoder in HW_ENCODERS:
+                if encoder in available:
+                    logger.info(f"Auto-selected video codec: {encoder}")
+                    self.vcodec = encoder
+                    return
+            logger.info("No hardware encoder available, falling back to software encoder 'libsvtav1'")
+            self.vcodec = "libsvtav1"
+
+        if self.detect_available_encoders(self.vcodec):
+            logger.info(f"Using video codec: {self.vcodec}")
+            return
+        raise ValueError(f"Unsupported video codec: {self.vcodec} with video backend {self.video_backend}")
+
+    def get_codec_options(
+        self, encoder_threads: int | None = None, as_strings: bool = False
+    ) -> dict[str, Any]:
+        """Translate the tuning fields to codec-specific FFmpeg options.
+
+        ``VideoEncoderConfig.extra_options`` are merged last but never override a structured field.
+
+        Args:
+            encoder_threads: Number of encoder threads set globally for all VideoEncoderConfigs.
+                For libsvtav1, this is mapped to ``lp`` via ``svtav1-params``.
+                For h264/hevc, this is mapped to ``threads``.
+                Hardware encoders ignore this parameter.
+            as_strings: If ``True``, casts values to strings.
+        """
+        opts: dict[str, Any] = {}
+
+        def set_if(key: str, value: Any) -> None:
+            if value is not None:
+                opts[key] = value if not as_strings else str(value)
+
+        # GOP size is not a codec-specific option, so it is always set.
+        set_if("g", self.g)
+
+        if self.vcodec == "libsvtav1":
+            set_if("crf", self.crf)
+            set_if("preset", self.preset)
+            svtav1_parts: list[str] = []
+            if self.fast_decode is not None:
+                svtav1_parts.append(f"fast-decode={max(0, min(2, self.fast_decode))}")
+            if encoder_threads is not None:
+                svtav1_parts.append(f"lp={encoder_threads}")
+            if svtav1_parts:
+                opts["svtav1-params"] = ":".join(svtav1_parts)
+        elif self.vcodec in ("h264", "hevc"):
+            set_if("crf", self.crf)
+            set_if("preset", self.preset)
+            if self.fast_decode:
+                opts["tune"] = "fastdecode"
+            set_if("threads", encoder_threads)
+        elif self.vcodec in ("h264_videotoolbox", "hevc_videotoolbox"):
+            if self.crf is not None:
+                set_if("q:v", max(1, min(100, 100 - self.crf * 2)))
+        elif self.vcodec in ("h264_nvenc", "hevc_nvenc"):
+            opts["rc"] = "constqp"
+            set_if("qp", self.crf)
+            set_if("preset", self.preset)
+        elif self.vcodec == "h264_vaapi":
+            set_if("qp", self.crf)
+        elif self.vcodec == "h264_qsv":
+            set_if("global_quality", self.crf)
+            set_if("preset", self.preset)
+        elif self.vcodec == "ffv1":
+            # Lossless intra-frame codec. ``crf``/``preset``/``fast_decode``
+            # are not meaningful.
+            set_if("threads", encoder_threads)
+        else:
+            set_if("crf", self.crf)
+            set_if("preset", self.preset)
+
+        # Extra options are merged last but never override structured fields (stringified only if ``as_strings``).
+        for k, v in self.extra_options.items():
+            if k not in opts:
+                set_if(k, v)
+
+        return opts


-def detect_available_hw_encoders() -> list[str]:
-    """Probe PyAV/FFmpeg for available hardware video encoders."""
-    available = []
-    for codec_name in HW_ENCODERS:
-        try:
-            av.codec.Codec(codec_name, "w")
-            available.append(codec_name)
-        except Exception:  # nosec B110
-            logger.debug("HW encoder '%s' not available", codec_name)  # nosec B110
-    return available
+@dataclass
+class DepthEncoderConfig(VideoEncoderConfig):
+    """Encoder configuration for depth-map streams.
+
+    Inherits the full :class:`VideoEncoderConfig` surface (codec, GOP, CRF,
+    preset, ``extra_options``…) and adds the four parameters of the depth
+    quantization pipeline (:func:`quantize_depth`). Inheritance — rather
+    than composition — keeps the CLI flat: ``--dataset.depth_encoder_config.<field>``
+    works identically to its RGB counterpart.
+
+    Defaults flip ``vcodec`` to ``"hevc"`` (Main 12 profile) and ``pix_fmt``
+    to ``"yuv420p12le"``, the most widely available 12-bit pixel format.
+    For archive-grade lossless storage use ``vcodec="ffv1"`` together with
+    ``pix_fmt="gray12le"`` (and clear ``crf``/``preset`` to ``None`` since
+    ``ffv1`` doesn't expose those tuning knobs).
+
+    The :attr:`is_depth_map` marker is class-fixed to ``True`` (``init=False``,
+    so it's hidden from CLI and constructor args) and is what the reader
+    side keys on to tell depth datasets from RGB ones.
+
+    Attributes:
+        depth_min: Minimum depth in physical units (e.g. metres) represented
+            by quantum ``0``.
+        depth_max: Maximum depth represented by quantum :data:`DEPTH_QMAX`.
+        shift: Pre-log offset for numerical stability near zero.
+        use_log: ``True`` for logarithmic quantization (default; matches
+            sensor error profile), ``False`` for linear.
+    """
+
+    vcodec: str = "hevc"
+    pix_fmt: str = "yuv420p12le"
+
+    depth_min: float = DEFAULT_DEPTH_MIN
+    depth_max: float = DEFAULT_DEPTH_MAX
+    shift: float = DEFAULT_DEPTH_SHIFT
+    use_log: bool = DEFAULT_DEPTH_USE_LOG
+
+    # Class invariant — kept out of ``__init__`` (and CLI) but persisted
+    # via ``asdict`` into ``info.json`` for the reader to detect depth.
+    is_depth_map: bool = field(default=True, init=False)
+
+    def quantize(self, depth: torch.Tensor | np.ndarray) -> torch.Tensor:
+        """Apply :func:`quantize_depth` bound to this config's parameters."""
+        return quantize_depth(depth, self.depth_min, self.depth_max, self.shift, self.use_log)
+
+    def dequantize(self, quantized: torch.Tensor | np.ndarray) -> torch.Tensor:
+        """Apply :func:`dequantize_depth` bound to this config's parameters."""
+        return dequantize_depth(quantized, self.depth_min, self.depth_max, self.shift, self.use_log)


-def resolve_vcodec(vcodec: str) -> str:
-    """Validate vcodec and resolve 'auto' to best available HW encoder, fallback to libsvtav1."""
-    if vcodec not in VALID_VIDEO_CODECS:
-        raise ValueError(f"Invalid vcodec '{vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
-    if vcodec != "auto":
-        logger.info(f"Using video codec: {vcodec}")
-        return vcodec
-    available = detect_available_hw_encoders()
-    for encoder in HW_ENCODERS:
-        if encoder in available:
-            logger.info(f"Auto-selected video codec: {encoder}")
-            return encoder
-    logger.info("No hardware encoder available, falling back to software encoder 'libsvtav1'")
-    return "libsvtav1"
+def depth_encoder_defaults() -> DepthEncoderConfig:
+    """Return a :class:`DepthEncoderConfig` with depth-camera defaults."""
+    return DepthEncoderConfig()
+
+
+def camera_encoder_defaults() -> VideoEncoderConfig:
+    """Return a :class:`VideoEncoderConfig` with RGB-camera defaults."""
+    return VideoEncoderConfig()


 def decode_video_frames(
@@ -142,7 +315,7 @@ def decode_video_frames(
    Currently supports torchcodec on cpu and pyav.
    """
    if backend is None:
-        backend = get_safe_default_codec()
+        backend = get_safe_default_video_backend()
    if backend == "torchcodec":
        return decode_video_frames_torchcodec(video_path, timestamps, tolerance_s, return_uint8=return_uint8)
    elif backend in ["pyav", "video_reader"]:
@@ -396,22 +569,136 @@ def decode_video_frames_torchcodec(
    return closest_frames


+def decode_depth_frames(
+    video_path: Path | str,
+    timestamps: list[float],
+    tolerance_s: float,
+    *,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    return_quantized: bool = False,
+    log_loaded_timestamps: bool = False,
+) -> torch.Tensor:
+    """Decode depth-map frames at the requested timestamps using PyAV.
+
+    Mirrors the timestamp-tolerance / closest-frame contract of
+    :func:`decode_video_frames` but operates entirely through PyAV (the
+    ``torchvision`` and ``torchcodec`` backends don't currently round-trip
+    12-bit pixel formats reliably).
+
+    Each decoded frame is reformatted to ``gray12le`` so the same path
+    handles ``yuv420p12le`` (HEVC default) and ``gray12le`` (ffv1 archive)
+    sources transparently.
+
+    Args:
+        video_path: Path to a depth video produced with a
+            :class:`DepthEncoderConfig`.
+        timestamps: Frame timestamps to retrieve, in seconds.
+        tolerance_s: Maximum allowed deviation between the queried and the
+            actually-decoded timestamps.
+        depth_min, depth_max, shift, use_log: Quantization parameters used at
+            encoding time. They should match the values that
+            :func:`info_to_depth_kwargs` extracts from the source dataset's
+            ``info.json``.
+        return_quantized: If ``True``, skip the dequantization step and
+            return raw 12-bit ``uint16`` quanta.
+        log_loaded_timestamps: If ``True``, log the timestamp of every
+            decoded frame (debugging aid).
+
+    Returns:
+        ``torch.Tensor`` of shape ``(N, H, W)``:
+
+        * ``dtype=torch.float32`` (metric depth, default)
+        * ``dtype=torch.uint16`` when ``return_quantized=True``.
+
+    Raises:
+        FrameTimestampError: If the file has no video stream, if no frames
+            are decoded, or if a query timestamp can't be matched within
+            *tolerance_s*.
+    """
+    video_path_str = str(video_path)
+    first_ts = min(timestamps)
+    last_ts = max(timestamps)
+
+    loaded_frames: list[np.ndarray] = []
+    loaded_ts: list[float] = []
+
+    av.logging.set_level(av.logging.WARNING)
+    with av.open(video_path_str, "r") as container:
+        try:
+            stream = container.streams.video[0]
+        except IndexError as e:
+            raise FrameTimestampError(f"No video stream in {video_path_str}") from e
+
+        # Seek to the keyframe at-or-before first_ts (PyAV doesn't do
+        # accurate seek, so we still iterate forward to the requested range).
+        seek_pts = int(first_ts / stream.time_base)
+        container.seek(seek_pts, stream=stream, any_frame=False, backward=True)
+
+        for frame in container.decode(stream):
+            if frame.pts is None:
+                continue
+            current_ts = float(frame.pts * stream.time_base)
+            if log_loaded_timestamps:
+                logger.info(f"depth frame loaded at timestamp={current_ts:.4f}")
+            loaded_frames.append(
+                decode_depth_frame(
+                    frame,
+                    depth_min=depth_min,
+                    depth_max=depth_max,
+                    shift=shift,
+                    use_log=use_log,
+                    return_quantized=True,
+                )
+            )
+            loaded_ts.append(current_ts)
+            if current_ts >= last_ts:
+                break
+
+    av.logging.restore_default_callback()
+
+    if not loaded_frames:
+        raise FrameTimestampError(
+            f"No depth frames decoded from {video_path_str} for timestamps {timestamps}"
+        )
+
+    query_ts = torch.tensor(timestamps)
+    loaded_ts_t = torch.tensor(loaded_ts)
+    dist = torch.cdist(query_ts[:, None], loaded_ts_t[:, None], p=1)
+    min_, argmin_ = dist.min(1)
+
+    is_within_tol = min_ < tolerance_s
+    if not is_within_tol.all():
+        raise FrameTimestampError(
+            f"One or several query timestamps violate the tolerance "
+            f"({min_[~is_within_tol]} > {tolerance_s=})."
+            f"\nqueried timestamps: {query_ts}"
+            f"\nloaded timestamps: {loaded_ts_t}"
+            f"\nvideo: {video_path_str}"
+        )
+
+    closest = np.stack([loaded_frames[i] for i in argmin_])  # (N, H, W) uint16
+    quantized = torch.from_numpy(closest)
+
+    if return_quantized:
+        return quantized
+    return dequantize_depth(quantized, depth_min, depth_max, shift, use_log)
+
+
 def encode_video_frames(
    imgs_dir: Path | str,
    video_path: Path | str,
    fps: int,
-    vcodec: str = "libsvtav1",
-    pix_fmt: str = "yuv420p",
-    g: int | None = 2,
-    crf: int | None = 30,
-    fast_decode: int = 0,
+    camera_encoder_config: VideoEncoderConfig | None = None,
+    encoder_threads: int | None = None,
+    *,
    log_level: int | None = av.logging.WARNING,
    overwrite: bool = False,
-    preset: int | None = None,
-    encoder_threads: int | None = None,
 ) -> None:
    """More info on ffmpeg arguments tuning on `benchmark/video/README.md`"""
-    vcodec = resolve_vcodec(vcodec)
+    if camera_encoder_config is None:
+        camera_encoder_config = VideoEncoderConfig()
+    vcodec = camera_encoder_config.vcodec
+    pix_fmt = camera_encoder_config.pix_fmt

    video_path = Path(video_path)
    imgs_dir = Path(imgs_dir)
@@ -422,42 +709,18 @@ def encode_video_frames(

    video_path.parent.mkdir(parents=True, exist_ok=True)

-    # Encoders/pixel formats incompatibility check
-    if (vcodec == "libsvtav1" or vcodec == "hevc") and pix_fmt == "yuv444p":
-        logger.warning(
-            f"Incompatible pixel format 'yuv444p' for codec {vcodec}, auto-selecting format 'yuv420p'"
-        )
-        pix_fmt = "yuv420p"
-
    # Get input frames
    template = "frame-" + ("[0-9]" * 6) + ".png"
    input_list = sorted(
        glob.glob(str(imgs_dir / template)), key=lambda x: int(x.split("-")[-1].split(".")[0])
    )

-    # Define video output frame size (assuming all input frames are the same size)
    if len(input_list) == 0:
        raise FileNotFoundError(f"No images found in {imgs_dir}.")
    with Image.open(input_list[0]) as dummy_image:
        width, height = dummy_image.size

-    # Define video codec options
-    video_options = _get_codec_options(vcodec, g, crf, preset)
-
-    if fast_decode:
-        key = "svtav1-params" if vcodec == "libsvtav1" else "tune"
-        value = f"fast-decode={fast_decode}" if vcodec == "libsvtav1" else "fastdecode"
-        video_options[key] = value
-
-    if encoder_threads is not None:
-        if vcodec == "libsvtav1":
-            lp_param = f"lp={encoder_threads}"
-            if "svtav1-params" in video_options:
-                video_options["svtav1-params"] += f":{lp_param}"
-            else:
-                video_options["svtav1-params"] = lp_param
-        else:
-            video_options["threads"] = str(encoder_threads)
+    video_options = camera_encoder_config.get_codec_options(encoder_threads, as_strings=True)

    # Set logging level
    if log_level is not None:
@@ -494,7 +757,10 @@ def encode_video_frames(


 def concatenate_video_files(
-    input_video_paths: list[Path | str], output_video_path: Path, overwrite: bool = True
+    input_video_paths: list[Path | str],
+    output_video_path: Path,
+    overwrite: bool = True,
+    compatibility_check: bool = False,
 ):
    """
    Concatenate multiple video files into a single video file using pyav.
@@ -507,6 +773,7 @@ def concatenate_video_files(
        input_video_paths: Ordered list of input video file paths to concatenate.
        output_video_path: Path to the output video file.
        overwrite: Whether to overwrite the output video file if it already exists. Default is True.
+        compatibility_check: Whether to verify that all input videos share the same height, width,
+            fps, codec, and pixel format, raising ``ValueError`` on a mismatch. Default is False.

    Note:
        - Creates a temporary directory for intermediate files that is cleaned up after use.
@@ -525,6 +792,22 @@ def concatenate_video_files(
    if len(input_video_paths) == 0:
        raise FileNotFoundError("No input video paths provided.")

+    # This check may be skipped at recording time as videos are encoded with the same encoder config.
+    if compatibility_check:
+        reference_video_info = get_video_info(input_video_paths[0])
+        for input_path in input_video_paths[1:]:
+            video_info = get_video_info(input_path)
+            if (
+                video_info["video.height"] != reference_video_info["video.height"]
+                or video_info["video.width"] != reference_video_info["video.width"]
+                or video_info["video.fps"] != reference_video_info["video.fps"]
+                or video_info["video.codec"] != reference_video_info["video.codec"]
+                or video_info["video.pix_fmt"] != reference_video_info["video.pix_fmt"]
+            ):
+                raise ValueError(
+                    f"Input video {input_path} is not compatible with the reference video {input_video_paths[0]}."
+                )
+
    # Create a temporary .ffconcat file to list the input video paths
    with tempfile.NamedTemporaryFile(mode="w", suffix=".ffconcat", delete=False) as tmp_concatenate_file:
        tmp_concatenate_file.write("ffconcat version 1.0\n")
@@ -591,33 +874,31 @@ class _CameraEncoderThread(threading.Thread):
        fps: int,
        vcodec: str,
        pix_fmt: str,
-        g: int | None,
-        crf: int | None,
-        preset: int | None,
+        codec_options: dict[str, str],
        frame_queue: queue.Queue,
        result_queue: queue.Queue,
        stop_event: threading.Event,
-        encoder_threads: int | None = None,
+        depth_encoder_config: "DepthEncoderConfig | None" = None,
    ):
        super().__init__(daemon=True)
        self.video_path = video_path
        self.fps = fps
        self.vcodec = vcodec
        self.pix_fmt = pix_fmt
-        self.g = g
-        self.crf = crf
-        self.preset = preset
+        self.codec_options = codec_options
        self.frame_queue = frame_queue
        self.result_queue = result_queue
        self.stop_event = stop_event
-        self.encoder_threads = encoder_threads
+        self.depth_encoder_config = depth_encoder_config

    def run(self) -> None:
        from .compute_stats import RunningQuantileStats, auto_downsample_height_width

        container = None
        output_stream = None
-        stats_tracker = RunningQuantileStats()
+        is_depth = self.depth_encoder_config is not None
+        stats_tracker = RunningQuantileStats() if not is_depth else None
        frame_count = 0

        try:
@@ -635,51 +916,45 @@ class _CameraEncoderThread(threading.Thread):
                    # Sentinel: flush and close
                    break

-                # Ensure HWC uint8 numpy array
+                # Ensure an HWC numpy array; only RGB frames are converted to uint8 (depth keeps its dtype)
                if isinstance(frame_data, np.ndarray):
                    if frame_data.ndim == 3 and frame_data.shape[0] == 3:
                        # CHW -> HWC
                        frame_data = frame_data.transpose(1, 2, 0)
-                    if frame_data.dtype != np.uint8:
+                    if frame_data.dtype != np.uint8 and not is_depth:
                        frame_data = (frame_data * 255).astype(np.uint8)

                # Open container on first frame (to get width/height)
                if container is None:
                    height, width = frame_data.shape[:2]
-                    video_options = _get_codec_options(self.vcodec, self.g, self.crf, self.preset)
-                    if self.encoder_threads is not None:
-                        if self.vcodec == "libsvtav1":
-                            lp_param = f"lp={self.encoder_threads}"
-                            if "svtav1-params" in video_options:
-                                video_options["svtav1-params"] += f":{lp_param}"
-                            else:
-                                video_options["svtav1-params"] = lp_param
-                        else:
-                            video_options["threads"] = str(self.encoder_threads)
                    Path(self.video_path).parent.mkdir(parents=True, exist_ok=True)
                    container = av.open(str(self.video_path), "w")
-                    output_stream = container.add_stream(self.vcodec, self.fps, options=video_options)
+                    output_stream = container.add_stream(self.vcodec, self.fps, options=self.codec_options)
                    output_stream.pix_fmt = self.pix_fmt
                    output_stream.width = width
                    output_stream.height = height
                    output_stream.time_base = Fraction(1, self.fps)

                # Encode frame with explicit timestamps
-                pil_img = Image.fromarray(frame_data)
-                video_frame = av.VideoFrame.from_image(pil_img)
+                if is_depth:
+                    video_frame = encode_depth_frame_pyav(
+                        frame_data,
+                        pix_fmt=self.pix_fmt,
+                        depth_min=self.depth_encoder_config.depth_min,
+                        depth_max=self.depth_encoder_config.depth_max,
+                        shift=self.depth_encoder_config.shift,
+                        use_log=self.depth_encoder_config.use_log,
+                    )
+                else:
+                    pil_img = Image.fromarray(frame_data)
+                    video_frame = av.VideoFrame.from_image(pil_img)
                video_frame.pts = frame_count
                video_frame.time_base = Fraction(1, self.fps)
                packet = output_stream.encode(video_frame)
                if packet:
                    container.mux(packet)

-                # Update stats with downsampled frame (per-channel stats like compute_episode_stats)
-                img_chw = frame_data.transpose(2, 0, 1)  # HWC -> CHW
-                img_downsampled = auto_downsample_height_width(img_chw)
-                # Reshape CHW to (H*W, C) for per-channel stats
-                channels = img_downsampled.shape[0]
-                img_for_stats = img_downsampled.transpose(1, 2, 0).reshape(-1, channels)
-                stats_tracker.update(img_for_stats)
+                if not is_depth:
+                    # Update stats with downsampled frame (per-channel stats like compute_episode_stats)
+                    img_chw = frame_data.transpose(2, 0, 1)  # HWC -> CHW
+                    img_downsampled = auto_downsample_height_width(img_chw)
+                    # Reshape CHW to (H*W, C) for per-channel stats
+                    channels = img_downsampled.shape[0]
+                    img_for_stats = img_downsampled.transpose(1, 2, 0).reshape(-1, channels)
+                    stats_tracker.update(img_for_stats)

                frame_count += 1

@@ -694,8 +969,10 @@ class _CameraEncoderThread(threading.Thread):

            av.logging.restore_default_callback()

-            # Get stats and put on result queue
-            if frame_count >= 2:
+            # Get stats and put on result queue (depth streams skip stats)
+            if is_depth:
+                self.result_queue.put(("ok", None))
+            elif frame_count >= 2:
                stats = stats_tracker.get_statistics()
                self.result_queue.put(("ok", stats))
            else:
@@ -724,22 +1001,40 @@ class StreamingVideoEncoder:
    def __init__(
        self,
        fps: int,
-        vcodec: str = "libsvtav1",
-        pix_fmt: str = "yuv420p",
-        g: int | None = 2,
-        crf: int | None = 30,
-        preset: int | None = None,
-        queue_maxsize: int = 30,
+        camera_encoder_config: VideoEncoderConfig | None = None,
        encoder_threads: int | None = None,
+        *,
+        queue_maxsize: int = 30,
+        depth_encoder_config: "DepthEncoderConfig | None" = None,
+        depth_keys: list[str] | None = None,
    ):
+        """
+        Args:
+            fps: Frames per second for the output videos.
+            camera_encoder_config: Video encoder settings applied to all RGB
+                (non-depth) cameras. When ``None``, :class:`VideoEncoderConfig`
+                defaults are used.
+            encoder_threads: Number of encoder threads (global setting).
+                ``None`` lets the codec decide.
+            queue_maxsize: Max frames to buffer per camera before
+                back-pressure drops frames.
+            depth_encoder_config: Optional depth encoder configuration applied
+                to all depth video keys listed in ``depth_keys``.
+            depth_keys: Video keys (matching the dataset feature names) that
+                must be encoded as quantized depth maps using
+                ``depth_encoder_config``. A ``depth_encoder_config`` must be
+                provided whenever this list is non-empty.
+        """
        self.fps = fps
-        self.vcodec = resolve_vcodec(vcodec)
-        self.pix_fmt = pix_fmt
-        self.g = g
-        self.crf = crf
-        self.preset = preset
+        self._camera_encoder_config = camera_encoder_config or VideoEncoderConfig()
+        self._encoder_threads = encoder_threads
        self.queue_maxsize = queue_maxsize
-        self.encoder_threads = encoder_threads
+        self._depth_encoder_config = depth_encoder_config
+        self._depth_keys: set[str] = set(depth_keys or [])
+        if self._depth_keys and self._depth_encoder_config is None:
+            raise ValueError(
+                "StreamingVideoEncoder received depth_keys without a depth_encoder_config; "
+                "either pass a DepthEncoderConfig or remove depth_keys."
+            )

        self._frame_queues: dict[str, queue.Queue] = {}
        self._result_queues: dict[str, queue.Queue] = {}
@@ -770,18 +1065,28 @@ class StreamingVideoEncoder:
            temp_video_dir = Path(tempfile.mkdtemp(dir=temp_dir))
            video_path = temp_video_dir / f"{video_key.replace('/', '_')}_streaming.mp4"

+            is_depth_key = video_key in self._depth_keys
+            encoder_cfg: VideoEncoderConfig
+            depth_cfg = None
+            if is_depth_key:
+                assert self._depth_encoder_config is not None  # guaranteed by __init__
+                encoder_cfg = self._depth_encoder_config
+                depth_cfg = self._depth_encoder_config
+            else:
+                encoder_cfg = self._camera_encoder_config
+
+            vcodec = encoder_cfg.vcodec
+            codec_options = encoder_cfg.get_codec_options(self._encoder_threads)
            encoder_thread = _CameraEncoderThread(
                video_path=video_path,
                fps=self.fps,
-                vcodec=self.vcodec,
-                pix_fmt=self.pix_fmt,
-                g=self.g,
-                crf=self.crf,
-                preset=self.preset,
+                vcodec=vcodec,
+                pix_fmt=encoder_cfg.pix_fmt,
+                codec_options=codec_options,
                frame_queue=frame_queue,
                result_queue=result_queue,
                stop_event=stop_event,
-                encoder_threads=self.encoder_threads,
+                depth_encoder_config=depth_cfg,
            )
            encoder_thread.start()

@@ -986,8 +1291,18 @@ def get_audio_info(video_path: Path | str) -> dict:
    return audio_info


-def get_video_info(video_path: Path | str) -> dict:
-    # Set logging level
+def get_video_info(
+    video_path: Path | str,
+    video_encoder_config: "VideoEncoderConfig | None" = None,
+) -> dict:
+    """Build the ``video.*`` / ``audio.*`` info dict persisted in ``info.json``.
+
+    Args:
+        video_path: Path to the encoded video file to probe.
+        video_encoder_config: If provided, record the exact encoder settings used to encode this
+            video. Stream-derived values take precedence — encoder fields are only written for keys
+            not already populated from the video file itself.
+    """
    logging.getLogger("libav").setLevel(av.logging.WARNING)

    # Getting video stream information
@@ -1004,7 +1319,6 @@ def get_video_info(video_path: Path | str) -> dict:
        video_info["video.width"] = video_stream.width
        video_info["video.codec"] = video_stream.codec.canonical_name
        video_info["video.pix_fmt"] = video_stream.pix_fmt
-        video_info["video.is_depth_map"] = False

        # Calculate fps from r_frame_rate
        video_info["video.fps"] = int(video_stream.base_rate)
@@ -1018,9 +1332,67 @@ def get_video_info(video_path: Path | str) -> dict:
    # Adding audio stream information
    video_info.update(**get_audio_info(video_path))

+    # Add additional encoder configuration if provided (no override of stream-derived values)
+    # Depth related fields flow naturally through this path.
+    if video_encoder_config is not None:
+        for field_name, field_value in asdict(video_encoder_config).items():
+            video_info.setdefault(f"video.{field_name}", field_value)
+
+    # Fallback case where no encoder config is provided or the video is not a depth map.
+    video_info.setdefault("video.is_depth_map", False)
+
    return video_info


+# ─── Depth metadata helpers (reader side) ────────────────────────────
+
+
+_DEPTH_INFO_KEYS: tuple[str, ...] = (
+    "video.depth_min",
+    "video.depth_max",
+    "video.shift",
+    "video.use_log",
+)
+
+
+def seed_depth_feature_info(
+    features: dict[str, dict],
+    depth_encoder_config: "DepthEncoderConfig | None",
+) -> None:
+    """Pre-populate per-feature ``video.<field>`` entries from *depth_encoder_config*.
+
+    ``update_video_info`` only runs after the first episode video is encoded,
+    so without this seeding step ``features[key]["info"]`` carries no
+    quantization range until then. Consumers that read the dataset feature
+    spec mid-recording (e.g. the rerun visualizer pinning the depth colormap
+    to ``video.depth_min`` / ``video.depth_max``) would otherwise see no
+    range during episode 1 and re-normalize per frame.
+
+    Stream-derived values written later by :func:`get_video_info` /
+    ``update_video_info`` win over these seeds (the merge is
+    ``{**existing, **stream_info}``), so callers can safely re-run this on
+    a partially-populated info dict.
+
+    No-op when ``depth_encoder_config`` is ``None`` or no feature is flagged
+    as a depth map.
+    """
+    if depth_encoder_config is None:
+        return
+    encoder_fields = {
+        f"video.{name}": value for name, value in asdict(depth_encoder_config).items()
+    }
+    for ft in features.values():
+        if ft.get("dtype") != "video":
+            continue
+        info = ft.get("info") or {}
+        if not info.get("video.is_depth_map", False):
+            continue
+        # Only fill fields not already set, so explicit user-provided info is preserved.
+        for k, v in encoder_fields.items():
+            info.setdefault(k, v)
+        ft["info"] = info
+
+
 def get_video_pixel_channels(pix_fmt: str) -> int:
    if "gray" in pix_fmt or "depth" in pix_fmt or "monochrome" in pix_fmt:
        return 1
@@ -24,8 +24,6 @@ from .pi0_fast.configuration_pi0_fast import PI0FastConfig as PI0FastConfig
 from .pi05.configuration_pi05 import PI05Config as PI05Config
 from .pretrained import PreTrainedPolicy as PreTrainedPolicy
 from .sac.configuration_sac import SACConfig as SACConfig
-from .sac.reward_model.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
-from .sarm.configuration_sarm import SARMConfig as SARMConfig
 from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
 from .tdmpc.configuration_tdmpc import TDMPCConfig as TDMPCConfig
 from .utils import make_robot_action, prepare_observation_for_inference
@@ -46,9 +44,7 @@ __all__ = [
    "PI0Config",
    "PI0FastConfig",
    "PI05Config",
-    "RewardClassifierConfig",
    "SACConfig",
-    "SARMConfig",
    "SmolVLAConfig",
    "TDMPCConfig",
    "VQBeTConfig",
@@ -142,9 +142,10 @@ class ACTPolicy(PreTrainedPolicy):

        actions_hat, (mu_hat, log_sigma_x2_hat) = self.model(batch)

-        l1_loss = (
-            F.l1_loss(batch[ACTION], actions_hat, reduction="none") * ~batch["action_is_pad"].unsqueeze(-1)
-        ).mean()
+        abs_err = F.l1_loss(batch[ACTION], actions_hat, reduction="none")
+        valid_mask = ~batch["action_is_pad"].unsqueeze(-1)
+        num_valid = valid_mask.sum() * abs_err.shape[-1]
+        l1_loss = (abs_err * valid_mask).sum() / num_valid.clamp_min(1)

        loss_dict = {"l1_loss": l1_loss.item()}
        if self.config.use_vae:
@@ -380,7 +380,9 @@ class DiffusionModel(nn.Module):
                    f"{self.config.do_mask_loss_for_padding=}."
                )
            in_episode_bound = ~batch["action_is_pad"]
-            loss = loss * in_episode_bound.unsqueeze(-1)
+            mask = in_episode_bound.unsqueeze(-1)
+            num_valid = mask.sum() * loss.shape[-1]
+            return (loss * mask).sum() / num_valid.clamp_min(1)

        return loss.mean()

@@ -52,8 +52,6 @@ from .pi0.configuration_pi0 import PI0Config
 from .pi05.configuration_pi05 import PI05Config
 from .pretrained import PreTrainedPolicy
 from .sac.configuration_sac import SACConfig
-from .sac.reward_model.configuration_classifier import RewardClassifierConfig
-from .sarm.configuration_sarm import SARMConfig
 from .smolvla.configuration_smolvla import SmolVLAConfig
 from .tdmpc.configuration_tdmpc import TDMPCConfig
 from .utils import validate_visual_features_consistency
@@ -89,7 +87,7 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:

    Args:
        name: The name of the policy. Supported names are "tdmpc", "diffusion", "act",
-            "multi_task_dit", "vqbet", "pi0", "pi05", "sac", "reward_classifier", "smolvla", "wall_x".
+            "multi_task_dit", "vqbet", "pi0", "pi05", "sac", "smolvla", "wall_x".
    Returns:
        The policy class corresponding to the given name.

@@ -132,18 +130,10 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
        from .sac.modeling_sac import SACPolicy

        return SACPolicy
-    elif name == "reward_classifier":
-        from .sac.reward_model.modeling_classifier import Classifier
-
-        return Classifier
    elif name == "smolvla":
        from .smolvla.modeling_smolvla import SmolVLAPolicy

        return SmolVLAPolicy
-    elif name == "sarm":
-        from .sarm.modeling_sarm import SARMRewardModel
-
-        return SARMRewardModel
    elif name == "groot":
        from .groot.modeling_groot import GrootPolicy

@@ -173,7 +163,7 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
    Args:
        policy_type: The type of the policy. Supported types include "tdmpc",
                     "multi_task_dit", "diffusion", "act", "vqbet", "pi0", "pi05", "sac",
-                     "smolvla", "reward_classifier", "wall_x".
+                     "smolvla", "wall_x".
        **kwargs: Keyword arguments to be passed to the configuration class constructor.

    Returns:
@@ -200,8 +190,6 @@ def make_policy_config(policy_type: str, **kwargs) -> PreTrainedConfig:
        return SACConfig(**kwargs)
    elif policy_type == "smolvla":
        return SmolVLAConfig(**kwargs)
-    elif policy_type == "reward_classifier":
-        return RewardClassifierConfig(**kwargs)
    elif policy_type == "groot":
        return GrootConfig(**kwargs)
    elif policy_type == "xvla":
@@ -378,14 +366,6 @@ def make_pre_post_processors(
            dataset_stats=kwargs.get("dataset_stats"),
        )

-    elif isinstance(policy_cfg, RewardClassifierConfig):
-        from .sac.reward_model.processor_classifier import make_classifier_processor
-
-        processors = make_classifier_processor(
-            config=policy_cfg,
-            dataset_stats=kwargs.get("dataset_stats"),
-        )
-
    elif isinstance(policy_cfg, SmolVLAConfig):
        from .smolvla.processor_smolvla import make_smolvla_pre_post_processors

@@ -394,14 +374,6 @@ def make_pre_post_processors(
            dataset_stats=kwargs.get("dataset_stats"),
        )

-    elif isinstance(policy_cfg, SARMConfig):
-        from .sarm.processor_sarm import make_sarm_pre_post_processors
-
-        processors = make_sarm_pre_post_processors(
-            config=policy_cfg,
-            dataset_stats=kwargs.get("dataset_stats"),
-            dataset_meta=kwargs.get("dataset_meta"),
-        )
    elif isinstance(policy_cfg, GrootConfig):
        from .groot.processor_groot import make_groot_pre_post_processors

@@ -13,7 +13,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from dataclasses import dataclass, field
 from pathlib import Path
 from typing import TYPE_CHECKING

@@ -174,17 +173,14 @@ N_COLOR_CHANNELS = 3


 # config
-@dataclass
 class GR00TN15Config(PretrainedConfig):
    model_type = "gr00t_n1_5"
-    backbone_cfg: dict = field(init=False, metadata={"help": "Backbone configuration."})

-    action_head_cfg: dict = field(init=False, metadata={"help": "Action head configuration."})
-
-    action_horizon: int = field(init=False, metadata={"help": "Action horizon."})
-
-    action_dim: int = field(init=False, metadata={"help": "Action dimension."})
-    compute_dtype: str = field(default="float32", metadata={"help": "Compute dtype."})
+    backbone_cfg: dict
+    action_head_cfg: dict
+    action_horizon: int
+    action_dim: int
+    compute_dtype: str = "float32"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
@@ -688,8 +688,9 @@ class DiffusionObjective(nn.Module):
        loss = F.mse_loss(predicted, target, reduction="none")

        if self.do_mask_loss_for_padding and "action_is_pad" in batch:
-            valid_actions = ~batch["action_is_pad"]
-            loss = loss * valid_actions.unsqueeze(-1)
+            mask = ~batch["action_is_pad"].unsqueeze(-1)
+            num_valid = mask.sum() * loss.shape[-1]
+            return (loss * mask).sum() / num_valid.clamp_min(1)

        return loss.mean()

@@ -752,8 +753,9 @@ class FlowMatchingObjective(nn.Module):
        loss = F.mse_loss(predicted_velocity, target_velocity, reduction="none")

        if self.do_mask_loss_for_padding and "action_is_pad" in batch:
-            valid_mask = ~batch["action_is_pad"]
-            loss = loss * valid_mask.unsqueeze(-1)
+            mask = ~batch["action_is_pad"].unsqueeze(-1)
+            num_valid = mask.sum() * loss.shape[-1]
+            return (loss * mask).sum() / num_valid.clamp_min(1)

        return loss.mean()

@@ -227,6 +227,7 @@ class PI0FastPaliGemma(nn.Module):
        # forward(..., adarms_cond=...) is supported (same as pi0/pi05).
        if use_adarms[0]:
            text_config = self.paligemma.config.text_config
+            del self.paligemma.model.language_model
            self.paligemma.model.language_model = PiGemmaModel(text_config)

        self.to_bfloat16_for_selected_params(precision)
@@ -197,6 +197,9 @@ class PiGemmaModel(GemmaModel):  # type: ignore[misc]

    def __init__(self, config: GemmaConfig, **kwargs):
        super().__init__(config, **kwargs)
+        # Free parent-allocated layers/norm before replacing to avoid ~2x peak memory.
+        del self.layers
+        del self.norm
        # if not getattr(config, "use_adarms", False):
        #     return
        cond_dim = getattr(config, "adarms_cond_dim", None)
@@ -328,6 +331,7 @@ class PiGemmaForCausalLM(GemmaForCausalLM):  # type: ignore[misc]

    def __init__(self, config: GemmaConfig, **kwargs):
        super().__init__(config, **kwargs)
+        del self.model
        self.model = PiGemmaModel(config)


@@ -336,6 +340,7 @@ class PaliGemmaModelWithPiGemma(PaliGemmaModel):

    def __init__(self, config):
        super().__init__(config)
+        del self.language_model
        self.language_model = PiGemmaModel(config.text_config)


@@ -344,6 +349,7 @@ class PaliGemmaForConditionalGenerationWithPiGemma(PaliGemmaForConditionalGenera

    def __init__(self, config):
        super().__init__(config)
+        del self.model
        self.model = PaliGemmaModelWithPiGemma(config)

    # Make modules available through conditional class for BC
@@ -19,6 +19,7 @@ from .action_queue import ActionQueue
 from .configuration_rtc import RTCConfig
 from .latency_tracker import LatencyTracker
 from .modeling_rtc import RTCProcessor
+from .relative import reanchor_relative_rtc_prefix

 __all__ = [
    "ActionInterpolator",
@@ -26,4 +27,5 @@ __all__ = [
    "LatencyTracker",
    "RTCConfig",
    "RTCProcessor",
+    "reanchor_relative_rtc_prefix",
 ]
@@ -0,0 +1,58 @@
+#!/usr/bin/env python
+
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Relative-action helpers for Real-Time Chunking (RTC)."""
+
+from __future__ import annotations
+
+import torch
+
+from lerobot.processor import (
+    NormalizerProcessorStep,
+    RelativeActionsProcessorStep,
+    TransitionKey,
+    create_transition,
+    to_relative_actions,
+)
+
+
+def reanchor_relative_rtc_prefix(
+    prev_actions_absolute: torch.Tensor,
+    current_state: torch.Tensor,
+    relative_step: RelativeActionsProcessorStep,
+    normalizer_step: NormalizerProcessorStep | None,
+    policy_device: torch.device | str,
+) -> torch.Tensor:
+    """Convert absolute leftover actions into model-space for relative-action RTC policies.
+
+    When using relative actions, the RTC prefix (previous chunk's unexecuted tail)
+    is stored in absolute coordinates. Before feeding it back to the policy, this
+    helper re-expresses those actions relative to the robot's current joint state
+    and optionally normalizes them so the policy receives correctly scaled inputs.
+    """
+    state = current_state.detach().cpu()
+    if state.dim() == 1:
+        state = state.unsqueeze(0)
+
+    action_cpu = prev_actions_absolute.detach().cpu()
+    mask = relative_step._build_mask(action_cpu.shape[-1])
+    relative_actions = to_relative_actions(action_cpu, state, mask)
+
+    transition = create_transition(action=relative_actions)
+    if normalizer_step is not None:
+        transition = normalizer_step(transition)
+
+    return transition[TransitionKey.ACTION].to(policy_device)
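Note on `reanchor_relative_rtc_prefix`: the core of the helper is re-expressing leftover absolute actions relative to the current state. The sketch below shows that step on plain lists. It assumes `to_relative_actions` subtracts the state on the masked dimensions and leaves the others untouched; that behaviour is not shown in this diff, and `to_relative` / `to_absolute` here are illustrative stand-ins, not the lerobot functions.

```python
def to_relative(actions, state, mask):
    """Subtract the state from each action step on dimensions where mask is True."""
    return [
        [a - s if m else a for a, s, m in zip(step, state, mask)]
        for step in actions
    ]


def to_absolute(actions, state, mask):
    """Inverse of to_relative: add the state back on the masked dimensions."""
    return [
        [a + s if m else a for a, s, m in zip(step, state, mask)]
        for step in actions
    ]


# Unexecuted tail of the previous chunk, stored in absolute coordinates.
leftover_absolute = [[10.0, 20.0, 1.0], [11.0, 21.0, 0.0]]
# Current joint state; the third dimension (e.g. a gripper) is kept absolute.
current_state = [9.5, 19.0, 0.5]
mask = [True, True, False]

prefix_relative = to_relative(leftover_absolute, current_state, mask)
print(prefix_relative)  # [[0.5, 1.0, 1.0], [1.5, 2.0, 0.0]]

# Converting back with the same state recovers the absolute actions.
print(to_absolute(prefix_relative, current_state, mask) == leftover_absolute)  # True
```

In the helper itself the result is then optionally passed through the normalizer step before being moved to the policy device.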
@@ -1 +0,0 @@
-../../../../docs/source/policy_sarm_README.md
@@ -394,13 +394,21 @@ class SmolVLAPolicy(PreTrainedPolicy):
        loss_dict["losses_after_rm_padding"] = losses.clone().mean().item()

        if reduction == "none":
-            # Return per-sample losses (B,) by averaging over time and action dims
-            per_sample_loss = losses.mean(dim=(1, 2))
+            # Return per-sample losses (B,) by averaging over valid (time, action) entries
+            if actions_is_pad is None:
+                per_sample_loss = losses.mean(dim=(1, 2))
+            else:
+                num_valid = ((~actions_is_pad).sum(dim=1) * losses.shape[-1]).clamp_min(1)
+                per_sample_loss = losses.sum(dim=(1, 2)) / num_valid
            loss_dict["loss"] = per_sample_loss.mean().item()
            return per_sample_loss, loss_dict
        else:
-            # Default: return scalar mean loss
-            loss = losses.mean()
+            # Default: return scalar mean loss over valid (time, action) entries
+            if actions_is_pad is None:
+                loss = losses.mean()
+            else:
+                num_valid = ((~actions_is_pad).sum() * losses.shape[-1]).clamp_min(1)
+                loss = losses.sum() / num_valid
            loss_dict["loss"] = loss.item()
            return loss, loss_dict

@@ -557,7 +557,7 @@ class RewardClassifierProcessorStep(ProcessorStep):
    def __post_init__(self):
        """Initializes the reward classifier model after the dataclass is created."""
        if self.pretrained_path is not None:
-            from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
+            from lerobot.rewards.classifier.modeling_classifier import Classifier

            self.reward_classifier = Classifier.from_pretrained(self.pretrained_path)
            self.reward_classifier.to(self.device)
@@ -142,6 +142,10 @@ class RelativeActionsProcessorStep(ProcessorStep):
        new_transition[TransitionKey.ACTION] = to_relative_actions(action, state, mask)
        return new_transition

+    def get_cached_state(self) -> torch.Tensor | None:
+        """Return the cached ``observation.state`` used as the reference point for relative/absolute action conversions."""
+        return self._last_state
+
    def get_config(self) -> dict[str, Any]:
        return {
            "enabled": self.enabled,
@@ -182,7 +186,8 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
                "but relative_step is None. Ensure relative_step is set when constructing the postprocessor."
            )

-        if self.relative_step._last_state is None:
+        cached_state = self.relative_step.get_cached_state()
+        if cached_state is None:
            raise RuntimeError(
                "AbsoluteActionsProcessorStep requires state from RelativeActionsProcessorStep "
                "but no state has been cached. Ensure the preprocessor runs before the postprocessor."
@@ -194,9 +199,7 @@ class AbsoluteActionsProcessorStep(ProcessorStep):
            return new_transition

        mask = self.relative_step._build_mask(action.shape[-1])
-        new_transition[TransitionKey.ACTION] = to_absolute_actions(
-            action, self.relative_step._last_state, mask
-        )
+        new_transition[TransitionKey.ACTION] = to_absolute_actions(action, cached_state, mask)
        return new_transition

    def get_config(self) -> dict[str, Any]:
@@ -0,0 +1,36 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .classifier.configuration_classifier import RewardClassifierConfig as RewardClassifierConfig
+from .factory import (
+    get_reward_model_class as get_reward_model_class,
+    make_reward_model as make_reward_model,
+    make_reward_model_config as make_reward_model_config,
+    make_reward_pre_post_processors as make_reward_pre_post_processors,
+)
+from .pretrained import PreTrainedRewardModel as PreTrainedRewardModel
+from .sarm.configuration_sarm import SARMConfig as SARMConfig
+
+__all__ = [
+    # Configuration classes
+    "RewardClassifierConfig",
+    "SARMConfig",
+    # Base class
+    "PreTrainedRewardModel",
+    # Factory functions
+    "get_reward_model_class",
+    "make_reward_model",
+    "make_reward_model_config",
+    "make_reward_pre_post_processors",
+]
@@ -1,5 +1,3 @@
-# !/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -15,14 +13,15 @@
 # limitations under the License.
 from dataclasses import dataclass, field

-from lerobot.configs import NormalizationMode, PreTrainedConfig
+from lerobot.configs import NormalizationMode
+from lerobot.configs.rewards import RewardModelConfig
 from lerobot.optim import AdamWConfig, LRSchedulerConfig, OptimizerConfig
 from lerobot.utils.constants import OBS_IMAGE


-@PreTrainedConfig.register_subclass(name="reward_classifier")
+@RewardModelConfig.register_subclass(name="reward_classifier")
@dataclass
-class RewardClassifierConfig(PreTrainedConfig):
+class RewardClassifierConfig(RewardModelConfig):
    """Configuration for the Reward Classifier model."""

    name: str = "reward_classifier"
@@ -1,5 +1,3 @@
-# !/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -19,11 +17,10 @@ import logging
 import torch
 from torch import Tensor, nn

+from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig
+from lerobot.rewards.pretrained import PreTrainedRewardModel
 from lerobot.utils.constants import OBS_IMAGE, REWARD

-from ...pretrained import PreTrainedPolicy
-from .configuration_classifier import RewardClassifierConfig
-

 class ClassifierOutput:
    """Wrapper for classifier outputs with additional metadata."""
@@ -99,7 +96,7 @@ class SpatialLearnedEmbeddings(nn.Module):
        return output


-class Classifier(PreTrainedPolicy):
+class Classifier(PreTrainedRewardModel):
    """Image classifier built on top of a pre-trained encoder."""

    name = "reward_classifier"
@@ -235,6 +232,16 @@ class Classifier(PreTrainedPolicy):

        return ClassifierOutput(logits=logits, probabilities=probabilities, hidden_states=encoder_outputs)

+    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
+        """Returns 1.0 (success) or 0.0 (failure) for binary classifiers, else the predicted class index as a float."""
+        images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]
+        output = self.predict(images)
+
+        if self.config.num_classes == 2:
+            return (output.probabilities > 0.5).float()
+        else:
+            return torch.argmax(output.probabilities, dim=1).float()
+
    def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict[str, Tensor]]:
        """Standard forward pass for training compatible with train.py."""
        # Extract images and labels
@@ -269,10 +276,6 @@ class Classifier(PreTrainedPolicy):

    def predict_reward(self, batch, threshold=0.5):
        """Eval method. Returns predicted reward with the decision threshold as argument."""
-        # Check for both OBS_IMAGE and OBS_IMAGES prefixes
-        batch = self.normalize_inputs(batch)
-        batch = self.normalize_targets(batch)
-
        # Extract images from batch dict
        images = [batch[key] for key in self.config.input_features if key.startswith(OBS_IMAGE)]

@@ -282,28 +285,3 @@ class Classifier(PreTrainedPolicy):
            return (probs > threshold).float()
        else:
            return torch.argmax(self.predict(images).probabilities, dim=1)
-
-    def get_optim_params(self):
-        """Return optimizer parameters for the policy."""
-        return self.parameters()
-
-    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
-        """
-        This method is required by PreTrainedPolicy but not used for reward classifiers.
-        The reward classifier is not an actor and does not select actions.
-        """
-        raise NotImplementedError("Reward classifiers do not select actions")
-
-    def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
-        """
-        This method is required by PreTrainedPolicy but not used for reward classifiers.
-        The reward classifier is not an actor and does not produce action chunks.
-        """
-        raise NotImplementedError("Reward classifiers do not predict action chunks")
-
-    def reset(self):
-        """
-        This method is required by PreTrainedPolicy but not used for reward classifiers.
-        The reward classifier is not an actor and does not select actions.
-        """
-        pass
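Note on `Classifier.compute_reward`: the method maps classifier probabilities to a reward in two ways depending on `num_classes`. A small sketch of that mapping on plain lists (the function name `rewards_from_probabilities` is illustrative, and the `threshold` parameter is added here for clarity; `compute_reward` itself hard-codes 0.5):

```python
def rewards_from_probabilities(probabilities, num_classes, threshold=0.5):
    """Map classifier probabilities to per-sample rewards.

    Binary case: `probabilities` holds one success probability per sample.
    Multi-class case: `probabilities` holds one row of class probabilities per sample.
    """
    if num_classes == 2:
        # Strictly greater than the threshold counts as success.
        return [1.0 if p > threshold else 0.0 for p in probabilities]
    # Index of the largest probability in each row, returned as a float.
    return [float(max(range(len(row)), key=row.__getitem__)) for row in probabilities]


print(rewards_from_probabilities([0.9, 0.2, 0.5], num_classes=2))
# [1.0, 0.0, 0.0]
print(rewards_from_probabilities([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], num_classes=3))
# [1.0, 0.0]
```

In the multi-class case the returned value is a class index rather than a success flag, so callers that expect a 0/1 reward need to interpret it accordingly.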
@@ -1,5 +1,3 @@
-# !/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -27,8 +25,7 @@ from lerobot.processor import (
    policy_action_to_transition,
    transition_to_policy_action,
 )
-
-from .configuration_classifier import RewardClassifierConfig
+from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig


 def make_classifier_processor(
@@ -52,8 +49,6 @@ def make_classifier_processor(
    Args:
        config: The configuration object for the RewardClassifier.
        dataset_stats: A dictionary of statistics for normalization.
-        preprocessor_kwargs: Additional arguments for the pre-processor pipeline.
-        postprocessor_kwargs: Additional arguments for the post-processor pipeline.

    Returns:
        A tuple containing the configured pre-processor and post-processor pipelines.
@@ -0,0 +1,238 @@
+#!/usr/bin/env python
+
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import importlib
+import logging
+from typing import Any
+
+import torch
+
+from lerobot.configs.rewards import RewardModelConfig
+from lerobot.processor import PolicyAction, PolicyProcessorPipeline
+from lerobot.rewards.classifier.configuration_classifier import RewardClassifierConfig
+from lerobot.rewards.pretrained import PreTrainedRewardModel
+from lerobot.rewards.sarm.configuration_sarm import SARMConfig
+
+
+def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
+    """
+    Retrieves a reward model class by its registered name.
+
+    This function imports each reward model class lazily, so only the requested model's
+    module (and its optional dependencies) is loaded, which keeps startup time low.
+
+    Args:
+        name: The name of the reward model. Supported names are "reward_classifier",
+              "sarm".
+
+    Returns:
+        The reward model class corresponding to the given name.
+
+    Raises:
+        ValueError: If the reward model name is not recognized.
+    """
+    if name == "reward_classifier":
+        from lerobot.rewards.classifier.modeling_classifier import Classifier
+
+        return Classifier
+    elif name == "sarm":
+        from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel
+
+        return SARMRewardModel
+    else:
+        try:
+            return _get_reward_model_cls_from_name(name=name)
+        except Exception as e:
+            raise ValueError(f"Reward model type '{name}' is not available.") from e
+
+
+def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:
+    """
+    Instantiates a reward model configuration object based on the reward type.
+
+    This factory function simplifies the creation of reward model configuration objects
+    by mapping a string identifier to the corresponding config class.
+
+    Args:
+        reward_type: The type of the reward model. Supported types include
+                     "reward_classifier", "sarm".
+        **kwargs: Keyword arguments to be passed to the configuration class constructor.
+
+    Returns:
+        An instance of a `RewardModelConfig` subclass.
+
+    Raises:
+        ValueError: If the `reward_type` is not recognized.
+    """
+    if reward_type == "reward_classifier":
+        return RewardClassifierConfig(**kwargs)
+    elif reward_type == "sarm":
+        return SARMConfig(**kwargs)
+    else:
+        try:
+            config_cls = RewardModelConfig.get_choice_class(reward_type)
+            return config_cls(**kwargs)
+        except Exception as e:
+            raise ValueError(f"Reward model type '{reward_type}' is not available.") from e
+
+
+def make_reward_model(cfg: RewardModelConfig, **kwargs) -> PreTrainedRewardModel:
+    """
+    Instantiate a reward model from its configuration.
+
+    Args:
+        cfg: The configuration for the reward model to be created. If
+             `cfg.pretrained_path` is set, the model will be loaded with weights
+             from that path.
+        **kwargs: Additional keyword arguments forwarded to the model constructor
+            (e.g., ``dataset_stats``, ``dataset_meta``).
+
+    Returns:
+        An instantiated and device-placed reward model.
+    """
+    reward_cls = get_reward_model_class(cfg.type)
+
+    kwargs["config"] = cfg
+
+    if cfg.pretrained_path:
+        kwargs["pretrained_name_or_path"] = cfg.pretrained_path
+        reward_model = reward_cls.from_pretrained(**kwargs)
+    else:
+        reward_model = reward_cls(**kwargs)
+
+    reward_model.to(cfg.device)
+    assert isinstance(reward_model, torch.nn.Module)
+
+    return reward_model
+
+
+def make_reward_pre_post_processors(
+    reward_cfg: RewardModelConfig,
+    **kwargs,
+) -> tuple[
+    PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
+    PolicyProcessorPipeline[PolicyAction, PolicyAction],
+]:
+    """
+    Create pre- and post-processor pipelines for a given reward model.
+
+    Each reward model type has a dedicated factory function for its processors.
+
+    Args:
+        reward_cfg: The configuration of the reward model for which to create processors.
+        **kwargs: Additional keyword arguments passed to the processor factory
+            (e.g., ``dataset_stats``, ``dataset_meta``).
+
+    Returns:
+        A tuple containing the input (pre-processor) and output (post-processor) pipelines.
+
+    Raises:
+        ValueError: If a processor factory is not implemented for the given reward
+            model configuration type.
+    """
+    # Create a new processor based on reward model type
+    if isinstance(reward_cfg, RewardClassifierConfig):
+        from lerobot.rewards.classifier.processor_classifier import make_classifier_processor
+
+        return make_classifier_processor(
+            config=reward_cfg,
+            dataset_stats=kwargs.get("dataset_stats"),
+        )
+
+    elif isinstance(reward_cfg, SARMConfig):
+        from lerobot.rewards.sarm.processor_sarm import make_sarm_pre_post_processors
+
+        return make_sarm_pre_post_processors(
+            config=reward_cfg,
+            dataset_stats=kwargs.get("dataset_stats"),
+            dataset_meta=kwargs.get("dataset_meta"),
+        )
+
+    else:
+        try:
+            processors = _make_processors_from_reward_model_config(
+                config=reward_cfg,
+                dataset_stats=kwargs.get("dataset_stats"),
+            )
+        except Exception as e:
+            raise ValueError(
+                f"Processor for reward model type '{reward_cfg.type}' is not implemented."
+            ) from e
+        return processors
+
+
+def _get_reward_model_cls_from_name(name: str) -> type[PreTrainedRewardModel]:
+    """Get reward model class from its registered name using dynamic imports.
+
+    This helper imports reward models from third-party lerobot plugins, resolving the
+    model class from the config class name and the config module path.
+
+    Args:
+        name: The name of the reward model.
+
+    Returns:
+        The reward model class corresponding to the given name.
+    """
+    if name not in RewardModelConfig.get_known_choices():
+        raise ValueError(
+            f"Unknown reward model name '{name}'. "
+            f"Available reward models: {RewardModelConfig.get_known_choices()}"
+        )
+
+    config_cls = RewardModelConfig.get_choice_class(name)
+    config_cls_name = config_cls.__name__
+
+    model_name = config_cls_name.removesuffix("Config")
+    if model_name == config_cls_name:
+        raise ValueError(
+            f"The config class name '{config_cls_name}' does not follow the expected naming convention. "
+            f"Make sure it ends with 'Config'!"
+        )
+
+    cls_name = model_name + "RewardModel"
+    module_path = config_cls.__module__.replace("configuration_", "modeling_")
+
+    module = importlib.import_module(module_path)
+    reward_cls = getattr(module, cls_name)
+    return reward_cls
+
+
+def _make_processors_from_reward_model_config(
+    config: RewardModelConfig,
+    dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None,
+) -> tuple[Any, Any]:
+    """Create pre- and post-processors from a reward model configuration using dynamic imports.
+
+    This helper imports processor factories from third-party lerobot reward model
+    plugins.
+
+    Args:
+        config: The reward model configuration object.
+        dataset_stats: Dataset statistics for normalization.
+
+    Returns:
+        A tuple containing the input (pre-processor) and output (post-processor) pipelines.
+    """
+    reward_type = config.type
+    function_name = f"make_{reward_type}_pre_post_processors"
+    module_path = config.__class__.__module__.replace("configuration_", "processor_")
+    logging.debug(
+        f"Instantiating reward pre/post processors using function '{function_name}' "
+        f"from module '{module_path}'"
+    )
+    module = importlib.import_module(module_path)
+    function = getattr(module, function_name)
+    return function(config, dataset_stats=dataset_stats)
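Note on the plugin fallback in this factory: both private helpers derive import targets purely from naming conventions. The sketch below reproduces only the string handling, so it runs without lerobot installed. `resolve_plugin_names`, `ToyConfig`, and `my_plugin.configuration_toy` are hypothetical names used for illustration.

```python
def resolve_plugin_names(config_cls_name, config_module, reward_type):
    """Derive the names the factory would look up for a plugin reward model."""
    model_name = config_cls_name.removesuffix("Config")  # Python 3.9+
    if model_name == config_cls_name:
        raise ValueError(f"'{config_cls_name}' does not end with 'Config'")
    return {
        "model_class": model_name + "RewardModel",
        "model_module": config_module.replace("configuration_", "modeling_"),
        "processor_module": config_module.replace("configuration_", "processor_"),
        "processor_factory": f"make_{reward_type}_pre_post_processors",
    }


# Hypothetical third-party plugin that follows the convention.
names = resolve_plugin_names("ToyConfig", "my_plugin.configuration_toy", "toy")
print(names["model_class"])        # ToyRewardModel
print(names["model_module"])       # my_plugin.modeling_toy
print(names["processor_module"])   # my_plugin.processor_toy
print(names["processor_factory"])  # make_toy_pre_post_processors

# The built-in classifier does not follow the convention, which is why it has
# an explicit branch: the derived name is not the actual class name `Classifier`.
builtin = resolve_plugin_names(
    "RewardClassifierConfig",
    "lerobot.rewards.classifier.configuration_classifier",
    "reward_classifier",
)
print(builtin["model_class"])  # RewardClassifierRewardModel
```

A plugin therefore needs a `configuration_*` module with a `*Config` class, a sibling `modeling_*` module exposing `*RewardModel`, and a sibling `processor_*` module exposing `make_<type>_pre_post_processors`.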
@@ -0,0 +1,244 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import abc
+import builtins
+import logging
+import os
+from importlib.resources import files
+from pathlib import Path
+from tempfile import TemporaryDirectory
+from typing import TYPE_CHECKING, Any, TypeVar
+
+import packaging
+import safetensors
+from huggingface_hub import HfApi, ModelCard, ModelCardData, hf_hub_download
+from huggingface_hub.constants import SAFETENSORS_SINGLE_FILE
+from huggingface_hub.errors import HfHubHTTPError
+from safetensors.torch import load_model as load_model_as_safetensor, save_model as save_model_as_safetensor
+from torch import Tensor, nn
+
+from lerobot.configs.rewards import RewardModelConfig
+from lerobot.utils.hub import HubMixin
+
+if TYPE_CHECKING:
+    from lerobot.configs.train import TrainPipelineConfig
+
+T = TypeVar("T", bound="PreTrainedRewardModel")
+
+
+class PreTrainedRewardModel(nn.Module, HubMixin, abc.ABC):
+    """Base class for reward models."""
+
+    config_class: type[RewardModelConfig]
+    name: str
+
+    def __init__(self, config: RewardModelConfig, *inputs, **kwargs):
+        super().__init__()
+        if not isinstance(config, RewardModelConfig):
+            raise ValueError(
+                f"Parameter config in `{self.__class__.__name__}(config)` should be an instance of class "
+                "`RewardModelConfig`. To create a model from a pretrained model use "
+                f"`model = {self.__class__.__name__}.from_pretrained(PRETRAINED_MODEL_NAME)`"
+            )
+        self.config = config
+
+    def __init_subclass__(cls, **kwargs):
+        super().__init_subclass__(**kwargs)
+        if not getattr(cls, "config_class", None):
+            raise TypeError(f"Class {cls.__name__} must define 'config_class'")
+        if not getattr(cls, "name", None):
+            raise TypeError(f"Class {cls.__name__} must define 'name'")
+
+    def _save_pretrained(self, save_directory: Path) -> None:
+        self.config._save_pretrained(save_directory)
+        model_to_save = self.module if hasattr(self, "module") else self
+        save_model_as_safetensor(model_to_save, str(save_directory / SAFETENSORS_SINGLE_FILE))
+
+    @classmethod
+    def from_pretrained(
+        cls: builtins.type[T],
+        pretrained_name_or_path: str | Path,
+        *,
+        config: RewardModelConfig | None = None,
+        force_download: bool = False,
+        resume_download: bool | None = None,
+        proxies: dict | None = None,
+        token: str | bool | None = None,
+        cache_dir: str | Path | None = None,
+        local_files_only: bool = False,
+        revision: str | None = None,
+        strict: bool = False,
+        **kwargs,
+    ) -> T:
+        """
+        The reward model is set in evaluation mode by default using `reward.eval()` (dropout modules are
+        deactivated). To train it, you should first set it back in training mode with `reward.train()`.
+        """
+        if config is None:
+            config = RewardModelConfig.from_pretrained(
+                pretrained_name_or_path=pretrained_name_or_path,
+                force_download=force_download,
+                resume_download=resume_download,
+                proxies=proxies,
+                token=token,
+                cache_dir=cache_dir,
+                local_files_only=local_files_only,
+                revision=revision,
+                **kwargs,
+            )
+        model_id = str(pretrained_name_or_path)
+        instance = cls(config, **kwargs)
+        if os.path.isdir(model_id):
+            logging.info("Loading weights from local directory")
+            model_file = os.path.join(model_id, SAFETENSORS_SINGLE_FILE)
+            reward = cls._load_as_safetensor(instance, model_file, config.device or "cpu", strict)
+        else:
+            try:
+                model_file = hf_hub_download(
+                    repo_id=model_id,
+                    filename=SAFETENSORS_SINGLE_FILE,
+                    revision=revision,
+                    cache_dir=cache_dir,
+                    force_download=force_download,
+                    proxies=proxies,
+                    resume_download=resume_download,
+                    token=token,
+                    local_files_only=local_files_only,
+                )
+                reward = cls._load_as_safetensor(instance, model_file, config.device or "cpu", strict)
+            except HfHubHTTPError as e:
+                raise FileNotFoundError(
+                    f"{SAFETENSORS_SINGLE_FILE} not found on the HuggingFace Hub in {model_id}"
+                ) from e
+
+        reward.to(config.device)
+        reward.eval()
+        return reward
+
+    @classmethod
+    def _load_as_safetensor(cls, model: T, model_file: str, map_location: str, strict: bool) -> T:
+        # Create base kwargs
+        kwargs = {"strict": strict}
+
+        # Add device parameter for newer versions that support it
+        if packaging.version.parse(safetensors.__version__) >= packaging.version.parse("0.4.3"):
+            kwargs["device"] = map_location
+
+        # Load the model with appropriate kwargs
+        missing_keys, unexpected_keys = load_model_as_safetensor(model, model_file, **kwargs)
+        if missing_keys:
+            logging.warning(f"Missing key(s) when loading model: {missing_keys}")
+        if unexpected_keys:
+            logging.warning(f"Unexpected key(s) when loading model: {unexpected_keys}")
+
+        # For older versions, manually move to device if needed
+        if "device" not in kwargs and map_location != "cpu":
+            logging.warning(
+                "Loading model weights on devices other than 'cpu' is not supported natively in your version of safetensors."
+                " The model is therefore loaded on 'cpu' first and then copied to the device,"
+                " which makes loading slower."
+                " Please update safetensors to version 0.4.3 or above for improved performance."
+            )
+            model.to(map_location)
+        return model
+
+    def get_optim_params(self):
+        """
+        Returns the parameters to pass on to the optimizer (by default, all model parameters).
+        """
+        return self.parameters()
+
+    def reset(self) -> None:
+        """Reset any internal state."""
+        pass
+
+    @abc.abstractmethod
+    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
+        """Compute a scalar reward signal for a batch of observations.
+
+        Args:
+            batch: Dictionary containing at minimum observation tensors.
+                   May also contain "action", "next_observation.*", etc.
+
+        Returns:
+            Tensor of shape ``(batch_size,)`` with reward values.
+        """
+        ...
+
+    def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict[str, Any]]:
+        """Training forward pass — override for trainable reward models."""
+        raise NotImplementedError(
+            f"{self.__class__.__name__} is not trainable. Only use compute_reward() for inference."
+        )
+
+    @property
+    def is_trainable(self) -> bool:
+        """Whether this reward model can be trained via ``lerobot-train``.
+
+        Trainable reward models override :meth:`forward`; zero-shot models
+        inherit the base implementation that raises ``NotImplementedError``.
+        """
+        return type(self).forward is not PreTrainedRewardModel.forward
+
+    def push_model_to_hub(self, cfg: "TrainPipelineConfig"):
+        api = HfApi()
+        repo_id = api.create_repo(
+            repo_id=self.config.repo_id, private=self.config.private, exist_ok=True
+        ).repo_id
+
+        # Push the files to the repo in a single commit
+        with TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
+            saved_path = Path(tmp) / repo_id
+
+            self.save_pretrained(saved_path)  # Calls _save_pretrained and stores model tensors
+
+            card = self.generate_model_card(
+                cfg.dataset.repo_id, self.config.type, self.config.license, self.config.tags
+            )
+            card.save(str(saved_path / "README.md"))
+
+            cfg.save_pretrained(saved_path)  # Calls _save_pretrained and stores train config
+
+            commit_info = api.upload_folder(
+                repo_id=repo_id,
+                repo_type="model",
+                folder_path=saved_path,
+                commit_message="Upload reward model weights, train config and readme",
+                allow_patterns=["*.safetensors", "*.json", "*.yaml", "*.md"],
+                ignore_patterns=["*.tmp", "*.log"],
+            )
+
+            logging.info(f"Model pushed to {commit_info.repo_url.url}")
+
+    def generate_model_card(
+        self, dataset_repo_id: str, model_type: str, license: str | None, tags: list[str] | None
+    ) -> ModelCard:
+        card_data = ModelCardData(
+            license=license or "apache-2.0",
+            library_name="lerobot",
+            pipeline_tag="robotics",
+            tags=list(set(tags or []).union({"robotics", "lerobot", "reward-model", model_type})),
+            model_name=model_type,
+            datasets=dataset_repo_id,
+        )
+
+        template_card = (
+            files("lerobot.templates")
+            .joinpath("lerobot_rewardmodel_modelcard_template.md")
+            .read_text(encoding="utf-8")
+        )
+        card = ModelCard.from_template(card_data, template_str=template_card)
+        card.validate()
+        return card
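
The `is_trainable` property above works by comparing the `forward` function found on the concrete class with the one defined on the base class. A minimal standalone sketch of that pattern follows; the class names `BaseReward`, `ZeroShotReward` and `TrainableReward` are hypothetical stand-ins and not part of lerobot.

```python
import abc


class BaseReward(abc.ABC):
    """Hypothetical stand-in for a reward-model base class."""

    @abc.abstractmethod
    def compute_reward(self, batch: dict) -> list[float]:
        ...

    def forward(self, batch: dict):
        # Base implementation: zero-shot models are not trainable.
        raise NotImplementedError(f"{type(self).__name__} is not trainable.")

    @property
    def is_trainable(self) -> bool:
        # A subclass that overrides forward() exposes a different function
        # object on its class than the one stored on the base class.
        return type(self).forward is not BaseReward.forward


class ZeroShotReward(BaseReward):
    def compute_reward(self, batch: dict) -> list[float]:
        # Dummy reward: one zero per sample in the batch.
        return [0.0 for _ in batch["index"]]


class TrainableReward(ZeroShotReward):
    def forward(self, batch: dict):
        # Overriding forward() is what flips is_trainable to True.
        rewards = self.compute_reward(batch)
        return sum(rewards), {"n": len(rewards)}
```

The check needs no extra flag on subclasses: inheriting the base `forward` keeps `is_trainable` false, and any override anywhere in the class hierarchy makes it true.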
@@ -1,4 +1,4 @@
-# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -14,5 +14,6 @@

 from .configuration_sarm import SARMConfig
 from .modeling_sarm import SARMRewardModel
+from .processor_sarm import make_sarm_pre_post_processors

-__all__ = ["SARMConfig", "SARMRewardModel"]
+__all__ = ["SARMConfig", "SARMRewardModel", "make_sarm_pre_post_processors"]
@@ -25,18 +25,18 @@ need ~num_frames/30 queries instead of one per frame (~30x speedup).

 Usage:
    # Full RA-BC computation with visualizations
-    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
+    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4

    # Faster computation with stride (compute every 5 frames, interpolate the rest)
-    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
+    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --stride 5

    # Visualize predictions only (no RA-BC computation)
-    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
+    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --visualize-only \\
@@ -58,10 +58,9 @@ import torch
 from tqdm import tqdm

 from lerobot.datasets import LeRobotDataset
-
-from .modeling_sarm import SARMRewardModel
-from .processor_sarm import make_sarm_pre_post_processors
-from .sarm_utils import normalize_stage_tau
+from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel
+from lerobot.rewards.sarm.processor_sarm import make_sarm_pre_post_processors
+from lerobot.rewards.sarm.sarm_utils import normalize_stage_tau


 def get_reward_model_path_from_parquet(parquet_path: Path) -> str | None:
@@ -713,12 +712,12 @@ def main():
        epilog="""
 Examples:
    # Full RA-BC computation with visualizations
-    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
+    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4

    # Visualize predictions only (no RA-BC computation)
-    python src/lerobot/policies/sarm/compute_rabc_weights.py \\
+    python src/lerobot/rewards/sarm/compute_rabc_weights.py \\
        --dataset-repo-id lerobot/aloha_sim_insertion_human \\
        --reward-model-path <USER>/sarm_single_uni4 \\
        --visualize-only \\
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 # Copyright 2025 Qianzhong Chen, Justin Yu, Mac Schwager, Pieter Abbeel, Yide Shentu, Philipp Wu
 # and The HuggingFace Inc. team. All rights reserved.
 #
@@ -22,14 +20,15 @@ Paper: https://arxiv.org/abs/2509.25358

 from dataclasses import dataclass, field

-from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature, PreTrainedConfig
+from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature
+from lerobot.configs.rewards import RewardModelConfig
 from lerobot.optim import AdamWConfig, CosineDecayWithWarmupSchedulerConfig
 from lerobot.utils.constants import OBS_IMAGES, OBS_STATE


-@PreTrainedConfig.register_subclass("sarm")
+@RewardModelConfig.register_subclass("sarm")
@dataclass
-class SARMConfig(PreTrainedConfig):
+class SARMConfig(RewardModelConfig):
    """Configuration class for SARM (Stage-Aware Reward Modeling).

    Supports three annotation modes:
@@ -110,7 +109,6 @@ class SARMConfig(PreTrainedConfig):

    def __post_init__(self):
        super().__post_init__()
-
        if self.annotation_mode not in ["single_stage", "dense_only", "dual"]:
            raise ValueError(
                f"annotation_mode must be 'single_stage', 'dense_only', or 'dual', got {self.annotation_mode}"
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 # Copyright 2025 Qianzhong Chen, Justin Yu, Mac Schwager, Pieter Abbeel, Yide Shentu, Philipp Wu
 # and The HuggingFace Inc. team. All rights reserved.
 #
@@ -34,14 +32,13 @@ import torch.nn as nn
 import torch.nn.functional as F  # noqa: N812
 from torch import Tensor

-from lerobot.utils.constants import OBS_STR
-
-from ..pretrained import PreTrainedPolicy
-from .configuration_sarm import SARMConfig
-from .sarm_utils import (
+from lerobot.rewards.pretrained import PreTrainedRewardModel
+from lerobot.rewards.sarm.configuration_sarm import SARMConfig
+from lerobot.rewards.sarm.sarm_utils import (
    normalize_stage_tau,
    pad_state_to_max_dim,
 )
+from lerobot.utils.constants import OBS_STR


 class StageTransformer(nn.Module):
@@ -353,7 +350,7 @@ def gen_stage_emb(num_classes: int, targets: torch.Tensor) -> torch.Tensor:
    return stage_onehot


-class SARMRewardModel(PreTrainedPolicy):
+class SARMRewardModel(PreTrainedRewardModel):
    """
    SARM Reward Model for stage-aware task completion rewards.

@@ -471,6 +468,23 @@ class SARMRewardModel(PreTrainedPolicy):
        self.subtask_model.to(device)
        return self

+    def compute_reward(self, batch: dict[str, Tensor]) -> Tensor:
+        """Compute dense progress reward in [0, 1] from batch.
+
+        Expects batch to contain:
+        - "observation_features" (or fallback key "video_features"): (B, T, 512)
+        - "language_embedding" (or fallback key "text_features"): (B, 512)
+        - optionally "observation.state" (or "state_features"): (B, T, state_dim)
+        """
+        text_emb = batch.get("language_embedding", batch.get("text_features"))
+        video_emb = batch.get("observation_features", batch.get("video_features"))
+        state = batch.get("observation.state", batch.get("state_features"))
+
+        rewards = self.calculate_rewards(text_emb, video_emb, state)
+        if isinstance(rewards, np.ndarray):
+            rewards = torch.from_numpy(rewards).float()
+        return rewards
+
    @torch.no_grad()
    def calculate_rewards(
        self,
@@ -631,17 +645,9 @@ class SARMRewardModel(PreTrainedPolicy):
        return self.parameters()

    def reset(self):
-        """Required by PreTrainedPolicy but not used for reward models."""
+        """SARM has no episode-level state to reset."""
        pass

-    def predict_action_chunk(self, batch: dict[str, Tensor]) -> Tensor:
-        """Required by PreTrainedPolicy but not used for reward models."""
-        raise NotImplementedError("SARM model does not predict action chunks")
-
-    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
-        """Required by PreTrainedPolicy but not used for SARM."""
-        raise NotImplementedError("SARM model does not select actions")
-
    def _train_step(
        self,
        img_emb: torch.Tensor,  # (B, N, T, D)
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -60,16 +58,15 @@ from lerobot.processor import (
    policy_action_to_transition,
    transition_to_policy_action,
 )
-from lerobot.types import EnvTransition, PolicyAction, TransitionKey
-from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME
-
-from .configuration_sarm import SARMConfig
-from .sarm_utils import (
+from lerobot.rewards.sarm.configuration_sarm import SARMConfig
+from lerobot.rewards.sarm.sarm_utils import (
    apply_rewind_augmentation,
    compute_absolute_indices,
    find_stage_and_tau,
    pad_state_to_max_dim,
 )
+from lerobot.types import EnvTransition, PolicyAction, TransitionKey
+from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME


 class SARMEncodingProcessorStep(ProcessorStep):
@@ -455,7 +452,13 @@ class SARMEncodingProcessorStep(ProcessorStep):
            inputs = {k: v.to(self.device) for k, v in inputs.items()}

            # Get image embeddings
-            embeddings = self.clip_model.get_image_features(**inputs).detach().cpu()
+            # transformers 5.x returns BaseModelOutputWithPooling instead of a plain tensor
+            output = self.clip_model.get_image_features(**inputs)
+            if not isinstance(output, torch.Tensor):
+                output = output.pooler_output
+                if output is None:
+                    raise ValueError("pooler_output should not be None for CLIP models.")
+            embeddings = output.detach().cpu()

            # Handle single frame case
            if embeddings.dim() == 1:
@@ -482,7 +485,13 @@ class SARMEncodingProcessorStep(ProcessorStep):
        inputs = self.clip_processor.tokenizer([text], return_tensors="pt", padding=True, truncation=True)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

-        text_embedding = self.clip_model.get_text_features(**inputs).detach().cpu()
+        # transformers 5.x returns BaseModelOutputWithPooling instead of a plain tensor
+        output = self.clip_model.get_text_features(**inputs)
+        if not isinstance(output, torch.Tensor):
+            output = output.pooler_output
+            if output is None:
+                raise ValueError("pooler_output should not be None for CLIP models.")
+        text_embedding = output.detach().cpu()
        text_embedding = text_embedding.expand(batch_size, -1)

        return text_embedding
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -14,14 +12,38 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+"""
+RA-BC (Reward-Aligned Behavior Cloning) sample weighting implementation.
+
+This module implements the SampleWeighter protocol for RA-BC training,
+which weights training samples based on their task progress as measured
+by the SARM reward model.
+
+The weights are computed based on progress deltas:
+    delta = progress[t + chunk_size] - progress[t]
+
+High-quality samples (positive progress) get higher weights, while
+samples with negative progress (going backwards) get zero weight.
+
+See: https://arxiv.org/abs/2509.25358 for the SARM paper.
+"""
+
 import logging
 from pathlib import Path
+from typing import TYPE_CHECKING

 import numpy as np
-import pandas as pd
 import torch
 from huggingface_hub import hf_hub_download

+from lerobot.utils.import_utils import _pandas_available
+from lerobot.utils.sample_weighting import SampleWeighter
+
+if TYPE_CHECKING or _pandas_available:
+    import pandas as pd
+else:
+    pd = None  # type: ignore[assignment]
+

 def resolve_hf_path(path: str | Path) -> Path:
    """Resolve a path that may be a HuggingFace URL (hf://datasets/...) to a local path."""
@@ -34,23 +56,27 @@ def resolve_hf_path(path: str | Path) -> Path:
    return Path(path)


-class RABCWeights:
+class RABCWeights(SampleWeighter):
    """
    Load precomputed SARM progress values and compute RA-BC weights during training.

+    This class implements the SampleWeighter ABC for use with the generic
+    sample weighting infrastructure in lerobot.
+
    Progress values are loaded from a parquet file (generated by compute_rabc_weights.py).
    During training, computes:
        - progress_delta = progress[t + chunk_size] - progress[t]
        - rabc_weight based on the delta (paper Eq. 8-9)

    Args:
-        progress_path: Path to parquet file with precomputed progress values
-        chunk_size: Number of frames ahead for computing progress delta
-        head_mode: Which SARM head to use ("sparse" or "dense")
-        kappa: Hard threshold for high-quality samples (default: 0.01)
-        epsilon: Small constant for numerical stability (default: 1e-6)
-        fallback_weight: Weight to use for frames without valid delta (default: 1.0)
-        device: Device to return tensors on
+        progress_path: Path to parquet file with precomputed progress values.
+                      Supports HuggingFace URLs (hf://datasets/...).
+        chunk_size: Number of frames ahead for computing progress delta.
+        head_mode: Which SARM head to use ("sparse" or "dense").
+        kappa: Hard threshold for high-quality samples (default: 0.01).
+        epsilon: Small constant for numerical stability (default: 1e-6).
+        fallback_weight: Weight to use for frames without valid delta (default: 1.0).
+        device: Device to return tensors on.
    """

    def __init__(
@@ -61,7 +87,7 @@ class RABCWeights:
        kappa: float = 0.01,
        epsilon: float = 1e-6,
        fallback_weight: float = 1.0,
-        device: torch.device = None,
+        device: torch.device | None = None,
    ):
        self.progress_path = resolve_hf_path(progress_path)
        self.chunk_size = chunk_size
@@ -87,8 +113,8 @@ class RABCWeights:

        logging.info(f"Using progress column: {self.progress_column}")

-        self.progress_lookup = {}
-        self.episode_lookup = {}
+        self.progress_lookup: dict[int, float] = {}
+        self.episode_lookup: dict[int, int] = {}

        for _, row in self.df.iterrows():
            global_idx = int(row["index"])
@@ -100,7 +126,7 @@ class RABCWeights:
            self.episode_lookup[global_idx] = episode_idx

        # Build episode boundaries for delta computation
-        self.episode_boundaries = {}
+        self.episode_boundaries: dict[int, dict[str, int]] = {}
        for episode_idx in self.df["episode_index"].unique():
            ep_df = self.df[self.df["episode_index"] == episode_idx]
            self.episode_boundaries[int(episode_idx)] = {
@@ -114,7 +140,7 @@ class RABCWeights:
        # Compute global statistics for weight computation
        self._compute_global_stats()

-    def _compute_global_stats(self):
+    def _compute_global_stats(self) -> None:
        """Compute global mean and std of progress deltas for weight calculation."""
        all_deltas = []

@@ -138,8 +164,8 @@ class RABCWeights:
                all_deltas.append(delta)

        if all_deltas:
-            self.delta_mean = max(np.mean(all_deltas), 0.0)
-            self.delta_std = max(np.std(all_deltas), self.epsilon)
+            self.delta_mean = max(float(np.mean(all_deltas)), 0.0)
+            self.delta_std = max(float(np.std(all_deltas)), self.epsilon)
            logging.info(f"Progress delta stats: mean={self.delta_mean:.4f}, std={self.delta_std:.4f}")
        else:
            self.delta_mean = 0.0
@@ -157,18 +183,19 @@ class RABCWeights:
        4. Compute weight using paper Eq. 8-9

        Args:
-            batch: Training batch containing "index" key with global frame indices
+            batch: Training batch containing "index" key with global frame indices.

        Returns:
            Tuple of:
-            - Weights tensor (batch_size,) normalized to sum to batch_size
-            - Stats dict with raw_mean_weight, num_zero_weight, num_full_weight
+            - Weights tensor (batch_size,) normalized to sum to batch_size.
+            - Stats dict with weighting statistics for logging.
        """
        indices = batch.get("index")
        if indices is None:
            logging.warning("RA-BC: Batch missing 'index' key, using uniform weights")
            batch_size = self._get_batch_size(batch)
-            return torch.ones(batch_size, device=self.device), {"raw_mean_weight": 1.0}
+            stats = {"mean_weight": 1.0, "num_zero_weight": 0, "num_full_weight": batch_size}
+            return torch.ones(batch_size, device=self.device), stats

        # Convert to list of ints
        if isinstance(indices, torch.Tensor):
@@ -183,29 +210,29 @@ class RABCWeights:
            delta = self._compute_delta(idx)
            deltas.append(delta)

-        deltas = np.array(deltas, dtype=np.float32)
+        deltas_array = np.array(deltas, dtype=np.float32)

        # Compute weights from deltas
-        weights = self._compute_weights(deltas)
+        weights = self._compute_weights(deltas_array)

        # Compute stats before normalization for logging
        raw_mean_weight = float(np.nanmean(weights))
        num_zero_weight = int(np.sum(weights == 0))
        num_full_weight = int(np.sum(weights == 1.0))
        batch_stats = {
-            "raw_mean_weight": raw_mean_weight,
+            "mean_weight": raw_mean_weight,
            "num_zero_weight": num_zero_weight,
            "num_full_weight": num_full_weight,
        }

-        weights = torch.tensor(weights, device=self.device, dtype=torch.float32)
+        weights_tensor = torch.tensor(weights, device=self.device, dtype=torch.float32)

        # Normalize to sum to batch_size
-        batch_size = len(weights)
-        weight_sum = weights.sum() + self.epsilon
-        weights = weights * batch_size / weight_sum
+        batch_size = len(weights_tensor)
+        weight_sum = weights_tensor.sum() + self.epsilon
+        weights_tensor = weights_tensor * batch_size / weight_sum

-        return weights, batch_stats
+        return weights_tensor, batch_stats

    def _compute_delta(self, global_idx: int) -> float:
        """Compute progress delta for a single frame."""
@@ -241,7 +268,7 @@ class RABCWeights:
        - Final weight: wi = 1{ri > κ} + 1{0 ≤ ri ≤ κ}˜wi

        Returns:
-            Array of weights
+            Array of weights.
        """
        valid_mask = ~np.isnan(deltas)
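
The final-weight rule quoted in the docstring above, together with the "normalize to sum to batch_size" step in the weight-computation method, can be sketched in plain Python. This is an illustration only: the function names are hypothetical, the soft weights are taken as an input rather than derived (their formula is not shown in this hunk), and applying `fallback_weight` to NaN deltas is an assumption based on the class docstring.

```python
import math


def combine_rabc_weights(deltas, soft_weights, kappa=0.01, fallback_weight=1.0):
    """Apply w = 1{r > kappa} + 1{0 <= r <= kappa} * soft_w per sample."""
    weights = []
    for delta, soft in zip(deltas, soft_weights):
        if math.isnan(delta):
            # No valid delta for this frame (assumed fallback behaviour).
            weights.append(fallback_weight)
        elif delta > kappa:
            weights.append(1.0)   # clearly positive progress: full weight
        elif delta >= 0.0:
            weights.append(soft)  # small positive progress: soft weight
        else:
            weights.append(0.0)   # negative progress: dropped
    return weights


def normalize_to_batch_size(weights, epsilon=1e-6):
    """Rescale weights so they sum (approximately) to len(weights)."""
    total = sum(weights) + epsilon
    return [w * len(weights) / total for w in weights]
```

Normalizing to the batch size keeps the mean weight close to one, so the weighted loss stays on roughly the same scale as an unweighted one.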

@@ -273,12 +300,13 @@ class RABCWeights:
            if key in batch:
                val = batch[key]
                if isinstance(val, (torch.Tensor, np.ndarray)):
-                    return val.shape[0]
+                    return int(val.shape[0])
        return 1

    def get_stats(self) -> dict:
-        """Get statistics."""
+        """Get global statistics about the RA-BC weighting."""
        return {
+            "type": "rabc",
            "num_frames": len(self.progress_lookup),
            "chunk_size": self.chunk_size,
            "head_mode": self.head_mode,
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -193,15 +193,15 @@ def convert_lerobot_dataset_to_cropped_lerobot_dataset(
        fps=int(original_dataset.fps),
        root=new_dataset_root,
        robot_type=original_dataset.meta.robot_type,
-        features=original_dataset.meta.info["features"],
+        features=original_dataset.meta.info.features,
        use_videos=len(original_dataset.meta.video_keys) > 0,
    )

    # Update the metadata for every image key that will be cropped:
    # (Here we simply set the shape to be the final resize_size.)
    for key in crop_params_dict:
-        if key in new_dataset.meta.info["features"]:
-            new_dataset.meta.info["features"][key]["shape"] = [3] + list(resize_size)
+        if key in new_dataset.meta.info.features:
+            new_dataset.meta.info.features[key]["shape"] = (3, *resize_size)

    # TODO:  Directly modify the mp4 video + meta info features, instead of recreating a dataset
    prev_episode_index = 0
@@ -68,9 +68,16 @@ class SOFollower(Robot):

    @property
    def _cameras_ft(self) -> dict[str, tuple]:
-        return {
-            cam: (self.config.cameras[cam].height, self.config.cameras[cam].width, 3) for cam in self.cameras
-        }
+        features: dict[str, tuple] = {}
+        for cam in self.cameras:
+            cam_cfg = self.config.cameras[cam]
+            features[cam] = (cam_cfg.height, cam_cfg.width, 3)
+            # Cameras with a depth stream (e.g. RealSense with use_depth=True) also
+            # emit a 2D depth feature; hw_to_dataset_features routes 2D shapes to
+            # ``observation.depth.<bare>`` with the depth-map marker.
+            if getattr(cam_cfg, "use_depth", False):
+                features[f"{cam}_depth"] = (cam_cfg.height, cam_cfg.width)
+        return features

    @cached_property
    def observation_features(self) -> dict[str, type | tuple]:
@@ -190,6 +197,14 @@ class SOFollower(Robot):
            dt_ms = (time.perf_counter() - start) * 1e3
            logger.debug(f"{self} read {cam_key}: {dt_ms:.1f}ms")

+            # Cameras with a depth stream populate a sibling ``<cam>_depth`` key
+            # (consumed by hw_to_dataset_features / build_dataset_frame).
+            if getattr(self.config.cameras[cam_key], "use_depth", False):
+                start = time.perf_counter()
+                obs_dict[f"{cam_key}_depth"] = cam.read_latest_depth()
+                dt_ms = (time.perf_counter() - start) * 1e3
+                logger.debug(f"{self} read {cam_key} depth: {dt_ms:.1f}ms")
+
        return obs_dict

    @check_if_not_connected
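
The depth-aware feature mapping added to `_cameras_ft` above can be tried in isolation. The sketch below assumes a hypothetical `CameraCfg` dataclass with `height`, `width` and `use_depth` fields; the real camera config classes may differ.

```python
from dataclasses import dataclass


@dataclass
class CameraCfg:
    """Hypothetical camera config used only for this illustration."""

    height: int
    width: int
    use_depth: bool = False


def camera_features(cameras: dict[str, CameraCfg]) -> dict[str, tuple]:
    features: dict[str, tuple] = {}
    for name, cfg in cameras.items():
        # Colour stream: (H, W, 3).
        features[name] = (cfg.height, cfg.width, 3)
        # Depth stream, if enabled: a sibling 2D feature (H, W).
        if getattr(cfg, "use_depth", False):
            features[f"{name}_depth"] = (cfg.height, cfg.width)
    return features
```

Using `getattr` with a `False` default means camera configs that have no `use_depth` attribute at all are treated as colour-only.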
@@ -75,7 +75,7 @@ class SentryStrategyConfig(RolloutStrategyConfig):
    # Target video file size in MB for episode rotation.  Episodes are
    # saved once the estimated video duration would exceed this limit.
    # Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when set to None.
-    target_video_file_size_mb: float | None = None
+    target_video_file_size_mb: int | None = None


@RolloutStrategyConfig.register_subclass("highlight")
@@ -90,7 +90,7 @@ class HighlightStrategyConfig(RolloutStrategyConfig):
    """

    ring_buffer_seconds: float = 10.0
-    ring_buffer_max_memory_mb: float = 1024.0
+    ring_buffer_max_memory_mb: int = 1024
    save_key: str = "s"
    push_key: str = "h"

@@ -150,7 +150,7 @@ class DAggerStrategyConfig(RolloutStrategyConfig):
    upload_every_n_episodes: int = 5
    # Target video file size in MB for episode rotation (record_autonomous
    # mode only).  Defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB when None.
-    target_video_file_size_mb: float | None = None
+    target_video_file_size_mb: int | None = None
    input_device: str = "keyboard"
    keyboard: DAggerKeyboardConfig = field(default_factory=DAggerKeyboardConfig)
    pedal: DAggerPedalConfig = field(default_factory=DAggerPedalConfig)
@@ -209,6 +209,12 @@ class RolloutConfig:
    # Rename map for mapping robot/dataset observation keys to policy keys
    rename_map: dict[str, str] = field(default_factory=dict)

+    # Hardware teardown
+    # When True (default), smoothly interpolate the robot back to the joint
+    # positions captured at startup before disconnecting.  Set to False to
+    # leave the robot in its final achieved pose at shutdown.
+    return_to_initial_position: bool = True
+
    # Torch compile
    use_torch_compile: bool = False
    torch_compile_backend: str = "inductor"
@@ -27,7 +27,7 @@ from threading import Event

 import torch

-from lerobot.configs import FeatureType, PreTrainedConfig
+from lerobot.configs import FeatureType
 from lerobot.datasets import (
    LeRobotDataset,
    aggregate_pipeline_dataset_features,
@@ -43,6 +43,7 @@ from lerobot.processor import (
    make_default_processors,
    rename_stats,
 )
+from lerobot.processor.relative_action_processor import RelativeActionsProcessorStep
 from lerobot.robots import make_robot_from_config
 from lerobot.teleoperators import Teleoperator, make_teleoperator_from_config
 from lerobot.utils.feature_utils import combine_feature_dicts, hw_to_dataset_features
@@ -51,6 +52,7 @@ from .configs import BaseStrategyConfig, DAggerStrategyConfig, RolloutConfig
 from .inference import (
    InferenceEngine,
    RTCInferenceConfig,
+    SyncInferenceConfig,
    create_inference_engine,
 )
 from .robot_wrapper import ThreadSafeRobot
@@ -176,33 +178,26 @@ def build_rollout_context(
    policy_config = cfg.policy
    policy_class = get_policy_class(policy_config.type)

-    full_config = PreTrainedConfig.from_pretrained(cfg.policy.pretrained_path)
-    for attr in ("device", "use_amp"):
-        if hasattr(cfg.policy, attr) and hasattr(full_config, attr):
-            cli_val = getattr(cfg.policy, attr)
-            if cli_val is not None:
-                setattr(full_config, attr, cli_val)
+    if hasattr(policy_config, "compile_model"):
+        policy_config.compile_model = cfg.use_torch_compile

-    if hasattr(full_config, "compile_model"):
-        full_config.compile_model = cfg.use_torch_compile
-
-    if full_config.type == "vqbet" and cfg.device == "mps":
+    if policy_config.type == "vqbet" and cfg.device == "mps":
        raise NotImplementedError(
            "Current implementation of VQBeT does not support `mps` backend. "
            "Please use `cpu` or `cuda` backend."
        )

-    if full_config.use_peft:
+    if policy_config.use_peft:
        from peft import PeftConfig, PeftModel

-        peft_path = cfg.policy.pretrained_path
+        peft_path = policy_config.pretrained_path
        peft_config = PeftConfig.from_pretrained(peft_path)
        policy = policy_class.from_pretrained(
-            pretrained_name_or_path=peft_config.base_model_name_or_path, config=full_config
+            pretrained_name_or_path=peft_config.base_model_name_or_path, config=policy_config
        )
        policy = PeftModel.from_pretrained(policy, peft_path, config=peft_config)
    else:
-        policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=full_config)
+        policy = policy_class.from_pretrained(policy_config.pretrained_path, config=policy_config)

    if is_rtc:
        policy.config.rtc_config = cfg.inference.rtc
@@ -257,10 +252,12 @@ def build_rollout_context(
        teleop.connect()
        logger.info("Teleoperator connected")

-    # DAgger requires teleop with motor control capabilities (enable_torque,
-    # disable_torque, write_goal_positions).
-    # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-    # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+    # TODO(Steven): once Teleoperator motor-control methods are standardised
+    # (``enable_torque`` / ``disable_torque`` / ``write_goal_positions``), gate
+    # the DAgger strategy on their presence here and fail fast with a helpful
+    # message instead of relying on the operator to pre-align the leader by
+    # hand.  See :func:`DAggerStrategy._apply_transition` for the matching
+    # disabled call sites.
    # if isinstance(cfg.strategy, DAggerStrategyConfig) and teleop is not None:
    #     required_teleop_methods = ("enable_torque", "disable_torque", "write_goal_positions")
    #     missing = [m for m in required_teleop_methods if not callable(getattr(teleop, m, None))]
@@ -272,10 +269,13 @@ def build_rollout_context(
    #         )

    # --- 4. Features + action-key reconciliation ---------------------
-    # TODO(Steven): Only `.pos` joint features are used for policy inference — velocity and
-    # torque channels are observation-only and must be excluded from the state
-    # and action tensors that the policy sees.
+    # TODO(Steven): Only ``.pos`` joint features are routed to the policy as state and as the
+    # action target; velocity and torque channels (when present) are kept in
+    # the raw observation but excluded from the policy-facing tensors.
    all_obs_features = robot.observation_features
+    # ``observation_features`` values are either a tuple (camera shape) or the
+    # ``float`` type itself used as a sentinel for scalar motor features —
+    # see ``dict[str, type | tuple]`` annotation on ``Robot.observation_features``.
    observation_features_hw = {
        k: v
        for k, v in all_obs_features.items()
@@ -308,7 +308,9 @@ def build_rollout_context(
    # Validate visual features if no rename_map is active
    rename_map = cfg.rename_map
    if not rename_map:
-        expected_visuals = {k for k, v in full_config.input_features.items() if v.type == FeatureType.VISUAL}
+        expected_visuals = {
+            k for k, v in policy_config.input_features.items() if v.type == FeatureType.VISUAL
+        }
        provided_visuals = {
            f"observation.images.{k}" for k, v in robot.observation_features.items() if isinstance(v, tuple)
        }
@@ -353,6 +355,7 @@ def build_rollout_context(
                    "Use --dataset.repo_id=<user>/rollout_<name> for policy deployment datasets."
                )
            cfg.dataset.stamp_repo_id()
+            target_video_mb = getattr(cfg.strategy, "target_video_file_size_mb", None)
            dataset = LeRobotDataset.create(
                cfg.dataset.repo_id,
                cfg.dataset.fps,
@@ -368,6 +371,7 @@ def build_rollout_context(
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
                encoder_threads=cfg.dataset.encoder_threads,
+                video_files_size_in_mb=target_video_mb,
            )

    if dataset is not None:
@@ -391,6 +395,15 @@ def build_rollout_context(
        },
    )

+    if isinstance(cfg.inference, SyncInferenceConfig) and any(
+        isinstance(step, RelativeActionsProcessorStep) and step.enabled
+        for step in getattr(preprocessor, "steps", ())
+    ):
+        raise NotImplementedError(
+            "SyncInferenceEngine does not support policies with relative actions for now. "
+            "Use --inference.type=rtc or remove relative action processor steps from the policy pipeline."
+        )
+
    # --- 7. Inference strategy (needs policy + pre/post + hardware) --
    logger.info(
        "Creating inference engine (type=%s)...",
@@ -14,8 +14,8 @@

 """Inference engine package — backend-agnostic action production.

-Concrete strategies (sync, RTC, …) expose the same small interface so
-rollout strategies never branch on the inference backend.
+Concrete backends (``sync``, ``rtc``, ...) expose the same small interface so
+rollout strategies never branch on which backend is in use.
 """

 from .base import InferenceEngine
@@ -15,8 +15,8 @@
 """Inference engine ABC.

 Rollout strategies consume actions through this small interface so they
-do not need to know whether the inference engine is synchronous, runs in
-a background thread (RTC), or comes from an external source.
+do not need to know whether inference happens inline on the control thread
+or asynchronously in a background thread (RTC).
 """

 from __future__ import annotations
@@ -29,9 +29,10 @@ import torch
 class InferenceEngine(abc.ABC):
    """Abstract backend for producing actions during rollout.

-    Subclasses decide whether inference happens inline, in a background
-    thread, or externally. The contract is minimal so new backends can
-    be added without touching rollout strategies.
+    Subclasses decide whether inference happens inline on the control
+    thread or asynchronously in a background thread.  The contract is
+    minimal so additional backends can be plugged in without touching
+    rollout strategies.

    Lifecycle
    ---------
@@ -43,8 +44,8 @@ class InferenceEngine(abc.ABC):
    -----------------
    ``get_action(obs_frame)`` — return the next action tensor, or
    ``None`` if none is available (e.g. async queue empty).  Sync
-    backends always compute from ``obs_frame``; async backends may
-    ignore it (they get observations via ``notify_observation``).
+    backends always compute from ``obs_frame``; async backends ignore
+    it (they receive observations via ``notify_observation``).

    Optional hooks
    --------------
@@ -68,9 +68,8 @@ class SyncInferenceConfig(InferenceEngineConfig):
 class RTCInferenceConfig(InferenceEngineConfig):
    """Real-Time Chunking: async policy inference in a background thread."""

-    # ``RTCConfig`` is a small dataclass with default-only fields, so eagerly
-    # constructing one here costs nothing and keeps draccus' CLI surface flat
-    # (``--inference.rtc.execution_horizon=...`` etc.).  No need to lazy-init.
+    # Eagerly constructed so draccus exposes nested fields directly on the CLI
+    # (e.g. ``--inference.rtc.execution_horizon=...``).
    rtc: RTCConfig = field(default_factory=RTCConfig)
    queue_threshold: int = 30

@@ -32,18 +32,14 @@ from typing import Any
 import torch

 from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.policies.rtc import ActionQueue, LatencyTracker
+from lerobot.policies.rtc import ActionQueue, LatencyTracker, reanchor_relative_rtc_prefix
 from lerobot.policies.rtc.configuration_rtc import RTCConfig
 from lerobot.policies.utils import prepare_observation_for_inference
 from lerobot.processor import (
    NormalizerProcessorStep,
    PolicyProcessorPipeline,
    RelativeActionsProcessorStep,
-    TransitionKey,
-    create_transition,
-    to_relative_actions,
 )
-from lerobot.utils.constants import OBS_STATE
 from lerobot.utils.feature_utils import build_dataset_frame

 from ..robot_wrapper import ThreadSafeRobot
@@ -66,35 +62,6 @@ _RTC_JOIN_TIMEOUT_S: float = 3.0
 # ---------------------------------------------------------------------------


-def _reanchor_relative_rtc_prefix(
-    prev_actions_absolute: torch.Tensor,
-    current_state: torch.Tensor,
-    relative_step: RelativeActionsProcessorStep,
-    normalizer_step: NormalizerProcessorStep | None,
-    policy_device: torch.device | str,
-) -> torch.Tensor:
-    """Convert absolute leftover actions into model-space for relative-action RTC policies.
-
-    When using relative actions, the RTC prefix (previous chunk's unexecuted tail)
-    is stored in absolute coordinates. Before feeding it back to the policy, this
-    helper re-expresses those actions relative to the robot's current joint state
-    and optionally normalizes them so the policy receives correctly scaled inputs.
-    """
-    state = current_state.detach().cpu()
-    if state.dim() == 1:
-        state = state.unsqueeze(0)
-
-    action_cpu = prev_actions_absolute.detach().cpu()
-    mask = relative_step._build_mask(action_cpu.shape[-1])
-    relative_actions = to_relative_actions(action_cpu, state, mask)
-
-    transition = create_transition(action=relative_actions)
-    if normalizer_step is not None:
-        transition = normalizer_step(transition)
-
-    return transition[TransitionKey.ACTION].to(policy_device)
-
-
 def _normalize_prev_actions_length(prev_actions: torch.Tensor, target_steps: int) -> torch.Tensor:
    """Pad or truncate RTC prefix actions to a fixed length for stable compiled inference."""
    if prev_actions.ndim != 2:
@@ -109,21 +76,6 @@ def _normalize_prev_actions_length(prev_actions: torch.Tensor, target_steps: int
    return padded


-def _get_current_raw_state(
-    relative_step: RelativeActionsProcessorStep,
-    fallback_state: torch.Tensor | None,
-) -> torch.Tensor | None:
-    """Return the current raw state cached by the relative-action step.
-
-    ``RelativeActionsProcessorStep`` caches the observation state before any
-    observation normalization. Re-anchoring RTC leftovers must use that raw
-    state rather than the normalized observation that the policy consumes.
-    """
-    if relative_step._last_state is not None:
-        return relative_step._last_state
-    return fallback_state
-
-
 # ---------------------------------------------------------------------------
 # RTCInferenceEngine
 # ---------------------------------------------------------------------------
@@ -333,15 +285,15 @@ class RTCInferenceEngine(InferenceEngine):
                        preprocessed = self._preprocessor(obs_batch)

                        if prev_actions is not None and self._relative_step is not None:
-                            state_tensor = _get_current_raw_state(
-                                self._relative_step, obs_batch.get(OBS_STATE)
-                            )
-                            if state_tensor is not None:
+                            # Rebase against the raw cached state so the leftover tail stays in
+                            # the training-time coordinate frame.
+                            raw_state = self._relative_step.get_cached_state()
+                            if raw_state is not None:
                                prev_abs = queue.get_processed_left_over()
                                if prev_abs is not None and prev_abs.numel() > 0:
-                                    prev_actions = _reanchor_relative_rtc_prefix(
+                                    prev_actions = reanchor_relative_rtc_prefix(
                                        prev_actions_absolute=prev_abs,
-                                        current_state=state_tensor,
+                                        current_state=raw_state,
                                        relative_step=self._relative_step,
                                        normalizer_step=self._normalizer_step,
                                        policy_device=policy_device,
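The re-anchoring step in this hunk can be pictured with plain lists. The sketch below assumes that "relative" means `action - state` on the masked dimensions and leaves the other dimensions absolute; it omits the optional normalization and the device transfer. `reanchor_to_state` is a hypothetical helper, not the library function.

```python
def reanchor_to_state(prev_absolute, current_state, relative_mask):
    """Re-express absolute leftover actions relative to ``current_state``.

    prev_absolute: list of per-step action vectors (T x D)
    current_state: raw (un-normalized) state vector of length D
    relative_mask: D booleans, True where the dimension is relative
    """
    return [
        [a - s if is_rel else a for a, s, is_rel in zip(step, current_state, relative_mask)]
        for step in prev_absolute
    ]


# Two leftover steps, three action dimensions; the last one stays absolute.
leftover = [[12.0, 5.0, 1.0], [14.0, 6.0, 1.0]]
state = [10.0, 4.0, 0.0]
mask = [True, True, False]
rebased = reanchor_to_state(leftover, state, mask)
```

Using the raw cached state matters here: subtracting a normalized state from absolute actions would mix two coordinate frames.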
@@ -17,28 +17,41 @@
 from __future__ import annotations

 import logging
-from collections import deque
 from contextlib import nullcontext
 from copy import copy

 import torch

 from lerobot.policies.pretrained import PreTrainedPolicy
-from lerobot.policies.utils import prepare_observation_for_inference
-from lerobot.processor import PolicyProcessorPipeline, RelativeActionsProcessorStep
-from lerobot.utils.constants import ACTION
+from lerobot.policies.utils import make_robot_action, prepare_observation_for_inference
+from lerobot.processor import PolicyProcessorPipeline

 from .base import InferenceEngine

 logger = logging.getLogger(__name__)


+# TODO(Steven): support relative-action policies.  The per-tick flow refreshes
+# ``RelativeActionsProcessorStep._last_state`` every call, so cached chunk
+# actions popped on later ticks get reanchored to the *current* robot state and
+# absolute targets drift through the chunk.  Relative-action policies are
+# rejected at context-build time today; RTC postprocesses the whole chunk and
+# is unaffected.
+#
+# Candidate fix: drive the policy via ``predict_action_chunk`` and serve a
+# local FIFO of postprocessed actions.  Eliminates drift by construction and
+# saves per-tick pre/post work, but bypasses ``select_action`` — needs
+# fallbacks for SAC (raises), ACT temporal ensembling (ensembler lives in
+# ``select_action``), and Diffusion-family (obs-history queues populated as a
+# side effect of ``select_action``).
+
+
 class SyncInferenceEngine(InferenceEngine):
    """Inline synchronous inference: compute one action per call.

-    ``get_action`` runs the full policy pipeline when its local action
-    queue is empty, postprocesses the whole predicted chunk immediately,
-    and then returns one already-postprocessed CPU action at a time.
+    ``get_action`` runs the full policy pipeline (pre/post-processor +
+    ``select_action``) on the given observation frame and returns a
+    CPU action tensor reordered to match the dataset action keys.
    """

    def __init__(
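The drift described in the TODO above can be reproduced with a one-dimensional toy. The sketch assumes each chunk entry is an offset from the state at prediction time and that the robot reaches every target exactly. Both function names are hypothetical.

```python
def targets_with_fixed_anchor(relative_chunk, anchor_state):
    """Intended behaviour: every offset is applied to the state at prediction time."""
    return [anchor_state + delta for delta in relative_chunk]


def targets_with_per_tick_anchor(relative_chunk, initial_state):
    """Problematic behaviour: each popped offset is applied to the *current* state."""
    state = initial_state
    targets = []
    for delta in relative_chunk:
        state = state + delta  # the robot moves, then the next offset stacks on top
        targets.append(state)
    return targets


chunk = [1.0, 2.0, 3.0]
intended = targets_with_fixed_anchor(chunk, 10.0)
drifted = targets_with_per_tick_anchor(chunk, 10.0)
```

The first target agrees in both cases; the error accumulates from the second cached action onward, which is why the guard rejects relative-action policies for the sync backend.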
@@ -60,19 +73,6 @@ class SyncInferenceEngine(InferenceEngine):
        self._task = task
        self._device = torch.device(device or "cpu")
        self._robot_type = robot_type
-        self._processed_action_queue: deque[torch.Tensor] = deque()
-
-        self._relative_step = next(
-            (s for s in preprocessor.steps if isinstance(s, RelativeActionsProcessorStep) and s.enabled),
-            None,
-        )
-        if self._relative_step is not None and self._relative_step.action_names is None:
-            cfg_names = getattr(policy.config, "action_feature_names", None)
-            action_names = cfg_names or dataset_features.get(ACTION, {}).get("names")
-            if action_names:
-                self._relative_step.action_names = list(action_names)
-            logger.info("Relative actions enabled: sync chunks will be postprocessed before queueing")
-
        logger.info(
            "SyncInferenceEngine initialized (device=%s, action_keys=%d)",
            self._device,
@@ -93,29 +93,9 @@ class SyncInferenceEngine(InferenceEngine):
        self._policy.reset()
        self._preprocessor.reset()
        self._postprocessor.reset()
-        self._processed_action_queue.clear()
-
-    def _enqueue_processed_chunk(self, action_chunk: torch.Tensor) -> None:
-        """Queue postprocessed per-step actions in policy output order."""
-        if action_chunk.ndim == 2:
-            action_chunk = action_chunk.unsqueeze(0)
-
-        n_action_steps = getattr(self._policy.config, "n_action_steps", action_chunk.shape[1])
-        action_chunk = action_chunk[:, : min(n_action_steps, action_chunk.shape[1])]
-
-        for action in action_chunk.squeeze(0):
-            action_tensor = action.detach().cpu()
-            if len(action_tensor) != len(self._ordered_action_keys):
-                raise ValueError(
-                    f"Action tensor length ({len(action_tensor)}) != action keys "
-                    f"({len(self._ordered_action_keys)})"
-                )
-            self._processed_action_queue.append(action_tensor)

    def get_action(self, obs_frame: dict | None) -> torch.Tensor | None:
        """Run the full inference pipeline on ``obs_frame`` and return an action tensor."""
-        if self._processed_action_queue:
-            return self._processed_action_queue.popleft().clone()
        if obs_frame is None:
            return None
        # Shallow copy is intentional: the caller (`send_next_action`) builds
@@ -132,10 +112,11 @@ class SyncInferenceEngine(InferenceEngine):
                observation, self._device, self._task, self._robot_type
            )
            observation = self._preprocessor(observation)
-            action_chunk = self._policy.predict_action_chunk(observation)
-            processed_chunk = self._postprocessor(action_chunk)
+            action = self._policy.select_action(observation)
+            action = self._postprocessor(action)
+        action_tensor = action.squeeze(0).cpu()

-        self._enqueue_processed_chunk(processed_chunk)
-        if not self._processed_action_queue:
-            return None
-        return self._processed_action_queue.popleft().clone()
+        # Reorder to match dataset action ordering so the caller can treat
+        # the returned tensor uniformly across backends.
+        action_dict = make_robot_action(action_tensor, self._dataset_features)
+        return torch.tensor([action_dict[k] for k in self._ordered_action_keys])
@@ -47,7 +47,7 @@ class RolloutRingBuffer:
        count.
    """

-    def __init__(self, max_seconds: float = 30.0, max_memory_mb: float = 2048.0, fps: float = 30.0) -> None:
+    def __init__(self, max_seconds: float = 30.0, max_memory_mb: int = 2048, fps: float = 30.0) -> None:
        self._max_frames = int(max_seconds * fps)
        self._max_bytes = int(max_memory_mb * 1024 * 1024)
        self._buffer: deque[dict] = deque(maxlen=self._max_frames)
@@ -60,8 +60,7 @@ class BaseStrategy(RolloutStrategy):
                break

            obs = robot.get_observation()
-            obs_processed = ctx.processors.robot_observation_processor(obs)
-            engine.notify_observation(obs_processed)
+            obs_processed = self._process_observation_and_notify(ctx.processors, obs)

            if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
                continue
@@ -79,5 +78,8 @@ class BaseStrategy(RolloutStrategy):

    def teardown(self, ctx: RolloutContext) -> None:
        """Disconnect hardware and stop inference."""
-        self._teardown_hardware(ctx.hardware)
+        self._teardown_hardware(
+            ctx.hardware,
+            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
+        )
        logger.info("Base strategy teardown complete")
@@ -32,7 +32,7 @@ from ..inference import InferenceEngine

 if TYPE_CHECKING:
    from ..configs import RolloutStrategyConfig
-    from ..context import HardwareContext, RolloutContext, RuntimeContext
+    from ..context import HardwareContext, ProcessorContext, RolloutContext, RuntimeContext

 logger = logging.getLogger(__name__)

@@ -50,6 +50,7 @@ class RolloutStrategy(abc.ABC):
        self._engine: InferenceEngine | None = None
        self._interpolator: ActionInterpolator | None = None
        self._warmup_flushed: bool = False
+        self._cached_obs_processed: dict | None = None

    def _init_engine(self, ctx: RolloutContext) -> None:
        """Attach the inference engine and action interpolator, then start the backend.
@@ -65,8 +66,32 @@ class RolloutStrategy(abc.ABC):
        self._engine.reset()
        self._engine.start()
        self._warmup_flushed = False
+        self._cached_obs_processed = None
        logger.info("Inference engine started")

+    def _process_observation_and_notify(self, processors: ProcessorContext, obs_raw: dict) -> dict:
+        """Run the observation processor and notify the engine — throttled to policy ticks.
+
+        Callers are responsible for calling ``robot.get_observation()`` every loop
+        iteration so ``obs_raw`` stays fresh for the action post-processor.  This
+        helper gates only the comparatively expensive bits — the processor pipeline
+        and ``engine.notify_observation`` — to fire when the interpolator signals
+        it needs a new action (once per ``interpolation_multiplier`` ticks).  On
+        interpolated ticks the cached ``obs_processed`` is reused.
+
+        With ``interpolation_multiplier == 1`` this is equivalent to the unthrottled
+        path: ``needs_new_action()`` is True every tick.
+
+        The cache is implicitly invalidated whenever ``interpolator.reset()`` is
+        called (warmup completion, DAgger phase transitions back to AUTONOMOUS),
+        because reset makes ``needs_new_action()`` return True on the next call.
+        """
+        if self._cached_obs_processed is None or self._interpolator.needs_new_action():
+            obs_processed = processors.robot_observation_processor(obs_raw)
+            self._engine.notify_observation(obs_processed)
+            self._cached_obs_processed = obs_processed
+        return self._cached_obs_processed
+
    def _handle_warmup(self, use_torch_compile: bool, loop_start: float, control_interval: float) -> bool:
        """Handle torch.compile warmup phase.
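The throttling in `_process_observation_and_notify` can be exercised in isolation. In the sketch below, `StubInterpolator` and `ThrottledObservationProcessor` are hypothetical stand-ins; the stub assumes `needs_new_action()` is a side-effect-free query that is True once every `multiplier` ticks.

```python
class StubInterpolator:
    """Hypothetical interpolator: wants a new action every ``multiplier`` ticks."""

    def __init__(self, multiplier):
        self.multiplier = multiplier
        self.tick = 0

    def needs_new_action(self):
        return self.tick % self.multiplier == 0

    def advance(self):
        self.tick += 1


class ThrottledObservationProcessor:
    """Run ``process`` and ``notify`` only on policy ticks; reuse the cache otherwise."""

    def __init__(self, process, notify, interpolator):
        self._process = process
        self._notify = notify
        self._interpolator = interpolator
        self._cached = None

    def __call__(self, obs_raw):
        if self._cached is None or self._interpolator.needs_new_action():
            processed = self._process(obs_raw)
            self._notify(processed)
            self._cached = processed
        return self._cached


processed_ticks = []
notified = []


def process(obs_raw):
    processed_ticks.append(obs_raw["t"])
    return {"t": obs_raw["t"]}


interp = StubInterpolator(multiplier=3)
throttled = ThrottledObservationProcessor(process, notified.append, interp)
seen = []
for t in range(6):
    seen.append(throttled({"t": t})["t"])
    interp.advance()
```

Over six ticks with a multiplier of 3, the processor and the notification each fire twice, and the four interpolated ticks reuse the cached result.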

@@ -91,16 +116,20 @@ class RolloutStrategy(abc.ABC):
            engine.resume()
        return False

-    def _teardown_hardware(self, hw: HardwareContext) -> None:
-        """Stop the inference engine, return robot to initial position, and disconnect hardware."""
+    def _teardown_hardware(self, hw: HardwareContext, return_to_initial_position: bool = True) -> None:
+        """Stop the inference engine, optionally return robot to initial position, and disconnect hardware."""
        if self._engine is not None:
            logger.info("Stopping inference engine...")
            self._engine.stop()
        robot = hw.robot_wrapper.inner
        if robot.is_connected:
-            if hw.initial_position:
+            if return_to_initial_position and hw.initial_position:
                logger.info("Returning robot to initial position before shutdown...")
                self._return_to_initial_position(hw)
+            elif not return_to_initial_position:
+                logger.info(
+                    "Skipping return-to-initial-position (disabled by config); leaving robot in final pose."
+                )
            logger.info("Disconnecting robot...")
            robot.disconnect()
        teleop = hw.teleop
@@ -194,7 +223,7 @@ def estimate_max_episode_seconds(
    The estimate ignores codec-specific settings (CRF, preset) on purpose:
    we only need a rough lower bound on bitrate, not a precise prediction.

-    Falls back to 600 s (10 min) when no video features are present.
+    Falls back to 300 s (5 min) when no video features are present.
    """
    # 0.1 bits-per-pixel is a *low* estimate for CRF-30 streaming video of
    # robot footage (real-world is typically 0.1 – 0.3 bpp).  Under-
@@ -208,16 +237,16 @@ def estimate_max_episode_seconds(
        if feat.get("dtype") == "video":
            shape = feat.get("shape", ())

-            # Assuming shape could be (C, H, W) or (T, C, H, W)
-            # We want to extract the spatial dimensions.
-            if len(shape) >= 3:
-                h, w = shape[-2], shape[-1]
-                pixels = h * w
-                if pixels > 0:
-                    camera_pixels.append(pixels)
+            # (H, W, C) — bits-per-pixel is a per-spatial-pixel metric,
+            # so we exclude the channel dimension from the count.
+            if len(shape) == 3:
+                pixels = shape[0] * shape[1]
+                camera_pixels.append(pixels)
+            else:
+                raise ValueError(f"Unexpected video feature shape: {shape}")

    if not camera_pixels:
-        return 600.0
+        return 300.0

    # Use the smallest camera: it produces the lowest bitrate and therefore
    # takes the longest to reach the target — the conservative choice.
@@ -227,7 +256,7 @@ def estimate_max_episode_seconds(

    # Guard against division by zero just in case
    if bytes_per_second <= 0:
-        return 600.0
+        return 300.0

    return (target_size_mb * 1024 * 1024) / bytes_per_second
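A standalone sketch of the estimate follows. The diff does not show how `bytes_per_second` is derived, so the formula used here (`pixels * bits_per_pixel * fps / 8`) is an assumption, as are the function name and the `fps` and `bits_per_pixel` parameters.

```python
def estimate_episode_seconds(features, fps, target_size_mb, bits_per_pixel=0.1, fallback_s=300.0):
    """Rough time for the smallest camera stream to reach ``target_size_mb``."""
    camera_pixels = []
    for feat in features.values():
        if feat.get("dtype") != "video":
            continue
        shape = feat.get("shape", ())
        if len(shape) != 3:
            raise ValueError(f"Unexpected video feature shape: {shape}")
        height, width, _channels = shape  # (H, W, C): channels are not counted
        camera_pixels.append(height * width)

    if not camera_pixels:
        return fallback_s

    # Assumed bitrate model: bits per pixel per frame, converted to bytes per second.
    bytes_per_second = min(camera_pixels) * bits_per_pixel * fps / 8
    if bytes_per_second <= 0:
        return fallback_s
    return (target_size_mb * 1024 * 1024) / bytes_per_second


features = {
    "observation.images.front": {"dtype": "video", "shape": (480, 640, 3)},
    "observation.state": {"dtype": "float32", "shape": (6,)},
}
seconds = estimate_episode_seconds(features, fps=30, target_size_mb=200)
```

Under these assumptions a single 480x640 camera at 30 fps takes roughly half an hour to fill a 200 MB file, so rotation is infrequent at typical resolutions.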

@@ -24,14 +24,21 @@ the ``input_device`` config field.  Each device exposes three actions:
    1. **pause_resume** — Toggle policy execution (AUTONOMOUS <-> PAUSED).
    2. **correction**   — Toggle correction recording (PAUSED <-> CORRECTING).
    3. **upload**        — Push dataset to hub on demand (corrections-only mode).
-    ESC (keyboard only)  — Stop session.
+    ESC (keyboard only) — Stop session.

-Recording Modes:
+Recording modes:
    ``record_autonomous=True``:  Sentry-like continuous recording with
        time-based episode rotation.  Both autonomous and correction
        frames are recorded; corrections tagged ``intervention=True``.
    ``record_autonomous=False``: Only correction windows are recorded.
        Each correction (start to stop) becomes one episode.
+
+Teleoperator expectations:
+    The user is responsible for keeping the leader arm aligned with the
+    follower arm at the moment a correction begins.  Programmatic motor
+    handover (``enable_torque`` / ``disable_torque`` / ``write_goal_positions``)
+    is intentionally not invoked here — see the TODO in
+    :meth:`DAggerStrategy._apply_transition` for the open design decision.
 """

 from __future__ import annotations
@@ -168,8 +175,10 @@ class DAggerEvents:
 # ---------------------------------------------------------------------------


-# TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-# user is responsible for moving the teleop to the same position as the robot when starting the correction.
+# TODO(Steven): re-enable programmatic teleop alignment once we decide whether
+# to enforce motor-control methods on every Teleoperator.  Until then the user
+# is responsible for moving the leader arm to the follower's pose at the moment
+# a correction begins.
 def _teleop_smooth_move_to(
    teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50
 ) -> None:
@@ -371,7 +380,10 @@ class DAggerStrategy(RolloutStrategy):
                    logger.info("Dataset uploaded to hub")
                    log_say("Dataset uploaded to hub", play_sounds)

-        self._teardown_hardware(ctx.hardware)
+        self._teardown_hardware(
+            ctx.hardware,
+            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
+        )
        logger.info("DAgger strategy teardown complete")

    # ------------------------------------------------------------------
@@ -403,8 +415,8 @@ class DAggerStrategy(RolloutStrategy):
        engine.reset()
        interpolator.reset()
        events.reset()
-        # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-        # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+        # TODO(Steven): re-enable once Teleoperator motor-control methods are
+        # standardised; until then the user pre-aligns the leader by hand.
        # teleop.disable_torque()
        engine.resume()

@@ -434,19 +446,22 @@ class DAggerStrategy(RolloutStrategy):

                    phase = events.phase
                    obs = robot.get_observation()
-                    obs_processed = ctx.processors.robot_observation_processor(obs)
-                    obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)

                    # --- CORRECTING: human teleop control ---
+                    # TODO(Steven): teleop runs at the same FPS as the policy. To
+                    # decouple the two, sample teleop at its native rate and
+                    # interpolate to the control loop's tick rate.
                    if phase == DAggerPhase.CORRECTING:
+                        obs_processed = ctx.processors.robot_observation_processor(obs)
                        teleop_action = teleop.get_action()
                        processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
                        robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
                        robot.send_action(robot_action_to_send)
                        last_action = robot_action_to_send
                        self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)
-                        action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
                        if record_tick % record_stride == 0:
+                            obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
+                            action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
                            frame = {
                                **obs_frame,
                                **action_frame,
@@ -463,7 +478,7 @@ class DAggerStrategy(RolloutStrategy):

                    # --- AUTONOMOUS: policy control ---
                    else:
-                        engine.notify_observation(obs_processed)
+                        obs_processed = self._process_observation_and_notify(ctx.processors, obs)

                        if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
                            continue
@@ -472,8 +487,9 @@ class DAggerStrategy(RolloutStrategy):
                        if action_dict is not None:
                            self._log_telemetry(obs_processed, action_dict, ctx.runtime)
                            last_action = ctx.processors.robot_action_processor((action_dict, obs))
-                            action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
                            if record_tick % record_stride == 0:
+                                obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
+                                action_frame = build_dataset_frame(features, action_dict, prefix=ACTION)
                                frame = {
                                    **obs_frame,
                                    **action_frame,
@@ -483,9 +499,9 @@ class DAggerStrategy(RolloutStrategy):
                                dataset.add_frame(frame)
                            record_tick += 1

-                    # Episode rotation derived from video file-size target.
-                    # Do NOT save mid-correction — wait for the correction
-                    # to finish so the episode boundary is clean.
+                    # Episode rotation derived from the video file-size target.
+                    # Saving is deferred while a correction is ongoing so the
+                    # episode boundary lands on a clean autonomous frame.
                    elapsed = time.perf_counter() - episode_start
                    if elapsed >= episode_duration_s and phase != DAggerPhase.CORRECTING:
                        with self._episode_lock:
@@ -516,8 +532,8 @@ class DAggerStrategy(RolloutStrategy):
            finally:
                logger.info("DAgger continuous control loop ended — pausing engine")
                engine.pause()
-                # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-                # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+                # TODO(Steven): re-enable once Teleoperator motor-control methods
+                # are standardised across all teleop implementations.
                # teleop.disable_torque()
                with contextlib.suppress(Exception):
                    with self._episode_lock:
@@ -554,8 +570,8 @@ class DAggerStrategy(RolloutStrategy):
        engine.reset()
        interpolator.reset()
        events.reset()
-        # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-        # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+        # TODO(Steven): re-enable once Teleoperator motor-control methods are
+        # standardised; until then the user pre-aligns the leader by hand.
        # teleop.disable_torque()
        engine.resume()

@@ -608,10 +624,13 @@ class DAggerStrategy(RolloutStrategy):

                    phase = events.phase
                    obs = robot.get_observation()
-                    obs_processed = ctx.processors.robot_observation_processor(obs)

                    # --- CORRECTING: human teleop control + recording ---
+                    # TODO(Steven): teleop runs at the same FPS as the policy. To
+                    # decouple the two, sample teleop at its native rate and
+                    # interpolate to the control loop's tick rate.
                    if phase == DAggerPhase.CORRECTING:
+                        obs_processed = ctx.processors.robot_observation_processor(obs)
                        teleop_action = teleop.get_action()
                        processed_teleop = ctx.processors.teleop_action_processor((teleop_action, obs))
                        robot_action_to_send = ctx.processors.robot_action_processor((processed_teleop, obs))
@@ -619,9 +638,9 @@ class DAggerStrategy(RolloutStrategy):
                        last_action = robot_action_to_send
                        self._log_telemetry(obs_processed, processed_teleop, ctx.runtime)

-                        obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
-                        action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
                        if record_tick % record_stride == 0:
+                            obs_frame = build_dataset_frame(features, obs_processed, prefix=OBS_STR)
+                            action_frame = build_dataset_frame(features, processed_teleop, prefix=ACTION)
                            dataset.add_frame(
                                {
                                    **obs_frame,
@@ -639,7 +658,7 @@ class DAggerStrategy(RolloutStrategy):

                    # --- AUTONOMOUS: policy control (no recording) ---
                    else:
-                        engine.notify_observation(obs_processed)
+                        obs_processed = self._process_observation_and_notify(ctx.processors, obs)

                        if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
                            continue
@@ -660,8 +679,8 @@ class DAggerStrategy(RolloutStrategy):
            finally:
                logger.info("DAgger corrections-only loop ended — pausing engine")
                engine.pause()
-                # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-                # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+                # TODO(Steven): re-enable once Teleoperator motor-control methods
+                # are standardised across all teleop implementations.
                # teleop.disable_torque()
                with contextlib.suppress(Exception):
                    with self._episode_lock:
@@ -691,15 +710,16 @@ class DAggerStrategy(RolloutStrategy):
            _robot_pos = {
                k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features
            }
-            # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-            # user is responsible for moving the teleop to the same position as the robot when starting the correction.
-            # Consider also a method that moves the robot to the teleop smoothly (similar to what we do at HW shutdown).
-            # _teleop_smooth_move_to(teleop, robot_pos, duration_s=2.0, fps=50)
+            # TODO(Steven): once Teleoperator motor-control methods are
+            # standardised, drive the leader to the follower's pose here so the
+            # operator does not need to pre-align the arm by hand.  Until then
+            # the user is responsible for the alignment.
+            # _teleop_smooth_move_to(teleop, _robot_pos, duration_s=2.0, fps=50)

        elif new_phase == DAggerPhase.CORRECTING:
            logger.info("Entering correction mode — human teleop control")
-            # TODO(Steven): either enforce this (meaning all teleop must implement these methods) or
-            # user is responsible for moving the teleop to the same position as the robot when starting the correction.
+            # TODO(Steven): re-enable once Teleoperator motor-control methods
+            # are standardised across all teleop implementations.
            # teleop.disable_torque()

        elif new_phase == DAggerPhase.AUTONOMOUS:
@@ -64,8 +64,8 @@ class HighlightStrategy(RolloutStrategy):
    3. The episode is saved and the ring buffer resumes capturing.

    Requires ``streaming_encoding=True`` (enforced in config validation)
-    so that ``dataset.add_frame`` is a non-blocking queue put — draining
-    900 frames stays sub-ms per frame.
+    so that ``dataset.add_frame`` is a non-blocking queue put — flushing
+    the entire ring buffer in one tick must not stall the control loop.
    """

    config: HighlightStrategyConfig
@@ -135,8 +135,7 @@ class HighlightStrategy(RolloutStrategy):
                        break

                    obs = robot.get_observation()
-                    obs_processed = ctx.processors.robot_observation_processor(obs)
-                    engine.notify_observation(obs_processed)
+                    obs_processed = self._process_observation_and_notify(ctx.processors, obs)

                    if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
                        continue
@@ -228,7 +227,10 @@ class HighlightStrategy(RolloutStrategy):
                    logger.info("Dataset uploaded to hub")
                    log_say("Dataset uploaded to hub", play_sounds)

-        self._teardown_hardware(ctx.hardware)
+        self._teardown_hardware(
+            ctx.hardware,
+            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
+        )
        logger.info("Highlight strategy teardown complete")

    def _setup_keyboard(self, shutdown_event: ThreadingEvent) -> None:
@@ -111,8 +111,7 @@ class SentryStrategy(RolloutStrategy):
                        break

                    obs = robot.get_observation()
-                    obs_processed = ctx.processors.robot_observation_processor(obs)
-                    engine.notify_observation(obs_processed)
+                    obs_processed = self._process_observation_and_notify(ctx.processors, obs)

                    if self._handle_warmup(cfg.use_torch_compile, loop_start, control_interval):
                        continue
@@ -197,7 +196,10 @@ class SentryStrategy(RolloutStrategy):
                    logger.info("Dataset uploaded to hub")
                    log_say("Dataset uploaded to hub", play_sounds)

-        self._teardown_hardware(ctx.hardware)
+        self._teardown_hardware(
+            ctx.hardware,
+            return_to_initial_position=ctx.runtime.cfg.return_to_initial_position,
+        )
        logger.info("Sentry strategy teardown complete")

    def _background_push(self, dataset, cfg) -> None:
@@ -70,6 +70,7 @@ from lerobot.datasets.io_utils import (
    get_parquet_file_size_in_mb,
    get_parquet_num_frames,
    load_info,
+    load_json,
    write_episodes,
    write_info,
    write_stats,
@@ -81,9 +82,11 @@ from lerobot.datasets.utils import (
    DEFAULT_DATA_PATH,
    DEFAULT_VIDEO_FILE_SIZE_IN_MB,
    DEFAULT_VIDEO_PATH,
+    INFO_PATH,
    LEGACY_EPISODES_PATH,
    LEGACY_EPISODES_STATS_PATH,
    LEGACY_TASKS_PATH,
+    DatasetInfo,
    update_chunk_file_indices,
 )
 from lerobot.datasets.video_utils import concatenate_video_files, get_video_duration_in_s
@@ -165,7 +168,7 @@ def legacy_load_tasks(local_dir: Path) -> tuple[dict, dict]:
 def validate_local_dataset_version(local_path: Path) -> None:
    """Validate that the local dataset has the expected v2.1 version."""
    info = load_info(local_path)
-    dataset_version = info.get("codebase_version", "unknown")
+    dataset_version = info.codebase_version or "unknown"
    if dataset_version != V21:
        raise ValueError(
            f"Local dataset has codebase version '{dataset_version}', expected '{V21}'. "
@@ -256,14 +259,14 @@ def convert_data(root: Path, new_root: Path, data_file_size_in_mb: int):

 def get_video_keys(root):
    info = load_info(root)
-    features = info["features"]
+    features = info.features
    video_keys = [key for key, ft in features.items() if ft["dtype"] == "video"]
    return video_keys


 def get_image_keys(root):
    info = load_info(root)
-    features = info["features"]
+    features = info.features
    image_keys = [key for key, ft in features.items() if ft["dtype"] == "image"]
    return image_keys

@@ -434,7 +437,8 @@ def convert_episodes_metadata(root, new_root, episodes_metadata, episodes_video_


 def convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb):
-    info = load_info(root)
+    # Load as raw dict to remove legacy v2.1 fields before constructing DatasetInfo.
+    info = load_json(root / INFO_PATH)
    info["codebase_version"] = V30
    del info["total_chunks"]
    del info["total_videos"]
@@ -449,7 +453,9 @@ def convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb):
            # already has fps in video_info
            continue
        info["features"][key]["fps"] = info["fps"]
-    write_info(info, new_root)
+    # Convert raw dict to typed DatasetInfo before writing
+    dataset_info = DatasetInfo.from_dict(info)
+    write_info(dataset_info, new_root)


 def convert_dataset(
@@ -49,6 +49,14 @@ Delete episodes and save to a new dataset at a specific path and with a new repo
        --operation.type delete_episodes \
        --operation.episode_indices "[0, 2, 5]"

+Delete episodes and re-encode video segments with h264:
+    lerobot-edit-dataset \
+        --repo_id lerobot/pusht \
+        --operation.type delete_episodes \
+        --operation.episode_indices "[0, 2, 5]" \
+        --operation.camera_encoder_config.vcodec h264 \
+        --operation.camera_encoder_config.crf 23
+
 Split dataset by fractions (pusht_train, pusht_val):
    lerobot-edit-dataset \
        --repo_id lerobot/pusht \
@@ -74,6 +82,14 @@ Split into more than two splits:
        --operation.type split \
        --operation.splits '{"train": 0.6, "val": 0.2, "test": 0.2}'

+Split dataset and re-encode video segments with h264:
+    lerobot-edit-dataset \
+        --repo_id lerobot/pusht \
+        --operation.type split \
+        --operation.splits '{"train": 0.8, "val": 0.2}' \
+        --operation.camera_encoder_config.vcodec h264 \
+        --operation.camera_encoder_config.crf 23
+
 Merge multiple datasets:
    lerobot-edit-dataset \
        --new_repo_id lerobot/pusht_merged \
@@ -150,11 +166,24 @@ Show dataset information without feature details:
        --operation.type info \
        --operation.show_features false

-Recompute dataset statistics:
+Recompute dataset statistics (saves to lerobot/pusht_recomputed_stats by default):
    lerobot-edit-dataset \
        --repo_id lerobot/pusht \
        --operation.type recompute_stats

+Recompute stats and save to a specific new repo_id:
+    lerobot-edit-dataset \
+        --repo_id lerobot/pusht \
+        --new_repo_id lerobot/pusht_new_stats \
+        --operation.type recompute_stats
+
+Recompute stats in-place (overwrites original dataset stats):
+    lerobot-edit-dataset \
+        --repo_id lerobot/pusht \
+        --new_repo_id lerobot/pusht \
+        --operation.type recompute_stats \
+        --operation.overwrite true
+
 Recompute stats for relative actions and push to hub:
    lerobot-edit-dataset \
        --repo_id lerobot/pusht \
@@ -174,7 +203,7 @@ import abc
 import logging
 import shutil
 import sys
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from pathlib import Path

 import draccus
@@ -182,6 +211,8 @@ import draccus
 from lerobot.configs import parser
 from lerobot.datasets import (
    LeRobotDataset,
+    VideoEncoderConfig,
+    camera_encoder_defaults,
    convert_image_to_video_dataset,
    delete_episodes,
    merge_datasets,
@@ -205,12 +236,14 @@ class OperationConfig(draccus.ChoiceRegistry, abc.ABC):
@dataclass
 class DeleteEpisodesConfig(OperationConfig):
    episode_indices: list[int] | None = None
+    camera_encoder_config: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)


@OperationConfig.register_subclass("split")
@dataclass
 class SplitConfig(OperationConfig):
    splits: dict[str, float | list[int]] | None = None
+    camera_encoder_config: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)


@OperationConfig.register_subclass("merge")
@@ -237,11 +270,7 @@ class ModifyTasksConfig(OperationConfig):
@dataclass
 class ConvertImageToVideoConfig(OperationConfig):
    output_dir: str | None = None
-    vcodec: str = "libsvtav1"
-    pix_fmt: str = "yuv420p"
-    g: int = 2
-    crf: int = 30
-    fast_decode: int = 0
+    camera_encoder_config: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)
    episode_indices: list[int] | None = None
    num_workers: int = 4
    max_episodes_per_batch: int | None = None
@@ -256,6 +285,7 @@ class RecomputeStatsConfig(OperationConfig):
    relative_exclude_joints: list[str] | None = None
    chunk_size: int = 50
    num_workers: int = 0
+    overwrite: bool = False


@OperationConfig.register_subclass("info")
@@ -280,16 +310,30 @@ class EditDatasetConfig:
    push_to_hub: bool = False


+def _resolve_io_paths(
+    repo_id: str,
+    new_repo_id: str | None,
+    root: Path | str | None,
+    new_root: Path | str | None,
+    default_new_repo_id: str | None = None,
+) -> tuple[str, Path, Path]:
+    """Resolve input/output paths and repo_id for dataset operations.
+
+    Returns (output_repo_id, input_path, output_path) with resolved (symlink-safe) paths.
+    """
+    input_path = (Path(root) if root else HF_LEROBOT_HOME / repo_id).resolve()
+    output_repo_id = new_repo_id or default_new_repo_id or repo_id
+    output_path = (Path(new_root) if new_root else HF_LEROBOT_HOME / output_repo_id).resolve()
+    return output_repo_id, input_path, output_path
+
+
 def get_output_path(
    repo_id: str,
    new_repo_id: str | None,
    root: Path | str | None,
    new_root: Path | str | None,
 ) -> tuple[str, Path]:
-    input_path = Path(root) if root else HF_LEROBOT_HOME / repo_id
-
-    output_repo_id = new_repo_id if new_repo_id else repo_id
-    output_path = Path(new_root) if new_root else HF_LEROBOT_HOME / output_repo_id
+    output_repo_id, input_path, output_path = _resolve_io_paths(repo_id, new_repo_id, root, new_root)

    # In case of in-place modification, create a backup of the original dataset (if it exists)
    if output_path == input_path:
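
The precedence used by `_resolve_io_paths` above can be sketched in isolation. This is a minimal sketch, not the module's code: `EXAMPLE_HOME` is a hypothetical stand-in for `HF_LEROBOT_HOME`, and the function is renamed `resolve_io_paths` to make clear it is an illustration. It shows that an explicit `new_repo_id` wins over `default_new_repo_id`, which in turn wins over the input `repo_id`, and that an operation is in-place only when both resolved paths are equal.

```python
from pathlib import Path

# Hypothetical stand-in for HF_LEROBOT_HOME (assumption, not the real value).
EXAMPLE_HOME = Path("/tmp/lerobot_home")


def resolve_io_paths(repo_id, new_repo_id, root, new_root, default_new_repo_id=None):
    """Return (output_repo_id, input_path, output_path) with resolved paths."""
    input_path = (Path(root) if root else EXAMPLE_HOME / repo_id).resolve()
    # Precedence: explicit new_repo_id, then the operation's default, then repo_id.
    output_repo_id = new_repo_id or default_new_repo_id or repo_id
    output_path = (Path(new_root) if new_root else EXAMPLE_HOME / output_repo_id).resolve()
    return output_repo_id, input_path, output_path


default = "lerobot/pusht_recomputed_stats"

# No output location given: the operation's default repo_id is used.
out_id, inp, out = resolve_io_paths("lerobot/pusht", None, None, None, default)
print(out_id, inp == out)

# Output repo_id equal to the input repo_id: the operation is in-place.
out_id2, inp2, out2 = resolve_io_paths("lerobot/pusht", "lerobot/pusht", None, None, default)
print(out_id2, inp2 == out2)
```

Comparing resolved paths rather than repo ids means that a `new_root` pointing at the input directory through a symlink is still detected as in-place.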
@@ -328,6 +372,7 @@ def handle_delete_episodes(cfg: EditDatasetConfig) -> None:
        episode_indices=cfg.operation.episode_indices,
        output_dir=output_dir,
        repo_id=output_repo_id,
+        camera_encoder_config=cfg.operation.camera_encoder_config,
    )

    logging.info(f"Dataset saved to {output_dir}")
@@ -359,6 +404,7 @@ def handle_split(cfg: EditDatasetConfig) -> None:
        dataset,
        splits=cfg.operation.splits,
        output_dir=cfg.new_root,
+        camera_encoder_config=cfg.operation.camera_encoder_config,
    )

    for split_name, split_ds in split_datasets.items():
@@ -529,11 +575,8 @@ def handle_convert_image_to_video(cfg: EditDatasetConfig) -> None:
        dataset=dataset,
        output_dir=output_dir,
        repo_id=output_repo_id,
-        vcodec=getattr(cfg.operation, "vcodec", "libsvtav1"),
-        pix_fmt=getattr(cfg.operation, "pix_fmt", "yuv420p"),
-        g=getattr(cfg.operation, "g", 2),
-        crf=getattr(cfg.operation, "crf", 30),
-        fast_decode=getattr(cfg.operation, "fast_decode", 0),
+        camera_encoder_config=getattr(cfg.operation, "camera_encoder_config", None)
+        or camera_encoder_defaults(),
        episode_indices=getattr(cfg.operation, "episode_indices", None),
        num_workers=getattr(cfg.operation, "num_workers", 4),
        max_episodes_per_batch=getattr(cfg.operation, "max_episodes_per_batch", None),
@@ -557,7 +600,39 @@ def handle_recompute_stats(cfg: EditDatasetConfig) -> None:
    if not isinstance(cfg.operation, RecomputeStatsConfig):
        raise ValueError("Operation config must be RecomputeStatsConfig")

-    dataset = LeRobotDataset(cfg.repo_id, root=cfg.root)
+    # Determine whether this is an in-place operation
+    output_repo_id, input_root, output_root = _resolve_io_paths(
+        cfg.repo_id,
+        cfg.new_repo_id,
+        cfg.root,
+        cfg.new_root,
+        default_new_repo_id=f"{cfg.repo_id}_recomputed_stats",
+    )
+    in_place = output_root == input_root
+
+    if in_place and not cfg.operation.overwrite:
+        raise ValueError(
+            f"recompute_stats would overwrite the dataset in-place at {input_root}. "
+            "Pass --operation.overwrite true to allow in-place modification, "
+            "or use --new_repo_id / --new_root to write to a different location. "
+            f"Default output repo_id when neither is set: '{cfg.repo_id}_recomputed_stats'."
+        )
+
+    if in_place:
+        logging.warning(
+            f"Overwriting dataset stats in-place at {input_root}. The original stats will be lost."
+        )
+        dataset = LeRobotDataset(cfg.repo_id, root=input_root)
+    else:
+        logging.info(f"Copying dataset from {input_root} to {output_root}")
+        if output_root.exists():
+            backup_path = output_root.with_name(output_root.name + "_old")
+            logging.warning(f"Output directory {output_root} already exists. Moving to {backup_path}")
+            if backup_path.exists():
+                shutil.rmtree(backup_path)
+            shutil.move(output_root, backup_path)
+        shutil.copytree(input_root, output_root)
+        dataset = LeRobotDataset(output_repo_id, root=output_root)

    logging.info(f"Recomputing stats for {cfg.repo_id}")
    if cfg.operation.relative_action:
@@ -578,7 +653,7 @@ def handle_recompute_stats(cfg: EditDatasetConfig) -> None:
    logging.info(f"Stats written to {dataset.root}")

    if cfg.push_to_hub:
-        logging.info(f"Pushing to hub as {dataset.meta.repo_id}...")
+        logging.info(f"Pushing to hub as {dataset.repo_id}...")
        dataset.push_to_hub()


@@ -63,6 +63,27 @@ lerobot-record \\
  --dataset.streaming_encoding=true \\
  --dataset.encoder_threads=2
 ```
+
+Example recording with custom video encoding parameters:
+```shell
+lerobot-record \\
+    --robot.type=so100_follower \\
+    --robot.port=/dev/tty.usbmodem58760431541 \\
+    --robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \\
+    --robot.id=black \\
+    --teleop.type=so100_leader \\
+    --teleop.port=/dev/tty.usbmodem58760431551 \\
+    --teleop.id=blue \\
+    --dataset.repo_id=<my_username>/<my_dataset_name> \\
+    --dataset.num_episodes=2 \\
+    --dataset.single_task="Grab the cube" \\
+    --dataset.streaming_encoding=true \\
+    --dataset.encoder_threads=2 \\
+    --dataset.camera_encoder_config.vcodec=h264 \\
+    --dataset.camera_encoder_config.preset=fast \\
+    --dataset.camera_encoder_config.extra_options='{"tune": "film", "profile:v": "high", "bf": 2}' \\
+    --display_data=true
+```
 """

 import logging
@@ -83,10 +104,12 @@ from lerobot.common.control_utils import (
 from lerobot.configs import parser
 from lerobot.configs.dataset import DatasetRecordConfig
 from lerobot.datasets import (
+    DepthEncoderConfig,
    LeRobotDataset,
    VideoEncodingManager,
    aggregate_pipeline_dataset_features,
    create_initial_features,
+    depth_encoder_defaults,
    safe_stop_image_writer,
 )
 from lerobot.processor import (
@@ -305,7 +328,10 @@ def record_loop(

        if display_data:
            log_rerun_data(
-                observation=obs_processed, action=action_values, compress_images=display_compressed_images
+                observation=obs_processed,
+                action=action_values,
+                compress_images=display_compressed_images,
+                features=dataset.features if dataset is not None else None,
            )

        dt_s = time.perf_counter() - start_loop_t
@@ -377,10 +403,11 @@ def record(
                cfg.dataset.repo_id,
                root=cfg.dataset.root,
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
-                vcodec=cfg.dataset.vcodec,
+                camera_encoder_config=cfg.dataset.camera_encoder_config,
+                depth_encoder_config=cfg.dataset.depth_encoder_config,
+                encoder_threads=cfg.dataset.encoder_threads,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
-                encoder_threads=cfg.dataset.encoder_threads,
                image_writer_processes=cfg.dataset.num_image_writer_processes if num_cameras > 0 else 0,
                image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera * num_cameras
                if num_cameras > 0
@@ -389,7 +416,8 @@ def record(
            sanity_check_dataset_robot_compatibility(dataset, robot, cfg.dataset.fps, dataset_features)
        else:
            # Reject eval_ prefix — for policy evaluation use lerobot-rollout
-            if cfg.dataset.repo_id.startswith("eval_"):
+            repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
+            if repo_name.startswith("eval_"):
                raise ValueError(
                    "Dataset names starting with 'eval_' are reserved for policy evaluation. "
                    "lerobot-record is for data collection only. Use lerobot-rollout for policy deployment."
@@ -405,10 +433,11 @@ def record(
                image_writer_processes=cfg.dataset.num_image_writer_processes,
                image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera * len(robot.cameras),
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
-                vcodec=cfg.dataset.vcodec,
+                camera_encoder_config=cfg.dataset.camera_encoder_config,
+                depth_encoder_config=cfg.dataset.depth_encoder_config,
+                encoder_threads=cfg.dataset.encoder_threads,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
-                encoder_threads=cfg.dataset.encoder_threads,
            )

        robot.connect()
@@ -419,7 +448,7 @@ def record(

        if not cfg.dataset.streaming_encoding:
            logging.info(
-                "Streaming encoding is disabled. If you have capable hardware, consider enabling it for way faster episode saving. --dataset.streaming_encoding=true --dataset.encoder_threads=2 # --dataset.vcodec=auto. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding"
+                "Streaming encoding is disabled. If you have capable hardware, consider enabling it for way faster episode saving. --dataset.streaming_encoding=true --dataset.encoder_threads=2 # --dataset.camera_encoder_config.vcodec=auto. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding"
            )

        with VideoEncodingManager(dataset):
@@ -47,6 +47,7 @@ from lerobot.datasets import EpisodeAwareSampler, make_dataset
 from lerobot.envs import close_envs, make_env, make_env_pre_post_processors
 from lerobot.optim.factory import make_optimizer_and_scheduler
 from lerobot.policies import PreTrainedPolicy, make_policy, make_pre_post_processors
+from lerobot.rewards import make_reward_pre_post_processors
 from lerobot.utils.import_utils import register_third_party_plugins
 from lerobot.utils.logging_utils import AverageMeter, MetricsTracker
 from lerobot.utils.random_utils import set_seed
@@ -70,8 +71,8 @@ def update_policy(
    accelerator: "Accelerator",
    lr_scheduler=None,
    lock=None,
-    rabc_weights_provider=None,
-) -> tuple[MetricsTracker, dict]:
+    sample_weighter=None,
+) -> tuple[MetricsTracker, dict | None]:
    """
    Performs a single training step to update the policy's weights.

@@ -87,7 +88,7 @@ def update_policy(
        accelerator: The Accelerator instance for distributed training and mixed precision.
        lr_scheduler: An optional learning rate scheduler.
        lock: An optional lock for thread-safe optimizer updates.
-        rabc_weights_provider: Optional RABCWeights instance for sample weighting.
+        sample_weighter: Optional SampleWeighter instance for per-sample loss weighting.

    Returns:
        A tuple containing:
@@ -97,27 +98,31 @@ def update_policy(
    start_time = time.perf_counter()
    policy.train()

-    # Get RA-BC weights if enabled
-    rabc_batch_weights = None
-    rabc_batch_stats = None
-    if rabc_weights_provider is not None:
-        rabc_batch_weights, rabc_batch_stats = rabc_weights_provider.compute_batch_weights(batch)
+    # Compute sample weights if a weighter is provided
+    sample_weights = None
+    weight_stats = None
+    if sample_weighter is not None:
+        sample_weights, weight_stats = sample_weighter.compute_batch_weights(batch)

    # Let accelerator handle mixed precision
    with accelerator.autocast():
-        # Use per-sample loss when RA-BC is enabled for proper weighting
-        if rabc_batch_weights is not None:
-            # Get per-sample losses
+        if sample_weights is not None:
+            # Use per-sample loss for weighted training
+            # Note: Policies supporting sample weighting must implement forward(batch, reduction="none")
            per_sample_loss, output_dict = policy.forward(batch, reduction="none")

-            # Apply RA-BC weights: L_RA-BC = Σ(w_i * l_i) / (Σw_i + ε)
-            # rabc_batch_weights is already normalized to sum to batch_size
+            # Weighted loss: each sample's contribution is scaled by its weight.
+            # We divide by weight sum (not batch size) so that if some weights are zero,
+            # the remaining samples contribute proportionally more, preserving gradient scale.
+            # Weights are pre-normalized to sum to batch_size for stable training dynamics.
            epsilon = 1e-6
-            loss = (per_sample_loss * rabc_batch_weights).sum() / (rabc_batch_weights.sum() + epsilon)
-            # Log raw mean weight (before normalization) - this is the meaningful metric
-            output_dict["rabc_mean_weight"] = rabc_batch_stats["raw_mean_weight"]
-            output_dict["rabc_num_zero_weight"] = rabc_batch_stats["num_zero_weight"]
-            output_dict["rabc_num_full_weight"] = rabc_batch_stats["num_full_weight"]
+            loss = (per_sample_loss * sample_weights).sum() / (sample_weights.sum() + epsilon)
+
+            # Log weighting statistics
+            if output_dict is None:
+                output_dict = {}
+            for key, value in weight_stats.items():
+                output_dict[f"sample_weight_{key}"] = value
        else:
            loss, output_dict = policy.forward(batch)
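
The weighted reduction in the hunk above can be checked by hand on a small batch. This is a minimal sketch under stated assumptions: plain Python lists stand in for the tensors, `weighted_loss` is a hypothetical helper name, and the weights are assumed to already sum to the batch size, as the comment in the diff states.

```python
def weighted_loss(per_sample_loss, sample_weights, epsilon=1e-6):
    """Weighted mean of per-sample losses, normalized by the weight sum."""
    weighted_sum = sum(l * w for l, w in zip(per_sample_loss, sample_weights))
    return weighted_sum / (sum(sample_weights) + epsilon)


losses = [1.0, 2.0, 3.0, 4.0]

# Uniform weights reproduce the ordinary mean (up to epsilon).
uniform = weighted_loss(losses, [1.0, 1.0, 1.0, 1.0])

# Two samples zeroed out, the other two rescaled so the weights still sum to 4:
# the result is the mean of the two surviving samples.
masked = weighted_loss(losses, [2.0, 2.0, 0.0, 0.0])

print(round(uniform, 4), round(masked, 4))
```

Dividing by the weight sum rather than the batch size keeps the loss on the same scale as an unweighted mean even when some samples receive zero weight.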

@@ -188,8 +193,8 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):

        ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
        # Accelerate auto-detects the device based on the available hardware and ignores the policy.device setting.
-        # Force the device to be CPU when policy.device is set to CPU.
-        force_cpu = cfg.policy.device == "cpu"
+        # Force the device to be CPU when the active config's device is set to CPU (works for both policy and reward model training).
+        force_cpu = cfg.trainable_config.device == "cpu"
        accelerator = Accelerator(
            step_scheduler_with_optimizer=False,
            kwargs_handlers=[ddp_kwargs],
@@ -245,26 +250,44 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
        logging.info("Creating env")
        eval_env = make_env(cfg.env, n_envs=cfg.eval.batch_size, use_async_envs=cfg.eval.use_async_envs)

-    if is_main_process:
-        logging.info("Creating policy")
-    policy = make_policy(
-        cfg=cfg.policy,
-        ds_meta=dataset.meta,
-        rename_map=cfg.rename_map,
-    )
+    if cfg.is_reward_model_training:
+        if is_main_process:
+            logging.info("Creating reward model")
+        from lerobot.rewards import make_reward_model
+
+        policy = make_reward_model(
+            cfg=cfg.reward_model,
+            dataset_stats=dataset.meta.stats,
+            dataset_meta=dataset.meta,
+        )
+        if not policy.is_trainable:
+            raise ValueError(
+                f"Reward model '{policy.name}' is zero-shot and cannot be trained via lerobot-train. "
+                "Use it directly for inference via compute_reward() (e.g. offline precompute)."
+            )
+    else:
+        if is_main_process:
+            logging.info("Creating policy")
+        policy = make_policy(
+            cfg=cfg.policy,
+            ds_meta=dataset.meta,
+            rename_map=cfg.rename_map,
+        )

    if cfg.peft is not None:
+        if cfg.is_reward_model_training:
+            raise ValueError("PEFT is only supported for policy training.")
        logging.info("Using PEFT! Wrapping model.")
-        # Convert CLI peft config to dict for overrides
        peft_cli_overrides = dataclasses.asdict(cfg.peft)
        policy = policy.wrap_with_peft(peft_cli_overrides=peft_cli_overrides)

-    # Wait for all processes to finish policy creation before continuing
+    # Wait for all processes to finish model creation before continuing
    accelerator.wait_for_everyone()

-    processor_pretrained_path = cfg.policy.pretrained_path
+    active_cfg = cfg.trainable_config
+    processor_pretrained_path = active_cfg.pretrained_path
    if (
-        getattr(cfg.policy, "use_relative_actions", False)
+        getattr(active_cfg, "use_relative_actions", False)
        and processor_pretrained_path is not None
        and not cfg.resume
    ):
@@ -274,18 +297,15 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
        )
        processor_pretrained_path = None

-    # Create processors - only provide dataset_stats if not resuming from saved processors
    processor_kwargs = {}
    postprocessor_kwargs = {}
    if (processor_pretrained_path and not cfg.resume) or not processor_pretrained_path:
-        # Only provide dataset_stats when not resuming from saved processor state
        processor_kwargs["dataset_stats"] = dataset.meta.stats

-    # For SARM, always provide dataset_meta for progress normalization
-    if cfg.policy.type == "sarm":
+    if cfg.is_reward_model_training:
        processor_kwargs["dataset_meta"] = dataset.meta

-    if processor_pretrained_path is not None:
+    if not cfg.is_reward_model_training and processor_pretrained_path is not None:
        processor_kwargs["preprocessor_overrides"] = {
            "device_processor": {"device": device.type},
            "normalizer_processor": {
@@ -305,38 +325,36 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
            },
        }

-    preprocessor, postprocessor = make_pre_post_processors(
-        policy_cfg=cfg.policy,
-        pretrained_path=processor_pretrained_path,
-        **processor_kwargs,
-        **postprocessor_kwargs,
-    )
+    if cfg.is_reward_model_training:
+        preprocessor, postprocessor = make_reward_pre_post_processors(
+            cfg.reward_model,
+            **processor_kwargs,
+        )
+    else:
+        preprocessor, postprocessor = make_pre_post_processors(
+            policy_cfg=cfg.policy,
+            pretrained_path=processor_pretrained_path,
+            **processor_kwargs,
+            **postprocessor_kwargs,
+        )

    if is_main_process:
        logging.info("Creating optimizer and scheduler")
    optimizer, lr_scheduler = make_optimizer_and_scheduler(cfg, policy)

-    # Load precomputed SARM progress for RA-BC if enabled
-    # Generate progress using: src/lerobot/policies/sarm/compute_rabc_weights.py
-    rabc_weights = None
-    if cfg.use_rabc:
-        from lerobot.utils.rabc import RABCWeights
+    # Create sample weighter if configured (e.g., for RA-BC training)
+    sample_weighter = None
+    if cfg.sample_weighting is not None:
+        from lerobot.utils.sample_weighting import make_sample_weighter

-        # Get chunk_size from policy config
-        chunk_size = getattr(policy.config, "chunk_size", None)
-        if chunk_size is None:
-            raise ValueError("Chunk size is not found in policy config")
-
-        head_mode = getattr(cfg, "rabc_head_mode", "sparse")
-        logging.info(f"Loading SARM progress for RA-BC from {cfg.rabc_progress_path}")
-        logging.info(f"Using chunk_size={chunk_size} from policy config, head_mode={head_mode}")
-        rabc_weights = RABCWeights(
-            progress_path=cfg.rabc_progress_path,
-            chunk_size=chunk_size,
-            head_mode=head_mode,
-            kappa=getattr(cfg, "rabc_kappa", 0.01),
-            epsilon=getattr(cfg, "rabc_epsilon", 1e-6),
-            device=device,
+        if is_main_process:
+            logging.info(f"Creating sample weighter: {cfg.sample_weighting.type}")
+        sample_weighter = make_sample_weighter(
+            cfg.sample_weighting,
+            policy,
+            device,
+            dataset_root=cfg.dataset.root,
+            dataset_repo_id=cfg.dataset.repo_id,
        )

    step = 0  # number of policy updates (forward + backward + optim)
@@ -365,13 +383,13 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
        logging.info(f"{num_total_params=} ({format_big_number(num_total_params)})")

    # create dataloader for offline training
-    if hasattr(cfg.policy, "drop_n_last_frames"):
+    if hasattr(active_cfg, "drop_n_last_frames"):
        shuffle = False
        sampler = EpisodeAwareSampler(
            dataset.meta.episodes["dataset_from_index"],
            dataset.meta.episodes["dataset_to_index"],
            episode_indices_to_use=dataset.episodes,
-            drop_n_last_frames=cfg.policy.drop_n_last_frames,
+            drop_n_last_frames=active_cfg.drop_n_last_frames,
            shuffle=True,
        )
    else:
@@ -448,7 +466,7 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
            cfg.optimizer.grad_clip_norm,
            accelerator=accelerator,
            lr_scheduler=lr_scheduler,
-            rabc_weights_provider=rabc_weights,
+            sample_weighter=sample_weighter,
        )

        # Note: eval and checkpoint happen *after* the `step`th training update has completed, so we
@@ -467,16 +485,10 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
                wandb_log_dict = train_tracker.to_dict()
                if output_dict:
                    wandb_log_dict.update(output_dict)
-                # Log RA-BC statistics if enabled
-                if rabc_weights is not None:
-                    rabc_stats = rabc_weights.get_stats()
-                    wandb_log_dict.update(
-                        {
-                            "rabc_delta_mean": rabc_stats["delta_mean"],
-                            "rabc_delta_std": rabc_stats["delta_std"],
-                            "rabc_num_frames": rabc_stats["num_frames"],
-                        }
-                    )
+                # Log sample weighting statistics if enabled
+                if sample_weighter is not None:
+                    weighter_stats = sample_weighter.get_stats()
+                    wandb_log_dict.update({f"sample_weighting/{k}": v for k, v in weighter_stats.items()})
                wandb_logger.log_dict(wandb_log_dict, step)
            train_tracker.reset_averages()

@@ -558,14 +570,15 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
    if is_main_process:
        logging.info("End of training")

-        if cfg.policy.push_to_hub:
-            unwrapped_policy = accelerator.unwrap_model(policy)
-            if cfg.policy.use_peft:
-                unwrapped_policy.push_model_to_hub(cfg, peft_model=unwrapped_policy)
+        if getattr(active_cfg, "push_to_hub", False):
+            unwrapped_model = accelerator.unwrap_model(policy)
+            # PEFT only applies when training a policy — reward models use the plain path.
+            if not cfg.is_reward_model_training and cfg.policy.use_peft:
+                unwrapped_model.push_model_to_hub(cfg, peft_model=unwrapped_model)
            else:
-                unwrapped_policy.push_model_to_hub(cfg)
-            preprocessor.push_to_hub(cfg.policy.repo_id)
-            postprocessor.push_to_hub(cfg.policy.repo_id)
+                unwrapped_model.push_model_to_hub(cfg)
+            preprocessor.push_to_hub(active_cfg.repo_id)
+            postprocessor.push_to_hub(active_cfg.repo_id)

    # Properly clean up the distributed process group
    accelerator.wait_for_everyone()
@@ -0,0 +1,55 @@
+---
+# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+# Doc / guide: https://huggingface.co/docs/hub/model-cards
+# prettier-ignore
+{{card_data}}
+---
+
+# Reward Model Card for {{ model_name | default("Reward Model ID", true) }}
+
+<!-- Provide a quick summary of what the reward model is/does. -->
+
+{% if model_name == "reward_classifier" %}
+A reward classifier is a lightweight neural network that scores observations or trajectories for task success, providing a learned reward signal or offline evaluation when explicit rewards are unavailable.
+{% elif model_name == "sarm" %}
+A Stage-Aware Reward Model (SARM) predicts a dense reward signal from observations, typically used downstream for reinforcement learning or human-in-the-loop fine-tuning when task success is not directly observable.
+{% else %}
+_Reward model type not recognized — please update this template._
+{% endif %}
+
+This reward model has been trained and pushed to the Hub using [LeRobot](https://github.com/huggingface/lerobot).
+See the full documentation at [LeRobot Docs](https://huggingface.co/docs/lerobot/index).
+
+---
+
+## How to Get Started with the Reward Model
+
+### Train from scratch
+
+```bash
+lerobot-train \
+  --dataset.repo_id=${HF_USER}/<dataset> \
+  --reward_model.type={{ model_name | default("reward_classifier", true) }} \
+  --output_dir=outputs/train/<desired_reward_model_repo_id> \
+  --job_name=lerobot_reward_training \
+  --reward_model.device=cuda \
+  --reward_model.repo_id=${HF_USER}/<desired_reward_model_repo_id> \
+  --wandb.enable=true
+```
+
+_Writes checkpoints to `outputs/train/<desired_reward_model_repo_id>/checkpoints/`._
+
+### Load the reward model in Python
+
+```python
+from lerobot.rewards import make_reward_model
+
+reward_model = make_reward_model(pretrained_path="<hf_user>/<reward_model_repo_id>")
+reward = reward_model.compute_reward(batch)
+```
+
+---
+
+## Model Details
+
+- **License:** {{ license | default("\[More Information Needed]", true) }}
@@ -86,11 +86,24 @@ def hw_to_dataset_features(
        }

    for key, shape in cam_fts.items():
-        features[f"{prefix}.images.{key}"] = {
-            "dtype": "video" if use_video else "image",
-            "shape": shape,
-            "names": ["height", "width", "channels"],
-        }
+        if len(shape) == 2:
+            # Single-channel feature (e.g. depth map). The hardware-side key is
+            # expected to use a "_depth" suffix to disambiguate from its color
+            # counterpart; we strip it so the dataset feature is published as
+            # ``{prefix}.depth.<bare>`` and aligned with ``observation.images.<bare>``.
+            bare = key.removesuffix("_depth")  # no-op when the suffix is absent
+            features[f"{prefix}.depth.{bare}"] = {
+                "dtype": "video" if use_video else "image",
+                "shape": shape,
+                "names": ["height", "width"],
+                "info": {"video.is_depth_map": True},
+            }
+        else:
+            features[f"{prefix}.images.{key}"] = {
+                "dtype": "video" if use_video else "image",
+                "shape": shape,
+                "names": ["height", "width", "channels"],
+            }

    _validate_feature_names(features)
    return features
@@ -120,7 +133,14 @@ def build_dataset_frame(
        elif ft["dtype"] == "float32" and len(ft["shape"]) == 1:
            frame[key] = np.array([values[name] for name in ft["names"]], dtype=np.float32)
        elif ft["dtype"] in ["image", "video"]:
-            frame[key] = values[key.removeprefix(f"{prefix}.images.")]
+            if key.startswith(f"{prefix}.depth."):
+                bare = key.removeprefix(f"{prefix}.depth.")
+                # Hardware emits depth values under "<bare>_depth" to disambiguate
+                # from the color stream stored at "<bare>" — fall back to the bare
+                # name when the producer already uses dataset-style keys.
+                frame[key] = values.get(f"{bare}_depth", values.get(bare))
+            else:
+                frame[key] = values[key.removeprefix(f"{prefix}.images.")]

    return frame
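
The key routing in the two hunks above can be illustrated with a minimal, self-contained sketch. It is condensed from the diff rather than copied from the library: `route_camera_features` and `read_frame_value` are hypothetical names, the `"observation"` prefix is an assumed default, and feature-name validation is left out.

```python
def route_camera_features(cam_fts, prefix="observation", use_video=True):
    """Map hardware camera shapes to dataset feature specs (hypothetical helper)."""
    dtype = "video" if use_video else "image"
    features = {}
    for key, shape in cam_fts.items():
        if len(shape) == 2:
            # 2D shape: single-channel depth map, published without the "_depth" suffix.
            bare = key.removesuffix("_depth")
            features[f"{prefix}.depth.{bare}"] = {
                "dtype": dtype,
                "shape": shape,
                "names": ["height", "width"],
                "info": {"video.is_depth_map": True},
            }
        else:
            # 3D shape: regular color image.
            features[f"{prefix}.images.{key}"] = {
                "dtype": dtype,
                "shape": shape,
                "names": ["height", "width", "channels"],
            }
    return features


def read_frame_value(key, values, prefix="observation"):
    """Pick the raw observation entry that feeds a dataset image/depth key."""
    depth_prefix = f"{prefix}.depth."
    if key.startswith(depth_prefix):
        bare = key.removeprefix(depth_prefix)
        # Prefer the hardware-style "<bare>_depth" key, fall back to the bare name.
        return values.get(f"{bare}_depth", values.get(bare))
    return values[key.removeprefix(f"{prefix}.images.")]


cam_fts = {"front": (480, 640, 3), "front_depth": (480, 640)}
features = route_camera_features(cam_fts)
print(sorted(features))
```

With a paired RGB and depth camera, both features end up under the same bare camera name, one in the `images` namespace and one in the `depth` namespace.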

@@ -69,7 +69,7 @@ def is_package_available(
        return package_exists


-def get_safe_default_codec():
+def get_safe_default_video_backend():
    logger = logging.getLogger(__name__)
    if importlib.util.find_spec("torchcodec"):
        return "torchcodec"
@@ -115,7 +115,9 @@ _feetech_sdk_available = is_package_available("feetech-servo-sdk", import_name="
 _reachy2_sdk_available = is_package_available("reachy2_sdk")
 _can_available = is_package_available("python-can", "can")
 _unitree_sdk_available = is_package_available("unitree-sdk2py", "unitree_sdk2py")
-_pyrealsense2_available = is_package_available("pyrealsense2")
+_pyrealsense2_available = is_package_available("pyrealsense2") or is_package_available(
+    "pyrealsense2-macosx", import_name="pyrealsense2"
+)
 _zmq_available = is_package_available("pyzmq", import_name="zmq")
 _hebi_available = is_package_available("hebi-py", import_name="hebi")
 _teleop_available = is_package_available("teleop")
@@ -0,0 +1,239 @@
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Sample weighting abstraction for training.
+
+This module provides an abstract base class for sample weighting strategies (e.g., RA-BC)
+that can be used during training without polluting the training script with
+policy-specific code.
+
+Example usage:
+    # In training config
+    sample_weighting:
+        type: rabc
+        progress_path: hf://datasets/my-dataset/sarm_progress.parquet
+        head_mode: sparse
+        kappa: 0.01
+
+    # In training script
+    sample_weighter = make_sample_weighter(cfg.sample_weighting, policy, device, dataset_root=cfg.dataset.root, dataset_repo_id=cfg.dataset.repo_id)
+    ...
+    weights, stats = sample_weighter.compute_batch_weights(batch)
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import TYPE_CHECKING
+
+import torch
+
+if TYPE_CHECKING:
+    from lerobot.policies.pretrained import PreTrainedPolicy
+
+
+class SampleWeighter(ABC):
+    """
+    Implementations compute per-sample weights that can be used to weight
+    the loss during training. This enables techniques like:
+    - RA-BC (Reward-Aligned Behavior Cloning)
+    - Importance sampling
+    - Curriculum learning
+    - Quality-based filtering
+    """
+
+    @abstractmethod
+    def compute_batch_weights(self, batch: dict) -> tuple[torch.Tensor, dict]:
+        """
+        Compute per-sample weights for a training batch.
+
+        Args:
+            batch: Training batch dictionary containing at minimum an "index" key
+                   with global frame indices.
+        """
+
+    @abstractmethod
+    def get_stats(self) -> dict:
+        """
+        Get global statistics about the weighting strategy.
+        """
+
+
+@dataclass
+class SampleWeightingConfig:
+    """
+    Configuration for sample weighting during training.
+
+    This is a generic config that supports multiple weighting strategies.
+    The `type` field determines which implementation to use, and `extra_params`
+    contains additional type-specific parameters.
+
+    Attributes:
+        type: Weighting strategy type ("rabc", "uniform", etc.)
+        progress_path: Path to precomputed progress values (for RABC)
+        head_mode: Which model head to use for progress ("sparse" or "dense")
+        kappa: Hard threshold for high-quality samples (RABC-specific)
+        epsilon: Small constant for numerical stability
+        extra_params: Additional type-specific parameters passed to the weighter
+    """
+
+    type: str = "rabc"
+    progress_path: str | None = None
+    head_mode: str = "sparse"
+    kappa: float = 0.01
+    epsilon: float = 1e-6
+    # Additional type-specific params can be added here or passed via extra_params
+    extra_params: dict = field(default_factory=dict)
+
+
+def make_sample_weighter(
+    config: SampleWeightingConfig | None,
+    policy: PreTrainedPolicy,
+    device: torch.device,
+    dataset_root: str | None = None,
+    dataset_repo_id: str | None = None,
+) -> SampleWeighter | None:
+    """
+    Factory function to create a SampleWeighter from config.
+
+    This keeps policy-specific initialization logic out of the training script.
+
+    Args:
+        config: Sample weighting configuration, or None to disable weighting.
+        policy: The policy being trained (used to extract chunk_size, etc.)
+        device: Device to place weight tensors on.
+        dataset_root: Local path to dataset root (for auto-detecting progress_path).
+        dataset_repo_id: HuggingFace repo ID (for auto-detecting progress_path).
+    """
+    if config is None:
+        return None
+
+    if config.type == "rabc":
+        return _make_rabc_weighter(config, policy, device, dataset_root, dataset_repo_id)
+
+    if config.type == "uniform":
+        # No-op weighter that returns uniform weights
+        return UniformWeighter(device=device)
+
+    raise ValueError(f"Unknown sample weighting type: '{config.type}'. Supported types: 'rabc', 'uniform'")
+
+
+def _make_rabc_weighter(
+    config: SampleWeightingConfig,
+    policy: PreTrainedPolicy,
+    device: torch.device,
+    dataset_root: str | None = None,
+    dataset_repo_id: str | None = None,
+) -> SampleWeighter:
+    """Create RABC weighter with policy-specific initialization.
+
+    Args:
+        config: Sample weighting configuration.
+        policy: The policy being trained (used to extract chunk_size).
+        device: Device to place weight tensors on.
+        dataset_root: Local path to dataset root (for auto-detecting progress_path).
+        dataset_repo_id: HuggingFace repo ID (for auto-detecting progress_path).
+    """
+    # Import here to avoid circular imports and keep RABC code in SARM module
+    from lerobot.rewards.sarm.rabc import RABCWeights
+
+    # Extract chunk_size from policy config
+    chunk_size = getattr(policy.config, "chunk_size", None)
+    if chunk_size is None:
+        raise ValueError(
+            "RABC sample weighting requires a policy with 'chunk_size' in its config. "
+            "This is typically set for action-chunking policies like ACT, Diffusion, PI0, etc."
+        )
+
+    # Determine progress_path: use explicit config or auto-detect from dataset
+    progress_path = config.progress_path
+    if progress_path is None:
+        if dataset_root:
+            progress_path = str(Path(dataset_root) / "sarm_progress.parquet")
+        elif dataset_repo_id:
+            progress_path = f"hf://datasets/{dataset_repo_id}/sarm_progress.parquet"
+        else:
+            raise ValueError(
+                "RABC sample weighting requires 'progress_path' to be set, "
+                "or dataset_root/dataset_repo_id for auto-detection. "
+                "Generate progress values using: "
+                "python -m lerobot.rewards.sarm.compute_rabc_weights --help"
+            )
+
+    return RABCWeights(
+        progress_path=progress_path,
+        chunk_size=chunk_size,
+        head_mode=config.head_mode,
+        kappa=config.kappa,
+        epsilon=config.epsilon,
+        device=device,
+        **config.extra_params,
+    )
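
The `progress_path` resolution order used by `_make_rabc_weighter` can be isolated as a small standalone sketch. `resolve_progress_path` is a hypothetical helper name introduced here for illustration; the file name and the `hf://datasets/...` form are taken from the hunk above.

```python
from __future__ import annotations

from pathlib import Path

PROGRESS_FILENAME = "sarm_progress.parquet"


def resolve_progress_path(
    progress_path: str | None,
    dataset_root: str | None = None,
    dataset_repo_id: str | None = None,
) -> str:
    """Return the explicit path if set, else derive one from the dataset location."""
    if progress_path is not None:
        return progress_path
    if dataset_root:
        # A local dataset root takes precedence over the Hub repo id.
        return str(Path(dataset_root) / PROGRESS_FILENAME)
    if dataset_repo_id:
        return f"hf://datasets/{dataset_repo_id}/{PROGRESS_FILENAME}"
    raise ValueError("progress_path is unset and no dataset location was given")


print(resolve_progress_path(None, dataset_repo_id="user/my_dataset"))
```

Note that when both `dataset_root` and `dataset_repo_id` are given, the local root wins, so a progress file that exists only on the Hub would need an explicit `progress_path`.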
+
+
+class UniformWeighter(SampleWeighter):
+    """
+    No-op sample weighter that returns uniform weights.
+
+    Useful as a baseline or when you want to disable weighting without
+    changing the training code structure.
+
+    Note:
+        Batch size is determined by looking for tensor values in the batch
+        dictionary. The method checks common keys like "action", "index",
+        and "observation.state" first, then falls back to scanning all values.
+    """
+
+    def __init__(self, device: torch.device):
+        self.device = device
+
+    def compute_batch_weights(self, batch: dict) -> tuple[torch.Tensor, dict]:
+        """Return uniform weights (all ones)."""
+        batch_size = self._determine_batch_size(batch)
+
+        weights = torch.ones(batch_size, device=self.device)
+        stats = {"mean_weight": 1.0, "type": "uniform"}
+        return weights, stats
+
+    def _determine_batch_size(self, batch: dict) -> int:
+        """
+        Determine batch size from the batch dictionary.
+
+        Checks common keys first, then scans all values for tensors.
+
+        Args:
+            batch: Training batch dictionary.
+        """
+        if not batch:
+            raise ValueError("Cannot determine batch size from empty batch")
+
+        # Check common keys first
+        for key in ["action", "index", "observation.state"]:
+            if key in batch and isinstance(batch[key], torch.Tensor):
+                return batch[key].shape[0]
+
+        # Scan all values for any tensor
+        for value in batch.values():
+            if isinstance(value, torch.Tensor) and value.ndim >= 1:
+                return value.shape[0]
+
+        # Last resort: return 1 (this handles non-tensor batches)
+        return 1
+
+    def get_stats(self) -> dict:
+        """Return minimal stats (only the weighter type) for uniform weighting."""
+        return {"type": "uniform"}
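
How the weights returned by `compute_batch_weights` are folded into the loss is not shown in this diff. The sketch below illustrates one common choice, a weight-normalized mean, using plain Python floats in place of tensors; `weighted_mean_loss` is a hypothetical helper and the normalization is an assumption, not the training script's confirmed formula. `namespace_stats` mirrors the `sample_weighting/<key>` prefixing applied before logging in the training script.

```python
def weighted_mean_loss(per_sample_losses, weights, epsilon=1e-6):
    """Weight-normalized mean of per-sample losses (assumed reduction)."""
    if len(per_sample_losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    weighted_sum = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    # epsilon guards against division by zero when every weight is 0.
    return weighted_sum / (sum(weights) + epsilon)


def namespace_stats(stats, prefix="sample_weighting"):
    """Prefix stat keys so they group together in the logging dashboard."""
    return {f"{prefix}/{k}": v for k, v in stats.items()}


losses = [1.0, 3.0]
uniform = weighted_mean_loss(losses, [1.0, 1.0])  # close to the plain mean
masked = weighted_mean_loss(losses, [0.0, 1.0])   # first sample is ignored
logged = namespace_stats({"type": "uniform"})
```

With uniform weights the result matches the unweighted mean up to `epsilon`, which is why a `UniformWeighter` can serve as a baseline without changing the training loop.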
@@ -63,10 +63,56 @@ def _is_scalar(x):
    )


+def _derive_depth_obs_ranges(
+    features: dict[str, dict] | None,
+) -> dict[str, tuple[float, float] | None]:
+    """Map observation keys of depth features to their ``(depth_min, depth_max)`` range.
+
+    A feature is considered a depth map when its ``info`` dict carries
+    ``video.is_depth_map=True`` (the marker set by ``hw_to_dataset_features``
+    and persisted in ``info.json``). For each such feature, we record both
+    the fully-namespaced dataset key (e.g. ``observation.depth.front``) and
+    the corresponding raw observation key forms the robot is likely to emit
+    (``front`` and ``front_depth``) so a single membership check covers all
+    call sites.
+
+    The mapped value is the ``(depth_min, depth_max)`` range stored on the
+    feature (matching the quantization range used at encoding time), or
+    ``None`` when the metadata doesn't expose a range — in which case the
+    caller should let Rerun auto-normalize. Anchoring the colormap to a
+    fixed range avoids per-frame re-normalization, which otherwise looks
+    like flicker on near-static scenes.
+    """
+    ranges: dict[str, tuple[float, float] | None] = {}
+    if not features:
+        return ranges
+    depth_prefix = f"{OBS_STR}.depth."
+    for fk, fv in features.items():
+        info = fv.get("info") if isinstance(fv, dict) else None
+        if not isinstance(info, dict) or not info.get("video.is_depth_map", False):
+            continue
+        depth_min = info.get("video.depth_min")
+        depth_max = info.get("video.depth_max")
+        rng: tuple[float, float] | None = None
+        if (
+            isinstance(depth_min, (int, float))
+            and isinstance(depth_max, (int, float))
+            and depth_max > depth_min
+        ):
+            rng = (float(depth_min), float(depth_max))
+        ranges[fk] = rng
+        if fk.startswith(depth_prefix):
+            bare = fk[len(depth_prefix) :]
+            ranges[bare] = rng
+            ranges[f"{bare}_depth"] = rng
+    return ranges
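
To make the key aliasing concrete, here is a condensed, standalone restatement of the function above applied to a small feature spec. `OBS_STR = "observation"` is assumed to match the library constant, and the `0.1` / `4.0` range is borrowed from the CLI example in the commit log; the feature dicts are illustrative only.

```python
OBS_STR = "observation"  # assumed value of the library constant


def derive_depth_obs_ranges(features):
    """Map every key form of a depth feature to its (min, max) range or None."""
    ranges = {}
    depth_prefix = f"{OBS_STR}.depth."
    for name, spec in (features or {}).items():
        info = spec.get("info") if isinstance(spec, dict) else None
        if not isinstance(info, dict) or not info.get("video.is_depth_map", False):
            continue  # not a depth map
        lo, hi = info.get("video.depth_min"), info.get("video.depth_max")
        valid = isinstance(lo, (int, float)) and isinstance(hi, (int, float)) and hi > lo
        rng = (float(lo), float(hi)) if valid else None
        ranges[name] = rng
        if name.startswith(depth_prefix):
            bare = name[len(depth_prefix):]
            # Also register the raw observation key forms the robot may emit.
            ranges[bare] = rng
            ranges[f"{bare}_depth"] = rng
    return ranges


features = {
    "observation.images.front": {"dtype": "video", "shape": (480, 640, 3), "info": None},
    "observation.depth.front": {
        "dtype": "video",
        "shape": (480, 640),
        "info": {"video.is_depth_map": True, "video.depth_min": 0.1, "video.depth_max": 4.0},
    },
    "observation.depth.wrist": {
        "dtype": "video",
        "shape": (480, 640),
        "info": {"video.is_depth_map": True},
    },
}
ranges = derive_depth_obs_ranges(features)
```

Each depth feature contributes three keys. A `None` value still marks the key as depth, so membership in the mapping, not the value, is what decides whether an array is treated as a depth image.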
+
+
 def log_rerun_data(
    observation: RobotObservation | None = None,
    action: RobotAction | None = None,
    compress_images: bool = False,
+    features: dict[str, dict] | None = None,
 ) -> None:
    """
    Logs observation and action data to Rerun for real-time visualization.
@@ -76,6 +122,13 @@ def log_rerun_data(
    - Scalars values (floats, ints) are logged as `rr.Scalars`.
    - 3D NumPy arrays that resemble images (e.g., with 1, 3, or 4 channels first) are transposed
      from CHW to HWC format, (optionally) compressed to JPEG and logged as `rr.Image` or `rr.EncodedImage`.
+    - 2D NumPy arrays whose key matches a depth feature in ``features`` (i.e. carrying
+      ``video.is_depth_map=True``) are logged as ``rr.DepthImage`` with the Viridis
+      colormap and ``meter=1.0`` (depth values are expected in metric meters). When
+      the feature exposes ``video.depth_min`` / ``video.depth_max`` (the encoder
+      quantization range, persisted in ``info.json``), the colormap is anchored to
+      that range via ``depth_range`` to keep the visualization stable across frames.
+      Depth images are never JPEG-compressed regardless of ``compress_images``.
    - 1D NumPy arrays are logged as a series of individual scalars, with each element indexed.
    - Other multi-dimensional arrays are flattened and logged as individual scalars.

@@ -85,11 +138,16 @@ def log_rerun_data(
        observation: An optional dictionary containing observation data to log.
        action: An optional dictionary containing action data to log.
        compress_images: Whether to compress images before logging to save bandwidth & memory in exchange for cpu and quality.
+        features: Optional dataset feature spec (e.g. ``LeRobotDataset.features``). When
+            provided, observation entries matching a depth-map feature are rendered with
+            ``rr.DepthImage`` instead of the generic ``rr.Image`` path.
    """

    require_package("rerun-sdk", extra="viz", import_name="rerun")
    import rerun as rr

+    depth_obs_ranges = _derive_depth_obs_ranges(features)
+
    if observation:
        for k, v in observation.items():
            if v is None:
@@ -100,6 +158,20 @@ def log_rerun_data(
                rr.log(key, rr.Scalars(float(v)))
            elif isinstance(v, np.ndarray):
                arr = v
+                is_depth = bool(depth_obs_ranges) and (k in depth_obs_ranges or key in depth_obs_ranges)
+                if is_depth and arr.ndim == 2:
+                    # Viridis-colormapped DepthImage; never JPEG-compress (lossy on float metric depth).
+                    # Anchor the colormap to the encoder range when available, so the
+                    # visualization doesn't flicker as per-frame min/max drift.
+                    depth_range = depth_obs_ranges.get(k) or depth_obs_ranges.get(key)
+                    depth_kwargs: dict = {
+                        "meter": 1.0,
+                        "colormap": rr.components.Colormap.Viridis,
+                    }
+                    if depth_range is not None:
+                        depth_kwargs["depth_range"] = depth_range
+                    rr.log(key, rr.DepthImage(arr, **depth_kwargs), static=True)
+                    continue
                # Convert CHW -> HWC when needed
                if arr.ndim == 3 and arr.shape[0] in (1, 3, 4) and arr.shape[-1] not in (1, 3, 4):
                    arr = np.transpose(arr, (1, 2, 0))
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c2b8f8532c7a0b776de5e536b8b54e30b1a0c2e3d5cc25a2d86fe43e40ae5e8c
+oid sha256:8a31653c11eccdd4d80fd3f6a351cd54c49b8a48db1f7e9faf38fddd7900a09f
 size 515400
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:224b5fa4828aa88171b68c036e8919c1eae563e2113f03b6461eadf5bf8525a6
+oid sha256:75bf051698b37dcd7517ec8025a896ab5a0551a6dde5f89d0a3d5d50966e83e6
 size 31672
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:016d2fa8fe5f58017dfd46f4632fdc19dfd751e32a2c7cde2077c6f95546d6bd
+oid sha256:88e10930a10041d50f2cf369e6813ac14618d13dad1c21bdde1ac7798611c6ba
 size 68
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eca0d87a699620e4fec7e68539b0be91e4cc933f6bf12032da52c182ab6f38cf
+oid sha256:89833a5ccdb7d85c83f717ff8ec68b8e822005cb8803899acaae88c578e2e3ae
 size 31672
@@ -202,6 +202,31 @@ def test_read_latest_too_old():
            _ = camera.read_latest(max_age_ms=0)  # immediately too old


+def test_async_read_depth_without_use_depth_raises():
+    """``async_read_depth`` must reject cameras configured without ``use_depth=True``."""
+    config = RealSenseCameraConfig(serial_number_or_name="042", warmup_s=0)
+    with RealSenseCamera(config) as camera, pytest.raises(RuntimeError, match="use_depth=False"):
+        _ = camera.async_read_depth()
+
+
+def test_read_latest_depth_without_use_depth_raises():
+    """``read_latest_depth`` must reject cameras configured without ``use_depth=True``."""
+    config = RealSenseCameraConfig(serial_number_or_name="042", warmup_s=0)
+    with RealSenseCamera(config) as camera, pytest.raises(RuntimeError, match="use_depth=False"):
+        _ = camera.read_latest_depth()
+
+
+def test_depth_to_meters_uses_depth_scale():
+    """``_depth_to_meters`` must scale uint16 raw depth into float32 metric meters."""
+    config = RealSenseCameraConfig(serial_number_or_name="042", warmup_s=0)
+    camera = RealSenseCamera(config)
+    camera.depth_scale = 0.001  # typical D-series scale (1 mm/unit)
+    raw = np.array([[0, 1000, 2500], [4095, 65535, 0]], dtype=np.uint16)
+    meters = camera._depth_to_meters(raw)
+    assert meters.dtype == np.float32
+    np.testing.assert_allclose(meters, raw.astype(np.float32) * 0.001)
+
+
@pytest.mark.parametrize(
    "rotation",
    [
@@ -113,7 +113,7 @@ def assert_metadata_consistency(aggr_ds, ds_0, ds_1):
    """Test that metadata is correctly aggregated."""
    # Test basic info
    assert aggr_ds.fps == ds_0.fps == ds_1.fps, "FPS should be the same across all datasets"
-    assert aggr_ds.meta.info["robot_type"] == ds_0.meta.info["robot_type"] == ds_1.meta.info["robot_type"], (
+    assert aggr_ds.meta.info.robot_type == ds_0.meta.info.robot_type == ds_1.meta.info.robot_type, (
        "Robot type should be the same"
    )

@@ -153,8 +153,8 @@ def assert_video_frames_integrity(aggr_ds, ds_0, ds_1):

    video_keys = list(
        filter(
-            lambda key: aggr_ds.meta.info["features"][key]["dtype"] == "video",
-            aggr_ds.meta.info["features"].keys(),
+            lambda key: aggr_ds.meta.info.features[key]["dtype"] == "video",
+            aggr_ds.meta.info.features.keys(),
        )
    )

@@ -142,6 +142,36 @@ def test_create_without_videos_has_no_video_path(tmp_path):
    assert meta.video_keys == []


+def test_depth_keys_property_filters_by_marker(tmp_path):
+    """``depth_keys`` selects only video features carrying ``video.is_depth_map=True``."""
+    features = {
+        **SIMPLE_FEATURES,
+        "observation.images.cam": {
+            "dtype": "video",
+            "shape": (64, 96, 3),
+            "names": ["height", "width", "channels"],
+            "info": None,
+        },
+        "observation.depth.cam": {
+            "dtype": "video",
+            "shape": (64, 96),
+            "names": ["height", "width"],
+            "info": {"video.is_depth_map": True},
+        },
+    }
+    meta = LeRobotDatasetMetadata.create(
+        repo_id="test/depth_keys", fps=DEFAULT_FPS, features=features, root=tmp_path / "depth_keys"
+    )
+
+    assert set(meta.video_keys) == {"observation.images.cam", "observation.depth.cam"}
+    assert meta.depth_keys == ["observation.depth.cam"]
+
+def test_depth_keys_empty_when_no_marker(tmp_path):
+    meta = LeRobotDatasetMetadata.create(
+        repo_id="test/no_depth", fps=DEFAULT_FPS, features=VIDEO_FEATURES, root=tmp_path / "no_depth"
+    )
+    assert meta.depth_keys == []
+
 def test_create_raises_on_existing_directory(tmp_path):
    """create() raises if root directory already exists."""
    root = tmp_path / "existing"
@@ -161,7 +191,7 @@ def test_init_loads_existing_metadata(tmp_path, lerobot_dataset_metadata_factory

    assert meta.total_episodes == 3
    assert meta.total_frames == 150
-    assert meta.fps == info["fps"]
+    assert meta.fps == info.fps


 # ── Property accessors ───────────────────────────────────────────────
Author	SHA1	Message	Date
CarolinePascal	4445849b86	feat(depth maps writer): adding support for raw depth maps recording with image writer	2026-05-01 00:49:09 +02:00
CarolinePascal	f43bf75f9b	fix(viz): anchor rerun DepthImage colormap to encoder depth range	2026-05-01 00:49:09 +02:00
CarolinePascal	b540fa94a9	feat(viz): render depth observations as rr.DepthImage in Viridis log_rerun_data now accepts an optional `features` dict and uses the `video.is_depth_map=True` info marker to detect depth observations. Matching 2D arrays are logged as `rr.DepthImage(arr, meter=1.0, colormap=rr.components.Colormap.Viridis)` and are never JPEG-compressed (compression is lossy on float32 metric depth). Detection covers both the namespaced dataset key (e.g. `observation.depth.front`) and the raw observation keys the robot emits (`front`, `front_depth`), so it works for both the typed LeRobotDataset.features dict and the plain robot observation flow. When `features` is None the previous behaviour is preserved (depth arrays fall back to the generic `rr.Image` path), so non-depth recordings and existing call sites are unaffected. lerobot-record now forwards `dataset.features` so depth keys are picked up automatically when `--display_data=true`. Made-with: Cursor	2026-05-01 00:49:09 +02:00
CarolinePascal	efad15f600	feat(record): plumb DepthEncoderConfig through lerobot-record Surface DepthEncoderConfig and depth_encoder_defaults from lerobot.datasets, and wire dataset.depth_encoder_config through LeRobotDataset.create() and LeRobotDataset.resume() so depth-capable recordings (e.g. RealSense use_depth=True) can be tuned from the CLI: --dataset.depth_encoder_config.depth_min=0.1 --dataset.depth_encoder_config.depth_max=4.0 --dataset.depth_encoder_config.vcodec=ffv1 The default factory keeps depth-stream defaults (12-bit HEVC, log quantization), so non-depth recordings are unaffected. Made-with: Cursor	2026-05-01 00:49:09 +02:00
CarolinePascal	407d1882a2	feat(robots/so_follower): emit + populate depth keys when use_depth When an SO follower has a camera configured with use_depth=True (e.g. a RealSense), the robot now exposes a paired depth feature so the dataset records both modalities: - _cameras_ft adds a 2D "<cam>_depth" entry alongside the 3-channel color shape; hw_to_dataset_features turns this into observation.depth.<cam> with the depth-map marker. - get_observation reads cam.read_latest_depth() (float32 metric meters from the RealSense async depth API) into <cam>_depth so build_dataset_frame can route it. Detection is duck-typed via getattr(..., "use_depth", False) so other cameras without that attribute keep their RGB-only behaviour unchanged. Made-with: Cursor	2026-05-01 00:49:09 +02:00
CarolinePascal	0d6e4f3bad	feat(features): route 2D camera shapes to observation.depth.<key> hw_to_dataset_features now treats a camera entry whose shape has length 2 as a single-channel depth feature: it emits the feature as "{prefix}.depth.<bare>" with names=["height", "width"] and an info={"video.is_depth_map": True} marker so the depth-encoder branch in LeRobotDataset is engaged. The "_depth" hardware-side suffix (if present) is stripped so a paired RGB + depth camera ends up as "observation.images.<cam>" + "observation.depth.<cam>". build_dataset_frame mirrors the routing: depth feature keys read their value from "<bare>_depth" in the raw observation dict, with fallback to the bare name for producers that already emit dataset-style keys. Tests: add tests/utils/test_feature_utils.py covering the routing of 2D vs 3D camera shapes, the paired RGB+depth case, and the build_dataset_frame value routing. Made-with: Cursor	2026-05-01 00:49:09 +02:00
CarolinePascal	536b29d963	feat(cameras/realsense): expose async depth in metric meters	2026-05-01 00:48:40 +02:00
CarolinePascal	2744e26593	feat(depth): wire DatasetReader to decode_depth_frames	2026-05-01 00:41:38 +02:00
CarolinePascal	de64ad3f7e	feat(depth): wire StreamingVideoEncoder + writer to depth encoder	2026-05-01 00:29:34 +02:00
CarolinePascal	d777359662	feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter	2026-04-30 23:55:28 +02:00
CarolinePascal	5d0a20bd9c	feat(video): alias "av1" to "libsvtav1" for backward compat	2026-04-30 23:43:02 +02:00
CarolinePascal	2c796d3352	feat(depth): persist depth metadata + add reader helpers	2026-04-30 23:38:56 +02:00
CarolinePascal	df1648c102	feat(video): add ffv1 to supported codecs	2026-04-30 17:32:50 +02:00
CarolinePascal	3bd96a4346	feat(depth): add depth quantization helpers and tests	2026-04-30 17:31:03 +02:00
CarolinePascal	016799dfa1	chore(format): formatting code	2026-04-30 14:42:37 +02:00
CarolinePascal	51b9038458	chore(PyAV): cleaning up PyAV utils and encoding parameters checks to stick to the minimum required tooling.	2026-04-30 14:31:08 +02:00
CarolinePascal	cc9a2e5c99	chore(format): fixing formatting issues	2026-04-29 16:48:57 +02:00
CarolinePascal	a2376389f9	test(new): adding new tests for encoding related features	2026-04-29 16:48:56 +02:00
CarolinePascal	57a619ab02	test(existing): adapting existing tests	2026-04-29 16:48:56 +02:00
CarolinePascal	7f624adcc5	chore(duplicate): removing duplicate get_codec_options definition	2026-04-29 16:48:56 +02:00
CarolinePascal	375cf1fdf3	feat(pyav checks): making pyav parameters checks more robust	2026-04-29 16:48:56 +02:00
CarolinePascal	b2c2bb7641	feat(VideoEncoderConfig init): making VideoEncoderConfig more robust and adaptable to multiple backends	2026-04-29 16:48:56 +02:00
CarolinePascal	4a87ee1537	fix(concatenation compatibility): adding compatibility check when concatenating video files	2026-04-29 16:48:56 +02:00
CarolinePascal	e44f86e516	feat(metadata): adding encoding parameters in dataset metadata	2026-04-29 16:48:56 +02:00
CarolinePascal	a0e3acdb67	chore(docs): updating the docs	2026-04-29 16:46:16 +02:00
CarolinePascal	38ff579bcc	feat(VideoEncoderConfig): propagating the VideoEncoderConfig in the codebase	2026-04-29 16:44:47 +02:00
CarolinePascal	479e444517	feat(VideoEncoderConfig): creating a VideoEncoderConfig to encapsulate encoding parameters	2026-04-29 16:42:14 +02:00
CarolinePascal	9787b8fa26	feat(pyav utils): adding support for PyAV encoding parameters validation	2026-04-29 16:42:14 +02:00
CarolinePascal	71f39f6912	chore(video backend): renaming codec into video_backend in get_safe_default_video_backend()	2026-04-29 16:42:14 +02:00
Khalil Meftah	b5f65e5332	Expose sarm package API and ship reward model card template (#3477 ) * chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in * chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.	2026-04-29 16:17:16 +02:00
Khalil Meftah	cd6b43ea7a	fix(train): migrate legacy RA-BC fields in train config loading (#3480 )	2026-04-29 16:17:00 +02:00
Steven Palma	2236bbe7a3	fix(rollout): propagate policy-specific CLI config parameters (#3483 ) Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-29 16:13:10 +02:00
Maxime Ellerbach	cb0a944941	refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472 ) * refactor(datasets): replace untyped dict with typed DatasetInfo dataclass Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json. Changes: - Add DatasetInfo dataclass with explicit fields and validation - Implement __post_init__ for shape conversion (list ↔ tuple) - Add dict-style compatibility layer (__getitem__, __setitem__, .get()) - Add from_dict() and to_dict() for JSON serialization - Update io_utils to use load_info/write_info with DatasetInfo - Update dataset utilities and metadata to use attribute access - Remove aggregate.py dict-style field access - Add tests fixture support for DatasetInfo Benefits: - Type safety with IDE auto-completion - Validation at construction time - Explicit schema documentation * fix pre-commit * update docstring inside DatasetInfo.from_dict() * sorts the unknown to have deterministic output Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * refactoring the last few old fields * fix crop dataset roi type mismatch * use consistently int for data and video_files_size_in_mb --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: jjolla93 <jjolla93@gmail.com>	2026-04-28 18:40:30 +02:00
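The pattern listed in this commit (typed dataclass, list/tuple shape conversion, dict-style compatibility, JSON round trip) might look roughly like the sketch below. `InfoSketch` is a hypothetical two-field stand-in; the real `DatasetInfo` fields and validation are not shown in this log and are not reproduced here.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class InfoSketch:
    """Hypothetical stand-in for a typed info.json record."""
    fps: int
    features: dict = field(default_factory=dict)

    def __post_init__(self):
        # JSON gives lists; keep shapes as tuples in memory (copy to avoid
        # mutating the caller's dict).
        converted = {}
        for name, spec in self.features.items():
            spec = dict(spec)
            if isinstance(spec.get("shape"), list):
                spec["shape"] = tuple(spec["shape"])
            converted[name] = spec
        self.features = converted

    # Dict-style compatibility layer for legacy call sites.
    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        setattr(self, key, value)

    def get(self, key, default=None):
        return getattr(self, key, default)

    @classmethod
    def from_dict(cls, data):
        return cls(**data)

    def to_dict(self):
        # Convert tuples back to lists so the result mirrors the JSON form.
        out = asdict(self)
        for spec in out["features"].values():
            if isinstance(spec.get("shape"), tuple):
                spec["shape"] = list(spec["shape"])
        return out


raw = {"fps": 30, "features": {"observation.state": {"shape": [6]}}}
info = InfoSketch.from_dict(raw)
```

Both `info.fps` and `info["fps"]` work, which is what lets attribute access be adopted gradually while older dict-style code keeps running.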
Khalil Meftah	8a3d64033f	Reward models refactor (#3142 ) * feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes * refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/ * refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/ * refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py * refactor(rewards): update imports and delete old reward model locations * test(rewards): add reward model tests and update existing test imports * fix(rewards): restore full Classifier and SARM implementations * test(rewards): restore missing CUDA and mixed precision classifier processor tests * refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train * refactor(lerobot_train.py): add missing sampling weight script * linter + missing files * add testing for sampl weighter * revert some useless changes, improve typing * update docs * add automatic detection of the progress path * remove type exp * improve comment * fix: move rabc.py to rewards/sarm/ and update import paths * refactor(imports): update reward model imports to new module structure * refactor(imports): update reward model imports to reflect new module structure * refactor(imports): conditionally import pandas based on availability * feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig * refactor(policies): remove reward model branches from policy factory and __init__ * refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash * feat(train): route reward model training through rewards/factory instead of policies/factory * refactor(train): streamline reward model training logic * fix(rewards): ensure FileNotFoundError is raised for missing config_file * refactor(train): update __get_path_fields__ to include reward_model for config loading * refactor(classifier): remove 
redundant input normalization in predict_reward method * fix(train): raise ValueError for non-trainable reward models in train function * refactor(pretrained_rm): add model card template * refactor(tests): reward models * refactor(sarm): update reset method and remove unused action prediction methods * refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function * fix(train): raise ValueError for PEFT usage in reward model training * refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties --------- Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2026-04-28 17:56:24 +02:00
Steven Palma	03ee50e08f	chore(ci): bump docs workflows (#3476 )	2026-04-28 15:06:44 +02:00
Steven Palma	ca87ccd941	feat(rollout): decouple policy deployment from data recording with new `lerobot-rollout` CLI (#3413 ) * feat(scripts): lerobot-rollout * fix(rollout) require dataset in dagger + use duration too * fix(docs): dagger num_episodes * test(rollout): fix expectations * fix(rollout): features check * fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config * docs(rollout): edit rename_map instructions * chore(rollout): multiple minor improvements * chore(rollout): address comments + minor improvements * fix(rollout): enable default * fix(tests): default value RTCConfig * fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): prevent relativeactions with sync inference engine Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): rtc reanchor to non normalized state Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): fixing the episode length to use hwc (#3469) also reducing default length to 5 minutes * feat(rollout): go back to initial position is now a config * fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470) * chore(rollout): note about dagger correction stage * chore(docs): update comments and docstring * fix(test): move rtc relative out of rollout module * fix(rollout): address the review comments --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-28 00:57:35 +02:00
Steven Palma	77352c495c	chore(dependencies): update uv.lock (#3437 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-04-27 23:15:46 +02:00
Steven Palma	05a5223885	fix(pi): avoid peak RAM in PiGemma construction by freeing replaced submodules (#3454 ) Co-Authored-By: Daiki Kamata <daiki.kamata@access-company.com> Co-Authored-By: Jack Vial <jackvial@users.noreply.github.com> Co-Authored-By: Ajay Anubolu <AjAnubolu@users.noreply.github.com> Co-Authored-By: Finn F. <F-Fer@users.noreply.github.com>	2026-04-24 17:50:12 +02:00
Steven Palma	580d818aa9	fix(dataset): no default overwrite in lerobot tool recompute stats (#3452 )	2026-04-24 15:07:19 +02:00
Steven Palma	587aa82021	fix(imports): realsense import name is platform dependent (#3451 )	2026-04-24 12:55:38 +02:00
Chuyao Shen	12b88fce02	not use dataclass (#3414 ) Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-24 11:26:59 +02:00
masato-ka	fc6c94c82a	fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in… (#3419 ) * fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding In transformers 5.x, CLIPModel.get_image_features() and get_text_features() return BaseModelOutputWithPooling instead of a plain torch.FloatTensor. Added isinstance check to extract pooler_output when the return value is not a tensor, maintaining backward compatibility with transformers 4.x. Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach' * Adding assertion check for pooler_output of CLIP. This change is in response to the comment below. https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387 * Adding assertion check for pooler_output of CLIP. This change is in response to the comment below. Change to simple check and raise https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776 --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-23 16:26:58 +02:00
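The compatibility check described in this commit can be illustrated without installing torch or transformers. In the sketch below, `Tensor` and `ModelOutput` are hypothetical stand-ins for `torch.Tensor` and `BaseModelOutputWithPooling`, and `as_feature_tensor` is an illustrative name, not the function in the SARM code.

```python
class Tensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without torch."""

    def __init__(self, values):
        self.values = values


class ModelOutput:
    """Hypothetical stand-in for BaseModelOutputWithPooling."""

    def __init__(self, pooler_output):
        self.pooler_output = pooler_output


def as_feature_tensor(output):
    """Return a tensor whether the encoder gave a tensor or a structured output."""
    if isinstance(output, Tensor):
        # Older behaviour: the call already returned a plain tensor.
        return output
    pooled = getattr(output, "pooler_output", None)
    if pooled is None:
        # Simple check and raise, rather than failing later on .detach().
        raise ValueError("encoder output has no pooler_output")
    return pooled


legacy = as_feature_tensor(Tensor([0.1, 0.2]))
modern = as_feature_tensor(ModelOutput(Tensor([0.1, 0.2])))
```

Both calls yield a `Tensor`, so downstream code that calls tensor methods does not need to know which library version produced the value.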
Steven Palma	1add460678	fix(policy): loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT (#3442 ) * Fix loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT When action_is_pad masks out padded timesteps, the subsequent .mean() still divides by the total element count (including zeroed-out padding), underestimating the loss. With 60-70% padding this can cut the effective gradient signal by 2-3x. Replace mask-then-mean with mask-then-sum / valid-count for all three affected policies. TDMPC is not affected because it sums over time before averaging over batch. Fixes #3353 * linting Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * Update src/lerobot/policies/diffusion/modeling_diffusion.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * apply ACT loss normalization suggestion from review Divide by num_valid (timesteps * action_dim) instead of just timesteps, matching the diffusion/multi_task_dit fix. Addresses review from @whats2000 (https://github.com/huggingface/lerobot/pull/3377#discussion_r3106845791). * fix(test): update safetensor act --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>	2026-04-23 15:23:54 +02:00
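To make the dilution concrete, here is a small plain-Python sketch (nested lists instead of torch tensors, which is a simplification for illustration). The function names are hypothetical; the two reductions follow the commit's description of mask-then-mean versus mask-then-sum divided by the valid count.

```python
def mask_then_mean(losses, is_pad):
    """Old behaviour: zero padded steps, then divide by ALL elements."""
    total, count = 0.0, 0
    for sample, pad_row in zip(losses, is_pad):
        for step, padded in zip(sample, pad_row):
            for value in step:
                total += 0.0 if padded else value
                count += 1
    return total / count


def mask_then_sum_over_valid(losses, is_pad):
    """Fixed behaviour: divide by valid timesteps * action_dim only."""
    total, num_valid = 0.0, 0
    for sample, pad_row in zip(losses, is_pad):
        for step, padded in zip(sample, pad_row):
            if padded:
                continue
            total += sum(step)
            num_valid += len(step)
    return total / max(num_valid, 1)


# One sample, 5 timesteps, action_dim 2, unit loss everywhere, 60% padding.
losses = [[[1.0, 1.0]] * 5]
is_pad = [[False, False, True, True, True]]
diluted = mask_then_mean(losses, is_pad)
corrected = mask_then_sum_over_valid(losses, is_pad)
```

With 60% of timesteps padded, the old reduction reports 0.4 where the per-element loss is actually 1.0, a 2.5x underestimate consistent with the 2-3x range quoted in the commit.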
Qi Jia	4587c2b648	fix xvla docs (#3291 ) Co-authored-by: Qi Jia <kaufou@gmail.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-04-23 14:50:32 +02:00
whats2000	2236cdb302	fix(smolvla): correct loss normalization for padded actions (#3434 ) Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the `action_is_pad` mask to zero out padded timesteps, then calls `.mean()` (or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number of elements including the zeroed padding, the loss is diluted by the padding fraction. Fixed by normalizing only over valid (non-padded) scalar entries: num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1) loss = losses.sum(...) / num_valid `clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both reduction paths are updated. Behavior when `action_is_pad` is missing is unchanged (`losses.mean()`). Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2, 30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` — same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for ACT. Heavier-padding recipes will see a larger gap. Refs: #3353 (original report for ACT), #3377 (fix for the other three policies).	2026-04-23 10:34:11 +02:00
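The per-sample reduction path and the `clamp_min(1)` edge case from this commit can be sketched in plain Python. This is an assumption-laden illustration: nested lists stand in for tensors, `max(..., 1)` stands in for `clamp_min(1)`, and `per_sample_loss` is a hypothetical name.

```python
def per_sample_loss(losses, is_pad):
    """Average each sample's loss over its valid (non-padded) scalar entries."""
    results = []
    for sample, pad_row in zip(losses, is_pad):
        action_dim = len(sample[0])
        # num_valid = valid timesteps * action_dim, clamped to at least 1.
        num_valid = max(sum(not padded for padded in pad_row) * action_dim, 1)
        total = sum(sum(step) for step, padded in zip(sample, pad_row) if not padded)
        results.append(total / num_valid)
    return results


losses = [
    [[2.0, 4.0], [6.0, 8.0]],  # second timestep is padded
    [[1.0, 1.0], [1.0, 1.0]],  # fully padded sample
]
is_pad = [[False, True], [True, True]]
per_sample = per_sample_loss(losses, is_pad)
```

The first sample averages only its valid step (3.0), and the fully padded sample yields 0.0 instead of a division by zero, matching the `0/1 = 0` behaviour noted in the commit.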
				`@@ -1 +0,0 @@`
				`../../../../docs/source/policy_sarm_README.md`