Compare commits

...

10 Commits

17 changed files with 173 additions and 105 deletions
+2
@@ -33,6 +33,8 @@
title: Using the Dataset Tools title: Using the Dataset Tools
- local: dataset_subtask - local: dataset_subtask
title: Using Subtasks in the Dataset title: Using Subtasks in the Dataset
- local: video_encoding_parameters
title: Video encoding parameters
- local: streaming_video_encoding - local: streaming_video_encoding
title: Streaming Video Encoding title: Streaming Video Encoding
title: "Datasets" title: "Datasets"
+9 -19
@@ -14,22 +14,12 @@ This makes `save_episode()` near-instant (the video is already encoded by the ti
## 2. Tuning Parameters ## 2. Tuning Parameters
All encoding parameters are grouped under `camera_encoder_config` (a `VideoEncoderConfig` dataclass), accessible from the CLI via `--dataset.camera_encoder_config.<field>`. | Parameter | CLI Flag | Type | Default | Description |
| ----------------------- | ---------------------------------------- | ------------- | ------------- | ----------------------------------------------------------------- |
| Parameter | CLI Flag | Type | Default | Description | | `streaming_encoding` | `--dataset.streaming_encoding` | `bool` | `True` | Enable real-time encoding during capture |
| ----------------------- | --------------------------------------------- | ------------- | ------------- | ------------------------------------------------------------------- | | `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder |
| `streaming_encoding` | `--dataset.streaming_encoding` | `bool` | `True` | Enable real-time encoding during capture | | `encoder_threads` | `--dataset.encoder_threads` | `int \| None` | `None` (auto) | Threads per encoder instance. `None` lets the codec decide |
| `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder | | `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `60` | Max buffered frames per camera (~2s at 30fps). Consumes RAM |
| `pix_fmt` | `--dataset.camera_encoder_config.pix_fmt` | `str` | `"yuv420p"` | Pixel format |
| `g` | `--dataset.camera_encoder_config.g` | `int \| None` | `2` | GOP size (keyframe interval) |
| `crf` | `--dataset.camera_encoder_config.crf` | `int \| None` | `30` | Quality level (mapped to codec-specific parameter) |
| `preset` | `--dataset.camera_encoder_config.preset` | `int \| None` | `12` | Speed preset (libsvtav1 only, 0 = slowest … 13 = fastest) |
| `fast_decode` | `--dataset.camera_encoder_config.fast_decode` | `int` | `0` | Fast-decode tuning level |
| `encoder_threads` | `--dataset.encoder_threads` | `int \| None` | `None` (auto) | Threads per encoder instance (global). `None` lets the codec decide |
| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `60` | Max buffered frames per camera (~2s at 30fps). Consumes RAM |
> [!TIP]
> Not all parameters apply to every codec. `VideoEncoderConfig` will warn at startup if you set a parameter that your chosen codec ignores (e.g. `preset` with `h264_nvenc`).
## 3. Performance Considerations ## 3. Performance Considerations
@@ -50,7 +40,7 @@ Streaming encoding means the CPU is encoding video **during** the capture loop,
### `encoder_threads` Tuning ### `encoder_threads` Tuning
This parameter (`--dataset.encoder_threads`) controls how many threads each encoder instance uses internally: This parameter controls how many threads each encoder instance uses internally:
- **Higher values** (e.g., 4-5): Faster encoding, but uses more CPU cores per camera. Good for high-end systems with many cores. - **Higher values** (e.g., 4-5): Faster encoding, but uses more CPU cores per camera. Good for high-end systems with many cores.
- **Lower values** (e.g., 1-2): Less CPU per camera, freeing cores for capture and visualization. Good for low-res images and capable CPUs. - **Lower values** (e.g., 1-2): Less CPU per camera, freeing cores for capture and visualization. Good for low-res images and capable CPUs.
@@ -58,7 +48,7 @@ This parameter (`--dataset.encoder_threads`) controls how many threads each enco
### Backpressure and Frame Dropping ### Backpressure and Frame Dropping
Each camera has a bounded queue (`encoder_queue_maxsize`, default 60 frames). When the encoder can't keep up: Each camera has a bounded queue (`encoder_queue_maxsize`, default 30 frames). When the encoder can't keep up:
1. The queue fills up (consuming RAM) 1. The queue fills up (consuming RAM)
2. New frames are **dropped** (not blocked) — the capture loop continues uninterrupted 2. New frames are **dropped** (not blocked) — the capture loop continues uninterrupted
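
The drop-instead-of-block behaviour described here can be sketched with Python's standard `queue.Queue`. This is a minimal illustration under stated assumptions, not LeRobot's `StreamingVideoEncoder`: the class `FrameBuffer` and its method names are hypothetical.

```python
import queue


class FrameBuffer:
    """Hypothetical bounded per-camera buffer that drops frames when full."""

    def __init__(self, maxsize: int) -> None:
        # A bounded queue: at most `maxsize` frames are held in RAM.
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def push(self, frame) -> bool:
        """Try to enqueue a frame without ever blocking the capture loop."""
        try:
            self._queue.put_nowait(frame)
            return True
        except queue.Full:
            # The encoder is behind: count the drop and keep capturing.
            self.dropped += 1
            return False

    def pop(self):
        """Called from the encoder side; returns None when the buffer is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


# Simulate a stalled encoder: 5 frames arrive, only 3 fit.
buffer = FrameBuffer(maxsize=3)
results = [buffer.push(i) for i in range(5)]
```

A non-zero `dropped` counter is the signal that the encoder cannot keep up with the current settings.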
@@ -162,4 +152,4 @@ lerobot-record --dataset.camera_encoder_config.vcodec=h264 --dataset.streaming_e
## 7. Closing note ## 7. Closing note
Performance ultimately depends on your exact setup — frames-per-second, resolution, CPU cores and load, available memory, episode length, and the encoder you choose. Always test with your target workload, be mindful about your CPU & system capabilities and tune `encoder_threads`, `encoder_queue_maxsize`, and Performance ultimately depends on your exact setup — frames-per-second, resolution, CPU cores and load, available memory, episode length, and the encoder you choose. Always test with your target workload, be mindful about your CPU & system capabilities and tune `encoder_threads`, `encoder_queue_maxsize`, and
`camera_encoder_config.vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration. `vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration.
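
The health check in the closing note can be written down as a small helper. This is a sketch only: `check_episode_health` is a hypothetical function, the one-frame tolerance is an assumption, and you would supply the row count and durations from your own dataset and recording logs.

```python
def check_episode_health(
    num_rows: int,
    video_duration_s: float,
    cli_duration_s: float,
    fps: int,
    tolerance_frames: int = 1,  # assumed slack; tune for your setup
) -> list[str]:
    """Return a list of human-readable problems (empty means healthy)."""
    problems = []

    # Row count should equal FPS x CLI duration.
    expected_rows = round(fps * cli_duration_s)
    if abs(num_rows - expected_rows) > tolerance_frames:
        problems.append(f"row count {num_rows} != expected {expected_rows}")

    # Video duration should match the CLI episode duration.
    if abs(video_duration_s - cli_duration_s) > tolerance_frames / fps:
        problems.append(
            f"video duration {video_duration_s:.2f}s != CLI duration {cli_duration_s:.2f}s"
        )
    return problems


healthy = check_episode_health(900, 30.0, 30.0, fps=30)
unhealthy = check_episode_health(850, 28.3, 30.0, fps=30)
```

A shortfall in both checks at once, as in the second call, is what dropped frames would look like.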
+1 -8
@@ -147,14 +147,7 @@ lerobot-edit-dataset \
**Parameters:** **Parameters:**
- `output_dir`: Custom output directory (optional - by default uses `new_repo_id` or `{repo_id}_video`) - `output_dir`: Custom output directory (optional - by default uses `new_repo_id` or `{repo_id}_video`)
- `camera_encoder_config`: Video encoder settings; all sub-fields accessible via `--operation.camera_encoder_config.<field>`: - `camera_encoder_config`: Video encoder settings; all sub-fields accessible via `--operation.camera_encoder_config.<field>`. See [Video Encoding Parameters](./video_encoding_parameters) for more details.
- `vcodec`: Video codec — `h264`, `hevc`, `libsvtav1`, `auto`, or hardware codecs (default: `libsvtav1`)
- `pix_fmt`: Pixel format — `yuv420p`, `yuv444p` (default: `yuv420p`)
- `g`: GOP size — lower values give better quality but larger files (default: 2)
- `crf`: Quality level — lower is better, 0 is lossless (default: 30)
- `preset`: Speed preset, libsvtav1 only (default: 12)
- `fast_decode`: Fast-decode tuning (default: 0)
- `encoder_threads`: Threads per encoder instance — global setting, separate from `camera_encoder_config` (default: None)
- `episode_indices`: List of specific episodes to convert (default: all episodes) - `episode_indices`: List of specific episodes to convert (default: all episodes)
- `num_workers`: Number of parallel workers for processing (default: 4) - `num_workers`: Number of parallel workers for processing (default: 4)
+81
@@ -0,0 +1,81 @@
# Video encoding parameters
When **video storage** is on, LeRobot stores each camera stream as an **MP4** file rather than saving **every timestep as its own image file**. **Video encoding compresses across time**, which usually reduces **dataset size and I/O** compared to storing one PNG per frame, and MP4 remains a **familiar format** for players and loaders. Encoding frames into an MP4 file involves a full FFmpeg pipeline: choice of encoder, pixel format, GOP/keyframes, quality vs. speed, and optional extra encoder flags. **Many of those knobs are user-tunable** and are exposed on the dataset config as **`dataset.camera_encoder_config`**, a nested **`VideoEncoderConfig`** (`lerobot.datasets.video_utils.VideoEncoderConfig`) passed through **PyAV**.
You can set these parameters from the CLI with **`--dataset.camera_encoder_config.<field>`** (e.g. `lerobot-record`, `lerobot-rollout`). The same block applies to **every** camera video stream in that run. **Video storage must be on** — **`use_videos=True`** in Python APIs or **`--dataset.video=true`** (recording default); with video off, inputs stay as images and **`camera_encoder_config` is ignored.**
For **when** frames are written vs encoded (streaming vs post-episode), queues, and other top-level **`--dataset.*`** switches, see [Streaming Video Encoding](./streaming_video_encoding). For codec/size/speed experiments, see the [video-benchmark Space](https://huggingface.co/spaces/lerobot/video-benchmark).
---
## Tuning Parameters
| Parameter | CLI flag | Type | Default | Description |
| --------------- | ----------------------------------------------- | -------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec name. `"auto"` picks the first available hardware encoder from a fixed preference list, else `libsvtav1`. |
| `pix_fmt` | `--dataset.camera_encoder_config.pix_fmt` | `str` | `"yuv420p"` | Output pixel format; must be supported by the specified codec in your FFmpeg build. |
| `g` | `--dataset.camera_encoder_config.g` | `int \| None` | `2` | GOP size (keyframes every `g` frames). Emitted as FFmpeg option `g`. |
| `crf` | `--dataset.camera_encoder_config.crf` | `int \| None` | `30` | Abstract **quality**; mapped per codec in the table below (CRF, QP, `q:v`, etc.). Lower → higher quality / larger output where the mapping is monotone. |
| `preset` | `--dataset.camera_encoder_config.preset` | `int \| str \| None` | `12`\* | Video encoding speed preset; meaning depends on the specified codec. \*Unset + `libsvtav1` → LeRobot sets `12`. |
| `fast_decode` | `--dataset.camera_encoder_config.fast_decode` | `int` | `0` | `libsvtav1`: `0` to `2`, passed in `svtav1-params`; `h264` / `hevc` (software): if `>0`, sets `tune=fastdecode`; other codecs: often unused. |
| `video_backend` | `--dataset.camera_encoder_config.video_backend` | `str` | `"pyav"` | Only `"pyav"` is implemented for video encoding today. |
| `extra_options` | (nested config / non-scalar) | `dict` | `{}` | Extra FFmpeg options merged after the built-in mapping; **cannot** override keys already set from structured fields above. |
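
The `"auto"` behaviour in the `vcodec` row can be pictured as a first-match lookup. This is a sketch under assumptions: the function name `resolve_vcodec` is hypothetical, and the order of `HW_PREFERENCE` is illustrative only, since the actual preference list is not given on this page.

```python
# Assumed order, for illustration; the real preference list may differ.
HW_PREFERENCE = ("h264_videotoolbox", "h264_nvenc", "h264_vaapi", "h264_qsv")
SOFTWARE_FALLBACK = "libsvtav1"


def resolve_vcodec(requested: str, available_encoders: set[str]) -> str:
    """Resolve "auto" to a concrete encoder name; pass other names through."""
    if requested != "auto":
        return requested
    # Pick the first hardware encoder that the local FFmpeg build provides.
    for name in HW_PREFERENCE:
        if name in available_encoders:
            return name
    # No hardware encoder found: fall back to the software default.
    return SOFTWARE_FALLBACK


with_gpu = resolve_vcodec("auto", {"libsvtav1", "h264_nvenc"})
cpu_only = resolve_vcodec("auto", {"libsvtav1"})
explicit = resolve_vcodec("h264", {"libsvtav1"})
```

An explicit codec name is returned unchanged here; checking that it exists is a separate validation step.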
---
## Validation
| What | Behavior |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Video codec presence | `vcodec` must exist as a video encoder in the local FFmpeg build (after resolving `"auto"`). |
| Pixel format | `pix_fmt` is checked against the encoder's reported pixel formats when available. |
| Options | `get_codec_options()` output (including values originating from `extra_options`) is checked against PyAV/FFmpeg option metadata (ranges, integer constraints, string choices) where applicable. |
---
## Mapping: `VideoEncoderConfig` → FFmpeg options
These are the options produced by **`get_codec_options()`** after `vcodec` resolution. Only fields on `camera_encoder_config` are listed here (no global thread / queue flags).
| Resolved `vcodec` | `g` | Quality from `crf` | `preset` | `fast_decode` |
| ---------------------------------------- | --- | --------------------------- | -------- | ------------------------------------------ |
| `libsvtav1` | `g` | `crf` | `preset` | `svtav1-params` includes `fast-decode=0…2` |
| `h264`, `hevc` (software) | `g` | `crf` | `preset` | `tune=fastdecode` if `fast_decode > 0` |
| `h264_videotoolbox`, `hevc_videotoolbox` | `g` | `q:v` (derived from `crf`) | — | — |
| `h264_nvenc`, `hevc_nvenc` | `g` | `rc=constqp` + `qp` ← `crf` | `preset` | — |
| `h264_vaapi` | `g` | `qp` ← `crf` | — | — |
| `h264_qsv` | `g` | `global_quality` ← `crf` | `preset` | — |
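
The mapping table can be read as a per-codec dispatch. The function below is a standalone sketch of that table, not the library's `get_codec_options()`: the name `codec_options` is hypothetical, emitting values as strings is an assumption, and the VideoToolbox rows are left out because the table does not say how `q:v` is derived from `crf`.

```python
from typing import Optional, Union


def codec_options(
    vcodec: str,
    g: Optional[int] = 2,
    crf: Optional[int] = 30,
    preset: Union[int, str, None] = None,
    fast_decode: int = 0,
) -> dict[str, str]:
    """Translate abstract encoder fields into FFmpeg-style options (sketch)."""
    opts: dict[str, str] = {}
    if g is not None:
        opts["g"] = str(g)  # GOP size is emitted as `g` for every codec

    if vcodec == "libsvtav1":
        if crf is not None:
            opts["crf"] = str(crf)
        # An unset preset with libsvtav1 defaults to 12 (per the parameter table).
        opts["preset"] = str(12 if preset is None else preset)
        opts["svtav1-params"] = f"fast-decode={fast_decode}"
    elif vcodec in ("h264", "hevc"):
        if crf is not None:
            opts["crf"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
        if fast_decode > 0:
            opts["tune"] = "fastdecode"
    elif vcodec in ("h264_nvenc", "hevc_nvenc"):
        opts["rc"] = "constqp"
        if crf is not None:
            opts["qp"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
    elif vcodec == "h264_vaapi":
        if crf is not None:
            opts["qp"] = str(crf)
    elif vcodec == "h264_qsv":
        if crf is not None:
            opts["global_quality"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
    else:
        # VideoToolbox and other codecs are not covered by this sketch.
        raise NotImplementedError(f"no mapping sketched for {vcodec!r}")
    return opts


svtav1_opts = codec_options("libsvtav1")
nvenc_opts = codec_options("h264_nvenc", crf=23, preset="p4")
```

Note how the same `crf` value lands under a different option name depending on the resolved codec.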
---
## `extra_options`
- Merged **after** structured options; keys **already** set by `g`, `crf`, `preset`, etc. are **not** replaced by `extra_options`.
- Values are strings or numbers as FFmpeg expects; numeric values are validated when the codec exposes option metadata.
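
The merge rule above amounts to "structured fields win". A minimal sketch, assuming colliding keys are silently kept rather than rejected; `merge_extra_options` is a hypothetical helper, not part of `VideoEncoderConfig`:

```python
def merge_extra_options(structured: dict[str, str], extra: dict[str, object]) -> dict[str, str]:
    """Merge extra FFmpeg options without overriding structured ones."""
    merged = dict(structured)
    for key, value in extra.items():
        if key in merged:
            # Already set from g / crf / preset / etc.: keep the structured value.
            continue
        merged[key] = str(value)  # assumed: options are passed on as strings
    return merged


structured = {"g": "2", "crf": "30"}
extra = {"crf": 18, "bf": 2, "tune": "film"}
merged = merge_extra_options(structured, extra)
```

In this example the `crf` entry from `extra` is ignored, while `bf` and `tune` are added.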
---
## Example
```bash
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58760431541 \
--robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--robot.id=black \
--teleop.type=so100_leader \
--teleop.port=/dev/tty.usbmodem58760431551 \
--teleop.id=blue \
--dataset.repo_id=<my_username>/<my_dataset_name> \
--dataset.num_episodes=2 \
--dataset.single_task="Grab the cube" \
--dataset.streaming_encoding=true \
--dataset.encoder_threads=2 \
--dataset.camera_encoder_config.vcodec=h264 \
--dataset.camera_encoder_config.preset=fast \
--dataset.camera_encoder_config.extra_options='{"tune": "film", "profile:v": "high", "bf": 2}' \
--display_data=true
```
+6 -5
@@ -14,10 +14,12 @@
"""Shared dataset recording configuration used by both ``lerobot-record`` and ``lerobot-rollout``.""" """Shared dataset recording configuration used by both ``lerobot-record`` and ``lerobot-rollout``."""
from dataclasses import dataclass from dataclasses import dataclass, field
from datetime import datetime from datetime import datetime
from pathlib import Path from pathlib import Path
from lerobot.datasets.video_utils import VideoEncoderConfig, camera_encoder_defaults
@dataclass @dataclass
class DatasetRecordConfig: class DatasetRecordConfig:
@@ -55,10 +57,9 @@ class DatasetRecordConfig:
# Number of episodes to record before batch encoding videos # Number of episodes to record before batch encoding videos
# Set to 1 for immediate encoding (default behavior), or higher for batched encoding # Set to 1 for immediate encoding (default behavior), or higher for batched encoding
video_encoding_batch_size: int = 1 video_encoding_batch_size: int = 1
# Video codec for encoding videos. Options: 'h264', 'hevc', 'libsvtav1', 'auto', # Video encoder settings for camera MP4s (codec, quality, GOP, etc.). Tuned via CLI nested keys,
# or hardware-specific: 'h264_videotoolbox', 'h264_nvenc', 'h264_vaapi', 'h264_qsv'. # e.g. ``--dataset.camera_encoder_config.vcodec=h264`` (see ``VideoEncoderConfig``).
# Use 'auto' to auto-detect the best available hardware encoder. camera_encoder_config: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)
vcodec: str = "libsvtav1"
# Enable streaming video encoding: encode frames in real-time during capture instead # Enable streaming video encoding: encode frames in real-time during capture instead
# of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding # of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
streaming_encoding: bool = False streaming_encoding: bool = False
+19 -9
@@ -62,7 +62,12 @@ from .utils import (
DEFAULT_EPISODES_PATH, DEFAULT_EPISODES_PATH,
update_chunk_file_indices, update_chunk_file_indices,
) )
from .video_utils import VideoEncoderConfig, encode_video_frames, get_video_info from .video_utils import (
VideoEncoderConfig,
camera_encoder_defaults,
encode_video_frames,
get_video_info,
)
def _load_episode_with_stats(src_dataset: LeRobotDataset, episode_idx: int) -> dict: def _load_episode_with_stats(src_dataset: LeRobotDataset, episode_idx: int) -> dict:
@@ -101,7 +106,8 @@ def delete_episodes(
episode_indices: List of episode indices to delete. episode_indices: List of episode indices to delete.
output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig. output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig.
repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig. repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig.
camera_encoder_config: Video encoder settings used when re-encoding video segments (default: :class:`VideoEncoderConfig()`). camera_encoder_config: Video encoder settings used when re-encoding video segments
(``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`).
""" """
if not episode_indices: if not episode_indices:
raise ValueError("No episodes to delete") raise ValueError("No episodes to delete")
@@ -165,7 +171,8 @@ def split_dataset(
splits: Either a dict mapping split names to episode indices, or a dict mapping splits: Either a dict mapping split names to episode indices, or a dict mapping
split names to fractions (must sum to <= 1.0). split names to fractions (must sum to <= 1.0).
output_dir: Root directory where the split datasets will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. output_dir: Root directory where the split datasets will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id.
camera_encoder_config: Video encoder settings used when re-encoding video segments (default: :class:`VideoEncoderConfig()`). camera_encoder_config: Video encoder settings used when re-encoding video segments
(``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`).
Examples: Examples:
Split by specific episodes Split by specific episodes
@@ -598,10 +605,11 @@ def _keep_episodes_from_video_with_av(
Ranges are half-open intervals: [start_frame, end_frame), where start_frame Ranges are half-open intervals: [start_frame, end_frame), where start_frame
is inclusive and end_frame is exclusive. is inclusive and end_frame is exclusive.
fps: Frame rate of the video. fps: Frame rate of the video.
camera_encoder_config: Video encoder settings (default: :class:`VideoEncoderConfig()`). camera_encoder_config: Video encoder settings
(``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`).
""" """
if camera_encoder_config is None: if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig() camera_encoder_config = camera_encoder_defaults()
from fractions import Fraction from fractions import Fraction
import av import av
@@ -705,13 +713,14 @@ def _copy_and_reindex_videos(
src_dataset: Source dataset to copy from src_dataset: Source dataset to copy from
dst_meta: Destination metadata object dst_meta: Destination metadata object
episode_mapping: Mapping from old episode indices to new indices episode_mapping: Mapping from old episode indices to new indices
camera_encoder_config: Video encoder settings used when re-encoding segments (default: :class:`VideoEncoderConfig()`). camera_encoder_config: Video encoder settings used when re-encoding segments
(``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`).
Returns: Returns:
dict mapping episode index to its video metadata (chunk_index, file_index, timestamps) dict mapping episode index to its video metadata (chunk_index, file_index, timestamps)
""" """
if camera_encoder_config is None: if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig() camera_encoder_config = camera_encoder_defaults()
if src_dataset.meta.episodes is None: if src_dataset.meta.episodes is None:
src_dataset.meta.episodes = load_episodes(src_dataset.meta.root) src_dataset.meta.episodes = load_episodes(src_dataset.meta.root)
@@ -1654,7 +1663,8 @@ def convert_image_to_video_dataset(
dataset: The source LeRobot dataset with images dataset: The source LeRobot dataset with images
output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig. output_dir: Root directory where the edited dataset will be stored. If not specified, defaults to $HF_LEROBOT_HOME/repo_id. Equivalent to new_root in EditDatasetConfig.
repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig. repo_id: Edited dataset identifier. Equivalent to new_repo_id in EditDatasetConfig.
camera_encoder_config: Video encoder settings (default: :class:`VideoEncoderConfig()`). camera_encoder_config: Video encoder settings
(``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`).
episode_indices: List of episode indices to convert (None = all episodes) episode_indices: List of episode indices to convert (None = all episodes)
num_workers: Number of threads for parallel processing (default: 4) num_workers: Number of threads for parallel processing (default: 4)
max_episodes_per_batch: Maximum episodes per video batch to avoid memory issues (None = no limit) max_episodes_per_batch: Maximum episodes per video batch to avoid memory issues (None = no limit)
@@ -1664,7 +1674,7 @@ def convert_image_to_video_dataset(
New LeRobotDataset with images encoded as videos New LeRobotDataset with images encoded as videos
""" """
if camera_encoder_config is None: if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig() camera_encoder_config = camera_encoder_defaults()
# Check that it's an image dataset # Check that it's an image dataset
if len(dataset.meta.video_keys) > 0: if len(dataset.meta.video_keys) > 0:
+4 -2
@@ -53,6 +53,7 @@ from .utils import (
from .video_utils import ( from .video_utils import (
StreamingVideoEncoder, StreamingVideoEncoder,
VideoEncoderConfig, VideoEncoderConfig,
camera_encoder_defaults,
concatenate_video_files, concatenate_video_files,
encode_video_frames, encode_video_frames,
get_video_duration_in_s, get_video_duration_in_s,
@@ -95,7 +96,7 @@ class DatasetWriter:
self, self,
meta: LeRobotDatasetMetadata, meta: LeRobotDatasetMetadata,
root: Path, root: Path,
camera_encoder_config: VideoEncoderConfig, camera_encoder_config: VideoEncoderConfig | None,
encoder_threads: int | None, encoder_threads: int | None,
batch_encoding_size: int, batch_encoding_size: int,
streaming_encoder: StreamingVideoEncoder | None = None, streaming_encoder: StreamingVideoEncoder | None = None,
@@ -108,6 +109,7 @@ class DatasetWriter:
settings, and episode persistence). settings, and episode persistence).
root: Local dataset root directory. root: Local dataset root directory.
camera_encoder_config: Video encoder settings applied to all cameras. camera_encoder_config: Video encoder settings applied to all cameras.
``None`` uses :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`.
encoder_threads: Number of encoder threads (global). ``None`` encoder_threads: Number of encoder threads (global). ``None``
lets the codec decide. lets the codec decide.
batch_encoding_size: Number of episodes to accumulate before batch_encoding_size: Number of episodes to accumulate before
@@ -118,7 +120,7 @@ class DatasetWriter:
""" """
self._meta = meta self._meta = meta
self._root = root self._root = root
self._camera_encoder_config = camera_encoder_config self._camera_encoder_config = camera_encoder_config or camera_encoder_defaults()
self._encoder_threads = encoder_threads self._encoder_threads = encoder_threads
self._batch_encoding_size = batch_encoding_size self._batch_encoding_size = batch_encoding_size
self._streaming_encoder = streaming_encoder self._streaming_encoder = streaming_encoder
+15 -26
@@ -178,8 +178,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
batch_encoding_size (int, optional): Number of episodes to accumulate before batch encoding videos. batch_encoding_size (int, optional): Number of episodes to accumulate before batch encoding videos.
Set to 1 for immediate encoding (default), or higher for batched encoding. Defaults to 1. Set to 1 for immediate encoding (default), or higher for batched encoding. Defaults to 1.
camera_encoder_config (VideoEncoderConfig | None, optional): Video encoder settings for cameras camera_encoder_config (VideoEncoderConfig | None, optional): Video encoder settings for cameras
(codec, quality, etc.). Defaults to (codec, quality, etc.). When ``None``, :func:`~lerobot.datasets.video_utils.camera_encoder_defaults`
:class:`~lerobot.datasets.video_utils.VideoEncoderConfig` defaults when ``None``. is used by the writer.
encoder_threads (int | None, optional): Number of encoder threads (global). ``None`` lets the encoder_threads (int | None, optional): Number of encoder threads (global). ``None`` lets the
codec decide. codec decide.
streaming_encoding (bool, optional): If True, encode video frames in real-time during capture streaming_encoding (bool, optional): If True, encode video frames in real-time during capture
@@ -204,9 +204,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
self._video_backend = video_backend if video_backend else get_safe_default_video_backend() self._video_backend = video_backend if video_backend else get_safe_default_video_backend()
self._return_uint8 = return_uint8 self._return_uint8 = return_uint8
self._batch_encoding_size = batch_encoding_size self._batch_encoding_size = batch_encoding_size
if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig()
self._camera_encoder_config = camera_encoder_config
self._encoder_threads = encoder_threads self._encoder_threads = encoder_threads
if self._requested_root is not None: if self._requested_root is not None:
@@ -253,15 +250,15 @@ class LeRobotDataset(torch.utils.data.Dataset):
if streaming_encoding and len(self.meta.video_keys) > 0: if streaming_encoding and len(self.meta.video_keys) > 0:
streaming_enc = self._build_streaming_encoder( streaming_enc = self._build_streaming_encoder(
self.meta.fps, self.meta.fps,
self._camera_encoder_config, camera_encoder_config,
self._encoder_threads,
encoder_queue_maxsize, encoder_queue_maxsize,
encoder_threads,
) )
self.writer = DatasetWriter( self.writer = DatasetWriter(
meta=self.meta, meta=self.meta,
root=self.root, root=self.root,
camera_encoder_config=self._camera_encoder_config, camera_encoder_config=camera_encoder_config,
encoder_threads=self._encoder_threads, encoder_threads=encoder_threads,
batch_encoding_size=batch_encoding_size, batch_encoding_size=batch_encoding_size,
streaming_encoder=streaming_enc, streaming_encoder=streaming_enc,
initial_frames=self.meta.total_frames, initial_frames=self.meta.total_frames,
@@ -302,15 +299,15 @@ class LeRobotDataset(torch.utils.data.Dataset):
@staticmethod @staticmethod
def _build_streaming_encoder( def _build_streaming_encoder(
fps: int, fps: int,
camera_encoder_config: VideoEncoderConfig, camera_encoder_config: VideoEncoderConfig | None,
encoder_threads: int | None,
encoder_queue_maxsize: int, encoder_queue_maxsize: int,
encoder_threads: int | None,
) -> StreamingVideoEncoder: ) -> StreamingVideoEncoder:
return StreamingVideoEncoder( return StreamingVideoEncoder(
fps=fps, fps=fps,
camera_encoder_config=camera_encoder_config, camera_encoder_config=camera_encoder_config,
encoder_threads=encoder_threads,
queue_maxsize=encoder_queue_maxsize, queue_maxsize=encoder_queue_maxsize,
encoder_threads=encoder_threads,
) )
# ── Metadata properties ─────────────────────────────────────────── # ── Metadata properties ───────────────────────────────────────────
@@ -656,9 +653,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
video_backend: Video decoding backend (used when reading back). video_backend: Video decoding backend (used when reading back).
batch_encoding_size: Number of episodes to accumulate before batch_encoding_size: Number of episodes to accumulate before
batch-encoding videos. ``1`` means encode immediately. batch-encoding videos. ``1`` means encode immediately.
camera_encoder_config: Video encoder settings for cameras; defaults camera_encoder_config: Video encoder settings for cameras (codec, quality, etc.).
match :class:`~lerobot.datasets.video_utils.VideoEncoderConfig` When ``None``, :func:`~lerobot.datasets.video_utils.camera_encoder_defaults` is used.
when ``None``.
encoder_threads: Number of encoder threads (global). ``None`` encoder_threads: Number of encoder threads (global). ``None``
lets the codec decide. lets the codec decide.
metadata_buffer_size: Number of episode metadata records to buffer metadata_buffer_size: Number of episode metadata records to buffer
@@ -671,8 +667,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
Returns: Returns:
A new :class:`LeRobotDataset` in write mode. A new :class:`LeRobotDataset` in write mode.
""" """
if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig()
obj = cls.__new__(cls) obj = cls.__new__(cls)
obj.meta = LeRobotDatasetMetadata.create( obj.meta = LeRobotDatasetMetadata.create(
repo_id=repo_id, repo_id=repo_id,
@@ -696,7 +690,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj._video_backend = video_backend if video_backend is not None else get_safe_default_video_backend() obj._video_backend = video_backend if video_backend is not None else get_safe_default_video_backend()
obj._return_uint8 = False obj._return_uint8 = False
obj._batch_encoding_size = batch_encoding_size obj._batch_encoding_size = batch_encoding_size
obj._camera_encoder_config = camera_encoder_config
obj._encoder_threads = encoder_threads obj._encoder_threads = encoder_threads
# Reader is lazily created on first access (write-only mode) # Reader is lazily created on first access (write-only mode)
@@ -705,7 +698,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_enc = None streaming_enc = None
if streaming_encoding and len(obj.meta.video_keys) > 0: if streaming_encoding and len(obj.meta.video_keys) > 0:
streaming_enc = cls._build_streaming_encoder( streaming_enc = cls._build_streaming_encoder(
fps, camera_encoder_config, encoder_threads, encoder_queue_maxsize fps, camera_encoder_config, encoder_queue_maxsize, encoder_threads
) )
obj.writer = DatasetWriter( obj.writer = DatasetWriter(
meta=obj.meta, meta=obj.meta,
@@ -761,9 +754,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
video_backend: Video decoding backend for reading back data. video_backend: Video decoding backend for reading back data.
batch_encoding_size: Number of episodes to accumulate before batch_encoding_size: Number of episodes to accumulate before
batch-encoding videos. batch-encoding videos.
camera_encoder_config: Video encoder settings for cameras; defaults camera_encoder_config: Video encoder settings for cameras (codec, quality, etc.).
match :class:`~lerobot.datasets.video_utils.VideoEncoderConfig` When ``None``, :func:`~lerobot.datasets.video_utils.camera_encoder_defaults` is used.
when ``None``.
encoder_threads: Number of encoder threads (global). ``None`` encoder_threads: Number of encoder threads (global). ``None``
lets the codec decide. lets the codec decide.
image_writer_processes: Subprocesses for async image writing. image_writer_processes: Subprocesses for async image writing.
@@ -801,9 +793,6 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj.repo_id, obj._requested_root, obj.revision, force_cache_sync=force_cache_sync obj.repo_id, obj._requested_root, obj.revision, force_cache_sync=force_cache_sync
) )
if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig()
obj._camera_encoder_config = camera_encoder_config
obj._encoder_threads = encoder_threads obj._encoder_threads = encoder_threads
obj.root = obj.meta.root obj.root = obj.meta.root
@@ -813,7 +802,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_enc = None streaming_enc = None
if streaming_encoding and len(obj.meta.video_keys) > 0: if streaming_encoding and len(obj.meta.video_keys) > 0:
streaming_enc = cls._build_streaming_encoder( streaming_enc = cls._build_streaming_encoder(
obj.meta.fps, camera_encoder_config, encoder_threads, encoder_queue_maxsize obj.meta.fps, camera_encoder_config, encoder_queue_maxsize, encoder_threads
) )
obj.writer = DatasetWriter( obj.writer = DatasetWriter(
meta=obj.meta, meta=obj.meta,
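Note on the reordered positional arguments at the two `_build_streaming_encoder` call sites: both `encoder_queue_maxsize` and `encoder_threads` are integers (or `None`), so a caller left on the old order would not raise; it would just swap the values. The sketch below illustrates that risk with a hypothetical stand-in function, `build_streaming_encoder`. Its signature is an assumption modelled on the new parameter order in the `StreamingVideoEncoder` hunk, not the library's actual helper.

```python
from __future__ import annotations


def build_streaming_encoder(
    fps: int,
    camera_encoder_config: dict | None = None,
    queue_maxsize: int = 30,
    encoder_threads: int | None = None,
) -> dict:
    """Hypothetical stand-in that only records the arguments it received."""
    return {
        "fps": fps,
        "camera_encoder_config": camera_encoder_config,
        "queue_maxsize": queue_maxsize,
        "encoder_threads": encoder_threads,
    }


# A caller still using the old positional order (threads first, then queue size)
# silently swaps the two values: both are ints, so nothing fails.
stale_call = build_streaming_encoder(30, None, 4, 60)

# Passing by keyword does not depend on the order of the parameters.
keyword_call = build_streaming_encoder(30, None, encoder_threads=4, queue_maxsize=60)

print(stale_call["queue_maxsize"], stale_call["encoder_threads"])
print(keyword_call["queue_maxsize"], keyword_call["encoder_threads"])
```

Passing the last two arguments by keyword at the call sites would make this kind of reordering harmless.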
+2 -6
@@ -28,7 +28,7 @@ from typing import TYPE_CHECKING, Any
import av import av
if TYPE_CHECKING: if TYPE_CHECKING:
from lerobot.datasets.video_utils import VideoEncoderConfig from .video_utils import VideoEncoderConfig
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -177,10 +177,6 @@ def check_video_encoder_config_pyav(config: VideoEncoderConfig) -> None:
vcodec = config.vcodec vcodec = config.vcodec
options = _get_codec_options_by_name(vcodec) options = _get_codec_options_by_name(vcodec)
if not options: if not options:
logger.warning( raise ValueError(f"Codec {vcodec!r} is not available in the bundled FFmpeg build")
"Codec %r is not available in the bundled FFmpeg build; ",
vcodec,
)
return
_check_pixel_format(config.vcodec, config.pix_fmt) _check_pixel_format(config.vcodec, config.pix_fmt)
_check_codec_options(config.vcodec, config.get_codec_options(), config) _check_codec_options(config.vcodec, config.get_codec_options(), config)
+18 -18
@@ -37,17 +37,18 @@ import torchvision
from datasets.features.features import register_feature from datasets.features.features import register_feature
from PIL import Image from PIL import Image
from lerobot.datasets.pyav_utils import ( from lerobot.utils.import_utils import get_safe_default_video_backend
from .pyav_utils import (
check_video_encoder_config_pyav, check_video_encoder_config_pyav,
detect_available_encoders_pyav, detect_available_encoders_pyav,
) )
from lerobot.utils.import_utils import get_safe_default_video_backend
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# List of hardware encoders to probe for auto-selection. Availability depends on the platform and FFmpeg build. # List of hardware encoders to probe for auto-selection. Availability depends on the platform and FFmpeg build.
# Determines the order of preference for auto-selection when vcodec="auto" is used. # Determines the order of preference for auto-selection when vcodec="auto" is used.
HW_ENCODERS = [ HW_VIDEO_CODECS = [
"h264_videotoolbox", # macOS "h264_videotoolbox", # macOS
"hevc_videotoolbox", # macOS "hevc_videotoolbox", # macOS
"h264_nvenc", # NVIDIA GPU "h264_nvenc", # NVIDIA GPU
@@ -56,7 +57,7 @@ HW_ENCODERS = [
"h264_qsv", # Intel Quick Sync "h264_qsv", # Intel Quick Sync
] ]
VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1", "auto"} | set(HW_ENCODERS) VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1", "auto"} | set(HW_VIDEO_CODECS)
LIBSVTAV1_DEFAULT_PRESET: int = 12 LIBSVTAV1_DEFAULT_PRESET: int = 12
@@ -116,33 +117,33 @@ class VideoEncoderConfig:
check_video_encoder_config_pyav(self) check_video_encoder_config_pyav(self)
def resolve_vcodec(self) -> None: def resolve_vcodec(self) -> None:
"""Validate vcodec and resolve 'auto' to best available HW encoder, fallback to libsvtav1. """Check ``vcodec`` and, when it is ``"auto"``, pick a concrete encoder.
Any explicitly-requested codec that isn't in the local FFmpeg build is For ``"auto"``, the first hardware encoder in the preference list that FFmpeg
also silently rewritten to ``libsvtav1`` so encoding never hard-fails on exposes is chosen; if none are available, ``libsvtav1`` is used. If the
a host missing the requested encoder. resolved codec (explicit or after auto-selection) is not present in the
local FFmpeg build, raises ``ValueError``.
""" """
if self.vcodec not in VALID_VIDEO_CODECS: if self.vcodec not in VALID_VIDEO_CODECS:
raise ValueError(f"Invalid vcodec '{self.vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}") raise ValueError(f"Invalid vcodec '{self.vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
if self.vcodec == "auto": if self.vcodec == "auto":
available = self.detect_available_encoders(HW_ENCODERS) available = self.detect_available_encoders(HW_VIDEO_CODECS)
for encoder in HW_ENCODERS: for encoder in HW_VIDEO_CODECS:
if encoder in available: if encoder in available:
logger.info(f"Auto-selected video codec: {encoder}") logger.info(f"Auto-selected video codec: {encoder}")
self.vcodec = encoder self.vcodec = encoder
return return
logger.info("No hardware encoder available, falling back to software encoder 'libsvtav1'") logger.warning("No hardware encoder available, falling back to software encoder 'libsvtav1'")
self.vcodec = "libsvtav1" self.vcodec = "libsvtav1"
if self.detect_available_encoders(self.vcodec): if self.detect_available_encoders(self.vcodec):
logger.info(f"Using video codec: {self.vcodec}") logger.info(f"Using video codec: {self.vcodec}")
self.vcodec = self.vcodec
return return
raise ValueError(f"Unsupported video codec: {self.vcodec} with video backend {self.video_backend}") raise ValueError(f"Unsupported video codec: {self.vcodec} with video backend {self.video_backend}")
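The new `resolve_vcodec` behaviour described in the docstring can be sketched in isolation as follows. This is a minimal, self-contained approximation, not the method itself. The `available` set stands in for whatever `detect_available_encoders` reports, the preference list is an abbreviated subset of `HW_VIDEO_CODECS`, and the function form and the `SOFTWARE_FALLBACK` name are assumptions made for illustration.

```python
from __future__ import annotations

# Abbreviated preference list (assumption: the real HW_VIDEO_CODECS has more entries).
HW_VIDEO_CODECS = ["h264_videotoolbox", "hevc_videotoolbox", "h264_nvenc", "h264_qsv"]
VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1", "auto"} | set(HW_VIDEO_CODECS)
SOFTWARE_FALLBACK = "libsvtav1"  # hypothetical constant name


def resolve_vcodec(requested: str, available: set[str]) -> str:
    """Return a concrete codec name, or raise ValueError if it cannot be used."""
    if requested not in VALID_VIDEO_CODECS:
        raise ValueError(
            f"Invalid vcodec {requested!r}. Must be one of: {sorted(VALID_VIDEO_CODECS)}"
        )

    resolved = requested
    if requested == "auto":
        # First hardware encoder in preference order wins; otherwise fall back to software.
        resolved = next((c for c in HW_VIDEO_CODECS if c in available), SOFTWARE_FALLBACK)

    # Explicit or auto-selected, the codec must exist locally: no silent rewrite.
    if resolved not in available:
        raise ValueError(f"Unsupported video codec: {resolved}")
    return resolved


print(resolve_vcodec("auto", {"h264_nvenc", "libsvtav1"}))
print(resolve_vcodec("auto", {"libsvtav1"}))
```

The key difference from the previous behaviour is the final check: an explicitly requested codec that is missing now fails instead of being replaced by `libsvtav1`.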
def get_codec_options( def get_codec_options(
self, encoder_threads: int | None = None, as_strings: bool = False self, encoder_threads: int | None = None, as_strings: bool = False
) -> dict[str, str]: ) -> dict[str, Any]:
"""Translate the tuning fields to codec-specific FFmpeg options. """Translate the tuning fields to codec-specific FFmpeg options.
``VideoEncoderConfig.extra_options`` are merged last but never override a structured field. ``VideoEncoderConfig.extra_options`` are merged last but never override a structured field.
@@ -498,7 +499,7 @@ def encode_video_frames(
) -> None: ) -> None:
"""More info on ffmpeg arguments tuning on `benchmark/video/README.md`""" """More info on ffmpeg arguments tuning on `benchmark/video/README.md`"""
if camera_encoder_config is None: if camera_encoder_config is None:
camera_encoder_config = VideoEncoderConfig() camera_encoder_config = camera_encoder_defaults()
vcodec = camera_encoder_config.vcodec vcodec = camera_encoder_config.vcodec
pix_fmt = camera_encoder_config.pix_fmt pix_fmt = camera_encoder_config.pix_fmt
@@ -794,22 +795,21 @@ class StreamingVideoEncoder:
self, self,
fps: int, fps: int,
camera_encoder_config: VideoEncoderConfig | None = None, camera_encoder_config: VideoEncoderConfig | None = None,
encoder_threads: int | None = None,
*,
queue_maxsize: int = 30, queue_maxsize: int = 30,
encoder_threads: int | None = None,
): ):
""" """
Args: Args:
fps: Frames per second for the output videos. fps: Frames per second for the output videos.
camera_encoder_config: Video encoder settings applied to all cameras. camera_encoder_config: Video encoder settings applied to all cameras.
When ``None``, :class:`VideoEncoderConfig` defaults are used. When ``None``, :func:`camera_encoder_defaults` is used.
encoder_threads: Number of encoder threads (global setting). encoder_threads: Number of encoder threads (global setting).
``None`` lets the codec decide. ``None`` lets the codec decide.
queue_maxsize: Max frames to buffer per camera before queue_maxsize: Max frames to buffer per camera before
back-pressure drops frames. back-pressure drops frames.
""" """
self.fps = fps self.fps = fps
self._camera_encoder_config = camera_encoder_config or VideoEncoderConfig() self._camera_encoder_config = camera_encoder_config or camera_encoder_defaults()
self._encoder_threads = encoder_threads self._encoder_threads = encoder_threads
self.queue_maxsize = queue_maxsize self.queue_maxsize = queue_maxsize
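The `queue_maxsize` docstring mentions that frames are dropped under back-pressure once the per-camera buffer is full. A minimal sketch of that idea with the standard library is shown below. `BoundedFrameBuffer` is a hypothetical class, and dropping the newest frame is an assumption: the diff does not show which frame the real encoder discards.

```python
import queue


class BoundedFrameBuffer:
    """Hypothetical per-camera buffer that drops frames when the encoder falls behind."""

    def __init__(self, maxsize: int = 30) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def push(self, frame: object) -> bool:
        """Enqueue without blocking; return False if the frame was dropped."""
        try:
            self._queue.put_nowait(frame)
            return True
        except queue.Full:
            # Assumption: the incoming (newest) frame is the one discarded.
            self.dropped += 1
            return False

    def pop(self) -> object:
        """Dequeue the oldest buffered frame; raises queue.Empty if there is none."""
        return self._queue.get_nowait()


buffer = BoundedFrameBuffer(maxsize=2)
accepted = [buffer.push(i) for i in range(3)]
print(accepted, buffer.dropped)
```

A larger `queue_maxsize` absorbs longer encoder stalls at the cost of more buffered frames held in RAM.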
+2 -2
@@ -332,7 +332,7 @@ def build_rollout_context(
cfg.dataset.repo_id, cfg.dataset.repo_id,
root=cfg.dataset.root, root=cfg.dataset.root,
batch_encoding_size=cfg.dataset.video_encoding_batch_size, batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec, camera_encoder_config=cfg.dataset.camera_encoder_config,
streaming_encoding=cfg.dataset.streaming_encoding, streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize, encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads, encoder_threads=cfg.dataset.encoder_threads,
@@ -367,7 +367,7 @@ def build_rollout_context(
image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera
* len(robot.cameras if hasattr(robot, "cameras") else []), * len(robot.cameras if hasattr(robot, "cameras") else []),
batch_encoding_size=cfg.dataset.video_encoding_batch_size, batch_encoding_size=cfg.dataset.video_encoding_batch_size,
vcodec=cfg.dataset.vcodec, camera_encoder_config=cfg.dataset.camera_encoder_config,
streaming_encoding=cfg.dataset.streaming_encoding, streaming_encoding=cfg.dataset.streaming_encoding,
encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize, encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
encoder_threads=cfg.dataset.encoder_threads, encoder_threads=cfg.dataset.encoder_threads,
+14 -10
@@ -298,7 +298,7 @@ class TestEncoderDetection:
@require_videotoolbox @require_videotoolbox
def test_auto_picks_videotoolbox_when_available(self): def test_auto_picks_videotoolbox_when_available(self):
"""``h264_videotoolbox`` sits at the top of ``HW_ENCODERS`` so it wins when present.""" """``h264_videotoolbox`` sits at the top of ``HW_VIDEO_CODECS`` so it wins when present."""
cfg = VideoEncoderConfig(vcodec="auto") cfg = VideoEncoderConfig(vcodec="auto")
assert cfg.vcodec == "h264_videotoolbox" assert cfg.vcodec == "h264_videotoolbox"
@@ -312,7 +312,7 @@ class TestEncoderDetection:
assert "h264_nvenc" in VALID_VIDEO_CODECS assert "h264_nvenc" in VALID_VIDEO_CODECS
ARTIFACTS = Path(__file__).parent.parent / "fixtures" / "artifacts" / "videos" TEST_ARTIFACTS_DIR = Path(__file__).parent.parent / "artifacts" / "encoded_videos"
# Default video feature set used by persistence tests. # Default video feature set used by persistence tests.
VIDEO_FEATURES = { VIDEO_FEATURES = {
@@ -361,7 +361,7 @@ def _add_frames(dataset: LeRobotDataset, num_frames: int) -> None:
class TestGetVideoInfo: class TestGetVideoInfo:
def test_returns_all_stream_fields(self): def test_returns_all_stream_fields(self):
info = get_video_info(ARTIFACTS / "clip_4frames.mp4") info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4")
assert info["video.height"] == 64 assert info["video.height"] == 64
assert info["video.width"] == 96 assert info["video.width"] == 96
@@ -378,7 +378,7 @@ class TestGetVideoInfo:
def test_merges_encoder_config_as_video_prefixed_entries(self): def test_merges_encoder_config_as_video_prefixed_entries(self):
cfg = VideoEncoderConfig(vcodec="libsvtav1", g=2, crf=30, preset=12) cfg = VideoEncoderConfig(vcodec="libsvtav1", g=2, crf=30, preset=12)
info = get_video_info(ARTIFACTS / "clip_4frames.mp4", camera_encoder_config=cfg) info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder_config=cfg)
assert info["video.g"] == 2 assert info["video.g"] == 2
assert info["video.crf"] == 30 assert info["video.crf"] == 30
@@ -391,7 +391,7 @@ class TestGetVideoInfo:
def test_stream_derived_keys_take_precedence_over_config(self): def test_stream_derived_keys_take_precedence_over_config(self):
cfg = VideoEncoderConfig(vcodec="libsvtav1", pix_fmt="yuv420p") cfg = VideoEncoderConfig(vcodec="libsvtav1", pix_fmt="yuv420p")
info = get_video_info(ARTIFACTS / "clip_4frames.mp4", camera_encoder_config=cfg) info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder_config=cfg)
assert info["video.codec"] # populated from stream, not from config's vcodec assert info["video.codec"] # populated from stream, not from config's vcodec
assert info["video.pix_fmt"] == "yuv420p" assert info["video.pix_fmt"] == "yuv420p"
@@ -478,7 +478,9 @@ class TestConcatenateVideoFiles:
def test_two_clips_frame_count(self, tmp_path): def test_two_clips_frame_count(self, tmp_path):
"""Output frame count equals the sum of the two input frame counts.""" """Output frame count equals the sum of the two input frame counts."""
out = tmp_path / "out.mp4" out = tmp_path / "out.mp4"
concatenate_video_files([ARTIFACTS / "clip_6frames.mp4", ARTIFACTS / "clip_4frames.mp4"], out) concatenate_video_files(
[TEST_ARTIFACTS_DIR / "clip_6frames.mp4", TEST_ARTIFACTS_DIR / "clip_4frames.mp4"], out
)
with av.open(str(out)) as container: with av.open(str(out)) as container:
total = sum(1 for _ in container.decode(video=0)) total = sum(1 for _ in container.decode(video=0))
@@ -486,7 +488,7 @@ class TestConcatenateVideoFiles:
def test_three_clips_frame_count(self, tmp_path): def test_three_clips_frame_count(self, tmp_path):
out = tmp_path / "out.mp4" out = tmp_path / "out.mp4"
clip = ARTIFACTS / "clip_5frames.mp4" clip = TEST_ARTIFACTS_DIR / "clip_5frames.mp4"
concatenate_video_files([clip, clip, clip], out) concatenate_video_files([clip, clip, clip], out)
with av.open(str(out)) as container: with av.open(str(out)) as container:
@@ -497,7 +499,9 @@ class TestConcatenateVideoFiles:
def test_geometry_preserved(self, tmp_path): def test_geometry_preserved(self, tmp_path):
"""Output resolution, fps, codec and pixel format must match the inputs.""" """Output resolution, fps, codec and pixel format must match the inputs."""
out = tmp_path / "out.mp4" out = tmp_path / "out.mp4"
concatenate_video_files([ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_4frames.mp4"], out) concatenate_video_files(
[TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_4frames.mp4"], out
)
info = get_video_info(out) info = get_video_info(out)
assert info["video.height"] == 64 assert info["video.height"] == 64
@@ -509,7 +513,7 @@ class TestConcatenateVideoFiles:
def test_compatibility_check_raises_on_different_codec(self, tmp_path): def test_compatibility_check_raises_on_different_codec(self, tmp_path):
with pytest.raises(ValueError): with pytest.raises(ValueError):
concatenate_video_files( concatenate_video_files(
[ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_h264.mp4"], [TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_h264.mp4"],
tmp_path / "out.mp4", tmp_path / "out.mp4",
compatibility_check=True, compatibility_check=True,
) )
@@ -517,7 +521,7 @@ class TestConcatenateVideoFiles:
def test_compatibility_check_raises_on_different_resolution(self, tmp_path): def test_compatibility_check_raises_on_different_resolution(self, tmp_path):
with pytest.raises(ValueError): with pytest.raises(ValueError):
concatenate_video_files( concatenate_video_files(
[ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_32x48.mp4"], [TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_32x48.mp4"],
tmp_path / "out.mp4", tmp_path / "out.mp4",
compatibility_check=True, compatibility_check=True,
) )