Mirror of https://github.com/huggingface/lerobot.git
Synced 2026-05-11 14:49:43 +00:00

Compare commits: 9b3c752b64...a3c670b987 (5 commits)

| SHA1 |
|---|
| a3c670b987 |
| 8cd74ea8b8 |
| be1180b240 |
| fff6bc1a93 |
| 141304ac78 |
@@ -33,6 +33,8 @@
title: Using the Dataset Tools
- local: dataset_subtask
title: Using Subtasks in the Dataset
- local: video_encoding_parameters
title: Video encoding parameters
- local: streaming_video_encoding
title: Streaming Video Encoding
title: "Datasets"

@@ -14,22 +14,12 @@ This makes `save_episode()` near-instant (the video is already encoded by the ti

## 2. Tuning Parameters

All encoding parameters are grouped under `camera_encoder_config` (a `VideoEncoderConfig` dataclass), accessible from the CLI via `--dataset.camera_encoder_config.<field>`.

| Parameter | CLI Flag | Type | Default | Description |
| ----------------------- | --------------------------------------------- | ------------- | ------------- | ------------------------------------------------------------------- |
| `streaming_encoding` | `--dataset.streaming_encoding` | `bool` | `True` | Enable real-time encoding during capture |
| `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder |
| `pix_fmt` | `--dataset.camera_encoder_config.pix_fmt` | `str` | `"yuv420p"` | Pixel format |
| `g` | `--dataset.camera_encoder_config.g` | `int \| None` | `2` | GOP size (keyframe interval) |
| `crf` | `--dataset.camera_encoder_config.crf` | `int \| None` | `30` | Quality level (mapped to codec-specific parameter) |
| `preset` | `--dataset.camera_encoder_config.preset` | `int \| None` | `12` | Speed preset (libsvtav1 only, 0 = slowest … 13 = fastest) |
| `fast_decode` | `--dataset.camera_encoder_config.fast_decode` | `int` | `0` | Fast-decode tuning level |
| `encoder_threads` | `--dataset.encoder_threads` | `int \| None` | `None` (auto) | Threads per encoder instance (global). `None` lets the codec decide |
| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `60` | Max buffered frames per camera (~2s at 30fps). Consumes RAM |

> [!TIP]
> Not all parameters apply to every codec. `VideoEncoderConfig` will warn at startup if you set a parameter that your chosen codec ignores (e.g. `preset` with `h264_nvenc`).
| Parameter | CLI Flag | Type | Default | Description |
| ----------------------- | ---------------------------------------- | ------------- | ------------- | ----------------------------------------------------------------- |
| `streaming_encoding` | `--dataset.streaming_encoding` | `bool` | `True` | Enable real-time encoding during capture |
| `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder |
| `encoder_threads` | `--dataset.encoder_threads` | `int \| None` | `None` (auto) | Threads per encoder instance. `None` lets the codec decide |
| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `60` | Max buffered frames per camera (~2s at 30fps). Consumes RAM |

## 3. Performance Considerations

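The `encoder_queue_maxsize` rows above tie queue depth to both time ("~2s at 30fps") and memory ("Consumes RAM"). As a rough sketch, assuming each queued frame is held as an uncompressed 8-bit RGB array of height × width × 3 bytes and ignoring any object or copy overhead, the buffered duration and worst-case queue RAM can be estimated as below. `queue_budget` is a hypothetical helper, not part of LeRobot.

```python
from __future__ import annotations


def queue_budget(
    maxsize: int,
    fps: int,
    height: int,
    width: int,
    channels: int = 3,
    cameras: int = 1,
) -> tuple[float, int]:
    """Hypothetical helper: estimate buffered seconds and worst-case queue RAM.

    Assumes every queued frame is an uncompressed uint8 array of
    height * width * channels bytes; Python object overhead is ignored.
    """
    seconds_buffered = maxsize / fps
    bytes_per_frame = height * width * channels
    total_bytes = maxsize * bytes_per_frame * cameras
    return seconds_buffered, total_bytes


# Default 60-frame queue, three 640x480 RGB cameras at 30 fps.
seconds, total = queue_budget(60, 30, 480, 640, cameras=3)
print(f"{seconds:.1f} s buffered, {total / 2**20:.1f} MiB worst case")  # 2.0 s buffered, 158.2 MiB worst case
```

Because the estimate scales linearly in queue size, resolution and camera count, doubling `encoder_queue_maxsize` doubles both the buffered time and the worst-case RAM.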
@@ -50,7 +40,7 @@ Streaming encoding means the CPU is encoding video **during** the capture loop,

### `encoder_threads` Tuning

This parameter (`--dataset.encoder_threads`) controls how many threads each encoder instance uses internally:
This parameter controls how many threads each encoder instance uses internally:

- **Higher values** (e.g., 4-5): Faster encoding, but uses more CPU cores per camera. Good for high-end systems with many cores.
- **Lower values** (e.g., 1-2): Less CPU per camera, freeing cores for capture and visualization. Good for low-res images and capable CPUs.
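Since `encoder_threads` is a per-instance value and the cost is described per camera, the total thread count grows with the number of cameras. A back-of-the-envelope oversubscription check follows, assuming one encoder instance per camera and a hand-picked number of cores kept free for capture and visualization (both are assumptions, not LeRobot defaults). Threads are not pinned to cores, so this is only a rough guide. `thread_budget` is a hypothetical name.

```python
from __future__ import annotations

import os


def thread_budget(
    cameras: int,
    threads_per_encoder: int,
    cpu_cores: int | None = None,
    reserved_cores: int = 2,
) -> tuple[int, bool]:
    """Hypothetical check: do the encoder threads fit in the cores left after capture?

    Assumes one encoder instance per camera and that `reserved_cores` cores
    should stay free for the capture loop and visualization.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    total_encoder_threads = cameras * threads_per_encoder
    fits = total_encoder_threads <= max(cores - reserved_cores, 0)
    return total_encoder_threads, fits


# Three cameras on an 8-core machine, keeping 2 cores for capture.
print(thread_budget(3, 2, cpu_cores=8))  # (6, True)
print(thread_budget(3, 4, cpu_cores=8))  # (12, False)
```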
@@ -58,7 +48,7 @@ This parameter (`--dataset.encoder_threads`) controls how many threads each enco

### Backpressure and Frame Dropping

Each camera has a bounded queue (`encoder_queue_maxsize`, default 60 frames). When the encoder can't keep up:
Each camera has a bounded queue (`encoder_queue_maxsize`, default 30 frames). When the encoder can't keep up:

1. The queue fills up (consuming RAM)
2. New frames are **dropped** (not blocked) — the capture loop continues uninterrupted
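The two numbered steps describe a bounded, non-blocking producer. A minimal sketch of that behavior with Python's standard `queue` module follows. It assumes the newest frame is the one discarded (as step 2 states) and models a worst case where the encoder never drains the queue. The function names are hypothetical and do not reflect the internals of LeRobot's `StreamingVideoEncoder`.

```python
from __future__ import annotations

import queue


def enqueue_or_drop(q: queue.Queue, frame: object) -> bool:
    """Buffer a frame without blocking; return False if it had to be dropped."""
    try:
        q.put_nowait(frame)  # never blocks the capture loop
        return True
    except queue.Full:
        return False  # queue is full: drop the newest frame and keep capturing


def simulate_stalled_encoder(maxsize: int, num_frames: int) -> tuple[int, int]:
    """Push frames into a bounded queue that nobody drains; count kept vs dropped."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    kept = dropped = 0
    for frame_index in range(num_frames):
        if enqueue_or_drop(q, frame_index):
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# A 60-frame queue facing 3 s of 30 fps capture with a completely stalled encoder.
print(simulate_stalled_encoder(60, 90))  # (60, 30)
```

Using `put_nowait` instead of a blocking `put` is what keeps the capture loop's timing intact; the price is that a slow encoder silently loses frames, which is why the closing note recommends checking episode health.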
@@ -162,4 +152,4 @@ lerobot-record --dataset.camera_encoder_config.vcodec=h264 --dataset.streaming_e
## 7. Closing note

Performance ultimately depends on your exact setup — frames-per-second, resolution, CPU cores and load, available memory, episode length, and the encoder you choose. Always test with your target workload, be mindful about your CPU & system capabilities and tune `encoder_threads`, `encoder_queue_maxsize`, and
`camera_encoder_config.vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration.
`vcodec` reasonably. That said, a common practical configuration (for many applications) is three cameras at 640×480x3 @30fps; this usually runs fine with the default streaming video encoding settings in modern systems. Always verify your recorded dataset is healthy by comparing the video duration to the CLI episode duration and confirming the row count equals FPS × CLI duration.

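The health check in the closing note has two parts: row count versus FPS × CLI duration, and video duration versus CLI duration. A minimal sketch of both comparisons follows. It assumes you have already read the episode's row count and video duration yourself, and it allows the video duration to differ by at most one frame period, a tolerance chosen here rather than prescribed by the docs. `check_episode_health` is hypothetical.

```python
from __future__ import annotations


def check_episode_health(
    num_rows: int,
    fps: int,
    cli_duration_s: float,
    video_duration_s: float,
) -> dict[str, bool]:
    """Hypothetical sanity check for one recorded episode.

    - rows_ok: the row count equals FPS x CLI duration (rounded to a whole frame).
    - video_ok: the video duration is within one frame period of the CLI duration.
    """
    expected_rows = round(fps * cli_duration_s)
    frame_period_s = 1.0 / fps
    return {
        "rows_ok": num_rows == expected_rows,
        "video_ok": abs(video_duration_s - cli_duration_s) <= frame_period_s,
    }


# 20 s episode at 30 fps: 600 rows expected.
print(check_episode_health(600, 30, 20.0, 20.0))  # {'rows_ok': True, 'video_ok': True}
print(check_episode_health(540, 30, 20.0, 18.0))  # {'rows_ok': False, 'video_ok': False}
```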
@@ -147,14 +147,7 @@ lerobot-edit-dataset \
**Parameters:**

- `output_dir`: Custom output directory (optional - by default uses `new_repo_id` or `{repo_id}_video`)
- `camera_encoder_config`: Video encoder settings — all sub-fields accessible via `--operation.camera_encoder_config.<field>`:
- `vcodec`: Video codec — `h264`, `hevc`, `libsvtav1`, `auto`, or hardware codecs (default: `libsvtav1`)
- `pix_fmt`: Pixel format — `yuv420p`, `yuv444p` (default: `yuv420p`)
- `g`: GOP size — lower values give better quality but larger files (default: 2)
- `crf`: Quality level — lower is better, 0 is lossless (default: 30)
- `preset`: Speed preset, libsvtav1 only (default: 12)
- `fast_decode`: Fast-decode tuning (default: 0)
- `encoder_threads`: Threads per encoder instance — global setting, separate from `camera_encoder_config` (default: None)
- `camera_encoder_config`: Video encoder settings — all sub-fields accessible via `--operation.camera_encoder_config.<field>`. See [Video Encoding Parameters](./video_encoding_parameters) for more details.
- `episode_indices`: List of specific episodes to convert (default: all episodes)
- `num_workers`: Number of parallel workers for processing (default: 4)

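Both versions of the `camera_encoder_config` bullet in this hunk rely on the dotted-path convention `--operation.camera_encoder_config.<field>`. The sketch below mirrors a subset of the documented fields and defaults (`vcodec`, `pix_fmt`, `g`, `crf`) in a stand-in dataclass and renders only the fields you changed as flags of that form. It is not LeRobot's `VideoEncoderConfig`, and `EncoderSettingsSketch` and `override_flags` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class EncoderSettingsSketch:
    """Stand-in mirroring a subset of the documented camera_encoder_config defaults."""

    vcodec: str = "libsvtav1"
    pix_fmt: str = "yuv420p"
    g: int | None = 2
    crf: int | None = 30


def override_flags(prefix: str, settings: EncoderSettingsSketch) -> list[str]:
    """Emit `<prefix>.<field>=<value>` only for fields that differ from the defaults."""
    defaults = EncoderSettingsSketch()
    flags = []
    for f in fields(settings):
        value = getattr(settings, f.name)
        if value != getattr(defaults, f.name):
            flags.append(f"{prefix}.{f.name}={value}")
    return flags


print(override_flags("--operation.camera_encoder_config", EncoderSettingsSketch(vcodec="h264", crf=23)))
# ['--operation.camera_encoder_config.vcodec=h264', '--operation.camera_encoder_config.crf=23']
```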
@@ -0,0 +1,81 @@
# Video encoding parameters

When **video storage** is on, LeRobot stores each camera stream as an **MP4** file rather than saving **every timestep as its own image file**. **Video encoding compresses across time**, which usually cuts **dataset size and I/O** compared to heaps of PNGs, and MP4 stays a **familiar format** for players and loaders. Encoding frames into an MP4 file is a full FFmpeg pipeline: choice of encoder, pixel format, GOP/keyframes, quality vs speed, and
optional extra encoder flags. **Many of those knobs are user-tunable** and are exposed on the dataset config as
**`dataset.camera_encoder_config`** — a nested **`VideoEncoderConfig`**
(`lerobot.datasets.video_utils.VideoEncoderConfig`) passed through **PyAV**.

You can set these parameters from the CLI with **`--dataset.camera_encoder_config.<field>`** (e.g. `lerobot-record`, `lerobot-rollout`). The same block applies to **every** camera video stream in that run. **Video storage must be on** — **`use_videos=True`** in Python APIs or **`--dataset.video=true`** (recording default); with video off, inputs stay as images and **`camera_encoder_config` is ignored.**

For **when** frames are written vs encoded (streaming vs post-episode), queues, and other top-level **`--dataset.*`** switches, see [Streaming Video Encoding](./streaming_video_encoding). For codec/size/speed experiments, see the [video-benchmark Space](https://huggingface.co/spaces/lerobot/video-benchmark).

---

## Tuning Parameters

| Parameter | CLI flag | Type | Default | Description |
| --------------- | ----------------------------------------------- | -------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vcodec` | `--dataset.camera_encoder_config.vcodec` | `str` | `"libsvtav1"` | Video codec name. `"auto"` picks the first available hardware encoder from a fixed preference list, else `libsvtav1`. |
| `pix_fmt` | `--dataset.camera_encoder_config.pix_fmt` | `str` | `"yuv420p"` | Output pixel format; must be supported by the specified codec in your FFmpeg build. |
| `g` | `--dataset.camera_encoder_config.g` | `int \| None` | `2` | GOP size (keyframes every `g` frames). Emitted as FFmpeg option `g`. |
| `crf` | `--dataset.camera_encoder_config.crf` | `int \| None` | `30` | Abstract **quality**; mapped per codec in the table below (CRF, QP, `q:v`, etc.). Lower → higher quality / larger output where the mapping is monotone. |
| `preset` | `--dataset.camera_encoder_config.preset` | `int \| str \| None` | `12`\* | Video encoding speed preset; meaning depends on the specified codec. \*Unset + `libsvtav1` → LeRobot sets `12`. |
| `fast_decode` | `--dataset.camera_encoder_config.fast_decode` | `int` | `0` | `libsvtav1`: `0–2` passed in `svtav1-params`; `h264` / `hevc` (software): if `>0`, sets `tune=fastdecode`; other codecs: often unused. |
| `video_backend` | `--dataset.camera_encoder_config.video_backend` | `str` | `"pyav"` | Only `"pyav"` is implemented for video encoding today. |
| `extra_options` | (nested config / non-scalar) | `dict` | `{}` | Extra FFmpeg options merged after the built-in mapping; **cannot** override keys already set from structured fields above. |

---

## Validation

| What | Behavior |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Video codec presence | `vcodec` must exist as a video encoder in the local FFmpeg build (after resolving `"auto"`). |
| Pixel format | `pix_fmt` is checked against the encoder’s reported pixel formats when available. |
| Options | `get_codec_options()` output (including values originating from `extra_options`) is checked against PyAV/FFmpeg option metadata (ranges, integer constraints, string choices) where applicable. |

---

## Mapping: `VideoEncoderConfig` → FFmpeg options

From **`get_codec_options()`** after `vcodec` resolution. Only fields on `camera_encoder_config` are listed here (no global thread / queue flags).

| Resolved `vcodec` | `g` | Quality from `crf` | `preset` | `fast_decode` |
| ---------------------------------------- | --- | --------------------------- | -------- | ------------------------------------------ |
| `libsvtav1` | `g` | `crf` | `preset` | `svtav1-params` includes `fast-decode=0…2` |
| `h264`, `hevc` (software) | `g` | `crf` | `preset` | `tune=fastdecode` if `fast_decode > 0` |
| `h264_videotoolbox`, `hevc_videotoolbox` | `g` | `q:v` (derived from `crf`) | — | — |
| `h264_nvenc`, `hevc_nvenc` | `g` | `rc=constqp` + `qp` ← `crf` | `preset` | — |
| `h264_vaapi` | `g` | `qp` ← `crf` | — | — |
| `h264_qsv` | `g` | `global_quality` ← `crf` | `preset` | — |

---

## `extra_options`

- Merged **after** structured options; keys **already** set by `g`, `crf`, `preset`, etc. are **not** replaced by `extra_options`.
- Values are strings or numbers as FFmpeg expects; numeric values are validated when the codec exposes option metadata.

---

## Example

```bash
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/tty.usbmodem58760431541 \
--robot.cameras="{laptop: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--robot.id=black \
--teleop.type=so100_leader \
--teleop.port=/dev/tty.usbmodem58760431551 \
--teleop.id=blue \
--dataset.repo_id=<my_username>/<my_dataset_name> \
--dataset.num_episodes=2 \
--dataset.single_task="Grab the cube" \
--dataset.streaming_encoding=true \
--dataset.encoder_threads=2 \
--dataset.camera_encoder_config.vcodec=h264 \
--dataset.camera_encoder_config.preset=fast \
--dataset.camera_encoder_config.extra_options='{"tune": "film", "profile:v": "high", "bf": 2}' \
--display_data=true
```
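The mapping table and the `extra_options` rules can be read as one small function: translate the structured fields into codec-specific options, then merge `extra_options` without overwriting keys that are already set. The sketch below implements that reading for a subset of the table's rows only. It is not LeRobot's `get_codec_options()` and `sketch_codec_options` is a hypothetical name. The videotoolbox `q:v` derivation and `"auto"` resolution are omitted because this page does not spell them out, and the exact `svtav1-params` string (`fast-decode=N`) and the stringified option values are assumptions.

```python
from __future__ import annotations


def sketch_codec_options(
    vcodec: str,
    g: int | None = 2,
    crf: int | None = 30,
    preset: int | str | None = None,
    fast_decode: int = 0,
    extra_options: dict[str, object] | None = None,
) -> dict[str, str]:
    """Hypothetical reading of the mapping table (subset of rows, not LeRobot code)."""
    opts: dict[str, str] = {}
    if g is not None:
        opts["g"] = str(g)  # every row in the table emits the GOP size as `g`

    if vcodec == "libsvtav1" and preset is None:
        preset = 12  # table footnote: unset preset + libsvtav1 -> 12

    if vcodec in ("libsvtav1", "h264", "hevc"):
        if crf is not None:
            opts["crf"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
        if vcodec == "libsvtav1":
            opts["svtav1-params"] = f"fast-decode={fast_decode}"  # format is an assumption
        elif fast_decode > 0:
            opts["tune"] = "fastdecode"
    elif vcodec in ("h264_nvenc", "hevc_nvenc"):
        if crf is not None:
            opts["rc"] = "constqp"
            opts["qp"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
    elif vcodec == "h264_vaapi":
        if crf is not None:
            opts["qp"] = str(crf)
    elif vcodec == "h264_qsv":
        if crf is not None:
            opts["global_quality"] = str(crf)
        if preset is not None:
            opts["preset"] = str(preset)
    else:
        raise ValueError(f"codec {vcodec!r} is not covered by this sketch")

    # extra_options are merged last, but never replace keys already set above.
    for key, value in (extra_options or {}).items():
        opts.setdefault(key, str(value))
    return opts


print(sketch_codec_options("h264", preset="fast", extra_options={"tune": "film", "crf": 18, "bf": 2}))
# {'g': '2', 'crf': '30', 'preset': 'fast', 'tune': 'film', 'bf': '2'}
```

In the usage line, `crf: 18` from `extra_options` is ignored because `crf` was already set from the structured field, while `tune` and `bf` are added.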
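The Validation table says option values are checked against codec option metadata (ranges, integer constraints, string choices) where applicable. A minimal sketch of that kind of check follows, run against a small hand-written metadata table instead of real PyAV/FFmpeg metadata. `OptionSpec`, `FAKE_METADATA` and `validate_options` are hypothetical, and the ranges and choices in the table are illustrative placeholders, not the real limits of any encoder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical per-option metadata: integer range or allowed strings."""

    kind: str  # "int" or "choice"
    min_value: int | None = None
    max_value: int | None = None
    choices: tuple[str, ...] = ()


# Illustrative placeholder metadata, NOT the real limits of any FFmpeg encoder.
FAKE_METADATA = {
    "crf": OptionSpec("int", min_value=0, max_value=51),
    "g": OptionSpec("int", min_value=1, max_value=1000),
    "tune": OptionSpec("choice", choices=("film", "animation", "fastdecode")),
}


def validate_options(options: dict[str, str], metadata: dict[str, OptionSpec]) -> list[str]:
    """Return human-readable problems; options without metadata are skipped."""
    problems = []
    for key, raw in options.items():
        spec = metadata.get(key)
        if spec is None:
            continue  # "where applicable": no metadata, no check
        if spec.kind == "int":
            try:
                value = int(raw)
            except ValueError:
                problems.append(f"{key}={raw!r} is not an integer")
                continue
            if spec.min_value is not None and value < spec.min_value:
                problems.append(f"{key}={value} is below {spec.min_value}")
            if spec.max_value is not None and value > spec.max_value:
                problems.append(f"{key}={value} is above {spec.max_value}")
        elif spec.kind == "choice" and raw not in spec.choices:
            problems.append(f"{key}={raw!r} is not one of {spec.choices}")
    return problems


print(validate_options({"crf": "30", "g": "2", "tune": "film"}, FAKE_METADATA))  # []
print(validate_options({"crf": "99", "g": "two", "bf": "2"}, FAKE_METADATA))
# ['crf=99 is above 51', "g='two' is not an integer"]
```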
@@ -251,14 +251,14 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_enc = self._build_streaming_encoder(
self.meta.fps,
camera_encoder_config,
self._encoder_threads,
encoder_queue_maxsize,
encoder_threads,
)
self.writer = DatasetWriter(
meta=self.meta,
root=self.root,
camera_encoder_config=camera_encoder_config,
encoder_threads=self._encoder_threads,
encoder_threads=encoder_threads,
batch_encoding_size=batch_encoding_size,
streaming_encoder=streaming_enc,
initial_frames=self.meta.total_frames,
@@ -300,14 +300,14 @@ class LeRobotDataset(torch.utils.data.Dataset):
def _build_streaming_encoder(
fps: int,
camera_encoder_config: VideoEncoderConfig | None,
encoder_threads: int | None,
encoder_queue_maxsize: int,
encoder_threads: int | None,
) -> StreamingVideoEncoder:
return StreamingVideoEncoder(
fps=fps,
camera_encoder_config=camera_encoder_config,
encoder_threads=encoder_threads,
queue_maxsize=encoder_queue_maxsize,
encoder_threads=encoder_threads,
)

# ── Metadata properties ───────────────────────────────────────────
@@ -698,7 +698,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_enc = None
if streaming_encoding and len(obj.meta.video_keys) > 0:
streaming_enc = cls._build_streaming_encoder(
fps, camera_encoder_config, encoder_threads, encoder_queue_maxsize
fps, camera_encoder_config, encoder_queue_maxsize, encoder_threads
)
obj.writer = DatasetWriter(
meta=obj.meta,
@@ -802,7 +802,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
streaming_enc = None
if streaming_encoding and len(obj.meta.video_keys) > 0:
streaming_enc = cls._build_streaming_encoder(
obj.meta.fps, camera_encoder_config, encoder_threads, encoder_queue_maxsize
obj.meta.fps, camera_encoder_config, encoder_queue_maxsize, encoder_threads
)
obj.writer = DatasetWriter(
meta=obj.meta,

@@ -28,7 +28,7 @@ from typing import TYPE_CHECKING, Any
import av

if TYPE_CHECKING:
from lerobot.datasets.video_utils import VideoEncoderConfig
from .video_utils import VideoEncoderConfig

logger = logging.getLogger(__name__)


@@ -37,11 +37,12 @@ import torchvision
from datasets.features.features import register_feature
from PIL import Image

from lerobot.datasets.pyav_utils import (
from lerobot.utils.import_utils import get_safe_default_video_backend

from .pyav_utils import (
check_video_encoder_config_pyav,
detect_available_encoders_pyav,
)
from lerobot.utils.import_utils import get_safe_default_video_backend

logger = logging.getLogger(__name__)

@@ -794,9 +795,8 @@ class StreamingVideoEncoder:
self,
fps: int,
camera_encoder_config: VideoEncoderConfig | None = None,
encoder_threads: int | None = None,
*,
queue_maxsize: int = 30,
encoder_threads: int | None = None,
):
"""
Args:

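The two `StreamingVideoEncoder.__init__` signatures in this hunk differ in whether `encoder_threads` sits before or after the bare `*`. Parameters after `*` are keyword-only, so a caller cannot accidentally pass a thread count where a queue size is expected (or vice versa) by position. A small illustration of that language rule follows; `EncoderSketch` is a hypothetical stand-in, not the LeRobot class.

```python
from __future__ import annotations


class EncoderSketch:
    """Hypothetical stand-in: parameters after `*` can only be passed by keyword."""

    def __init__(
        self,
        fps: int,
        camera_encoder_config: object | None = None,
        *,
        queue_maxsize: int = 30,
        encoder_threads: int | None = None,
    ) -> None:
        self.fps = fps
        self.camera_encoder_config = camera_encoder_config
        self.queue_maxsize = queue_maxsize
        self.encoder_threads = encoder_threads


def positional_call_rejected() -> bool:
    """Return True if passing queue size and thread count positionally raises TypeError."""
    try:
        EncoderSketch(30, None, 60, 2)  # type: ignore[misc]
    except TypeError:
        return True
    return False


ok = EncoderSketch(30, None, queue_maxsize=60, encoder_threads=2)
print(ok.queue_maxsize, ok.encoder_threads)  # 60 2
print(positional_call_rejected())  # True
```

Positional call sites such as `_build_streaming_encoder(fps, camera_encoder_config, ...)` get no such protection, which is why their argument order has to match the parameter order exactly.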
@@ -312,7 +312,7 @@ class TestEncoderDetection:
assert "h264_nvenc" in VALID_VIDEO_CODECS


ARTIFACTS = Path(__file__).parent.parent / "fixtures" / "artifacts" / "videos"
TEST_ARTIFACTS_DIR = Path(__file__).parent.parent / "artifacts" / "encoded_videos"

# Default video feature set used by persistence tests.
VIDEO_FEATURES = {
@@ -361,7 +361,7 @@ def _add_frames(dataset: LeRobotDataset, num_frames: int) -> None:

class TestGetVideoInfo:
def test_returns_all_stream_fields(self):
info = get_video_info(ARTIFACTS / "clip_4frames.mp4")
info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4")

assert info["video.height"] == 64
assert info["video.width"] == 96
@@ -378,7 +378,7 @@ class TestGetVideoInfo:
def test_merges_encoder_config_as_video_prefixed_entries(self):
cfg = VideoEncoderConfig(vcodec="libsvtav1", g=2, crf=30, preset=12)

info = get_video_info(ARTIFACTS / "clip_4frames.mp4", camera_encoder_config=cfg)
info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder_config=cfg)

assert info["video.g"] == 2
assert info["video.crf"] == 30
@@ -391,7 +391,7 @@ class TestGetVideoInfo:
def test_stream_derived_keys_take_precedence_over_config(self):
cfg = VideoEncoderConfig(vcodec="libsvtav1", pix_fmt="yuv420p")

info = get_video_info(ARTIFACTS / "clip_4frames.mp4", camera_encoder_config=cfg)
info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder_config=cfg)

assert info["video.codec"] # populated from stream, not from config's vcodec
assert info["video.pix_fmt"] == "yuv420p"
@@ -478,7 +478,9 @@ class TestConcatenateVideoFiles:
def test_two_clips_frame_count(self, tmp_path):
"""Output frame count equals the sum of the two input frame counts."""
out = tmp_path / "out.mp4"
concatenate_video_files([ARTIFACTS / "clip_6frames.mp4", ARTIFACTS / "clip_4frames.mp4"], out)
concatenate_video_files(
[TEST_ARTIFACTS_DIR / "clip_6frames.mp4", TEST_ARTIFACTS_DIR / "clip_4frames.mp4"], out
)

with av.open(str(out)) as container:
total = sum(1 for _ in container.decode(video=0))
@@ -486,7 +488,7 @@ class TestConcatenateVideoFiles:

def test_three_clips_frame_count(self, tmp_path):
out = tmp_path / "out.mp4"
clip = ARTIFACTS / "clip_5frames.mp4"
clip = TEST_ARTIFACTS_DIR / "clip_5frames.mp4"
concatenate_video_files([clip, clip, clip], out)

with av.open(str(out)) as container:
@@ -497,7 +499,9 @@ class TestConcatenateVideoFiles:
def test_geometry_preserved(self, tmp_path):
"""Output resolution, fps, codec and pixel format must match the inputs."""
out = tmp_path / "out.mp4"
concatenate_video_files([ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_4frames.mp4"], out)
concatenate_video_files(
[TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_4frames.mp4"], out
)

info = get_video_info(out)
assert info["video.height"] == 64
@@ -509,7 +513,7 @@ class TestConcatenateVideoFiles:
def test_compatibility_check_raises_on_different_codec(self, tmp_path):
with pytest.raises(ValueError):
concatenate_video_files(
[ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_h264.mp4"],
[TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_h264.mp4"],
tmp_path / "out.mp4",
compatibility_check=True,
)
@@ -517,7 +521,7 @@ class TestConcatenateVideoFiles:
def test_compatibility_check_raises_on_different_resolution(self, tmp_path):
with pytest.raises(ValueError):
concatenate_video_files(
[ARTIFACTS / "clip_4frames.mp4", ARTIFACTS / "clip_32x48.mp4"],
[TEST_ARTIFACTS_DIR / "clip_4frames.mp4", TEST_ARTIFACTS_DIR / "clip_32x48.mp4"],
tmp_path / "out.mp4",
compatibility_check=True,
)