From 096fdd3ea5118daebe6788af23c7b419d1c29b7b Mon Sep 17 00:00:00 2001
From: CarolinePascal
Date: Thu, 2 Jul 2026 19:16:20 +0200
Subject: [PATCH] chore(format): formatting code

---
docs/source/video_encoding_parameters.mdx | 206 +++++++++++++++++-----
1 file changed, 164 insertions(+), 42 deletions(-)

diff --git a/docs/source/video_encoding_parameters.mdx b/docs/source/video_encoding_parameters.mdx
index b3ffcf86c..eb0d1f7f4 100644
--- a/docs/source/video_encoding_parameters.mdx
+++ b/docs/source/video_encoding_parameters.mdx
@@ -67,27 +67,90 @@ All flags below are prefixed with `--dataset.rgb_encoder.` on the CLI.

Depth maps (Intel RealSense, Reachy 2) are stored as their **own video streams** alongside the RGB streams. Raw depth (`uint16` millimetres or `float32` metres) can't survive an 8-bit codec, so LeRobot **quantizes** each map to a 12-bit code (`[0, 4095]`) — logarithmically by default, to match the `1/depth` error profile of depth sensors — then packs it into a high-bit-depth pixel format (`gray12le`) and encodes it with a 12-bit codec.

-
- Raw depth uint16 mm
float32 m
- -
- Record time - Clip to [depth_min,
depth_max]
- - Quantize 12-bit codes 0–4095
log (default) or linear
- - Pack into gray12le
plane
- - Encode HEVC
Main 12
+
+ + Raw depth + + uint16 mm +
+ float32 m +
+
+ + → + +
+ + Record time + + + Clip + + to [depth_min, +
+ depth_max] +
+
+ + → + + + Quantize + + 12-bit codes 0–4095 +
+ log (default) or linear +
+
+ + → + + + Pack + + into gray12le +
+ plane +
+
+ + → + + + Encode + + HEVC +
+ Main 12 +
+
+
+ + → + + + MP4 + + stored +
+ stream +
+
+ + → + +
+ + Load time + + + Dequantize + + to mm / m + + +
- - MP4 stored
stream
- -
- Load time - Dequantize to mm / m -
-
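The clip, quantize, and dequantize stages of the diagram above can be illustrated with a minimal per-value sketch. It is not LeRobot's implementation: the exact logarithmic mapping, the function names, and the `log` flag are assumptions made for illustration. Only `depth_min`, `depth_max`, and the 12-bit code range `[0, 4095]` come from the text.

```python
import math

MAX_CODE = 4095  # largest 12-bit code, as stated in the text


def quantize(depth, depth_min, depth_max, log=True):
    """Clip one depth value to [depth_min, depth_max] and map it to a 12-bit code.

    Hypothetical helper: the log mapping below is one plausible choice,
    not necessarily the formula LeRobot uses.
    """
    d = min(max(depth, depth_min), depth_max)  # clip stage
    if log:
        # Normalised position of d on a logarithmic scale, in [0, 1].
        t = math.log(d / depth_min) / math.log(depth_max / depth_min)
    else:
        # Normalised position of d on a linear scale, in [0, 1].
        t = (d - depth_min) / (depth_max - depth_min)
    return round(t * MAX_CODE)


def dequantize(code, depth_min, depth_max, log=True):
    """Invert quantize(): map a 12-bit code back to a depth value."""
    t = code / MAX_CODE
    if log:
        return depth_min * (depth_max / depth_min) ** t
    return depth_min + t * (depth_max - depth_min)


if __name__ == "__main__":
    lo, hi = 0.1, 10.0  # metres; illustrative range only
    for d in (0.05, 0.5, 3.3, 20.0):
        code = quantize(d, lo, hi)
        print(d, code, dequantize(code, lo, hi))
```

With a logarithmic mapping the depth step between adjacent codes grows with distance, so near-range values keep finer resolution than far-range ones. That is the motivation the text gives for making log the default.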
Configure the depth pipeline through a parallel **`depth_encoder`** block (`DepthEncoderConfig`). It shares every `RGBEncoderConfig` field (`vcodec`, `pix_fmt`, `crf`, …) and adds four quantizer knobs, set via `--dataset.depth_encoder.`:
@@ -171,32 +234,69 @@ Two sources contribute to the `info` block:
-
Stream-derived
+
+ Stream-derived +
-
Read back from the encoded MP4 with PyAV.
+
+ Read back from the encoded MP4 with PyAV. +
- video.height - video.width - video.codec - video.pix_fmt - video.fps - video.channels - is_depth_map - audio.* + + video.height + + + video.width + + + video.codec + + + video.pix_fmt + + + video.fps + + + video.channels + + + is_depth_map + + + audio.* +
-
Encoder-derived
+
+ Encoder-derived +
-
Taken from RGBEncoderConfig / DepthEncoderConfig.
+
+ Taken from RGBEncoderConfig /{" "} + DepthEncoderConfig. +
- video.g - video.crf - video.preset - video.fast_decode - video.video_backend - video.extra_options + + video.g + + + video.crf + + + video.preset + + + video.fast_decode + + + video.video_backend + + + video.extra_options +
@@ -216,11 +316,33 @@ When aggregating datasets with `merge_datasets`, video files are concatenated as
- Must match - Stream-derived fields — video.codec, video.pix_fmt, video.height, video.width, video.fps — must match across sources, otherwise FFmpeg's concat demuxer fails. + + Must match + + + Stream-derived fields — video.codec,{" "} + video.pix_fmt,{" "} + video.height,{" "} + video.width,{" "} + video.fps — must match across + sources, otherwise FFmpeg's concat demuxer fails. +
- Merged loosely - Encoder-tuning fields — video.g, video.crf, video.preset, video.fast_decode, video.extra_options. If every source agrees, the value is kept; if not, it's set to null (or {} for video.extra_options) and a warning is logged. + + Merged loosely + + + Encoder-tuning fields — video.g,{" "} + video.crf,{" "} + video.preset,{" "} + video.fast_decode,{" "} + video.extra_options. If every source + agrees, the value is kept; if not, it's set to{" "} + null (or{" "} + {} for{" "} + video.extra_options) and a warning is + logged. +
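The two merge rules in this hunk ("Must match" and "Merged loosely") can be sketched as follows. This is an illustrative sketch only, not the `merge_datasets` implementation: `merge_video_info`, the flat dotted dictionary keys, and the up-front `ValueError` are assumptions. The text only says that mismatched stream-derived fields make FFmpeg's concat demuxer fail.

```python
import logging

logger = logging.getLogger(__name__)

# Field lists copied from the documentation text.
MUST_MATCH = ("video.codec", "video.pix_fmt", "video.height", "video.width", "video.fps")
MERGED_LOOSELY = ("video.g", "video.crf", "video.preset", "video.fast_decode", "video.extra_options")


def merge_video_info(infos):
    """Merge per-source video info dicts (hypothetical helper).

    Stream-derived fields must agree across all sources. Encoder-tuning
    fields are kept when every source agrees, otherwise reset to None
    (the Python counterpart of null), or {} for video.extra_options.
    """
    merged = {}
    for key in MUST_MATCH:
        values = [info.get(key) for info in infos]
        if any(v != values[0] for v in values):
            # Assumption: fail early here instead of letting FFmpeg's concat fail later.
            raise ValueError(f"{key} differs across sources: {values}")
        merged[key] = values[0]
    for key in MERGED_LOOSELY:
        values = [info.get(key) for info in infos]
        if all(v == values[0] for v in values):
            merged[key] = values[0]
        else:
            merged[key] = {} if key == "video.extra_options" else None
            logger.warning("%s differs across sources; resetting it", key)
    return merged
```

Raising before concatenation is a design choice of this sketch: it reports the offending field by name rather than surfacing a demuxer error.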