mirror of https://github.com/huggingface/lerobot.git synced 2026-05-11 14:49:43 +00:00

Pepijn 3ec7c25e7d speedup stats and encoding

2026-02-06 11:26:27 +01:00

Streaming Video Encoding — Encode on the fly during recording

Problem

After each episode, save_episode() blocks for ~79 seconds on a 3-camera setup (3197 frames per camera, 107s episode):

Step	Time
Write 9591 PNGs to disk	~19s
Read PNGs back → compute image stats	~15s
Read PNGs again → encode 3× AV1 videos → delete PNGs	~44.5s
Save parquet + metadata	~0.6s
Total	~79s

The entire pipeline writes frames as temporary PNGs, reads them back twice (stats + encoding), then deletes them. This round-trip is the bottleneck.

Architecture

Before: sequential post-episode pipeline

  Recording loop                          save_episode() — BLOCKS ~79s
 ┌─────────────┐          ┌──────────────────────────────────────────────────────────┐
 │  30fps loop  │          │                                                          │
 │              │  frames  │  frame_buffer ──► write PNGs ──► read PNGs ──► stats     │
 │  camera ─►───┼──► list  │       (~19s)         │              (~15s)               │
 │  teleop      │          │                      ▼                                   │
 │  policy      │          │               read PNGs ──► AV1 encode ──► delete PNGs   │
 │              │          │                        (~44.5s)                           │
 └──────┬───────┘          └──────────────────────────────────────────────────────────┘
        │                                         │
        ▼                                         ▼
   episode ends                             next episode
   (~107s recording)                        (~79s blocked)

Data path: frame → list → PNG disk → read → stats + PNG disk → read → encode → MP4 → delete PNGs

After: streaming pipeline (encodes during recording)

  Recording loop (encoding happens HERE)            save_episode() — ~0.5s
 ┌───────────────────────────────────────┐         ┌──────────────────┐
 │  30fps control loop                   │         │                  │
 │                                       │         │  flush encoders  │
 │  camera ──► frame ─┬─► queue ──► [T1] ├── AV1 ─┤  (already done)  │
 │                    │    queue ──► [T2] ├── AV1 ─┤  ~0.16s          │
 │                    │    queue ──► [T3] ├── AV1 ─┤                  │
 │                    │                  │         │  running stats   │
 │                    └─► downsample ──► │─ stats ─┤  → finalize      │
 │                       RunningQuantile │         │  ~0.01s          │
 │  teleop / policy (never blocked)      │         │                  │
 └───────────────────────────────────────┘         │  save parquet    │
                                                   │  ~0.36s          │
        [T1] [T2] [T3] = encoder threads           └──────────────────┘
        (one per camera, GIL released by PyAV)

Data path: frame → queue → encode → MP4 (zero PNGs, zero re-reads)
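
The streaming path above can be pictured as one worker thread per camera, each draining its own queue. The following is a minimal standard-library sketch of that pattern, not the actual StreamingVideoEncoder: the class name `CameraEncoderSketch`, the camera names, and the fake "encode" step (which only records frame sizes) are hypothetical stand-ins for the PyAV-based encoder.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker to finish


class CameraEncoderSketch(threading.Thread):
    """Hypothetical stand-in for a per-camera encoder thread."""

    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.frames = queue.Queue()  # unbounded, so put() never blocks
        self.encoded = []            # stands in for packets written to an MP4

    def submit(self, frame):
        # Called from the control loop; returns immediately.
        self.frames.put(frame)

    def run(self):
        while True:
            frame = self.frames.get()
            if frame is _STOP:
                break
            # A real encoder would hand the frame to a video codec here.
            self.encoded.append(len(frame))

    def finish(self):
        # Flush: let the worker process whatever is still queued, then stop.
        self.frames.put(_STOP)
        self.join()
        return len(self.encoded)


def record_episode(num_frames, cameras):
    encoders = {cam: CameraEncoderSketch(cam) for cam in cameras}
    for enc in encoders.values():
        enc.start()
    for i in range(num_frames):
        for enc in encoders.values():
            enc.submit(bytes([i % 256]) * 16)  # fake 16-byte frame
    # Equivalent of the post-episode flush: only the queue tail is left.
    return {cam: enc.finish() for cam, enc in encoders.items()}


counts = record_episode(100, ["top", "left", "right"])
print(counts)
```

Because the sentinel is queued after the last frame, the worker sees every frame before it stops, which is why the flush step only has to wait for the tail of the queue.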

Stats computation changes

Aspect	Before	After
Method	`compute_episode_stats()` reads all PNGs from disk, decodes them, computes min/max/mean/std/quantiles	`RunningQuantileStats` accumulates stats incrementally per frame during recording
Input	Full-resolution PNGs read back from disk	Downsampled frames (via `auto_downsample_height_width`, ~150×100px) directly from memory
When	After episode ends, inside `save_episode()`	During recording, inside `add_frame()` (~2ms per frame)
Output	`{mean, std, min, max, q01..q99}` shaped `(C,1,1)` in `[0,1]`	Identical shape and scale — `RunningQuantileStats.get_statistics()` → reshape `(C,1,1)` / 255
I/O	Reads 9591 PNGs (~15s)	Zero disk I/O
Numeric features	Computed from episode buffer (unchanged)	Computed from episode buffer (unchanged)

The running stats use the same auto_downsample_height_width function and produce the same statistical keys (mean, std, min, max, count, q01, q10, q50, q90, q99). Video features are excluded from the post-episode compute_episode_stats() call when streaming is active — only numeric features go through that path.
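
To illustrate the idea of accumulating image statistics frame by frame, here is a small pure-Python sketch. It is not RunningQuantileStats: the `RunningChannelStats` class and the stride-based `downsample` helper are hypothetical, quantiles are left out entirely, and only mean, std, min, max and count are tracked from running sums. The final division by 255 mirrors the `[0,1]` scaling described in the table.

```python
import math


def downsample(image, step):
    """Keep every `step`-th row and column of an H x W x C nested list."""
    return [row[::step] for row in image[::step]]


class RunningChannelStats:
    """Hypothetical per-channel accumulator (mean, std, min, max only)."""

    def __init__(self, channels):
        self.count = 0  # number of pixels seen so far
        self.total = [0.0] * channels
        self.total_sq = [0.0] * channels
        self.min = [math.inf] * channels
        self.max = [-math.inf] * channels

    def update(self, image):
        for row in image:
            for pixel in row:
                self.count += 1
                for c, value in enumerate(pixel):
                    self.total[c] += value
                    self.total_sq[c] += value * value
                    self.min[c] = min(self.min[c], value)
                    self.max[c] = max(self.max[c], value)

    def get_statistics(self):
        stats = {"mean": [], "std": [], "min": [], "max": [], "count": self.count}
        for c in range(len(self.total)):
            mean = self.total[c] / self.count
            # Clamp at zero to absorb tiny negative values from rounding.
            var = max(self.total_sq[c] / self.count - mean * mean, 0.0)
            stats["mean"].append(mean / 255)
            stats["std"].append(math.sqrt(var) / 255)
            stats["min"].append(self.min[c] / 255)
            stats["max"].append(self.max[c] / 255)
        return stats


def solid_frame(value, height=4, width=4, channels=3):
    """Build a uint8-style frame where every pixel has the same value."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


running = RunningChannelStats(channels=3)
for frame in (solid_frame(0), solid_frame(255)):
    running.update(downsample(frame, step=2))  # no disk I/O involved
stats = running.get_statistics()
print(stats)
```

Each frame is seen exactly once and then discarded, so memory use does not grow with episode length.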

Results

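Tested on the same 3-camera setup (2028 frames, 67.6s episode). The "Before" column repeats the timings from the 3197-frame episode in the Problem section, so the speedups compare episodes of different lengths and are approximate: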

Step	Before	After	Speedup
Frame writing (PNGs)	~19s	0s	∞ (eliminated)
Episode stats	~15s	0.01s	1500×
Video encoding	~44.5s	0.16s	278×
Parquet + meta	~0.6s	0.36s	~same
Total `save_episode()`	~79s	0.55s	143×

Video encoding time inside save_episode() drops to near zero because most of the encoding has already happened during recording; finish_episode() only flushes the last few buffered frames.
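
The speedup column can be re-derived from the timings in the table. The sketch below does that arithmetic and, as a rough cross-check, rescales the old total to the shorter episode. That rescaling assumes the old pipeline's cost grows linearly with frame count, which is an assumption made here and not a measurement from this document.

```python
import math

# Timings in seconds, copied from the tables above.
before = {"frame_writing": 19.0, "episode_stats": 15.0,
          "video_encoding": 44.5, "parquet_meta": 0.6}
after = {"frame_writing": 0.0, "episode_stats": 0.01,
         "video_encoding": 0.16, "parquet_meta": 0.36}
total_before, total_after = 79.0, 0.55


def speedup(slow, fast):
    """Ratio of old to new time; infinite when the step was eliminated."""
    return math.inf if fast == 0 else slow / fast


per_step = {step: speedup(before[step], after[step]) for step in before}
total_speedup = speedup(total_before, total_after)

# Assumption: old save_episode() time scales linearly with frame count,
# so a 2028-frame episode would have cost about 79s * 2028 / 3197.
scaled_before = total_before * 2028 / 3197
scaled_speedup = speedup(scaled_before, total_after)

for step, value in per_step.items():
    print(f"{step}: {value:.1f}x")
print(f"total: {total_speedup:.1f}x, length-adjusted: {scaled_speedup:.1f}x")
```

Under that linear-scaling assumption, the like-for-like improvement in save_episode() time is still roughly two orders of magnitude.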

Per-frame overhead during recording

Operation	Time
`queue.put(frame)` (non-blocking)	~0.01ms
`auto_downsample_height_width`	~0.5ms
`RunningQuantileStats.update`	~1ms
Total per frame	~2ms (well within 33ms budget at 30fps)
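
As a quick check of the budget claim, the listed per-frame costs can be summed and compared with the frame period at 30fps. The dictionary keys are labels chosen for this sketch; the numbers come from the table above.

```python
fps = 30
frame_budget_ms = 1000 / fps  # time available per control-loop iteration

# Per-frame costs in milliseconds, from the table above.
overhead_ms = {"queue_put": 0.01, "downsample": 0.5, "stats_update": 1.0}

itemized_total_ms = sum(overhead_ms.values())         # sum of the listed rows
budget_fraction = itemized_total_ms / frame_budget_ms  # share of the budget used
headroom_ms = frame_budget_ms - 2.0                    # using the rounded ~2ms total

print(f"budget {frame_budget_ms:.1f}ms, itemized {itemized_total_ms:.2f}ms "
      f"({budget_fraction:.1%}), headroom {headroom_ms:.1f}ms")
```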

Usage

Streaming is on by default. On less powerful machines it can be disabled to fall back to the old post-episode pipeline:

# Default (streaming ON)
lerobot-record --dataset.repo_id=user/dataset ...

# Old behavior (streaming OFF)
lerobot-record --dataset.repo_id=user/dataset --dataset.streaming_encoding=false

For the RaC data collection script, set streaming_encoding: false in the dataset config.

Files Changed

`src/lerobot/datasets/video_utils.py`

Added StreamingVideoEncoder — manages one _CameraEncoder thread per camera
Added _CameraEncoder — daemon thread that reads frames from a queue and encodes with PyAV
Non-blocking unbounded queue ensures the control loop is never delayed

`src/lerobot/datasets/lerobot_dataset.py`

create() / start_streaming_encoder(): new streaming_encoding parameter
add_frame(): when streaming, feeds frames to encoder + accumulates running stats instead of writing PNGs
save_episode(): when streaming, uses running stats and calls finish_episode() to get already-encoded video paths
clear_episode_buffer(): cancels in-progress encoding on re-record
finalize(): cleans up encoder on shutdown
Full backward compatibility: when streaming_encoding=False, all existing code paths are unchanged
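
The branching described above can be summarized with a toy class. This is only a sketch of the control flow: `ToyDataset` and its attributes are hypothetical, no frames are actually written or encoded, and the real methods in lerobot_dataset.py do considerably more.

```python
class ToyDataset:
    """Hypothetical model of the two code paths selected by streaming_encoding."""

    def __init__(self, streaming_encoding=True):
        self.streaming_encoding = streaming_encoding
        self.png_buffer = []     # old path: frames waiting to be written as PNGs
        self.encoded_frames = 0  # streaming path: frames already sent to the encoder
        self.episodes = []       # (frame_count, work done at save time)

    def add_frame(self, frame):
        if self.streaming_encoding:
            # Streaming: hand the frame off immediately, keep nothing around.
            self.encoded_frames += 1
        else:
            # Old path: keep the frame until save_episode() runs.
            self.png_buffer.append(frame)

    def save_episode(self):
        if self.streaming_encoding:
            record = (self.encoded_frames, "flush encoders")
        else:
            record = (len(self.png_buffer), "write PNGs, compute stats, encode")
        self.episodes.append(record)
        self.clear_episode_buffer()

    def clear_episode_buffer(self):
        # Also used on re-record: drop everything from the current episode.
        self.png_buffer.clear()
        self.encoded_frames = 0


streaming = ToyDataset(streaming_encoding=True)
legacy = ToyDataset(streaming_encoding=False)
for ds in (streaming, legacy):
    for i in range(5):
        ds.add_frame(f"frame-{i}")
    ds.save_episode()
print(streaming.episodes, legacy.episodes)
```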

`src/lerobot/scripts/lerobot_record.py`

Added streaming_encoding: bool = True to DatasetRecordConfig
Wired through to both create() and resume paths
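
As an illustration of a boolean config field with a default of True that a command-line flag can override, consider the sketch below. `RecordConfigSketch`, `parse_bool` and `config_from_args` are hypothetical names invented for this example; this is not how lerobot-record parses its arguments and it covers only the two fields shown.

```python
from dataclasses import dataclass


@dataclass
class RecordConfigSketch:
    """Hypothetical config; only the default below comes from the text above."""
    repo_id: str
    streaming_encoding: bool = True


def parse_bool(text):
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def config_from_args(args):
    """Parse `--dataset.<field>=<value>` arguments into a RecordConfigSketch."""
    fields = {}
    for arg in args:
        key, _, value = arg.partition("=")
        fields[key.removeprefix("--dataset.")] = value
    if "streaming_encoding" in fields:
        fields["streaming_encoding"] = parse_bool(fields["streaming_encoding"])
    return RecordConfigSketch(**fields)


default_cfg = config_from_args(["--dataset.repo_id=user/dataset"])
legacy_cfg = config_from_args(
    ["--dataset.repo_id=user/dataset", "--dataset.streaming_encoding=false"]
)
print(default_cfg, legacy_cfg)
```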

`examples/rac/rac_data_collection_openarms_rtc.py`

Added streaming_encoding: bool = True to RaCRTCDatasetConfig
Frames are added inline during the control loop (streaming) or buffered for post-loop writing (old path)
Automatically detects mode and adjusts behavior

Design Notes

Why threads, not processes? PyAV/FFmpeg releases the GIL during encoding. Threads share memory (zero-copy frame passing), avoiding the serialization overhead of multiprocessing.
Why unbounded queue? At 30fps production vs ~72fps encoding throughput, the queue stays near-empty. Even during brief encoder stalls, memory growth is bounded by episode length. The control loop must never block.
Why running stats? Avoids the expensive read-back-from-disk step. RunningQuantileStats + auto_downsample_height_width compute the same set of statistics incrementally, with ~2ms overhead per frame.
Backward compatible: Setting streaming_encoding=false restores the original PNG → encode pipeline exactly. Because streaming is on by default, users who want the old behavior must opt out explicitly.
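
The queue argument can be made concrete with a little arithmetic. The 30fps and ~72fps rates and the 3197-frame episode come from this document; the 640×480 RGB frame size is an assumption made only for this sketch, so the byte figures are illustrative and apply to a single camera's queue.

```python
PRODUCE_FPS = 30   # frames entering the queue per second (control loop rate)
ENCODE_FPS = 72    # frames the encoder can drain per second (from the note above)
FRAME_BYTES = 640 * 480 * 3  # assumed resolution, uncompressed RGB


def backlog_frames(stall_seconds):
    """Frames that pile up while the encoder makes no progress."""
    return PRODUCE_FPS * stall_seconds


def drain_seconds(stall_seconds):
    """Time to empty the backlog once encoding resumes at full speed."""
    return backlog_frames(stall_seconds) / (ENCODE_FPS - PRODUCE_FPS)


def worst_case_bytes(episode_frames):
    """Upper bound: every frame of the episode is still sitting in the queue."""
    return episode_frames * FRAME_BYTES


print(backlog_frames(2.0), "frames queued after a 2s stall")
print(f"{drain_seconds(2.0):.2f}s to catch up")
print(f"{worst_case_bytes(3197) / 2**30:.2f} GiB worst case for a 3197-frame episode")
```

Under these assumptions a short stall costs tens of megabytes and clears within a couple of seconds, while the theoretical bound for a fully stalled encoder over a whole episode is on the order of gigabytes per camera.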
