# Streaming Video Encoding — Encode on the fly during recording

## Problem

After each episode, `save_episode()` blocks for ~79 seconds on a 3-camera setup (3197 frames, 107s episode):
| Step | Time |
|---|---|
| Write 9591 PNGs to disk | ~19s |
| Read PNGs back → compute image stats | ~15s |
| Read PNGs again → encode 3× AV1 videos → delete PNGs | ~44.5s |
| Save parquet + metadata | ~0.6s |
| Total | ~79s |
The entire pipeline writes frames as temporary PNGs, reads them back twice (stats + encoding), then deletes them. This round-trip is the bottleneck.
## Architecture

### Before: sequential post-episode pipeline
```
Recording loop                  save_episode() — BLOCKS ~79s
┌─────────────┐                 ┌─────────────────────────────────────────────────────┐
│ 30fps loop  │                 │                                                     │
│             │  frames         │ frame_buffer ──► write PNGs ──► read PNGs ──► stats │
│ camera ─►───┼──► list         │                   (~19s)         (~15s)             │
│ teleop      │                 │                      ▼                              │
│ policy      │                 │   read PNGs ──► AV1 encode ──► delete PNGs          │
│             │                 │                  (~44.5s)                           │
└──────┬──────┘                 └─────────────────────────────────────────────────────┘
       │                                        │
       ▼                                        ▼
  episode ends                            next episode
(~107s recording)                        (~79s blocked)
```
Data path: frame → list → PNG disk → read → stats + PNG disk → read → encode → MP4 → delete PNGs
### After: streaming pipeline (encodes during recording)
```
Recording loop (encoding happens HERE)          save_episode() — ~0.5s
┌────────────────────────────────────────┐      ┌──────────────────┐
│ 30fps control loop                     │      │                  │
│                                        │      │ flush encoders   │
│ camera ──► frame ─┬─► queue ──► [T1] ──┼─AV1──►  (already done)  │
│                   ├─► queue ──► [T2] ──┼─AV1──►    ~0.16s        │
│                   ├─► queue ──► [T3] ──┼─AV1──►                  │
│                   │                    │      │ running stats    │
│                   └─► downsample ──────┼─stats►  → finalize      │
│                       RunningQuantile  │      │    ~0.01s        │
│ teleop / policy (never blocked)        │      │                  │
└────────────────────────────────────────┘      │ save parquet     │
                                                │    ~0.36s        │
[T1] [T2] [T3] = encoder threads                └──────────────────┘
(one per camera, GIL released by PyAV)
```
Data path: frame → queue → encode → MP4 (zero PNGs, zero re-reads)
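The per-camera encoder threads follow a simple queue-drain pattern. Below is a minimal sketch of that pattern, not the actual `_CameraEncoder` implementation: the class name is hypothetical, and the real encoder hands frames to PyAV (which releases the GIL while encoding), replaced here by a pluggable callback so the sketch stays self-contained.

```python
import queue
import threading

class CameraEncoderThread(threading.Thread):
    """Sketch of a per-camera encoder thread: a daemon thread drains an
    unbounded queue and passes each frame to an encode callback."""
    _SENTINEL = object()  # placed on the queue to signal end-of-episode

    def __init__(self, encode_frame):
        super().__init__(daemon=True)
        self.frames = queue.Queue()       # unbounded: put() never blocks
        self.encode_frame = encode_frame  # in the real code, wraps PyAV encoding
        self.start()

    def add_frame(self, frame):
        # Called from the 30fps control loop; returns immediately.
        self.frames.put(frame)

    def finish(self):
        # Flush remaining buffered frames, then wait for the thread to exit.
        self.frames.put(self._SENTINEL)
        self.join()

    def run(self):
        while True:
            frame = self.frames.get()
            if frame is self._SENTINEL:
                return
            self.encode_frame(frame)

# One thread per camera; here "encoding" just collects frames.
encoded = []
enc = CameraEncoderThread(encoded.append)
for i in range(5):
    enc.add_frame(f"frame-{i}")
enc.finish()  # by now all five frames have been "encoded", in order
```

Because the producer only ever calls `Queue.put()` on an unbounded queue, the control loop cannot stall on the encoder; `finish()` is the cheap flush step that `save_episode()` waits on.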
## Stats computation changes

| | Before | After |
|---|---|---|
| Method | `compute_episode_stats()` reads all PNGs from disk, decodes them, and computes min/max/mean/std/quantiles | `RunningQuantileStats` accumulates stats incrementally, frame by frame, during recording |
| Input | Full-resolution PNGs read back from disk | Downsampled frames (via `auto_downsample_height_width`, ~150×100px) taken directly from memory |
| When | After the episode ends, inside `save_episode()` | During recording, inside `add_frame()` (~2ms per frame) |
| Output | `{mean, std, min, max, q01..q99}` shaped `(C,1,1)` in `[0,1]` | Identical shape and scale: `RunningQuantileStats.get_statistics()` → reshape `(C,1,1)` / 255 |
| I/O | Reads 9591 PNGs (~15s) | Zero disk I/O |
| Numeric features | Computed from the episode buffer (unchanged) | Computed from the episode buffer (unchanged) |

The running stats use the same `auto_downsample_height_width` function and produce the same statistical keys (`mean`, `std`, `min`, `max`, `count`, `q01`, `q10`, `q50`, `q90`, `q99`). When streaming is active, video features are excluded from the post-episode `compute_episode_stats()` call; only numeric features go through that path.
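The incremental accumulation can be sketched as follows. This is a simplified, hypothetical stand-in for `RunningQuantileStats` (names and internals assumed): it maintains per-channel mean/std via Welford-style batch merging plus running min/max, and omits the quantile machinery entirely.

```python
import numpy as np

class RunningChannelStats:
    """Sketch: incremental per-channel image stats (quantiles omitted)."""

    def __init__(self, channels=3):
        self.count = 0
        self.mean = np.zeros(channels)
        self.m2 = np.zeros(channels)  # sum of squared deviations (Welford)
        self.min = np.full(channels, np.inf)
        self.max = np.full(channels, -np.inf)

    def update(self, frame):
        # frame: (H, W, C) uint8; every pixel is one sample per channel
        x = frame.reshape(-1, frame.shape[-1]).astype(np.float64)
        n = x.shape[0]
        batch_mean = x.mean(axis=0)
        batch_m2 = ((x - batch_mean) ** 2).sum(axis=0)
        # Chan-style merge of (count, mean, M2) with the running totals
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        self.m2 = self.m2 + batch_m2 + delta**2 * self.count * n / total
        self.count = total
        self.min = np.minimum(self.min, x.min(axis=0))
        self.max = np.maximum(self.max, x.max(axis=0))

    def get_statistics(self):
        # Match the on-disk format: (C, 1, 1) arrays scaled to [0, 1]
        std = np.sqrt(self.m2 / self.count)
        as_stat = lambda a: (a / 255.0).reshape(-1, 1, 1)
        return {"mean": as_stat(self.mean), "std": as_stat(std),
                "min": as_stat(self.min), "max": as_stat(self.max)}
```

Each `update()` touches only one (downsampled) frame, which is why the per-frame cost stays in the low-millisecond range and no PNG read-back is ever needed.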
## Results
Tested on the same 3-camera setup (2028 frames, 67.6s episode):
| Step | Before | After | Speedup |
|---|---|---|---|
| Frame writing (PNGs) | ~19s | 0s | ∞ (eliminated) |
| Episode stats | ~15s | 0.01s | 1500× |
| Video encoding | ~44.5s | 0.16s | 278× |
| Parquet + meta | ~0.6s | 0.36s | ~same |
| **Total `save_episode()`** | **~79s** | **0.55s** | **143×** |
The video encoding time drops to near-zero because most encoding already happened during recording. `finish_episode()` only flushes the last few buffered frames.
### Per-frame overhead during recording
| Operation | Time |
|---|---|
| `queue.put(frame)` (non-blocking) | ~0.01ms |
| `auto_downsample_height_width` | ~0.5ms |
| `RunningQuantileStats.update` | ~1ms |
| **Total per frame** | **~2ms** (well within the 33ms budget at 30fps) |
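The downsampling step above can be sketched as a simple strided subsample. This is a hypothetical helper for illustration; the real `auto_downsample_height_width` may choose target sizes differently.

```python
import numpy as np

def downsample_for_stats(frame, target_height=100):
    """Strided subsample of an (H, W, C) frame for cheap stats updates."""
    stride = max(1, frame.shape[0] // target_height)
    return frame[::stride, ::stride]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
small = downsample_for_stats(frame)
print(small.shape)  # stride 4 → (120, 160, 3)
```

A strided view allocates nothing and shrinks the per-frame stats work by the square of the stride, which is what keeps the update around a millisecond.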
## Usage
Streaming is on by default. Users on weaker PCs can disable it to fall back to the old post-episode pipeline:
```bash
# Default (streaming ON)
lerobot-record --dataset.repo_id=user/dataset ...

# Old behavior (streaming OFF)
lerobot-record --dataset.repo_id=user/dataset --dataset.streaming_encoding=false
```
For the RaC data collection script, set `streaming_encoding: false` in the dataset config.
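For reference, a config fragment like the following would disable streaming there. The surrounding layout is hypothetical; only the `streaming_encoding` key and the `user/dataset` repo id come from the text above.

```yaml
# Hypothetical dataset config fragment for the RaC script
dataset:
  repo_id: user/dataset
  streaming_encoding: false  # fall back to the post-episode PNG pipeline
```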
## Files Changed

**`src/lerobot/datasets/video_utils.py`**
- Added `StreamingVideoEncoder`: manages one `_CameraEncoder` thread per camera
- Added `_CameraEncoder`: a daemon thread that reads frames from a queue and encodes them with PyAV
- A non-blocking unbounded queue ensures the control loop is never delayed
**`src/lerobot/datasets/lerobot_dataset.py`**
- `create()` / `start_streaming_encoder()`: new `streaming_encoding` parameter
- `add_frame()`: when streaming, feeds frames to the encoder and accumulates running stats instead of writing PNGs
- `save_episode()`: when streaming, uses running stats and calls `finish_episode()` to get already-encoded video paths
- `clear_episode_buffer()`: cancels in-progress encoding on re-record
- `finalize()`: cleans up the encoder on shutdown
- Full backward compatibility: when `streaming_encoding=False`, all existing code paths are unchanged
**`src/lerobot/scripts/lerobot_record.py`**
- Added `streaming_encoding: bool = True` to `DatasetRecordConfig`
- Wired through to both the `create()` and resume paths
**`examples/rac/rac_data_collection_openarms_rtc.py`**
- Added `streaming_encoding: bool = True` to `RaCRTCDatasetConfig`
- Frames are added inline during the control loop (streaming) or buffered for post-loop writing (old path)
- Automatically detects the mode and adjusts behavior
## Design Notes

- **Why threads, not processes?** PyAV/FFmpeg releases the GIL during encoding, so threads get real parallelism here. Threads also share memory (zero-copy frame passing), avoiding the serialization overhead of multiprocessing.
- **Why an unbounded queue?** At 30fps production vs ~72fps encoding throughput, the queue stays near-empty. Even during brief encoder stalls, memory growth is bounded by episode length. The control loop must never block.
- **Why running stats?** They avoid the expensive read-back-from-disk step. `RunningQuantileStats` + `auto_downsample_height_width` compute identical statistics incrementally with ~2ms overhead per frame.
- **Backward compatible:** Setting `streaming_encoding=false` restores the original PNG → encode pipeline exactly. No behavior changes for users who opt out.