Pepijn bc876949ff perf(streaming): parallel per-camera range fetch in the episode byte cache
The cluster benchmark showed throughput was fetch-bound: resident decode
reached 1824 samples/s, but streaming kept up at only 693 samples/s
(target 1000), with fetch at ~465 MiB/s aggregate (~233 MiB/s per rank,
i.e. ~4 effective HTTPS streams). Fixes:

- One prefetch future per (episode, camera) instead of per episode:
  cameras no longer fetch back-to-back on a single thread, so the worker
  pool translates directly into concurrent range GETs.
- Default fetch workers 4 -> 16, exposed as video_fetch_workers on
  StreamingLeRobotDataset for sweeps.
- RangeFetcher uses fs.cat_file (one ranged GET per fetch, no
  open/seek/read layering) and resolves any fsspec URL, so S3-compatible
  stores (e.g. Backblaze B2 via s3://) work identically to hf://.
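A minimal sketch of the per-(episode, camera) prefetch follows. It assumes a hypothetical shape for RangeFetcher (an fsspec filesystem plus a path, with fetch(start, end)) and a hypothetical prefetch_episode helper; the real classes in episode_byte_cache may look different. It relies only on fsspec's url_to_fs and cat_file, which return the bytes [start, end) from one call. hf:// needs huggingface_hub installed and s3:// needs s3fs. The 16 workers in the usage comment mirror the new video_fetch_workers default.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class RangeFetcher:
    """Hypothetical sketch, not the actual lerobot RangeFetcher."""

    def __init__(self, fs, path: str):
        self.fs = fs      # any fsspec filesystem (or a stub exposing cat_file)
        self.path = path  # path within that filesystem

    @classmethod
    def from_url(cls, url: str) -> "RangeFetcher":
        # Lazy import; url_to_fs maps hf://, s3://, file://, ... to (fs, path).
        from fsspec.core import url_to_fs

        fs, path = url_to_fs(url)
        return cls(fs, path)

    def fetch(self, start: int, end: int) -> bytes:
        # fsspec's cat_file returns bytes [start, end) from a single call.
        return self.fs.cat_file(self.path, start=start, end=end)


def prefetch_episode(
    pool: ThreadPoolExecutor,
    fetchers: dict[str, RangeFetcher],
    ranges: dict[str, tuple[int, int]],
) -> dict[str, Future]:
    """Submit one future per camera so an episode's cameras download concurrently.

    fetchers maps camera name -> RangeFetcher for that camera's video file;
    ranges maps camera name -> (start, end) byte range of the episode in it.
    """
    return {
        cam: pool.submit(fetchers[cam].fetch, start, end)
        for cam, (start, end) in ranges.items()
    }


# Usage sketch (fetchers/ranges built elsewhere, e.g. via RangeFetcher.from_url):
# with ThreadPoolExecutor(max_workers=16) as pool:
#     futures = prefetch_episode(pool, fetchers, ranges)
#     payloads = {cam: fut.result() for cam, fut in futures.items()}
```

With one future per camera, a pool of N workers can keep up to N range requests in flight, instead of serializing an episode's cameras behind one future.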

Also fixed in passing: a latent deadlock on the payload-cache hit path
(_get_or_build_decoder re-acquired the non-reentrant lock while it was
already held; not hit in practice so far because payload hits are rare),
and episode_byte_cache no longer imports private torchcodec symbols at
module import time (they vary across torchcodec versions and broke the
module on macOS wheels).
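One common way to remove this kind of self-deadlock is a helper that assumes the caller already holds the lock and never acquires it itself. The sketch below uses a hypothetical DecoderCache with a hypothetical _get_or_build_decoder_locked helper. The commit does not say which fix was applied: a "caller holds the lock" helper, as here, and switching to threading.RLock are both plausible.

```python
import threading


class DecoderCache:
    """Hypothetical sketch of the lock discipline, not the episode_byte_cache code."""

    def __init__(self, build_decoder):
        # threading.Lock is non-reentrant: acquiring it twice on one thread blocks forever.
        self._lock = threading.Lock()
        self._payloads: dict = {}
        self._decoders: dict = {}
        self._build_decoder = build_decoder

    def _get_or_build_decoder_locked(self, key):
        # Caller must already hold self._lock; this helper never acquires it.
        decoder = self._decoders.get(key)
        if decoder is None:
            decoder = self._build_decoder(self._payloads[key])
            self._decoders[key] = decoder
        return decoder

    def get_decoder(self, key, payload=None):
        with self._lock:
            if payload is not None:
                self._payloads.setdefault(key, payload)
            # On a cache hit we return under the same lock; a helper that did
            # `with self._lock:` here would reproduce the deadlock.
            return self._get_or_build_decoder_locked(key)
```

A regression test can call the hit path on a separate daemon thread and assert it finishes within a timeout, so a reintroduced deadlock fails the test instead of hanging the suite.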

New unit tests (decoder layer stubbed): cameras fetch in parallel
(wall-clock bound), error propagation through ensure_ready/get_decoder,
cache-hit deadlock regression, cat_file range correctness. A local Hub
microbenchmark shows +46% aggregate throughput at 16 vs 4 workers on a
residential link that saturates at ~15 MiB/s; the real before/after needs
the cluster benchmark, where per-stream throughput, not the link, is the
bottleneck.
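The wall-clock and error-propagation checks can be sketched with a stubbed fetch and no decoder at all. This assumes hypothetical names (stub_fetch, the test functions, a "broken" camera) and a 0.2 s simulated latency. The actual tests go through ensure_ready/get_decoder, whose signatures are not shown in this commit. Plain asserts keep it runnable without pytest.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # simulated per-camera GET latency, in seconds
CAMERAS = ["front", "wrist", "top", "side"]


def stub_fetch(camera: str) -> bytes:
    # Stands in for a ranged GET; the decoder layer is not involved.
    time.sleep(DELAY)
    if camera == "broken":
        raise OSError(f"range GET failed for {camera}")
    return camera.encode()


def test_cameras_fetch_in_parallel():
    with ThreadPoolExecutor(max_workers=16) as pool:
        start = time.monotonic()
        futures = [pool.submit(stub_fetch, cam) for cam in CAMERAS]
        results = [f.result() for f in futures]
        elapsed = time.monotonic() - start
    assert results == [cam.encode() for cam in CAMERAS]
    # Back-to-back fetches would take len(CAMERAS) * DELAY = 0.8 s; this bound
    # leaves slack for scheduling jitter but still fails a serial fetch path.
    assert elapsed < 2 * DELAY


def test_fetch_error_propagates():
    with ThreadPoolExecutor(max_workers=16) as pool:
        future = pool.submit(stub_fetch, "broken")
        try:
            future.result()  # re-raises the worker's exception in the caller
        except OSError as exc:
            assert "broken" in str(exc)
        else:
            raise AssertionError("expected OSError from the failed fetch")
```

Bounding wall-clock time relative to the serial cost tests the concurrency property directly, without depending on network speed.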

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:41:01 +02:00