mirror of
https://github.com/huggingface/lerobot.git
synced 2026-07-03 16:17:15 +00:00
bc876949ff
The cluster benchmark showed fetch-bound throughput: resident decode 1824 samples/s vs stream keep-up 693 (target 1000), with fetch at ~465 MiB/s aggregate (~233/rank, i.e. ~4 effective HTTPS streams).

Fixes:

- One prefetch future per (episode, camera) instead of per episode: cameras no longer fetch back-to-back on a single thread, so the worker pool converts directly into concurrent range GETs.
- Default fetch workers 4 -> 16, exposed as video_fetch_workers on StreamingLeRobotDataset for sweeps.
- RangeFetcher uses fs.cat_file (one ranged GET per fetch, no open/seek/read layering) and resolves any fsspec URL, so S3-compatible stores (e.g. Backblaze B2 via s3://) work identically to hf://.

Also fixed in passing: a latent deadlock on the payload-cache hit path (_get_or_build_decoder re-acquired the non-reentrant lock; unhit so far because payload hits are rare), and episode_byte_cache no longer imports private torchcodec symbols at module import time (they vary across torchcodec versions and broke the module on macOS wheels).

New unit tests (decoder layer stubbed): cameras fetch in parallel (wall-clock bound), error propagation through ensure_ready/get_decoder, cache-hit deadlock regression, cat_file range correctness.

Local Hub microbench shows +46% aggregate at 16 vs 4 workers on a residential link that saturates at ~15 MiB/s; the real before/after needs the cluster benchmark where per-stream throughput, not the link, binds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
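
For illustration only, the per-(episode, camera) prefetch idea can be sketched with the standard library. This is not the code from this commit: fetch_camera_bytes, prefetch_episode and timed_prefetch are hypothetical names, the sleep stands in for one ranged GET, and ensure_ready here only borrows the name mentioned above without claiming to match its real signature.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

Key = tuple[int, str]  # (episode index, camera name)


def fetch_camera_bytes(episode: int, camera: str, delay_s: float = 0.05) -> bytes:
    """Hypothetical stand-in for one ranged GET; the sleep mimics network latency."""
    time.sleep(delay_s)
    return f"{episode}:{camera}".encode()


def prefetch_episode(
    pool: ThreadPoolExecutor, episode: int, cameras: list[str]
) -> dict[Key, Future]:
    """Submit one future per (episode, camera) so cameras do not queue behind each other."""
    return {
        (episode, cam): pool.submit(fetch_camera_bytes, episode, cam)
        for cam in cameras
    }


def ensure_ready(futures: dict[Key, Future]) -> dict[Key, bytes]:
    """Block until every fetch finishes; Future.result() re-raises worker exceptions."""
    return {key: fut.result() for key, fut in futures.items()}


def timed_prefetch(workers: int, cameras: list[str]) -> tuple[dict[Key, bytes], float]:
    """Fetch all cameras of episode 0 and return the payloads with the elapsed wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        payloads = ensure_ready(prefetch_episode(pool, 0, cameras))
    return payloads, time.perf_counter() - start
```

With a single worker the four simulated fetches run back-to-back, which is roughly what a per-episode future does to its cameras; with 16 workers they overlap, so the wall-clock time approaches that of one fetch.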