lerobot/tests/datasets at bc876949ffb18378ef78f98cf520db1942eea4ab - lerobot

admin/lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-03 16:17:15 +00:00

Pepijn bc876949ff perf(streaming): parallel per-camera range fetch in the episode byte cache

The cluster benchmark showed fetch-bound throughput: resident decode
ran at 1824 samples/s, but streaming kept up at only 693 samples/s
(target 1000), with fetch at ~465 MiB/s aggregate (~233 MiB/s per rank,
i.e. ~4 effective HTTPS streams). Fixes:

- One prefetch future per (episode, camera) instead of per episode:
  cameras no longer fetch back-to-back on a single thread, so the worker
  pool converts directly into concurrent range GETs.
- Default fetch workers 4 -> 16, exposed as video_fetch_workers on
  StreamingLeRobotDataset for sweeps.
- RangeFetcher uses fs.cat_file (one ranged GET per fetch, no
  open/seek/read layering) and resolves any fsspec URL, so S3-compatible
  stores (e.g. Backblaze B2 via s3://) work identically to hf://.
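
The fetch changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `prefetch_episode`, `fsspec_fetch_range`, and the `(url, start, end)` range tuples are hypothetical names, and the sketch assumes `cat_file` treats `end` as exclusive, as fsspec documents for byte ranges.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Hypothetical signature: (url, start, end) -> bytes for the range [start, end).
FetchRange = Callable[[str, int, int], bytes]


def fsspec_fetch_range(url: str, start: int, end: int) -> bytes:
    """One ranged read via fsspec, for any URL scheme fsspec can resolve."""
    # Imported lazily so the sketch loads even where fsspec is not installed.
    from fsspec.core import url_to_fs

    fs, path = url_to_fs(url)
    # A single call, with no open/seek/read layering on top.
    return fs.cat_file(path, start=start, end=end)


def prefetch_episode(
    pool: ThreadPoolExecutor,
    fetch: FetchRange,
    episode: int,
    camera_ranges: Dict[str, Tuple[str, int, int]],
) -> Dict[Tuple[int, str], Future]:
    """Submit one future per (episode, camera) so cameras download concurrently."""
    return {
        (episode, camera): pool.submit(fetch, url, start, end)
        for camera, (url, start, end) in camera_ranges.items()
    }
```

Passing the fetch function in as an argument keeps the scheduling logic testable with a stub; a caller would hand `fsspec_fetch_range` to `prefetch_episode` together with a pool sized by the worker count.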

Also fixed in passing: a latent deadlock on the payload-cache hit path
(_get_or_build_decoder re-acquired the non-reentrant lock; unhit so far
because payload hits are rare), and episode_byte_cache no longer imports
private torchcodec symbols at module import time (they vary across
torchcodec versions and broke the module on macOS wheels).
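
The deadlock described above is the usual hazard of a non-reentrant `threading.Lock`: a method that already holds the lock calls a helper that tries to take it again. One common way to avoid it, shown here as a hedged sketch with a hypothetical `DecoderCache` class rather than the actual module, is to have the helper assume the lock is held and never acquire it.

```python
import threading
from typing import Callable, Dict, Hashable


class DecoderCache:
    """Toy cache guarded by a non-reentrant threading.Lock (hypothetical)."""

    def __init__(self, build: Callable[[Hashable], object]) -> None:
        self._lock = threading.Lock()
        self._decoders: Dict[Hashable, object] = {}
        self._build = build

    def get_decoder(self, key: Hashable) -> object:
        with self._lock:
            # The helper below assumes the lock is already held. If it did
            # `with self._lock` itself, this call would block forever.
            return self._get_or_build_locked(key)

    def _get_or_build_locked(self, key: Hashable) -> object:
        # Caller must hold self._lock; this helper never acquires it.
        if key not in self._decoders:
            self._decoders[key] = self._build(key)
        return self._decoders[key]
```

Building under the lock serializes decoder construction, which keeps the sketch simple; switching to `threading.RLock` is another option when re-entry is intended.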

New unit tests (decoder layer stubbed): cameras fetch in parallel
(wall-clock bound), error propagation through ensure_ready/get_decoder,
cache-hit deadlock regression, cat_file range correctness. Local Hub
microbench shows +46% aggregate at 16 vs 4 workers on a residential
link that saturates at ~15 MiB/s; the real before/after needs the
cluster benchmark where per-stream throughput, not the link, binds.
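
Two of the checks named above, the wall-clock bound and error propagation, might look roughly like this with the fetch stubbed out. The helper names (`fetch_all`, `check_parallel_wall_clock`, `check_error_propagates`) are illustrative assumptions and do not come from test_episode_byte_cache.py.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, keys, max_workers=16):
    """Run fetch(key) for every key concurrently; re-raise the first failure."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(fetch, key) for key in keys}
        # Future.result() re-raises any exception raised in the worker thread.
        return {key: fut.result() for key, fut in futures.items()}


def check_parallel_wall_clock(delay=0.2, cameras=4):
    """Elapsed time should track one fetch, not the serial sum of all fetches."""

    def slow_fetch(key):
        time.sleep(delay)  # stands in for network latency
        return key

    start = time.perf_counter()
    fetch_all(slow_fetch, [f"cam{i}" for i in range(cameras)])
    elapsed = time.perf_counter() - start
    # Generous bound: still well below the serial time of delay * cameras.
    return elapsed < delay * cameras * 0.75


def check_error_propagates():
    """A failing fetch should surface to the caller instead of being swallowed."""

    def failing_fetch(key):
        raise OSError(f"range GET failed for {key}")

    try:
        fetch_all(failing_fetch, ["cam0"])
    except OSError:
        return True
    return False
```

A timing assertion like this only shows that the fetches overlap; it says nothing about real throughput, which is why the commit defers the before/after comparison to the cluster benchmark.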

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-07-02 15:41:01 +02:00

test_aggregate.py

feat(encoding parameters): adding support for user provided video encoding parameters (#3455)

2026-05-14 23:46:42 +02:00

test_byte_index.py

Add in-memory byte index and manifest-driven episode MP4 cache.

2026-06-16 15:03:17 +00:00

test_compute_stats.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_dataset_metadata.py

Add extensive language support (#3467)

2026-05-19 14:46:11 +02:00

test_dataset_reader.py

feat(encoding parameters): adding support for user provided video encoding parameters (#3455)

2026-05-14 23:46:42 +02:00

test_dataset_tools.py

feat(video re-encoding): Adding utility and dataset edition tool for video re-encoding (#3611)

2026-05-19 14:46:14 +02:00

test_dataset_utils.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_dataset_writer.py

feat(encoding parameters): adding support for user provided video encoding parameters (#3455)

2026-05-14 23:46:42 +02:00

test_datasets.py

fix(datasets): normalize shape=(1,) numeric values before HF encoding (#3344)

2026-05-19 16:53:19 +02:00

test_delta_timestamps.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_episode_byte_cache.py

perf(streaming): parallel per-camera range fetch in the episode byte cache

2026-07-02 15:41:01 +02:00

test_image_transforms.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_image_writer.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_language_render.py

Add extensive language support (#3467)

2026-05-19 14:46:11 +02:00

test_language.py

Add extensive language support (#3467)

2026-05-19 14:46:11 +02:00

test_lerobot_dataset.py

refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472)

2026-04-28 18:40:30 +02:00

test_quantiles_dataset_integration.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_sampler.py

fix(train): synchronize EpisodeAwareSampler shuffling across ranks and gate dataset download per node (#3768)

2026-06-11 11:07:42 +02:00

test_streaming_distributed.py

feat(streaming): episode-pool iteration with decode-on-exit, video prefetch, and exact resume

2026-06-11 15:02:15 +02:00

test_streaming_native.py

feat(streaming): random-episode admission via reshard() + multi-input-shard shuffle

2026-06-15 13:33:27 +00:00

test_streaming_video_encoder.py

feat(encoding parameters): adding support for user provided video encoding parameters (#3455)

2026-05-14 23:46:42 +02:00

test_streaming.py

refactor(streaming): rebuild StreamingLeRobotDataset on native datasets primitives

2026-06-11 21:03:09 +02:00

test_video_decoder_cache.py

fix(datasets): bound VideoDecoderCache to prevent OOM on large datasets (#3614)

2026-05-19 16:54:25 +02:00

test_video_encoding.py

feat(video re-encoding): Adding utility and dataset edition tool for video re-encoding (#3611)

2026-05-19 14:46:14 +02:00

test_visualize_dataset.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00