lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-18 08:47:05 +00:00

Author	SHA1	Message	Date
Pepijn	79b547de32	Merge remote episode-pool work into the full pool rewrite The remote commit (`2ab71231c`) added an opt-in episode pool, deferred decode in the legacy buffer path, decode/fetch timing instrumentation, remote-IO retries (video_utils), and 32MB row-group writing (dataset_tools). The pool rewrite on this side makes the episode pool the only iteration path (with prefetch-on-admit, per-consumer seeding, worker-exact fast-forward resume), so streaming_dataset.py resolves to the rewrite with the remote instrumentation ported into it: - 5-slot shared counters + timing_stats() (decode_s_total/fetch_s_total) - fetch timed around episode admission, decode timed around emission - benchmark/slurm keep the remote updates, with episode_pool_size as the knob (buffer_size deprecated and ignored) video_utils retries and dataset_tools row groups are taken unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:17:04 +02:00
Pepijn	a7b7f4964e	fix(streaming): worker-exact resume arithmetic and multi-worker resume test The fast-forward skip assumed every DataLoader worker delivers batches; workers that own no shards yield nothing and are stopped, so the batch round-robin runs over min(num_workers, num_shards) active workers. Use that effective count (shard-less workers skip nothing). Adds a resume test under num_workers=2 asserting exact continuation. Note: the test fixtures write a single parquet file regardless of data_files_size_in_mb, so worker-splitting tests exercise the degenerate single-shard layout; multi-shard behavior is covered by the rank-level split_dataset_by_node tests. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:11:00 +02:00
Pepijn	1050c2fb6c	feat(streaming): episode-pool iteration with decode-on-exit, video prefetch, and exact resume Replace the shard/Backtrackable/decoded-shuffle-buffer internals with an episode pool: each (rank x worker) consumer keeps episode_pool_size whole episodes' tabular rows in RAM and emits uniformly random frames across them. delta_timestamps windows become exact in-RAM slices with correct boundary padding (the Backtrackable machinery and its lookback/lookahead ceilings are gone), and video is decoded only when a sample is emitted, so pool memory stays tabular-sized instead of buffer_size decoded samples. - Prefetch-on-admit: when streaming from a remote source, each pooled episode's video files download to a local cache in the background (refcounted, since v3 packs several episodes per file; deleted on eviction), so decode-on-exit reads local bytes instead of paying network seek latency. - Per-consumer RNG derived from (seed, epoch, rank, worker): consumers decorrelated, runs reproducible, epochs reshuffle automatically. - Deterministic fast-forward resume: load_state_dict takes the trainer's {batches_consumed, batch_size}; each worker re-derives its own skip from the DataLoader's round-robin batch assignment and replays tabular-only (no decode). Exact within an epoch, works with num_workers > 0, and the same state file serves every rank. Replaces the per-shard HF state_dict approach, which lived in worker processes and could not be captured from the trainer. - Shard-cap default removed (max_num_shards=None uses every parquet shard); runtime warnings for non-divisible world sizes (datasets degrades to read-everything splitting) and workers left without shards. - episode_pool_size replaces buffer_size (deprecated, ignored with a warning); decoder cache sized to the pool working set, capped at 128. Legacy order-replication tests asserted the old buffer algorithm step-by-step and are rewritten as behavior contracts (exactly-once coverage, per-seed determinism, epoch reshuffle). Value-level parity tests against the map-style dataset pass unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:02:15 +02:00
pepijn	2ab71231cd	feat(streaming): defer video decode, episode-pool shuffle, and remote-IO retries - streaming_dataset: defer torchcodec decode until a sample leaves the shuffle buffer (buffer now holds ~KB tabular rows, not MB of pixels) and add an opt-in episode-pool shuffle (episode_pool_size) with exact in-episode delta lookups; expose decode/fetch timing_stats. - video_utils: retry transient hf:///fsspec/httpx transport errors during streaming decode (LEROBOT_REMOTE_IO_MAX_RETRIES). - dataset_tools: write multiple ~32MB row groups with a page index to bound per-shard streaming memory. - benchmarks/slurm: streaming benchmark + matrix submitter updates. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-11 10:08:28 +00:00
Pepijn	2d1c17d971	docs(streaming): note AV1 is LeRobot's default codec (vcodec=libsvtav1) So the A100/H100 no-AV1-NVDEC limitation applies to most LeRobot v3 datasets, not just RoboCasa — GPU decode needs an Ada GPU, an hevc/h264-encoded dataset, or a re-encode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:10:18 +02:00
Pepijn	7241f029c6	docs(streaming): A100/H100 NVDEC cannot decode AV1 — correct guidance NVIDIA's decode support matrix: the compute GPUs A100 (GA100) and H100 (GH100) have no AV1 NVDEC decoder; only Ada (L4/L40/RTX40) and some Ampere (A10/A40/A16) do. So on A100/H100 nodes, AV1 datasets must be decoded on CPU or re-encoded to H.265/H.264 — no torchcodec build enables cuda AV1 decode there. Also distinguish that error from "Unsupported device: cuda (variant: ffmpeg)", which is a torchcodec-built-without-CUDA issue. Update diagnose_decode.py message + benchmark README accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:08:54 +02:00
Pepijn	23c58f5f9e	feat(streaming): decode diagnostic + fail benchmark on 0 frames - benchmark: raise SystemExit if 0 frames were measured, so a run that produces no batches (swallowed decode error, all batches dropped) fails loudly instead of being reported green with NaN/zero numbers (the misleading "COMPLETED" CUDA jobs). - add benchmarks/streaming/diagnose_decode.py: isolates the streaming decode path (resolve path -> fsspec.open -> torchcodec VideoDecoder -> get one frame) and prints package versions + the first bytes of the handle. Pinpoints decode failures: bad/placeholder bytes vs ffmpeg/torchcodec build issue. RoboCasa videos are AV1; the failure message calls out AV1 decoder + NVDEC-on-Ada requirements explicitly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:40:24 +02:00
Pepijn	a32a2c647b	feat(streaming): full-matrix SLURM submitter + results summarizer slurm/run_streaming_matrix.sh fans the benchmark matrix (sources {hub,bucket,warmed_bucket} x modes {single,sarm} x decode {cpu,cuda}) out as isolated single-GPU SLURM jobs, so an OOM in one config is contained and reported per-job by SLURM. Worker count and shuffle buffer are bounded (lower for cuda, which holds a CUDA context + NVDEC session per worker) to avoid host/VRAM OOM. Source/mode/decode/workers/buffer/account/partition are env-overridable; SOURCES/MODES/DECODES select subsets. benchmarks/streaming/summarize_results.py collapses the per-run JSONs into one comparison table + summary.csv (frames/s/node, first-batch + p50/p95/p99 latency, cache hit-rate). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:51:36 +02:00
Pepijn	343ecd7980	feat(streaming): optional GPU (NVDEC) video decode device Add `video_decode_device` to StreamingLeRobotDataset and a `device` arg to VideoDecoderCache, passed to torchcodec's VideoDecoder. "cuda" offloads H.264/H.265 decode to the GPU's dedicated NVDEC engine (independent of the training SMs); requires a CUDA-enabled torchcodec build. benchmark: `--video_decode_device` flag. With cuda + num_workers>0 it forces the `spawn` start method (CUDA cannot init in forked workers) and disables CPU pin_memory (frames are already on-GPU). Decode device is recorded in results and the output filename. README documents the NVDEC option and its concurrency/IPC caveats. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:47:11 +02:00
Pepijn	f7c8a526e8	feat(streaming): wallclock benchmark throughput, cross-worker cache stats, bucket source - benchmark: frames_per_s_node now measures sustained wall-clock throughput over the post-warmup window. The previous metric summed inter-batch gaps, which collapse to ~0 under async prefetch (consumer drains a pre-filled queue) and overstated throughput ~100x. - VideoDecoderCache gains an optional shared [hits, misses, evictions] counter tensor; StreamingLeRobotDataset.video_decoder_cache_stats() aggregates it across DataLoader workers (lock-free, approximate; hit_rate preserved). Fixes empty cache stats with workers. - StreamingLeRobotDataset.data_files_root: read bulk data/ + videos/ from an fsspec root (e.g. hf://buckets/<owner>/<name>) while metadata still loads from repo_id. Enables bucket / prewarmed-bucket benchmark sources without copying metadata. Exposed as benchmark --data_files_root. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:25:44 +02:00
Pepijn	68fa5d80b0	feat(streaming): multinode example, dataloading benchmark, distributed smoke test - examples/scaling/train_streaming_multinode.py: Accelerate-based distributed/resumable streaming training (no DistributedSampler; rank/world_size auto-resolved), checkpoints the dataset stream state, and supports a --dummy pure-dataloading path with throughput logging. SLURM launcher in slurm/train_streaming_robocasa.sh. - benchmarks/streaming/benchmark_streaming.py: dummy-consumer dataloading benchmark (single / sarm frame modes) emitting frames/s/node, p50/p95/p99 sample latency, first-batch latency, and VideoDecoderCache reuse stats as JSON + CSV. SLURM launcher + README documenting the source/node/mode matrix and manual bucket prewarming. - VideoDecoderCache: add hit/miss/eviction counters and a stats() method so the benchmark can surface decoder thrash (no new cache, no eviction-policy change). - tests/datasets/test_streaming_distributed.py: accelerate-launch smoke test asserting per-rank disjointness; skips (does not false-pass) when <2 processes spawn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 13:48:23 +02:00
Steven Palma	04125492e4	fix(datasets): expand torchcodec platform coverage + rewrite pyav fallback for torchvision >0.26 (#3588) * fix(deps): better versioning control for torchcodec * refactor(video_utils): replace torchvision with pyav * adding Torchcodec version to lerobot-info * chore(benchmarks): delete video benchmark --------- Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co>	2026-05-12 16:59:11 +02:00
Steven Palma	5f15232271	chore: remove usernames + use entrypoints in docs, comments & sample commands (#2988)	2026-02-18 22:46:12 +01:00
Caroline Pascal	648ea8f485	fix(benchmark): fixing video benchmark (#2094) * fix(time benchmark): removing deprecated TimeBenchmark dependency * fix(typo): renaming frames in an up-to-date fashion * feat(duets): rearranging crf and g parameters in a proper unique combination manner * fix(segfault): fixing segfault by adding a lock in ThreadPoolExecutor * chore(update): update datasets, codecs and backends to the latest versions * chore(unused files): removing unused files * fix(dataset paths): fix datasets paths to live among lerobot datasets	2025-11-26 17:41:31 +01:00
Steven Palma	43d878a102	chore: replace hard-coded obs values with constants throughout all the source code (#2037) * chore: replace hard-coded OBS values with constants throughout all the source code * chore(tests): replace hard-coded OBS values with constants throughout all the test code	2025-09-25 15:36:47 +02:00
Steven Palma	af1760f175	chore(utils): move benchmark and buffer to their respective modules (#2028)	2025-09-24 16:46:38 +02:00
Michel Aractingi	f55c6e89f0	Dataset v3 (#1412) Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com> Co-authored-by: Remi Cadene <re.cadene@gmail.com> Co-authored-by: Tavish <tavish9.chen@gmail.com> Co-authored-by: fracapuano <francesco.capuano@huggingface.co> Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>	2025-09-15 09:53:30 +02:00
Steven Palma	378e1f0338	Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520) * chore: update .gitignore * chore: update pre-commit * chore(deps): update pyproject * fix(ci): multiple fixes * chore: pre-commit apply * chore: address review comments * Update pyproject.toml Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * chore(deps): add todo --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>	2025-07-17 14:30:20 +02:00
Simon Alibert	d4ee470b00	Package folder structure (#1417) * Move files * Replace imports & paths * Update relative paths * Update doc symlinks * Update instructions paths * Fix imports * Update grpc files * Update more instructions * Downgrade grpc-tools * Update manifest * Update more paths * Update config paths * Update CI paths * Update bandit exclusions * Remove walkthrough section	2025-07-01 16:34:46 +02:00
Steven Palma	c940676bdd	fix(benchmarks): remove .numpy() from frame in benchmark script (#1354)	2025-06-19 17:07:13 +02:00
Caroline Pascal	6d723c45a9	feat(encoding): switching to PyAV for ffmpeg related tasks (#983)	2025-04-29 17:39:35 +02:00
Steven Palma	4041f57943	feat(visualization): replace cv2 GUI with Rerun (and solves ffmpeg versioning issues) (#903)	2025-04-09 17:33:01 +02:00
Steven Palma	1c15bab70f	fix(codec): hot-fix for default codec in linux arm platforms (#868)	2025-03-17 13:23:11 +01:00
Jade Choghari	0e98c6ee96	Add torchcodec cpu (#798) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Remi <re.cadene@gmail.com> Co-authored-by: Remi <remi.cadene@huggingface.co> Co-authored-by: Simon Alibert <simon.alibert@huggingface.co> Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>	2025-03-14 16:53:42 +01:00
Simon Alibert	a1809ad3de	Add typos checks (#770)	2025-02-25 23:51:15 +01:00
CharlesCNorton	bc16e1b497	fix(docs): typos in benchmark readme.md (#614) Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>	2025-01-09 09:35:27 +01:00
Simon Alibert	32eb0cec8f	Dataset v2.0 (#461) Co-authored-by: Remi <remi.cadene@huggingface.co>	2024-11-29 19:04:00 +01:00
Simon Alibert	0b21210d72	Convert datasets to av1 encoding (#302)	2024-07-22 20:08:59 +02:00
Simon Alibert	e410e5d711	Improve video benchmark (#282) Co-authored-by: Alexander Soare <alexander.soare159@gmail.com> Co-authored-by: Remi <re.cadene@gmail.com>	2024-07-09 20:20:25 +02:00
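
Note on a7b7f4964e (worker-exact resume arithmetic): the message says the batch round-robin runs over min(num_workers, num_shards) active workers and that shard-less workers skip nothing. A minimal sketch of that arithmetic follows. The function name `batches_to_skip` is hypothetical and not taken from the repository, and the sketch assumes the active workers are ids 0..active-1 and that batch i is delivered by worker i % active.

```python
def batches_to_skip(batches_consumed: int, num_workers: int,
                    num_shards: int, worker_id: int) -> int:
    """Batches this worker had already delivered before the checkpoint.

    Hypothetical helper: assumes strict round-robin over the active workers,
    and that the active workers are exactly ids 0..active-1.
    """
    if num_workers == 0:
        # No worker processes: the main process delivered every batch itself.
        return batches_consumed
    # Workers that own no shards yield nothing, so they take no part in the
    # round-robin and have nothing to skip.
    active = min(num_workers, num_shards)
    if worker_id >= active:
        return 0
    # Count indices i in [0, batches_consumed) with i % active == worker_id.
    return max(0, (batches_consumed - worker_id + active - 1) // active)


# 7 batches consumed, 4 workers, 3 shards: worker 3 owns no shard.
skips = [batches_to_skip(7, 4, 3, w) for w in range(4)]
print(skips, sum(skips))
```

The per-worker skips sum to batches_consumed, which is the property an exact-continuation test would rely on.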

29 Commits
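
Note on 1050c2fb6c (per-consumer RNG): the message describes an RNG derived from (seed, epoch, rank, worker), so that consumers are decorrelated, runs are reproducible, and epochs reshuffle automatically. One way to get those properties is sketched below. The derivation shown (hashing the tuple with SHA-256 and seeding `random.Random`) is an assumption made for illustration, not the repository's implementation, and `consumer_rng` is a hypothetical name.

```python
import hashlib
import random


def consumer_rng(seed: int, epoch: int, rank: int, worker: int) -> random.Random:
    """Build a reproducible RNG unique to one (rank x worker) consumer.

    Hypothetical helper: hashes the tuple so nearby integers (worker 0 vs 1,
    epoch 3 vs 4) map to unrelated 64-bit seeds.
    """
    key = f"{seed}:{epoch}:{rank}:{worker}".encode()
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


# Same tuple gives the same order; changing the epoch gives a new order.
order_a = list(range(10))
consumer_rng(seed=0, epoch=0, rank=0, worker=0).shuffle(order_a)
order_b = list(range(10))
consumer_rng(seed=0, epoch=0, rank=0, worker=0).shuffle(order_b)
print(order_a == order_b)
```

Because the epoch is part of the key, no extra state has to be stored to reshuffle between epochs; the trainer only needs to pass the current epoch number.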