Files
lerobot/tests/datasets
Pepijn 06aa6a0425 feat(streaming): exact-once epoch coverage for the byte-cache episode pool
The pool path sampled frames with replacement and never guaranteed a full
epoch (episodes rotated on a fixed cadence; frames drawn randomly, none
tracked). Add ExactCoveragePool: a deterministic planner that enumerates
every frame of every episode exactly once per epoch while keeping at most
pool_size episodes resident, so batch mixing stays high (uniform draw over
all remaining frames in the pool) but coverage is complete and reproducible.

Mechanics (the "evict only when all frames sampled" model): episodes are
admitted in a seeded global permutation; each resident episode carries a
seeded frame-index shuffle; each draw picks a resident episode with
probability proportional to its remaining frames and pops one; an episode
is evicted only when its last frame is emitted, then a new one is admitted;
the epoch ends when admission is exhausted and every resident episode drains.
Order is a pure function of (seed, epoch) -> resumable by deterministic
fast-forward. The planner does no I/O and exposes admission_order so callers
can prefetch episodes ahead of the sampling frontier.

Wired into the benchmark as --coverage {sampled,exact}: run_exact_coverage_stream
prefetches stream_prefetch_episodes beyond the frontier so a freshly admitted
episode's bytes are resident before it is drawn, then decodes each frame once,
paced to target.

Tests: 7 planner unit tests (exact-once coverage incl. a 45k-frame epoch,
pool-size bound, per-(seed,epoch) determinism with coverage preserved,
admission/eviction events, coupon-collector mixing, zero-length episodes) and
a mocked-cache structural test of run_exact_coverage_stream asserting
every-frame-once-per-camera plus the prefetch-before-decode invariant.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 15:01:35 +02:00
..