- examples/scaling/train_streaming_multinode.py: Accelerate-based distributed,
resumable streaming training (no DistributedSampler; rank/world_size are resolved
automatically). Checkpoints the dataset stream state and supports a --dummy
pure-dataloading path with throughput logging. SLURM launcher in
slurm/train_streaming_robocasa.sh.
- benchmarks/streaming/benchmark_streaming.py: dummy-consumer dataloading benchmark
(single and sarm frame modes) that emits frames/s/node, p50/p95/p99 sample latency,
first-batch latency, and VideoDecoderCache reuse stats as JSON and CSV. Ships with a
SLURM launcher and a README documenting the source/node/mode matrix and manual
bucket prewarming.
- VideoDecoderCache: add hit/miss/eviction counters and a stats() method so the
benchmark can surface decoder thrash (no new cache, no eviction-policy change).
- tests/datasets/test_streaming_distributed.py: accelerate-launch smoke test asserting
that the samples seen by each rank are disjoint; skips (rather than falsely passing)
when fewer than 2 processes spawn.
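For reviewers, a minimal sketch of rank-strided stream sharding without a
DistributedSampler, assuming the launcher exports RANK and WORLD_SIZE (as
torchrun-style launchers do). The names resolve_rank_world_size and shard_stream
are hypothetical; the training script may resolve ranks and split the stream
differently. The `start` argument stands in for a resumed per-rank offset.

```python
import itertools
import os
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def resolve_rank_world_size() -> tuple[int, int]:
    """Hypothetical helper: read rank/world size from torchrun-style env vars,
    defaulting to a single process when they are absent."""
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    return rank, world_size


def shard_stream(
    stream: Iterable[T], rank: int, world_size: int, start: int = 0
) -> Iterator[T]:
    """Hypothetical helper: yield every world_size-th item beginning at `rank`,
    then skip the first `start` items of this rank's shard (e.g. the number of
    samples already consumed before a checkpoint)."""
    shard = itertools.islice(stream, rank, None, world_size)
    return itertools.islice(shard, start, None)
```

Strided slicing makes the per-rank shards disjoint by construction, which is the
property the smoke test checks. One caveat: ranks can receive different item
counts when the stream length is not divisible by world_size, so a training loop
still needs a way to keep ranks in lockstep at the end of the stream.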
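A minimal sketch of hit/miss/eviction counters with a stats() method on an LRU
cache. VideoDecoderCache's real internals are not shown in this change; the class
CountingLRUCache and its `factory` argument are hypothetical and only illustrate
counting on the lookup path without altering the eviction policy.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class CountingLRUCache(Generic[V]):
    """Hypothetical LRU cache that counts hits, misses and evictions."""

    def __init__(self, capacity: int, factory: Callable[[Hashable], V]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._capacity = capacity
        self._factory = factory  # e.g. opens a decoder for a video path
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: Hashable) -> V:
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._factory(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Drop the least recently used entry; a real decoder cache
            # would also release the evicted decoder here.
            self._items.popitem(last=False)
            self.evictions += 1
        return value

    def stats(self) -> dict:
        lookups = self.hits + self.misses
        return {
            "hits": self.hits,
            "misses": self.misses,
            "evictions": self.evictions,
            "size": len(self._items),
            "hit_rate": self.hits / lookups if lookups else 0.0,
        }
```

An eviction count that grows alongside misses while the hit rate stays low is the
thrash signal: the working set of videos exceeds the cache capacity.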
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>