mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-18 08:47:05 +00:00
674c990a39
Raise the default episode_pool_size to 1024 (DatasetConfig + StreamingLeRobotDataset) for better default shuffle quality at scale.

Streaming is now a first-class option of the main train script: when cfg.dataset.streaming is set, the dataloader is not handed to accelerate (the dataset is already rank-disjoint via split_dataset_by_node, so IterableDatasetShard would drop (N-1)/N of each rank's stream), batches are moved to device manually, and the episode-aware sampler is skipped.

Remove the standalone examples/scaling/train_streaming_multinode.py example in favor of this wiring.

Co-authored-by: Cursor <cursoragent@cursor.com>
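
The (N-1)/N figure above can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: split_by_node and shard_again are hypothetical stand-ins that model a rank-disjoint split followed by a second per-process shard, not the actual split_dataset_by_node or IterableDatasetShard implementations, and the sample counts are made up.

```python
from itertools import islice


def split_by_node(samples, rank, world_size):
    """Hypothetical stand-in for a rank-disjoint split:
    rank r keeps every world_size-th sample, starting at offset r."""
    return samples[rank::world_size]


def shard_again(stream, rank, world_size, batch_size):
    """Simplified model of a second per-process shard: read
    batch_size * world_size samples at a time and keep only the
    slice that belongs to this rank."""
    it = iter(stream)
    chunk_size = batch_size * world_size
    while True:
        chunk = list(islice(it, chunk_size))
        if len(chunk) < chunk_size:
            return  # incomplete tail is dropped in this simplified model
        yield from chunk[rank * batch_size:(rank + 1) * batch_size]


def kept_fraction(num_samples, world_size, batch_size):
    """Fraction of the dataset that is still seen by any rank
    after both splits are applied."""
    samples = list(range(num_samples))
    seen = []
    for rank in range(world_size):
        own = split_by_node(samples, rank, world_size)  # already rank-disjoint
        seen.extend(shard_again(own, rank, world_size, batch_size))
    return len(seen) / num_samples


if __name__ == "__main__":
    # With 4 ranks, sharding twice keeps 1/4 of the data, i.e. drops (N-1)/N.
    print(kept_fraction(num_samples=64, world_size=4, batch_size=2))
```

In this model each rank already holds 1/N of the samples, and the second shard keeps only 1/N of that, which is why the streaming path skips the second shard and leaves the dataloader out of accelerate.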