fix(dataset): use revision-safe Hub cache for downloaded datasets (#3233)

* refactor(dataset): enhance dataset root directory handling and introduce hub cache support

- Updated DatasetConfig and LeRobotDatasetMetadata to clarify root directory behavior and introduce a dedicated hub cache for downloads.
- Refactored LeRobotDataset and StreamingLeRobotDataset to utilize the new hub cache and improve directory management.
- Added tests to ensure correct behavior when using the hub cache and handling different revisions without a specified root directory.

* refactor(dataset): improve root directory handling in LeRobotDataset

- Updated LeRobotDataset to store the requested root path separately from the actual root path.
- Adjusted metadata loading to use the requested root, enhancing clarity and consistency in directory management.

* refactor(dataset): minor improvements for hub cache support

* chore(datasets): guard in resume + assertion test

---------

Co-authored-by: AdilZouitine <adilzouitinegm@gmail.com>
Co-authored-by: mickaelChen <mickael.chen.levinson@gmail.com>
This commit is contained in:
Steven Palma
2026-03-27 22:21:55 +01:00
committed by GitHub
parent 975d89b38d
commit 4e45acca52
8 changed files with 440 additions and 40 deletions
+4
View File
@@ -65,6 +65,10 @@ if "LEROBOT_HOME" in os.environ:
# cache dir
default_cache_path = Path(HF_HOME) / "lerobot"
HF_LEROBOT_HOME = Path(os.getenv("HF_LEROBOT_HOME", default_cache_path)).expanduser()
# LeRobot's own revision-safe Hub cache (NOT the system-wide ~/.cache/huggingface/hub/).
# Used as the ``cache_dir`` argument to ``snapshot_download`` so that different
# dataset revisions are stored in isolated snapshot directories.
HF_LEROBOT_HUB_CACHE = HF_LEROBOT_HOME / "hub"
# calibration dir
default_calibration_path = HF_LEROBOT_HOME / "calibration"