refactor(datasets): replace untyped dict with typed DatasetInfo dataclass

Introduce a typed DatasetInfo dataclass to replace the untyped dict representation of info.json.

Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove dict-style field access from aggregate.py
- Add test fixture support for DatasetInfo

Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
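
Illustrative sketch (not the committed code): a minimal outline of how the changes listed above could fit together. Only `fps` and `features` are confirmed by the diff; the `total_episodes` and `total_frames` fields, the `fps` check, and the `_field_names` helper are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class DatasetInfo:
    """Hypothetical, reduced schema for info.json."""

    fps: int
    features: dict[str, dict[str, Any]] = field(default_factory=dict)
    total_episodes: int = 0  # assumed field
    total_frames: int = 0  # assumed field

    def __post_init__(self) -> None:
        # Validation at construction time (the exact rule is an assumption).
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")
        # JSON has no tuple type, so shapes arrive as lists: normalize to tuples.
        normalized: dict[str, dict[str, Any]] = {}
        for name, spec in self.features.items():
            spec = dict(spec)  # copy so the caller's dict is not mutated
            if "shape" in spec:
                spec["shape"] = tuple(spec["shape"])
            normalized[name] = spec
        self.features = normalized

    @classmethod
    def _field_names(cls) -> set[str]:
        return {f.name for f in fields(cls)}

    # Dict-style compatibility layer for callers not yet migrated.
    def __getitem__(self, key: str) -> Any:
        if key not in self._field_names():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self._field_names():
            raise KeyError(key)
        setattr(self, key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return getattr(self, key) if key in self._field_names() else default

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> DatasetInfo:
        # Ignore unknown keys in this sketch; the real class may reject them.
        known = cls._field_names()
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)  # deep copy, so self is left untouched
        for spec in data["features"].values():
            if "shape" in spec:
                spec["shape"] = list(spec["shape"])  # tuple -> list for JSON
        return data
```

Under these assumptions, `DatasetInfo.from_dict(raw).fps` and `DatasetInfo.from_dict(raw)["fps"]` return the same value, so call sites can move from `info["fps"]` to `info.fps` one file at a time while older code keeps working.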
jjolla93
2026-04-07 15:41:35 +09:00
committed by Maxime Ellerbach
parent ca87ccd941
commit 275be6c9d3
14 changed files with 265 additions and 152 deletions
+5 -5
@@ -80,18 +80,18 @@ def _write_dataset_tree(
     )
     tasks = tasks_factory(total_tasks=1)
     episodes = episodes_factory(
-        features=info["features"],
-        fps=info["fps"],
+        features=info.features,
+        fps=info.fps,
         total_episodes=1,
         total_frames=3,
         tasks=tasks,
     )
-    stats = stats_factory(features=info["features"])
+    stats = stats_factory(features=info.features)
     hf_dataset = hf_dataset_factory(
-        features=info["features"],
+        features=info.features,
         tasks=tasks,
         episodes=episodes,
-        fps=info["fps"],
+        fps=info.fps,
     )
     create_info(root, info)