Commit Graph

18 Commits

Author SHA1 Message Date
Steven Palma 123495250b refactor(dataset): split LeRobotDataset into DatasetReader & DatasetWriter (+ API cleanup) (#3180)
* refactor(dataset): split reader and writer

* chore(dataset): remove proxys

* refactor(dataset): better reader & writer encapsulation

* refactor(datasets): clean API + reduce leaky implementations

* refactor(dataset): API cleaning for writer, reader and meta

* refactor(dataset): expose writer & reader + other minor improvements

* refactor(dataset): improve teardown routine

* refactor(dataset): add hf_dataset property at the facade level

* chore(dataset): add init for datasset module

* docs(dataset): add docstrings for public API of the dataset classes

* tests(dataset): add tests for new classes

* fix(dataset): remove circular dependecy
2026-03-26 19:09:25 +01:00
Steven Palma d90e4bcfd3 refactor(dataset): modular files (#3171)
* refactor(dataset): modular files

* refactor(dataset): update imports across the codebase
2026-03-15 23:58:09 -07:00
Steven Palma e96339a3b4 feat(dataset): add streaming video encoding + HW encoder support (#2974)
* feat(dataset): init stream encoding

* feat(dataset): use threads to fix frame pickle latency

* refactor(dataset): remove HW encoded related changes

* add lp (#2977)

* feat(dataset): add Hw encoding + log drop frames (#2978)

* chore(docs): add streaming video encoding guide

* fix(dataset): style docs + testing

* chore(docs): simplify sttreaming video encoding guide

* chore(dataset): add commands + streaming encoding default false + print note if false + queue default is now 30

* chore(docs): add verification note advice

* chore(dataset): adjusting defaults & docs for streaming encoding

* docs(scripts): improve docstrings

* test(dataset): polish streaming encoding tests

* chore(dataset): move FYI log related to streaming

* chore(dataset): add arg vcodec to suggestions

* refactor(dataset): better handling for auto and available vcodec

* chore(dataset): change log level

* docs(dataset): add note related to training performance vcodec

* docs(dataset): add more notes to streaming encoding

---------

Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
2026-02-23 13:57:43 +01:00
Alex Tyshka 77dc49b3a3 Fix delta timestamps with episodes filter and add tests (#2612) 2026-01-16 18:14:54 +01:00
Alex Tyshka 33910673ec Bugfix: Add tests for image deletion and fix mixed image-video deletion (#2592)
* Add tests for image deletion and fix mixed-image-video deletion

* Fix docstring whitespace

* Remove debug print

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>

* Remove inaccurate comment

* Remove batched video test

---------

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>
2026-01-16 18:14:15 +01:00
Leo Tronchon 8b6fc0ae05 feat(datasets): expose video codec option for dataset recording (#2771)
* expose codec options + add tests

* pre-commit run -a
2026-01-08 18:06:39 +01:00
Michel Aractingi 12f2f35760 - Introduce _current_file_start_frame for better tracking of the number of frames in each parquet file (#2280)
- Added testing for that section in `test_datasets.py`
2025-10-21 16:17:12 +02:00
Michel Aractingi f2ff370459 Incremental parquet writing (#1903)
* incremental parquet writing

* add .finalise() and a backup __del__ for stopping writers

* fix missing import

* precommit fixes added back the use of embed images

* added lazy loading for hf_Dataset to avoid frequently reloading the dataset during recording

* fix bug in video timestamps

* Added proper closing of parquet file before reading

* Added rigorous testing to validate the consistency of the meta data after creation of a new dataset

* fix bug in episode index during clear_episode_buffer

* fix(empty concat): check for empty paths list before data files concatenation

* fix(v3.0 message): updating v3.0 backward compatibility message.

* added fixes for the resume logic

* answering co-pilot review

* reverting some changes and style nits

* removed unused functions

* fix chunk_id and file_id when resuming

* - fix parquet loading when resuming
- add test to verify the parquet file integrity when resuming so that data files are now overwritten

* added general function get_file_size_in_mb and removed the one for video

* fix table size value when resuming

* Remove unnecessary reloading of the parquet file when resuming record.
Write to a new parquet file when resuming record

* added back reading parquet file for image datasets only

* - respond to Qlhoest comments
- Use pyarrows `from_pydict` function
- Add buffer for episode metadata to write to the parquet file in batches to improve efficiency
- Remove the  use of `to_parquet_with_hf_images`

* fix(dataset_tools) with the new logic using proper finalize
bug in finding the latest path of the metdata that was pointing to the data files
added check for the metadata size in the case the metadatabuffer was not written yet

* nit in flush_metadata_buffer

* fix(lerobot_dataset) return the right dataset len when a subset of the dataset is requested

---------

Co-authored-by: Harsimrat Sandhawalia <hs.sandhawalia@gmail.com>
2025-10-11 11:01:30 +02:00
Steven Palma bbcf66bd82 chore: enable simplify in ruff lint (#2085) 2025-09-29 15:06:56 +02:00
Steven Palma c5b5955c5a chore: replace hard-coded next values with constants throughout all the source code (#2056) 2025-09-26 14:30:07 +02:00
Steven Palma d2782cf66b chore: replace hard-coded action values with constants throughout all the source code (#2055)
* chore: replace hard-coded 'action' values with constants throughout all the source code

* chore(tests): replace hard-coded action values with constants throughout all the test code
2025-09-26 13:33:18 +02:00
Steven Palma 43d878a102 chore: replace hard-coded obs values with constants throughout all the source code (#2037)
* chore: replace hard-coded OBS values with constants throughout all the source code

* chore(tests): replace hard-coded OBS values with constants throughout all the test code
2025-09-25 15:36:47 +02:00
Steven Palma 7cf04a5ec3 chore: move constants to utils (#2016) 2025-09-24 11:11:53 +02:00
Michel Aractingi f55c6e89f0 Dataset v3 (#1412)
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Remi Cadene <re.cadene@gmail.com>
Co-authored-by: Tavish <tavish9.chen@gmail.com>
Co-authored-by: fracapuano <francesco.capuano@huggingface.co>
Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
2025-09-15 09:53:30 +02:00
Simon Alibert d4ee470b00 Package folder structure (#1417)
* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
2025-07-01 16:34:46 +02:00
Pepijn 0b2285d1ec Feat: Improve hub integration (#1382)
* feat(policies): Initial setup to push policies to hub with tags and model card

* feat: add dataset that is used to train

* Add model template summary

* fix: Update link model_card template

* fix: remove print

* fix: change import name

* fix: add model summary in template

* fix: minor text

* fix: comments Lucain

* fix: feedback steven

* fix: restructure push to hub

* fix: remove unneeded changes

* fix: import

* fix: import 2

* Add MANIFEST.in

* fix: feedback pr

* Fix tests

* tests: Add smolvla end-to-end test

* Fix: smolvla test

* fix test name

* fix policy tests

* Add push to hub false policy tests

* Do push to hub cleaner

* fix(ci): add push_to_hub false in tests

---------

Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2025-06-26 14:36:16 +02:00
Simon Alibert e23b41e79a Hardware API redesign (#777)
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
2025-06-05 17:48:43 +02:00
Simon Alibert 974028bd28 Organize test folders (#856)
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-03-13 14:05:55 +01:00