Commit Graph

1067 Commits

Author SHA1 Message Date
Pepijn f8a185f753 cleanup logging 2025-10-14 17:05:47 +02:00
Pepijn a66b50d372 scale lr decay if we reduce steps 2025-10-14 15:59:46 +02:00
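The "scale lr decay if we reduce steps" commit above reflects a common scheduler pitfall. A minimal sketch with a hypothetical `cosine_lr` helper (not LeRobot's actual scheduler code): because a cosine schedule is parameterized by the total step count, reducing the number of training steps and passing the new total rescales the decay so the learning rate still reaches its floor at the final step instead of stopping partway down the curve.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine learning-rate decay over `total_steps`.

    Hypothetical sketch: passing a reduced `total_steps` rescales the
    whole decay curve, so the lr hits ~0 exactly at the last step.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

For example, halving a run from 100k to 50k steps and passing `total_steps=50_000` keeps the final learning rate at the floor, just as the full-length run would.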
Pepijn 9950bfd66f fix bug 2025-10-14 15:22:59 +02:00
Pepijn 4170d1b6f1 cleanup 2025-10-14 14:48:18 +02:00
Pepijn d3f1ece680 cleanup update method 2025-10-14 14:33:58 +02:00
Pepijn 4061b3f5b3 always use accelerate 2025-10-14 14:24:55 +02:00
Pepijn d2687e9486 add some debugging 2025-10-14 14:13:50 +02:00
Pepijn bb824f2275 change accelerate detection 2025-10-14 14:08:20 +02:00
Pepijn da78460b65 fix OOM bug 2025-10-14 14:01:51 +02:00
Pepijn a0d0b00e04 small improvements in train 2025-10-14 13:53:38 +02:00
Pepijn cabc47c5ad simplify accelerate main process detection 2025-10-14 13:38:36 +02:00
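The "simplify accelerate main process detection" commit touches a pattern worth spelling out. A stdlib-only sketch, not LeRobot's actual code: distributed launchers such as `accelerate launch` and `torchrun` export a `RANK` environment variable for every worker, so rank 0 can be treated as the main process and an unset `RANK` as a plain single-process run. (The real `accelerate.Accelerator` exposes an `is_main_process` property directly; the helper below is a hypothetical environment-based fallback.)

```python
import os

def is_main_process() -> bool:
    # Hypothetical sketch: under `accelerate launch` / `torchrun`, each
    # worker gets RANK set; an unset RANK means a single-process run,
    # which we treat as the main process.
    return os.environ.get("RANK", "0") == "0"
```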
Pepijn 50ff388bf6 update docs, and small improvements in train 2025-10-14 13:31:52 +02:00
Pepijn a86cea5708 fix path optimizer state 2025-10-14 11:26:57 +02:00
Pepijn 6486982ab4 small fixes 2025-10-14 10:46:19 +02:00
Pepijn 2bc154e706 Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggingface/lerobot into feat/accelerate-melt-gpus 2025-10-14 10:25:20 +02:00
Pepijn 0d79130729 pre download dataset in tests 2025-10-14 10:24:46 +02:00
Pepijn ed267d4cf1 Merge branch 'main' into feat/accelerate-melt-gpus 2025-10-14 01:13:23 -07:00
Pepijn 252bca9354 dont push to hub in multi gpu tests 2025-10-14 10:06:32 +02:00
Pepijn 43bef1d91c fix test 2025-10-13 17:59:59 +02:00
Pepijn 4c40be57d8 change runner 2025-10-13 17:28:06 +02:00
Francesco Capuano 6f5bb4d4a4 fix outdated example in docs (#2182)
* fix outdated example

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

* Update docs/source/il_robots.mdx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

---------

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-13 16:43:23 +02:00
Pepijn c711a628b9 add tests 2025-10-13 16:25:46 +02:00
Francesco Capuano f29311ccb0 fix: very minor fix but hey devil is in details (#2168)
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2025-10-13 10:44:53 +02:00
Michel Aractingi 0c79cf8f4e Add missing finalize calls in example (#2175)
- add missing calls to dataset.finalize in the example recording scripts
- add section in the dataset docs on calling dataset.finalize
2025-10-11 21:15:43 +02:00
Michel Aractingi f2ff370459 Incremental parquet writing (#1903)
* incremental parquet writing

* add .finalise() and a backup __del__ for stopping writers

* fix missing import

* pre-commit fixes; added back the use of embed images

* added lazy loading for hf_Dataset to avoid frequently reloading the dataset during recording

* fix bug in video timestamps

* Added proper closing of parquet file before reading

* Added rigorous testing to validate the consistency of the meta data after creation of a new dataset

* fix bug in episode index during clear_episode_buffer

* fix(empty concat): check for empty paths list before data files concatenation

* fix(v3.0 message): updating v3.0 backward compatibility message.

* added fixes for the resume logic

* answering co-pilot review

* reverting some changes and style nits

* removed unused functions

* fix chunk_id and file_id when resuming

* - fix parquet loading when resuming
- add test to verify the parquet file integrity when resuming so that data files are not overwritten

* added general function get_file_size_in_mb and removed the one for video

* fix table size value when resuming

* Remove unnecessary reloading of the parquet file when resuming record.
Write to a new parquet file when resuming record

* added back reading parquet file for image datasets only

* - respond to Qlhoest comments
- Use pyarrows `from_pydict` function
- Add buffer for episode metadata to write to the parquet file in batches to improve efficiency
- Remove the use of `to_parquet_with_hf_images`

* fix(dataset_tools) with the new logic using proper finalize
bug in finding the latest path of the metadata that was pointing to the data files
added check for the metadata size in the case the metadata buffer was not written yet

* nit in flush_metadata_buffer

* fix(lerobot_dataset) return the right dataset len when a subset of the dataset is requested

---------

Co-authored-by: Harsimrat Sandhawalia <hs.sandhawalia@gmail.com>
2025-10-11 11:01:30 +02:00
Juan Pizarro 25f60c301b use TeleopEvents.RERECORD_EPISODE in gym_manipulator (#2165)
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-10-11 00:15:42 +02:00
Jade Choghari 0699b46d87 refactor(envs): add custom-observation-size (#2167) 2025-10-10 20:41:37 +02:00
Pepijn a74affad7c try with local rank 2025-10-10 15:52:49 +02:00
Pepijn 63fcebd5a7 main logging 2025-10-10 15:01:27 +02:00
Pepijn 8ebda30d1a Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggingface/lerobot into feat/accelerate-melt-gpus 2025-10-10 14:06:01 +02:00
Pepijn b65172f819 only log in main process 2025-10-10 14:05:53 +02:00
Pepijn deaeb4281c Merge branch 'main' into feat/accelerate-melt-gpus 2025-10-10 13:35:58 +02:00
Pepijn 771b03c30d fix pre commit 2025-10-10 13:35:26 +02:00
Michel Aractingi b8f7e401d4 Dataset tools (#2100)
* feat(dataset-tools): add dataset utilities and example script

- Introduced dataset tools for LeRobotDataset, including functions for deleting episodes, splitting datasets, adding/removing features, and merging datasets.
- Added an example script demonstrating the usage of these utilities.
- Implemented comprehensive tests for all new functionalities to ensure reliability and correctness.

* style fixes

* move example to dataset dir

* missing license

* fixes mostly path

* clean comments

* move tests to functions instead of class based

* - fix video editing: decode, delete frames and re-encode video
- copy unchanged video and parquet files to avoid recreating the entire dataset

* Fortify tooling tests

* Fix type issue resulting from saving numpy arrays with shape 3,1,1

* added lerobot_edit_dataset

* - revert changes in examples
- remove hardcoded split names

* update comment

* fix comment
add lerobot-edit-dataset shortcut

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>

* style nit after copilot review

* fix: bug in dataset root when editing the dataset in place (without setting new_repo_id)

* Fix bug in aggregate.py when accumulating video timestamps; add tests to fortify aggregate videos

* Added missing output repo id

* migrate delete episode to using pyav instead of decoding, writing frames to disk and encoding again.
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>

* added modified suffix in case repo_id is not set in delete_episode

* adding docs for dataset tools

* bump av version and add back time_base assignment

* linter

* modified push_to_hub logic in lerobot_edit_dataset

* fix(progress bar): fixing the progress bar issue in dataset tools

* chore(concatenate): removing no longer needed concatenate_datasets usage

* fix(file sizes forwarding): forwarding files and chunk sizes in metadata info when splitting and aggregating datasets

* style fix

* refactor(aggregate): Fix video indexing and timestamp bugs in dataset merging

There were three critical bugs in aggregate.py that prevented correct dataset merging:

1. Video file indices: Changed from += to = assignment to correctly reference
   merged video files

2. Video timestamps: Implemented per-source-file offset tracking to maintain
   continuous timestamps when merging split datasets (was causing non-monotonic
   timestamp warnings)

3. File rotation offsets: Store timestamp offsets after rotation decision to
   prevent out-of-bounds frame access (was causing "Invalid frame index" errors
   with small file size limits)

Changes:
- Updated update_meta_data() to apply per-source-file timestamp offsets
- Updated aggregate_videos() to track offsets correctly during file rotation
- Added get_video_duration_in_s import for duration calculation

* Improved docs for split dataset and added a check for the possible case that the split size results in zero episodes

* chore(docs): update merge documentation details

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

---------

Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
Co-authored-by: Jack Vial <vialjack@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-10-10 12:32:07 +02:00
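Bug 2 in the "refactor(aggregate)" message above (per-source-file offset tracking for monotonic timestamps) can be sketched in a few lines. This is a hypothetical illustration, not the code in aggregate.py: timestamps inside each source video file restart at zero, so the merge must add the running duration of everything already merged before appending the next file's timestamps.

```python
def merge_timestamps(sources):
    """Hypothetical sketch of per-source-file timestamp offsetting.

    `sources` is a list of (timestamps, duration_s) pairs, one per source
    video file, each file's timestamps starting at 0.
    """
    merged, offset = [], 0.0
    for timestamps, duration_s in sources:
        merged.extend(t + offset for t in timestamps)
        offset += duration_s  # the next file starts where this one ends
    return merged
```

Without the offset, the second file's `0.0` would follow the first file's last timestamp and trigger exactly the non-monotonic-timestamp warnings the commit describes.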
Pepijn 656fc0f059 Remove validate_robot_cameras_for_policy (#2150)
* Remove validate_robot_cameras_for_policy as with the rename processor the image keys can be renamed and mapped

* fix precommit
2025-10-10 11:34:21 +02:00
Pepijn d709acfc55 Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggingface/lerobot into feat/accelerate-melt-gpus 2025-10-10 11:25:55 +02:00
Pepijn 95b6035baa Place logging under accelerate and update docs 2025-10-10 11:25:53 +02:00
Pepijn 629bbca96b Merge branch 'main' into feat/accelerate-melt-gpus 2025-10-10 10:09:53 +02:00
Steven Palma 829d2d1ad9 fix(docs): local docs links (#2149) 2025-10-09 15:20:07 +02:00
Pepijn 52751e8e6d Merge branch 'main' into feat/accelerate-melt-gpus 2025-10-09 15:19:48 +02:00
Pepijn 4b7cd7211a add docs and only push model once 2025-10-09 15:11:47 +02:00
Pepijn 4ccf28437a Add act documentation (#2139)
* Add act documentation

* remove citation as we link the paper

* simplify docs

* fix pre commit
2025-10-08 20:07:14 +02:00
Steven Palma 9a49e57c72 refactor(datasets): add compress_level parameter to write_image() and set it to 1 (#2135)
* refactor(datasets): add compress_level parameter to write_image() and set it to 1

* docs(dataset): add docs to write_image()
2025-10-08 20:06:56 +02:00
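The `compress_level` change above is about an encoding-speed/file-size trade-off. A hypothetical standalone sketch (LeRobot's `write_image()` has its own signature): Pillow's PNG encoder accepts `compress_level` 0-9; level 1 yields slightly larger files but encodes much faster than the default of 6, which matters when dumping camera frames during recording.

```python
import numpy as np
from PIL import Image

def write_image(image: np.ndarray, path: str, compress_level: int = 1) -> None:
    # compress_level=1 trades a slightly larger PNG for much faster
    # encoding than the default level (6).
    Image.fromarray(image).save(path, compress_level=compress_level)
```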
Steven Palma 6c28ef894a chore(docs): add missing license headers (#2140) 2025-10-08 14:27:52 +02:00
Steven Palma bf3c8746b7 feat(devices): add lazy loading for 3rd party robots cameras and teleoperators (#2123)
* feat(devices): add lazy loading for 3rd party robots cameras and teleoperators

Co-authored-by: Darko Lukić <lukicdarkoo@gmail.com>

* feat(devices): load device class based on assumptions in naming

* docs(devices): instructions for using 3rd party devices

* docs: address review feedback

* chore(docs): add example for 3rd party devices

---------

Co-authored-by: Darko Lukić <lukicdarkoo@gmail.com>
2025-10-07 17:46:22 +02:00
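The lazy-loading PR above mentions loading a device class "based on assumptions in naming". A hypothetical sketch of that pattern (the package prefix `lerobot_` and the snake-case-to-CamelCase rule are assumptions for illustration, not the repo's actual convention): the third-party module is only imported when the device is actually requested, so unused drivers never need to be installed.

```python
import importlib

def class_name_for(device_type: str) -> str:
    # Naming assumption (hypothetical): "my_robot" -> class "MyRobot".
    return "".join(part.capitalize() for part in device_type.split("_"))

def load_device_class(device_type: str):
    # Import happens only on first use, keeping third-party drivers
    # optional until a config actually names them.
    module = importlib.import_module(f"lerobot_{device_type}")
    return getattr(module, class_name_for(device_type))
```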
Pepijn 9f32e00f90 fix(async): Add pre and post processing to async inference and update docs (#2132)
* Add pre and post processing to async inference and update docs

* precommit fix typo

* fix tests

* refactor(async): no None branching for processors in _predict_action_chunk

---------

Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2025-10-07 15:10:31 +02:00
Michel Aractingi fcaa0ea5f9 remove extra time base set. (#2133)
Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
2025-10-07 14:09:36 +02:00
Iulia Feroli 5ac9356135 Update README.md to fix broken link to example notebook for visuals (#2117)
Folder structure of examples seems to have changed with extra `dataset` folder and the notebook has also changed names.

Signed-off-by: Iulia Feroli <iuliaferoli@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2025-10-07 09:43:32 +02:00
Steven Palma b74e2a6113 feat(deps): ceil dependency versions (#2091) 2025-10-05 17:53:43 +02:00
AdilZouitine dbce707db5 Initialize logging in training script for both main and non-main processes
- Added `init_logging` calls to ensure proper logging setup when using the accelerator and in standard training mode.
- This change enhances the clarity and consistency of logging during training sessions.
2025-10-03 16:43:05 +02:00
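The `init_logging` commit above can be sketched in stdlib terms. A hypothetical illustration, not LeRobot's actual `init_logging`: every process (main or not) configures logging so nothing goes unformatted, but non-main ranks are raised to WARNING so routine INFO training logs are emitted only once.

```python
import logging

def init_logging(is_main_process: bool = True) -> None:
    # Non-main ranks log at WARNING so INFO training logs appear once.
    level = logging.INFO if is_main_process else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # reconfigure even if a handler is already installed
    )
```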