Commit Graph

59 Commits

Author SHA1 Message Date
CarolinePascal cf6f92ac74 fix(re-encoding): removing inconsistent re-encoding option in lerobot_edit_dataset 2026-05-13 16:22:09 +02:00
CarolinePascal d1917a4871 chore(format): formatting code 2026-05-13 15:06:57 +02:00
CarolinePascal c96092f199 chore(rename): renaming camera_encoder_config to camera_encoder 2026-05-13 14:25:59 +02:00
CarolinePascal fb142866dd fix(typos): fixing remaining typos 2026-05-13 12:40:47 +02:00
CarolinePascal 5b976cc926 fix(typos): fixing multiple typos 2026-05-12 18:22:55 +02:00
CarolinePascal 88862c62b1 feat(aggregate): updating dataset aggregation procedure. Encoding tuning parameters (crf, g, ...) are ignored for validation and changed to None in the aggregated dataset if incompatible. 2026-05-12 17:43:16 +02:00
CarolinePascal e22ecbbcda fix(typos): fixing typos and small mistakes 2026-05-12 17:43:16 +02:00
CarolinePascal cd2316460d fix(imports): refactoring the file architecture to avoid circular imports. VideoEncoderConfig is now defined in lerobot.configs and lazily imports av at runtime. 2026-05-12 17:43:16 +02:00
CarolinePascal 0c3e24e3d6 chore(format): formatting code 2026-05-12 17:43:16 +02:00
CarolinePascal 7f7c958d0b test(artifacts): cleaning up artifacts for the video encoding tests 2026-05-12 17:43:16 +02:00
CarolinePascal 9c64f3f994 chore(format): formatting code, fixing error messages and variable names 2026-05-12 17:43:16 +02:00
CarolinePascal 6b9110d3e9 chore(PyAV): cleaning up PyAV utils and encoding parameters checks to stick to the minimum required tooling. 2026-05-12 17:43:15 +02:00
CarolinePascal c7dc56d8b5 chore(format): fixing formatting issues 2026-05-12 17:43:15 +02:00
CarolinePascal 7040a106a2 test(new): adding new tests for encoding related features 2026-05-12 17:43:15 +02:00
CarolinePascal 6799a24b09 test(existing): adapting existing tests 2026-05-12 17:43:15 +02:00
CarolinePascal 3dfa408d17 chore(video backend): renaming codec into video_backend in get_safe_default_video_backend() 2026-05-12 17:42:27 +02:00
Maxime Ellerbach cb0a944941 refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472)
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass

Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json.

Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo

Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
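
As a rough illustration of the changes listed above, the pattern could look like the sketch below. It is not the actual `DatasetInfo` class: the name `DatasetInfoSketch`, the reduced field set, the `fps` check, and the choice to drop unknown keys in `from_dict()` are all assumptions made for the example.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class DatasetInfoSketch:
    """Hypothetical, reduced stand-in for a typed view of info.json."""

    codebase_version: str
    fps: int
    features: dict[str, dict[str, Any]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validation at construction time (the check itself is an assumption).
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")
        # JSON stores shapes as lists; keep tuples in memory.
        # Copy each spec so the caller's dict is not mutated.
        self.features = {
            name: {**spec, "shape": tuple(spec["shape"])} if "shape" in spec else dict(spec)
            for name, spec in self.features.items()
        }

    @classmethod
    def _names(cls) -> set[str]:
        return {f.name for f in fields(cls)}

    # Dict-style compatibility layer for code that still indexes by key.
    def __getitem__(self, key: str) -> Any:
        if key not in self._names():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self._names():
            raise KeyError(key)
        setattr(self, key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return getattr(self, key) if key in self._names() else default

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DatasetInfoSketch":
        # Assumption: unknown keys are simply dropped in this sketch.
        known = cls._names()
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_dict(self) -> dict[str, Any]:
        out = asdict(self)
        # Convert shapes back to lists so the result is JSON-friendly.
        for spec in out["features"].values():
            if "shape" in spec:
                spec["shape"] = list(spec["shape"])
        return out
```

With this shape, `info.fps` and `info["fps"]` read the same value, which is what lets older dict-style call sites keep working during a migration.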

* fix pre-commit

* update docstring inside DatasetInfo.from_dict()

* sorts the unknown to have deterministic output

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>

* refactoring the last few old fields


* fix crop dataset roi type mismatch


* consistently use int for data and video_files_size_in_mb

---------

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
2026-04-28 18:40:30 +02:00
Steven Palma ca87ccd941 feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI (#3413)
* feat(scripts): lerobot-rollout

* fix(rollout) require dataset in dagger + use duration too

* fix(docs): dagger num_episodes

* test(rollout): fix expectations

* fix(rollout): features check

* fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config

* docs(rollout): edit rename_map instructions

* chore(rollout): multiple minor improvements

* chore(rollout): address comments + minor improvements

* fix(rollout): enable default

* fix(tests): default value RTCConfig

* fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): prevent relativeactions with sync inference engine

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): rtc reanchor to non normalized state

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): fixing the episode length to use hwc (#3469)

also reducing default length to 5 minutes

* feat(rollout): go back to initial position is now a config

* fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470)

* chore(rollout): note about dagger correction stage

* chore(docs): update comments and docstring

* fix(test): move rtc relative out of rollout module

* fix(rollout): address the review comments

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>
2026-04-28 00:57:35 +02:00
whats2000 52f508c51c fix(dataset): cleanup_interrupted_episode wipes image temp dirs (#3405) 2026-04-19 12:04:24 +02:00
Steven Palma df0763a2bc feat(dependencies): minimal default tag install (#3362) 2026-04-12 20:03:04 +02:00
Caroline Pascal d762f4bfe8 fix(dataset): adding metadata loading when reading from a dataset after writing (#3305)
* fix(one shot load): adding metadata loading when reading from a dataset after writing

* refactor(one shot load): move metadata reload to ensure_readable() on LeRobotDatasetMetadata

Move the metadata reload from DatasetReader.load_and_activate() to a new
public ensure_readable() method on LeRobotDatasetMetadata, called from
LeRobotDataset._ensure_reader(). This places lifecycle management in the
right layer: metadata owns its readiness check, the dataset orchestrates
the write-to-read transition, and the reader stays clean.

Also adds a regression test using delta_timestamps to exercise the
meta.episodes access path in the create -> write -> finalize -> read flow.

Co-authored-by: Steven Palma <imstevenpmwork@users.noreply.github.com>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Steven Palma <imstevenpmwork@users.noreply.github.com>
2026-04-10 11:29:40 +02:00
Francesco Capuano 7c032f19fc feat(dataset): registering torchvision transforms (#3153)
* add: a flexible transformation registry

* fix: image transforms can be set both at init and after

* add: tests

* fix: take in review

* feat(datasets): add image transform setters

* fix: pre-commit

* fix: CI

---------

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
2026-04-07 15:59:11 +02:00
Steven Palma 4e45acca52 fix(dataset): use revision-safe Hub cache for downloaded datasets (#3233)
* refactor(dataset): enhance dataset root directory handling and introduce hub cache support

- Updated DatasetConfig and LeRobotDatasetMetadata to clarify root directory behavior and introduce a dedicated hub cache for downloads.
- Refactored LeRobotDataset and StreamingLeRobotDataset to utilize the new hub cache and improve directory management.
- Added tests to ensure correct behavior when using the hub cache and handling different revisions without a specified root directory.

* refactor(dataset): improve root directory handling in LeRobotDataset

- Updated LeRobotDataset to store the requested root path separately from the actual root path.
- Adjusted metadata loading to use the requested root, enhancing clarity and consistency in directory management.

* refactor(dataset): minor improvements for hub cache support

* chore(datasets): guard in resume + assertion test

---------

Co-authored-by: AdilZouitine <adilzouitinegm@gmail.com>
Co-authored-by: mickaelChen <mickael.chen.levinson@gmail.com>
2026-03-27 22:21:55 +01:00
Steven Palma 123495250b refactor(dataset): split LeRobotDataset into DatasetReader & DatasetWriter (+ API cleanup) (#3180)
* refactor(dataset): split reader and writer

* chore(dataset): remove proxys

* refactor(dataset): better reader & writer encapsulation

* refactor(datasets): clean API + reduce leaky implementations

* refactor(dataset): API cleaning for writer, reader and meta

* refactor(dataset): expose writer & reader + other minor improvements

* refactor(dataset): improve teardown routine

* refactor(dataset): add hf_dataset property at the facade level

* chore(dataset): add init for dataset module

* docs(dataset): add docstrings for public API of the dataset classes

* tests(dataset): add tests for new classes

* fix(dataset): remove circular dependency
2026-03-26 19:09:25 +01:00
Steven Palma d90e4bcfd3 refactor(dataset): modular files (#3171)
* refactor(dataset): modular files

* refactor(dataset): update imports across the codebase
2026-03-15 23:58:09 -07:00
Steven Palma 9d3b62aa61 chore(dataset): basic house-keeping (#3170) 2026-03-15 22:12:09 -07:00
Steven Palma 7c2ec31793 refactor(datasets): module cleanup (#3169) 2026-03-15 20:42:15 -07:00
Steven Palma e96339a3b4 feat(dataset): add streaming video encoding + HW encoder support (#2974)
* feat(dataset): init stream encoding

* feat(dataset): use threads to fix frame pickle latency

* refactor(dataset): remove HW encoded related changes

* add lp (#2977)

* feat(dataset): add Hw encoding + log drop frames (#2978)

* chore(docs): add streaming video encoding guide

* fix(dataset): style docs + testing

* chore(docs): simplify streaming video encoding guide

* chore(dataset): add commands + streaming encoding default false + print note if false + queue default is now 30

* chore(docs): add verification note advice

* chore(dataset): adjusting defaults & docs for streaming encoding

* docs(scripts): improve docstrings

* test(dataset): polish streaming encoding tests

* chore(dataset): move FYI log related to streaming

* chore(dataset): add arg vcodec to suggestions

* refactor(dataset): better handling for auto and available vcodec

* chore(dataset): change log level

* docs(dataset): add note related to training performance vcodec

* docs(dataset): add more notes to streaming encoding

---------

Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
2026-02-23 13:57:43 +01:00
Reece O'Mahoney 97e7e0f9ed feat(datasets): improve image transform support (#2885)
* improve image transform support

* add tests

* Add stricter transform check and extra test

* improve subclass check
2026-02-05 15:39:58 +01:00
Jade Choghari b18cef2e26 feat(dataset): add subtask support (#2860)
* add subtask

* remove folder

* add docs

* update doc

* add testing

* update test

* update constant naming + doc

* more docs
2026-01-30 19:29:37 +01:00
Michel Aractingi ec04b7ce3a Feat(dataset_tools.py) Add modify tasks tool (#2875)
* feat(datasets): add modify_tasks function for in-place task editing

Add a new utility function to modify tasks in LeRobotDataset in-place.
This allows users to:
- Set a single task for all episodes
- Set specific tasks for individual episodes
- Combine a default task with per-episode overrides
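
The three usage modes above can be sketched as a small resolution step. This is only an illustration, not the real `modify_tasks` implementation: the helper name `resolve_episode_tasks`, its parameters, and the rule that uncovered episodes keep their current task are assumptions.

```python
def resolve_episode_tasks(current_tasks, new_task=None, episode_tasks=None):
    """Return (episode -> task, task -> task_index) after applying the edits.

    current_tasks: dict mapping episode index to its existing task string.
    new_task: optional default task applied to every episode.
    episode_tasks: optional dict of per-episode overrides.
    """
    if new_task is None and not episode_tasks:
        raise ValueError("Provide new_task, episode_tasks, or both")
    episode_tasks = episode_tasks or {}

    unknown = sorted(set(episode_tasks) - set(current_tasks))
    if unknown:
        raise ValueError(f"Unknown episode indices: {unknown}")

    resolved = {}
    for episode in sorted(current_tasks):
        if episode in episode_tasks:
            resolved[episode] = episode_tasks[episode]  # per-episode override wins
        elif new_task is not None:
            resolved[episode] = new_task  # fall back to the default task
        else:
            resolved[episode] = current_tasks[episode]  # assumption: keep as-is

    # Assign task indices in order of first appearance.
    task_to_index = {}
    for episode in sorted(resolved):
        task_to_index.setdefault(resolved[episode], len(task_to_index))
    return resolved, task_to_index
```

For example, a default of `"stack"` with an override `{2: "sort"}` leaves episodes 0 and 1 on `"stack"` and episode 2 on `"sort"`.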

* feat(edit-dataset): add CLI support for modify_tasks operation

Integrate the modify_tasks function into lerobot_edit_dataset CLI.
Users can now modify dataset tasks via command line:
Supports setting a default task, per-episode tasks, or both combined.

* test(datasets): add tests for modify_tasks function

Add comprehensive test coverage for the modify_tasks utility:
- Single task for all episodes
- Episode-specific task assignment
- Default task with per-episode overrides
- Error handling for missing/invalid arguments
- Verification of task_index correctness
- In-place modification behavior
- Metadata preservation

* respond to copilot review
2026-01-30 13:19:42 +01:00
Michel Aractingi 736b43f3cf Fix(aggregate.py) Aggregation of datasets when sub-datasets are already a result of a previous merge (#2861)
* Fix aggregation of datasets when subdatasets are already a result of a previous merge

* docstring

* respond to copilot review + add regression test

* Remove unnecessary int conversion for indices
2026-01-28 13:31:27 +01:00
Jade Choghari 79688a09f2 improve(dataset-tools): image2video editing tools : Multiple episodes per video file (#2811)
* improve image2video

* add episodes video encoding

* fix mypy failing

* iterate on review

* nit

* remove max, and let it be optional

* iterate more

* update docs

* fix test

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2026-01-20 11:04:22 +01:00
Francesco Capuano b2ff219624 Fixes aggregation of image datasets (#2717)
* fix: use features when aggregating image based datasets

* add: test asserting for data type

* add: features param to writing dataset

---------

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-19 23:36:41 +01:00
Alex Tyshka 77dc49b3a3 Fix delta timestamps with episodes filter and add tests (#2612) 2026-01-16 18:14:54 +01:00
Alex Tyshka 33910673ec Bugfix: Add tests for image deletion and fix mixed image-video deletion (#2592)
* Add tests for image deletion and fix mixed-image-video deletion

* Fix docstring whitespace

* Remove debug print

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>

* Remove inaccurate comment

* Remove batched video test

---------

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>
2026-01-16 18:14:15 +01:00
Leo Tronchon 8b6fc0ae05 feat(datasets): expose video codec option for dataset recording (#2771)
* expose codec options + add tests

* pre-commit run -a
2026-01-08 18:06:39 +01:00
Jade Choghari 7f40b3bf82 feat(dataset): add tool to convert images to video datasets (#2560)
* add video encoding tool

* style

* make it work

* more fixes
2025-12-08 18:50:21 +01:00
Michel Aractingi 12f2f35760 - Introduce _current_file_start_frame for better tracking of the number of frames in each parquet file (#2280)
- Added testing for that section in `test_datasets.py`
2025-10-21 16:17:12 +02:00
Bryson Jones 88100943ef add affine transforms and test (#2145)
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-10-19 21:39:30 +02:00
Michel Aractingi 8e940bf361 Feat/expand add features (#2202)
* make add_feature take multiple features at a time and rename to add_features

* - New function: modify_features that was a combination of remove features and add features.
 - This function is important for when we want to add a feature and remove another so we can do it in one time to avoid copying and creating the dataset multiple times
2025-10-14 16:19:50 +02:00
Michel Aractingi f2ff370459 Incremental parquet writing (#1903)
* incremental parquet writing

* add .finalise() and a backup __del__ for stopping writers

* fix missing import

* precommit fixes added back the use of embed images

* added lazy loading for hf_Dataset to avoid frequently reloading the dataset during recording

* fix bug in video timestamps

* Added proper closing of parquet file before reading

* Added rigorous testing to validate the consistency of the meta data after creation of a new dataset

* fix bug in episode index during clear_episode_buffer

* fix(empty concat): check for empty paths list before data files concatenation

* fix(v3.0 message): updating v3.0 backward compatibility message.

* added fixes for the resume logic

* answering co-pilot review

* reverting some changes and style nits

* removed unused functions

* fix chunk_id and file_id when resuming

* - fix parquet loading when resuming
- add test to verify the parquet file integrity when resuming so that data files are now overwritten

* added general function get_file_size_in_mb and removed the one for video

* fix table size value when resuming

* Remove unnecessary reloading of the parquet file when resuming record.
Write to a new parquet file when resuming record

* added back reading parquet file for image datasets only

* - respond to Qlhoest comments
- Use pyarrows `from_pydict` function
- Add buffer for episode metadata to write to the parquet file in batches to improve efficiency
- Remove the  use of `to_parquet_with_hf_images`
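
The batching idea in the bullet above (buffer episode metadata rows, write them in groups, and flush what remains on finalize) might be sketched as follows. The class name `BatchedRowWriter` and the `write_batch` callback are hypothetical. The real code writes to parquet files, which this sketch deliberately leaves out.

```python
class BatchedRowWriter:
    """Collect rows and hand them to `write_batch` in groups of `batch_size`."""

    def __init__(self, write_batch, batch_size=10):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._write_batch = write_batch  # hypothetical sink, e.g. a parquet append
        self._batch_size = batch_size
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        # Nothing to do when the buffer has not been filled yet.
        if self._rows:
            self._write_batch(list(self._rows))
            self._rows.clear()

    def finalize(self):
        # Write whatever is still buffered before the data is read back.
        self.flush()
```

The trade-off is fewer, larger writes at the cost of having to call `finalize()` before reading, which fits the later bullets about checking whether the metadata buffer has been written yet.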

* fix(dataset_tools) with the new logic using proper finalize
bug in finding the latest path of the metadata that was pointing to the data files
added check for the metadata size in the case the metadatabuffer was not written yet

* nit in flush_metadata_buffer

* fix(lerobot_dataset) return the right dataset len when a subset of the dataset is requested

---------

Co-authored-by: Harsimrat Sandhawalia <hs.sandhawalia@gmail.com>
2025-10-11 11:01:30 +02:00
Michel Aractingi b8f7e401d4 Dataset tools (#2100)
* feat(dataset-tools): add dataset utilities and example script

- Introduced dataset tools for LeRobotDataset, including functions for deleting episodes, splitting datasets, adding/removing features, and merging datasets.
- Added an example script demonstrating the usage of these utilities.
- Implemented comprehensive tests for all new functionalities to ensure reliability and correctness.

* style fixes

* move example to dataset dir

* missing license

* fixes mostly path

* clean comments

* move tests to functions instead of class based

* - fix video editing, decode, delete frames and re-encode video
- copy unchanged video and parquet files to avoid recreating the entire dataset

* Fortify tooling tests

* Fix type issue resulting from saving numpy arrays with shape 3,1,1

* added lerobot_edit_dataset

* - revert changes in examples
- remove hardcoded split names

* update comment

* fix comment
add lerobot-edit-dataset shortcut

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>

* style nit after copilot review

* fix: bug in dataset root when editing the dataset in place (without setting new_repo_id)

* Fix bug in aggregate.py when accumulating video timestamps; add tests to fortify aggregate videos

* Added missing output repo id

* migrate delete episode to using pyav instead of decoding, writing frames to disk and encoding again.
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>

* added modified suffix in case repo_id is not set in delete_episode

* adding docs for dataset tools

* bump av version and add back time_base assignment

* linter

* modified push_to_hub logic in lerobot_edit_dataset

* fix(progress bar): fixing the progress bar issue in dataset tools

* chore(concatenate): removing no longer needed concatenate_datasets usage

* fix(file sizes forwarding): forwarding files and chunk sizes in metadata info when splitting and aggregating datasets

* style fix

* refactor(aggregate): Fix video indexing and timestamp bugs in dataset merging

There were three critical bugs in aggregate.py that prevented correct dataset merging:

1. Video file indices: Changed from += to = assignment to correctly reference
   merged video files

2. Video timestamps: Implemented per-source-file offset tracking to maintain
   continuous timestamps when merging split datasets (was causing non-monotonic
   timestamp warnings)

3. File rotation offsets: Store timestamp offsets after rotation decision to
   prevent out-of-bounds frame access (was causing "Invalid frame index" errors
   with small file size limits)

Changes:
- Updated update_meta_data() to apply per-source-file timestamp offsets
- Updated aggregate_videos() to track offsets correctly during file rotation
- Added get_video_duration_in_s import for duration calculation
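
The interaction between the three fixes can be sketched roughly as below. This is not the code in `aggregate.py`: the function names are hypothetical, and a duration limit stands in for the real file-size limit. The sketch shows the ordering described above: decide on rotation first, then record the offset, and assign (not accumulate) the destination file index.

```python
def plan_concatenation(src_durations_s, max_dst_duration_s):
    """Return one (dst_file_index, offset_s) pair per source video file."""
    placements = []
    dst_index, dst_duration = 0, 0.0
    for duration in src_durations_s:
        # Rotation decision comes first...
        if dst_duration > 0 and dst_duration + duration > max_dst_duration_s:
            dst_index += 1
            dst_duration = 0.0
        # ...then the offset is stored, so it refers to the file actually used.
        placements.append((dst_index, dst_duration))
        dst_duration += duration
    return placements


def remap_episode(episode, placements):
    """Shift one episode's timestamps by the offset of its source file."""
    dst_index, offset = placements[episode["src_file"]]
    return {
        "file_index": dst_index,  # plain assignment, not +=
        "from_timestamp": episode["from_timestamp"] + offset,
        "to_timestamp": episode["to_timestamp"] + offset,
    }
```

With source durations of 10 s, 5 s and 8 s and a 16 s limit, the first two sources share destination file 0 (offsets 0 s and 10 s), while the third starts destination file 1 at offset 0 s.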

* Improved docs for split dataset and added a check for the possible case that the split size results in zero episodes

* chore(docs): update merge documentation details

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

---------

Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
Co-authored-by: Jack Vial <vialjack@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-10-10 12:32:07 +02:00
Pepijn abde7be3b3 Add OpenPi, Pi0 and Pi0.5 (#1910)
* initial commit

* change device in test

* do detailed import

* adhere to python 3.11 syntax

* fix autodocstring

* additionally

* do same in other files

* add model. prefix to all keys in state dict

* use dummy stats

* add pi05

* also shorten action_steps

* fix test

* all test pass! and fix tokenizer max length between 05 and 0

* remove test

* fix transformer dependency

* fix test

* split pi0 and pi05 policy in separate files

* fix test

* fix push to hub test

* add some comments, license and readme

* remove warning in config

* add pi05 to factory

* remove check

* rename action_horizon to chunk_size

* clean up padding of state and action (more in line with lerobot pi0)

* add openpi image transforms for training and add more flexibility to _preprocess_images similar to lerobot pi0

* fix key match from pytorch state dict (similar keys to openpi implementation now)

* also for pi05

* update to python 3.11

* revert to openpi transformer replace python 3.11

* fix(modeling pi0): nit  warning message

* use safeauto_docstring

* fix: remove unused param

* fix from pretrained

* add preprocess tests

* also compile forward method

* Do not add model prefix to normalization

* use same name for action and state dim as lerobot pi0 and remove fixed image keys

* load from pretrained_path

* temp: hardcode base model

* fix override self.pretrained_path = None overwrite

* rename to loss

* remove additional image augmentations, lerobot dataset already does this

* Add docs

* put tests in test folder

* Add test to instantiate all base models

* go back to python 3.10

* update docs

* adapt docs pi05

* change docs: finetune base model options

* minor docs fixes and dependencies

* remove todo

* cast float64 to float32 for mps

* skip if no transformers

* fix tests

* add new models to modelcard

* add back init

* fix circular input

* feat: only run pi test on GPU

* remove require_nightly_gpu

* replace decorator test_pi0_openpi

* rename action_dim, state_dim to max_action_dim, max_state_dim

* fix doc and constants

* cleanup tests

* fix from pretrained

* fix tests

* add comment pi0 pi05 tests, add image features to pi0 pi05 hub tests

* fix, state is included in language not in flow head

* Move test to specific folder

* and paligemma task with newline

* remove add_special_tokens, not needed

* feedback pr

* Remove previous pi0 and rename pi0_openpi and pi05_openpi

* Add Quantile stats to LeRobotDataset (#1985)

* - Add RunningQuantileStats class for efficient histogram-based quantile computation
- Integrate quantile parameters (compute_quantiles, quantiles) into LeRobotDataset
- Support quantile computation during episode collection and aggregation
- Add comprehensive function-based test suite (24 tests) for quantile functionality
- Maintain full backward compatibility with existing stats computation
- Enable configurable quantiles (default: [0.01, 0.99]) for robust normalization

* style fixes, make quantiles computation by default to new datasets

* fix tests

* - Added DEFAULT_QUANTILES=[0.01, 0.10, 0.50, 0.90, 0.99] to be computed for each features instead of being chosen by the user
- Fortified tests.

* - add helper functions to reshape stats
- add missing test for quantiles

* - Add QUANTILE normalization mode to normalize the data with the 1st and 99th percentiles.
- Add QUANTILE10 normalization mode to normalize the data with the 10th and 90th percentiles.

* style fixes

* Added missing license

* Simplify compute_stats

* - added script `augment_dataset_quantile_stats.py` so that we can add quantile stats to existing v3 datasets that don't have quantiles
- modified quantile computation instead of using the edge for the value, interpolate the values in the bin
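
A minimal sketch of estimating a quantile from a histogram with interpolation inside the bin, instead of returning a bin edge. It is not the `RunningQuantileStats` implementation: the function name, the plain-list inputs, and the linear interpolation rule are assumptions for illustration.

```python
def histogram_quantile(counts, edges, q):
    """Estimate the q-quantile from bin counts.

    counts[i] is the number of samples that fell between edges[i] and edges[i + 1].
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute a quantile of an empty histogram")

    target = q * total  # number of samples at or below the quantile
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            # Assume samples are spread uniformly inside the bin.
            fraction = (target - cumulative) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += count
    return edges[-1]
```

Returning the bin edge would quantize every estimate to the bin width. Interpolating inside the bin removes that stair-step effect, at the price of assuming a uniform spread within each bin.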

* rename pi0/pi05 files

* Remove open pi patch and use custom transformer branch for now

* renaming

* fix

* Revert "fix"

This reverts commit 1ea65730ac.

* fix naming

* feat(pi0/pi0.5): add pipeline (#2009)

* feat(processor): convert openpi model with processor

* TODO: Make test works

* fix(modeling_pi0openpi): update attention mask value and time scaling; improve task handling in tests

- Changed the attention mask value from `self.config.attention_mask_value` to a fixed value of `-2.3819763e38`.
- Updated time scaling in the `sample_noise` method to use a constant factor of `0.999` and an offset of `0.001`.
- Enhanced task handling in tests to ensure proper formatting and batch size consistency.
- Cleaned up commented-out test code for clarity.

* refactor(pi0): rename PI0OpenPIConfig and PI0OpenPIPolicy to PI0Config and PI0Policy

- Updated imports and references throughout the codebase to reflect the new naming convention.
- Introduced a new processor file for PI0 to handle pre-processing and post-processing steps.
- Adjusted tests to utilize the renamed classes, ensuring consistency and functionality.
- Enhanced clarity and maintainability by removing outdated naming conventions.

* refactor(pi05): rename PI0OpenPIPolicy to PI0Policy and update configuration

- Renamed `PI0OpenPIPolicy` to `PI0Policy` for consistency with naming conventions.
- Updated the `PI05OpenPIConfig` to include a new `tokenizer_max_length` attribute and changed the normalization mode for state from `MEAN_STD` to `QUANTILES`.
- Simplified model initialization in `PI05OpenPIPolicy` by removing unused `dataset_stats` parameter.
- Added a new processor class for `Pi05PrepareStateTokenizerProcessorStep` with `@dataclass` for improved readability.
- Introduced a test script to compare the integration of the PI0OpenPI policy with the original implementation, ensuring local testing compatibility.

* refactor(pi05): update imports and rename configuration classes

- Changed imports to reflect the new naming convention for PI05 configuration and policy classes.
- Renamed `PI05OpenPIConfig` to `PI05Config` and `PI05OpenPIPolicy` to `PI05Policy` for consistency.
- Introduced a new processor file for PI05, implementing pre-processing and post-processing steps.
- Updated tests to utilize the renamed classes, ensuring functionality and consistency across the codebase.

* update(pi05): increase tokenizer_max_length for improved processing

- Changed the `tokenizer_max_length` from 48 to 200 to enhance the model's capability in handling longer sequences.
- This adjustment aims to improve the overall performance and flexibility of the PI05 configuration.

* add default for state (max_state_dim)

* correct naming

* fix import

* cleanup code

* remove unused test

* use quantiles for action

* move to device

* remove discrete state assert

* fix pi05 test

* move pi05 to device

* use base models in comparison tests

* small renames for tests

* change number of tokens pi05 test

* fix openpi tokenization in test

* fix hub test

* fix test

* assert lerobot vs openpi tests

---------

Co-authored-by: Pepijn <pepijn@huggingface.co>

* add headers

* add back previously removed imports

* update if statement load processor with dataset stats

* remove to avoid circular import

* inject dataset stats for pretrained models

* check normalization before applying

* add link to quantile augment script

* fix(policies): transformers import for ci in PI0 & PI05 (#2039)

* fix(policies): transformers import for ci in PI0

* fix(policies): transformers import for ci in PI05

* test(processor): fix expected raise when normalization types are missing (#2040)

* switch normalization order pipeline for pi05

* Fix/quantiles script (#2064)

* refactor augment stats with quantiles script
add parallelization for faster processing
shift the quantile normalization between -1 1
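
One common way to map values into [-1, 1] from a pair of quantiles is sketched below. The log does not spell out the exact formula, so the function names, the `eps` guard, and the absence of clamping are assumptions, not the library's code.

```python
def quantile_normalize(x, q_low, q_high, eps=1e-8):
    """Map q_low to -1 and q_high to +1; values outside the range are not clamped."""
    scale = max(q_high - q_low, eps)  # guard against a zero-width range
    return 2.0 * (x - q_low) / scale - 1.0


def quantile_unnormalize(y, q_low, q_high, eps=1e-8):
    """Inverse of quantile_normalize for the same quantile pair."""
    scale = max(q_high - q_low, eps)
    return (y + 1.0) / 2.0 * scale + q_low
```

Here `q_low` and `q_high` would be, for instance, the 1st and 99th percentiles of a feature, or the 10th and 90th for the wider-tailed variant mentioned earlier in this commit.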

* fix replay buffer tests

* fix comment

* overwrite the pipeline normalization features with the policy features

* remove double normalization overwrite

* cleanup from pretrained

* remove typo

* also set norm_map

* fix(augment_quantiles) images incorrectly divided by 255

* clamp quantiles

* link to lerobot base models

* rename tests

* incorporate PR feedback

* update docstring for RunningQuantileStats

* update doc links

* Revert "clamp quantiles"

This reverts commit 172207471c.

* fix self.paligemma

* fix tests related to quantiles that were scaled to [0,1], the new range is [-1, 1]

* fix libero doc and use different transformer branch

* use fix branch instead of feat

* update results libero

* add new line

* fix formatting

* precommit

* update results libero

* update libero doc

* update title

* final changes

* add quantiles to test

* run pre commit

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2025-10-02 13:14:45 +02:00
Steven Palma bbcf66bd82 chore: enable simplify in ruff lint (#2085) 2025-09-29 15:06:56 +02:00
Steven Palma c5b5955c5a chore: replace hard-coded next values with constants throughout all the source code (#2056) 2025-09-26 14:30:07 +02:00
Steven Palma d2782cf66b chore: replace hard-coded action values with constants throughout all the source code (#2055)
* chore: replace hard-coded 'action' values with constants throughout all the source code

* chore(tests): replace hard-coded action values with constants throughout all the test code
2025-09-26 13:33:18 +02:00
Steven Palma 43d878a102 chore: replace hard-coded obs values with constants throughout all the source code (#2037)
* chore: replace hard-coded OBS values with constants throughout all the source code

* chore(tests): replace hard-coded OBS values with constants throughout all the test code
2025-09-25 15:36:47 +02:00
Steven Palma 7cf04a5ec3 chore: move constants to utils (#2016) 2025-09-24 11:11:53 +02:00
Steven Palma c9787bd98a feat(script): add entry point for image transform viz (#2007)
* feat(Scripts): add entry point for img transform viz

* chore(style): pre-commit style
2025-09-23 18:47:36 +02:00