* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
* linter + missing files
* add testing for sampl weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* refactor(dataset): split reader and writer
* chore(dataset): remove proxys
* refactor(dataset): better reader & writer encapsulation
* refactor(datasets): clean API + reduce leaky implementations
* refactor(dataset): API cleaning for writer, reader and meta
* refactor(dataset): expose writer & reader + other minor improvements
* refactor(dataset): improve teardown routine
* refactor(dataset): add hf_dataset property at the facade level
* chore(dataset): add init for datasset module
* docs(dataset): add docstrings for public API of the dataset classes
* tests(dataset): add tests for new classes
* fix(dataset): remove circular dependecy
* Add SLURM SARM progress annotation script.
Provide a standalone two-stage compute/aggregate pipeline for RA-BC progress generation so large datasets can be processed in parallel and optionally uploaded to the Hub.
Made-with: Cursor
* fix pr comments
* remove comments
* make add_feature take multiple features at a time and rename to add_features
* - New function: modify_features that was a combination of remove features and add features.
- This function is important for when we want to add a feature and remove another so we can do it in one time to avoid copying and creating the dataset multiple times
* feat(dataset-tools): add dataset utilities and example script
- Introduced dataset tools for LeRobotDataset, including functions for deleting episodes, splitting datasets, adding/removing features, and merging datasets.
- Added an example script demonstrating the usage of these utilities.
- Implemented comprehensive tests for all new functionalities to ensure reliability and correctness.
* style fixes
* move example to dataset dir
* missing lisence
* fixes mostly path
* clean comments
* move tests to functions instead of class based
* - fix video editting, decode, delete frames and rencode video
- copy unchanged video and parquet files to avoid recreating the entire dataset
* Fortify tooling tests
* Fix type issue resulting from saving numpy arrays with shape 3,1,1
* added lerobot_edit_dataset
* - revert changes in examples
- remove hardcoded split names
* update comment
* fix comment
add lerobot-edit-dataset shortcut
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
* style nit after copilot review
* fix: bug in dataset root when editing the dataset in place (without setting new_repo_id
* Fix bug in aggregate.py when accumelating video timestamps; add tests to fortify aggregate videos
* Added missing output repo id
* migrate delete episode to using pyav instead of decoding, writing frames to disk and encoding again.
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
* added modified suffix in case repo_id is not set in delete_episode
* adding docs for dataset tools
* bump av version and add back time_base assignment
* linter
* modified push_to_hub logic in lerobot_edit_dataset
* fix(progress bar): fixing the progress bar issue in dataset tools
* chore(concatenate): removing no longer needed concatenate_datasets usage
* fix(file sizes forwarding): forwarding files and chunk sizes in metadata info when splitting and aggregating datasets
* style fix
* refactor(aggregate): Fix video indexing and timestamp bugs in dataset merging
There were three critical bugs in aggregate.py that prevented correct dataset merging:
1. Video file indices: Changed from += to = assignment to correctly reference
merged video files
2. Video timestamps: Implemented per-source-file offset tracking to maintain
continuous timestamps when merging split datasets (was causing non-monotonic
timestamp warnings)
3. File rotation offsets: Store timestamp offsets after rotation decision to
prevent out-of-bounds frame access (was causing "Invalid frame index" errors
with small file size limits)
Changes:
- Updated update_meta_data() to apply per-source-file timestamp offsets
- Updated aggregate_videos() to track offsets correctly during file rotation
- Added get_video_duration_in_s import for duration calculation
* Improved docs for split dataset and added a check for the possible case that the split size results in zero episodes
* chore(docs): update merge documentation details
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
---------
Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
Co-authored-by: Jack Vial <vialjack@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* Remove unused scripts, add docs for image transforms and add example
* fix(examples): move train_policy.py under examples, remove outdated readme parts
* remove script thats copied to train folder
* remove outdated links to examples and example tests