* refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
* chore: clarify torch.compile disabled note in SACAlgorithm
* fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
* fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
* fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample()
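A minimal sketch of the locking pattern this fix describes; the real `ReplayBuffer` has more state, and everything here beyond the `add()`/`sample()` names is an assumption:

```python
import random
import threading

class ReplayBuffer:
    """Sketch: add() and sample() may run on different threads (the actor
    pushes transitions while the learner samples), so both guard the
    underlying storage with the same lock."""

    def __init__(self, capacity: int):
        self._storage: list = []
        self._capacity = capacity
        self._lock = threading.Lock()  # shared by add() and sample()

    def add(self, transition) -> None:
        with self._lock:
            if len(self._storage) >= self._capacity:
                self._storage.pop(0)
            self._storage.append(transition)

    def sample(self, batch_size: int) -> list:
        with self._lock:
            # Sample under the lock so a concurrent add() cannot
            # resize the storage mid-sample.
            return random.sample(self._storage, min(batch_size, len(self._storage)))
```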
* refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference
* perf: remove redundant CPU→GPU→CPU transition move in learner
* fix: add kwargs in reward classifier __init__()
* fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer
* fix: add try/finally to control_loop to ensure image writer cleanup on exit
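A sketch of the cleanup guarantee (function and attribute names are illustrative, not the actual `control_loop` signature):

```python
def control_loop(robot, image_writer, events):
    """Whatever raises inside the loop (robot errors, KeyboardInterrupt),
    the image writer is stopped so no background writer thread is leaked."""
    try:
        while not events["stop"]:
            observation = robot.get_observation()
            image_writer.save_image(observation["image"])
    finally:
        image_writer.stop()  # always runs, even on exceptions or Ctrl-C
```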
* fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error
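A sketch of why the string key matters; `TransitionKey` is a hypothetical enum used only to illustrate the failure mode:

```python
import enum
import torch

class TransitionKey(enum.Enum):  # hypothetical enum, for illustration only
    IS_INTERVENTION = "is_intervention"

# An enum key embeds the enum class in the pickle, so torch.load can fail on
# the learner side (e.g. under torch's weights_only default, or if the class
# was moved between versions):
fragile = {TransitionKey.IS_INTERVENTION: torch.tensor([True])}

# A plain string key round-trips through torch.save/torch.load cleanly:
robust = {"is_intervention": torch.tensor([True])}
torch.save(robust, "complementary_info.pt")
restored = torch.load("complementary_info.pt")
```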
* fix: skip tests that require grpc if not available
* fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests
* fix(tests): skip tests that require grpc if not available
* refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages
* fix(config): update vision encoder model name to lerobot/resnet10
* fix(sac): clarify torch.compile status
* refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity
* refactor(sac): simplify optimizer return structure
* perf(rl): use async iterators in OnlineOfflineMixer.get_iterator
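Roughly the shape of the mixing logic, assuming the commit's intent; the argument names and the ratio-based draw are assumptions, not the actual `OnlineOfflineMixer` implementation:

```python
import random
from typing import AsyncIterator

async def get_iterator(online_iter, offline_iter, online_ratio: float = 0.5) -> AsyncIterator:
    """Sketch: each step draws from the online or offline stream according
    to online_ratio, awaiting the chosen async iterator instead of blocking
    the event loop while a batch is prepared."""
    while True:
        source = online_iter if random.random() < online_ratio else offline_iter
        yield await anext(source)
```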
* refactor(sac): decouple algorithm hyperparameters from policy config
* update loss names in tests
* fix docstring
* remove unused type alias
* fix test for flat dict structure
* refactor(policies): rename policies/sac → policies/gaussian_actor
* refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
* perf(observation_processor): add CUDA support for image processing
* fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
* fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
* fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
* fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
* fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
* fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
* fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
* fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
* fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
* Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
* refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
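A minimal sketch of the nesting this refactor describes; the field names are illustrative, not the real `SACAlgorithmConfig` fields:

```python
from dataclasses import dataclass, field

@dataclass
class SACAlgorithmConfig:  # illustrative hyperparameters
    discount: float = 0.99
    temperature_init: float = 1.0
    num_critics: int = 2

@dataclass
class TrainRLConfig:
    algorithm: SACAlgorithmConfig = field(default_factory=SACAlgorithmConfig)

# With the algorithm nested, every SAC hyperparameter gets a stable JSON path,
# e.g. {"algorithm": {"discount": 0.98}} in a config file.
```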
* refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation
* refactor(rl): add type property to RLAlgorithmConfig for better clarity
* refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility
* refactor(tests): remove grpc import checks from test files for cleaner code
* fix(tests): gate RL tests on the `datasets` extra
* refactor: simplify docstrings for clarity and conciseness across multiple files
* fix(rl): update gripper position key and handle action absence during reset
* fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset
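A sketch of the alignment being fixed (loop and method names are illustrative): the frame written to the dataset must pair the observation the policy saw with the action it chose and the reward that action produced.

```python
obs, info = env.reset()
done = False
while not done:
    action = policy.select_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    # Record the PRE-step observation with the action/reward it generated.
    dataset.add_frame({"observation": obs, "action": action, "next.reward": reward})
    obs = next_obs  # recording next_obs instead would shift the tuple by one step
```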
* refactor: clean up import statements
* chore: address reviewer comments
* chore: improve visual stats reshaping logic and update docstring for clarity
* refactor: enforce mandatory config_class and name attributes in RLAlgorithm
* refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer
* refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
* refactor: add require_package calls for grpcio and gym-hil in relevant modules
* refactor(rl): move grpcio guards to runtime entry points
* feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
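Under these changes the on-disk checkpoint would look roughly as follows; the layout is inferred from the commit body above, and the exact file names inside `algorithm/` are assumptions:

```
checkpoint/
├── pretrained_model/      # actor policy, HF-style (config, safetensors, ...)
├── algorithm/             # critic ensemble, target nets, log_alpha via state_dict()
└── training_step.json     # {"step": ..., "interaction_step": ...}
```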
* refactor(rl): move actor weight-sync wire format from policy to algorithm
* refactor(rl): update type hints for learner and actor functions
* refactor(rl): hoist grpcio guard to module top in actor/learner
* chore(rl): manage import pattern in actor (#3564)
* chore(rl): manage import pattern in actor
* chore(rl): optional grpc imports in learner; quote grpc ServicerContext types
---------
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
* update uv.lock
* chore(doc): update doc
---------
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
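A sketch of the base-class pattern this introduces; concrete fields and methods beyond `predict_reward()` are assumptions:

```python
import abc
from dataclasses import dataclass

from torch import Tensor, nn

@dataclass
class RewardModelConfig:  # sketch; the real fields live in the lerobot source
    device: str = "cpu"

class PreTrainedRewardModel(nn.Module, abc.ABC):
    """Shared base: concrete reward models (classifier, SARM) subclass this
    and implement predict_reward()."""

    def __init__(self, config: RewardModelConfig):
        super().__init__()
        self.config = config

    @abc.abstractmethod
    def predict_reward(self, observation: dict[str, Tensor]) -> Tensor:
        ...
```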
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove RABC-specific configuration and replace it with a generic sampler-weight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
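Roughly the interface the two commits above describe; `SamplerWeighter` and `make_sampler` are hypothetical names, and only the idea of a pluggable per-frame weight source feeding a standard sampler is taken from the commits:

```python
import abc

import torch
from torch.utils.data import WeightedRandomSampler

class SamplerWeighter(abc.ABC):  # hypothetical name for the generic class
    """A pluggable object that assigns a sampling weight per frame, so
    lerobot_train no longer hard-codes RABC-specific logic."""

    @abc.abstractmethod
    def compute_weights(self, dataset) -> torch.Tensor:
        ...

def make_sampler(weighter: SamplerWeighter, dataset) -> WeightedRandomSampler:
    weights = weighter.compute_weights(dataset)
    return WeightedRandomSampler(weights.double(), num_samples=len(weights), replacement=True)
```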
* linter + missing files
* add testing for the sampler weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* feat(async_inference): server always sends CPU tensors, client handles device conversion
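A sketch of the contract (helper names are illustrative): the server strips device placement before serializing, and the client alone decides where tensors land.

```python
import torch

# Server side: detach and move to CPU before serializing, so the wire format
# never depends on the server's accelerator.
def prepare_for_send(action: torch.Tensor) -> torch.Tensor:
    return action.detach().cpu()

# Client side: move received tensors to the client's own device.
def on_receive(action: torch.Tensor, device: torch.device) -> torch.Tensor:
    return action.to(device)
```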
* fix: correct the type annotation of RawObservation in src/lerobot/async_inference/helpers.py
* update the import of robot_client
---------
Co-authored-by: Sato shinji <wwwsatoshinji@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: KB <kevin-brian.n-diaye@epita.fr>
* (unscrewing things up) (#2288)
* fix: expose a function explicitly building a frame for inference
* fix: first make dataset frame, then make ready for inference
* fix: reduce reliance on lerobot record for the policy's outputs too
* fix: encapsulate squeezing and device handling out of predict action
* fix: remove duplicated call to build_inference_frame and add a function that only performs data type handling (the whole conversion is key matching + data type conversion)
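Roughly the pipeline the fixes above describe; `build_inference_frame` is named in the commits, while the surrounding helper names and call order are assumptions:

```python
# Hypothetical shape of the conversion: first build the dataset frame, then
# make it inference-ready (key matching + data type conversion), then keep
# squeezing and device handling in one place around the policy call.
frame = build_dataset_frame(observation)   # raw robot observation -> dataset keys
batch = build_inference_frame(frame)       # key matching + data type conversion
batch = {k: v.to(device).unsqueeze(0) for k, v in batch.items()}
action = policy.select_action(batch)
action = action.squeeze(0).cpu()           # squeezing + device handling, encapsulated
```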
* refactor(envs): add custom-observation-size (#2167)
* fix: add MockMotorBus to MockRobot
* rl: first drafts
* add: all components of HIL SERL
* fix: actor block works
* fix: less friction, less friction
* add: hil-serl complete example
* fix: dataset names
* fix: restructure example folder
* fix: ACT example works, but found a bug in how ACT works
* fix: same path for both pre and postprocessors
* fix: paths
* add: example usage for act
* add: using ACT example
* fix: training examples
* fix: using examples
* fix: camera index
* fix: rename workflows into tutorial so that the path of the files is lerobot/examples/tutorial/...
* fix: upload everything in one repo
* fix: model name
* fix: simplify model path
* add: VLAs example
---------
Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
* fix: minor fix using named attributes
* fix: change model to act
* fix: named attributes for inference frame building
* fix: minor fixes to smolvla
* fix: small changes to pi0
* remove: old file that should have never been committed (oops, sorry)
---------
Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>