* fix(deps): better versioning control for torchcodec
* refactor(video_utils): replace torchvision with pyav
* adding Torchcodec version to lerobot-info
* chore(benchmarks): delete video benchmark
---------
Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co>
* refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
* chore: clarify torch.compile disabled note in SACAlgorithm
* fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
* fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
* fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample()
* refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference
* perf: remove redundant CPU→GPU→CPU transition move in learner
* Fix: add kwargs in reward classifier __init__()
* fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer
* fix: add try/finally to control_loop to ensure image writer cleanup on exit
* fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error
* fix: skip tests that require grpc if not available
* fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests
* fix(tests): skip tests that require grpc if not available
* refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages
* fix(config): update vision encoder model name to lerobot/resnet10
* fix(sac): clarify torch.compile status
* refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity
* refactor(sac): simplify optimizer return structure
* perf(rl): use async iterators in OnlineOfflineMixer.get_iterator
* refactor(sac): decouple algorithm hyperparameters from policy config
* update losses names in tests
* fix docstring
* remove unused type alias
* fix test for flat dict structure
* refactor(policies): rename policies/sac → policies/gaussian_actor
* refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
* perf(observation_processor): add CUDA support for image processing
* fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
* fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
* fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
* fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
* fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
* fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
* fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
* fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
* fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
* Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
* refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
* refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation
* refactor(rl): add type property to RLAlgorithmConfig for better clarity
* refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility
* refactor(tests): remove grpc import checks from test files for cleaner code
* fix(tests): gate RL tests on the `datasets` extra
* refactor: simplify docstrings for clarity and conciseness across multiple files
* fix(rl): update gripper position key and handle action absence during reset
* fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset
* refactor: clean up import statements
* chore: address reviewer comments
* chore: improve visual stats reshaping logic and update docstring for clarity
* refactor: enforce mandatory config_class and name attributes in RLAlgorithm
* refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer
* refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
* refactor: add require_package calls for grpcio and gym-hil in relevant modules
* refactor(rl): move grpcio guards to runtime entry points
* feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
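A minimal sketch of the JSON counter file described above, assuming it sits directly in the checkpoint directory and holds only the two counters named in this message. The helper names `write_training_step` and `read_training_step` are hypothetical and are not taken from the codebase.

```python
import json
import tempfile
from pathlib import Path


def write_training_step(checkpoint_dir, step, interaction_step):
    """Persist both counters as plain JSON instead of a pickled state file."""
    # Assumption: the file lives at the top of the checkpoint directory.
    path = Path(checkpoint_dir) / "training_step.json"
    payload = {"step": step, "interaction_step": interaction_step}
    path.write_text(json.dumps(payload))
    return path


def read_training_step(checkpoint_dir):
    """Read the counters back when resuming."""
    path = Path(checkpoint_dir) / "training_step.json"
    data = json.loads(path.read_text())
    return int(data["step"]), int(data["interaction_step"])


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    write_training_step(tmp, step=1000, interaction_step=25000)
    restored = read_training_step(tmp)
```

A JSON file can be read without unpickling arbitrary objects, which is one plausible reason to prefer it over a `.pt` file for plain counters.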
* refactor(rl): move actor weight-sync wire format from policy to algorithm
* refactor(rl): update type hints for learner and actor functions
* refactor(rl): hoist grpcio guard to module top in actor/learner
* chore(rl): manage import pattern in actor (#3564)
* chore(rl): manage import pattern in actor
* chore(rl): optional grpc imports in learner; quote grpc ServicerContext types
---------
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
* update uv.lock
* chore(doc): update doc
---------
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* chore(deps): ceiling + cuda
* ci: bump cuda version docker image
* ci: add cpu wheel to release workflow
* chore(deps): update uv.lock
* docs: update installation with cuda note
* docs(omx): adding some examples and scripts
* cleaning up and reviewing the cli args
* adding __init__.py to example folder, adjusting the examples
* adding reference to pretrained act policy
* moving `.send_action` before `dataset.add_frame` for consistency
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* adjusting docstring
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* addressing hardcoded dataset fps
* removed init as it worked without
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
If VideoDecoder() raises during initialization, the fsspec file handle
was leaked since it was opened via __enter__() but never closed on the
exception path. Now explicitly closes the handle before re-raising.
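The close-before-re-raise pattern can be sketched as follows. `FakeOpenFile`, `open_decoder`, and `failing_decoder` are hypothetical stand-ins written for illustration; they are not the fsspec or torchcodec APIs.

```python
import io


class FakeOpenFile:
    """Stand-in for an object whose __enter__() returns a file-like handle."""

    def __init__(self, payload):
        self._payload = payload
        self.handle = None

    def __enter__(self):
        self.handle = io.BytesIO(self._payload)
        return self.handle

    def __exit__(self, *exc_info):
        self.handle.close()
        return False


def open_decoder(open_file, decoder_factory):
    """Build a decoder on top of a handle, closing the handle if construction fails."""
    handle = open_file.__enter__()
    try:
        return decoder_factory(handle)
    except Exception:
        # Without this close, the handle would leak on the exception path.
        handle.close()
        raise


def failing_decoder(handle):
    """Hypothetical decoder constructor that always fails."""
    raise ValueError("cannot decode")


source = FakeOpenFile(b"not a video")
try:
    open_decoder(source, failing_decoder)
    raised = False
except ValueError:
    raised = True
```

On success the handle stays open, because the returned decoder still needs to read from it.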
* chore(deps): allow torch 2.11/2.12 and fix autocast deprecation
- Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26),
and torchcodec to <0.13 (was <0.11) to allow installs against the latest
stable torch 2.11 and the upcoming 2.12 line.
- Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda")
in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+).
- Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130,
torchcodec 0.11.1, full CUDA 13 stack).
Verified locally with `uv sync --locked` from a clean .venv and the lerobot
test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is
identical to the pre-bump baseline: 18 pre-existing failures
(test_sac_policy*, test_pi0_rtc*, test_pi05_rtc*, test_replay_buffer*),
0 new, 0 fixed.
AI assistance: this change was authored with Claude Code per AI_POLICY.md.
* fix(policies): use device-agnostic autocast dtype lookup
Pass query_states.device.type to torch.get_autocast_dtype() instead of
hardcoding 'cuda', so the cast matches the active autocast context when
running under CPU/MPS/XPU autocast.
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* fix(train): restrict legacy RA-BC migration to JSON checkpoints only
_migrate_legacy_rabc_fields was called for all config files, causing
json.load to raise DecodeError when a YAML/TOML config was passed to
lerobot-train for a new training run. Guard the block with an
.endswith(".json") check so migration only runs when resuming from
a JSON checkpoint.
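A sketch of the guard, assuming the migration step receives the parsed JSON as a dict. `maybe_migrate` and its `migrate` callback are hypothetical; the real `_migrate_legacy_rabc_fields` signature is not shown in this message.

```python
import json
import os
import tempfile


def maybe_migrate(config_path, migrate):
    """Run the legacy migration only for JSON files; skip YAML/TOML untouched."""
    if not str(config_path).endswith(".json"):
        return False
    with open(config_path) as f:
        data = json.load(f)
    migrate(data)
    return True


seen = []
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "train_config.json")
    with open(json_path, "w") as f:
        json.dump({"legacy_field": 1}, f)
    yaml_path = os.path.join(tmp, "train_config.yaml")
    with open(yaml_path, "w") as f:
        f.write("legacy_field: 1\n")

    migrated_json = maybe_migrate(json_path, seen.append)
    # The YAML file is never passed to json.load, so no decode error is raised.
    migrated_yaml = maybe_migrate(yaml_path, seen.append)
```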
* fix(ci): run multi-task benchmark evals 5-at-a-time in parallel
The eval script supports running tasks concurrently via a
ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four
multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus
— 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of
running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra)
are unchanged.
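A rough illustration of the wave behaviour, assuming each task is an independent callable. `run_tasks` and `eval_task` are hypothetical names; only `ThreadPoolExecutor` and the `max_parallel_tasks` setting come from the message above.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_tasks(task_ids, eval_task, max_parallel_tasks=5):
    """Evaluate tasks with at most `max_parallel_tasks` running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel_tasks) as pool:
        # map() returns results in input order, whatever the completion order.
        return list(pool.map(eval_task, task_ids))


def num_waves(num_tasks, max_parallel_tasks):
    """Approximate number of waves if all tasks take similar time."""
    return math.ceil(num_tasks / max_parallel_tasks)


results = run_tasks(range(10), lambda task_id: task_id * 2)
waves_for_10 = num_waves(10, 5)
waves_for_8 = num_waves(8, 5)
```

With 8 to 10 tasks and a limit of 5, both ends of the range need two waves, which matches the "~2 waves" estimate.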
* fix(ci): cap VLABench smoke eval at 50 steps per task
VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s
the smoke eval took ~80 minutes of rollouts on top of the image build.
The eval is a pipeline smoke test (running_success_rate stays at 0% on
this short rollout anyway), so we don't need full episodes — cap each
task at 50 steps to bring total rollout time down ~10x.
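The estimate can be checked with a quick calculation, assuming tasks run one after another at exactly 1 step per second. `rollout_minutes` is a hypothetical helper.

```python
def rollout_minutes(num_tasks, steps_per_task, steps_per_second=1.0):
    """Total sequential rollout time in minutes."""
    return num_tasks * steps_per_task / steps_per_second / 60.0


full = rollout_minutes(10, 500)   # default episode_length
capped = rollout_minutes(10, 50)  # capped smoke eval
speedup = full / capped
```

Here `full` comes out at roughly 83 minutes and `capped` at roughly 8 minutes, consistent with the ~80 minute figure and the ~10x reduction quoted above.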
* fix(ci): run VLABench tasks 5-at-a-time in parallel
The eval script already supports running multiple tasks concurrently via
a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10
VLABench tasks finish in ~2 waves instead of running sequentially.
* feat: add pretrained vision encoder weights for diffusion and vqbet
* fix test by re-generating artifacts
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and
cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3
on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA
repo, so the build failed at apt-get install. Bumping both to -12-6 to
match the base image.
* feat(policies): add EO-1 model
* chore(eo1): adjust policy_eo1_README.md to avoid duplication with eo1.mdx
* chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs.
This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path.
Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in
* chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass
Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json.
Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo
Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
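A minimal sketch of the pattern listed above, not the actual class. `DatasetInfoSketch` and its two fields (`fps`, `features`) are illustrative assumptions; the real `DatasetInfo` schema is not reproduced here.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class DatasetInfoSketch:
    fps: int
    features: dict[str, dict[str, Any]] = field(default_factory=dict)

    def __post_init__(self):
        # Validation at construction time.
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")
        # JSON stores shapes as lists; keep them as tuples in memory.
        self.features = {
            name: {**spec, "shape": tuple(spec["shape"])} if "shape" in spec else dict(spec)
            for name, spec in self.features.items()
        }

    @classmethod
    def _names(cls):
        return {f.name for f in fields(cls)}

    # Dict-style compatibility layer for callers that still index by key.
    def __getitem__(self, key):
        if key not in self._names():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        if key not in self._names():
            raise KeyError(key)
        setattr(self, key, value)

    def get(self, key, default=None):
        return getattr(self, key) if key in self._names() else default

    @classmethod
    def from_dict(cls, data):
        # This sketch simply drops keys it does not know about.
        return cls(**{k: v for k, v in data.items() if k in cls._names()})

    def to_dict(self):
        out = asdict(self)
        for spec in out["features"].values():
            if "shape" in spec:
                spec["shape"] = list(spec["shape"])  # back to a JSON-friendly list
        return out


info = DatasetInfoSketch.from_dict(
    {"fps": 30, "features": {"action": {"shape": [6]}}, "unknown_key": 1}
)
```

Both `info.fps` and `info["fps"]` work, so existing dict-style call sites can move to attribute access gradually.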
* fix pre-commit
* update docstring inside DatasetInfo.from_dict()
* sorts the unknown to have deterministic output
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* refactoring the last few old fields
* fix crop dataset roi type mismatch
* use int consistently for data and video_files_size_in_mb
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
* linter + missing files
* add testing for sample weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding
In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return BaseModelOutputWithPooling instead of a plain torch.FloatTensor.
Added isinstance check to extract pooler_output when the return value is not
a tensor, maintaining backward compatibility with transformers 4.x.
Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'
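The compatibility check can be sketched without importing torch or transformers. `extract_features`, `FakeTensor`, and `FakePooledOutput` are hypothetical stand-ins; in the real code the tensor type would be `torch.Tensor` and the wrapper would be `BaseModelOutputWithPooling`.

```python
class FakeTensor(list):
    """Stand-in for a tensor type in this sketch."""


class FakePooledOutput:
    """Stand-in for a model output object exposing `pooler_output`."""

    def __init__(self, pooler_output):
        self.pooler_output = pooler_output


def extract_features(output, tensor_type):
    """Return the tensor itself, or its `pooler_output` when wrapped."""
    if isinstance(output, tensor_type):
        return output  # older behaviour: a plain tensor is returned
    pooled = getattr(output, "pooler_output", None)
    if pooled is None:
        raise ValueError("model output has no pooler_output to extract")
    return pooled


plain = extract_features(FakeTensor([1.0, 2.0]), FakeTensor)
wrapped = extract_features(FakePooledOutput(FakeTensor([3.0])), FakeTensor)
```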
* Adding assertion check for pooler_output of CLIP. This change is in response to the comment below.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387
* Adding assertion check for pooler_output of CLIP. This change is in response to the comment below. Changed to a simple check and raise.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>