lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-11 14:49:43 +00:00

Author	SHA1	Message	Date
Khalil Meftah	0944b84279	feat(rl): consolidate HIL-SERL checkpoint into HF-style components Make and s, add abstract / for algorithm-owned tensors (critics, target nets, ), and persist them as a sibling component next to . Replace the pickled side-file with an enriched carrying both and , so resume restores actor + critics + target nets + temperature + optimizers + RNG + counters from plain HF-standard files.	2026-05-08 21:24:23 +02:00
Khalil Meftah	b1b2708e2f	refactor(rl): move grpcio guards to runtime entry points	2026-05-08 11:03:00 +02:00
Khalil Meftah	f5a5ca04e2	Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor # Conflicts: # src/lerobot/policies/factory.py	2026-05-07 13:52:55 +02:00
Khalil Meftah	0a470b0701	refactor: add require_package calls for grpcio and gym-hil in relevant modules	2026-05-07 13:44:54 +02:00
Pepijn	82dffde7fa	fix(ci): speed up multi-task benchmark evals (parallelize + cap VLABench steps) (#3529 ) * fix(ci): run multi-task benchmark evals 5-at-a-time in parallel The eval script supports running tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus — 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra) are unchanged. * fix(ci): cap VLABench smoke eval at 50 steps per task VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s the smoke eval took ~80 minutes of rollouts on top of the image build. The eval is a pipeline smoke test (running_success_rate stays at 0% on this short rollout anyway), so we don't need full episodes — cap each task at 50 steps to bring total rollout time down ~10x. * fix(ci): run VLABench tasks 5-at-a-time in parallel The eval script already supports running multiple tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10 VLABench tasks finish in ~2 waves instead of running sequentially.	2026-05-07 13:37:16 +02:00
Ville Kuosmanen	eaf0218bc8	feat(policy): use pretrained vision encoder weights by default for diffusion and vqbet (#3202 ) * feat: add pretrained vision encoder weights for diffusion and vqbet * fix test by re-generating artifacts --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-07 12:10:38 +02:00
Khalil Meftah	29fc0c6d28	refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests	2026-05-07 12:09:23 +02:00
Khalil Meftah	f1bdd6744f	refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer	2026-05-07 11:48:41 +02:00
Khalil Meftah	758964984c	refactor: enforce mandatory config_class and name attributes in RLAlgorithm	2026-05-07 11:37:02 +02:00
Khalil Meftah	84f74cf0bf	chore: improve visual stats reshaping logic and update docstring for clarity	2026-05-07 11:14:57 +02:00
Pepijn	a0e52d52fe	fix(ci): bump robotwin benchmark image to CUDA 12.6 (#3525 ) The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3 on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA repo, so the build failed at apt-get install. Bumping both to -12-6 to match the base image.	2026-05-07 11:11:12 +02:00
Khalil Meftah	ac83f4797c	chore: address reviewer comments	2026-05-07 10:43:59 +02:00
Haoming Song	e99c55af4b	feat(policies): add EO-1 model (#3403 ) * feat(policies): add EO-1 model * chore(eo1): adjust policy_eo1_README.md to avoid duplication with eo1.mdx * chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-06 18:01:16 +02:00
Steven Palma	408e0ca763	fix(robots): openarm features with openarmmini (#3524 )	2026-05-06 17:03:09 +02:00
Maxime Ellerbach	ce24063efd	feat(dagger): adding smooth handover (#3506 ) * feat(dagger): adding smooth handover * update docstring * small phase fix and documenting potential issues * cleaning up	2026-05-05 14:44:32 +02:00
Steven Palma	82934719db	chore(dep): bump transformers to 5.4.0 (#3374 ) * fix(deps): breaking change from transformers 5.4.0 * Update src/lerobot/policies/xvla/modeling_florence2.py Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * Update src/lerobot/policies/wall_x/qwen_model/qwen2_5_vl_moe.py Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * removing dataclass * bumping transformers 5.4.0 * weird, I can't even pass the test on main * oops, typo * chore(style): fix pre-commit run * chore: update uv.lock * seems like a weird numerical precision issue, let's check in runners * chore: update uv.lock * chore(dependencies): adjust transformers version * chore: update uv.lock --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co> Co-authored-by: raushan <raushan@huggingface.co>	2026-05-05 14:19:09 +02:00
Steven Palma	401a217597	chore(ci): increase time stale (#3507 )	2026-05-04 22:35:16 +02:00
Steven Palma	40094b0464	chore(ci): upgrade docker internal (#3505 )	2026-05-04 21:28:52 +02:00
Khalil Meftah	ebe6ea34df	refactor: clean up import statements	2026-05-04 20:20:05 +02:00
Khalil Meftah	2bc273c53b	Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor	2026-05-04 19:58:03 +02:00
Khalil Meftah	0d60a855be	fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset	2026-05-04 19:33:45 +02:00
Jash Shah	fdbfc015a2	fix(peft): fix LoRA resume from Hub (PosixPath + double wrap) (#3485 )	2026-05-04 10:52:37 +02:00
Khalil Meftah	d4a568ee6c	fix(rl): update gripper position key and handle action absence during reset	2026-04-30 14:56:40 +02:00
Haoming Song	d656da8ccc	fix(pi): keep training sampling outside compiled forwards (#3487 ) Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs. This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path. Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-30 13:21:17 +02:00
Khalil Meftah	b5f65e5332	Expose sarm package API and ship reward model card template (#3477 ) * chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in * chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.	2026-04-29 16:17:16 +02:00
Khalil Meftah	cd6b43ea7a	fix(train): migrate legacy RA-BC fields in train config loading (#3480 )	2026-04-29 16:17:00 +02:00
Steven Palma	2236bbe7a3	fix(rollout): propagate policy-specific CLI config parameters (#3483 ) Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-29 16:13:10 +02:00
Maxime Ellerbach	cb0a944941	refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472 ) * refactor(datasets): replace untyped dict with typed DatasetInfo dataclass Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json. Changes: - Add DatasetInfo dataclass with explicit fields and validation - Implement __post_init__ for shape conversion (list ↔ tuple) - Add dict-style compatibility layer (__getitem__, __setitem__, .get()) - Add from_dict() and to_dict() for JSON serialization - Update io_utils to use load_info/write_info with DatasetInfo - Update dataset utilities and metadata to use attribute access - Remove aggregate.py dict-style field access - Add tests fixture support for DatasetInfo Benefits: - Type safety with IDE auto-completion - Validation at construction time - Explicit schema documentation * fix pre-commit * update docstring inside DatasetInfo.from_dict() * sorts the unknown to have deterministic output Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * refactoring the last few old fields * fix crop dataset roi type mismatch * consistently use int for data and video_files_size_in_mb --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: jjolla93 <jjolla93@gmail.com>	2026-04-28 18:40:30 +02:00
Khalil Meftah	8a3d64033f	Reward models refactor (#3142 ) * feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes * refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/ * refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/ * refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py * refactor(rewards): update imports and delete old reward model locations * test(rewards): add reward model tests and update existing test imports * fix(rewards): restore full Classifier and SARM implementations * test(rewards): restore missing CUDA and mixed precision classifier processor tests * refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train * refactor(lerobot_train.py): add missing sampling weight script * linter + missing files * add testing for sample weighter * revert some useless changes, improve typing * update docs * add automatic detection of the progress path * remove type exp * improve comment * fix: move rabc.py to rewards/sarm/ and update import paths * refactor(imports): update reward model imports to new module structure * refactor(imports): update reward model imports to reflect new module structure * refactor(imports): conditionally import pandas based on availability * feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig * refactor(policies): remove reward model branches from policy factory and __init__ * refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash * feat(train): route reward model training through rewards/factory instead of policies/factory * refactor(train): streamline reward model training logic * fix(rewards): ensure FileNotFoundError is raised for missing config_file * refactor(train): update __get_path_fields__ to include reward_model for config loading * refactor(classifier): remove redundant input normalization in predict_reward method * fix(train): raise ValueError for non-trainable reward models in train function * refactor(pretrained_rm): add model card template * refactor(tests): reward models * refactor(sarm): update reset method and remove unused action prediction methods * refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function * fix(train): raise ValueError for PEFT usage in reward model training * refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties --------- Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2026-04-28 17:56:24 +02:00
Steven Palma	03ee50e08f	chore(ci): bump docs workflows (#3476 )	2026-04-28 15:06:44 +02:00
Khalil Meftah	6ed80f5a59	Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor # Conflicts: # src/lerobot/policies/__init__.py # src/lerobot/rl/actor.py	2026-04-28 12:04:13 +02:00
Khalil Meftah	ef6b3b5b0f	refactor: simplify docstrings for clarity and conciseness across multiple files	2026-04-28 11:11:02 +02:00
Steven Palma	ca87ccd941	feat(rollout): decouple policy deployment from data recording with new `lerobot-rollout` CLI (#3413 ) * feat(scripts): lerobot-rollout * fix(rollout): require dataset in dagger + use duration too * fix(docs): dagger num_episodes * test(rollout): fix expectations * fix(rollout): features check * fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config * docs(rollout): edit rename_map instructions * chore(rollout): multiple minor improvements * chore(rollout): address comments + minor improvements * fix(rollout): enable default * fix(tests): default value RTCConfig * fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): prevent relative actions with sync inference engine Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): rtc reanchor to non-normalized state Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): fixing the episode length to use hwc (#3469) also reducing default length to 5 minutes * feat(rollout): go back to initial position is now a config * fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470) * chore(rollout): note about dagger correction stage * chore(docs): update comments and docstring * fix(test): move rtc relative out of rollout module * fix(rollout): address the review comments --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-28 00:57:35 +02:00
Steven Palma	77352c495c	chore(dependencies): update uv.lock (#3437 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-04-27 23:15:46 +02:00
Khalil Meftah	e298474bf3	fix(tests): gate RL tests on the `datasets` extra	2026-04-27 16:53:34 +02:00
Khalil Meftah	577f14337a	refactor(tests): remove grpc import checks from test files for cleaner code	2026-04-27 16:20:13 +02:00
Khalil Meftah	47be90f040	refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility	2026-04-27 15:59:59 +02:00
Khalil Meftah	47dd65347e	refactor(rl): add type property to RLAlgorithmConfig for better clarity	2026-04-27 15:57:24 +02:00
Khalil Meftah	fd5a788120	refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation	2026-04-27 15:55:16 +02:00
Khalil Meftah	9ce9e01469	refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable	2026-04-27 13:39:03 +02:00
Khalil Meftah	21c16a27f0	Revert "perf(observation_processor): add CUDA support for image processing" This reverts commit `38b88c414c`.	2026-04-27 11:52:19 +02:00
Khalil Meftah	b3164543f4	fix(rl): enhance intervention handling in actor and learner (cherry picked from commit `ef8bfffbd7`)	2026-04-27 11:35:21 +02:00
Khalil Meftah	f3993cbbb1	fix(rl): improve action processing for discrete and continuous actions (cherry picked from commit `f887ab3f6a`)	2026-04-27 11:35:20 +02:00
Khalil Meftah	c278cfa026	fix(rl): postprocess action in actor (cherry picked from commit `c2556439e5`)	2026-04-27 11:35:20 +02:00
Khalil Meftah	77d18659b1	fix(rl): mirror gym_manipulator in actor (cherry picked from commit `d2a046dfc5`)	2026-04-27 11:35:19 +02:00
Khalil Meftah	6347edefb1	fix(rl): merge environment and action-processor info in transition processing (cherry picked from commit `30e1886b64`)	2026-04-27 11:35:18 +02:00
Khalil Meftah	eda47eca18	fix(rl): update neutral gripper action (cherry picked from commit `9c9064e5be`)	2026-04-27 11:35:18 +02:00
Khalil Meftah	a64e6f5070	fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100 (cherry picked from commit `494f469a2b`)	2026-04-27 11:35:17 +02:00
Khalil Meftah	3def86c2c3	fix(rl): add time limit processor to environment pipeline (cherry picked from commit `cd105f65cb`)	2026-04-27 11:35:17 +02:00
Khalil Meftah	356a64d8c4	fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline (cherry picked from commit `9c2af818ff`)	2026-04-27 11:35:16 +02:00
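Commit `82dffde7fa` describes two ways to speed up the benchmark evals. First, the eval script can run tasks concurrently through a `ThreadPoolExecutor` sized by `env.max_parallel_tasks`, so 10 tasks finish in about two waves of 5. Second, capping VLABench at 50 steps per task cuts rollout time by roughly 10x.

The sketch below illustrates both points under stated assumptions. `run_task`, `eval_tasks`, `num_waves`, and `rollout_seconds` are hypothetical helpers, not lerobot's eval API. The timing helper ignores image builds and environment setup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_task(task_name: str, max_steps: int) -> dict:
    """Hypothetical stand-in for one benchmark rollout (not lerobot's eval code)."""
    steps_taken = 0
    for _ in range(max_steps):
        # A real rollout would query the policy and step the simulator here.
        steps_taken += 1
    return {"task": task_name, "steps": steps_taken}


def eval_tasks(tasks: list[str], max_parallel_tasks: int, max_steps: int) -> list[dict]:
    """Run every task, with at most `max_parallel_tasks` rollouts in flight at once."""
    with ThreadPoolExecutor(max_workers=max_parallel_tasks) as pool:
        # pool.map yields results in input order, regardless of finish order.
        return list(pool.map(lambda name: run_task(name, max_steps), tasks))


def num_waves(num_tasks: int, max_parallel_tasks: int) -> int:
    """Rounds needed when all tasks take roughly the same time."""
    return math.ceil(num_tasks / max_parallel_tasks)


def rollout_seconds(num_tasks: int, steps_per_task: int, steps_per_second: float) -> float:
    """Sequential rollout time, ignoring setup and image builds."""
    return num_tasks * steps_per_task / steps_per_second


if __name__ == "__main__":
    tasks = [f"task_{i}" for i in range(10)]
    results = eval_tasks(tasks, max_parallel_tasks=5, max_steps=50)
    print(num_waves(len(tasks), 5))       # 2
    print(rollout_seconds(10, 500, 1.0))  # 5000.0
    print(rollout_seconds(10, 50, 1.0))   # 500.0
```

At the commit's quoted rate of ~1 it/s, 10 tasks of 500 steps run back to back take about 5000 s (roughly 83 minutes). That is consistent with the ~80 minutes reported. A 50-step cap brings the total to 500 s before any parallelism. Whether threads actually overlap work depends on how much of each rollout runs outside the Python GIL (for example in simulator or GPU calls), so the two-wave figure is a best case.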


1496 Commits
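Commit `0d60a855be` records the pre-step observation so that `(obs, action, next.reward)` line up in the `gym_manipulator` dataset. The toy loop below sketches that alignment. `CounterEnv`, `Frame`, and `record_episode` are hypothetical names, not lerobot code. The real recorder's storage format and step signature are not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    observation: int   # o_t: what the policy saw when choosing the action
    action: int        # a_t
    next_reward: float # r_{t+1}: reward returned by stepping with a_t


class CounterEnv:
    """Hypothetical toy env: the state is a counter, reward is 1.0 when the action is +1."""

    def __init__(self) -> None:
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float]:
        self.state += action
        reward = 1.0 if action == 1 else 0.0
        return self.state, reward


def record_episode(env: CounterEnv, actions: list[int]) -> list[Frame]:
    frames = []
    obs = env.reset()
    for action in actions:
        pre_step_obs = obs  # keep the observation the action was chosen from
        obs, reward = env.step(action)
        # Store the pre-step observation, not the post-step one,
        # so each row reads (o_t, a_t, r_{t+1}).
        frames.append(Frame(pre_step_obs, action, reward))
    return frames
```

Storing the post-step observation instead would pair each action with the state it produced rather than the state it was chosen from. That shifts every row by one step.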