* chore(video backend): renaming codec to video_backend in get_safe_default_video_backend()
* feat(pyav utils): adding support for PyAV encoding parameters validation
* feat(VideoEncoderConfig): creating a VideoEncoderConfig to encapsulate encoding parameters
* feat(VideoEncoderConfig): propagating the VideoEncoderConfig in the codebase
* chore(docs): updating the docs
* feat(metadata): adding encoding parameters in dataset metadata
* fix(concatenation compatibility): adding compatibility check when concatenating video files
* feat(VideoEncoderConfig init): making VideoEncoderConfig more robust and adaptable to multiple backends
* feat(pyav checks): making pyav parameters checks more robust
* chore(duplicate): removing duplicate get_codec_options definition
* test(existing): adapting existing tests
* test(new): adding new tests for encoding related features
* chore(format): fixing formatting issues
* chore(PyAV): cleaning up PyAV utils and encoding parameters checks to stick to the minimum required tooling.
* chore(format): formatting code
* chore(docstrings): updating docstrings
* fix(camera_encoder_config): Removing camera_encoder_config from LeRobotDataset, as it's only required in LeRobotDatasetWriter.
* feat(default values): applying a consistent naming convention for default RGB cameras video encoder parameters
* fix(rollout): propagating VideoEncoderConfig to the latest recording modes
* chore(format): formatting code, fixing error messages and variable names
* fix(arguments order): reverting changes in arguments order in StreamingVideoEncoder
* chore(relative imports): switching to relative local imports within lerobot.datasets
* test(artifacts): cleaning up artifacts for the video encoding tests
* chore(docs): updating docs
* chore(format): formatting code
* fix(imports): refactoring the file architecture to avoid circular imports. VideoEncoderConfig is now defined in lerobot.configs and lazily imports av at runtime.
* fix(typos): fixing typos and small mistakes
* test(factories): updating factories
* feat(aggregate): updating dataset aggregation procedure. Encoding tuning parameters (crf, g, ...) are ignored for validation and changed to None in the aggregated dataset if incompatible.
* docs(typos): fixing typos
* fix(deletion): reverting unwanted deletion
* fix(typos): fixing multiple typos
* feat(codec options): passing codec options to lerobot_edit_dataset episode deletion tool
* typo(typo): typo
* fix(typos): fixing remaining typos
* chore(rename): renaming camera_encoder_config to camera_encoder
* docs(clean): cleaning and formatting docs
* docs(dataset): adding details about datasets
* chore(format): formatting code
* docs(warning): adding warning regarding encoding parameters modification
* fix(re-encoding): removing inconsistent re-encoding option in lerobot_edit_dataset
* typos(typos): typos
* chore(format): resolving prettier issues
* fix(h264_nvenc): fixing crf handling for h264_nvenc
* docs(clean): removing too technical parts of the docs
* fix(imports): fixing imports at the __init__ level
* fix(imports): fixing not very pretty imports in video config file
* fix(config): add lora_alpha to PeftConfig
PeftConfig was missing the lora_alpha field, causing the PEFT library
to default to alpha=8 regardless of the LoRA rank, which dampens the
adaptation signal for high-rank adapters (e.g., r=128).
This adds lora_alpha: int | None = None to PeftConfig, allowing users
to specify --peft.lora_alpha <value> on the CLI.
Closes #3551
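A minimal sketch of the shape of this change, assuming a dataclass-based PeftConfig; only lora_alpha comes from this commit, the surrounding field and defaults are illustrative:

```python
from dataclasses import dataclass

@dataclass
class PeftConfig:
    r: int = 128                   # LoRA rank (illustrative default)
    lora_alpha: int | None = None  # None falls back to PEFT's alpha=8; set it
                                   # explicitly (e.g. --peft.lora_alpha=64) so
                                   # scaling = lora_alpha / r matches the rank
```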
* fix(docs): add lora_alpha to peft training example + clarify scaling formula
- Add --peft.lora_alpha=64 to docs/source/peft_training.mdx example to
prevent new users from hitting the alpha=8 default dampening bug
- Clarify lora_alpha comment in default.py with scaling = lora_alpha / r
* docs: mention both --peft.r and --peft.lora_alpha in LoRA description
---------
Co-authored-by: Cheng Yin <yin@users.noreply.github.com>
* fix(config): support policy.path in YAML config files
policy.path was only handled via CLI args (filtered from sys.argv before
draccus, then retrieved in validate()). When specified in YAML, draccus
would crash because 'path' is not a valid field on PreTrainedConfig.
Extract path fields from the YAML/JSON config before draccus processes
it, store them in a module-level dict, and fall back to it in
get_path_arg() when the CLI doesn't have the path.
Fixes #2957
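A hedged sketch of the described flow; get_path_arg is named above, but the helper and exact signatures below are assumptions, not LeRobot's actual code:

```python
import yaml

_config_path_overrides: dict[str, str] = {}  # module-level store

def extract_path_fields(config_file: str) -> dict:
    """Strip `policy.path` from the config before draccus parses it."""
    with open(config_file) as f:
        cfg = yaml.safe_load(f)
    path = cfg.get("policy", {}).pop("path", None)  # draccus would reject it
    if path is not None:
        _config_path_overrides["policy.path"] = path
    return cfg  # path-free config, safe to hand to draccus

def get_path_arg(field: str, cli_value: str | None = None) -> str | None:
    # CLI still takes precedence; fall back to what the YAML carried.
    return cli_value or _config_path_overrides.get(f"{field}.path")
```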
* fix(parser): preserve YAML policy overrides when loading from pretrained
When policy.path is set in YAML, validate() was calling from_pretrained
with only CLI overrides, discarding any YAML policy fields (e.g. lr,
batch_size) that draccus had already parsed. Fix by capturing the
remaining YAML fields as CLI-style args in _config_yaml_overrides and
merging them into the overrides passed to from_pretrained in train.py,
eval.py, and lerobot_record.py (CLI args still take precedence).
Also fix the NamedTemporaryFile SIM115 ruff warning and add types-PyYAML
to the mypy pre-commit hook.
* fix(parser): serialize bool/None values correctly in YAML policy overrides
Bool values from YAML configs (e.g. push_to_hub: true) were passed as
Python "True"/"False" strings instead of lowercase "true"/"false" that
draccus expects. Also skip None values to avoid passing "None" strings.
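A sketch combining the serialization rules from the last two commits; the function name is illustrative:

```python
def yaml_fields_to_cli_args(prefix: str, fields: dict) -> list[str]:
    args = []
    for key, value in fields.items():
        if value is None:
            continue  # draccus would otherwise receive the string "None"
        if isinstance(value, bool):
            value = "true" if value else "false"  # not Python's "True"/"False"
        args.append(f"--{prefix}.{key}={value}")
    return args

# yaml_fields_to_cli_args("policy", {"push_to_hub": True, "lr": 3e-4, "device": None})
# -> ['--policy.push_to_hub=true', '--policy.lr=0.0003']
```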
* revert: remove types-PyYAML from .pre-commit-config.yaml
* chore: fix quality check caused by untyped YAML import
Co-authored-by: masato-ka <jp6uzv@gmail.com>
Signed-off-by: Khalil Meftah <khalil.meftah@huggingface.co>
---------
Signed-off-by: Khalil Meftah <khalil.meftah@huggingface.co>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Co-authored-by: masato-ka <jp6uzv@gmail.com>
* feat(episode filtering): adding support for episode filtering at initialization time in LeRobotDataset
* test(tests): adding tests
* chore(format): formatting code
* feat(performance): improving implementation for better performance on big datasets
* chore(warning): improving warnings and errors for episode filtering
* test(invalid key): adding test for invalid filtering key
* chore(format): formatting code
* fix(deps): better versioning control for torchcodec
* refactor(video_utils): replace torchvision with pyav
* adding Torchcodec version to lerobot-info
* chore(benchmarks): delete video benchmark
---------
Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co>
* refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
* chore: clarify torch.compile disabled note in SACAlgorithm
* fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
* fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
* fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample()
* refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference
* perf: remove redundant CPU→GPU→CPU transition move in learner
* Fix: add kwargs in reward classifier __init__()
* fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer
* fix: add try/finally to control_loop to ensure image writer cleanup on exit
* fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error
* fix: skip tests that require grpc if not available
* fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests
* fix(tests): skip tests that require grpc if not available
* refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages
* fix(config): update vision encoder model name to lerobot/resnet10
* fix(sac): clarify torch.compile status
* refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity
* refactor(sac): simplify optimizer return structure
* perf(rl): use async iterators in OnlineOfflineMixer.get_iterator
* refactor(sac): decouple algorithm hyperparameters from policy config
* update loss names in tests
* fix docstring
* remove unused type alias
* fix test for flat dict structure
* refactor(policies): rename policies/sac → policies/gaussian_actor
* refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
* perf(observation_processor): add CUDA support for image processing
* fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
* fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
* fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
* fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
* fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
* fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
* fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
* fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
* fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
* Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
* refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
* refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation
* refactor(rl): add type property to RLAlgorithmConfig for better clarity
* refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility
* refactor(tests): remove grpc import checks from test files for cleaner code
* fix(tests): gate RL tests on the `datasets` extra
* refactor: simplify docstrings for clarity and conciseness across multiple files
* fix(rl): update gripper position key and handle action absence during reset
* fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset
* refactor: clean up import statements
* chore: address reviewer comments
* chore: improve visual stats reshaping logic and update docstring for clarity
* refactor: enforce mandatory config_class and name attributes in RLAlgorithm
* refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer
* refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
* refactor: add require_package calls for grpcio and gym-hil in relevant modules
* refactor(rl): move grpcio guards to runtime entry points
* feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
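An illustrative shape for the enriched training_step.json; the exact field set in the repo may differ:

```python
import json

training_step = {
    "step": 120_000,              # learner optimization steps (illustrative values)
    "interaction_step": 480_000,  # environment interaction steps
}
with open("checkpoint/training_step.json", "w") as f:
    json.dump(training_step, f, indent=2)
```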
* refactor(rl): move actor weight-sync wire format from policy to algorithm
* refactor(rl): update type hints for learner and actor functions
* refactor(rl): hoist grpcio guard to module top in actor/learner
* chore(rl): manage import pattern in actor (#3564)
* chore(rl): manage import pattern in actor
* chore(rl): optional grpc imports in learner; quote grpc ServicerContext types
---------
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
* update uv.lock
* chore(doc): update doc
---------
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* chore(deps): ceiling + cuda
* ci: bump cuda version docker image
* ci: add cpu wheel to release workflow
* chore(deps): update uv.lock
* docs: update installation with cuda note
* docs(omx): adding some examples and scripts
* cleaning up and reviewing the cli args
* adding __init__.py to example folder, adjusting the examples
* adding reference to pretrained act policy
* moving `.send_action` before `dataset.add_frame` for consistency
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* adjusting docstring
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* addressing hardcoded dataset fps
* removed __init__.py as it worked without it
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
If VideoDecoder() raises during initialization, the fsspec file handle
was leaked since it was opened via __enter__() but never closed on the
exception path. Now explicitly closes the handle before re-raising.
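A minimal sketch of the fix, assuming fsspec wrapping torchcodec's VideoDecoder as described; the path and variable names are illustrative:

```python
import fsspec
from torchcodec.decoders import VideoDecoder

video_path = "videos/episode_000000.mp4"  # illustrative path

file_handle = fsspec.open(video_path, "rb").__enter__()
try:
    decoder = VideoDecoder(file_handle)
except Exception:
    file_handle.close()  # previously leaked when VideoDecoder() raised
    raise
```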
* chore(deps): allow torch 2.11/2.12 and fix autocast deprecation
- Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26),
and torchcodec to <0.13 (was <0.11) to allow installs against the latest
stable torch 2.11 and the upcoming 2.12 line.
- Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda")
in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+).
- Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130,
torchcodec 0.11.1, full CUDA 13 stack).
Verified locally with `uv sync --locked` from a clean .venv and the lerobot
test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is
identical to the pre-bump baseline: 18 pre-existing failures
(test_sac_policy*, test_pi0_rtc*, test_pi05_rtc*, test_replay_buffer*),
0 new, 0 fixed.
AI assistance: this change was authored with Claude Code per AI_POLICY.md.
* fix(policies): use device-agnostic autocast dtype lookup
Pass query_states.device.type to torch.get_autocast_dtype() instead of
hardcoding 'cuda', so the cast matches the active autocast context when
running under CPU/MPS/XPU autocast.
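A before/after sketch covering both autocast commits; query_states stands in for any tensor on the active device:

```python
import torch

query_states = torch.randn(1, 8, 64)  # placeholder attention input

# Before (API removed in torch 2.11+, and hardcoded to CUDA):
#   dtype = torch.get_autocast_gpu_dtype()
# After: device-agnostic, so CPU/MPS/XPU autocast contexts also match.
dtype = torch.get_autocast_dtype(query_states.device.type)
```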
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* fix(train): restrict legacy RA-BC migration to JSON checkpoints only
_migrate_legacy_rabc_fields was called for all config files, causing
json.load to raise DecodeError when a YAML/TOML config was passed to
lerobot-train for a new training run. Guard the block with an
.endswith(".json") check so migration only runs when resuming from
a JSON checkpoint.
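A sketch of the guard; _migrate_legacy_rabc_fields is named in the commit, the surrounding code is assumed:

```python
import json

def _migrate_legacy_rabc_fields(cfg: dict) -> dict:
    ...  # existing migration helper in lerobot-train (body elided)

def maybe_migrate_legacy_rabc(config_path: str) -> None:
    if not config_path.endswith(".json"):
        return  # YAML/TOML config for a fresh run: nothing to migrate
    with open(config_path) as f:
        cfg = json.load(f)  # only reached for JSON checkpoints now
    _migrate_legacy_rabc_fields(cfg)
```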
* fix(ci): run multi-task benchmark evals 5-at-a-time in parallel
The eval script supports running tasks concurrently via a
ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four
multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus
— 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of
running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra)
are unchanged.
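A hedged sketch of the mechanism env.max_parallel_tasks enables; eval_task and the task list are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def eval_task(task: str) -> float:
    ...  # run rollouts for one benchmark task, return success rate

tasks = [f"task_{i}" for i in range(10)]  # e.g. ~10 tasks per benchmark

with ThreadPoolExecutor(max_workers=5) as pool:  # env.max_parallel_tasks=5
    results = list(pool.map(eval_task, tasks))   # ~2 waves of 5
```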
* fix(ci): cap VLABench smoke eval at 50 steps per task
VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s
the smoke eval took ~80 minutes of rollouts on top of the image build.
The eval is a pipeline smoke test (running_success_rate stays at 0% on
this short rollout anyway), so we don't need full episodes — cap each
task at 50 steps to bring total rollout time down ~10x.
* fix(ci): run VLABench tasks 5-at-a-time in parallel
The eval script already supports running multiple tasks concurrently via
a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10
VLABench tasks finish in ~2 waves instead of running sequentially.
* feat: add pretrained vision encoder weights for diffusion and vqbet
* fix test by re-generating artifacts
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and
cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3
on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA
repo, so the build failed at apt-get install. Bumping both to -12-6 to
match the base image.
* feat(policies): add EO-1 model
* chore(eo1): adjust policy_eo1_README.md to avoid duplication with eo1.mdx
* chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs.
This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path.
Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py
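A hedged sketch of the pattern, not the actual PI0/PI0.5 code: draw the Beta samples eagerly in the wrapper and hand the compiled core plain tensors:

```python
import torch

@torch.compile
def core_denoise_step(actions, noise, time):
    # tensor-only math: no distribution sampling inside the compiled graph
    return actions + noise * time.view(-1, 1, 1)

def policy_step(actions):
    # eager sampling: Beta is Dirichlet-backed, and aten::_sample_dirichlet
    # fails to compile on MPS, so keep it outside torch.compile
    time = torch.distributions.Beta(1.5, 1.0).sample((actions.shape[0],))
    noise = torch.randn_like(actions)
    return core_denoise_step(actions, noise, time)
```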
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in
* chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass
Introduce a typed DatasetInfo dataclass to replace the untyped dict representation of info.json.
Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo
Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
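A condensed sketch of the described dataclass; the real DatasetInfo carries many more fields and a fuller compatibility layer:

```python
from dataclasses import dataclass, asdict

@dataclass
class DatasetInfo:
    codebase_version: str
    fps: int
    features: dict

    def __post_init__(self):
        if self.fps <= 0:
            raise ValueError("fps must be positive")  # validation at construction

    def __getitem__(self, key):  # dict-style compatibility layer
        return getattr(self, key)

    def get(self, key, default=None):
        return getattr(self, key, default)

    @classmethod
    def from_dict(cls, d: dict) -> "DatasetInfo":
        return cls(**d)

    def to_dict(self) -> dict:
        return asdict(self)
```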
* fix pre-commit
* update docstring inside DatasetInfo.from_dict()
* sorting unknown fields for deterministic output
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* refactoring the last few old fields
* fix crop dataset roi type mismatch
* consistently use int for data and video_files_size_in_mb
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove rabc-specific configuration and replace it with a generic sampler-weight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
* linter + missing files
* add testing for sample weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding
In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return BaseModelOutputWithPooling instead of a plain torch.FloatTensor.
Added isinstance check to extract pooler_output when the return value is not
a tensor, maintaining backward compatibility with transformers 4.x.
Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'
* Adding an assertion check for the pooler_output of CLIP, in response to the review comment below.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387
* Adding an assertion check for the pooler_output of CLIP, in response to the review comment below; changed to a simple check-and-raise.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776
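A sketch of the final check-and-raise form; the helper name is illustrative, usage shown in the trailing comment:

```python
import torch
from transformers.modeling_outputs import BaseModelOutputWithPooling

def to_feature_tensor(features):
    """Normalize CLIP encode output across transformers 4.x / 5.x."""
    if isinstance(features, torch.Tensor):
        return features  # transformers 4.x path
    if not isinstance(features, BaseModelOutputWithPooling):
        raise TypeError(f"Unexpected CLIP output type: {type(features)}")
    return features.pooler_output  # transformers 5.x path

# features = to_feature_tensor(clip_model.get_image_features(pixel_values=pixel_values))
```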
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for
ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the
`action_is_pad` mask to zero out padded timesteps, then calls `.mean()`
(or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number
of elements including the zeroed padding, the loss is diluted by the
padding fraction.
Fixed by normalizing only over valid (non-padded) scalar entries:
num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1)
loss = losses.sum(...) / num_valid
`clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both
reduction paths are updated. Behavior when `action_is_pad` is missing is
unchanged (`losses.mean()`).
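A runnable sketch of the padding-aware reduction above, with illustrative (batch, chunk, action_dim) shapes:

```python
import torch

losses = torch.rand(2, 40, 14)                        # per-scalar losses
actions_is_pad = torch.zeros(2, 40, dtype=torch.bool)
actions_is_pad[:, 30:] = True                         # last 10 timesteps padded

losses = losses * (~actions_is_pad).unsqueeze(-1).float()  # zero the padding
num_valid = ((~actions_is_pad).sum() * losses.shape[-1]).clamp_min(1)
loss = losses.sum() / num_valid  # mean over valid scalars, not all elements
```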
Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2,
30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` —
same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for
ACT. Heavier-padding recipes will see a larger gap.
Refs: #3353 (original report for ACT), #3377 (fix for the other three
policies).