* fix(config): add lora_alpha to PeftConfig
PeftConfig was missing the lora_alpha field, causing the PEFT library
to default to alpha=8 regardless of the LoRA rank, which dampens the
adaptation signal for high-rank adapters (e.g., r=128).
This adds lora_alpha: int | None = None to PeftConfig, allowing users
to specify --peft.lora_alpha <value> on the CLI.
Closes #3551
* fix(docs): add lora_alpha to peft training example + clarify scaling formula
- Add --peft.lora_alpha=64 to docs/source/peft_training.mdx example to
prevent new users from hitting the alpha=8 default dampening bug
- Clarify lora_alpha comment in default.py with scaling = lora_alpha / r
* docs: mention both --peft.r and --peft.lora_alpha in LoRA description
---------
Co-authored-by: Cheng Yin <yin@users.noreply.github.com>
* fix(config): support policy.path in YAML config files
policy.path was only handled via CLI args (filtered from sys.argv before
draccus, then retrieved in validate()). When specified in YAML, draccus
would crash because 'path' is not a valid field on PreTrainedConfig.
Extract path fields from the YAML/JSON config before draccus processes
it, store them in a module-level dict, and fall back to it in
get_path_arg() when the CLI doesn't have the path.
Fixes #2957
* fix(parser): preserve YAML policy overrides when loading from pretrained
When policy.path is set in YAML, validate() was calling from_pretrained
with only CLI overrides, discarding any YAML policy fields (e.g. lr,
batch_size) that draccus had already parsed. Fix by capturing the
remaining YAML fields as CLI-style args in _config_yaml_overrides and
merging them into the overrides passed to from_pretrained in train.py,
eval.py, and lerobot_record.py (CLI args still take precedence).
Also fix the NamedTemporaryFile SIM115 ruff warning and add types-PyYAML
to the mypy pre-commit hook.
* fix(parser): serialize bool/None values correctly in YAML policy overrides
Bool values from YAML configs (e.g. push_to_hub: true) were passed as
Python "True"/"False" strings instead of lowercase "true"/"false" that
draccus expects. Also skip None values to avoid passing "None" strings.
* revert: remove types-PyYAML from .pre-commit-config.yaml
* chore: fix quality check caused by untyped YAML import
Co-authored-by: masato-ka <jp6uzv@gmail.com>
Signed-off-by: Khalil Meftah <khalil.meftah@huggingface.co>
---------
Signed-off-by: Khalil Meftah <khalil.meftah@huggingface.co>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Co-authored-by: masato-ka <jp6uzv@gmail.com>
* feat(episode filtering): add support for episode filtering at initialization time in LeRobotDataset
* test(tests): adding tests
* chore(format): formatting code
* feat(performance): improve implementation for better performance on big datasets
* chore(warnings): improve warnings and errors for episode filtering
* test(invalid key): adding test for invalid filtering key
* chore(format): formatting code
* fix(deps): better versioning control for torchcodec
* refactor(video_utils): replace torchvision with pyav
* adding Torchcodec version to lerobot-info
* chore(benchmarks): delete video benchmark
---------
Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co>
* refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
* chore: clarify torch.compile disabled note in SACAlgorithm
* fix(teleop): keyboard EE teleop not registering special keys and losing intervention state
Fixes #2345
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
* fix: remove leftover normalization calls from reward classifier predict_reward
Fixes #2355
* fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample()
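The locking pattern the fix above describes can be sketched as follows. This is not the lerobot `ReplayBuffer` API, just a minimal illustration of guarding both `add()` and `sample()` with one lock:

```python
import random
import threading

# Minimal sketch: a single lock serializes writes and reads so a
# concurrent sample() never observes a partially updated buffer.
class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        self._storage: list = []
        self._capacity = capacity
        self._lock = threading.Lock()

    def add(self, transition) -> None:
        with self._lock:
            if len(self._storage) >= self._capacity:
                self._storage.pop(0)  # drop the oldest transition
            self._storage.append(transition)

    def sample(self, batch_size: int) -> list:
        with self._lock:
            k = min(batch_size, len(self._storage))
            return random.sample(self._storage, k)

buf = ReplayBuffer()
for i in range(5):
    buf.add({"step": i})
print(len(buf.sample(3)))  # 3
```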
* refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference
* perf: remove redundant CPU→GPU→CPU transition move in learner
* Fix: add kwargs in reward classifier __init__()
* fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer
* fix: add try/finally to control_loop to ensure image writer cleanup on exit
* fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error
* fix: skip tests that require grpc if not available
* fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests
* fix(tests): skip tests that require grpc if not available
* refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages
* fix(config): update vision encoder model name to lerobot/resnet10
* fix(sac): clarify torch.compile status
* refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity
* refactor(sac): simplify optimizer return structure
* perf(rl): use async iterators in OnlineOfflineMixer.get_iterator
* refactor(sac): decouple algorithm hyperparameters from policy config
* update losses names in tests
* fix docstring
* remove unused type alias
* fix test for flat dict structure
* refactor(policies): rename policies/sac → policies/gaussian_actor
* refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
* perf(observation_processor): add CUDA support for image processing
* fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline
(cherry picked from commit 9c2af818ff)
* fix(rl): add time limit processor to environment pipeline
(cherry picked from commit cd105f65cb)
* fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100
(cherry picked from commit 494f469a2b)
* fix(rl): update neutral gripper action
(cherry picked from commit 9c9064e5be)
* fix(rl): merge environment and action-processor info in transition processing
(cherry picked from commit 30e1886b64)
* fix(rl): mirror gym_manipulator in actor
(cherry picked from commit d2a046dfc5)
* fix(rl): postprocess action in actor
(cherry picked from commit c2556439e5)
* fix(rl): improve action processing for discrete and continuous actions
(cherry picked from commit f887ab3f6a)
* fix(rl): enhance intervention handling in actor and learner
(cherry picked from commit ef8bfffbd7)
* Revert "perf(observation_processor): add CUDA support for image processing"
This reverts commit 38b88c414c.
* refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
* refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation
* refactor(rl): add type property to RLAlgorithmConfig for better clarity
* refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility
* refactor(tests): remove grpc import checks from test files for cleaner code
* fix(tests): gate RL tests on the `datasets` extra
* refactor: simplify docstrings for clarity and conciseness across multiple files
* fix(rl): update gripper position key and handle action absence during reset
* fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset
* refactor: clean up import statements
* chore: address reviewer comments
* chore: improve visual stats reshaping logic and update docstring for clarity
* refactor: enforce mandatory config_class and name attributes in RLAlgorithm
* refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer
* refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
* refactor: add require_package calls for grpcio and gym-hil in relevant modules
* refactor(rl): move grpcio guards to runtime entry points
* feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
* refactor(rl): move actor weight-sync wire format from policy to algorithm
* refactor(rl): update type hints for learner and actor functions
* refactor(rl): hoist grpcio guard to module top in actor/learner
* chore(rl): manage import pattern in actor (#3564)
* chore(rl): manage import pattern in actor
* chore(rl): optional grpc imports in learner; quote grpc ServicerContext types
---------
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
* update uv.lock
* chore(doc): update doc
---------
Co-authored-by: jpizarrom <jpizarrom@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* chore(deps): ceiling + cuda
* ci: bump cuda version docker image
* ci: add cpu wheel to release workflow
* chore(deps): update uv.lock
* docs: update installation with cuda note
* docs(omx): adding some examples and scripts
* cleaning up and reviewing the cli args
* adding __init__.py to example folder, adjusting the examples
* adding reference to pretrained act policy
* moving `.send_action` before `dataset.add_frame` for consistency
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* adjusting docstring
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* addressing hardcoded dataset fps
* removed init as it worked without
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
If VideoDecoder() raises during initialization, the fsspec file handle
was leaked since it was opened via __enter__() but never closed on the
exception path. Now explicitly closes the handle before re-raising.
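The shape of the fix described above, with stand-ins for the real objects (`FakeHandle` for the fsspec file handle, `make_decoder` for the `VideoDecoder` constructor): because the handle is acquired via `__enter__()` rather than a `with` block, an exception during decoder init must close it explicitly before re-raising, or the handle leaks.

```python
class FakeHandle:
    """Stand-in for an fsspec file handle."""
    def __init__(self):
        self.closed = False
    def __enter__(self):
        return self
    def close(self):
        self.closed = True

def open_decoder(handle, make_decoder):
    handle.__enter__()
    try:
        return make_decoder(handle)
    except Exception:
        handle.close()  # don't leak the handle on the error path
        raise

h = FakeHandle()
try:
    open_decoder(h, lambda _h: (_ for _ in ()).throw(RuntimeError("init failed")))
except RuntimeError:
    pass
print(h.closed)  # True
```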
* chore(deps): allow torch 2.11/2.12 and fix autocast deprecation
- Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26),
and torchcodec to <0.13 (was <0.11) to allow installs against the latest
stable torch 2.11 and the upcoming 2.12 line.
- Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda")
in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+).
- Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130,
torchcodec 0.11.1, full CUDA 13 stack).
Verified locally with `uv sync --locked` from a clean .venv and the lerobot
test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is
identical to the pre-bump baseline: 18 pre-existing failures
(test_sac_policy*, test_pi0_rtc*, test_pi05_rtc*, test_replay_buffer*),
0 new, 0 fixed.
AI assistance: this change was authored with Claude Code per AI_POLICY.md.
* fix(policies): use device-agnostic autocast dtype lookup
Pass query_states.device.type to torch.get_autocast_dtype() instead of
hardcoding 'cuda', so the cast matches the active autocast context when
running under CPU/MPS/XPU autocast.
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* fix(train): restrict legacy RA-BC migration to JSON checkpoints only
_migrate_legacy_rabc_fields was called for all config files, causing
json.load to raise DecodeError when a YAML/TOML config was passed to
lerobot-train for a new training run. Guard the block with an
.endswith(".json") check so migration only runs when resuming from
a JSON checkpoint.
* fix(ci): run multi-task benchmark evals 5-at-a-time in parallel
The eval script supports running tasks concurrently via a
ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four
multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus
— 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of
running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra)
are unchanged.
* fix(ci): cap VLABench smoke eval at 50 steps per task
VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s
the smoke eval took ~80 minutes of rollouts on top of the image build.
The eval is a pipeline smoke test (running_success_rate stays at 0% on
this short rollout anyway), so we don't need full episodes — cap each
task at 50 steps to bring total rollout time down ~10x.
* fix(ci): run VLABench tasks 5-at-a-time in parallel
The eval script already supports running multiple tasks concurrently via
a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10
VLABench tasks finish in ~2 waves instead of running sequentially.
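The concurrency pattern the CI commits above rely on can be sketched as follows. `eval_task` and `run_benchmark` are stand-ins, but the bounded `ThreadPoolExecutor` mirrors the `env.max_parallel_tasks` behavior described: 10 tasks with a limit of 5 complete in roughly 2 waves.

```python
from concurrent.futures import ThreadPoolExecutor

def eval_task(task_id: int) -> dict:
    # Stand-in for one task's rollout + metrics collection.
    return {"task_id": task_id, "success_rate": 0.0}

def run_benchmark(task_ids: list[int], max_parallel_tasks: int = 5) -> list[dict]:
    # Bound the worker count so at most max_parallel_tasks run at once;
    # map() preserves input order in the returned results.
    with ThreadPoolExecutor(max_workers=max_parallel_tasks) as pool:
        return list(pool.map(eval_task, task_ids))

results = run_benchmark(list(range(10)))
print(len(results))  # 10
```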
* feat: add pretrained vision encoder weights for diffusion and vqbet
* fix test by re-generating artifacts
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and
cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3
on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA
repo, so the build failed at apt-get install. Bumping both to -12-6 to
match the base image.
* feat(policies): add EO-1 model
* chore(eo1): adjust policy_eo1_README.md to avoid duplication with eo1.mdx
* chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs.
This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path.
Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in
* chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass
Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json.
Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo
Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
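A condensed sketch of the `DatasetInfo` pattern the list above describes. The field names are illustrative, not the full info.json schema, but the `__post_init__` shape normalization, validation, and dict-style compatibility layer follow the description:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetInfo:
    fps: int
    total_episodes: int
    shapes: dict[str, tuple[int, ...]] = field(default_factory=dict)

    def __post_init__(self):
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")
        # JSON stores shapes as lists; normalize to tuples.
        self.shapes = {k: tuple(v) for k, v in self.shapes.items()}

    def __getitem__(self, key):
        # Dict-style compatibility for callers not yet migrated.
        return getattr(self, key)

    def get(self, key, default=None):
        return getattr(self, key, default)

    @classmethod
    def from_dict(cls, d: dict) -> "DatasetInfo":
        return cls(**d)

info = DatasetInfo.from_dict(
    {"fps": 30, "total_episodes": 2, "shapes": {"action": [7]}}
)
print(info["fps"], info.shapes["action"])  # 30 (7,)
```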
* fix pre-commit
* update docstring inside DatasetInfo.from_dict()
* sort unknown fields to get deterministic output
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* refactoring the last few old fields
* fix crop dataset roi type mismatch
* consistently use int for data and video_files_size_in_mb
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
* linter + missing files
* add tests for the sample weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding
In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return BaseModelOutputWithPooling instead of a plain torch.FloatTensor.
Added isinstance check to extract pooler_output when the return value is not
a tensor, maintaining backward compatibility with transformers 4.x.
Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'
* Add an assertion check for pooler_output of CLIP, in response to the review comment below.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387
* Simplify the pooler_output assertion to a plain check-and-raise, in response to the review comment below.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for
ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the
`action_is_pad` mask to zero out padded timesteps, then calls `.mean()`
(or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number
of elements including the zeroed padding, the loss is diluted by the
padding fraction.
Fixed by normalizing only over valid (non-padded) scalar entries:
num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1)
loss = losses.sum(...) / num_valid
`clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both
reduction paths are updated. Behavior when `action_is_pad` is missing is
unchanged (`losses.mean()`).
Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2,
30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` —
same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for
ACT. Heavier-padding recipes will see a larger gap.
Refs: #3353 (original report for ACT), #3377 (fix for the other three
policies).
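A toy numeric check of the masked-mean fix above, mirroring the `num_valid` formula from the commit (one batch, three timesteps of which one is padded, two action dims):

```python
import torch

losses = torch.tensor([[[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]])  # (B=1, T=3, D=2)
actions_is_pad = torch.tensor([[False, False, True]])          # last step is padding
masked = losses * (~actions_is_pad).unsqueeze(-1)

# Pre-patch form: .mean() divides by all 6 elements, padding included.
diluted = masked.mean()  # 4 / 6

# Fixed form: divide by the count of valid scalar entries only;
# clamp_min(1) keeps the all-padded batch at 0 instead of 0/0.
num_valid = ((~actions_is_pad).sum() * losses.shape[-1]).clamp_min(1)
correct = masked.sum() / num_valid  # 4 / 4 = 1.0

print(diluted.item(), correct.item())
```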
feat(sim): add VLABench benchmark integration
Add VLABench as a new simulation benchmark in LeRobot, following the existing LIBERO and MetaWorld patterns.
This PR wires VLABench end-to-end across environment integration, Docker setup, CI smoke evaluation, and documentation. It also fixes a number of upstream packaging and runtime issues required to make VLABench usable and reproducible in CI.
What’s included:

Benchmark integration
- Add VLABench as a new simulation benchmark.
- Expose supported VLABench tasks through the LeRobot env interface.
- Follow the established LIBERO / MetaWorld factory patterns.
- Preserve lazy async-env metadata so env.unwrapped.metadata["render_fps"] continues to work.

CI smoke evaluation
- Add a VLABench smoke-eval job using lerobot/smolvla_vlabench.
- Use the correct rename_map for the 3-camera dataset layout.
- Expand smoke coverage from 1 to 10 primitive tasks.
- Extract task descriptions after eval so metrics artifacts include per-task labels.
- Skip Docker Hub login when secrets are unavailable (e.g. fork PRs).

Docker / install fixes
- Install VLABench from GitHub rather than PyPI.
- Use uv pip, not pip, in the base image.
- Fail loudly on install errors instead of masking them.
- Clone VLABench into the non-root user’s home directory.
- Use shallow editable installs for VLABench and rrt-algorithms to work around missing __init__.py issues.
- Pin upstream clones to exact commit SHAs for reproducibility.
- Add undeclared runtime dependencies required by VLABench (open3d, colorlog, scikit-learn, openai).
- Unpin open3d so Python 3.12 wheels resolve.

Assets
- Support downloading VLABench assets from a Hugging Face Hub mirror via VLABENCH_ASSETS_REPO.
- Keep Google Drive download support as fallback.
- Install huggingface_hub[hf_xet] so Xet-backed assets download correctly.
- Validate required mesh/XML asset subtrees at build time.
- Patch VLABench constants to tolerate missing asset directories at import time.

Runtime / env correctness
- Import VLABench robots and tasks explicitly so decorator-based registry population happens.
- Resize and normalize camera observations so they always match the declared (H, W, 3) uint8 observation space.
- Reinstall LeRobot editably inside the image so the new env code is actually used.
- Coerce agent_pos / ee_state to the expected shape.
- Pad actions when needed to match data.ctrl.
- Replace the zero-padding fallback with proper dm_control IK for 7D end-effector actions.
- Refetch dm_control physics on each step instead of caching weakrefs.
- Retry unstable resets with reseeding and handle PhysicsError gracefully at step time.

Dataset / policy alignment
- Align VLABench observations and actions with the Hugging Face dataset conventions used by lerobot/vlabench_unified:
  - convert EE position between world frame and robot-base frame at the env boundary,
  - expose / consume Euler XYZ instead of the raw quaternion layout,
  - align gripper semantics with the dataset convention (1 = open, 0 = closed).
- This fixes policy/env mismatches that previously caused incorrect IK targets and unstable behavior at evaluation time.

Docs
- Add a full docs/source/vlabench.mdx page aligned with the standard benchmark template.
- Document task selection forms (single task, comma list, suite shortcut).
- Document installation, evaluation, training, and result reproduction.
- Point examples at lerobot/smolvla_vlabench.
- Add a benchmark banner image.
- Remove outdated / misleading references to upstream evaluation tracks.
- Document the manual install flow instead of a broken vlabench extra.

Packaging cleanup
- Remove the unresolvable vlabench extra from pyproject.toml.
- Remove the no-op VLABench processor step.
- Remove the obsolete env unit test that only covered the dropped gripper remap helper.
- Apply formatting / logging / style cleanup from review feedback.

Why this is needed:

VLABench is not currently consumable as a normal Python dependency and requires several upstream workarounds:
- no PyPI release,
- missing package declarations,
- undeclared runtime deps,
- SSH-only submodule references,
- asset downloads outside the normal package install flow,
- registry population that depends on import side effects,
- env outputs that do not always match declared observation shapes,
- task resets that can diverge under some random layouts.

This PR makes the benchmark usable in LeRobot despite those constraints, and ensures CI runs are reproducible and informative.
* feat(envs): add LIBERO-plus robustness benchmark integration
- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra
* fix(libero): use suite's perturbation-aware init_states loader
LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that
strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`,
`_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init`
file — the on-disk name for a perturbation variant doesn't exist as a file,
only the base does. lerobot's loader was bypassing that logic and trying to
read the suffix-bearing filename directly, which failed for every non-zero
task id and killed the eval before any rollout video could be written.
Delegate to the suite's method when it exists; fall back to the path-based
loader for vanilla LIBERO (which does not provide the method).
Also drop the hf-libero install + init_files copy from the LIBERO-plus
Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and
`init_files/` for all five suites, so the copy was unnecessary and the
`cp -r` into an existing dir produced a confusing nested layout.
* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves
Delegating to `task_suite.get_task_init_states(i)` works for path resolution
but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`,
which fails on PyTorch 2.6+ because the pickled init_states contains numpy
objects not in the default allowlist:
_pickle.UnpicklingError: Weights only load failed.
WeightsUnpickler error: Unsupported global:
GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.
Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`,
`_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass
`weights_only=False` ourselves. Vanilla LIBERO task names don't contain any
of these patterns except for `_table_` when followed by the word `center`
(e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex
requires `_table_\\d+` so semantic uses are preserved.
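The suffix-stripping logic described above can be sketched with a single end-anchored regex. The pattern below is an approximation of the one in the fix, not copied from it; in particular it shows why `_table_\d+` (digits required) leaves a semantic `_table_center_` task name untouched:

```python
import re

# Approximate perturbation-suffix stripper: recover the base task name
# by removing a trailing `_table_N`, `_tb_N`, `_view_...`, `_language_...`,
# `_light_...`, `_add_...`, or `_level...` suffix.
_PERTURB_SUFFIX = re.compile(
    r"_(?:table_\d+|tb_\d+|view_.*|language_.*|light_.*|add_.*|level.*)$"
)

def base_task_name(task_name: str) -> str:
    return _PERTURB_SUFFIX.sub("", task_name)

print(base_task_name("pick_up_the_black_bowl_table_3"))
# pick_up_the_black_bowl
print(base_task_name("pick_up_the_black_bowl_from_table_center_and_place_it"))
# unchanged: `_table_` here is not followed by digits
```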
* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus
LIBERO-plus's bddl_base_domain.py resolves scene XMLs with
`os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml
has no effect on scene lookup — MuJoCo always opens
`<clone>/libero/libero/assets/scenes/...`. With no such directory present,
every perturbation task fails on:
FileNotFoundError: No such file or directory:
.../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml
These textures, views, and extra objects ship only in the 6.4 GB `assets.zip`
published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says
to download and unzip it into the package dir). Fetch it via `hf_hub_download`,
unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at
the extracted dir so everything stays consistent. The download lives in its
own Docker layer so subsequent rebuilds reuse the cached assets.
Drops the lerobot/libero-assets snapshot_download — that mirror only has
vanilla LIBERO textures and is ignored for scene loading anyway.
* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip
The 6.4 GB zip ships with every entry prefixed by
`inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...`
(the author's internal filesystem layout, not the layout the LIBERO-plus
README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created
`${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened
`${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml`
and hit the same FileNotFoundError.
Extract to a scratch dir, then `mv` the nested `assets/` subtree to the
expected location. Verified the target file exists in the zip central
directory under that exact prefix.
* refactor(libero): inline init_states resolver behind single regex
Collapse the three-style suffix stripper (split/re.sub/in) into one
compiled regex, drop the (Path, bool) tuple return, and move the
`_add_`/`_level` reshape branch into the caller so each branch loads
its own file and returns directly. Net: -11 lines, one fewer helper.
* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/robomme pattern: start from the nightly GPU
image (apt deps, python, uv, venv, lerobot[all] already there) and only
layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build
deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the
PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.
Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init) the nightly already provides.
123 → 73 lines.
Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so
doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to
the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in
from an unrelated resolve — unrelated to LIBERO-plus).
* fix(libero-plus): stop touching pyproject + uv.lock
The fast-tests job was rejecting the branch because pyproject.toml had a
[libero_plus] extra whose git dep wasn't represented in uv.lock.
The Docker image no longer needs the extra — it clones LIBERO-plus
directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from
pyproject and restore pyproject.toml + uv.lock to exactly what's on
origin/main, so `uv sync --locked --extra test` is a no-op for this PR.
Also repoint the doc/CI/env comments that still mentioned the extra at
the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions
LIBERO-plus builds task.language by space-joining the perturbation-variant
filename, so every non-_language_ variant inherits a trailing blob like
"view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the
dashboard video labels and no longer matches the base instruction stored
in the training dataset.
Strip those tokens in extract_task_descriptions.py with an end-anchored
regex over the {view,initstate,noise,add,tb,table,light,level}(+digits)
vocabulary. The anchor preserves mid-sentence literal uses of those words
(e.g. "from table center and place it on the plate") — only the trailing
metadata chain is removed. _language_ variants carry real BDDL-sourced
text and are left untouched.
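The end-anchored cleanup described above can be approximated as follows (this regex is illustrative, not the one in extract_task_descriptions.py): remove only a trailing chain of metadata tokens, each a vocabulary keyword optionally followed by numbers, so mid-sentence uses of words like "table" survive.

```python
import re

_TRAILING_META = re.compile(
    r"(?:\s+(?:view|initstate|noise|add|tb|table|light|level)(?:\s+-?\d+)*)+$"
)

def clean_description(text: str) -> str:
    # Only the trailing metadata chain is removed; nothing mid-sentence.
    return _TRAILING_META.sub("", text)

print(clean_description(
    "put the bowl on the stove view 0 0 100 0 0 initstate 0 noise 45"
))
# put the bowl on the stove
print(clean_description(
    "pick up the bowl from table center and place it on the plate"
))
# unchanged: "table" is mid-sentence, not part of a trailing chain
```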
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: integrate PR #3313 review feedback
- docs: fix paper link to arxiv, add benchmark image, add suite descriptions,
add LIBERO-plus replacement warning, restructure eval section to match
LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded
asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO
always takes the simple path
* fix(docs): use correct LIBERO-plus teaser image URL
* ci(libero-plus): drop redundant hf auth login step
The standalone login step ran `hf auth login` in a throwaway
`docker run --rm` container, so no credentials persisted. Auth is
already performed inside the eval step's container. Removing the
redundant step per PR #3313 review feedback.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch. Without these attributes eval crashes
when calling `env.unwrapped.metadata["render_fps"]` with async vector
envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and
caches the metadata alongside obs/action spaces in the LIBERO and
MetaWorld factories.
* ci: gate Docker Hub login on secret availability
Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step before any of
the actual build/eval work could run. Gate the login on the env-var
expansion of the username so the step is skipped (not failed) when
secrets are absent. Mirrors the existing pattern in the VLABench job.
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update scripts/ci/extract_task_descriptions.py
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update docker/Dockerfile.benchmark.libero_plus
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* fix(libero-plus): address review feedback
* ci(libero-plus): fix YAML indentation in upload-artifact steps
The `uses:` key on two upload-artifact steps was at column 0 instead
of nested under the step, causing `pre-commit run check-yaml` to fail
with "expected <block end>, but found '<block mapping start>'".
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
* feat(envs): add RoboMME benchmark integration
- RoboMME env wrapper with image/wrist_image/state observations
- Docker image with Vulkan, SAPIEN, mani-skill deps
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme
- preprocess_observation: handle image/wrist_image/state keys
- pyproject.toml: robomme extra
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu
Mirror the libero/metaworld pattern: start from the nightly GPU image
(which already has apt deps, uv, venv, and lerobot[all] preinstalled)
and only layer on what RoboMME uniquely needs — the Vulkan libs
ManiSkill/SAPIEN require, plus the robomme extra with the
gymnasium/numpy overrides.
Drops 48 lines of duplicated base setup (CUDA FROM, python install,
user creation, venv init, base apt deps) that the nightly image already
provides. Net: 102 → 54 lines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme
- Remove the "Related work" block referencing the prototype branch
feat/robomme-integration; the PR stands on its own.
- Point all dataset references at lerobot/robomme (docs, env module
docstring, RoboMMEEnvConfig docstring) — this is the canonical HF
location once the dataset is mirrored.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robomme): make docs build + fast tests green
1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's
TOC integrity check stops rejecting the new page.
2. Fast tests: robomme's mani-skill transitively pins numpy<2, which is
unsatisfiable against the project's numpy>=2 base pin, so `uv sync`
couldn't resolve a universal lockfile.
Drop robomme as a pyproject extra entirely — it truly cannot coexist
with the rest of the dep tree. The Dockerfile installs robomme
directly from its git URL via `uv pip install --override`, which was
already the runtime path. pyproject, docs, env docstrings, and the
CI job comment all now point to the docker-only install.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(robomme): realign unit tests with current env API
The tests were written against an earlier env layout and never updated when
the wrapper was refactored, so CI's fast-test job was failing with:
- KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the
lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset
columns and preprocess_observation's built-in fallbacks).
- AssertionError: 'robomme' not in result — create_robomme_envs now
returns {task_name: {task_id: env}}, not {'robomme': {...}}, so
comma-separated task lists work.
- ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was
removed; create_robomme_envs now builds its envs synchronously.
Rewrite the seven failing cases against the current API, drop the three
LazyVectorEnv tests, and add a multi-task test so the new comma-separated
task parsing is covered. Stub install/teardown is moved into helpers
(`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests
stop repeating six boilerplate lines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
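The new return shape can be illustrated with a stripped-down sketch (the real create_robomme_envs builds actual gym environments; `env_factory` and `task_ids` here are hypothetical stand-ins used only to show the nested dict layout and comma-separated parsing):

```python
def create_robomme_envs(task: str, env_factory, task_ids=(0,)) -> dict:
    """Parse a comma-separated task list into {task_name: {task_id: env}}."""
    envs = {}
    for task_name in (t.strip() for t in task.split(",") if t.strip()):
        # One inner dict per task, keyed by task id, so callers can
        # iterate tasks and ids independently.
        envs[task_name] = {tid: env_factory(task_name, tid) for tid in task_ids}
    return envs
```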
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: integrate PR #3311 review feedback
- envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos
- envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config
- envs: remove special-case obs conversion in utils.py (no longer needed)
- ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0]
- scripts: extract_task_descriptions supports multiple task_ids
- docs: title to # RoboMME, add image, restructure eval section
- tests: update all key assertions to match new obs naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs): use correct RoboMME teaser image URL
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* ci(robomme): smoke-eval 10 tasks instead of 5
Broader coverage on the RoboMME benchmark CI job: bump the smoke eval
from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS.
Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg,
SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock.
Updated the parse_eval_metrics.py `--task` label from the single
`PickXtimes` stub to the full comma list so the metrics artifact
reflects what was actually run. `parse_eval_metrics.py` already reads
`overall` for multi-task runs, so no parser change is needed.
Made-with: Cursor
* fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up
`_convert_obs` was returning flat keys (`pixels/image`,
`pixels/wrist_image`). `preprocess_observation()` in envs/utils.py
keys off the top-level `"pixels"` entry and, not finding it,
silently dropped every image from the batch. The policy then saw
zero image features and raised
ValueError: All image features are missing from the batch.
Match the LIBERO layout: return
`{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}`
and declare the same shape in `observation_space`.
Made-with: Cursor
* fix(robomme): align docs and tests with nested pixels obs layout
Addresses PR #3311 review feedback:
- Docs: correct observation keys to `pixels/image` / `pixels/wrist_image`
(mapped to `observation.images.image` / `observation.images.wrist_image`)
and drop the now-obsolete column-rename snippet.
- Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`,
matching the nested layout required by `preprocess_observation()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: gate Docker Hub login on secret availability
Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step. Gate the login
on the env-var expansion of the username so the step is skipped (not
failed) when secrets are absent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(robomme): address review feedback
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(ci): add RoboCerebra benchmark eval job
- Docker image with robosuite/libero deps for RoboCerebra eval
- CI workflow: 1-episode eval with pepijn223/smolvla_robocerebra
- Reuses libero env with rename_map + empty_cameras=3
* docs(robocerebra): add benchmark page and toctree entry
Add a dedicated docs page for RoboCerebra that points at the canonical
dataset lerobot/robocerebra_unified and shows how to run eval + fine-tune
against it. Wire it into the Benchmarks section of the toctree so
doc-builder picks it up.
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
* fix(robocerebra): drop alias extra + simplify docker image
Two problems rolled up:
1. `uv sync --locked --extra test` was failing because pyproject.toml added
a `robocerebra = ["lerobot[libero]"]` alias extra but uv.lock wasn't
regenerated. Drop the alias — nothing in CI actually needs the extra
name (the Dockerfile just installs what it needs directly), so this
restores pyproject.toml and uv.lock to byte-exact origin/main.
2. Rebase docker/Dockerfile.benchmark.robocerebra on
huggingface/lerobot-gpu:latest (same pattern as libero/metaworld/…).
The nightly image already ships lerobot[all], which includes [libero],
so the RoboCerebra image is essentially identical to the LIBERO one:
fetch libero-assets, write ~/.libero/config.yaml, overlay source.
92 → 43 lines.
Also repoint the CI workflow comment that referenced the removed extra.
* ci: use dedicated lerobot/smolvla_robocerebra checkpoint for smoke eval
Replace the generic pepijn223/smolvla_libero placeholder with the
purpose-trained lerobot/smolvla_robocerebra model in the RoboCerebra
CI smoke test.
* fix(ci): align RoboCerebra eval with training pipeline
Fixes 5 mismatches that caused 0% success rate:
- env.type: robocerebra (unregistered) → libero
- resolution: 360x360 (default) → 256x256 (matches dataset)
- camera_name_mapping: map eye_in_hand → wrist_image (not image2)
- empty_cameras: 3 → 1 (matches training)
- add HF_USER_TOKEN guard on eval step
* fix(ci): set env.fps=20 and explicit obs_type for RoboCerebra eval
Match the dataset's 20 FPS (LiberoEnv defaults to 30) and make
obs_type=pixels_agent_pos explicit for safety against future changes.
* docs(robocerebra): align page with adding_benchmarks template
Rework docs/source/robocerebra.mdx to follow the standard benchmark
doc structure: intro + links + available tasks + installation + eval
+ recommended episodes + policy I/O + training + reproducing results.
- Point everything at lerobot/smolvla_robocerebra (the released
checkpoint), not the personal pepijn223 mirror.
- Add the --env.fps=20 and --env.obs_type=pixels_agent_pos flags
that CI actually uses, so copy-paste eval reproduces CI.
- Split the "Training" block out of the recipe section into its own
section with the feature table.
- Add an explicit "Reproducing published results" section pointing
at the CI smoke eval.
* fix: integrate PR #3314 review feedback
- ci(robocerebra): drop redundant hf auth login step (auth is
already performed inside the eval step's container).
- ci(robocerebra): add Docker Hub login before the image build
to pick up the authenticated rate limit.
- docs(robocerebra): align eval snippet with the CI command
(observation size, camera_name_mapping, use_async_envs, device,
empty_cameras=1).
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch.
* ci: gate Docker Hub login on secret availability
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>