Commit Graph

1452 Commits

Author SHA1 Message Date
CarolinePascal 18ef0c270b fix(typos): fixing typos and small mistakes 2026-05-11 19:18:48 +02:00
CarolinePascal 58cd98c0d3 fix(imports): refactoring the file architecture to avoid circular imports. VideoEncoderConfig is now defined in lerobot.configs and lazily imports av at runtime. 2026-05-11 19:18:05 +02:00
CarolinePascal a3c670b987 chore(fromat): formatting code 2026-05-07 11:24:16 +02:00
CarolinePascal 8cd74ea8b8 chore(docs): updating docs 2026-05-07 11:21:28 +02:00
CarolinePascal be1180b240 test(artifacts): cleaning up artifacts for the video encoding tests 2026-05-05 13:13:35 +02:00
CarolinePascal fff6bc1a93 chore(relative imports): switching to relative local imports within lerobot.datasets 2026-05-05 11:56:27 +02:00
CarolinePascal 141304ac78 fix(arguments order): reverting changes in arguments order in StreamingVideoEncoder 2026-05-05 11:54:03 +02:00
CarolinePascal 9b3c752b64 chore(format): formatting code, fixing error messages and variable names 2026-05-05 11:31:23 +02:00
CarolinePascal 3dc73551dd fix(rollout): propagating VideoEncoderConfig to the latest recording modes 2026-05-05 11:06:45 +02:00
CarolinePascal 237bae51e8 feat(default values): applying a consistent naming convention for default RGB cameras video encoder parameters 2026-05-04 18:05:23 +02:00
CarolinePascal df8b33fc68 fix(camera_encoder_config): Removing camera_encoder_config from LeRobotDataset, as it's only required in LeRobotDatasetWriter. 2026-05-04 18:00:14 +02:00
CarolinePascal 50e2d7b5f4 chore(doctrings): updating docstrings 2026-05-04 17:01:11 +02:00
CarolinePascal 016799dfa1 chore(format): formatting code 2026-04-30 14:42:37 +02:00
CarolinePascal 51b9038458 chore(PyAV): cleaning up PyAV utils and encoding parameters checks to stick to the minimun required tooling. 2026-04-30 14:31:08 +02:00
CarolinePascal cc9a2e5c99 chore(format): fixing formatting issues 2026-04-29 16:48:57 +02:00
CarolinePascal a2376389f9 test(new): adding new tests for encoding related features 2026-04-29 16:48:56 +02:00
CarolinePascal 57a619ab02 test(existing): adapting existing tests 2026-04-29 16:48:56 +02:00
CarolinePascal 7f624adcc5 chore(duplicate): removing duplicate get_codec_options definition 2026-04-29 16:48:56 +02:00
CarolinePascal 375cf1fdf3 feat(pyav checks): making pyav parameters checks more robust 2026-04-29 16:48:56 +02:00
CarolinePascal b2c2bb7641 feat(VideoEncoderConfig init): making VideoEncoderConfig more robust and adaptable to multiple backends 2026-04-29 16:48:56 +02:00
CarolinePascal 4a87ee1537 fix(concatenation compatibility): adding compatibility check when concatenating video files 2026-04-29 16:48:56 +02:00
CarolinePascal e44f86e516 feat(metadata): adding encoding parameters in dataset metadata 2026-04-29 16:48:56 +02:00
CarolinePascal a0e3acdb67 chore(docs): updating the docs 2026-04-29 16:46:16 +02:00
CarolinePascal 38ff579bcc feat(VideoEncoderConfig): propagating the VideoEncoderConfig in the codebase 2026-04-29 16:44:47 +02:00
CarolinePascal 479e444517 feat(VideoEncoderConfig): creating a VideoEncoderConfig to encapsulate encoding parameters 2026-04-29 16:42:14 +02:00
CarolinePascal 9787b8fa26 feat(pyav utils): adding suport for PyAV encoding parameters validation 2026-04-29 16:42:14 +02:00
CarolinePascal 71f39f6912 chore(video backend): renaming codec into video_backend in get_safe_default_video_backend() 2026-04-29 16:42:14 +02:00
Khalil Meftah b5f65e5332 Expose sarm package API and ship reward model card template (#3477)
* chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in

* chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.
2026-04-29 16:17:16 +02:00
Khalil Meftah cd6b43ea7a fix(train): migrate legacy RA-BC fields in train config loading (#3480) 2026-04-29 16:17:00 +02:00
Steven Palma 2236bbe7a3 fix(rollout): propagate policy-specific CLI config paramaters (#3483)
Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>
2026-04-29 16:13:10 +02:00
Maxime Ellerbach cb0a944941 refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472)
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass

Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json.

Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo

Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation

* fix pre-commit

* update docstring inside DatasetInfo.from_dict()

* sorts the unknown to have deterministic output

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>

* refactoring the last few old fieds


* fix crop dataset roi type mismatch


* use consistantly int for data and video_files_size_in_mb

---------

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
2026-04-28 18:40:30 +02:00
Khalil Meftah 8a3d64033f Reward models refactor (#3142)
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes

* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/

* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/

* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py

* refactor(rewards): update imports and delete old reward model locations

* test(rewards): add reward model tests and update existing test imports

* fix(rewards): restore full Classifier and SARM implementations

* test(rewards): restore missing CUDA and mixed precision classifier processor tests

* refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train

* refactor(lerobot_train.py): add missing sampling weight script

* linter + missing files

* add testing for sampl weighter

* revert some useless changes, improve typing

* update docs

* add automatic detection of the progress path

* remove type exp

* improve comment

* fix: move rabc.py to rewards/sarm/ and update import paths

* refactor(imports): update reward model imports to new module structure

* refactor(imports): update reward model imports to reflect new module structure

* refactor(imports): conditionally import pandas based on availability

* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig

* refactor(policies): remove reward model branches from policy factory and __init__

* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash

* feat(train): route reward model training through rewards/factory instead of policies/factory

* refactor(train): streamline reward model training logic

* fix(rewards): ensure FileNotFoundError is raised for missing config_file

* refactor(train): update __get_path_fields__ to include reward_model for config loading

* refactor(classifier): remove redundant input normalization in predict_reward method

* fix(train): raise ValueError for non-trainable reward models in train function

* refactor(pretrained_rm): add model card template

* refactor(tests): reward models

* refactor(sarm): update reset method and remove unused action prediction methods

* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function

* fix(train): raise ValueError for PEFT usage in reward model training

* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2026-04-28 17:56:24 +02:00
Steven Palma 03ee50e08f chore(ci): bump docs workflows (#3476) 2026-04-28 15:06:44 +02:00
Steven Palma ca87ccd941 feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI (#3413)
* feat(scripts): lerobot-rollout

* fix(rollout) require dataset in dagger + use duration too

* fix(docs): dagger num_episodes

* test(rollout): fix expectations

* fix(rollout): features check

* fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config

* docs(rollout): edit rename_map instructions

* chore(rollout): multiple minor improvements

* chore(rollout): address coments + minor improvements

* fix(rollout): enable default

* fix(tests): default value RTCConfig

* fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): prevent relativeactions with sync inference engine

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): rtc reanchor to non normalized state

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(rollout): fixing the episode length to use hwc (#3469)

also reducing default length to 5 minutes

* feat(rollout): go back to initial position is now a config

* fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470)

* chore(rollout): note about dagger correction stage

* chore(docs): update comments and docstring

* fix(test): move rtc relative out of rollout module

* fix(rollout): address the review comments

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>
2026-04-28 00:57:35 +02:00
Steven Palma 77352c495c chore(dependencies): update uv.lock (#3437)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-27 23:15:46 +02:00
Steven Palma 05a5223885 fix(pi): avoid peak RAM in PiGemma construction by freeing replaced submodules (#3454)
Co-Authored-By: Daiki Kamata <daiki.kamata@access-company.com>
Co-Authored-By: Jack Vial <jackvial@users.noreply.github.com>
Co-Authored-By: Ajay Anubolu <AjAnubolu@users.noreply.github.com>
Co-Authored-By: Finn F. <F-Fer@users.noreply.github.com>
2026-04-24 17:50:12 +02:00
Steven Palma 580d818aa9 fix(dataset): no default overwrite in lerobot tool recompute stats (#3452) 2026-04-24 15:07:19 +02:00
Steven Palma 587aa82021 fix(imports): realsense import name is platform dependent (#3451) 2026-04-24 12:55:38 +02:00
Chuyao Shen 12b88fce02 not use dataclass (#3414)
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-04-24 11:26:59 +02:00
masato-ka fc6c94c82a fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in… (#3419)
* fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding

In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return BaseModelOutputWithPooling instead of a plain torch.FloatTensor.
Added isinstance check to extract pooler_output when the return value is not
a tensor, maintaining backward compatibility with transformers 4.x.

Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'

* Adding assertion check for pooler_output of CLIP. This change is response to below comment.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387

* Adding assertion check for pooler_output of CLIP. This change is response to below comment. Change to simple check and rise
https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776

---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-04-23 16:26:58 +02:00
Steven Palma 1add460678 fix(policy): loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT (#3442)
* Fix loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT

When action_is_pad masks out padded timesteps, the subsequent .mean()
still divides by the total element count (including zeroed-out padding),
underestimating the loss. With 60-70% padding this can cut the effective
gradient signal by 2-3x.

Replace mask-then-mean with mask-then-sum / valid-count for all three
affected policies. TDMPC is not affected because it sums over time
before averaging over batch.

Fixes #3353

* linting

Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>

* Update src/lerobot/policies/diffusion/modeling_diffusion.py

Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py

Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py

Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* apply ACT loss normalization suggestion from review

Divide by num_valid (timesteps * action_dim) instead of just timesteps,
matching the diffusion/multi_task_dit fix. Addresses review from
@whats2000 (https://github.com/huggingface/lerobot/pull/3377#discussion_r3106845791).

* fix(test): update safetensor act

---------

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>
2026-04-23 15:23:54 +02:00
Qi Jia 4587c2b648 fix xvla docs (#3291)
Co-authored-by: Qi Jia <kaufou@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-04-23 14:50:32 +02:00
whats2000 2236cdb302 fix(smolvla): correct loss normalization for padded actions (#3434)
Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for
ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the
`action_is_pad` mask to zero out padded timesteps, then calls `.mean()`
(or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number
of elements including the zeroed padding, the loss is diluted by the
padding fraction.

Fixed by normalizing only over valid (non-padded) scalar entries:

    num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1)
    loss = losses.sum(...) / num_valid

`clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both
reduction paths are updated. Behavior when `action_is_pad` is missing is
unchanged (`losses.mean()`).

Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2,
30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` —
same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for
ACT. Heavier-padding recipes will see a larger gap.

Refs: #3353 (original report for ACT), #3377 (fix for the other three
policies).
2026-04-23 10:34:11 +02:00
Steven Palma 7c2466979e chore(dependencies): update uv.lock (#3408)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-22 16:38:51 +02:00
Pepijn 39b966e20a docs(agents): add AGENT_GUIDE.md for user facing agent (#3430)
* docs(agents): add AGENT_GUIDE.md with SO-101, data, policy, training, eval guidance

Adds an agent-facing companion to AGENTS.md that helps AI agents (Cursor,
Claude, ChatGPT, etc.) guide end-users through LeRobot without needing to
re-read every doc:

- Mandatory "ask the user first" block (goal, hardware, GPU, skill level)
- SO-101 end-to-end cheat-sheet: install -> calibrate -> record -> train -> eval
- Data-collection tips distilled from the folding project (practice before
  you record, quality > speed, start constrained then add diversity)
- Policy decision table with indicative profiling numbers (update ms, peak
  GPU mem) and AdamW-vs-SGD caveats
- Training duration guidance: 5-10 epoch rule, epoch<->step conversion,
  scheduler/checkpoint scaling with --steps, SmolVLA unfreeze tip
- Real-robot eval via lerobot-record --policy.path and sim eval via
  lerobot-eval, including the pre-baked docker/Dockerfile.benchmark.* images

AGENTS.md gets a short pointer to AGENT_GUIDE.md at the top.
CLAUDE.md (symlink to AGENTS.md) inherits the pointer automatically.

Made-with: Cursor

* docs(agents): recommend 2 cameras (front + wrist) as default

Made-with: Cursor

* docs(agents): add Feetech wiring check and broaden visualizer note

Made-with: Cursor

* docs(agents): clarify Feetech LED behavior (steady-on, not flash)

Made-with: Cursor

* docs(agents): expand Feetech troubleshooting (blinking LED, 5V vs 12V variants)

Made-with: Cursor

* docs(agents): tighten Feetech LED wording

Made-with: Cursor
2026-04-22 11:54:19 +02:00
Pepijn ba27aab79c fix(robotwin): pin compatible curobo in benchmark image (#3427)
* fix(robotwin): pin compatible curobo in benchmark image

* fix(robotwin): make curobo smoke check gpu-free
2026-04-21 19:51:44 +02:00
Pepijn 5adad11128 feat(sim): VLABench benchmark integration (#3396)
feat(sim): add VLABench benchmark integration
Add VLABench as a new simulation benchmark in LeRobot, following the existing LIBERO and MetaWorld patterns.
This PR wires VLABench end-to-end across environment integration, Docker setup, CI smoke evaluation, and documentation. It also fixes a number of upstream packaging and runtime issues required to make VLABench usable and reproducible in CI.
What’s included
Benchmark integration
Add VLABench as a new simulation benchmark.
Expose supported VLABench tasks through the LeRobot env interface.
Follow the established LIBERO / MetaWorld factory patterns.
Preserve lazy async-env metadata so env.unwrapped.metadata["render_fps"] continues to work.
CI smoke evaluation
Add a VLABench smoke-eval job using lerobot/smolvla_vlabench.
Use the correct rename_map for the 3-camera dataset layout.
Expand smoke coverage from 1 to 10 primitive tasks.
Extract task descriptions after eval so metrics artifacts include per-task labels.
Skip Docker Hub login when secrets are unavailable (e.g. fork PRs).
Docker / install fixes
Install VLABench from GitHub rather than PyPI.
Use uv pip, not pip, in the base image.
Fail loudly on install errors instead of masking them.
Clone VLABench into the non-root user’s home directory.
Use shallow editable installs for VLABench and rrt-algorithms to work around missing __init__.py issues.
Pin upstream clones to exact commit SHAs for reproducibility.
Add undeclared runtime dependencies required by VLABench (open3d, colorlog, scikit-learn, openai).
Unpin open3d so Python 3.12 wheels resolve.
Assets
Support downloading VLABench assets from a Hugging Face Hub mirror via VLABENCH_ASSETS_REPO.
Keep Google Drive download support as fallback.
Install huggingface_hub[hf_xet] so Xet-backed assets download correctly.
Validate required mesh/XML asset subtrees at build time.
Patch VLABench constants to tolerate missing asset directories at import time.
Runtime / env correctness
Import VLABench robots and tasks explicitly so decorator-based registry population happens.
Resize and normalize camera observations so they always match the declared (H, W, 3) uint8 observation space.
Reinstall LeRobot editably inside the image so the new env code is actually used.
Coerce agent_pos / ee_state to the expected shape.
Pad actions when needed to match data.ctrl.
Replace zero-padding fallback with proper dm_control IK for 7D end-effector actions.
Refetch dm_control physics on each step instead of caching weakrefs.
Retry unstable resets with reseeding and handle PhysicsError gracefully at step time.
Dataset / policy alignment
Align VLABench observations and actions with Hugging Face dataset conventions used by lerobot/vlabench_unified:
convert EE position between world frame and robot-base frame at the env boundary,
expose / consume Euler XYZ instead of raw quaternion layout,
align gripper semantics with dataset convention (1 = open, 0 = closed).
This fixes policy/env mismatches that previously caused incorrect IK targets and unstable behavior at evaluation time.
Docs
Add a full docs/source/vlabench.mdx page aligned with the standard benchmark template.
Document task selection forms (single task, comma list, suite shortcut).
Document installation, evaluation, training, and result reproduction.
Point examples at lerobot/smolvla_vlabench.
Add a benchmark banner image.
Remove outdated / misleading references to upstream evaluation tracks.
Document manual install flow instead of a broken vlabench extra.
Packaging cleanup
Remove the unresolvable vlabench extra from pyproject.toml.
Remove the no-op VLABench processor step.
Remove the obsolete env unit test that only covered the dropped gripper remap helper.
Apply formatting / logging / style cleanup from review feedback.
Why this is needed
VLABench is not currently consumable as a normal Python dependency and requires several upstream workarounds:
no PyPI release,
missing package declarations,
undeclared runtime deps,
SSH-only submodule references,
asset downloads outside normal package install flow,
registry population that depends on import side effects,
env outputs that do not always match declared observation shapes,
task resets that can diverge under some random layouts.
This PR makes the benchmark usable in LeRobot despite those constraints, and ensures CI runs are reproducible and informative.
If you want a much shorter squash commit message, I’d use this:
feat(sim): integrate VLABench benchmark with CI, Docker, and docs
Add VLABench as a new LeRobot simulation benchmark, following the existing LIBERO / MetaWorld patterns.
This includes:
LeRobot env integration and task exposure,
CI smoke eval with lerobot/smolvla_vlabench,
Docker install and asset-download fixes,
runtime fixes for registry loading, assets, camera obs, action handling, dm_control IK, and PhysicsError recovery,
alignment of obs/action semantics with HF VLABench datasets,
docs and packaging cleanup.
The PR also incorporates review feedback, improves reproducibility by pinning upstream commits, and makes VLABench usable in CI despite upstream packaging and asset-management issues.
2026-04-21 17:54:11 +02:00
Pepijn a07f22e22c feat(envs): add LIBERO-plus robustness benchmark (#3313)
* feat(envs): add LIBERO-plus robustness benchmark integration

- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra

* fix(libero): use suite's perturbation-aware init_states loader

LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that
strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`,
`_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init`
file — the on-disk name for a perturbation variant doesn't exist as a file,
only the base does. lerobot's loader was bypassing that logic and trying to
read the suffix-bearing filename directly, which failed for every non-zero
task id and killed the eval before any rollout video could be written.

Delegate to the suite's method when it exists; fall back to the path-based
loader for vanilla LIBERO (which does not provide the method).

Also drop the hf-libero install + init_files copy from the LIBERO-plus
Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and
`init_files/` for all five suites, so the copy was unnecessary and the
`cp -r` into an existing dir produced a confusing nested layout.

* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves

Delegating to `task_suite.get_task_init_states(i)` works for path resolution
but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`,
which fails on PyTorch 2.6+ because the pickled init_states contains numpy
objects not in the default allowlist:

    _pickle.UnpicklingError: Weights only load failed.
    WeightsUnpickler error: Unsupported global:
      GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.

Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`,
`_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass
`weights_only=False` ourselves. Vanilla LIBERO task names don't contain any
of these patterns except for `_table_` when followed by the word `center`
(e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex
requires `_table_\\d+` so semantic uses are preserved.

* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus

LIBERO-plus's bddl_base_domain.py resolves scene XMLs with
`os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml
has no effect on scene lookup — MuJoCo always opens
`<clone>/libero/libero/assets/scenes/...`. With no such directory present,
every perturbation task fails on:

    FileNotFoundError: No such file or directory:
      .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml

These textures, views, and extra objects ship only in the 6.4 GB `assets.zip`
published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says
to download and unzip it into the package dir). Fetch it via `hf_hub_download`,
unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at
the extracted dir so everything stays consistent. The download lives in its
own Docker layer so subsequent rebuilds reuse the cached assets.

Drops the lerobot/libero-assets snapshot_download — that mirror only has
vanilla LIBERO textures and is ignored for scene loading anyway.

* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip

The 6.4 GB zip ships with every entry prefixed by
`inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...`
(the author's internal filesystem layout, not the layout the LIBERO-plus
README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created
`${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened
`${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml`
and hit the same FileNotFoundError.

Extract to a scratch dir, then `mv` the nested `assets/` subtree to the
expected location. Verified the target file exists in the zip central
directory under that exact prefix.

* refactor(libero): inline init_states resolver behind single regex

Collapse the three-style suffix stripper (split/re.sub/in) into one
compiled regex, drop the (Path, bool) tuple return, and move the
`_add_`/`_level` reshape branch into the caller so each branch loads
its own file and returns directly. Net: -11 lines, one fewer helper.

* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu

Mirror the libero/metaworld/robomme pattern: start from the nightly GPU
image (apt deps, python, uv, venv, lerobot[all] already there) and only
layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build
deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the
PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.

Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init) the nightly already provides.
123 → 73 lines.

Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so
  doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to
  the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in
  from an unrelated resolve — unrelated to LIBERO-plus).

* fix(libero-plus): stop touching pyproject + uv.lock

The fast-tests job was rejecting the branch because pyproject.toml had a
[libero_plus] extra whose git dep wasn't represented in uv.lock.

The Docker image no longer needs the extra — it clones LIBERO-plus
directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from
pyproject and restore pyproject.toml + uv.lock to exactly what's on
origin/main, so `uv sync --locked --extra test` is a no-op for this PR.

Also repoint the doc/CI/env comments that still mentioned the extra at
the Docker install path.

* fix(libero-plus): strip perturbation metadata from task descriptions

LIBERO-plus builds task.language by space-joining the perturbation-variant
filename, so every non-_language_ variant inherits a trailing blob like
"view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the
dashboard video labels and no longer matches the base instruction stored
in the training dataset.

Strip those tokens in extract_task_descriptions.py with an end-anchored
regex over the {view,initstate,noise,add,tb,table,light,level}(+digits)
vocabulary. The anchor preserves mid-sentence literal uses of those words
(e.g. "from table center and place it on the plate") — only the trailing
metadata chain is removed. _language_ variants carry real BDDL-sourced
text and are left untouched.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3313 review feedback

- docs: fix paper link to arxiv, add benchmark image, add suite descriptions,
  add LIBERO-plus replacement warning, restructure eval section to match
  LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded
  asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO
  always takes the simple path

* fix(docs): use correct LIBERO-plus teaser image URL

* ci(libero-plus): drop redundant hf auth login step

The standalone login step ran `hf auth login` in a throwaway
`docker run --rm` container, so no credentials persisted. Auth is
already performed inside the eval step's container. Removing the
redundant step per PR #3313 review feedback.

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch. Without these attributes eval crashes
when calling `env.unwrapped.metadata["render_fps"]` with async vector
envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and
caches the metadata alongside obs/action spaces in the LIBERO and
MetaWorld factories.

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step before any of
the actual build/eval work could run. Gate the login on the env-var
expansion of the username so the step is skipped (not failed) when
secrets are absent. Mirrors the existing pattern in the VLABench job.

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update scripts/ci/extract_task_descriptions.py

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update docker/Dockerfile.benchmark.libero_plus

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(libero-plus): address review feedback

* ci(libero-plus): fix YAML indentation in upload-artifact steps

The `uses:` key on two upload-artifact steps was at column 0 instead
of nested under the step, causing `pre-commit run check-yaml` to fail
with "expected <block end>, but found '<block mapping start>'".


Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
2026-04-20 21:07:21 +02:00
Pepijn 282c31cfef feat(envs): add RoboMME benchmark (#3311)
* feat(envs): add RoboMME benchmark integration

- RoboMME env wrapper with image/wrist_image/state observations
- Docker image with Vulkan, SAPIEN, mani-skill deps
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme
- preprocess_observation: handle image/wrist_image/state keys
- pyproject.toml: robomme extra

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu

Mirror the libero/metaworld pattern: start from the nightly GPU image
(which already has apt deps, uv, venv, and lerobot[all] preinstalled)
and only layer on what RoboMME uniquely needs — the Vulkan libs
ManiSkill/SAPIEN requires, plus the robomme extra with the
gymnasium/numpy overrides.

Drops 48 lines of duplicated base setup (CUDA FROM, python install,
user creation, venv init, base apt deps) that the nightly image already
provides. Net: 102 → 54 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme

- Remove the "Related work" block referencing the prototype branch
  feat/robomme-integration; the PR stands on its own.
- Point all dataset references at lerobot/robomme (docs, env module
  docstring, RoboMMEEnvConfig docstring) — this is the canonical HF
  location once the dataset is mirrored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(robomme): make docs build + fast tests green

1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's
   TOC integrity check stops rejecting the new page.

2. Fast tests: robomme's mani-skill transitively pins numpy<2 which is
   unsatisfiable against the project's numpy>=2 base pin, so `uv sync`
   couldn't resolve a universal lockfile.

   Drop robomme as a pyproject extra entirely — it truly cannot coexist
   with the rest of the dep tree. The Dockerfile installs robomme
   directly from its git URL via `uv pip install --override`, which was
   already the runtime path. pyproject, docs, env docstrings, and the
   CI job comment all now point to the docker-only install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(robomme): realign unit tests with current env API

The tests were written against an earlier env layout and never updated when
the wrapper was refactored, so CI's fast-test job was failing with:

- KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the
  lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset
  columns and preprocess_observation's built-in fallbacks).
- AssertionError: 'robomme' not in result — create_robomme_envs now
  returns {task_name: {task_id: env}}, not {'robomme': {...}}, so
  comma-separated task lists work.
- ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was
  removed; create_robomme_envs is straightforward synchronous now.

Rewrite the 7 failing cases against the current API, drop the three
LazyVectorEnv tests, and add a multi-task test so the new comma-separated
task parsing is covered. Stub install/teardown is moved into helpers
(`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests
stop repeating six boilerplate lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3311 review feedback

- envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos
- envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config
- envs: remove special-case obs conversion in utils.py (no longer needed)
- ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0]
- scripts: extract_task_descriptions supports multiple task_ids
- docs: title to # RoboMME, add image, restructure eval section
- tests: update all key assertions to match new obs naming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): use correct RoboMME teaser image URL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(robomme): smoke-eval 10 tasks instead of 5

Broader coverage on the RoboMME benchmark CI job: bump the smoke eval
from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS.

Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg,
SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock.

Updated the parse_eval_metrics.py `--task` label from the single
`PickXtimes` stub to the full comma list so the metrics artifact
reflects what was actually run. `parse_eval_metrics.py` already reads
`overall` for multi-task runs, so no parser change is needed.

Made-with: Cursor

* fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up

`_convert_obs` was returning flat keys (`pixels/image`,
`pixels/wrist_image`). `preprocess_observation()` in envs/utils.py
keys off the top-level `"pixels"` entry and, not finding it,
silently dropped every image from the batch. The policy then saw
zero image features and raised

    ValueError: All image features are missing from the batch.

Match the LIBERO layout: return
`{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}`
and declare the same shape in `observation_space`.

Made-with: Cursor

* fix(robomme): align docs and tests with nested pixels obs layout

Addresses PR #3311 review feedback:
- Docs: correct observation keys to `pixels/image` / `pixels/wrist_image`
  (mapped to `observation.images.image` / `observation.images.wrist_image`)
  and drop the now-obsolete column-rename snippet.
- Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`,
  matching the nested layout required by `preprocess_observation()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step. Gate the login
on the env-var expansion of the username so the step is skipped (not
failed) when secrets are absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(robomme): address review feedback

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 20:21:27 +02:00
Pepijn a147fa4439 feat(envs): add RoboCerebra long-horizon manipulation benchmark (#3314)
* feat(ci): add RoboCerebra benchmark eval job

- Docker image with robosuite/libero deps for RoboCerebra eval
- CI workflow: 1-episode eval with pepijn223/smolvla_robocerebra
- Reuses libero env with rename_map + empty_cameras=3

* docs(robocerebra): add benchmark page and toctree entry

Add a dedicated docs page for RoboCerebra that points at the canonical
dataset lerobot/robocerebra_unified and shows how to run eval + fine-tune
against it. Wire it into the Benchmarks section of the toctree so
doc-builder picks it up.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.

* fix(robocerebra): drop alias extra + simplify docker image

Two problems rolled up:

1. `uv sync --locked --extra test` was failing because pyproject.toml added
   a `robocerebra = ["lerobot[libero]"]` alias extra but uv.lock wasn't
   regenerated. Drop the alias — nothing in CI actually needs the extra
   name (the Dockerfile just installs what it needs directly), so this
   restores pyproject.toml and uv.lock to byte-exact origin/main.

2. Rebase docker/Dockerfile.benchmark.robocerebra on
   huggingface/lerobot-gpu:latest (same pattern as libero/metaworld/…).
   The nightly image already ships lerobot[all] which includes [libero],
   so the RoboCerebra image is essentially identical to the LIBERO one:
   fetch libero-assets, write ~/.libero/config.yaml, overlay source.
   92 → 43 lines.

Also repoint the CI workflow comment that referenced the removed extra.

* ci: use dedicated lerobot/smolvla_robocerebra checkpoint for smoke eval

Replace the generic pepijn223/smolvla_libero placeholder with the
purpose-trained lerobot/smolvla_robocerebra model in the RoboCerebra
CI smoke test.

* fix(ci): align RoboCerebra eval with training pipeline

Fixes 5 mismatches that caused 0% success rate:
- env.type: robocerebra (unregistered) → libero
- resolution: 360x360 (default) → 256x256 (matches dataset)
- camera_name_mapping: map eye_in_hand → wrist_image (not image2)
- empty_cameras: 3 → 1 (matches training)
- add HF_USER_TOKEN guard on eval step

* fix(ci): set env.fps=20 and explicit obs_type for RoboCerebra eval

Match the dataset's 20 FPS (LiberoEnv defaults to 30) and make
obs_type=pixels_agent_pos explicit for safety against future changes.

* docs(robocerebra): align page with adding_benchmarks template

Rework docs/source/robocerebra.mdx to follow the standard benchmark
doc structure: intro + links + available tasks + installation + eval
+ recommended episodes + policy I/O + training + reproducing results.

- Point everything at lerobot/smolvla_robocerebra (the released
  checkpoint), not the personal pepijn223 mirror.
- Add the --env.fps=20 and --env.obs_type=pixels_agent_pos flags
  that CI actually uses, so copy-paste eval reproduces CI.
- Split the "Training" block out of the recipe section into its own
  section with the feature table.
- Add an explicit "Reproducing published results" section pointing
  at the CI smoke eval.

* fix: integrate PR #3314 review feedback

- ci(robocerebra): drop redundant hf auth login step (auth is
  already performed inside the eval step's container).
- ci(robocerebra): add Docker Hub login before the image build
  to pick up the authenticated rate limit.
- docs(robocerebra): align eval snippet with the CI command
  (observation size, camera_name_mapping, use_async_envs, device,
  empty_cameras=1).

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch.

* ci: gate Docker Hub login on secret availability

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
2026-04-20 19:12:15 +02:00