Files
lerobot/tests
Pepijn 282c31cfef feat(envs): add RoboMME benchmark (#3311)
* feat(envs): add RoboMME benchmark integration

- RoboMME env wrapper with image/wrist_image/state observations
- Docker image with Vulkan, SAPIEN, mani-skill deps
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme
- preprocess_observation: handle image/wrist_image/state keys
- pyproject.toml: robomme extra

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu

Mirror the libero/metaworld pattern: start from the nightly GPU image
(which already has apt deps, uv, venv, and lerobot[all] preinstalled)
and only layer on what RoboMME uniquely needs — the Vulkan libs
ManiSkill/SAPIEN requires, plus the robomme extra with the
gymnasium/numpy overrides.

Drops 48 lines of duplicated base setup (CUDA FROM, python install,
user creation, venv init, base apt deps) that the nightly image already
provides. Net: 102 → 54 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme

- Remove the "Related work" block referencing the prototype branch
  feat/robomme-integration; the PR stands on its own.
- Point all dataset references at lerobot/robomme (docs, env module
  docstring, RoboMMEEnvConfig docstring) — this is the canonical HF
  location once the dataset is mirrored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(robomme): make docs build + fast tests green

1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's
   TOC integrity check stops rejecting the new page.

2. Fast tests: robomme's mani-skill transitively pins numpy<2 which is
   unsatisfiable against the project's numpy>=2 base pin, so `uv sync`
   couldn't resolve a universal lockfile.

   Drop robomme as a pyproject extra entirely — it truly cannot coexist
   with the rest of the dep tree. The Dockerfile installs robomme
   directly from its git URL via `uv pip install --override`, which was
   already the runtime path. pyproject, docs, env docstrings, and the
   CI job comment all now point to the docker-only install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(robomme): realign unit tests with current env API

The tests were written against an earlier env layout and never updated when
the wrapper was refactored, so CI's fast-test job was failing with:

- KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the
  lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset
  columns and preprocess_observation's built-in fallbacks).
- AssertionError: 'robomme' not in result — create_robomme_envs now
  returns {task_name: {task_id: env}}, not {'robomme': {...}}, so
  comma-separated task lists work.
- ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was
  removed; create_robomme_envs is straightforward synchronous now.

Rewrite the 7 failing cases against the current API, drop the three
LazyVectorEnv tests, and add a multi-task test so the new comma-separated
task parsing is covered. Stub install/teardown is moved into helpers
(`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests
stop repeating six boilerplate lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3311 review feedback

- envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos
- envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config
- envs: remove special-case obs conversion in utils.py (no longer needed)
- ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0]
- scripts: extract_task_descriptions supports multiple task_ids
- docs: title to # RoboMME, add image, restructure eval section
- tests: update all key assertions to match new obs naming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): use correct RoboMME teaser image URL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(robomme): smoke-eval 10 tasks instead of 5

Broader coverage on the RoboMME benchmark CI job: bump the smoke eval
from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS.

Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg,
SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock.

Updated the parse_eval_metrics.py `--task` label from the single
`PickXtimes` stub to the full comma list so the metrics artifact
reflects what was actually run. `parse_eval_metrics.py` already reads
`overall` for multi-task runs, so no parser change is needed.

Made-with: Cursor

* fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up

`_convert_obs` was returning flat keys (`pixels/image`,
`pixels/wrist_image`). `preprocess_observation()` in envs/utils.py
keys off the top-level `"pixels"` entry and, not finding it,
silently dropped every image from the batch. The policy then saw
zero image features and raised

    ValueError: All image features are missing from the batch.

Match the LIBERO layout: return
`{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}`
and declare the same shape in `observation_space`.

Made-with: Cursor

* fix(robomme): align docs and tests with nested pixels obs layout

Addresses PR #3311 review feedback:
- Docs: correct observation keys to `pixels/image` / `pixels/wrist_image`
  (mapped to `observation.images.image` / `observation.images.wrist_image`)
  and drop the now-obsolete column-rename snippet.
- Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`,
  matching the nested layout required by `preprocess_observation()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step. Gate the login
on the env-var expansion of the username so the step is skipped (not
failed) when secrets are absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(robomme): address review feedback

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 20:21:27 +02:00
..