From 282c31cfef4ebe80dfd773008aaa07891a0b55d2 Mon Sep 17 00:00:00 2001 From: Pepijn <138571049+pkooij@users.noreply.github.com> Date: Mon, 20 Apr 2026 20:21:27 +0200 Subject: [PATCH] feat(envs): add RoboMME benchmark (#3311) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(envs): add RoboMME benchmark integration - RoboMME env wrapper with image/wrist_image/state observations - Docker image with Vulkan, SAPIEN, mani-skill deps - CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme - preprocess_observation: handle image/wrist_image/state keys - pyproject.toml: robomme extra Co-Authored-By: Claude Opus 4.6 (1M context) * refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu Mirror the libero/metaworld pattern: start from the nightly GPU image (which already has apt deps, uv, venv, and lerobot[all] preinstalled) and only layer on what RoboMME uniquely needs — the Vulkan libs ManiSkill/SAPIEN requires, plus the robomme extra with the gymnasium/numpy overrides. Drops 48 lines of duplicated base setup (CUDA FROM, python install, user creation, venv init, base apt deps) that the nightly image already provides. Net: 102 → 54 lines. Co-Authored-By: Claude Opus 4.6 (1M context) * docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme - Remove the "Related work" block referencing the prototype branch feat/robomme-integration; the PR stands on its own. - Point all dataset references at lerobot/robomme (docs, env module docstring, RoboMMEEnvConfig docstring) — this is the canonical HF location once the dataset is mirrored. Co-Authored-By: Claude Opus 4.6 (1M context) * fix(robomme): make docs build + fast tests green 1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's TOC integrity check stops rejecting the new page. 2. 
Fast tests: robomme's mani-skill transitively pins numpy<2 which is unsatisfiable against the project's numpy>=2 base pin, so `uv sync` couldn't resolve a universal lockfile. Drop robomme as a pyproject extra entirely — it truly cannot coexist with the rest of the dep tree. The Dockerfile installs robomme directly from its git URL via `uv pip install --override`, which was already the runtime path. pyproject, docs, env docstrings, and the CI job comment all now point to the docker-only install. Co-Authored-By: Claude Opus 4.6 (1M context) * test(robomme): realign unit tests with current env API The tests were written against an earlier env layout and never updated when the wrapper was refactored, so CI's fast-test job was failing with: - KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset columns and preprocess_observation's built-in fallbacks). - AssertionError: 'robomme' not in result — create_robomme_envs now returns {task_name: {task_id: env}}, not {'robomme': {...}}, so comma-separated task lists work. - ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was removed; create_robomme_envs is straightforward synchronous now. Rewrite the 7 failing cases against the current API, drop the three LazyVectorEnv tests, and add a multi-task test so the new comma-separated task parsing is covered. Stub install/teardown is moved into helpers (`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests stop repeating six boilerplate lines. Co-Authored-By: Claude Opus 4.6 (1M context) * ci: point benchmark eval checkpoints at the lerobot/ org mirrors pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward. 
Co-Authored-By: Claude Opus 4.6 (1M context) * fix: integrate PR #3311 review feedback - envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos - envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config - envs: remove special-case obs conversion in utils.py (no longer needed) - ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0] - scripts: extract_task_descriptions supports multiple task_ids - docs: title to # RoboMME, add image, restructure eval section - tests: update all key assertions to match new obs naming Co-Authored-By: Claude Opus 4.6 (1M context) * fix(docs): use correct RoboMME teaser image URL Co-Authored-By: Claude Opus 4.6 (1M context) * ci(robomme): smoke-eval 10 tasks instead of 5 Broader coverage on the RoboMME benchmark CI job: bump the smoke eval from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS. Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg, SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock. Updated the parse_eval_metrics.py `--task` label from the single `PickXtimes` stub to the full comma list so the metrics artifact reflects what was actually run. `parse_eval_metrics.py` already reads `overall` for multi-task runs, so no parser change is needed. Made-with: Cursor * fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up `_convert_obs` was returning flat keys (`pixels/image`, `pixels/wrist_image`). `preprocess_observation()` in envs/utils.py keys off the top-level `"pixels"` entry and, not finding it, silently dropped every image from the batch. The policy then saw zero image features and raised ValueError: All image features are missing from the batch. Match the LIBERO layout: return `{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}` and declare the same shape in `observation_space`. 
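To illustrate the failure mode described above, here is a minimal sketch of the lookup. `collect_image_keys` is a hypothetical, simplified stand-in written for illustration. It is not lerobot's `preprocess_observation()`. It inspects only a top-level `"pixels"` dict, so the old flat `pixels/image` keys contribute nothing, while the nested LIBERO-style layout yields one `observation.images.<camera>` entry per camera.

```python
from typing import Any

import numpy as np


def collect_image_keys(obs: dict[str, Any]) -> dict[str, np.ndarray]:
    """Hypothetical, simplified model of the image lookup described above.

    Only a top-level "pixels" dict is inspected. A flat key such as
    "pixels/image" is never looked at, so its frame is silently dropped.
    """
    pixels = obs.get("pixels")
    if not isinstance(pixels, dict):
        return {}
    # One policy key per camera: "observation.images.<camera_name>".
    return {f"observation.images.{cam}": frame for cam, frame in pixels.items()}


frame = np.zeros((256, 256, 3), dtype=np.uint8)
state = np.zeros(8, dtype=np.float32)

# Old flat layout returned by `_convert_obs` before the fix.
flat_obs = {"pixels/image": frame, "pixels/wrist_image": frame, "agent_pos": state}
# Nested layout matching LIBERO and the updated `observation_space`.
nested_obs = {"pixels": {"image": frame, "wrist_image": frame}, "agent_pos": state}

print(sorted(collect_image_keys(flat_obs)))    # []
print(sorted(collect_image_keys(nested_obs)))  # ['observation.images.image', 'observation.images.wrist_image']
```

With the flat layout the policy sees zero image features. That is the "All image features are missing from the batch" error quoted above.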
Made-with: Cursor * fix(robomme): align docs and tests with nested pixels obs layout Addresses PR #3311 review feedback: - Docs: correct observation keys to `pixels/image` / `pixels/wrist_image` (mapped to `observation.images.image` / `observation.images.wrist_image`) and drop the now-obsolete column-rename snippet. - Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`, matching the nested layout required by `preprocess_observation()`. Co-Authored-By: Claude Opus 4.7 (1M context) * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. Co-Authored-By: Claude Opus 4.7 (1M context) * ci: gate Docker Hub login on secret availability Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Co-Authored-By: Claude Opus 4.7 (1M context) * fix(robomme): address review feedback --------- Co-authored-by: Claude Opus 4.6 (1M context) --- .github/workflows/benchmark_tests.yml | 104 ++++++++++ docker/Dockerfile.benchmark.robomme | 56 ++++++ docs/source/_toctree.yml | 2 + docs/source/robomme.mdx | 130 +++++++++++++ pyproject.toml | 4 + scripts/ci/extract_task_descriptions.py | 45 +++++ src/lerobot/envs/configs.py | 57 ++++++ src/lerobot/envs/robomme.py | 245 ++++++++++++++++++++++++ tests/test_robomme_env.py | 232 ++++++++++++++++++++++ 9 files changed, 875 insertions(+) create mode 100644 docker/Dockerfile.benchmark.robomme create mode 100644 docs/source/robomme.mdx create mode 100644 src/lerobot/envs/robomme.py create mode 100644 tests/test_robomme_env.py diff --git a/.github/workflows/benchmark_tests.yml b/.github/workflows/benchmark_tests.yml index b65883f1a..d055365bd 100644 --- a/.github/workflows/benchmark_tests.yml +++ b/.github/workflows/benchmark_tests.yml @@ -632,3 +632,107 @@ jobs: name: robocerebra-metrics path: 
/tmp/robocerebra-artifacts/metrics.json if-no-files-found: warn + + # ── ROBOMME ─────────────────────────────────────────────────────────────── + # Isolated image: mani-skill/SAPIEN/Vulkan chain with gymnasium and numpy + # overrides (robomme can't be a pyproject extra due to numpy<2 pin). + robomme-integration-test: + name: RoboMME — build image + 1-episode eval + runs-on: + group: aws-g6-4xlarge-plus + env: + HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }} + ROBOMME_POLICY: lerobot/smolvla_robomme + ROBOMME_TASKS: PickXtimes,BinFill,StopCube,MoveCube,InsertPeg,SwingXtimes,VideoUnmask,ButtonUnmask,PickHighlight,PatternLock + + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + with: + persist-credentials: false + lfs: true + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses] + with: + cache-binary: false + + - name: Login to Docker Hub + if: ${{ env.DOCKERHUB_USERNAME != '' }} + uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses] + with: + username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }} + password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }} + + - name: Build RoboMME benchmark image + uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses] + with: + context: . 
+ file: docker/Dockerfile.benchmark.robomme + push: false + load: true + tags: lerobot-benchmark-robomme:ci + + - name: Run RoboMME smoke eval (10 tasks, 1 episode each) + if: env.HF_USER_TOKEN != '' + run: | + docker run --name robomme-eval --gpus all \ + --shm-size=4g \ + -e HF_HOME=/tmp/hf \ + -e HF_USER_TOKEN="${HF_USER_TOKEN}" \ + -e HF_HUB_DOWNLOAD_TIMEOUT=300 \ + -e ROBOMME_POLICY="${ROBOMME_POLICY}" \ + -e ROBOMME_TASKS="${ROBOMME_TASKS}" \ + lerobot-benchmark-robomme:ci \ + bash -c " + hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true + lerobot-eval \ + --policy.path=\"\$ROBOMME_POLICY\" \ + --env.type=robomme \ + --env.task=\"\$ROBOMME_TASKS\" \ + --env.dataset_split=test \ + --env.task_ids=[0] \ + --eval.batch_size=1 \ + --eval.n_episodes=1 \ + --eval.use_async_envs=false \ + --policy.device=cuda \ + '--rename_map={\"observation.images.image\": \"observation.images.camera1\", \"observation.images.wrist_image\": \"observation.images.camera2\"}' \ + --policy.empty_cameras=3 \ + --output_dir=/tmp/eval-artifacts + python scripts/ci/extract_task_descriptions.py \ + --env robomme --task \"\$ROBOMME_TASKS\" \ + --output /tmp/eval-artifacts/task_descriptions.json + " + + - name: Copy RoboMME artifacts from container + if: always() + run: | + mkdir -p /tmp/robomme-artifacts + docker cp robomme-eval:/tmp/eval-artifacts/. 
/tmp/robomme-artifacts/ 2>/dev/null || true + docker rm -f robomme-eval || true + + - name: Parse RoboMME eval metrics + if: always() + run: | + python3 scripts/ci/parse_eval_metrics.py \ + --artifacts-dir /tmp/robomme-artifacts \ + --env robomme \ + --task "${ROBOMME_TASKS}" \ + --policy "${ROBOMME_POLICY}" + + - name: Upload RoboMME rollout video + if: always() + uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses] + with: + name: robomme-rollout-video + path: /tmp/robomme-artifacts/videos/ + if-no-files-found: warn + + - name: Upload RoboMME eval metrics + if: always() + uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses] + with: + name: robomme-metrics + path: /tmp/robomme-artifacts/metrics.json + if-no-files-found: warn diff --git a/docker/Dockerfile.benchmark.robomme b/docker/Dockerfile.benchmark.robomme new file mode 100644 index 000000000..2bfc83b4f --- /dev/null +++ b/docker/Dockerfile.benchmark.robomme @@ -0,0 +1,56 @@ +# Copyright 2026 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Benchmark image for RoboMME integration tests. +# Extends the nightly GPU image (which has lerobot[all]) with Vulkan system +# libs for ManiSkill/SAPIEN and the robomme package.
robomme isn't in [all] +# because mani-skill hard-pins gymnasium==0.29.1 and numpy<2.0.0 which +# conflict with lerobot's defaults; both are safe at runtime: +# - gymnasium 0.29.x has the same 5-tuple step() API as 1.x (since 0.26) +# - numpy 1.26.4 is API-compatible with lerobot's actual usage. +# +# Build: docker build -f docker/Dockerfile.benchmark.robomme -t lerobot-benchmark-robomme . +# Run: docker run --gpus all --rm lerobot-benchmark-robomme lerobot-eval ... + +FROM huggingface/lerobot-gpu:latest + +# NVIDIA Container Toolkit: expose Vulkan driver capability for headless rendering. +ENV NVIDIA_DRIVER_CAPABILITIES=all \ + VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json + +# ManiSkill/SAPIEN's renderer needs Vulkan, which isn't in the base image. +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends \ + libvulkan1 libvulkan-dev mesa-vulkan-drivers \ + && mkdir -p /usr/share/vulkan/icd.d \ + && echo '{"file_format_version":"1.0.0","ICD":{"library_path":"libGLX_nvidia.so.0","api_version":"1.3.0"}}' \ + > /usr/share/vulkan/icd.d/nvidia_icd.json \ + && apt-get clean && rm -rf /var/lib/apt/lists/* +USER user_lerobot + +# Install smolvla + av-dep via the PR's pyproject, then layer robomme on top +# with gymnasium/numpy overrides. robomme isn't a pyproject extra because its +# mani-skill pin conflicts with lerobot's base numpy>=2 (see pyproject.toml). +COPY --chown=user_lerobot:user_lerobot setup.py pyproject.toml uv.lock README.md MANIFEST.in ./ +RUN printf 'gymnasium==0.29.1\nnumpy==1.26.4\n' > /tmp/robomme_override.txt \ + && uv pip install --no-cache --override /tmp/robomme_override.txt \ + -e ".[smolvla,av-dep]" \ + "robomme @ git+https://github.com/RoboMME/robomme_benchmark.git@main" \ + && python -c "import robomme; print('robomme import OK')" + +# Overlay the PR's source code on top of the nightly image. +COPY --chown=user_lerobot:user_lerobot . . 
+ +CMD ["/bin/bash"] diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index ff3c08e96..9fab7cb37 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -85,6 +85,8 @@ title: RoboCasa365 - local: robocerebra title: RoboCerebra + - local: robomme + title: RoboMME - local: envhub_isaaclab_arena title: NVIDIA IsaacLab Arena Environments title: "Benchmarks" diff --git a/docs/source/robomme.mdx b/docs/source/robomme.mdx new file mode 100644 index 000000000..6613a3923 --- /dev/null +++ b/docs/source/robomme.mdx @@ -0,0 +1,130 @@ +# RoboMME + +[RoboMME](https://robomme.github.io) is a memory-augmented manipulation benchmark built on ManiSkill (SAPIEN). It evaluates a robot's ability to retain and use information across an episode — counting, object permanence, reference, and imitation. + +- **16 tasks** across 4 memory-skill suites +- **1,600 training demos** (100 per task, 50 val, 50 test) +- **Dataset**: [`lerobot/robomme`](https://huggingface.co/datasets/lerobot/robomme) — LeRobot v3.0, 768K frames at 10 fps +- **Simulator**: ManiSkill / SAPIEN, Panda arm, Linux only + +![RoboMME benchmark tasks overview](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2603.04639/gradient.png) + +## Tasks + +| Suite | Tasks | +| --------------------------------- | ------------------------------------------------------------- | +| **Counting** (temporal memory) | BinFill, PickXtimes, SwingXtimes, StopCube | +| **Permanence** (spatial memory) | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap | +| **Reference** (object memory) | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder | +| **Imitation** (procedural memory) | MoveCube, InsertPeg, PatternLock, RouteStick | + +## Installation + +> RoboMME requires **Linux** (ManiSkill/SAPIEN uses Vulkan rendering). Docker is recommended to isolate dependency conflicts. 
+ +### Native (Linux) + +```bash +uv pip install --override <(printf 'gymnasium==0.29.1\nnumpy==1.26.4\n') \ + -e '.[smolvla,av-dep]' \ + 'robomme @ git+https://github.com/RoboMME/robomme_benchmark.git@main' +``` + +> **Dependency note**: `mani-skill` (pulled in by `robomme`) pins `gymnasium==0.29.1` and `numpy<2.0.0`, which conflict with lerobot's base `numpy>=2.0.0`. That's why `robomme` is not a pyproject extra. Use the `uv pip install --override` command above (plain `pip` has no `--override` flag), or use the Docker approach below to avoid the conflict entirely. + +### Docker (recommended) + +```bash +# Build the RoboMME eval image from the repo root. It starts FROM the nightly +# huggingface/lerobot-gpu:latest image and applies the gymnasium + numpy pin overrides. +docker build -f docker/Dockerfile.benchmark.robomme -t lerobot-robomme . +``` + +The `docker/Dockerfile.benchmark.robomme` image pins `gymnasium==0.29.1` and `numpy==1.26.4` via `uv pip install --override` on top of the nightly image's lerobot install. Both pins are safe at runtime: gymnasium 0.29.x has the same 5-tuple `step()` API as 1.x, and numpy 1.26.4 is API-compatible with lerobot's actual usage. + +## Running Evaluation + +### Default (single task, single episode) + +```bash +lerobot-eval \ + --policy.path= \ + --env.type=robomme \ + --env.task=PickXtimes \ + --env.dataset_split=test \ + --env.task_ids=[0] \ + --eval.batch_size=1 \ + --eval.n_episodes=1 +``` + +### Multi-task evaluation + +Evaluate multiple tasks in one run by comma-separating task names. Use `task_ids` to control which episodes are evaluated per task. Recommended: 50 episodes per task for the test split.
+ +```bash +lerobot-eval \ + --policy.path= \ + --env.type=robomme \ + --env.task=PickXtimes,BinFill,StopCube,MoveCube,InsertPeg \ + --env.dataset_split=test \ + --env.task_ids=[0,1,2,3,4,5,6,7,8,9] \ + --eval.batch_size=1 \ + --eval.n_episodes=50 +``` + +### Key CLI options for `env.type=robomme` + +| Option | Default | Description | +| -------------------- | ------------- | -------------------------------------------------- | +| `env.task` | `PickXtimes` | Any of the 16 task names above (comma-separated) | +| `env.dataset_split` | `test` | `train`, `val`, or `test` | +| `env.action_space` | `joint_angle` | `joint_angle` (8-D) or `ee_pose` (7-D) | +| `env.episode_length` | `300` | Max steps per episode | +| `env.task_ids` | `null` | List of episode indices to evaluate (null = `[0]`) | + +## Dataset + +The dataset [`lerobot/robomme`](https://huggingface.co/datasets/lerobot/robomme) is in **LeRobot v3.0 format** and can be loaded directly: + +```python +from lerobot.datasets.lerobot_dataset import LeRobotDataset + +dataset = LeRobotDataset("lerobot/robomme") +``` + +### Dataset features + +| Feature | Shape | Description | +| ------------------ | ------------- | ------------------------------- | +| `image` | (256, 256, 3) | Front camera RGB | +| `wrist_image` | (256, 256, 3) | Wrist camera RGB | +| `actions` | (8,) | Joint angles + gripper | +| `state` | (8,) | Joint positions + gripper state | +| `simple_subgoal` | str | High-level language annotation | +| `grounded_subgoal` | str | Grounded language annotation | +| `episode_index` | int | Episode ID | +| `frame_index` | int | Frame within episode | + +### Feature key alignment (training) + +The env wrapper exposes `pixels/image` and `pixels/wrist_image` as observation keys. The `features_map` in `RoboMMEEnv` maps these to `observation.images.image` and `observation.images.wrist_image` for the policy. State is exposed as `agent_pos` and maps to `observation.state`. 
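The following sketch shows that key mapping. It assumes the nested `pixels` dict is first flattened into `/`-joined names (such as `pixels/image`) and that the result is then renamed through `features_map`. `flatten_obs` and `to_policy_keys` are hypothetical helpers written for illustration, not lerobot APIs. The mapping table copies the `RoboMMEEnv.features_map` defaults, with the `OBS_IMAGES`/`OBS_STATE` constants written out as `observation.images` and `observation.state`.

```python
from typing import Any

import numpy as np

# Copied from the RoboMMEEnv.features_map defaults (action entry omitted).
FEATURES_MAP = {
    "pixels/image": "observation.images.image",
    "pixels/wrist_image": "observation.images.wrist_image",
    "agent_pos": "observation.state",
}


def flatten_obs(obs: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Hypothetical helper: flatten nested dicts into '/'-joined keys."""
    flat: dict[str, Any] = {}
    for key, value in obs.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_obs(value, name))
        else:
            flat[name] = value
    return flat


def to_policy_keys(obs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rename env observation keys via FEATURES_MAP."""
    return {FEATURES_MAP[k]: v for k, v in flatten_obs(obs).items() if k in FEATURES_MAP}


obs = {
    "pixels": {
        "image": np.zeros((256, 256, 3), dtype=np.uint8),
        "wrist_image": np.zeros((256, 256, 3), dtype=np.uint8),
    },
    "agent_pos": np.zeros(8, dtype=np.float32),  # 7 joints + 1 gripper
}
batch = to_policy_keys(obs)
print(sorted(batch))
# ['observation.images.image', 'observation.images.wrist_image', 'observation.state']
```

Keeping the env-side names (`pixels/...`, `agent_pos`) separate from the policy-side names leaves the wrapper free of LeRobot naming. Only `features_map` needs to change if a policy expects different camera keys.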
+ +The dataset's `image` and `wrist_image` columns already align with the policy input keys, so no renaming is needed when fine-tuning. + +## Action Spaces + +| Type | Dim | Description | +| ------------- | --- | --------------------------------------------------------- | +| `joint_angle` | 8 | 7 joint angles + 1 gripper (−1 closed, +1 open, absolute) | +| `ee_pose` | 7 | xyz + roll/pitch/yaw + gripper | + +Set via `--env.action_space=joint_angle` (default) or `--env.action_space=ee_pose`. + +## Platform Notes + +- **Linux only**: ManiSkill requires SAPIEN/Vulkan. macOS and Windows are not supported. +- **GPU recommended**: Rendering is CPU-capable but slow; CUDA + Vulkan gives full speed. +- **gymnasium / numpy conflict**: See installation note above. Docker image handles this automatically. +- **ManiSkill fork**: `robomme` depends on a specific ManiSkill fork (`YinpeiDai/ManiSkill`), pulled in automatically via the `robomme` package. diff --git a/pyproject.toml b/pyproject.toml index 10789b0f2..dbc866a49 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -212,6 +212,10 @@ aloha = ["lerobot[dataset]", "gym-aloha>=0.1.2,<0.2.0", "lerobot[scipy-dep]"] pusht = ["lerobot[dataset]", "gym-pusht>=0.1.5,<0.2.0", "pymunk>=6.6.0,<7.0.0"] # TODO: Fix pymunk version in gym-pusht instead libero = ["lerobot[dataset]", "lerobot[transformers-dep]", "hf-libero>=0.1.3,<0.2.0; sys_platform == 'linux'", "lerobot[scipy-dep]"] metaworld = ["lerobot[dataset]", "metaworld==3.0.0", "lerobot[scipy-dep]"] +# NOTE: robomme is NOT a pyproject extra — mani-skill hard-pins numpy<2 +# which conflicts with lerobot's numpy>=2 base pin, so the two trees can't +# resolve into a single env. Install it only in the RoboMME Docker image +# via `uv pip install --override` (see docker/Dockerfile.benchmark.robomme). # NOTE: robocasa is NOT exposed as a `lerobot` extra. 
Its setup.py pins # `lerobot==0.3.3` in install_requires, which cyclically shadows our own # workspace `lerobot` and makes the graph unsolvable under any resolver diff --git a/scripts/ci/extract_task_descriptions.py b/scripts/ci/extract_task_descriptions.py index c9216e02d..33e3868d4 100644 --- a/scripts/ci/extract_task_descriptions.py +++ b/scripts/ci/extract_task_descriptions.py @@ -92,13 +92,56 @@ def _robocasa_descriptions(task_spec: str) -> dict[str, str]: return out +_ROBOMME_DESCRIPTIONS = { + "BinFill": "Fill the target bin with the correct number of cubes", + "PickXtimes": "Pick the indicated cube the specified number of times", + "SwingXtimes": "Swing the object the specified number of times", + "StopCube": "Grasp and stop the moving cube", + "VideoUnmask": "Pick the cube shown in the reference video", + "VideoUnmaskSwap": "Pick the cube matching the reference video after a swap", + "ButtonUnmask": "Press the button indicated by the reference", + "ButtonUnmaskSwap": "Press the correct button after objects are swapped", + "PickHighlight": "Pick the highlighted cube", + "VideoRepick": "Repick the cube shown in the reference video", + "VideoPlaceButton": "Place the cube on the button shown in the video", + "VideoPlaceOrder": "Place cubes in the order shown in the video", + "MoveCube": "Move the cube to the target location", + "InsertPeg": "Insert the peg into the target hole", + "PatternLock": "Unlock the pattern by pressing buttons in sequence", + "RouteStick": "Route the stick through the required waypoints", +} + + +def _robomme_descriptions(task_names: str, task_ids: list[int] | None = None) -> dict[str, str]: + """Return descriptions for each requested RoboMME task. 
Keys match the + video filename pattern `_` used by the eval script.""" + if task_ids is None: + task_ids = [0] + out: dict[str, str] = {} + for name in (t.strip() for t in task_names.split(",") if t.strip()): + desc = _ROBOMME_DESCRIPTIONS.get(name, name) + for tid in task_ids: + out[f"{name}_{tid}"] = desc + return out + + def main() -> int: parser = argparse.ArgumentParser(description=__doc__) parser.add_argument("--env", required=True, help="Environment family (libero, metaworld, ...)") parser.add_argument("--task", required=True, help="Task/suite name (e.g. libero_spatial)") + parser.add_argument( + "--task-ids", + type=str, + default=None, + help="Comma-separated task IDs (e.g. '0,1,2'). Default: [0]", + ) parser.add_argument("--output", required=True, help="Path to write task_descriptions.json") args = parser.parse_args() + task_ids: list[int] | None = None + if args.task_ids: + task_ids = [int(x.strip()) for x in args.task_ids.split(",")] + descriptions: dict[str, str] = {} try: if args.env == "libero": @@ -109,6 +152,8 @@ def main() -> int: descriptions = _robotwin_descriptions(args.task) elif args.env == "robocasa": descriptions = _robocasa_descriptions(args.task) + elif args.env == "robomme": + descriptions = _robomme_descriptions(args.task, task_ids=task_ids) else: print( f"[extract_task_descriptions] No description extractor for env '{args.env}'.", diff --git a/src/lerobot/envs/configs.py b/src/lerobot/envs/configs.py index 4f93a26a7..50abe93b0 100644 --- a/src/lerobot/envs/configs.py +++ b/src/lerobot/envs/configs.py @@ -736,3 +736,60 @@ class RoboTwinEnvConfig(EnvConfig): observation_width=self.observation_width, episode_length=self.episode_length, ) + + +@EnvConfig.register_subclass("robomme") +@dataclass +class RoboMMEEnv(EnvConfig): + """RoboMME memory-augmented manipulation benchmark (ManiSkill/SAPIEN). + + 16 tasks across 4 suites: Counting, Permanence, Reference, Imitation. + Dataset: lerobot/robomme (LeRobot v3.0, 1,600 episodes). 
+ Benchmark: https://github.com/RoboMME/robomme_benchmark + + Requires the `robomme` git package installed separately (Linux only); + see docker/Dockerfile.benchmark.robomme for the canonical install. + """ + + task: str = "PickXtimes" + fps: int = 10 + episode_length: int = 300 + action_space: str = "joint_angle" # or "ee_pose" (7-D) + dataset_split: str = "test" # "train" | "val" | "test" + task_ids: list[int] | None = None + features: dict[str, PolicyFeature] = field(default_factory=dict) + features_map: dict[str, str] = field( + default_factory=lambda: { + ACTION: ACTION, + "pixels/image": f"{OBS_IMAGES}.image", + "pixels/wrist_image": f"{OBS_IMAGES}.wrist_image", + "agent_pos": OBS_STATE, + } + ) + + def __post_init__(self): + action_dim = 8 if self.action_space == "joint_angle" else 7 + self.features = { + ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(action_dim,)), + "pixels/image": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)), + "pixels/wrist_image": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)), + "agent_pos": PolicyFeature(type=FeatureType.STATE, shape=(8,)), + } + + @property + def gym_kwargs(self) -> dict: + return {} + + def create_envs(self, n_envs: int, use_async_envs: bool = True): + from lerobot.envs.robomme import create_robomme_envs + + env_cls = _make_vec_env_cls(use_async_envs, n_envs) + return create_robomme_envs( + task=self.task, + n_envs=n_envs, + action_space_type=self.action_space, + dataset=self.dataset_split, + episode_length=self.episode_length, + task_ids=self.task_ids, + env_cls=env_cls, + ) diff --git a/src/lerobot/envs/robomme.py b/src/lerobot/envs/robomme.py new file mode 100644 index 000000000..69d665bd4 --- /dev/null +++ b/src/lerobot/envs/robomme.py @@ -0,0 +1,245 @@ +"""RoboMME environment wrapper for LeRobot evaluation. + +Wraps the RoboMME ``BenchmarkEnvBuilder`` into a Gymnasium-compatible +``VectorEnv`` suitable for ``lerobot_eval``. 
+ +RoboMME tasks: + Counting: BinFill, PickXtimes, SwingXtimes, StopCube + Permanence: VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap + Reference: PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder + Imitation: MoveCube, InsertPeg, PatternLock, RouteStick + +Dataset: lerobot/robomme (LeRobot v3.0, 1,600 episodes) +Install: see docker/Dockerfile.benchmark.robomme (Linux only — mani-skill vs numpy pin conflict) +Benchmark: https://github.com/RoboMME/robomme_benchmark +""" + +from __future__ import annotations + +from collections.abc import Callable, Sequence +from functools import partial +from typing import Any + +import gymnasium as gym +import numpy as np +from gymnasium import spaces + +from .utils import _LazyAsyncVectorEnv + +ROBOMME_TASKS = [ + "BinFill", + "PickXtimes", + "SwingXtimes", + "StopCube", + "VideoUnmask", + "VideoUnmaskSwap", + "ButtonUnmask", + "ButtonUnmaskSwap", + "PickHighlight", + "VideoRepick", + "VideoPlaceButton", + "VideoPlaceOrder", + "MoveCube", + "InsertPeg", + "PatternLock", + "RouteStick", +] + + +class RoboMMEGymEnv(gym.Env): + """Thin Gymnasium wrapper around a single RoboMME episode env.""" + + metadata = {"render_modes": ["rgb_array"], "render_fps": 10} + + def __init__( + self, + task: str = "PickXtimes", + action_space_type: str = "joint_angle", + dataset: str = "test", + episode_idx: int = 0, + max_steps: int = 300, + ): + super().__init__() + from robomme.env_record_wrapper import BenchmarkEnvBuilder + + self._task = task + self._action_space_type = action_space_type + self._dataset = dataset + self._episode_idx = episode_idx + self._max_steps = max_steps + self._max_episode_steps = max_steps + + self._builder = BenchmarkEnvBuilder( + env_id=task, + dataset=dataset, + action_space=action_space_type, + gui_render=False, + max_steps=max_steps, + ) + self._env = None + self._last_raw_obs: dict | None = None + + action_dim = 8 if action_space_type == "joint_angle" else 7 + self.action_space = 
spaces.Box(low=-1.0, high=1.0, shape=(action_dim,), dtype=np.float32)
+        # `pixels` must be a nested Dict so `preprocess_observation()` in
+        # envs/utils.py picks it up and maps each camera to
+        # `observation.images.`. A flat layout (`pixels/image`,
+        # `pixels/wrist_image`) silently drops every image from the batch.
+        self.observation_space = spaces.Dict(
+            {
+                "pixels": spaces.Dict(
+                    {
+                        "image": spaces.Box(0, 255, shape=(256, 256, 3), dtype=np.uint8),
+                        "wrist_image": spaces.Box(0, 255, shape=(256, 256, 3), dtype=np.uint8),
+                    }
+                ),
+                "agent_pos": spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32),
+            }
+        )
+
+    def reset(self, *, seed=None, options=None):
+        super().reset(seed=seed)
+        self._env = self._builder.make_env_for_episode(
+            episode_idx=self._episode_idx,
+            max_steps=self._max_steps,
+        )
+        obs, info = self._env.reset()
+        self._last_raw_obs = obs
+        return self._convert_obs(obs), self._convert_info(info)
+
+    def step(self, action):
+        obs, reward, terminated, truncated, info = self._env.step(action)
+        self._last_raw_obs = obs
+
+        terminated_bool = bool(terminated.item()) if hasattr(terminated, "item") else bool(terminated)
+        truncated_bool = bool(truncated.item()) if hasattr(truncated, "item") else bool(truncated)
+
+        status = info.get("status", "ongoing")
+        is_success = status == "success"
+        conv_info = self._convert_info(info)
+        conv_info["is_success"] = is_success
+
+        return self._convert_obs(obs), float(reward), terminated_bool, truncated_bool, conv_info
+
+    def render(self) -> np.ndarray | None:
+        """Return the front camera image from the last observation for video recording."""
+        if self._last_raw_obs is None:
+            return np.zeros((256, 256, 3), dtype=np.uint8)
+        front = self._last_raw_obs.get("front_rgb_list")
+        if front is None:
+            return np.zeros((256, 256, 3), dtype=np.uint8)
+        frame = front[-1] if isinstance(front, list) else front
+        return np.asarray(frame, dtype=np.uint8)
+
+    def _convert_obs(self, obs: dict) -> dict:
+        front_rgb = (
+            obs["front_rgb_list"][-1] if isinstance(obs["front_rgb_list"], list) else obs["front_rgb_list"]
+        )
+        wrist_rgb = (
+            obs["wrist_rgb_list"][-1] if isinstance(obs["wrist_rgb_list"], list) else obs["wrist_rgb_list"]
+        )
+        joint_state = (
+            obs["joint_state_list"][-1]
+            if isinstance(obs["joint_state_list"], list)
+            else obs["joint_state_list"]
+        )
+        gripper_state = (
+            obs["gripper_state_list"][-1]
+            if isinstance(obs["gripper_state_list"], list)
+            else obs["gripper_state_list"]
+        )
+
+        front_rgb = np.asarray(front_rgb, dtype=np.uint8)
+        wrist_rgb = np.asarray(wrist_rgb, dtype=np.uint8)
+        joint = np.asarray(joint_state, dtype=np.float32).flatten()[:7]
+        gripper = np.asarray(gripper_state, dtype=np.float32).flatten()[:1]
+        state = np.concatenate([joint, gripper])
+
+        return {
+            "pixels": {"image": front_rgb, "wrist_image": wrist_rgb},
+            "agent_pos": state,
+        }
+
+    def _convert_info(self, info: dict) -> dict:
+        return {
+            "status": info.get("status", "ongoing"),
+            "task_goal": info.get("task_goal", ""),
+        }
+
+
+def _make_env_fns(
+    *,
+    task: str,
+    n_envs: int,
+    action_space_type: str,
+    dataset: str,
+    episode_length: int,
+    task_id: int,
+) -> list[Callable[[], RoboMMEGymEnv]]:
+    """Build n_envs factory callables for one RoboMME task id."""
+
+    def _make_one(episode_index: int) -> RoboMMEGymEnv:
+        return RoboMMEGymEnv(
+            task=task,
+            action_space_type=action_space_type,
+            dataset=dataset,
+            episode_idx=episode_index,
+            max_steps=episode_length,
+        )
+
+    return [partial(_make_one, task_id + i) for i in range(n_envs)]
+
+
+def create_robomme_envs(
+    task: str,
+    n_envs: int = 1,
+    action_space_type: str = "joint_angle",
+    dataset: str = "test",
+    episode_length: int = 300,
+    task_ids: list[int] | None = None,
+    env_cls: Callable[[Sequence[Callable[[], Any]]], Any] | None = None,
+) -> dict[str, dict[int, gym.vector.VectorEnv]]:
+    """Create vectorized RoboMME environments for evaluation.
+
+    `task` may be a single RoboMME task name (e.g. "PickXtimes") or a
+    comma-separated list (e.g. "PickXtimes,BinFill,StopCube"). Each task
+    becomes its own suite in the returned mapping.
+
+    Returns {suite_name: {task_id: VectorEnv}} matching lerobot's expected format.
+    """
+    if env_cls is None or not callable(env_cls):
+        raise ValueError("env_cls must be a callable that wraps a list of env factory callables.")
+    if not isinstance(n_envs, int) or n_envs <= 0:
+        raise ValueError(f"n_envs must be a positive int; got {n_envs}.")
+
+    if task_ids is None:
+        task_ids = [0]
+
+    task_names = [t.strip() for t in task.split(",") if t.strip()]
+    is_async = env_cls is gym.vector.AsyncVectorEnv
+    cached_obs_space: spaces.Space | None = None
+    cached_act_space: spaces.Space | None = None
+    cached_metadata: dict[str, Any] | None = None
+    out: dict[str, dict[int, gym.vector.VectorEnv]] = {}
+    for task_name in task_names:
+        envs_by_task: dict[int, gym.vector.VectorEnv] = {}
+        for task_id in task_ids:
+            fns = _make_env_fns(
+                task=task_name,
+                n_envs=n_envs,
+                action_space_type=action_space_type,
+                dataset=dataset,
+                episode_length=episode_length,
+                task_id=task_id,
+            )
+            if is_async:
+                lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)
+                if cached_obs_space is None:
+                    cached_obs_space = lazy.observation_space
+                    cached_act_space = lazy.action_space
+                    cached_metadata = lazy.metadata
+                envs_by_task[task_id] = lazy
+            else:
+                envs_by_task[task_id] = env_cls(fns)
+        out[task_name] = envs_by_task
+    return out
diff --git a/tests/test_robomme_env.py b/tests/test_robomme_env.py
new file mode 100644
index 000000000..20646430a
--- /dev/null
+++ b/tests/test_robomme_env.py
@@ -0,0 +1,232 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Unit tests for the RoboMME env wrapper and config.
+
+RoboMME requires Linux + ManiSkill (Vulkan/SAPIEN), so tests that touch the
+env wrapper mock the ``robomme`` package. Tests that only exercise the
+dataclass config run without any mocking.
+"""
+
+from __future__ import annotations
+
+import sys
+from types import ModuleType
+from unittest.mock import MagicMock
+
+import numpy as np
+
+
+def _install_robomme_stub():
+    """Register a minimal stub for the ``robomme`` package on sys.modules."""
+    stub = ModuleType("robomme")
+    wrapper_stub = ModuleType("robomme.env_record_wrapper")
+
+    class FakeBuilder:
+        def __init__(self, **kwargs):
+            pass
+
+        def make_env_for_episode(self, episode_idx: int, max_steps: int):
+            env = MagicMock()
+            obs = {
+                "front_rgb_list": [np.zeros((256, 256, 3), dtype=np.uint8)],
+                "wrist_rgb_list": [np.zeros((256, 256, 3), dtype=np.uint8)],
+                "joint_state_list": [np.zeros(7, dtype=np.float32)],
+                "gripper_state_list": [np.zeros(2, dtype=np.float32)],
+            }
+            env.reset.return_value = (obs, {"status": "ongoing", "task_goal": "pick the cube"})
+            env.step.return_value = (obs, 0.0, False, False, {"status": "ongoing", "task_goal": ""})
+            return env
+
+    wrapper_stub.BenchmarkEnvBuilder = FakeBuilder
+    stub.env_record_wrapper = wrapper_stub
+    sys.modules["robomme"] = stub
+    sys.modules["robomme.env_record_wrapper"] = wrapper_stub
+
+
+def _uninstall_robomme_stub():
+    sys.modules.pop("robomme", None)
+    sys.modules.pop("robomme.env_record_wrapper", None)
+
+
+# ---------------------------------------------------------------------------
+# Config tests (no sim required)
+# ---------------------------------------------------------------------------
+
+
+def test_robomme_env_config_defaults():
+    from lerobot.envs.configs import RoboMMEEnv
+
+    cfg = RoboMMEEnv()
+    assert cfg.task == "PickXtimes"
+    assert cfg.fps == 10
+    assert cfg.episode_length == 300
+    assert cfg.action_space == "joint_angle"
+    assert cfg.dataset_split == "test"
+    assert cfg.task_ids is None
+
+
+def test_robomme_env_config_type():
+    from lerobot.envs.configs import RoboMMEEnv
+
+    cfg = RoboMMEEnv()
+    assert cfg.type == "robomme"
+
+
+def test_robomme_features_map():
+    from lerobot.envs.configs import RoboMMEEnv
+    from lerobot.utils.constants import ACTION, OBS_IMAGES, OBS_STATE
+
+    cfg = RoboMMEEnv()
+    assert cfg.features_map[ACTION] == ACTION
+    assert cfg.features_map["pixels/image"] == f"{OBS_IMAGES}.image"
+    assert cfg.features_map["pixels/wrist_image"] == f"{OBS_IMAGES}.wrist_image"
+    assert cfg.features_map["agent_pos"] == OBS_STATE
+
+
+def test_robomme_features_action_dim_joint_angle():
+    from lerobot.envs.configs import RoboMMEEnv
+    from lerobot.utils.constants import ACTION
+
+    cfg = RoboMMEEnv(action_space="joint_angle")
+    assert cfg.features[ACTION].shape == (8,)
+
+
+def test_robomme_features_action_dim_ee_pose():
+    """`ee_pose` uses a 7-D action; __post_init__ sets the correct shape."""
+    from lerobot.envs.configs import RoboMMEEnv
+    from lerobot.utils.constants import ACTION
+
+    cfg = RoboMMEEnv(action_space="ee_pose")
+    assert cfg.features[ACTION].shape == (7,)
+
+
+# ---------------------------------------------------------------------------
+# Obs conversion (pure Python, no sim)
+# ---------------------------------------------------------------------------
+
+
+def test_convert_obs_list_format():
+    """_convert_obs takes the last element from list-format obs fields and
+    emits a nested ``pixels`` dict (image, wrist_image) plus ``agent_pos``.
+
+    The nested layout is required so ``preprocess_observation()`` in
+    ``envs/utils.py`` maps each camera to ``observation.images.``.
+    """
+    _install_robomme_stub()
+    try:
+        from lerobot.envs.robomme import RoboMMEGymEnv
+
+        env = RoboMMEGymEnv.__new__(RoboMMEGymEnv)
+
+        front = np.full((256, 256, 3), 42, dtype=np.uint8)
+        wrist = np.full((256, 256, 3), 7, dtype=np.uint8)
+        joints = np.arange(7, dtype=np.float32)
+        gripper = np.array([0.5, 0.5], dtype=np.float32)
+
+        obs_raw = {
+            "front_rgb_list": [np.zeros_like(front), front],
+            "wrist_rgb_list": [np.zeros_like(wrist), wrist],
+            "joint_state_list": [np.zeros(7, dtype=np.float32), joints],
+            "gripper_state_list": [np.zeros(2, dtype=np.float32), gripper],
+        }
+
+        result = env._convert_obs(obs_raw)
+        np.testing.assert_array_equal(result["pixels"]["image"], front)
+        np.testing.assert_array_equal(result["pixels"]["wrist_image"], wrist)
+        assert result["agent_pos"].shape == (8,)
+        np.testing.assert_array_almost_equal(result["agent_pos"][:7], joints)
+        assert result["agent_pos"][7] == gripper[0]
+    finally:
+        _uninstall_robomme_stub()
+
+
+def test_convert_obs_array_format():
+    """_convert_obs also handles non-list (direct array) obs."""
+    _install_robomme_stub()
+    try:
+        from lerobot.envs.robomme import RoboMMEGymEnv
+
+        env = RoboMMEGymEnv.__new__(RoboMMEGymEnv)
+
+        front = np.zeros((256, 256, 3), dtype=np.uint8)
+        obs_raw = {
+            "front_rgb_list": front,
+            "wrist_rgb_list": front,
+            "joint_state_list": np.zeros(7, dtype=np.float32),
+            "gripper_state_list": np.zeros(2, dtype=np.float32),
+        }
+        result = env._convert_obs(obs_raw)
+        assert result["pixels"]["image"].shape == (256, 256, 3)
+        assert result["pixels"]["wrist_image"].shape == (256, 256, 3)
+        assert result["agent_pos"].shape == (8,)
+    finally:
+        _uninstall_robomme_stub()
+
+
+# ---------------------------------------------------------------------------
+# create_robomme_envs (mocked sim)
+# ---------------------------------------------------------------------------
+
+
+def test_create_robomme_envs_returns_correct_structure():
+    """Single task -> {task_name: {task_id: VectorEnv}} with one entry per task_id."""
+    _install_robomme_stub()
+    try:
+        from lerobot.envs.robomme import create_robomme_envs
+
+        env_cls = MagicMock(return_value=MagicMock())
+        result = create_robomme_envs(
+            task="PickXtimes",
+            n_envs=1,
+            task_ids=[0, 1],
+            env_cls=env_cls,
+        )
+
+        assert "PickXtimes" in result
+        assert 0 in result["PickXtimes"]
+        assert 1 in result["PickXtimes"]
+        assert env_cls.call_count == 2
+    finally:
+        _uninstall_robomme_stub()
+
+
+def test_create_robomme_envs_multi_task():
+    """Comma-separated task list produces one suite per task."""
+    _install_robomme_stub()
+    try:
+        from lerobot.envs.robomme import create_robomme_envs
+
+        env_cls = MagicMock(return_value=MagicMock())
+        result = create_robomme_envs(
+            task="PickXtimes,BinFill,StopCube",
+            n_envs=1,
+            env_cls=env_cls,
+        )
+
+        assert set(result.keys()) == {"PickXtimes", "BinFill", "StopCube"}
+    finally:
+        _uninstall_robomme_stub()
+
+
+def test_create_robomme_envs_raises_on_invalid_env_cls():
+    _install_robomme_stub()
+    try:
+        import pytest
+
+        from lerobot.envs.robomme import create_robomme_envs
+
+        with pytest.raises(ValueError, match="env_cls must be a callable"):
+            create_robomme_envs(task="PickXtimes", n_envs=1, env_cls=None)
+    finally:
+        _uninstall_robomme_stub()
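The comment on `observation_space` in `RoboMMEGymEnv.__init__` says the cameras must sit under a nested `pixels` dict, because a flat `pixels/image` layout is silently dropped. The sketch below illustrates why, assuming a simplified mapping step. `map_camera_keys` is a hypothetical stand-in, not the actual `preprocess_observation()` in `envs/utils.py`. It only looks inside a nested `pixels` dict, so the flat layout yields no image keys at all.

```python
from __future__ import annotations

import numpy as np


def map_camera_keys(obs: dict) -> dict[str, np.ndarray]:
    """Hypothetical, simplified camera-mapping step (not lerobot's real code).

    Each entry of a nested obs["pixels"] dict becomes
    "observation.images.<camera name>". Top-level keys such as
    "pixels/image" are never inspected, so they are silently ignored.
    """
    out: dict[str, np.ndarray] = {}
    pixels = obs.get("pixels")
    if isinstance(pixels, dict):
        for cam_name, img in pixels.items():
            out[f"observation.images.{cam_name}"] = img
    return out


frame = np.zeros((256, 256, 3), dtype=np.uint8)
state = np.zeros(8, dtype=np.float32)

# Layout produced by RoboMMEGymEnv._convert_obs.
nested = {"pixels": {"image": frame, "wrist_image": frame}, "agent_pos": state}
# Flat layout that the comment warns against.
flat = {"pixels/image": frame, "pixels/wrist_image": frame, "agent_pos": state}

print(sorted(map_camera_keys(nested)))  # ['observation.images.image', 'observation.images.wrist_image']
print(map_camera_keys(flat))  # {}
```

The resulting key names line up with the `features_map` checked in `test_robomme_features_map` (`f"{OBS_IMAGES}.image"` and `f"{OBS_IMAGES}.wrist_image"`).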
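`_convert_obs` keeps only the most recent entry of each `*_list` field and packs the robot state into the 8-D `agent_pos` declared in `observation_space`: seven joint values followed by the first gripper value. The helpers below are a standalone restatement of just that state logic. `latest` and `build_agent_pos` are hypothetical names that do not appear in the patch. The inputs reuse the values from `test_convert_obs_list_format`.

```python
import numpy as np


def latest(value):
    """Return the last element of a list-format field, or the value itself."""
    return value[-1] if isinstance(value, list) else value


def build_agent_pos(obs: dict) -> np.ndarray:
    """Hypothetical helper mirroring the state packing in `_convert_obs`.

    7 joint values + the first gripper value -> float32 vector of shape (8,).
    """
    joint = np.asarray(latest(obs["joint_state_list"]), dtype=np.float32).flatten()[:7]
    gripper = np.asarray(latest(obs["gripper_state_list"]), dtype=np.float32).flatten()[:1]
    return np.concatenate([joint, gripper])


# List format: only the last (most recent) entry of each history is used.
history = {
    "joint_state_list": [np.zeros(7, dtype=np.float32), np.arange(7, dtype=np.float32)],
    "gripper_state_list": [np.zeros(2, dtype=np.float32), np.array([0.5, 0.5], dtype=np.float32)],
}
state = build_agent_pos(history)
print(state.shape, state[-1])  # (8,) 0.5
```

Because the gripper is sliced with `[:1]`, a two-finger gripper reading of shape `(2,)` contributes a single value, which is what keeps `agent_pos` at 8 dimensions.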
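`create_robomme_envs` splits `task` on commas, strips whitespace, and drops empty entries. Each task then gets one vector env per entry in `task_ids`, and `_make_env_fns` assigns episode index `task_id + i` to the `i`-th of `n_envs` sub-envs. `plan_episodes` below is a hypothetical sketch that reproduces only that parsing and indexing. It returns episode indices in the same `{task_name: {task_id: ...}}` shape, in place of env factories.

```python
from __future__ import annotations


def plan_episodes(task: str, task_ids: list[int] | None, n_envs: int) -> dict[str, dict[int, list[int]]]:
    """Hypothetical sketch of the task parsing and episode indexing above."""
    if task_ids is None:
        task_ids = [0]  # same default as create_robomme_envs
    # Comma-separated names; whitespace is stripped and empty entries are skipped.
    task_names = [t.strip() for t in task.split(",") if t.strip()]
    # One suite per task name; the i-th sub-env of task_id runs episode task_id + i.
    return {name: {tid: [tid + i for i in range(n_envs)] for tid in task_ids} for name in task_names}


plan = plan_episodes("PickXtimes, BinFill,,StopCube", task_ids=[0, 1], n_envs=2)
print(list(plan))  # ['PickXtimes', 'BinFill', 'StopCube']
print(plan["BinFill"])  # {0: [0, 1], 1: [1, 2]}
```

With `n_envs > 1`, adjacent `task_ids` map to overlapping episode indices. In the example above, episode 1 appears under both task id 0 and task id 1.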
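`RoboMMEGymEnv.step()` normalizes `terminated` and `truncated` by calling `.item()` when it exists. Judging by the `hasattr(..., "item")` check, the underlying env presumably returns one-element arrays or tensors in place of plain bools. The sketch below shows the same conversion with NumPy inputs only. `to_bool` is a hypothetical name.

```python
import numpy as np


def to_bool(flag) -> bool:
    """Same expression as in step(): unwrap single-element arrays/tensors, else bool()."""
    return bool(flag.item()) if hasattr(flag, "item") else bool(flag)


print(to_bool(np.array([True])), to_bool(np.bool_(False)), to_bool(0))  # True False False
```

`.item()` only accepts single-element inputs. A batched flag such as `np.array([True, False])` raises `ValueError`, which is consistent with each wrapper instance driving a single, non-batched episode.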