feat(envs): add LIBERO-plus robustness benchmark (#3313)

* feat(envs): add LIBERO-plus robustness benchmark integration

- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra

* fix(libero): use suite's perturbation-aware init_states loader

LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init` file — the on-disk name for a perturbation variant doesn't exist as a file, only the base does. lerobot's loader was bypassing that logic and trying to read the suffix-bearing filename directly, which failed for every non-zero task id and killed the eval before any rollout video could be written.

Delegate to the suite's method when it exists; fall back to the path-based loader for vanilla LIBERO (which does not provide the method).

Also drop the hf-libero install + init_files copy from the LIBERO-plus Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and `init_files/` for all five suites, so the copy was unnecessary and the `cp -r` into an existing dir produced a confusing nested layout.

* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves

Delegating to `task_suite.get_task_init_states(i)` works for path resolution but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`, which fails on PyTorch 2.6+ because the pickled init_states contains numpy objects not in the default allowlist:

    _pickle.UnpicklingError: Weights only load failed. WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.

Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass `weights_only=False` ourselves.
Vanilla LIBERO task names don't contain any of these patterns except for `_table_` when followed by the word `center` (e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex requires `_table_\d+` so semantic uses are preserved.

* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus

LIBERO-plus's bddl_base_domain.py resolves scene XMLs with `os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml has no effect on scene lookup — MuJoCo always opens `<clone>/libero/libero/assets/scenes/...`. With no such directory present, every perturbation task fails on:

    FileNotFoundError: No such file or directory: .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml

These textures, views, and extra objects ship only in the 6.4 GB `assets.zip` published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says to download and unzip it into the package dir). Fetch it via `hf_hub_download`, unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at the extracted dir so everything stays consistent. The download lives in its own Docker layer so subsequent rebuilds reuse the cached assets.

Drops the lerobot/libero-assets snapshot_download — that mirror only has vanilla LIBERO textures and is ignored for scene loading anyway.

* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip

The 6.4 GB zip ships with every entry prefixed by `inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...` (the author's internal filesystem layout, not the layout the LIBERO-plus README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created `${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened `${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml` and hit the same FileNotFoundError.

Extract to a scratch dir, then `mv` the nested `assets/` subtree to the expected location.
Verified the target file exists in the zip central directory under that exact prefix.

* refactor(libero): inline init_states resolver behind single regex

Collapse the three-style suffix stripper (split/re.sub/in) into one compiled regex, drop the (Path, bool) tuple return, and move the `_add_`/`_level` reshape branch into the caller so each branch loads its own file and returns directly. Net: -11 lines, one fewer helper.

* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu

Mirror the libero/metaworld/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and only layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip. Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init) the nightly already provides. 123 → 73 lines.

Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in from an unrelated resolve — unrelated to LIBERO-plus).

* fix(libero-plus): stop touching pyproject + uv.lock

The fast-tests job was rejecting the branch because pyproject.toml had a [libero_plus] extra whose git dep wasn't represented in uv.lock. The Docker image no longer needs the extra — it clones LIBERO-plus directly and PYTHONPATH-shadows hf-libero.

Drop [libero_plus] from pyproject and restore pyproject.toml + uv.lock to exactly what's on origin/main, so `uv sync --locked --extra test` is a no-op for this PR. Also repoint the doc/CI/env comments that still mentioned the extra at the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions

LIBERO-plus builds task.language by space-joining the perturbation-variant filename, so every non-_language_ variant inherits a trailing blob like "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the dashboard video labels and no longer matches the base instruction stored in the training dataset.

Strip those tokens in extract_task_descriptions.py with an end-anchored regex over the {view,initstate,noise,add,tb,table,light,level}(+digits) vocabulary. The anchor preserves mid-sentence literal uses of those words (e.g. "from table center and place it on the plate") — only the trailing metadata chain is removed. _language_ variants carry real BDDL-sourced text and are left untouched.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3313 review feedback

- docs: fix paper link to arxiv, add benchmark image, add suite descriptions, add LIBERO-plus replacement warning, restructure eval section to match LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO always takes the simple path

* fix(docs): use correct LIBERO-plus teaser image URL

* ci(libero-plus): drop redundant hf auth login step

The standalone login step ran `hf auth login` in a throwaway `docker run --rm` container, so no credentials persisted. Auth is already performed inside the eval step's container.
Removing the redundant step per PR #3313 review feedback.

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch. Without these attributes eval crashes when calling `env.unwrapped.metadata["render_fps"]` with async vector envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and caches the metadata alongside obs/action spaces in the LIBERO and MetaWorld factories.

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step before any of the actual build/eval work could run. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Mirrors the existing pattern in the VLABench job.

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update scripts/ci/extract_task_descriptions.py

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update docker/Dockerfile.benchmark.libero_plus

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(libero-plus): address review feedback

* ci(libero-plus): fix YAML indentation in upload-artifact steps

The `uses:` key on two upload-artifact steps was at column 0 instead of nested under the step, causing `pre-commit run check-yaml` to fail with "expected <block end>, but found '<block mapping start>'".

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
2026-05-11 14:49:43 +00:00 · 2026-04-20 21:07:21 +02:00
parent 282c31cfef
commit a07f22e22c
7 changed files with 466 additions and 11 deletions
@@ -736,3 +736,110 @@ jobs:
          name: robomme-metrics
          path: /tmp/robomme-artifacts/metrics.json
          if-no-files-found: warn
+
+  # ── LIBERO-plus ───────────────────────────────────────────────────────────
+  # Isolated image: LIBERO-plus fork cloned into /home/user_lerobot on top of
+  # huggingface/lerobot-gpu (see docker/Dockerfile.benchmark.libero_plus).
+  libero-plus-integration-test:
+    name: LIBERO-plus — build image + 1-episode eval
+    runs-on:
+      group: aws-g6-4xlarge-plus
+    env:
+      HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
+      LIBERO_PLUS_SUITE: libero_spatial
+      LIBERO_PLUS_POLICY: lerobot/smolvla_libero_plus
+      LIBERO_PLUS_TASK_IDS: "[0,100,260,500,1000,1500,2000,2400]"
+
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+          lfs: true
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
+        with:
+          cache-binary: false
+
+      - name: Login to Docker Hub
+        if: ${{ env.DOCKERHUB_USERNAME != '' }}
+        uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
+        with:
+          username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
+        env:
+          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
+
+      - name: Build LIBERO-plus benchmark image
+        uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
+        with:
+          context: .
+          file: docker/Dockerfile.benchmark.libero_plus
+          push: false
+          load: true
+          tags: lerobot-benchmark-libero-plus:ci
+          cache-from: type=local,src=/tmp/.buildx-cache-libero-plus
+          cache-to: type=local,dest=/tmp/.buildx-cache-libero-plus,mode=max
+
+      - name: Run LIBERO-plus smoke eval (1 episode per task)
+        if: env.HF_USER_TOKEN != ''
+        run: |
+          docker run --name libero-plus-eval --gpus all \
+            --shm-size=4g \
+            -e HF_HOME=/tmp/hf \
+            -e HF_USER_TOKEN="${HF_USER_TOKEN}" \
+            -e HF_HUB_DOWNLOAD_TIMEOUT=300 \
+            -e LIBERO_PLUS_SUITE="${LIBERO_PLUS_SUITE}" \
+            -e LIBERO_PLUS_POLICY="${LIBERO_PLUS_POLICY}" \
+            -e LIBERO_PLUS_TASK_IDS="${LIBERO_PLUS_TASK_IDS}" \
+            lerobot-benchmark-libero-plus:ci \
+            bash -c "
+              hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
+              lerobot-eval \
+                --policy.path=\"\$LIBERO_PLUS_POLICY\" \
+                --env.type=libero_plus \
+                --env.task=\"\$LIBERO_PLUS_SUITE\" \
+                --env.task_ids=\"\$LIBERO_PLUS_TASK_IDS\" \
+                --eval.batch_size=1 \
+                --eval.n_episodes=1 \
+                --eval.use_async_envs=false \
+                --policy.device=cuda \
+                '--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
+                --policy.empty_cameras=1 \
+                --output_dir=/tmp/eval-artifacts
+              python scripts/ci/extract_task_descriptions.py \
+                --env libero_plus --task \"\$LIBERO_PLUS_SUITE\" \
+                --output /tmp/eval-artifacts/task_descriptions.json
+            "
+
+      - name: Copy LIBERO-plus artifacts from container
+        if: always()
+        run: |
+          mkdir -p /tmp/libero-plus-artifacts
+          docker cp libero-plus-eval:/tmp/eval-artifacts/. /tmp/libero-plus-artifacts/ 2>/dev/null || true
+          docker rm -f libero-plus-eval || true
+
+      - name: Parse LIBERO-plus eval metrics
+        if: always()
+        run: |
+          python3 scripts/ci/parse_eval_metrics.py \
+            --artifacts-dir /tmp/libero-plus-artifacts \
+            --env libero_plus \
+            --task "${LIBERO_PLUS_SUITE}" \
+            --policy "${LIBERO_PLUS_POLICY}"
+
+      - name: Upload LIBERO-plus rollout video
+        if: always()
+        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
+        with:
+          name: libero-plus-rollout-video
+          path: /tmp/libero-plus-artifacts/videos/
+          if-no-files-found: warn
+
+      - name: Upload LIBERO-plus eval metrics
+        if: always()
+        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
+        with:
+          name: libero-plus-metrics
+          path: /tmp/libero-plus-artifacts/metrics.json
+          if-no-files-found: warn
@@ -0,0 +1,84 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Benchmark image for LIBERO-plus integration tests.
+# Extends the nightly GPU image (which has lerobot[all]) with the LIBERO-plus
+# fork source + its 6.4 GB perturbation assets.
+#
+# Build:  docker build -f docker/Dockerfile.benchmark.libero_plus -t lerobot-benchmark-libero-plus .
+# Run:    docker run --gpus all --rm lerobot-benchmark-libero-plus lerobot-eval ...
+
+FROM huggingface/lerobot-gpu:latest
+ENV MUJOCO_GL=egl
+
+# unzip for the 6.4 GB assets.zip; the rest are LIBERO-plus build-time extras
+# (wand / ImageMagick / fontconfig) not in the nightly base.
+USER root
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+         unzip libexpat1 libfontconfig1-dev libmagickwand-dev \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+USER user_lerobot
+
+# robosuite==1.4.1 is mandatory (the fork uses `single_arm_env`, removed in
+# v1.5+). The rest are LIBERO-plus runtime deps taken from its setup.py.
+# They are installed explicitly because there is no [libero_plus] extra: the
+# fork's `libero @ git+...` dep would install as a namespace package, so we
+# clone it and PYTHONPATH-override it below instead.
+RUN uv pip install --no-cache \
+        "robosuite==1.4.1" \
+        "bddl==1.0.1" \
+        "easydict==1.13" \
+        "mujoco==3.7.0" \
+        "matplotlib==3.10.8" \
+        "Wand==0.6.13" \
+        "scikit-image==0.25.2" \
+        "gym==0.26.2"
+
+# Clone LIBERO-plus and make it importable as `libero`. The nightly base has
+# hf-libero (10 tasks) preinstalled via lerobot[libero]; uninstall it so
+# Python resolves `import libero` to the 2402-task LIBERO-plus module instead.
+# Pinned to a fixed upstream commit (main at the time of writing) so benchmark builds stay reproducible.
+ARG LIBERO_PLUS_SHA=4976dc3
+ENV LIBERO_PLUS_ROOT=/home/user_lerobot/libero-plus/libero/libero
+RUN git clone https://github.com/sylvestf/LIBERO-plus.git /home/user_lerobot/libero-plus \
+    && git -C /home/user_lerobot/libero-plus checkout ${LIBERO_PLUS_SHA} \
+    && cd /home/user_lerobot/libero-plus && uv pip install --no-cache --no-deps -e "." \
+    && (uv pip uninstall hf-libero 2>/dev/null || true)
+ENV PYTHONPATH="/home/user_lerobot/libero-plus:${PYTHONPATH}"
+
+# Perturbation textures/scenes: bddl_base_domain.py resolves XMLs via
+# DIR_PATH/../assets (package-relative, ignoring ~/.libero/config.yaml). All
+# 2402 tasks reference files that ship only in Sylvest/LIBERO-plus's
+# assets.zip (6.4 GB) under a deep author-internal prefix — extract and
+# flatten it under ${LIBERO_PLUS_ROOT}/assets.
+RUN python -c "\
+from huggingface_hub import hf_hub_download; \
+hf_hub_download(repo_id='Sylvest/LIBERO-plus', repo_type='dataset', \
+                filename='assets.zip', local_dir='/tmp/libero-plus-dl')" \
+    && unzip -q /tmp/libero-plus-dl/assets.zip -d /tmp/libero-plus-dl/extract \
+    && ASSETS_DIR=$(find /tmp/libero-plus-dl/extract -type d -name assets | head -1) \
+    && mv "${ASSETS_DIR}" ${LIBERO_PLUS_ROOT}/assets \
+    && rm -rf /tmp/libero-plus-dl
+
+# Point ~/.libero/config.yaml at the clone so LIBERO-plus's imports are
+# non-interactive (it calls input() when the config is missing).
+RUN mkdir -p /home/user_lerobot/.libero \
+    && printf "assets: ${LIBERO_PLUS_ROOT}/assets\nbddl_files: ${LIBERO_PLUS_ROOT}/bddl_files\ndatasets: ${LIBERO_PLUS_ROOT}/../datasets\ninit_states: ${LIBERO_PLUS_ROOT}/init_files\n" \
+       > /home/user_lerobot/.libero/config.yaml
+
+# Overlay the PR's source code on top of the nightly image.
+COPY --chown=user_lerobot:user_lerobot . .
+
+CMD ["/bin/bash"]
@@ -77,6 +77,8 @@
    title: Adding a New Benchmark
  - local: libero
    title: LIBERO
+  - local: libero_plus
+    title: LIBERO-plus
  - local: metaworld
    title: Meta-World
  - local: robotwin
@@ -0,0 +1,188 @@
+# LIBERO-plus
+
+LIBERO-plus is a **robustness benchmark** for Vision-Language-Action (VLA) models built on top of [LIBERO](./libero). It systematically stress-tests policies by applying **seven independent perturbation dimensions** to the original LIBERO task set, exposing failure modes that standard benchmarks miss.
+
+- Paper: [In-depth Robustness Analysis of Vision-Language-Action Models](https://arxiv.org/abs/2510.13626)
+- GitHub: [sylvestf/LIBERO-plus](https://github.com/sylvestf/LIBERO-plus)
+- Dataset: [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
+
+![An overview of the LIBERO-plus benchmark perturbation dimensions](https://github.com/sylvestf/LIBERO-plus/raw/main/static/images/libero-plus.jpg)
+
+## Perturbation dimensions
+
+LIBERO-plus creates ~10,000 task variants by perturbing each original LIBERO task along these axes:
+
+| Dimension             | What changes                                          |
+| --------------------- | ----------------------------------------------------- |
+| Objects layout        | Target position, presence of confounding objects      |
+| Camera viewpoints     | Camera position, orientation, field-of-view           |
+| Robot initial states  | Manipulator start pose                                |
+| Language instructions | LLM-rewritten task description (paraphrase / synonym) |
+| Light conditions      | Intensity, direction, color, shadow                   |
+| Background textures   | Scene surface and object appearance                   |
+| Sensor noise          | Photometric distortions and image degradation         |
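
For every variant except the language ones, LIBERO-plus derives the task string by space-joining the variant filename, so it ends in a metadata tail such as `view 0 0 100 0 0 initstate 0 noise 45` or `add 16`. This PR strips that tail in `scripts/ci/extract_task_descriptions.py` with an end-anchored regex. The sketch below reproduces that pattern; the function name and the sample strings are illustrative, not real task names.

```python
import re

# Same pattern as _LIBERO_PERTURBATION_TAIL_RE in scripts/ci/extract_task_descriptions.py:
# one or more "<keyword> <digits...>" groups, anchored at the end of the string.
PERTURBATION_TAIL_RE = re.compile(
    r"(?:\s(?:view|initstate|noise|add|tb|table|light|level)(?:\s\d+)+)+$"
)


def strip_perturbation_tail(instruction: str) -> str:
    """Drop a trailing LIBERO-plus perturbation metadata chain, if present."""
    return PERTURBATION_TAIL_RE.sub("", instruction).strip()


# Hypothetical sample strings.
print(strip_perturbation_tail("pick up the black bowl view 0 0 100 0 0 initstate 0 noise 45"))
print(strip_perturbation_tail("put the bowl on the plate add 16"))
# "table" here is followed by a word, not digits, and is not at the end: left intact.
print(strip_perturbation_tail("pick up the bowl from table center and place it on the plate"))
```

Anchoring at `$` is what keeps mid-sentence uses of words like "table" untouched while still removing a chain of several metadata groups.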
+
+## Available task suites
+
+LIBERO-plus covers the same five suites as LIBERO. The counts below are the base LIBERO tasks; LIBERO-plus expands them with perturbed variants, so task IDs run far past these numbers (the CI smoke test, for example, evaluates `libero_spatial` task IDs up to 2400).
+
+| Suite          | CLI name         | Base tasks | Max steps | Description                                        |
+| -------------- | ---------------- | ---------- | --------- | -------------------------------------------------- |
+| LIBERO-Spatial | `libero_spatial` | 10         | 280       | Tasks requiring reasoning about spatial relations  |
+| LIBERO-Object  | `libero_object`  | 10         | 280       | Tasks centered on manipulating different objects   |
+| LIBERO-Goal    | `libero_goal`    | 10         | 300       | Goal-conditioned tasks with changing targets       |
+| LIBERO-90      | `libero_90`      | 90         | 400       | Short-horizon tasks from the LIBERO-100 collection |
+| LIBERO-Long    | `libero_10`      | 10         | 520       | Long-horizon tasks from the LIBERO-100 collection  |
+
+<Tip warning={true}>
+  Installing LIBERO-plus **replaces** vanilla LIBERO: it uninstalls `hf-libero`
+  so that `import libero` resolves to the LIBERO-plus fork. You cannot have both
+  installed at the same time. To switch back to vanilla LIBERO, uninstall the
+  fork, remove its clone from `PYTHONPATH`, and reinstall with `pip install -e ".[libero]"`.
+</Tip>
+
+## Installation
+
+### System dependencies (Linux only)
+
+```bash
+sudo apt install libexpat1 libfontconfig1-dev libmagickwand-dev
+```
+
+### Python package
+
+```bash
+pip install -e ".[libero]" "robosuite==1.4.1" bddl easydict mujoco wand scikit-image gym
+git clone https://github.com/sylvestf/LIBERO-plus.git
+cd LIBERO-plus && pip install --no-deps -e .
+export PYTHONPATH="$PWD:$PYTHONPATH"  # put the fork first, as the Docker image does
+pip uninstall -y hf-libero  # so `import libero` resolves to the fork
+```
+
+LIBERO-plus is installed from its GitHub fork rather than a pyproject extra: the fork ships as a namespace package that pip can't handle, so it must be cloned and added to `PYTHONPATH`. See `docker/Dockerfile.benchmark.libero_plus` for the canonical install. Only Linux is supported: the system dependencies above are installed with apt, and evaluation runs MuJoCo headless.
+
+<Tip>
+Set the MuJoCo rendering backend before running evaluation:
+
+```bash
+export MUJOCO_GL=egl   # headless / HPC / cloud
+```
+
+</Tip>
+
+### Download LIBERO-plus assets
+
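+LIBERO-plus ships its extended asset pack separately as a ~6.4 GB `assets.zip` in the [Hugging Face dataset](https://huggingface.co/datasets/Sylvest/LIBERO-plus/tree/main). `bddl_base_domain.py` resolves scene XMLs relative to the package (`libero/libero/assets/` inside your clone) and ignores the `assets` key in `~/.libero/config.yaml`, so the files must end up exactly there. Every entry in the zip sits under a deep author-internal prefix, so extract it to a scratch directory and move the nested `assets/` folder into place:
+
+```bash
+# Run from the root of your LIBERO-plus clone (same approach as the Dockerfile).
+hf download Sylvest/LIBERO-plus assets.zip --repo-type dataset --local-dir /tmp/libero-plus-dl
+unzip -q /tmp/libero-plus-dl/assets.zip -d /tmp/libero-plus-dl/extract
+mv "$(find /tmp/libero-plus-dl/extract -type d -name assets | head -1)" libero/libero/assets
+rm -rf /tmp/libero-plus-dl
+```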
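
On import, LIBERO-plus calls `input()` when `~/.libero/config.yaml` is missing, which stalls non-interactive jobs; the Docker image avoids this by writing the file up front. Below is a minimal Python sketch of that same step. The helper name `write_libero_config` is hypothetical, and `root` is assumed to be the `libero/libero` directory inside your clone (what the Dockerfile calls `LIBERO_PLUS_ROOT`).

```python
from pathlib import Path


def write_libero_config(root: Path, config_dir: Path = Path.home() / ".libero") -> Path:
    """Write config.yaml with the four keys the Dockerfile writes (hypothetical helper).

    `root` is assumed to be <clone>/libero/libero.
    """
    entries = {
        "assets": root / "assets",
        "bddl_files": root / "bddl_files",
        "datasets": root / ".." / "datasets",
        "init_states": root / "init_files",
    }
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.yaml"
    # Plain "key: value" lines, mirroring the printf in the Dockerfile.
    config_path.write_text("".join(f"{key}: {value}\n" for key, value in entries.items()))
    return config_path
```

For example, `write_libero_config(Path("LIBERO-plus/libero/libero").resolve())` from the directory that contains your clone. Scene XMLs are still resolved package-relative, so the `assets` entry mainly keeps the config consistent.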
+
+## Evaluation
+
+### Default evaluation (recommended)
+
+Evaluate across the four standard suites (10 episodes per task):
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial,libero_object,libero_goal,libero_10 \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10 \
+  --env.max_parallel_tasks=1
+```
+
+### Single-suite evaluation
+
+Evaluate on one LIBERO-plus suite:
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10
+```
+
+- `--env.task` picks the suite (`libero_spatial`, `libero_object`, etc.).
+- `--env.task_ids` restricts to specific task indices (`[0]`, `[1,2,3]`, etc.). Omit to run all tasks in the suite.
+- `--eval.batch_size` controls how many environments run in parallel.
+- `--eval.n_episodes` sets how many episodes to run per task.
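
Because each LIBERO-plus suite holds many perturbed variants, a quick check usually evaluates a spread of task IDs rather than all of them (the CI job passes a hand-picked list through `LIBERO_PLUS_TASK_IDS`). Below is a minimal sketch of a hypothetical helper that spreads `k` IDs evenly over a suite and formats them for `--env.task_ids`. The suite size can be read from `suite.n_tasks`, as `scripts/ci/extract_task_descriptions.py` does; 2402 below is only illustrative (the Dockerfile describes a 2402-task module).

```python
def spread_task_ids(n_tasks: int, k: int) -> list[int]:
    """Pick up to k task IDs spread evenly over range(n_tasks), endpoints included."""
    if n_tasks <= 0 or k <= 0:
        return []
    if k == 1 or n_tasks == 1:
        return [0]
    k = min(k, n_tasks)
    step = (n_tasks - 1) / (k - 1)  # step >= 1, so the rounded IDs stay distinct
    return [round(i * step) for i in range(k)]


def format_task_ids(ids: list[int]) -> str:
    """Format IDs the way the CI job passes them: "[a,b,c]" with no spaces."""
    return "[" + ",".join(str(i) for i in ids) + "]"


print(format_task_ids(spread_task_ids(2402, 4)))
```

Pass the result as `--env.task_ids="[...]"`; an even spread simply samples across the variant range instead of clustering on the first few IDs.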
+
+### Multi-suite evaluation
+
+Benchmark a policy across multiple suites at once by passing a comma-separated list:
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial,libero_object \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10
+```
+
+### Control mode
+
+LIBERO-plus supports two control modes — `relative` (default) and `absolute`. Different VLA checkpoints are trained with different action parameterizations, so make sure the mode matches your policy:
+
+```bash
+--env.control_mode=relative   # or "absolute"
+```
+
+### Policy inputs and outputs
+
+**Observations:**
+
+- `observation.state` — 8-dim proprioceptive features (eef position, axis-angle orientation, gripper qpos)
+- `observation.images.image` — main camera view (`agentview_image`), HWC uint8
+- `observation.images.image2` — wrist camera view (`robot0_eye_in_hand_image`), HWC uint8
+
+**Actions:**
+
+- Continuous control in `Box(-1, 1, shape=(7,))` — 6D end-effector delta + 1D gripper
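
To sanity-check a policy wrapper before a full evaluation, you can build dummy inputs with the shapes listed above. This minimal numpy sketch assumes the default 360x360 observation size from `LiberoEnv` (`observation_height` / `observation_width`), the default camera keys, and a float32 state (the dtype is an assumption). With a custom `--env.camera_name_mapping` the image keys change accordingly. The helper names are hypothetical.

```python
import numpy as np

H, W = 360, 360  # LiberoEnv defaults (observation_height / observation_width)


def dummy_observation(rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Random observation with the documented keys and shapes (float32 state is assumed)."""
    return {
        "observation.state": rng.standard_normal(8).astype(np.float32),
        "observation.images.image": rng.integers(0, 256, (H, W, 3), dtype=np.uint8),
        "observation.images.image2": rng.integers(0, 256, (H, W, 3), dtype=np.uint8),
    }


def check_action(action: np.ndarray) -> np.ndarray:
    """Require a 7-dim action (6D end-effector delta + gripper) and clip it to [-1, 1]."""
    action = np.asarray(action, dtype=np.float32)
    if action.shape != (7,):
        raise ValueError(f"expected action shape (7,), got {action.shape}")
    return np.clip(action, -1.0, 1.0)


obs = dummy_observation(np.random.default_rng(0))
print({key: value.shape for key, value in obs.items()})
print(check_action(np.array([0.5, -2.0, 0, 0, 0, 0, 1.5])))
```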
+
+### Recommended evaluation episodes
+
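+For reproducible benchmarking, use **10 episodes per task** across all four standard suites (Spatial, Object, Goal, Long), the per-task protocol used for published results. Unlike vanilla LIBERO (10 tasks per suite, 400 episodes in total), each LIBERO-plus suite also contains its perturbed variants, so the total is 10 episodes times the number of task IDs you evaluate; restrict `--env.task_ids` when compute is limited.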
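
The episode budget is plain arithmetic: `n_episodes` times the number of task IDs, summed over suites. This tiny sketch (the helper name is hypothetical) reproduces two numbers from this PR. The first is a vanilla-LIBERO-sized run of four suites with 10 tasks each at 10 episodes per task. The second is the CI smoke job, which runs 8 task IDs at 1 episode each.

```python
def total_episodes(task_ids_per_suite: dict[str, int], n_episodes: int = 10) -> int:
    """Total rollouts = n_episodes x (task IDs evaluated, summed over suites)."""
    return n_episodes * sum(task_ids_per_suite.values())


vanilla_sized = {"libero_spatial": 10, "libero_object": 10, "libero_goal": 10, "libero_10": 10}
print(total_episodes(vanilla_sized))  # four suites x 10 tasks x 10 episodes
print(total_episodes({"libero_spatial": 8}, n_episodes=1))  # CI smoke job
```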
+
+## Training
+
+### Dataset
+
+A LeRobot-format training dataset for LIBERO-plus is available at:
+
+- [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
+
+### Example training command
+
+```bash
+lerobot-train \
+    --policy.type=smolvla \
+    --policy.repo_id=${HF_USER}/smolvla_libero_plus \
+    --policy.load_vlm_weights=true \
+    --dataset.repo_id=lerobot/libero_plus \
+    --env.type=libero_plus \
+    --env.task=libero_spatial \
+    --output_dir=./outputs/ \
+    --steps=100000 \
+    --batch_size=4 \
+    --eval.batch_size=1 \
+    --eval.n_episodes=1 \
+    --eval_freq=1000
+```
+
+## Relationship to LIBERO
+
+LIBERO-plus is a drop-in extension of LIBERO:
+
+- Same Python gym interface (`LiberoEnv`, `LiberoProcessorStep`)
+- Same camera names and observation/action format
+- Same task suite names
+- Installs under the same `libero` Python package name (different GitHub repo)
+
+To use the original LIBERO benchmark, see [LIBERO](./libero) and use `--env.type=libero`.
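
The main LIBERO-plus-specific path in lerobot's env code is init-state loading. With `is_libero_plus=True`, `get_task_init_states` reads `_add_` / `_level` variants from `libero_newobj/`. For every other variant it strips the perturbation suffix from the file name, because only the base `.pruned_init` file exists on disk. The sketch below reproduces that path logic without loading anything. The function name and file names are hypothetical. Returning a reshape flag instead of calling `torch.load(..., weights_only=False)` is a simplification so the example runs without PyTorch.

```python
import re
from pathlib import Path

# Same pattern as _LIBERO_PERTURBATION_SUFFIX_RE in this PR's get_task_init_states.
SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")


def resolve_init_states_path(
    root: Path, problem_folder: str, init_states_file: str, is_libero_plus: bool
) -> tuple[Path, bool]:
    """Return (file to load, whether the loaded array needs reshape(1, -1))."""
    name = Path(init_states_file).name
    if not is_libero_plus:
        # Vanilla LIBERO: the file name is used as-is.
        return root / problem_folder / name, False
    if "_add_" in name or "_level" in name:
        # Extra-object layouts live under libero_newobj/ and are stored flat.
        return root / "libero_newobj" / problem_folder / name, True
    # Other variants: strip the encoded suffix to get the base file name.
    stem, suffix = Path(name).stem, Path(name).suffix
    return root / problem_folder / (SUFFIX_RE.sub("", stem) + suffix), False


print(resolve_init_states_path(Path("/init_files"), "libero_spatial", "bowl_view_0_0_100_0_0.pruned_init", True))
```

Because `_table_` must be followed by digits, a vanilla-style name such as `..._from_table_center.pruned_init` is left untouched even on the LIBERO-plus path.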
@@ -31,9 +31,23 @@ from __future__ import annotations

 import argparse
 import json
+import re
 import sys
 from pathlib import Path

+# LIBERO-plus derives task.language by space-joining the perturbation-variant
+# filename (grab_language_from_filename in libero/libero/benchmark/__init__.py),
+# so non-_language_ variants inherit a trailing metadata blob like
+# "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". Strip those tokens so
+# the description matches the base instruction used in the training dataset.
+_LIBERO_PERTURBATION_TAIL_RE = re.compile(
+    r"(?:\s(?:view|initstate|noise|add|tb|table|light|level)(?:\s\d+)+)+$"
+)
+
+
+def _strip_libero_perturbation_tail(instruction: str) -> str:
+    return _LIBERO_PERTURBATION_TAIL_RE.sub("", instruction).strip()
+

 def _libero_descriptions(task_suite: str) -> dict[str, str]:
    from libero.libero import benchmark  # type: ignore[import-untyped]
@@ -47,7 +61,10 @@ def _libero_descriptions(task_suite: str) -> dict[str, str]:
        )
        return {}
    suite = suite_dict[task_suite]()
-    return {f"{task_suite}_{i}": suite.get_task(i).language for i in range(suite.n_tasks)}
+    return {
+        f"{task_suite}_{i}": _strip_libero_perturbation_tail(suite.get_task(i).language)
+        for i in range(suite.n_tasks)
+    }


 def _metaworld_descriptions(task_name: str) -> dict[str, str]:
@@ -144,7 +161,7 @@ def main() -> int:

    descriptions: dict[str, str] = {}
    try:
-        if args.env == "libero":
+        if args.env in ("libero", "libero_plus"):
            descriptions = _libero_descriptions(args.task)
        elif args.env == "metaworld":
            descriptions = _metaworld_descriptions(args.task)
@@ -331,6 +331,7 @@ class LiberoEnv(EnvConfig):
    camera_name_mapping: dict[str, str] | None = None
    observation_height: int = 360
    observation_width: int = 360
+    is_libero_plus: bool = False
    features: dict[str, PolicyFeature] = field(
        default_factory=lambda: {
            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
@@ -432,6 +433,7 @@ class LiberoEnv(EnvConfig):
            control_mode=self.control_mode,
            episode_length=self.episode_length,
            camera_name_mapping=self.camera_name_mapping,
+            is_libero_plus=self.is_libero_plus,
        )

    def get_env_processors(self):
@@ -651,6 +653,30 @@ class IsaaclabArenaEnv(HubEnvConfig):
        )


+@EnvConfig.register_subclass("libero_plus")
+@dataclass
+class LiberoPlusEnv(LiberoEnv):
+    """Config for LIBERO-plus robustness benchmark evaluation.
+
+    LIBERO-plus extends LIBERO with 7 perturbation dimensions (camera viewpoints,
+    object layouts, robot initial states, language instructions, lighting, background
+    textures, sensor noise) producing ~10k task variants.
+
+    The gym interface is identical to LIBERO, so this class reuses ``LiberoEnv``
+    entirely; only the registered name, the default task suite, and
+    ``is_libero_plus`` (which enables perturbation-aware init-state loading) differ.
+
+    Install: see docker/Dockerfile.benchmark.libero_plus — LIBERO-plus ships
+    as a namespace package from a git fork and must be cloned + PYTHONPATH'd
+    rather than installed as a pyproject extra.
+
+    See Also:
+        https://github.com/sylvestf/LIBERO-plus
+    """
+
+    task: str = "libero_spatial"
+    is_libero_plus: bool = True
+
+
@EnvConfig.register_subclass("robotwin")
@dataclass
 class RoboTwinEnvConfig(EnvConfig):
@@ -16,6 +16,7 @@
 from __future__ import annotations

 import os
+import re
 from collections import defaultdict
 from collections.abc import Callable, Iterable, Mapping, Sequence
 from functools import partial
@@ -56,14 +57,34 @@ def _select_task_ids(total_tasks: int, task_ids: Iterable[int] | None) -> list[i
    return ids


-def get_task_init_states(task_suite: Any, i: int) -> np.ndarray:
-    init_states_path = (
-        Path(get_libero_path("init_states"))
-        / task_suite.tasks[i].problem_folder
-        / task_suite.tasks[i].init_states_file
-    )
-    init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
-    return init_states
+# LIBERO-plus perturbation variants encode the perturbation in the filename
+# but on disk only the base `.pruned_init` exists — strip the suffix to match
+# LIBERO-plus's own suite.get_task_init_states() (we reimplement it here so we
+# can pass weights_only=False for PyTorch 2.6+ numpy pickles).
+_LIBERO_PERTURBATION_SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")
+
+
+def get_task_init_states(task_suite: Any, i: int, is_libero_plus: bool = False) -> np.ndarray:
+    task = task_suite.tasks[i]
+    filename = Path(task.init_states_file)
+    root = Path(get_libero_path("init_states"))
+
+    if not is_libero_plus:
+        init_states_path = root / task.problem_folder / filename.name
+        return torch.load(init_states_path, weights_only=False)  # nosec B614
+
+    # LIBERO-plus: `_add_` / `_level` variants store extra-object layouts under
+    # libero_newobj/ as a flat array that must be reshaped to (1, -1).
+    if "_add_" in filename.name or "_level" in filename.name:
+        init_states_path = root / "libero_newobj" / task.problem_folder / filename.name
+        init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
+        return init_states.reshape(1, -1)
+
+    # Other perturbation variants: only the base `.pruned_init` exists on disk,
+    # so strip the encoded suffix to recover its name.
+    stripped = _LIBERO_PERTURBATION_SUFFIX_RE.sub("", filename.stem) + filename.suffix
+    init_states_path = root / task.problem_folder / stripped
+    return torch.load(init_states_path, weights_only=False)  # nosec B614


 def get_libero_dummy_action():
@@ -105,9 +126,11 @@ class LiberoEnv(gym.Env):
        camera_name_mapping: dict[str, str] | None = None,
        num_steps_wait: int = 10,
        control_mode: str = "relative",
+        is_libero_plus: bool = False,
    ):
        super().__init__()
        self.task_id = task_id
+        self.is_libero_plus = is_libero_plus
        self.obs_type = obs_type
        self.render_mode = render_mode
        self.observation_width = observation_width
@@ -134,7 +157,11 @@ class LiberoEnv(gym.Env):
        self.episode_index = episode_index
        self.episode_length = episode_length
        # Load once and keep
-        self._init_states = get_task_init_states(task_suite, self.task_id) if self.init_states else None
+        self._init_states = (
+            get_task_init_states(task_suite, self.task_id, is_libero_plus=self.is_libero_plus)
+            if self.init_states
+            else None
+        )
        self._reset_stride = n_envs  # when performing a reset, append `_reset_stride` to `init_state_id`.

        self.init_state_id = self.episode_index  # tie each sub-env to a fixed init state
@@ -367,6 +394,7 @@ def _make_env_fns(
    gym_kwargs: Mapping[str, Any],
    control_mode: str,
    camera_name_mapping: dict[str, str] | None = None,
+    is_libero_plus: bool = False,
 ) -> list[Callable[[], LiberoEnv]]:
    """Build n_envs factory callables for a single (suite, task_id)."""

@@ -383,6 +411,7 @@ def _make_env_fns(
            n_envs=n_envs,
            control_mode=control_mode,
            camera_name_mapping=camera_name_mapping,
+            is_libero_plus=is_libero_plus,
            **local_kwargs,
        )

@@ -405,6 +434,7 @@ def create_libero_envs(
    control_mode: str = "relative",
    episode_length: int | None = None,
    camera_name_mapping: dict[str, str] | None = None,
+    is_libero_plus: bool = False,
 ) -> dict[str, dict[int, Any]]:
    """
    Create vectorized LIBERO environments with a consistent return shape.
@@ -463,6 +493,7 @@ def create_libero_envs(
                gym_kwargs=gym_kwargs,
                control_mode=control_mode,
                camera_name_mapping=camera_name_mapping,
+                is_libero_plus=is_libero_plus,
            )
            if is_async:
                lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)