feat(envs): add LIBERO-plus robustness benchmark (#3313)

* feat(envs): add LIBERO-plus robustness benchmark integration

- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra

* fix(libero): use suite's perturbation-aware init_states loader

LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init` file: the on-disk name for a perturbation variant doesn't exist as a file, only the base does. lerobot's loader was bypassing that logic and trying to read the suffix-bearing filename directly, which failed for every non-zero task id and killed the eval before any rollout video could be written.

Delegate to the suite's method when it exists; fall back to the path-based loader for vanilla LIBERO (which does not provide the method).

Also drop the hf-libero install + init_files copy from the LIBERO-plus Dockerfile: the LIBERO-plus clone already ships both `bddl_files/` and `init_files/` for all five suites, so the copy was unnecessary and the `cp -r` into an existing dir produced a confusing nested layout.

* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves

Delegating to `task_suite.get_task_init_states(i)` works for path resolution, but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`, which fails on PyTorch 2.6+ because the pickled init_states contains numpy objects not in the default allowlist:

    _pickle.UnpicklingError: Weights only load failed. WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.

Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass `weights_only=False` ourselves.
Vanilla LIBERO task names don't contain any of these patterns except `_table_` followed by the word `center` (e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex requires `_table_\d+`, so those semantic uses are preserved.

* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus

LIBERO-plus's bddl_base_domain.py resolves scene XMLs with `os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml has no effect on scene lookup: MuJoCo always opens `<clone>/libero/libero/assets/scenes/...`. With no such directory present, every perturbation task fails on:

    FileNotFoundError: No such file or directory: .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml

These textures, views, and extra objects ship only in the 6.4 GB `assets.zip` published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says to download and unzip it into the package dir). Fetch it via `hf_hub_download`, unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at the extracted dir so everything stays consistent. The download lives in its own Docker layer, so subsequent rebuilds reuse the cached assets.

Drops the lerobot/libero-assets snapshot_download: that mirror only has vanilla LIBERO textures and is ignored for scene loading anyway.

* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip

The 6.4 GB zip ships with every entry prefixed by `inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...` (the author's internal filesystem layout, not the layout the LIBERO-plus README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created `${LIBERO_PLUS_ROOT}/inspire/.../assets/`, and robosuite still opened `${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml` and hit the same FileNotFoundError.

Extract to a scratch dir, then `mv` the nested `assets/` subtree to the expected location.
Verified the target file exists in the zip central directory under that exact prefix.

* refactor(libero): inline init_states resolver behind single regex

Collapse the three-style suffix stripper (split/re.sub/in) into one compiled regex, drop the (Path, bool) tuple return, and move the `_add_`/`_level` reshape branch into the caller so each branch loads its own file and returns directly. Net: -11 lines, one fewer helper.

* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu

Mirror the libero/metaworld/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and only layer on what LIBERO-plus uniquely needs, namely its wand/ImageMagick build deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip. Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init) that the nightly already provides. 123 → 73 lines.

Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (a revision/marker diff that crept in from a resolve unrelated to LIBERO-plus).

* fix(libero-plus): stop touching pyproject + uv.lock

The fast-tests job was rejecting the branch because pyproject.toml had a [libero_plus] extra whose git dep wasn't represented in uv.lock. The Docker image no longer needs the extra: it clones LIBERO-plus directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from pyproject and restore pyproject.toml + uv.lock to exactly what's on origin/main, so `uv sync --locked --extra test` is a no-op for this PR.

Also repoint the doc/CI/env comments that still mentioned the extra at the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions

LIBERO-plus builds task.language by space-joining the perturbation-variant filename, so every non-_language_ variant inherits a trailing blob like "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That blob shows up in the dashboard video labels, and the description no longer matches the base instruction stored in the training dataset.

Strip those tokens in extract_task_descriptions.py with an end-anchored regex over the {view,initstate,noise,add,tb,table,light,level}(+digits) vocabulary. The anchor preserves mid-sentence literal uses of those words (e.g. "from table center and place it on the plate"); only the trailing metadata chain is removed. _language_ variants carry real BDDL-sourced text and are left untouched.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org, which is the canonical location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3313 review feedback

- docs: fix paper link to arxiv, add benchmark image, add suite descriptions, add LIBERO-plus replacement warning, restructure eval section to match LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO always takes the simple path

* fix(docs): use correct LIBERO-plus teaser image URL

* ci(libero-plus): drop redundant hf auth login step

The standalone login step ran `hf auth login` in a throwaway `docker run --rm` container, so no credentials persisted. Auth is already performed inside the eval step's container.
Removing the redundant step per PR #3313 review feedback.

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch. Without these attributes eval crashes when calling `env.unwrapped.metadata["render_fps"]` with async vector envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and caches the metadata alongside obs/action spaces in the LIBERO and MetaWorld factories.

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step before any of the actual build/eval work could run. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Mirrors the existing pattern in the VLABench job.

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update scripts/ci/extract_task_descriptions.py

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update docker/Dockerfile.benchmark.libero_plus

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(libero-plus): address review feedback

* ci(libero-plus): fix YAML indentation in upload-artifact steps

The `uses:` key on two upload-artifact steps was at column 0 instead of nested under the step, causing `pre-commit run check-yaml` to fail with "expected <block end>, but found '<block mapping start>'".

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
2026-07-17 06:51:48 +00:00 · 2026-04-20 21:07:21 +02:00
parent 282c31cfef
commit a07f22e22c
7 changed files with 466 additions and 11 deletions
@@ -736,3 +736,110 @@ jobs:
          name: robomme-metrics
          path: /tmp/robomme-artifacts/metrics.json
          if-no-files-found: warn
  # ── LIBERO-plus ───────────────────────────────────────────────────────────
  # Isolated image: LIBERO-plus fork cloned into /home/user_lerobot on top of
  # huggingface/lerobot-gpu (see docker/Dockerfile.benchmark.libero_plus).
  libero-plus-integration-test:
    name: LIBERO-plus — build image + 1-episode eval
    runs-on:
      group: aws-g6-4xlarge-plus
    env:
      HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
      LIBERO_PLUS_SUITE: libero_spatial
      LIBERO_PLUS_POLICY: lerobot/smolvla_libero_plus
      LIBERO_PLUS_TASK_IDS: "[0,100,260,500,1000,1500,2000,2400]"
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
          lfs: true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
        with:
          cache-binary: false
      - name: Login to Docker Hub
        if: ${{ env.DOCKERHUB_USERNAME != '' }}
        uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
        with:
          username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
          password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
        env:
          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
      - name: Build LIBERO-plus benchmark image
        uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
        with:
          context: .
          file: docker/Dockerfile.benchmark.libero_plus
          push: false
          load: true
          tags: lerobot-benchmark-libero-plus:ci
          cache-from: type=local,src=/tmp/.buildx-cache-libero-plus
          cache-to: type=local,dest=/tmp/.buildx-cache-libero-plus,mode=max
      - name: Run LIBERO-plus smoke eval (1 episode per task)
        if: env.HF_USER_TOKEN != ''
        run: |
          docker run --name libero-plus-eval --gpus all \
            --shm-size=4g \
            -e HF_HOME=/tmp/hf \
            -e HF_USER_TOKEN="${HF_USER_TOKEN}" \
            -e HF_HUB_DOWNLOAD_TIMEOUT=300 \
            -e LIBERO_PLUS_SUITE="${LIBERO_PLUS_SUITE}" \
            -e LIBERO_PLUS_POLICY="${LIBERO_PLUS_POLICY}" \
            -e LIBERO_PLUS_TASK_IDS="${LIBERO_PLUS_TASK_IDS}" \
            lerobot-benchmark-libero-plus:ci \
            bash -c "
              hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
              lerobot-eval \
                --policy.path=\"\$LIBERO_PLUS_POLICY\" \
                --env.type=libero_plus \
                --env.task=\"\$LIBERO_PLUS_SUITE\" \
                --env.task_ids=\"\$LIBERO_PLUS_TASK_IDS\" \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
                --policy.device=cuda \
                '--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
                --policy.empty_cameras=1 \
                --output_dir=/tmp/eval-artifacts
              python scripts/ci/extract_task_descriptions.py \
                --env libero_plus --task \"\$LIBERO_PLUS_SUITE\" \
                --output /tmp/eval-artifacts/task_descriptions.json
            "
      - name: Copy LIBERO-plus artifacts from container
        if: always()
        run: |
          mkdir -p /tmp/libero-plus-artifacts
          docker cp libero-plus-eval:/tmp/eval-artifacts/. /tmp/libero-plus-artifacts/ 2>/dev/null || true
          docker rm -f libero-plus-eval || true
      - name: Parse LIBERO-plus eval metrics
        if: always()
        run: |
          python3 scripts/ci/parse_eval_metrics.py \
            --artifacts-dir /tmp/libero-plus-artifacts \
            --env libero_plus \
            --task "${LIBERO_PLUS_SUITE}" \
            --policy "${LIBERO_PLUS_POLICY}"
      - name: Upload LIBERO-plus rollout video
        if: always()
        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
        with:
          name: libero-plus-rollout-video
          path: /tmp/libero-plus-artifacts/videos/
          if-no-files-found: warn
      - name: Upload LIBERO-plus eval metrics
        if: always()
        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
        with:
          name: libero-plus-metrics
          path: /tmp/libero-plus-artifacts/metrics.json
          if-no-files-found: warn
@@ -0,0 +1,84 @@
 # Copyright 2026 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # Benchmark image for LIBERO-plus integration tests.
 # Extends the nightly GPU image (which has lerobot[all]) with the LIBERO-plus
 # fork source + its 6.4 GB perturbation assets.
 #
 # Build:  docker build -f docker/Dockerfile.benchmark.libero_plus -t lerobot-benchmark-libero-plus .
 # Run:    docker run --gpus all --rm lerobot-benchmark-libero-plus lerobot-eval ...
 FROM huggingface/lerobot-gpu:latest
 ENV MUJOCO_GL=egl
 # unzip for the 6.4 GB assets.zip; the rest are LIBERO-plus build-time extras
 # (wand / ImageMagick / fontconfig) not in the nightly base.
 USER root
 RUN apt-get update \
    && apt-get install -y --no-install-recommends \
         unzip libexpat1 libfontconfig1-dev libmagickwand-dev \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
 USER user_lerobot
 # robosuite==1.4.1 is mandatory (the fork uses `single_arm_env` removed in
 # v1.5+). The rest are LIBERO-plus runtime deps pulled from its setup.py.
 # We install these explicitly instead of via a [libero_plus] extra because a
 # `libero @ git+...` dependency installs as a namespace package; instead we
 # clone the fork and PYTHONPATH-override it below.
 RUN uv pip install --no-cache \
        "robosuite==1.4.1" \
        "bddl==1.0.1" \
        "easydict==1.13" \
        "mujoco==3.7.0" \
        "matplotlib==3.10.8" \
        "Wand==0.6.13" \
        "scikit-image==0.25.2" \
        "gym==0.26.2"
 # Clone LIBERO-plus and make it importable as `libero`. The nightly base has
 # hf-libero (10 tasks) preinstalled via lerobot[libero]; uninstall it so
 # Python resolves `import libero` to the 2402-task LIBERO-plus module instead.
 # Pinned to the current upstream main SHA so benchmark builds stay reproducible.
 ARG LIBERO_PLUS_SHA=4976dc3
 ENV LIBERO_PLUS_ROOT=/home/user_lerobot/libero-plus/libero/libero
 RUN git clone https://github.com/sylvestf/LIBERO-plus.git /home/user_lerobot/libero-plus \
    && git -C /home/user_lerobot/libero-plus checkout ${LIBERO_PLUS_SHA} \
    && cd /home/user_lerobot/libero-plus && uv pip install --no-cache --no-deps -e "." \
    && (uv pip uninstall hf-libero 2>/dev/null || true)
 ENV PYTHONPATH="/home/user_lerobot/libero-plus:${PYTHONPATH}"
 # Perturbation textures/scenes: bddl_base_domain.py resolves XMLs via
 # DIR_PATH/../assets (package-relative, ignoring ~/.libero/config.yaml). All
 # 2402 tasks reference files that ship only in Sylvest/LIBERO-plus's
 # assets.zip (6.4 GB) under a deep author-internal prefix — extract and
 # flatten it under ${LIBERO_PLUS_ROOT}/assets.
 RUN python -c "\
 from huggingface_hub import hf_hub_download; \
 hf_hub_download(repo_id='Sylvest/LIBERO-plus', repo_type='dataset', \
                filename='assets.zip', local_dir='/tmp/libero-plus-dl')" \
    && unzip -q /tmp/libero-plus-dl/assets.zip -d /tmp/libero-plus-dl/extract \
    && ASSETS_DIR=$(find /tmp/libero-plus-dl/extract -type d -name assets | head -1) \
    && mv "${ASSETS_DIR}" ${LIBERO_PLUS_ROOT}/assets \
    && rm -rf /tmp/libero-plus-dl
 # Point ~/.libero/config.yaml at the clone so LIBERO-plus's imports are
 # non-interactive (it calls input() when the config is missing).
 RUN mkdir -p /home/user_lerobot/.libero \
    && printf "assets: ${LIBERO_PLUS_ROOT}/assets\nbddl_files: ${LIBERO_PLUS_ROOT}/bddl_files\ndatasets: ${LIBERO_PLUS_ROOT}/../datasets\ninit_states: ${LIBERO_PLUS_ROOT}/init_files\n" \
       > /home/user_lerobot/.libero/config.yaml
 # Overlay the PR's source code on top of the nightly image.
 COPY --chown=user_lerobot:user_lerobot . .
 CMD ["/bin/bash"]
@@ -77,6 +77,8 @@
    title: Adding a New Benchmark
  - local: libero
    title: LIBERO
  - local: libero_plus
    title: LIBERO-plus
  - local: metaworld
    title: Meta-World
  - local: robotwin
@@ -0,0 +1,188 @@
 # LIBERO-plus
 LIBERO-plus is a **robustness benchmark** for Vision-Language-Action (VLA) models built on top of [LIBERO](./libero). It systematically stress-tests policies by applying **seven independent perturbation dimensions** to the original LIBERO task set, exposing failure modes that standard benchmarks miss.
 - Paper: [In-depth Robustness Analysis of Vision-Language-Action Models](https://arxiv.org/abs/2510.13626)
 - GitHub: [sylvestf/LIBERO-plus](https://github.com/sylvestf/LIBERO-plus)
 - Dataset: [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
 ![An overview of the LIBERO-plus benchmark perturbation dimensions](https://github.com/sylvestf/LIBERO-plus/raw/main/static/images/libero-plus.jpg)
 ## Perturbation dimensions
 LIBERO-plus creates ~10 000 task variants by perturbing each original LIBERO task along these axes:
 | Dimension             | What changes                                          |
 | --------------------- | ----------------------------------------------------- |
 | Objects layout        | Target position, presence of confounding objects      |
 | Camera viewpoints     | Camera position, orientation, field-of-view           |
 | Robot initial states  | Manipulator start pose                                |
 | Language instructions | LLM-rewritten task description (paraphrase / synonym) |
 | Light conditions      | Intensity, direction, color, shadow                   |
 | Background textures   | Scene surface and object appearance                   |
 | Sensor noise          | Photometric distortions and image degradation         |
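Each variant's perturbation is also visible in its task text. LIBERO-plus builds `task.language` by space-joining the variant's filename, so every non-`_language_` variant ends in a metadata tail such as `view 0 0 100 0 0 initstate 0 noise 45` or `add 16` (lerobot strips this tail in `scripts/ci/extract_task_descriptions.py`). The sketch below uses a hypothetical helper, `split_perturbation_tail`, that reuses that script's end-anchored key vocabulary to separate the base instruction from the tail. It reports the keys as-is and makes no claim about which row of the table above each key belongs to.

```python
from __future__ import annotations

import re

# Key vocabulary copied from the end-anchored regex in
# scripts/ci/extract_task_descriptions.py; each key is followed by integers.
_KEYS = "view|initstate|noise|add|tb|table|light|level"
_TAIL_RE = re.compile(rf"(?:\s(?:{_KEYS})(?:\s\d+)+)+$")
_PAIR_RE = re.compile(rf"({_KEYS})((?:\s\d+)+)")


def split_perturbation_tail(language: str) -> tuple[str, dict[str, list[int]]]:
    """Split task.language into (base instruction, {key: integers}) for the trailing blob."""
    match = _TAIL_RE.search(language)
    if match is None:
        # No trailing metadata, e.g. vanilla LIBERO or a `_language_` variant.
        return language.strip(), {}
    metadata: dict[str, list[int]] = {}
    for key, numbers in _PAIR_RE.findall(match.group(0)):
        metadata.setdefault(key, []).extend(int(n) for n in numbers.split())
    return language[: match.start()].strip(), metadata
```

Because the pattern is anchored at the end of the string, mid-sentence words such as `table` in `from table center` are left alone.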
 ## Available task suites
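 LIBERO-plus covers the same five suites as LIBERO. The task counts below are those of the base LIBERO suites; LIBERO-plus expands each suite into many perturbed variants, so valid `--env.task_ids` go far beyond these counts (the CI smoke test evaluates `libero_spatial` task ids up to 2400):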
 | Suite          | CLI name         | Tasks | Max steps | Description                                        |
 | -------------- | ---------------- | ----- | --------- | -------------------------------------------------- |
 | LIBERO-Spatial | `libero_spatial` | 10    | 280       | Tasks requiring reasoning about spatial relations  |
 | LIBERO-Object  | `libero_object`  | 10    | 280       | Tasks centered on manipulating different objects   |
 | LIBERO-Goal    | `libero_goal`    | 10    | 300       | Goal-conditioned tasks with changing targets       |
 | LIBERO-90      | `libero_90`      | 90    | 400       | Short-horizon tasks from the LIBERO-100 collection |
 | LIBERO-Long    | `libero_10`      | 10    | 520       | Long-horizon tasks from the LIBERO-100 collection  |
 <Tip warning={true}>
  Installing LIBERO-plus **replaces** vanilla LIBERO — it uninstalls `hf-libero`
  so that `import libero` resolves to the LIBERO-plus fork. You cannot have both
  installed at the same time. To switch back to vanilla LIBERO, uninstall the
  fork and reinstall with `pip install -e ".[libero]"`.
 </Tip>
 ## Installation
 ### System dependencies (Linux only)
 ```bash
 sudo apt install libexpat1 libfontconfig1-dev libmagickwand-dev
 ```
 ### Python package
 ```bash
 pip install -e ".[libero]" "robosuite==1.4.1" bddl easydict mujoco wand scikit-image gym
 git clone https://github.com/sylvestf/LIBERO-plus.git
 cd LIBERO-plus && pip install --no-deps -e .
 pip uninstall -y hf-libero  # so `import libero` resolves to the fork
 export PYTHONPATH="$PWD:$PYTHONPATH"  # clone root, as in the Dockerfile
 ```
 LIBERO-plus is installed from its GitHub fork rather than a pyproject extra: pulled in as a `libero @ git+...` dependency, the fork installs as a namespace package, so it is cloned and added to `PYTHONPATH` instead. See `docker/Dockerfile.benchmark.libero_plus` for the canonical install. MuJoCo is required, and only Linux is supported.
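The Docker image also pre-writes `~/.libero/config.yaml`, because LIBERO-plus calls `input()` on import when that file is missing. For a local install, here is a minimal sketch that writes the same four entries for a clone. `render_libero_config` and `write_libero_config` are hypothetical helpers, and the `<clone>/libero/libero` layout is taken from the Dockerfile's `LIBERO_PLUS_ROOT`.

```python
from __future__ import annotations

from pathlib import Path


def render_libero_config(clone_dir: str | Path) -> str:
    """Render config.yaml text for a LIBERO-plus clone, using the Dockerfile's layout."""
    root = Path(clone_dir).expanduser().absolute() / "libero" / "libero"  # LIBERO_PLUS_ROOT
    entries = {
        "assets": root / "assets",
        "bddl_files": root / "bddl_files",
        "datasets": root.parent / "datasets",  # ${LIBERO_PLUS_ROOT}/../datasets
        "init_states": root / "init_files",
    }
    return "".join(f"{key}: {value}\n" for key, value in entries.items())


def write_libero_config(clone_dir: str | Path, config_dir: str | Path = "~/.libero") -> Path:
    """Write config.yaml into config_dir (default ~/.libero) and return its path."""
    target_dir = Path(config_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    config_path = target_dir / "config.yaml"
    config_path.write_text(render_libero_config(clone_dir))
    return config_path
```

For example, run `write_libero_config("LIBERO-plus")` from the directory that contains the clone.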
 <Tip>
 Set the MuJoCo rendering backend before running evaluation:
 ```bash
 export MUJOCO_GL=egl   # headless / HPC / cloud
 ```
 </Tip>
 ### Download LIBERO-plus assets
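 LIBERO-plus ships its extended asset pack (the 6.4 GB `assets.zip`) separately. Download it from the [Hugging Face dataset](https://huggingface.co/datasets/Sylvest/LIBERO-plus/tree/main) and place its `assets/` directory at `libero/libero/assets/` inside your LIBERO-plus clone. Scene XMLs are resolved relative to the package (`bddl_base_domain.py` uses `DIR_PATH/../assets`), so the `assets` entry in `~/.libero/config.yaml` does not change where they are loaded from. The archive nests every entry under a deep author-internal prefix, so extract it to a scratch directory and move the inner `assets/` directory instead of unzipping in place:
 ```bash
 # Run from the LIBERO-plus clone root.
 unzip -q /path/to/assets.zip -d /tmp/libero-plus-extract
 mv "$(find /tmp/libero-plus-extract -type d -name assets | head -1)" libero/libero/assets
 rm -rf /tmp/libero-plus-extract
 ```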
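The benchmark Dockerfile automates this step. Here is a Python sketch of the same idea, assuming `huggingface_hub` is installed (lerobot depends on it). Because the archive nests every entry under a deep author-internal prefix rather than a top-level `assets/`, the sketch extracts to a scratch directory and moves the shallowest `assets/` directory into place. `extract_flat_assets` and `download_and_extract` are hypothetical helpers.

```python
from __future__ import annotations

import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_flat_assets(zip_path: str | Path, libero_plus_root: str | Path) -> Path:
    """Extract assets.zip and move its nested assets/ directory to <libero_plus_root>/assets."""
    target = Path(libero_plus_root) / "assets"
    if target.exists():
        # shutil.move would nest the new tree inside an existing directory.
        raise FileExistsError(f"{target} already exists; remove it first")
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(scratch)
        # Take the shallowest directory named "assets", wherever the prefix puts it.
        candidates = sorted(
            (p for p in Path(scratch).rglob("assets") if p.is_dir()),
            key=lambda p: len(p.parts),
        )
        if not candidates:
            raise FileNotFoundError(f"no assets/ directory found in {zip_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(candidates[0]), str(target))
    return target


def download_and_extract(libero_plus_root: str | Path) -> Path:
    """Fetch assets.zip from the Sylvest/LIBERO-plus dataset repo, then flatten it."""
    from huggingface_hub import hf_hub_download  # imported lazily; only needed here

    zip_file = hf_hub_download(
        repo_id="Sylvest/LIBERO-plus", repo_type="dataset", filename="assets.zip"
    )
    return extract_flat_assets(zip_file, libero_plus_root)
```

For a clone at `./LIBERO-plus`, call `download_and_extract("LIBERO-plus/libero/libero")`. The download is large, so expect it to take a while.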
 ## Evaluation
 ### Default evaluation (recommended)
 Evaluate across the four standard suites (10 episodes per task):
 ```bash
 lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=libero_plus \
  --env.task=libero_spatial,libero_object,libero_goal,libero_10 \
  --eval.batch_size=1 \
  --eval.n_episodes=10 \
  --env.max_parallel_tasks=1
 ```
 ### Single-suite evaluation
 Evaluate on one LIBERO-plus suite:
 ```bash
 lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=libero_plus \
  --env.task=libero_spatial \
  --eval.batch_size=1 \
  --eval.n_episodes=10
 ```
 - `--env.task` picks the suite (`libero_spatial`, `libero_object`, etc.).
 - `--env.task_ids` restricts to specific task indices (`[0]`, `[1,2,3]`, etc.). Omit to run all tasks in the suite.
 - `--eval.batch_size` controls how many environments run in parallel.
 - `--eval.n_episodes` sets how many episodes to run per task.
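For sweeps over several suites or task-id subsets, you can assemble the same invocation from Python. Here is a minimal sketch. `build_eval_cmd` and `run_eval` are hypothetical helpers that emit only the flags listed above, and they format `--env.task_ids` the way the CI workflow passes it (`[0,100,260]`).

```python
from __future__ import annotations

import shlex
import subprocess


def build_eval_cmd(
    policy_path: str,
    suite: str,
    task_ids: list[int] | None = None,
    batch_size: int = 1,
    n_episodes: int = 10,
) -> list[str]:
    """Assemble a lerobot-eval argv for one LIBERO-plus suite."""
    cmd = [
        "lerobot-eval",
        f"--policy.path={policy_path}",
        "--env.type=libero_plus",
        f"--env.task={suite}",
        f"--eval.batch_size={batch_size}",
        f"--eval.n_episodes={n_episodes}",
    ]
    if task_ids is not None:
        # Bracketed, comma-separated, no spaces; omitted entirely to run the whole suite.
        cmd.append("--env.task_ids=[" + ",".join(str(i) for i in task_ids) + "]")
    return cmd


def run_eval(cmd: list[str]) -> int:
    """Print the command in shell-quoted form, run it, and return its exit code."""
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example: `run_eval(build_eval_cmd("your-policy-id", "libero_spatial", task_ids=[0, 100, 260]))`.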
 ### Multi-suite evaluation
 Benchmark a policy across multiple suites at once by passing a comma-separated list:
 ```bash
 lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=libero_plus \
  --env.task=libero_spatial,libero_object \
  --eval.batch_size=1 \
  --eval.n_episodes=10
 ```
 ### Control mode
 LIBERO-plus supports two control modes — `relative` (default) and `absolute`. Different VLA checkpoints are trained with different action parameterizations, so make sure the mode matches your policy:
 ```bash
 --env.control_mode=relative   # or "absolute"
 ```
 ### Policy inputs and outputs
 **Observations:**
 - `observation.state` — 8-dim proprioceptive features (eef position, axis-angle orientation, gripper qpos)
 - `observation.images.image` — main camera view (`agentview_image`), HWC uint8
 - `observation.images.image2` — wrist camera view (`robot0_eye_in_hand_image`), HWC uint8
 **Actions:**
 - Continuous control in `Box(-1, 1, shape=(7,))` — 6D end-effector delta + 1D gripper
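The 8 state dimensions break down as 3 (eef position) + 3 (axis-angle orientation) + 2 (gripper qpos). The end-effector orientation comes from the simulator as a quaternion, so it has to be converted first. Below is a stdlib sketch of that conversion and the concatenation, assuming robosuite's `(x, y, z, w)` quaternion order. `quat_to_axis_angle` and `build_state` are illustrative helpers, not lerobot's `LiberoProcessorStep`.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def quat_to_axis_angle(quat: Sequence[float]) -> list[float]:
    """Convert an (x, y, z, w) unit quaternion to an axis-angle 3-vector (axis * angle)."""
    x, y, z, w = quat
    w = max(-1.0, min(1.0, w))  # guard acos against numerical drift
    den = math.sqrt(1.0 - w * w)  # equals sin(angle / 2)
    if math.isclose(den, 0.0, abs_tol=1e-12):
        return [0.0, 0.0, 0.0]  # (near-)zero rotation: axis undefined
    angle = 2.0 * math.acos(w)
    return [x * angle / den, y * angle / den, z * angle / den]


def build_state(
    eef_pos: Sequence[float], eef_quat: Sequence[float], gripper_qpos: Sequence[float]
) -> list[float]:
    """Concatenate eef position (3) + axis-angle (3) + gripper qpos (2) into 8 floats."""
    state = [*eef_pos, *quat_to_axis_angle(eef_quat), *gripper_qpos]
    if len(state) != 8:
        raise ValueError(f"expected 8 state dims, got {len(state)}")
    return [float(v) for v in state]
```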
 ### Recommended evaluation episodes
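 For reproducible benchmarking, use **10 episodes per task** across all four standard suites (Spatial, Object, Goal, Long), which matches the protocol used for published results. Keep in mind that each LIBERO-plus suite contains many perturbed variants of its base tasks, so running every task id is far larger than vanilla LIBERO's 400 episodes (4 suites × 10 tasks × 10 episodes); use `--env.task_ids` to evaluate a subset.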
 ## Training
 ### Dataset
 A LeRobot-format training dataset for LIBERO-plus is available at:
 - [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
 ### Example training command
 ```bash
 lerobot-train \
    --policy.type=smolvla \
    --policy.repo_id=${HF_USER}/smolvla_libero_plus \
    --policy.load_vlm_weights=true \
    --dataset.repo_id=lerobot/libero_plus \
    --env.type=libero_plus \
    --env.task=libero_spatial \
    --output_dir=./outputs/ \
    --steps=100000 \
    --batch_size=4 \
    --eval.batch_size=1 \
    --eval.n_episodes=1 \
    --eval_freq=1000
 ```
 ## Relationship to LIBERO
 LIBERO-plus is a drop-in extension of LIBERO:
 - Same Python gym interface (`LiberoEnv`, `LiberoProcessorStep`)
 - Same camera names and observation/action format
 - Same task suite names
 - Installs under the same `libero` Python package name (different GitHub repo)
 To use the original LIBERO benchmark, see [LIBERO](./libero) and use `--env.type=libero`.
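Because both benchmarks install under the same `libero` package name, it can be unclear which one Python will pick up. Here is a minimal stdlib check using a hypothetical helper, `libero_search_locations`. `importlib.util.find_spec` locates a top-level package without executing it, which also avoids LIBERO-plus's import-time `input()` prompt when `~/.libero/config.yaml` is missing.

```python
from __future__ import annotations

import importlib.util


def libero_search_locations(name: str = "libero") -> list[str] | None:
    """Return the directories Python would import `name` from, or None if not importable."""
    spec = importlib.util.find_spec(name)  # does not run the package's __init__
    if spec is None:
        return None
    if spec.submodule_search_locations is not None:
        # Packages (regular or namespace) report every directory that contributes.
        return [str(p) for p in spec.submodule_search_locations]
    return [spec.origin] if spec.origin else []
```

After installing LIBERO-plus, `print(libero_search_locations())` should ideally list only your clone's `libero` directory; any other entry (for example one from `hf-libero` in site-packages) suggests the two installs are still mixed.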
@@ -31,9 +31,23 @@ from __future__ import annotations
 import argparse
 import json
 import re
 import sys
 from pathlib import Path
 # LIBERO-plus derives task.language by space-joining the perturbation-variant
 # filename (grab_language_from_filename in libero/libero/benchmark/__init__.py),
 # so non-_language_ variants inherit a trailing metadata blob like
 # "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". Strip those tokens so
 # the description matches the base instruction used in the training dataset.
 _LIBERO_PERTURBATION_TAIL_RE = re.compile(
    r"(?:\s(?:view|initstate|noise|add|tb|table|light|level)(?:\s\d+)+)+$"
 )
 def _strip_libero_perturbation_tail(instruction: str) -> str:
    return _LIBERO_PERTURBATION_TAIL_RE.sub("", instruction).strip()
 def _libero_descriptions(task_suite: str) -> dict[str, str]:
    from libero.libero import benchmark  # type: ignore[import-untyped]
@@ -47,7 +61,10 @@ def _libero_descriptions(task_suite: str) -> dict[str, str]:
        )
        return {}
    suite = suite_dict[task_suite]()
-    return {f"{task_suite}_{i}": suite.get_task(i).language for i in range(suite.n_tasks)}
+    return {
        f"{task_suite}_{i}": _strip_libero_perturbation_tail(suite.get_task(i).language)
        for i in range(suite.n_tasks)
    }
 def _metaworld_descriptions(task_name: str) -> dict[str, str]:
@@ -144,7 +161,7 @@ def main() -> int:
    descriptions: dict[str, str] = {}
    try:
-        if args.env == "libero":
+        if args.env in ("libero", "libero_plus"):
            descriptions = _libero_descriptions(args.task)
        elif args.env == "metaworld":
            descriptions = _metaworld_descriptions(args.task)
@@ -331,6 +331,7 @@ class LiberoEnv(EnvConfig):
    camera_name_mapping: dict[str, str] | None = None
    observation_height: int = 360
    observation_width: int = 360
    is_libero_plus: bool = False
    features: dict[str, PolicyFeature] = field(
        default_factory=lambda: {
            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
@@ -432,6 +433,7 @@ class LiberoEnv(EnvConfig):
            control_mode=self.control_mode,
            episode_length=self.episode_length,
            camera_name_mapping=self.camera_name_mapping,
            is_libero_plus=self.is_libero_plus,
        )
    def get_env_processors(self):
@@ -651,6 +653,30 @@ class IsaaclabArenaEnv(HubEnvConfig):
        )
@EnvConfig.register_subclass("libero_plus")
@dataclass
 class LiberoPlusEnv(LiberoEnv):
    """Config for LIBERO-plus robustness benchmark evaluation.
    LIBERO-plus extends LIBERO with 7 perturbation dimensions (camera viewpoints,
    object layouts, robot initial states, language instructions, lighting, background
    textures, sensor noise) producing ~10k task variants.
    The gym interface is identical to LIBERO, so this class reuses ``LiberoEnv``
    entirely; only the registered name, the default task suite, and
    ``is_libero_plus=True`` (perturbation-aware init-state loading) differ.
    Install: see docker/Dockerfile.benchmark.libero_plus — LIBERO-plus ships
    as a namespace package from a git fork and must be cloned + PYTHONPATH'd
    rather than installed as a pyproject extra.
    See Also:
        https://github.com/sylvestf/LIBERO-plus
    """
    task: str = "libero_spatial"
    is_libero_plus: bool = True
@EnvConfig.register_subclass("robotwin")
@dataclass
 class RoboTwinEnvConfig(EnvConfig):
@@ -16,6 +16,7 @@
 from __future__ import annotations
 import os
 import re
 from collections import defaultdict
 from collections.abc import Callable, Iterable, Mapping, Sequence
 from functools import partial
@@ -56,14 +57,34 @@ def _select_task_ids(total_tasks: int, task_ids: Iterable[int] | None) -> list[i
    return ids
-def get_task_init_states(task_suite: Any, i: int) -> np.ndarray:
+# LIBERO-plus perturbation variants encode the perturbation in the filename
-    init_states_path = (
+# but on disk only the base `.pruned_init` exists — strip the suffix to match
-        Path(get_libero_path("init_states"))
+# LIBERO-plus's own suite.get_task_init_states() (we reimplement it here so we
-        / task_suite.tasks[i].problem_folder
+# can pass weights_only=False for PyTorch 2.6+ numpy pickles).
-        / task_suite.tasks[i].init_states_file
+_LIBERO_PERTURBATION_SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")
-    )
+
-    init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
+
-    return init_states
+def get_task_init_states(task_suite: Any, i: int, is_libero_plus: bool = False) -> np.ndarray:
    task = task_suite.tasks[i]
    filename = Path(task.init_states_file)
    root = Path(get_libero_path("init_states"))
    if not is_libero_plus:
        init_states_path = root / task.problem_folder / filename.name
        return torch.load(init_states_path, weights_only=False)  # nosec B614
    # LIBERO-plus: `_add_` / `_level` variants store extra-object layouts under
    # libero_newobj/ as a flat array that must be reshaped to (1, -1).
    if "_add_" in filename.name or "_level" in filename.name:
        init_states_path = root / "libero_newobj" / task.problem_folder / filename.name
        init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
        return init_states.reshape(1, -1)
    # LIBERO-plus perturbation variants encode the perturbation in the filename
    # but on disk only the base `.pruned_init` exists — strip the suffix to match.
    stripped = _LIBERO_PERTURBATION_SUFFIX_RE.sub("", filename.stem) + filename.suffix
    init_states_path = root / task.problem_folder / stripped
    return torch.load(init_states_path, weights_only=False)  # nosec B614
 def get_libero_dummy_action():
@@ -105,9 +126,11 @@ class LiberoEnv(gym.Env):
        camera_name_mapping: dict[str, str] | None = None,
        num_steps_wait: int = 10,
        control_mode: str = "relative",
        is_libero_plus: bool = False,
    ):
        super().__init__()
        self.task_id = task_id
        self.is_libero_plus = is_libero_plus
        self.obs_type = obs_type
        self.render_mode = render_mode
        self.observation_width = observation_width
@@ -134,7 +157,11 @@ class LiberoEnv(gym.Env):
        self.episode_index = episode_index
        self.episode_length = episode_length
        # Load once and keep
-        self._init_states = get_task_init_states(task_suite, self.task_id) if self.init_states else None
+        self._init_states = (
            get_task_init_states(task_suite, self.task_id, is_libero_plus=self.is_libero_plus)
            if self.init_states
            else None
        )
        self._reset_stride = n_envs  # when performing a reset, append `_reset_stride` to `init_state_id`.
        self.init_state_id = self.episode_index  # tie each sub-env to a fixed init state
@@ -367,6 +394,7 @@ def _make_env_fns(
    gym_kwargs: Mapping[str, Any],
    control_mode: str,
    camera_name_mapping: dict[str, str] | None = None,
    is_libero_plus: bool = False,
 ) -> list[Callable[[], LiberoEnv]]:
    """Build n_envs factory callables for a single (suite, task_id)."""
@@ -383,6 +411,7 @@ def _make_env_fns(
            n_envs=n_envs,
            control_mode=control_mode,
            camera_name_mapping=camera_name_mapping,
            is_libero_plus=is_libero_plus,
            **local_kwargs,
        )
@@ -405,6 +434,7 @@ def create_libero_envs(
    control_mode: str = "relative",
    episode_length: int | None = None,
    camera_name_mapping: dict[str, str] | None = None,
    is_libero_plus: bool = False,
 ) -> dict[str, dict[int, Any]]:
    """
    Create vectorized LIBERO environments with a consistent return shape.
@@ -463,6 +493,7 @@ def create_libero_envs(
                gym_kwargs=gym_kwargs,
                control_mode=control_mode,
                camera_name_mapping=camera_name_mapping,
                is_libero_plus=is_libero_plus,
            )
            if is_async:
                lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)