feat(envs): add LIBERO-plus robustness benchmark (#3313)

* feat(envs): add LIBERO-plus robustness benchmark integration

- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra

* fix(libero): use suite's perturbation-aware init_states loader

LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init` file: the on-disk name for a perturbation variant doesn't exist as a file, only the base does. lerobot's loader was bypassing that logic and trying to read the suffix-bearing filename directly, which failed for every non-zero task id and killed the eval before any rollout video could be written.

Delegate to the suite's method when it exists; fall back to the path-based loader for vanilla LIBERO (which does not provide the method).

Also drop the hf-libero install + init_files copy from the LIBERO-plus Dockerfile: the LIBERO-plus clone already ships both `bddl_files/` and `init_files/` for all five suites, so the copy was unnecessary and the `cp -r` into an existing dir produced a confusing nested layout.

* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves

Delegating to `task_suite.get_task_init_states(i)` works for path resolution but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`, which fails on PyTorch 2.6+ because the pickled init_states contains numpy objects not in the default allowlist:

    _pickle.UnpicklingError: Weights only load failed. WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.

Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass `weights_only=False` ourselves. Vanilla LIBERO task names don't contain any of these patterns except for `_table_` when followed by the word `center` (e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex requires `_table_\d+` so semantic uses are preserved.

* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus

LIBERO-plus's bddl_base_domain.py resolves scene XMLs with `os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml has no effect on scene lookup: MuJoCo always opens `<clone>/libero/libero/assets/scenes/...`. With no such directory present, every perturbation task fails on:

    FileNotFoundError: No such file or directory: .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml

These textures, views, and extra objects ship only in the 6.4 GB `assets.zip` published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says to download and unzip it into the package dir). Fetch it via `hf_hub_download`, unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at the extracted dir so everything stays consistent. The download lives in its own Docker layer so subsequent rebuilds reuse the cached assets.

Drops the lerobot/libero-assets snapshot_download; that mirror only has vanilla LIBERO textures and is ignored for scene loading anyway.

* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip

The 6.4 GB zip ships with every entry prefixed by `inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...` (the author's internal filesystem layout, not the layout the LIBERO-plus README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created `${LIBERO_PLUS_ROOT}/inspire/.../assets/`; robosuite still opened `${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml` and hit the same FileNotFoundError.

Extract to a scratch dir, then `mv` the nested `assets/` subtree to the expected location. Verified the target file exists in the zip central directory under that exact prefix.

* refactor(libero): inline init_states resolver behind single regex

Collapse the three-style suffix stripper (split/re.sub/in) into one compiled regex, drop the (Path, bool) tuple return, and move the `_add_`/`_level` reshape branch into the caller so each branch loads its own file and returns directly. Net: -11 lines, one fewer helper.

* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu

Mirror the libero/metaworld/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and only layer on what LIBERO-plus uniquely needs: its wand/ImageMagick build deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.

Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init) the nightly already provides. 123 → 73 lines.

Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in from an unrelated resolve, unrelated to LIBERO-plus).

* fix(libero-plus): stop touching pyproject + uv.lock

The fast-tests job was rejecting the branch because pyproject.toml had a [libero_plus] extra whose git dep wasn't represented in uv.lock. The Docker image no longer needs the extra: it clones LIBERO-plus directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from pyproject and restore pyproject.toml + uv.lock to exactly what's on origin/main, so `uv sync --locked --extra test` is a no-op for this PR.

Also repoint the doc/CI/env comments that still mentioned the extra at the Docker install path.

* fix(libero-plus): strip perturbation metadata from task descriptions

LIBERO-plus builds task.language by space-joining the perturbation-variant filename, so every non-_language_ variant inherits a trailing blob like "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the dashboard video labels and no longer matches the base instruction stored in the training dataset.

Strip those tokens in extract_task_descriptions.py with an end-anchored regex over the {view,initstate,noise,add,tb,table,light,level}(+digits) vocabulary. The anchor preserves mid-sentence literal uses of those words (e.g. "from table center and place it on the plate"); only the trailing metadata chain is removed. _language_ variants carry real BDDL-sourced text and are left untouched.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3313 review feedback

- docs: fix paper link to arxiv, add benchmark image, add suite descriptions, add LIBERO-plus replacement warning, restructure eval section to match LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO always takes the simple path

* fix(docs): use correct LIBERO-plus teaser image URL

* ci(libero-plus): drop redundant hf auth login step

The standalone login step ran `hf auth login` in a throwaway `docker run --rm` container, so no credentials persisted. Auth is already performed inside the eval step's container. Removing the redundant step per PR #3313 review feedback.

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch. Without these attributes eval crashes when calling `env.unwrapped.metadata["render_fps"]` with async vector envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and caches the metadata alongside obs/action spaces in the LIBERO and MetaWorld factories.

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step before any of the actual build/eval work could run. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Mirrors the existing pattern in the VLABench job.

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update scripts/ci/extract_task_descriptions.py

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update docker/Dockerfile.benchmark.libero_plus

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(libero-plus): address review feedback

* ci(libero-plus): fix YAML indentation in upload-artifact steps

The `uses:` key on two upload-artifact steps was at column 0 instead of nested under the step, causing `pre-commit run check-yaml` to fail with "expected <block end>, but found '<block mapping start>'".

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
2026-05-23 04:30:10 +00:00 · 2026-04-20 21:07:21 +02:00
parent 282c31cfef
commit a07f22e22c
7 changed files with 466 additions and 11 deletions
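
The commit message describes stripping trailing perturbation metadata from task descriptions with an end-anchored regex, but `scripts/ci/extract_task_descriptions.py` is not shown in this excerpt. The following is a minimal sketch of that idea, not the script's actual code: the pattern, the helper name `strip_perturbation_metadata`, and the assumption that every keyword is followed by one or more whitespace-separated integers are all illustrative.

```python
import re

# Hypothetical pattern (an assumption, not the pattern used in the repository):
# one or more trailing "<keyword> <int> [<int> ...]" groups, anchored at the end
# of the string so mid-sentence uses such as "from table center" are untouched.
_TRAILING_METADATA_RE = re.compile(
    r"(?:\s+(?:view|initstate|noise|add|tb|table|light|level)(?:\s+\d+)+)+\s*$"
)


def strip_perturbation_metadata(description: str) -> str:
    """Remove a trailing chain of perturbation tokens from a task description."""
    return _TRAILING_METADATA_RE.sub("", description)


if __name__ == "__main__":
    raw = (
        "pick up the black bowl from table center and place it on the plate "
        "view 0 0 100 0 0 initstate 0 noise 45"
    )
    print(strip_perturbation_metadata(raw))
```

Requiring at least one integer after each keyword is a deliberate choice in this sketch: a description that legitimately ends in a vocabulary word (for example "... on the table") is left unchanged.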
@@ -331,6 +331,7 @@ class LiberoEnv(EnvConfig):
    camera_name_mapping: dict[str, str] | None = None
    observation_height: int = 360
    observation_width: int = 360
+    is_libero_plus: bool = False
    features: dict[str, PolicyFeature] = field(
        default_factory=lambda: {
            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
@@ -432,6 +433,7 @@ class LiberoEnv(EnvConfig):
            control_mode=self.control_mode,
            episode_length=self.episode_length,
            camera_name_mapping=self.camera_name_mapping,
+            is_libero_plus=self.is_libero_plus,
        )

    def get_env_processors(self):
@@ -651,6 +653,30 @@ class IsaaclabArenaEnv(HubEnvConfig):
        )


+@EnvConfig.register_subclass("libero_plus")
+@dataclass
+class LiberoPlusEnv(LiberoEnv):
+    """Config for LIBERO-plus robustness benchmark evaluation.
+
+    LIBERO-plus extends LIBERO with 7 perturbation dimensions (camera viewpoints,
+    object layouts, robot initial states, language instructions, lighting, background
+    textures, sensor noise) producing ~10k task variants.
+
+    The gym interface is identical to LIBERO, so this class reuses ``LiberoEnv``
+    entirely; only the registered name, the default task suite, and the
+    ``is_libero_plus`` flag differ.
+
+    Install: see docker/Dockerfile.benchmark.libero_plus — LIBERO-plus ships
+    as a namespace package from a git fork and must be cloned + PYTHONPATH'd
+    rather than installed as a pyproject extra.
+
+    See Also:
+        https://github.com/sylvestf/LIBERO-plus
+    """
+
+    task: str = "libero_spatial"
+    is_libero_plus: bool = True
+
+
@EnvConfig.register_subclass("robotwin")
@dataclass
 class RoboTwinEnvConfig(EnvConfig):
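
The `LiberoPlusEnv` docstring defers installation to the Docker image, and the commit message explains that the image extracts `assets.zip` to a scratch directory and then moves the deeply nested `assets/` subtree to the expected location. The Dockerfile does this with shell tools and is not shown here; the sketch below is a hypothetical Python analogue of that flatten step. The function name and the "shallowest directory named `assets`" heuristic are assumptions for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_nested_assets(zip_path: Path, target_root: Path) -> Path:
    """Unzip to a scratch dir, then move the nested assets/ dir to target_root/assets."""
    destination = target_root / "assets"
    if destination.exists():
        # Moving into an existing directory would nest assets/ inside assets/.
        raise FileExistsError(f"{destination} already exists")

    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(scratch)

        # Find directories named "assets" at any depth and take the shallowest one.
        candidates = [p for p in Path(scratch).rglob("assets") if p.is_dir()]
        if not candidates:
            raise FileNotFoundError(f"no assets/ directory inside {zip_path}")
        nested = min(candidates, key=lambda p: len(p.parts))

        target_root.mkdir(parents=True, exist_ok=True)
        shutil.move(str(nested), str(destination))

    return destination
```

Searching for the directory instead of hardcoding the archive prefix mirrors the review-feedback item "replace hardcoded asset path with dynamic find" from the commit message.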
@@ -16,6 +16,7 @@
 from __future__ import annotations

 import os
+import re
 from collections import defaultdict
 from collections.abc import Callable, Iterable, Mapping, Sequence
 from functools import partial
@@ -56,14 +57,34 @@ def _select_task_ids(total_tasks: int, task_ids: Iterable[int] | None) -> list[i
    return ids


-def get_task_init_states(task_suite: Any, i: int) -> np.ndarray:
-    init_states_path = (
-        Path(get_libero_path("init_states"))
-        / task_suite.tasks[i].problem_folder
-        / task_suite.tasks[i].init_states_file
-    )
-    init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
-    return init_states
+# LIBERO-plus perturbation variants encode the perturbation in the filename
+# but on disk only the base `.pruned_init` exists — strip the suffix to match
+# LIBERO-plus's own suite.get_task_init_states() (we reimplement it here so we
+# can pass weights_only=False for PyTorch 2.6+ numpy pickles).
+_LIBERO_PERTURBATION_SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")
+
+
+def get_task_init_states(task_suite: Any, i: int, is_libero_plus: bool = False) -> np.ndarray:
+    task = task_suite.tasks[i]
+    filename = Path(task.init_states_file)
+    root = Path(get_libero_path("init_states"))
+
+    if not is_libero_plus:
+        init_states_path = root / task.problem_folder / filename.name
+        return torch.load(init_states_path, weights_only=False)  # nosec B614
+
+    # LIBERO-plus: `_add_` / `_level` variants store extra-object layouts under
+    # libero_newobj/ as a flat array that must be reshaped to (1, -1).
+    if "_add_" in filename.name or "_level" in filename.name:
+        init_states_path = root / "libero_newobj" / task.problem_folder / filename.name
+        init_states = torch.load(init_states_path, weights_only=False)  # nosec B614
+        return init_states.reshape(1, -1)
+
+    # Remaining variants: only the base `.pruned_init` exists on disk, so strip
+    # the perturbation suffix (see _LIBERO_PERTURBATION_SUFFIX_RE) before loading.
+    stripped = _LIBERO_PERTURBATION_SUFFIX_RE.sub("", filename.stem) + filename.suffix
+    init_states_path = root / task.problem_folder / stripped
+    return torch.load(init_states_path, weights_only=False)  # nosec B614


 def get_libero_dummy_action():
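
The path resolution in `get_task_init_states` can be tried without `torch` or a LIBERO install. The sketch below reuses the regex from the hunk above and returns only the relative path that would be loaded. The helper name `resolve_init_states_relpath` and the example filenames are hypothetical, chosen to exercise each branch.

```python
import re
from pathlib import Path

# Same pattern as _LIBERO_PERTURBATION_SUFFIX_RE in the diff above.
PERTURBATION_SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")


def resolve_init_states_relpath(problem_folder: str, init_states_file: str, is_libero_plus: bool) -> Path:
    """Return the init_states path relative to the init_states root (hypothetical helper)."""
    name = Path(init_states_file)

    # Vanilla LIBERO: the file exists under its own name.
    if not is_libero_plus:
        return Path(problem_folder) / name.name

    # `_add_` / `_level` variants keep their full name but live under libero_newobj/.
    if "_add_" in name.name or "_level" in name.name:
        return Path("libero_newobj") / problem_folder / name.name

    # Other perturbation variants map back to the base file.
    stripped = PERTURBATION_SUFFIX_RE.sub("", name.stem) + name.suffix
    return Path(problem_folder) / stripped


if __name__ == "__main__":
    # Made-up filenames for illustration only.
    for fname in (
        "pick_up_the_black_bowl_from_table_center_view_0_0_100.pruned_init",
        "pick_up_the_black_bowl_table_3.pruned_init",
        "pick_up_the_black_bowl_add_16.pruned_init",
    ):
        print(resolve_init_states_relpath("libero_spatial", fname, is_libero_plus=True))
```

Because the second alternative requires digits after `_table_`, the semantic `_table_center` in the first filename survives while the trailing `_view_...` suffix is removed.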
@@ -105,9 +126,11 @@ class LiberoEnv(gym.Env):
        camera_name_mapping: dict[str, str] | None = None,
        num_steps_wait: int = 10,
        control_mode: str = "relative",
+        is_libero_plus: bool = False,
    ):
        super().__init__()
        self.task_id = task_id
+        self.is_libero_plus = is_libero_plus
        self.obs_type = obs_type
        self.render_mode = render_mode
        self.observation_width = observation_width
@@ -134,7 +157,11 @@ class LiberoEnv(gym.Env):
        self.episode_index = episode_index
        self.episode_length = episode_length
        # Load once and keep
-        self._init_states = get_task_init_states(task_suite, self.task_id) if self.init_states else None
+        self._init_states = (
+            get_task_init_states(task_suite, self.task_id, is_libero_plus=self.is_libero_plus)
+            if self.init_states
+            else None
+        )
        self._reset_stride = n_envs  # when performing a reset, append `_reset_stride` to `init_state_id`.

        self.init_state_id = self.episode_index  # tie each sub-env to a fixed init state
@@ -367,6 +394,7 @@ def _make_env_fns(
    gym_kwargs: Mapping[str, Any],
    control_mode: str,
    camera_name_mapping: dict[str, str] | None = None,
+    is_libero_plus: bool = False,
 ) -> list[Callable[[], LiberoEnv]]:
    """Build n_envs factory callables for a single (suite, task_id)."""

@@ -383,6 +411,7 @@ def _make_env_fns(
            n_envs=n_envs,
            control_mode=control_mode,
            camera_name_mapping=camera_name_mapping,
+            is_libero_plus=is_libero_plus,
            **local_kwargs,
        )

@@ -405,6 +434,7 @@ def create_libero_envs(
    control_mode: str = "relative",
    episode_length: int | None = None,
    camera_name_mapping: dict[str, str] | None = None,
+    is_libero_plus: bool = False,
 ) -> dict[str, dict[int, Any]]:
    """
    Create vectorized LIBERO environments with a consistent return shape.
@@ -463,6 +493,7 @@ def create_libero_envs(
                gym_kwargs=gym_kwargs,
                control_mode=control_mode,
                camera_name_mapping=camera_name_mapping,
+                is_libero_plus=is_libero_plus,
            )
            if is_async:
                lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)