mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-23 04:30:10 +00:00
feat(envs): add LIBERO-plus robustness benchmark (#3313)
* feat(envs): add LIBERO-plus robustness benchmark integration
- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra
* fix(libero): use suite's perturbation-aware init_states loader
LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that
strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`,
`_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init`
file — the on-disk name for a perturbation variant doesn't exist as a file,
only the base does. lerobot's loader was bypassing that logic and trying to
read the suffix-bearing filename directly, which failed for every non-zero
task id and killed the eval before any rollout video could be written.
Delegate to the suite's method when it exists; fall back to the path-based
loader for vanilla LIBERO (which does not provide the method).
Also drop the hf-libero install + init_files copy from the LIBERO-plus
Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and
`init_files/` for all five suites, so the copy was unnecessary and the
`cp -r` into an existing dir produced a confusing nested layout.
* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves
Delegating to `task_suite.get_task_init_states(i)` works for path resolution
but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`,
which fails on PyTorch 2.6+ because the pickled init_states contains numpy
objects not in the default allowlist:
_pickle.UnpicklingError: Weights only load failed.
WeightsUnpickler error: Unsupported global:
GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.
Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`,
`_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass
`weights_only=False` ourselves. Vanilla LIBERO task names don't contain any
of these patterns except for `_table_` when followed by the word `center`
(e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex
requires `_table_\\d+` so semantic uses are preserved.
* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus
LIBERO-plus's bddl_base_domain.py resolves scene XMLs with
`os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml
has no effect on scene lookup — MuJoCo always opens
`<clone>/libero/libero/assets/scenes/...`. With no such directory present,
every perturbation task fails on:
FileNotFoundError: No such file or directory:
.../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml
These textures, views, and extra objects ship only in the 6.4 GB `assets.zip`
published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says
to download and unzip it into the package dir). Fetch it via `hf_hub_download`,
unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at
the extracted dir so everything stays consistent. The download lives in its
own Docker layer so subsequent rebuilds reuse the cached assets.
Drops the lerobot/libero-assets snapshot_download — that mirror only has
vanilla LIBERO textures and is ignored for scene loading anyway.
* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip
The 6.4 GB zip ships with every entry prefixed by
`inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...`
(the author's internal filesystem layout, not the layout the LIBERO-plus
README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created
`${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened
`${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml`
and hit the same FileNotFoundError.
Extract to a scratch dir, then `mv` the nested `assets/` subtree to the
expected location. Verified the target file exists in the zip central
directory under that exact prefix.
* refactor(libero): inline init_states resolver behind single regex
Collapse the three-style suffix stripper (split/re.sub/in) into one
compiled regex, drop the (Path, bool) tuple return, and move the
`_add_`/`_level` reshape branch into the caller so each branch loads
its own file and returns directly. Net: -11 lines, one fewer helper.
* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/robomme pattern: start from the nightly GPU
image (apt deps, python, uv, venv, lerobot[all] already there) and only
layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build
deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the
PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.
Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init) the nightly already provides.
123 → 73 lines.
Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so
doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to
the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in
from an unrelated resolve — unrelated to LIBERO-plus).
* fix(libero-plus): stop touching pyproject + uv.lock
The fast-tests job was rejecting the branch because pyproject.toml had a
[libero_plus] extra whose git dep wasn't represented in uv.lock.
The Docker image no longer needs the extra — it clones LIBERO-plus
directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from
pyproject and restore pyproject.toml + uv.lock to exactly what's on
origin/main, so `uv sync --locked --extra test` is a no-op for this PR.
Also repoint the doc/CI/env comments that still mentioned the extra at
the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions
LIBERO-plus builds task.language by space-joining the perturbation-variant
filename, so every non-_language_ variant inherits a trailing blob like
"view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the
dashboard video labels and no longer matches the base instruction stored
in the training dataset.
Strip those tokens in extract_task_descriptions.py with an end-anchored
regex over the {view,initstate,noise,add,tb,table,light,level}(+digits)
vocabulary. The anchor preserves mid-sentence literal uses of those words
(e.g. "from table center and place it on the plate") — only the trailing
metadata chain is removed. _language_ variants carry real BDDL-sourced
text and are left untouched.
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: integrate PR #3313 review feedback
- docs: fix paper link to arxiv, add benchmark image, add suite descriptions,
add LIBERO-plus replacement warning, restructure eval section to match
LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded
asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO
always takes the simple path
* fix(docs): use correct LIBERO-plus teaser image URL
* ci(libero-plus): drop redundant hf auth login step
The standalone login step ran `hf auth login` in a throwaway
`docker run --rm` container, so no credentials persisted. Auth is
already performed inside the eval step's container. Removing the
redundant step per PR #3313 review feedback.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch. Without these attributes eval crashes
when calling `env.unwrapped.metadata["render_fps"]` with async vector
envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and
caches the metadata alongside obs/action spaces in the LIBERO and
MetaWorld factories.
* ci: gate Docker Hub login on secret availability
Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step before any of
the actual build/eval work could run. Gate the login on the env-var
expansion of the username so the step is skipped (not failed) when
secrets are absent. Mirrors the existing pattern in the VLABench job.
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update scripts/ci/extract_task_descriptions.py
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update docker/Dockerfile.benchmark.libero_plus
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* fix(libero-plus): address review feedback
* ci(libero-plus): fix YAML indentation in upload-artifact steps
The `uses:` key on two upload-artifact steps was at column 0 instead
of nested under the step, causing `pre-commit run check-yaml` to fail
with "expected <block end>, but found '<block mapping start>'".
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
This commit is contained in:
@@ -331,6 +331,7 @@ class LiberoEnv(EnvConfig):
|
||||
camera_name_mapping: dict[str, str] | None = None
|
||||
observation_height: int = 360
|
||||
observation_width: int = 360
|
||||
is_libero_plus: bool = False
|
||||
features: dict[str, PolicyFeature] = field(
|
||||
default_factory=lambda: {
|
||||
ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
|
||||
@@ -432,6 +433,7 @@ class LiberoEnv(EnvConfig):
|
||||
control_mode=self.control_mode,
|
||||
episode_length=self.episode_length,
|
||||
camera_name_mapping=self.camera_name_mapping,
|
||||
is_libero_plus=self.is_libero_plus,
|
||||
)
|
||||
|
||||
def get_env_processors(self):
|
||||
@@ -651,6 +653,30 @@ class IsaaclabArenaEnv(HubEnvConfig):
|
||||
)
|
||||
|
||||
|
||||
@EnvConfig.register_subclass("libero_plus")
|
||||
@dataclass
|
||||
class LiberoPlusEnv(LiberoEnv):
|
||||
"""Config for LIBERO-plus robustness benchmark evaluation.
|
||||
|
||||
LIBERO-plus extends LIBERO with 7 perturbation dimensions (camera viewpoints,
|
||||
object layouts, robot initial states, language instructions, lighting, background
|
||||
textures, sensor noise) producing ~10k task variants.
|
||||
|
||||
The gym interface is identical to LIBERO so this class reuses ``LiberoEnv``
|
||||
entirely — only the registered name and default task suite differ.
|
||||
|
||||
Install: see docker/Dockerfile.benchmark.libero_plus — LIBERO-plus ships
|
||||
as a namespace package from a git fork and must be cloned + PYTHONPATH'd
|
||||
rather than installed as a pyproject extra.
|
||||
|
||||
See Also:
|
||||
https://github.com/sylvestf/LIBERO-plus
|
||||
"""
|
||||
|
||||
task: str = "libero_spatial"
|
||||
is_libero_plus: bool = True
|
||||
|
||||
|
||||
@EnvConfig.register_subclass("robotwin")
|
||||
@dataclass
|
||||
class RoboTwinEnvConfig(EnvConfig):
|
||||
|
||||
@@ -16,6 +16,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import re
|
||||
from collections import defaultdict
|
||||
from collections.abc import Callable, Iterable, Mapping, Sequence
|
||||
from functools import partial
|
||||
@@ -56,14 +57,34 @@ def _select_task_ids(total_tasks: int, task_ids: Iterable[int] | None) -> list[i
|
||||
return ids
|
||||
|
||||
|
||||
def get_task_init_states(task_suite: Any, i: int) -> np.ndarray:
|
||||
init_states_path = (
|
||||
Path(get_libero_path("init_states"))
|
||||
/ task_suite.tasks[i].problem_folder
|
||||
/ task_suite.tasks[i].init_states_file
|
||||
)
|
||||
init_states = torch.load(init_states_path, weights_only=False) # nosec B614
|
||||
return init_states
|
||||
# LIBERO-plus perturbation variants encode the perturbation in the filename
|
||||
# but on disk only the base `.pruned_init` exists — strip the suffix to match
|
||||
# LIBERO-plus's own suite.get_task_init_states() (we reimplement it here so we
|
||||
# can pass weights_only=False for PyTorch 2.6+ numpy pickles).
|
||||
_LIBERO_PERTURBATION_SUFFIX_RE = re.compile(r"_(?:language|view|light)_[^.]*|_(?:table|tb)_\d+")
|
||||
|
||||
|
||||
def get_task_init_states(task_suite: Any, i: int, is_libero_plus: bool = False) -> np.ndarray:
|
||||
task = task_suite.tasks[i]
|
||||
filename = Path(task.init_states_file)
|
||||
root = Path(get_libero_path("init_states"))
|
||||
|
||||
if not is_libero_plus:
|
||||
init_states_path = root / task.problem_folder / filename.name
|
||||
return torch.load(init_states_path, weights_only=False) # nosec B614
|
||||
|
||||
# LIBERO-plus: `_add_` / `_level` variants store extra-object layouts under
|
||||
# libero_newobj/ as a flat array that must be reshaped to (1, -1).
|
||||
if "_add_" in filename.name or "_level" in filename.name:
|
||||
init_states_path = root / "libero_newobj" / task.problem_folder / filename.name
|
||||
init_states = torch.load(init_states_path, weights_only=False) # nosec B614
|
||||
return init_states.reshape(1, -1)
|
||||
|
||||
# LIBERO-plus perturbation variants encode the perturbation in the filename
|
||||
# but on disk only the base `.pruned_init` exists — strip the suffix to match.
|
||||
stripped = _LIBERO_PERTURBATION_SUFFIX_RE.sub("", filename.stem) + filename.suffix
|
||||
init_states_path = root / task.problem_folder / stripped
|
||||
return torch.load(init_states_path, weights_only=False) # nosec B614
|
||||
|
||||
|
||||
def get_libero_dummy_action():
|
||||
@@ -105,9 +126,11 @@ class LiberoEnv(gym.Env):
|
||||
camera_name_mapping: dict[str, str] | None = None,
|
||||
num_steps_wait: int = 10,
|
||||
control_mode: str = "relative",
|
||||
is_libero_plus: bool = False,
|
||||
):
|
||||
super().__init__()
|
||||
self.task_id = task_id
|
||||
self.is_libero_plus = is_libero_plus
|
||||
self.obs_type = obs_type
|
||||
self.render_mode = render_mode
|
||||
self.observation_width = observation_width
|
||||
@@ -134,7 +157,11 @@ class LiberoEnv(gym.Env):
|
||||
self.episode_index = episode_index
|
||||
self.episode_length = episode_length
|
||||
# Load once and keep
|
||||
self._init_states = get_task_init_states(task_suite, self.task_id) if self.init_states else None
|
||||
self._init_states = (
|
||||
get_task_init_states(task_suite, self.task_id, is_libero_plus=self.is_libero_plus)
|
||||
if self.init_states
|
||||
else None
|
||||
)
|
||||
self._reset_stride = n_envs # when performing a reset, append `_reset_stride` to `init_state_id`.
|
||||
|
||||
self.init_state_id = self.episode_index # tie each sub-env to a fixed init state
|
||||
@@ -367,6 +394,7 @@ def _make_env_fns(
|
||||
gym_kwargs: Mapping[str, Any],
|
||||
control_mode: str,
|
||||
camera_name_mapping: dict[str, str] | None = None,
|
||||
is_libero_plus: bool = False,
|
||||
) -> list[Callable[[], LiberoEnv]]:
|
||||
"""Build n_envs factory callables for a single (suite, task_id)."""
|
||||
|
||||
@@ -383,6 +411,7 @@ def _make_env_fns(
|
||||
n_envs=n_envs,
|
||||
control_mode=control_mode,
|
||||
camera_name_mapping=camera_name_mapping,
|
||||
is_libero_plus=is_libero_plus,
|
||||
**local_kwargs,
|
||||
)
|
||||
|
||||
@@ -405,6 +434,7 @@ def create_libero_envs(
|
||||
control_mode: str = "relative",
|
||||
episode_length: int | None = None,
|
||||
camera_name_mapping: dict[str, str] | None = None,
|
||||
is_libero_plus: bool = False,
|
||||
) -> dict[str, dict[int, Any]]:
|
||||
"""
|
||||
Create vectorized LIBERO environments with a consistent return shape.
|
||||
@@ -463,6 +493,7 @@ def create_libero_envs(
|
||||
gym_kwargs=gym_kwargs,
|
||||
control_mode=control_mode,
|
||||
camera_name_mapping=camera_name_mapping,
|
||||
is_libero_plus=is_libero_plus,
|
||||
)
|
||||
if is_async:
|
||||
lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)
|
||||
|
||||
Reference in New Issue
Block a user