mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-11 14:49:43 +00:00
a07f22e22c
* feat(envs): add LIBERO-plus robustness benchmark integration
- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra
* fix(libero): use suite's perturbation-aware init_states loader
LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that
strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`,
`_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init`
file — the on-disk name for a perturbation variant doesn't exist as a file,
only the base does. lerobot's loader was bypassing that logic and trying to
read the suffix-bearing filename directly, which failed for every non-zero
task id and killed the eval before any rollout video could be written.
Delegate to the suite's method when it exists; fall back to the path-based
loader for vanilla LIBERO (which does not provide the method).
Also drop the hf-libero install + init_files copy from the LIBERO-plus
Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and
`init_files/` for all five suites, so the copy was unnecessary and the
`cp -r` into an existing dir produced a confusing nested layout.
* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves
Delegating to `task_suite.get_task_init_states(i)` works for path resolution
but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`,
which fails on PyTorch 2.6+ because the pickled init_states contains numpy
objects not in the default allowlist:
_pickle.UnpicklingError: Weights only load failed.
WeightsUnpickler error: Unsupported global:
GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.
Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`,
`_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass
`weights_only=False` ourselves. Vanilla LIBERO task names don't contain any
of these patterns except for `_table_` when followed by the word `center`
(e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex
requires `_table_\\d+` so semantic uses are preserved.
* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus
LIBERO-plus's bddl_base_domain.py resolves scene XMLs with
`os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml
has no effect on scene lookup — MuJoCo always opens
`<clone>/libero/libero/assets/scenes/...`. With no such directory present,
every perturbation task fails on:
FileNotFoundError: No such file or directory:
.../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml
These textures, views, and extra objects ship only in the 6.4 GB `assets.zip`
published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says
to download and unzip it into the package dir). Fetch it via `hf_hub_download`,
unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at
the extracted dir so everything stays consistent. The download lives in its
own Docker layer so subsequent rebuilds reuse the cached assets.
Drops the lerobot/libero-assets snapshot_download — that mirror only has
vanilla LIBERO textures and is ignored for scene loading anyway.
* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip
The 6.4 GB zip ships with every entry prefixed by
`inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...`
(the author's internal filesystem layout, not the layout the LIBERO-plus
README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created
`${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened
`${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml`
and hit the same FileNotFoundError.
Extract to a scratch dir, then `mv` the nested `assets/` subtree to the
expected location. Verified the target file exists in the zip central
directory under that exact prefix.
* refactor(libero): inline init_states resolver behind single regex
Collapse the three-style suffix stripper (split/re.sub/in) into one
compiled regex, drop the (Path, bool) tuple return, and move the
`_add_`/`_level` reshape branch into the caller so each branch loads
its own file and returns directly. Net: -11 lines, one fewer helper.
* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/robomme pattern: start from the nightly GPU
image (apt deps, python, uv, venv, lerobot[all] already there) and only
layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build
deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the
PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.
Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init) the nightly already provides.
123 → 73 lines.
Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so
doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to
the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in
from an unrelated resolve — unrelated to LIBERO-plus).
* fix(libero-plus): stop touching pyproject + uv.lock
The fast-tests job was rejecting the branch because pyproject.toml had a
[libero_plus] extra whose git dep wasn't represented in uv.lock.
The Docker image no longer needs the extra — it clones LIBERO-plus
directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from
pyproject and restore pyproject.toml + uv.lock to exactly what's on
origin/main, so `uv sync --locked --extra test` is a no-op for this PR.
Also repoint the doc/CI/env comments that still mentioned the extra at
the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions
LIBERO-plus builds task.language by space-joining the perturbation-variant
filename, so every non-_language_ variant inherits a trailing blob like
"view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the
dashboard video labels and no longer matches the base instruction stored
in the training dataset.
Strip those tokens in extract_task_descriptions.py with an end-anchored
regex over the {view,initstate,noise,add,tb,table,light,level}(+digits)
vocabulary. The anchor preserves mid-sentence literal uses of those words
(e.g. "from table center and place it on the plate") — only the trailing
metadata chain is removed. _language_ variants carry real BDDL-sourced
text and are left untouched.
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: integrate PR #3313 review feedback
- docs: fix paper link to arxiv, add benchmark image, add suite descriptions,
add LIBERO-plus replacement warning, restructure eval section to match
LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded
asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO
always takes the simple path
* fix(docs): use correct LIBERO-plus teaser image URL
* ci(libero-plus): drop redundant hf auth login step
The standalone login step ran `hf auth login` in a throwaway
`docker run --rm` container, so no credentials persisted. Auth is
already performed inside the eval step's container. Removing the
redundant step per PR #3313 review feedback.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch. Without these attributes eval crashes
when calling `env.unwrapped.metadata["render_fps"]` with async vector
envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and
caches the metadata alongside obs/action spaces in the LIBERO and
MetaWorld factories.
* ci: gate Docker Hub login on secret availability
Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step before any of
the actual build/eval work could run. Gate the login on the env-var
expansion of the username so the step is skipped (not failed) when
secrets are absent. Mirrors the existing pattern in the VLABench job.
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update scripts/ci/extract_task_descriptions.py
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update docker/Dockerfile.benchmark.libero_plus
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* fix(libero-plus): address review feedback
* ci(libero-plus): fix YAML indentation in upload-artifact steps
The `uses:` key on two upload-artifact steps was at column 0 instead
of nested under the step, causing `pre-commit run check-yaml` to fail
with "expected <block end>, but found '<block mapping start>'".
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
846 lines
34 KiB
YAML
846 lines
34 KiB
YAML
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
|
#
|
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
|
# you may not use this file except in compliance with the License.
|
|
# You may obtain a copy of the License at
|
|
#
|
|
# http://www.apache.org/licenses/LICENSE-2.0
|
|
#
|
|
# Unless required by applicable law or agreed to in writing, software
|
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
# See the License for the specific language governing permissions and
|
|
# limitations under the License.
|
|
|
|
# Integration tests: build an isolated Docker image per benchmark and run a
|
|
# 1-episode smoke eval. Each benchmark gets its own image so incompatible
|
|
# dependency trees (e.g. hf-libero vs metaworld==3.0.0) can never collide.
|
|
#
|
|
# To add a new benchmark:
|
|
# 1. Add docker/Dockerfile.benchmark.<name> (install only lerobot[<name>])
|
|
# 2. Copy one of the jobs below and adjust the image name and eval command.
|
|
name: Benchmark Integration Tests
|
|
|
|
on:
|
|
# Run manually from the Actions tab
|
|
workflow_dispatch:
|
|
|
|
# Run every Monday at 02:00 UTC.
|
|
schedule:
|
|
- cron: "0 2 * * 1"
|
|
|
|
push:
|
|
branches:
|
|
- main
|
|
paths:
|
|
- "src/lerobot/envs/**"
|
|
- "src/lerobot/scripts/lerobot_eval.py"
|
|
- "docker/Dockerfile.benchmark.*"
|
|
- ".github/workflows/benchmark_tests.yml"
|
|
- "pyproject.toml"
|
|
|
|
pull_request:
|
|
branches:
|
|
- main
|
|
paths:
|
|
- "src/lerobot/envs/**"
|
|
- "src/lerobot/scripts/lerobot_eval.py"
|
|
- "docker/Dockerfile.benchmark.*"
|
|
- ".github/workflows/benchmark_tests.yml"
|
|
- "pyproject.toml"
|
|
|
|
permissions:
|
|
contents: read
|
|
|
|
env:
|
|
UV_VERSION: "0.8.0"
|
|
PYTHON_VERSION: "3.12"
|
|
|
|
# Cancel in-flight runs for the same branch/PR.
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
|
|
cancel-in-progress: true
|
|
|
|
jobs:
|
|
# ── LIBERO ────────────────────────────────────────────────────────────────
|
|
# Isolated image: lerobot[libero] only (hf-libero, dm-control, mujoco chain)
|
|
libero-integration-test:
|
|
name: Libero — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
# Build the benchmark-specific image. The Dockerfile separates dep-install
|
|
# from source-copy, so code-only changes skip the slow uv-sync layer
|
|
# when the runner has a warm Docker daemon cache.
|
|
- name: Build Libero benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.libero
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-libero:ci
|
|
|
|
- name: Run Libero smoke eval (1 episode)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
# Named container (no --rm) so we can docker cp artifacts out.
|
|
# Output to /tmp inside the container — /artifacts doesn't exist
|
|
# and user_lerobot cannot create root-level dirs.
|
|
docker run --name libero-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
lerobot-benchmark-libero:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=lerobot/smolvla_libero \
|
|
--env.type=libero \
|
|
--env.task=libero_spatial \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
|
|
--policy.empty_cameras=1 \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env libero --task libero_spatial \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy Libero artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/libero-artifacts
|
|
docker cp libero-eval:/tmp/eval-artifacts/. /tmp/libero-artifacts/ 2>/dev/null || true
|
|
docker rm -f libero-eval || true
|
|
|
|
- name: Parse Libero eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/libero-artifacts \
|
|
--env libero \
|
|
--task libero_spatial \
|
|
--policy lerobot/smolvla_libero
|
|
|
|
- name: Upload Libero rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: libero-rollout-video
|
|
path: /tmp/libero-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload Libero eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: libero-metrics
|
|
path: /tmp/libero-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── LIBERO TRAIN+EVAL SMOKE ──────────────────────────────────────────────
|
|
# Train SmolVLA for 1 step (batch_size=1, dataset episode 0 only) then
|
|
# immediately runs eval inside the training loop (eval_freq=1, 1 episode).
|
|
# Tests the full train→eval-within-training pipeline end-to-end.
|
|
- name: Run Libero train+eval smoke (1 step, eval_freq=1)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name libero-train-smoke --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
lerobot-benchmark-libero:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
accelerate launch --num_processes=1 \$(which lerobot-train) \
|
|
--policy.path=lerobot/smolvla_base \
|
|
--policy.load_vlm_weights=true \
|
|
--policy.scheduler_decay_steps=25000 \
|
|
--policy.freeze_vision_encoder=false \
|
|
--policy.train_expert_only=false \
|
|
--dataset.repo_id=lerobot/libero \
|
|
--dataset.episodes=[0] \
|
|
--dataset.use_imagenet_stats=false \
|
|
--env.type=libero \
|
|
--env.task=libero_spatial \
|
|
'--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
|
|
--policy.empty_cameras=1 \
|
|
--output_dir=/tmp/train-smoke \
|
|
--steps=1 \
|
|
--batch_size=1 \
|
|
--eval_freq=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.batch_size=1 \
|
|
--eval.use_async_envs=false \
|
|
--save_freq=1 \
|
|
--policy.push_to_hub=false \
|
|
'--rename_map={\"observation.images.image\": \"observation.images.camera1\", \"observation.images.image2\": \"observation.images.camera2\"}'
|
|
"
|
|
|
|
- name: Copy Libero train-smoke artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/libero-train-smoke-artifacts
|
|
docker cp libero-train-smoke:/tmp/train-smoke/. /tmp/libero-train-smoke-artifacts/ 2>/dev/null || true
|
|
docker rm -f libero-train-smoke || true
|
|
|
|
- name: Upload Libero train-smoke eval video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: libero-train-smoke-video
|
|
path: /tmp/libero-train-smoke-artifacts/eval/
|
|
if-no-files-found: warn
|
|
|
|
# ── METAWORLD ─────────────────────────────────────────────────────────────
|
|
# Isolated image: lerobot[metaworld] only (metaworld==3.0.0, mujoco>=3 chain)
|
|
metaworld-integration-test:
|
|
name: MetaWorld — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
- name: Build MetaWorld benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.metaworld
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-metaworld:ci
|
|
|
|
- name: Run MetaWorld smoke eval (1 episode)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name metaworld-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
lerobot-benchmark-metaworld:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=lerobot/smolvla_metaworld \
|
|
--env.type=metaworld \
|
|
--env.task=metaworld-push-v3 \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--rename_map={\"observation.image\": \"observation.images.camera1\"}' \
|
|
--policy.empty_cameras=2 \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env metaworld --task metaworld-push-v3 \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy MetaWorld artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/metaworld-artifacts
|
|
docker cp metaworld-eval:/tmp/eval-artifacts/. /tmp/metaworld-artifacts/ 2>/dev/null || true
|
|
docker rm -f metaworld-eval || true
|
|
|
|
- name: Parse MetaWorld eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/metaworld-artifacts \
|
|
--env metaworld \
|
|
--task metaworld-push-v3 \
|
|
--policy lerobot/smolvla_metaworld
|
|
|
|
- name: Upload MetaWorld rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: metaworld-rollout-video
|
|
path: /tmp/metaworld-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload MetaWorld eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: metaworld-metrics
|
|
path: /tmp/metaworld-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── ROBOTWIN 2.0 ──────────────────────────────────────────────────────────
|
|
# Isolated image: full RoboTwin 2.0 stack — SAPIEN, mplib, CuRobo,
|
|
# pytorch3d, + simulation assets (~4 GB).
|
|
# Build takes ~20 min on first run; subsequent runs hit the layer cache.
|
|
# Requires an NVIDIA GPU runner with CUDA 12.1 drivers.
|
|
robotwin-integration-test:
|
|
name: RoboTwin 2.0 — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
ROBOTWIN_POLICY: lerobot/smolvla_robotwin
|
|
ROBOTWIN_TASKS: beat_block_hammer,click_bell,handover_block,stack_blocks_two,click_alarmclock,open_microwave,adjust_bottle,lift_pot,stamp_seal,turn_switch
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
# Build the full-install image: SAPIEN, mplib, CuRobo, pytorch3d +
|
|
# simulation assets (~4 GB). Layer cache lives in the runner's local
|
|
# Docker daemon — reused across re-runs on the same machine.
|
|
- name: Build RoboTwin 2.0 benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.robotwin
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-robotwin:ci
|
|
cache-from: type=local,src=/tmp/.buildx-cache-robotwin
|
|
cache-to: type=local,dest=/tmp/.buildx-cache-robotwin,mode=max
|
|
|
|
- name: Run RoboTwin 2.0 smoke eval (10 tasks, 1 episode each)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
# Named container (no --rm) so we can docker cp artifacts out.
|
|
docker run --name robotwin-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e ROBOTWIN_POLICY="${ROBOTWIN_POLICY}" \
|
|
-e ROBOTWIN_TASKS="${ROBOTWIN_TASKS}" \
|
|
lerobot-benchmark-robotwin:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
cd /opt/robotwin && lerobot-eval \
|
|
--policy.path=\"\$ROBOTWIN_POLICY\" \
|
|
--env.type=robotwin \
|
|
--env.task=\"\$ROBOTWIN_TASKS\" \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--rename_map={\"observation.images.head_camera\": \"observation.images.camera1\", \"observation.images.left_camera\": \"observation.images.camera2\", \"observation.images.right_camera\": \"observation.images.camera3\"}' \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python /lerobot/scripts/ci/extract_task_descriptions.py \
|
|
--env robotwin \
|
|
--task \"\$ROBOTWIN_TASKS\" \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy RoboTwin artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/robotwin-artifacts
|
|
docker cp robotwin-eval:/tmp/eval-artifacts/. /tmp/robotwin-artifacts/ 2>/dev/null || true
|
|
docker rm -f robotwin-eval || true
|
|
|
|
- name: Parse RoboTwin eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/robotwin-artifacts \
|
|
--env robotwin \
|
|
--task "${ROBOTWIN_TASKS}" \
|
|
--policy "${ROBOTWIN_POLICY}"
|
|
|
|
- name: Upload RoboTwin rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4
|
|
with:
|
|
name: robotwin-rollout-video
|
|
path: /tmp/robotwin-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload RoboTwin eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4
|
|
with:
|
|
name: robotwin-metrics
|
|
path: /tmp/robotwin-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── ROBOCASA365 ──────────────────────────────────────────────────────────
|
|
# Isolated image: robocasa + robosuite installed manually as editable
|
|
# clones (no `lerobot[robocasa]` extra — robocasa's setup.py pins
|
|
# `lerobot==0.3.3`, which would shadow this repo's lerobot).
|
|
robocasa-integration-test:
|
|
name: RoboCasa365 — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
- name: Build RoboCasa365 benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.robocasa
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-robocasa:ci
|
|
|
|
- name: Run RoboCasa365 smoke eval (10 atomic tasks, 1 episode each)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name robocasa-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
-e MUJOCO_GL=egl \
|
|
lerobot-benchmark-robocasa:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=lerobot/smolvla_robocasa \
|
|
--env.type=robocasa \
|
|
--env.task=CloseFridge,OpenCabinet,OpenDrawer,TurnOnMicrowave,TurnOffStove,CloseToasterOvenDoor,SlideDishwasherRack,TurnOnSinkFaucet,NavigateKitchen,TurnOnElectricKettle \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--rename_map={\"observation.images.robot0_agentview_left\": \"observation.images.camera1\", \"observation.images.robot0_eye_in_hand\": \"observation.images.camera2\", \"observation.images.robot0_agentview_right\": \"observation.images.camera3\"}' \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env robocasa \
|
|
--task CloseFridge,OpenCabinet,OpenDrawer,TurnOnMicrowave,TurnOffStove,CloseToasterOvenDoor,SlideDishwasherRack,TurnOnSinkFaucet,NavigateKitchen,TurnOnElectricKettle \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy RoboCasa365 artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/robocasa-artifacts
|
|
docker cp robocasa-eval:/tmp/eval-artifacts/. /tmp/robocasa-artifacts/ 2>/dev/null || true
|
|
docker rm -f robocasa-eval || true
|
|
|
|
- name: Parse RoboCasa365 eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/robocasa-artifacts \
|
|
--env robocasa \
|
|
--task atomic_smoke_10 \
|
|
--policy lerobot/smolvla_robocasa
|
|
|
|
- name: Upload RoboCasa365 rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robocasa-rollout-video
|
|
path: /tmp/robocasa-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload RoboCasa365 eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robocasa-metrics
|
|
path: /tmp/robocasa-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── ROBOCEREBRA ───────────────────────────────────────────────────────────
|
|
# Reuses the LIBERO simulator (libero_10 suite) with RoboCerebra camera
|
|
# defaults (image/wrist_image). The image is layered on
|
|
# huggingface/lerobot-gpu, which already ships [libero] as part of [all].
|
|
robocerebra-integration-test:
|
|
name: RoboCerebra — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
- name: Build RoboCerebra benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.robocerebra
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-robocerebra:ci
|
|
cache-from: type=local,src=/tmp/.buildx-cache-robocerebra
|
|
cache-to: type=local,dest=/tmp/.buildx-cache-robocerebra,mode=max
|
|
|
|
- name: Run RoboCerebra smoke eval (1 episode)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name robocerebra-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
-e LIBERO_DATA_FOLDER=/tmp/libero_data \
|
|
lerobot-benchmark-robocerebra:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=lerobot/smolvla_robocerebra \
|
|
--env.type=libero \
|
|
--env.task=libero_10 \
|
|
--env.fps=20 \
|
|
--env.obs_type=pixels_agent_pos \
|
|
--env.observation_height=256 \
|
|
--env.observation_width=256 \
|
|
'--env.camera_name_mapping={\"agentview_image\": \"image\", \"robot0_eye_in_hand_image\": \"wrist_image\"}' \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--rename_map={\"observation.images.image\": \"observation.images.camera1\", \"observation.images.wrist_image\": \"observation.images.camera2\"}' \
|
|
--policy.empty_cameras=1 \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env libero --task libero_10 \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy RoboCerebra artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/robocerebra-artifacts
|
|
docker cp robocerebra-eval:/tmp/eval-artifacts/. /tmp/robocerebra-artifacts/ 2>/dev/null || true
|
|
docker rm -f robocerebra-eval || true
|
|
|
|
- name: Parse RoboCerebra eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/robocerebra-artifacts \
|
|
--env robocerebra \
|
|
--task libero_10 \
|
|
--policy lerobot/smolvla_robocerebra
|
|
|
|
- name: Upload RoboCerebra rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robocerebra-rollout-video
|
|
path: /tmp/robocerebra-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload RoboCerebra eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robocerebra-metrics
|
|
path: /tmp/robocerebra-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── ROBOMME ───────────────────────────────────────────────────────────────
|
|
# Isolated image: mani-skill/SAPIEN/Vulkan chain with gymnasium and numpy
|
|
# overrides (robomme can't be a pyproject extra due to numpy<2 pin).
|
|
robomme-integration-test:
|
|
name: RoboMME — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
ROBOMME_POLICY: lerobot/smolvla_robomme
|
|
ROBOMME_TASKS: PickXtimes,BinFill,StopCube,MoveCube,InsertPeg,SwingXtimes,VideoUnmask,ButtonUnmask,PickHighlight,PatternLock
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
- name: Build RoboMME benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.robomme
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-robomme:ci
|
|
|
|
- name: Run RoboMME smoke eval (10 tasks, 1 episode each)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name robomme-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
-e ROBOMME_POLICY="${ROBOMME_POLICY}" \
|
|
-e ROBOMME_TASKS="${ROBOMME_TASKS}" \
|
|
lerobot-benchmark-robomme:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=\"\$ROBOMME_POLICY\" \
|
|
--env.type=robomme \
|
|
--env.task=\"\$ROBOMME_TASKS\" \
|
|
--env.dataset_split=test \
|
|
--env.task_ids=[0] \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--rename_map={\"observation.images.image\": \"observation.images.camera1\", \"observation.images.wrist_image\": \"observation.images.camera2\"}' \
|
|
--policy.empty_cameras=3 \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env robomme --task \"\$ROBOMME_TASKS\" \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy RoboMME artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/robomme-artifacts
|
|
docker cp robomme-eval:/tmp/eval-artifacts/. /tmp/robomme-artifacts/ 2>/dev/null || true
|
|
docker rm -f robomme-eval || true
|
|
|
|
- name: Parse RoboMME eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/robomme-artifacts \
|
|
--env robomme \
|
|
--task "${ROBOMME_TASKS}" \
|
|
--policy "${ROBOMME_POLICY}"
|
|
|
|
- name: Upload RoboMME rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robomme-rollout-video
|
|
path: /tmp/robomme-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload RoboMME eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: robomme-metrics
|
|
path: /tmp/robomme-artifacts/metrics.json
|
|
if-no-files-found: warn
|
|
|
|
# ── LIBERO-plus ───────────────────────────────────────────────────────────
|
|
# Isolated image: LIBERO-plus fork cloned into /home/user_lerobot on top of
|
|
# huggingface/lerobot-gpu (see docker/Dockerfile.benchmark.libero_plus).
|
|
libero-plus-integration-test:
|
|
name: LIBERO-plus — build image + 1-episode eval
|
|
runs-on:
|
|
group: aws-g6-4xlarge-plus
|
|
env:
|
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
|
LIBERO_PLUS_SUITE: libero_spatial
|
|
LIBERO_PLUS_POLICY: lerobot/smolvla_libero_plus
|
|
LIBERO_PLUS_TASK_IDS: "[0,100,260,500,1000,1500,2000,2400]"
|
|
|
|
steps:
|
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
|
with:
|
|
persist-credentials: false
|
|
lfs: true
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
cache-binary: false
|
|
|
|
- name: Login to Docker Hub
|
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
|
env:
|
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
|
|
|
- name: Build LIBERO-plus benchmark image
|
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
context: .
|
|
file: docker/Dockerfile.benchmark.libero_plus
|
|
push: false
|
|
load: true
|
|
tags: lerobot-benchmark-libero-plus:ci
|
|
cache-from: type=local,src=/tmp/.buildx-cache-libero-plus
|
|
cache-to: type=local,dest=/tmp/.buildx-cache-libero-plus,mode=max
|
|
|
|
- name: Run LIBERO-plus smoke eval (1 episode)
|
|
if: env.HF_USER_TOKEN != ''
|
|
run: |
|
|
docker run --name libero-plus-eval --gpus all \
|
|
--shm-size=4g \
|
|
-e HF_HOME=/tmp/hf \
|
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
|
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
|
-e LIBERO_PLUS_SUITE="${LIBERO_PLUS_SUITE}" \
|
|
-e LIBERO_PLUS_POLICY="${LIBERO_PLUS_POLICY}" \
|
|
-e LIBERO_PLUS_TASK_IDS="${LIBERO_PLUS_TASK_IDS}" \
|
|
lerobot-benchmark-libero-plus:ci \
|
|
bash -c "
|
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
|
lerobot-eval \
|
|
--policy.path=\"\$LIBERO_PLUS_POLICY\" \
|
|
--env.type=libero_plus \
|
|
--env.task=\"\$LIBERO_PLUS_SUITE\" \
|
|
--env.task_ids=\"\$LIBERO_PLUS_TASK_IDS\" \
|
|
--eval.batch_size=1 \
|
|
--eval.n_episodes=1 \
|
|
--eval.use_async_envs=false \
|
|
--policy.device=cuda \
|
|
'--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
|
|
--policy.empty_cameras=1 \
|
|
--output_dir=/tmp/eval-artifacts
|
|
python scripts/ci/extract_task_descriptions.py \
|
|
--env libero_plus --task \"\$LIBERO_PLUS_SUITE\" \
|
|
--output /tmp/eval-artifacts/task_descriptions.json
|
|
"
|
|
|
|
- name: Copy LIBERO-plus artifacts from container
|
|
if: always()
|
|
run: |
|
|
mkdir -p /tmp/libero-plus-artifacts
|
|
docker cp libero-plus-eval:/tmp/eval-artifacts/. /tmp/libero-plus-artifacts/ 2>/dev/null || true
|
|
docker rm -f libero-plus-eval || true
|
|
|
|
- name: Parse LIBERO-plus eval metrics
|
|
if: always()
|
|
run: |
|
|
python3 scripts/ci/parse_eval_metrics.py \
|
|
--artifacts-dir /tmp/libero-plus-artifacts \
|
|
--env libero_plus \
|
|
--task "${LIBERO_PLUS_SUITE}" \
|
|
--policy "${LIBERO_PLUS_POLICY}"
|
|
|
|
- name: Upload LIBERO-plus rollout video
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: libero-plus-rollout-video
|
|
path: /tmp/libero-plus-artifacts/videos/
|
|
if-no-files-found: warn
|
|
|
|
- name: Upload LIBERO-plus eval metrics
|
|
if: always()
|
|
uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
|
|
with:
|
|
name: libero-plus-metrics
|
|
path: /tmp/libero-plus-artifacts/metrics.json
|
|
if-no-files-found: warn
|