Files created by user_lerobot inside the eval container inherit a
restrictive umask, making them unreadable by the runner after the
container exits. Add a post-eval 'docker run --user root' chmod step
so upload-artifact can find the video files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace host-side chmod (unreliable across Docker UID boundary) with a
dedicated 'docker run --user root' step that chmods from inside the
container before the eval run mounts the path.
- Use python3 instead of python (CI runners only have python3).
- Add if: always() to parse/upload steps so metrics are captured even on
eval failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub
Actions API directly (no extra datastore). Shows benchmark status
badges, success-rate and duration trend charts, and embeds the latest
rollout video per benchmark. Results cached 5 min in-memory; video
files cached on disk by artifact ID so downloads only happen once.
- spaces/health-dashboard/requirements.txt + README.md: Space card with
setup instructions for the GITHUB_RO_TOKEN secret (actions:read,
metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval,
reads eval_info.json written by lerobot-eval, extracts pc_success and
n_episodes, and writes metrics.json to the artifacts dir.
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and
"Upload … metrics" steps (if: always()) after each eval so the
dashboard has data even when the eval fails.
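A minimal sketch of the parse step, for illustration only. It assumes
eval_info.json keeps its summary under an "overall" or "aggregated" key
(falling back to the top level); the real layout written by lerobot-eval
may differ. The function name parse_eval_metrics is hypothetical.

```python
import json
from pathlib import Path


def parse_eval_metrics(artifacts_dir: str) -> dict:
    """Read eval_info.json and write a compact metrics.json beside it.

    Assumption: the summary sits under "overall" or "aggregated", or at
    the top level. Missing values become None so that a failed eval
    still produces a metrics file for the dashboard.
    """
    root = Path(artifacts_dir)
    info_path = root / "eval_info.json"

    info = {}
    if info_path.is_file():
        try:
            info = json.loads(info_path.read_text())
        except json.JSONDecodeError:
            info = {}  # a truncated file from a crashed eval counts as empty
    if not isinstance(info, dict):
        info = {}

    summary = info.get("overall") or info.get("aggregated") or info
    if not isinstance(summary, dict):
        summary = {}

    metrics = {
        "pc_success": summary.get("pc_success"),
        "n_episodes": summary.get("n_episodes"),
    }
    root.mkdir(parents=True, exist_ok=True)
    (root / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

Writing None instead of raising is what lets the step run under
if: always() without masking the original eval failure.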
The Space should be deployed as a private Space under the huggingface
org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).
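The 5-minute in-memory result cache mentioned above could look roughly
like the sketch below. TTLCache and get_or_fetch are hypothetical names
and app.py may implement the cache differently. The injectable clock is
only there to make the expiry testable.

```python
import time
from typing import Any, Callable, Dict, Tuple

CACHE_TTL_S = 300  # 5 minutes, matching the description above


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(
        self,
        ttl_s: float = CACHE_TTL_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only if stale."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```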
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Container runs as user_lerobot (non-root); host-mounted /artifacts volume
was owned by root, causing PermissionError on first video write.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mount a host volume into the container so lerobot-eval writes videos to
/artifacts, then upload artifacts/videos/ via actions/upload-artifact.
`if: always()` ensures the video is uploaded even when the eval fails,
which helps debug rollout issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add HF_HUB_DOWNLOAD_TIMEOUT=300 to both jobs — SmolVLM2 processor
download was timing out on CI runners with the default timeout
- MetaWorld: add --rename_map to map observation.image → camera1 and
--policy.empty_cameras=2 to pad the 2 missing cameras the policy
expects (trained with 3 cameras, env provides 1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All MetaWorld task names in metaworld_config.json use the v3 suffix.
push-v2 caused a KeyError on TASK_DESCRIPTIONS lookup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own image (lerobot[<benchmark>,smolvla]) so
incompatible dep trees can never collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
- Libero: pepijn223/smolvla_libero, libero_spatial, camera_name_mapping
- MetaWorld: pepijn223/smolvla_metaworld, metaworld-push-v2
- LIBERO config pre-created at build time to bypass interactive stdin prompt
- Triggers on envs/**, lerobot_eval.py, Dockerfiles, pyproject.toml changes
- Adds docs/source/evaluation.mdx and restores step 7 in adding_benchmarks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.
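A sketch of pre-creating the config without importing libero. The key
names (benchmark_root, bddl_files, init_states, datasets, assets) and
the relative sub-paths are assumptions about what libero expects and
should be checked against the installed version. write_libero_config
and the /opt/libero path used in testing are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed config keys and their locations relative to the libero package
# root; verify against the libero release that the image installs.
ASSUMED_SUBDIRS = {
    "benchmark_root": "",
    "bddl_files": "bddl_files",
    "init_states": "init_files",
    "datasets": "../datasets",
    "assets": "assets",
}


def write_libero_config(libero_root: str, home: Optional[Path] = None) -> Path:
    """Write ~/.libero/config.yaml as plain text, without importing libero."""
    home = home or Path.home()
    config_path = home / ".libero" / "config.yaml"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    lines = []
    for key, rel in ASSUMED_SUBDIRS.items():
        # normpath collapses "../" and any trailing separator
        value = os.path.normpath(os.path.join(libero_root, rel))
        lines.append(f"{key}: {value}")
    config_path.write_text("\n".join(lines) + "\n")
    return config_path
```

Emitting the YAML by hand keeps the build step free of a PyYAML
dependency and avoids triggering the import-time input() call.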
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() to ask about a custom dataset path,
which raises EOFError when stdin is closed inside Docker. Setting
LIBERO_DATA_FOLDER skips the prompt entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(ci): add uv.lock
* feat(ci): use uv.lock in CI PR testing
* chore(ci): rename nightly to docker publish and test
* feat(ci): automated update of uv.lock + remove unbound check + docker images now use uv.lock
* fix(ci): add --force-with-lease + set -e for silent errors
* fix(ci): skip HF login (and tests) in forks and community PRs
* chore(test): remove comment about test meant to be only run locally
* fix(tests): no HF login decorator for xvla
* fix(test): no decorator in yield
* fix(ci): prevent runner group error on fork pushes
Add repository check to unbound_deps_tests workflow to ensure
aws-general-8-plus runner group is only used on main repository,
preventing 'Required runner group not found' errors on forks.
* fix(ci): use gating job to prevent runner allocation on forks
The previous approach failed because GitHub evaluates runs-on before if conditions.
Now using a check-repo job that runs on ubuntu-latest first, and all jobs with
special runners depend on it and check its output before being scheduled.
* fix(ci): add gating job to full_tests to prevent runner allocation on forks
Apply the same gating pattern used in unbound_deps_tests to full_tests.yml
to prevent GitHub from trying to allocate custom runners when workflows
run on forks. The check-repo job runs first on ubuntu-latest and all jobs
with custom runners depend on it and check its output.
* fix(ci): add repository check to unbound_deps_tests workflow
Add an `if: github.repository == 'huggingface/lerobot'` check to the build-and-push-docker job to prevent runner group access errors on forks, matching the pattern used in nightly.yml
* fix(ci): add repository check to full_tests workflow
Add an `if: github.repository == 'huggingface/lerobot'` check to the build-and-push-docker and gpu-tests jobs to prevent runner group access errors on forks
* refactor(ci): remove redundant check from gpu-tests job
gpu-tests depends on build-and-push-docker via needs, so it will automatically skip when the parent job is skipped
* refactor(ci): remove unnecessary fork check from full-tests job
full-tests runs on ubuntu-latest which is available to all forks, no need to restrict it
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* pi fixes for dependencies
* add wallx sarm conflict
* also add conflicts for pi
* fix(ci): use --extra all instead of --all-extras + --no-extra
---------
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
* support wallx
* fix bugs in flow
* incorporate wallx model into lerobot
* update the policy methods
* reduce config and params to the minimum and pass the lerobot basic test
* fixed dtype bugs
* add wallx dependencies
* update
* remove flash-attn requirement and fix bug in inference and fast mode
* fix bug for inference
* add some small modifications
* fix pre-commit errors
* remove lerobot[wallx]
* fix ci
* fix precommit issues
* fix: exclude wallx extra properly in CI workflows
* fix: add uv conflicts for wallx transformers version
* fix: peft test import
* pre-commit
* only export WallXConfig from wall_x package to avoid peft import in CI
* remove torch dep
* precommit
* add import
---------
Co-authored-by: vincentchen <chenlufang@x2robot.com>
Co-authored-by: Geoffrey19 <sympathischmann35@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>