Commit Graph

170 Commits

Author SHA1 Message Date
Pepijn 2665d4a5ac fix: address PR review feedback — broken link, NaN guard, zizmor tags, fork skip
- Remove broken Triton issue link from Dockerfile.benchmark.libero
- Add module-level _safe_int helper to guard n_episodes against NaN
- Move _safe_float to module level alongside _safe_int
- Add # zizmor: ignore[unpinned-uses] to all upload-artifact@v4 steps
- Add if: env.HF_USER_TOKEN != '' to Libero smoke eval for fork PRs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 13:38:28 +02:00
Pepijn 183fdb7f78 ci(benchmarks): trigger on PRs targeting feat/benchmark-ci
Benchmark PRs (robomme, libero-plus, robocerebra, robotwin) target
feat/benchmark-ci, not main. Without this, the workflow never runs
on those PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:29:00 +02:00
Pepijn c505a71f78 fix(ci): address PR review feedback for benchmark smoke tests
Security:
- Remove "Login to Hugging Face" step — it was a no-op (ephemeral
  --rm container) that exposed the HF token via CLI argument in
  docker inspect / /proc/*/cmdline. The eval step already
  re-authenticates via env var.

Functional:
- Remove feat/benchmark-ci from push trigger branches (won't exist
  post-merge).

Dockerfiles:
- Pin uv to 0.8.0 (was unpinned, fetching whatever latest ships).
- Add comment explaining the chmod +x ptxas workaround (Triton
  packaging bug — ships ptxas without execute bit).

Scripts:
- parse_eval_metrics.py: add note that it runs on bare host and must
  stay stdlib-only.
- parse_eval_metrics.py: add NaN guard for avg_sum_reward and eval_s
  (was only guarding pc_success).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:48:07 +02:00
Pepijn 58d4ecd304 Merge branch 'main' into feat/benchmark-ci 2026-04-10 12:42:46 +02:00
Pepijn 86c51a5663 fix(ci): remove GHCR cache (org blocks GITHUB_TOKEN package writes)
The huggingface org restricts GHCR package creation via GITHUB_TOKEN,
causing 403 on cache export. Remove all registry caching and GHCR
login. The Dockerfile layer split (deps vs source) still helps when
the runner has a warm Docker daemon.

Also fix the metaworld job which had a stale conditional Docker Hub
login and was missing the GHCR login entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:01:12 +02:00
Pepijn c3429aa9df fix(ci): use GHCR for Docker layer cache (Docker Hub push denied)
Docker Hub CI token can't push to new repos. GHCR works out of the
box — GITHUB_TOKEN has automatic packages:write for the repo owner.

- Add GHCR login step (github.actor + GITHUB_TOKEN)
- Switch cache refs to ghcr.io/huggingface/lerobot/cache-benchmark
- Add packages:write at job level (not workflow, per zizmor)
- Keep Docker Hub login for pulling nvidia/cuda base image

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:57:18 +02:00
Steven Palma 6799da35eb chore(ci): proper claude args workflow (#3338) 2026-04-09 16:20:01 +02:00
Pepijn a8b6ecda0d fix(ci): switch Docker cache from type=gha to type=registry
GHA cache is capped at 10GB per repo — a single CUDA + PyTorch +
benchmark image is ~8GB so the cache evicts before it's reused.

Switch to type=registry which pushes cache layers to Docker Hub
(huggingface/lerobot-benchmark-cache:{libero,metaworld}). No size
limit, layers persist until explicitly deleted, and shared across
all runners and branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:31:20 +02:00
Pepijn 0490e97c96 fix(ci): simplify Docker Hub login to match existing workflows
Drop the conditional guard — other workflows (docker_publish,
full_tests) call docker/login-action unconditionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:27:26 +02:00
Pepijn e72b168f28 fix(ci): use env context for secrets check in step if-condition
Step-level 'if' cannot reference 'secrets' directly. Expose the
secret via an env var and check that instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:27:02 +02:00
Pepijn 14f1e09f22 fix(ci): use existing DOCKERHUB_LEROBOT_USERNAME/PASSWORD secrets
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:18:56 +02:00
Pepijn c713c7f58c fix(ci): add Docker Hub login to avoid pull rate limits
Anonymous pulls from Docker Hub are rate-limited to 100/6h, which
fails when multiple benchmark jobs pull nvidia/cuda in parallel.
Add docker/login-action step (conditional on DOCKERHUB_USERNAME var)
to authenticate and get 200 pulls/6h.

Setup: add DOCKERHUB_USERNAME as a repository variable and
DOCKERHUB_TOKEN as a repository secret in GitHub Settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:17:39 +02:00
Pepijn 9a84ae7b61 perf(ci): split Dockerfile dep-install from source-copy for faster rebuilds
The dep-install layer (uv sync) now only depends on pyproject.toml,
uv.lock, and a minimal package stub — not the full src/ tree. Source
code changes only rebuild the final COPY layer (seconds, not minutes).

Also switch from type=local cache (lost on ephemeral runners) to
type=gha (persisted in GitHub Actions cache, shared across all runs).

Before: every src/ change → full uv sync rebuild (~8-10 min)
After:  src/-only change → cached dep layer, ~30s source copy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:15:43 +02:00
Pepijn c454d2913f Merge branch 'main' into feat/benchmark-ci 2026-04-09 14:58:15 +02:00
Pepijn 9a9bc3b42c fix(ci): call extract_task_descriptions.py after eval in benchmark jobs
The task descriptions were never populated in metrics.json because
extract_task_descriptions.py was never invoked. The script exists and
parse_eval_metrics.py already looks for its output — the call was
simply missing from the workflow.

Appends the extraction step to the existing bash -c block (runs inside
the container where libero/metaworld is installed) so task_descriptions.json
is written to the eval-artifacts dir before docker cp copies it out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:53:51 +02:00
Steven Palma 3e34d550c8 fix(ci): pin claude-code-action to v1.0.88 (#3336) 2026-04-09 14:16:54 +02:00
hf-security-analysis[bot] 800449aa53 chore(security): update claude.yml (#3333)
* fix(security): remediate workflow vulnerability in .github/workflows/claude.yml

* fix(security): right AUTHOR_ASSOCIATION fetching

---------

Co-authored-by: hf-security-analysis[bot] <265538906+hf-security-analysis[bot]@users.noreply.github.com>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2026-04-09 13:02:05 +02:00
Pepijn d39a6211b7 chore: merge main into feat/benchmark-ci-clean
Resolves conflict in lerobot_eval.py by taking explicit
(AttributeError, NotImplementedError) catches from main (#3274).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:46:55 +02:00
Steven Palma 8645d71e56 feat(ci): add agent assistance workflow (#3332)
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-04-09 12:06:25 +02:00
Pepijn 17a5431ca3 feat(ci): add Libero train+eval smoke test (1 step, eval_freq=1)
Runs accelerate launch --num_processes=1 lerobot-train with:
- steps=1, batch_size=1, dataset.episodes=[0] (episode 0 only)
- eval_freq=1 so the training loop triggers eval after step 1
- eval.n_episodes=1, eval.use_async_envs=false

Tests the full train→eval-within-training pipeline in the existing
libero-benchmark-libero:ci image (no extra Docker build cost).
Uploads eval video from /tmp/train-smoke/eval/ as libero-train-smoke-video.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 10:25:12 +02:00
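The smoke-test invocation can be sketched as an argv list built in Python. The flag values come from the commit message above. The function name `train_smoke_command`, the `--output_dir` flag, and the `extra_args` hook are assumptions. The hook covers the policy, dataset and env selection that this log does not show.

```python
import shlex
from typing import List, Sequence


def train_smoke_command(output_dir: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for a 1-step Libero train+eval smoke run (hypothetical helper)."""
    return [
        "accelerate", "launch", "--num_processes=1", "lerobot-train",
        "--steps=1",
        "--batch_size=1",
        "--dataset.episodes=[0]",      # episode 0 only
        "--eval_freq=1",               # trigger in-training eval right after step 1
        "--eval.n_episodes=1",
        "--eval.use_async_envs=false",
        f"--output_dir={output_dir}",  # assumed flag; eval videos are expected under <output_dir>/eval/
        *extra_args,                   # policy / dataset / env selection, not shown in this log
    ]


if __name__ == "__main__":
    # shlex.join quotes "[0]" so the printed command is safe to paste into a shell.
    print(shlex.join(train_smoke_command("/tmp/train-smoke")))
```

You can pass the list directly to `subprocess.run`. Because no shell is involved, the brackets in `--dataset.episodes=[0]` need no quoting.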
Pepijn 3534331fcc feat(ci): add parse_eval_metrics step to benchmark workflow
Adds scripts/ci/parse_eval_metrics.py and wires it into both Libero and
MetaWorld jobs so the dashboard can read pc_success, avg_sum_reward and
eval_s from the metrics artifact instead of relying on GitHub step timing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 10:04:30 +02:00
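The sketch below shows what a stdlib-only metrics step could look like. The input layout and helper names are assumptions for illustration. It expects an `eval_info.json` with an `overall` section holding `pc_success`, `avg_sum_reward` and `eval_s`, plus an optional `task_descriptions.json` from the extraction step. This is not the actual schema read by scripts/ci/parse_eval_metrics.py.

```python
import json
import math
from pathlib import Path
from typing import Any, Dict, Optional


def _finite(value: Any) -> Optional[float]:
    # Non-numeric, NaN and inf values become None, which serializes as JSON null.
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def build_metrics(artifacts_dir: Path) -> Dict[str, Any]:
    # Assumed layout: artifacts_dir/eval_info.json with an "overall" section.
    info = json.loads((artifacts_dir / "eval_info.json").read_text())
    overall = info.get("overall", {})
    metrics: Dict[str, Any] = {
        key: _finite(overall.get(key)) for key in ("pc_success", "avg_sum_reward", "eval_s")
    }
    # Optional file written by extract_task_descriptions.py inside the container.
    descriptions_path = artifacts_dir / "task_descriptions.json"
    if descriptions_path.exists():
        metrics["task_descriptions"] = json.loads(descriptions_path.read_text())
    return metrics


def write_metrics(artifacts_dir: Path) -> Path:
    out_path = artifacts_dir / "metrics.json"
    # allow_nan=False makes json.dumps raise rather than emit the non-standard NaN token.
    out_path.write_text(json.dumps(build_metrics(artifacts_dir), indent=2, allow_nan=False))
    return out_path
```

The script uses only `json`, `math` and `pathlib`, so it still runs on a bare runner host with no project dependencies installed.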
Pepijn 0dd0a8f11a fix(ci): write eval output to /tmp inside container
user_lerobot cannot create /artifacts at the container root.
Use /tmp/eval-artifacts (always writable) then docker cp it out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:56:10 +02:00
Pepijn 936b42e6a2 fix(ci): use docker cp instead of bind mounts for artifacts
Bind mounts on these runners don't surface container-written files on
the host path (likely DinD/socket-mount setup). Switch to named
containers + docker cp, which copies directly through the daemon and
lands files in the runner's accessible filesystem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:32:10 +02:00
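The named-container pattern can be sketched as a short sequence of docker commands driven from Python. The function names, the `/tmp/eval-artifacts` default, and the choice to copy artifacts even after a failed eval are assumptions, not the workflow's exact steps.

```python
import subprocess
from typing import List


def docker_cp_commands(
    image: str, name: str, eval_cmd: str, host_dir: str,
    container_dir: str = "/tmp/eval-artifacts",
) -> List[List[str]]:
    """Return the run / cp / rm commands for a named eval container (hypothetical helper)."""
    return [
        # No --rm: the stopped container must survive so `docker cp` can read from it.
        ["docker", "run", "--name", name, image, "bash", "-c", eval_cmd],
        # docker cp goes through the daemon, so it works even when bind mounts do not surface files.
        ["docker", "cp", f"{name}:{container_dir}", host_dir],
        ["docker", "rm", name],
    ]


def run_eval(image: str, name: str, eval_cmd: str, host_dir: str) -> None:
    run_cmd, cp_cmd, rm_cmd = docker_cp_commands(image, name, eval_cmd, host_dir)
    try:
        subprocess.run(run_cmd, check=True)
    finally:
        # Copy whatever was written (even after a failed eval), then always remove the container.
        subprocess.run(cp_cmd, check=False)
        subprocess.run(rm_cmd, check=False)
```

If `host_dir` already exists, `docker cp` places the copied directory inside it. Otherwise it creates `host_dir` with the container directory's contents.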
Pepijn e8d029eaf2 fix(ci): change benchmark schedule from monthly to weekly (every Monday)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:06:57 +02:00
Pepijn d8305abb3e feat(ci): add monthly schedule trigger for benchmark tests
Runs on the 1st of every month at 02:00 UTC in addition to the
existing push/PR and manual dispatch triggers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:06:02 +02:00
Pepijn a16f00ca66 fix(ci): re-chmod artifacts after eval to fix unreadable files
Files created by user_lerobot inside the eval container inherit a
restrictive umask, making them unreadable by the runner after the
container exits. Add a post-eval 'docker run --user root' chmod step
so upload-artifact can find the video files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 19:59:28 +02:00
Pepijn 927118e0ee fix(ci): use root container chmod to fix PermissionError on artifact dirs
Running chmod on the host doesn't propagate into Docker due to UID/SELinux
mismatch. Instead, spin up the image as root to mkdir+chmod from inside
the container before the eval run mounts the same path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 19:22:10 +02:00
Pepijn c8c2e88e24 chore: remove out-of-scope benchmark/CI/docs files from PR
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:13 +02:00
Pepijn 841cbb0835 fix(ci): pre-create libero config in Dockerfile to bypass stdin prompt
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:10 +02:00
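The sketch below writes the LIBERO config at build time without importing libero. Only the location ~/.libero/config.yaml comes from the commit message. The key names and sub-directory layout are assumptions modeled on LIBERO's config file, so check them against the installed version.

```python
from pathlib import Path


def write_libero_config(benchmark_root: Path, config_dir: Path) -> Path:
    """Write config.yaml so libero finds it and never reaches its input() prompt."""
    # Hypothetical key names and sub-directories; verify them against the installed libero version.
    entries = {
        "benchmark_root": benchmark_root,
        "bddl_files": benchmark_root / "bddl_files",
        "init_states": benchmark_root / "init_files",
        "datasets": benchmark_root.parent / "datasets",
        "assets": benchmark_root / "assets",
    }
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.yaml"
    # Plain "key: value" lines are valid YAML for simple paths, so neither PyYAML nor libero is imported.
    config_path.write_text("".join(f"{key}: {value}\n" for key, value in entries.items()))
    return config_path
```

In an image build this would be called once with something like `Path.home() / ".libero"` as `config_dir`. Paths containing `: ` or leading special characters would need YAML quoting.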
Pepijn dfd09c054d fix(ci): set LIBERO_DATA_FOLDER to bypass interactive stdin prompt
libero/__init__.py calls input() to ask about a custom dataset path,
which raises EOFError when stdin is closed inside Docker. Setting
LIBERO_DATA_FOLDER skips the prompt entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:10 +02:00
Pepijn 07350f95a9 ci(benchmarks): trigger only on envs/ or lerobot_eval.py changes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:09 +02:00
Pepijn 61e2be8c9e ci(benchmarks): pin action hashes and use uv sync --locked
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:09 +02:00
Pepijn 6e6f76d47f ci(benchmarks): add isolated integration tests for libero and metaworld
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:29:09 +02:00
Pauline Bailly-Masson 1396b9fab7 🔒 Pin GitHub Actions to commit SHAs (#3265)
* 🔒 pin quality.yml actions to commit SHAs

* 🔒 pin fast_tests.yml actions to commit SHAs

* 🔒 pin full_tests.yml actions to commit SHAs

* 🔒 pin documentation.yml actions to commit SHAs

* 🔒 pin documentation-upload-pr.yml actions to commit SHAs

* 🔒 pin release.yml actions to commit SHAs

* 🔒 pin security.yml actions to commit SHAs

---------

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-04-07 16:11:14 +02:00
Steven Palma 913041e753 fix(ci): latest deps tests permissions (#3296)
* fix(ci): latest deps tests permissions

* fix(ci): force push dep update branch

* fix(ci): change secret for permissions & CI trigger
2026-04-06 14:56:05 +02:00
Steven Palma 50a1e67e94 feat(ci): add uv.lock (#3292)
* feat(ci): add uv.lock

* feat(ci): use uv.lock in CI PR testing

* chore(ci): rename nightly to docker publish and test

* feat(ci): automated update of uv.lock + remove unbound check + docker images now use uv.lock

* fix(ci): add --force-with-lease + set -e for silent errors
2026-04-06 12:23:37 +02:00
Steven Palma 85de893fa7 fix(ci): skip HF log in (and tests) in forks and community PRs (#3097)
* fix(ci): skip HF log in (and tests) in forks and community PRs

* chore(test): remove comment about test meant to be only run locally

* fix(tests): no hf log in decorator for xvla

* fix(test): no decorator in yield
2026-03-06 16:33:43 +01:00
Steven Palma a4c66e530b chore(docs): remove pi installation note (#3095) 2026-03-06 15:52:54 +01:00
Steven Palma e489ba24fc feat(dependencies): require Python 3.12+ as minimum version (#3023)
* feat(dependencies): upgrade to python3.12

* fix(test): processor regex message

* fix(test): processor regex message

* fix(dependencies): resolve all tags in python 3.12

* fix(dependencies): add more hints to resolve faster

* chore(dependencies): remove cli tag huggingface-hub dep

* refactor(policy): update eagle for python3.12

* chore(docs): update policy creation for python 3.12

* chore(test): skip failing tests in macos
2026-03-06 10:15:13 +01:00
Steven Palma d324ffe810 fix(ci): test only multi-gpu tests in multi-gpu runner (#3092) 2026-03-05 19:53:40 +01:00
Steven Palma 3e45120272 fix(ci): log in HF for gated repo in nightly workflows (#3089)
* fix(ci): log in HF for gated repo in nightly workflows

* fix(ci): add env var

* fix(ci): remove 10 min limit for multi-gpu nightly
2026-03-05 13:22:37 +01:00
Steven Palma f0d2b37beb chore(dependencies): bump transformers v5 (#2964)
* chore(dependencies): upgrade transformers + huggingface-hub + peft + scipy

* chore(dependencies): bump pi0 family to transformers v5

* chore(dependencies): bump wall x to transformers v5

* chore(dependencies): bump gr00t to transformers v5

* chore(style): fix pre-commit

* fix(policy): xvla forced_bos_token missing

* test(rl): skip ci tests for resnet10

* Fix: full pi models support for transformer v5 (#2967)

* fix(pi): remove loss truncation

* fix(pi): remove state padding before tokenization

* fix(pi): fix image padding value

* fix from_pretrain

* add transformer v5 changes

* remove reference

* more fixes

* make it work

* add support for rest of pi family

* add pifast work

* more changes

* more changes

* more cleanup

* fix torch params

* dtype fix

* torch compile

* embed mismatch fix

* revert groot

* more nit fixes

* remove unused classes

* more fixes

* revert

* nit

* torch dtype warning fix

* put back dynamic renaming

* add tie embedding

---------

Co-authored-by: Yufei Sun <skieyfly@gmail.com>

* chore: fix XVLA in transformers v5 (#3006)

* test(policies): enable wall x CI testing

* style(test): pre-commit check

* style(test): pre-commit

* fix wall x for transformer v5 (#3008)

* tv5 fix

* various wall x fixes

* Delete tests/policies/pi0_pi05/print_pi05_output_logits.py

Signed-off-by: Jade Choghari <chogharijade@gmail.com>

* sync modeling_florence2.py with chore/bump_transformers_v5

* more

* more fixes

* more

* remove comment

* more

---------

Signed-off-by: Jade Choghari <chogharijade@gmail.com>

* chore(dependencies): adjust dependencies versioning after transformers v5 (#3034)

* chore(dependencies): adjust dependencies versioning after transformers v5

* fix(policies): remove deprecated input_embeds

* fix(policies): dict _tied_weights_keys

* chore(dependencies): common qwen-vl-utils

* chore(dependencies): bump transformers to 5.2

* Fix policy testing for tv5 (#3032)

* fix ci logger

* other fix

* fix mypy

* change logits to torch2.10

* skip wallx

* remove logging

---------

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>

* feat(ci): log into HF to unblock some CI tests (#3007)

* feat(ci): log into HF to unblock some CI tests

* chore(ci): change hf call + secret name

* fix(ci): temp fix for pi0 rtc test

* test(policies): require_cuda for unblocked tests

* test(policies): require_cuda wall_x

* fix(tests): require_cuda outermost for pi0

* fix(test): return instead of yield

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* style(test): fix pre-commit

* chore(deps): upgrade transformers (#3050)

* chore(test): use lerobot model

* fix(policies): change default action tokenizer for wall x

* sample on cpu

* Revert "Merge branch 'chore/bump_transformers_v5' of https://github.com/huggingface/lerobot into chore/bump_transformers_v5"

This reverts commit d9b76755f7, reversing
changes made to 89359cb0b6.

* Reapply "Merge branch 'chore/bump_transformers_v5' of https://github.com/huggingface/lerobot into chore/bump_transformers_v5"

This reverts commit c9914db78b.

---------

Signed-off-by: Jade Choghari <chogharijade@gmail.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: Yufei Sun <skieyfly@gmail.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
2026-03-05 09:25:26 +01:00
Steven Palma 5095ab0845 fix(ci): permissions triton (#3011) 2026-02-24 19:09:34 +01:00
whats2000 778db19a17 [Bug Fix] fix(ci): prevent runner group error on fork pushes (#2911)
* fix(ci): prevent runner group error on fork pushes

Add repository check to unbound_deps_tests workflow to ensure
aws-general-8-plus runner group is only used on main repository,
preventing 'Required runner group not found' errors on forks.

* fix(ci): use gating job to prevent runner allocation on forks

The previous approach failed because GitHub evaluates runs-on before if conditions.
Now using a check-repo job that runs on ubuntu-latest first, and all jobs with
special runners depend on it and check its output before being scheduled.

* fix(ci): add gating job to full_tests to prevent runner allocation on forks

Apply the same gating pattern used in unbound_deps_tests to full_tests.yml
to prevent GitHub from trying to allocate custom runners when workflows
run on forks. The check-repo job runs first on ubuntu-latest and all jobs
with custom runners depend on it and check its output.

* fix(ci): add repository check to unbound_deps_tests workflow

Add 'if: github.repository == huggingface/lerobot' check to build-and-push-docker job to prevent runner group access errors on forks, matching the pattern used in nightly.yml

* fix(ci): add repository check to full_tests workflow

Add 'if: github.repository == huggingface/lerobot' check to build-and-push-docker and gpu-tests jobs to prevent runner group access errors on forks

* refactor(ci): remove redundant check from gpu-tests job

gpu-tests depends on build-and-push-docker via needs, so it will automatically skip when the parent job is skipped

* refactor(ci): remove unnecessary fork check from full-tests job

full-tests runs on ubuntu-latest which is available to all forks, no need to restrict it

---------

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-02-10 15:21:40 +01:00
Steven Palma 6d34a986de feat(ci): trigger manually documentation release version (#2841) 2026-01-22 12:26:17 +01:00
Steven Palma 112b2d173a chore(ci): deactivates cron job on unbound dep tests (#2810) 2026-01-16 14:39:00 +01:00
Pauline Bailly-Masson a9d81e7f67 refactor(ci): Docker Hub image env (#2755)
* Refactor Docker Hub image env

Updated environment variable usage for Docker Hub credentials and corrected image tag extraction.

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>

* same

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(ci): remove duplicated IMAGE_FULL variable definition

---------

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-07 00:21:03 +01:00
Salman Chishti a06f4b9140 Upgrade GitHub Actions for Node 24 compatibility (#2691) 2025-12-24 10:42:29 +01:00
Steven Palma 20c22a2799 chore(ci): make keyword matching more conservative (#2711) 2025-12-24 02:03:12 +01:00
Steven Palma 2f238fce15 feat(ci): adds release versioning to docs (#2709)
* feat(ci): adds release versioning to docs

* chore(ci): remove TODO
2025-12-24 00:40:56 +01:00