Drop the conditional guard: other workflows (docker_publish,
full_tests) already call docker/login-action unconditionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step-level 'if' cannot reference 'secrets' directly. Expose the
secret via an env var and check that instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anonymous pulls from Docker Hub are rate-limited to 100 per 6 hours,
so the build fails when multiple benchmark jobs pull nvidia/cuda in
parallel. Add a docker/login-action step (conditional on the
DOCKERHUB_USERNAME variable) to authenticate, which raises the limit
to 200 pulls per 6 hours.
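A rough budget check shows why doubling the quota matters when several jobs share it. This is a sketch under stated assumptions: the helper name is hypothetical, every job is assumed to pull the base image once, all jobs are assumed to draw from one shared quota (e.g. a shared runner IP), and the numbers in the example are illustrative, not measurements.

```python
# Hypothetical helper: how many more full workflow runs fit in the current
# 6-hour Docker Hub window? Assumes all jobs draw from one shared quota.
def runs_remaining(limit_per_6h: int, pulls_used: int, jobs_per_run: int,
                   pulls_per_job: int = 1) -> int:
    pulls_per_run = jobs_per_run * pulls_per_job
    if pulls_per_run <= 0:
        raise ValueError("a run must pull at least one image")
    remaining = max(limit_per_6h - pulls_used, 0)
    return remaining // pulls_per_run

# Illustrative numbers: 4 parallel jobs, 90 pulls already consumed by
# other workloads sharing the same quota.
print(runs_remaining(100, 90, jobs_per_run=4))  # anonymous limit: 2
print(runs_remaining(200, 90, jobs_per_run=4))  # authenticated limit: 27
```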
Setup: add DOCKERHUB_USERNAME as a repository variable and
DOCKERHUB_TOKEN as a repository secret in GitHub Settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dep-install layer (uv sync) now depends only on pyproject.toml,
uv.lock, and a minimal package stub, not on the full src/ tree, so
source code changes rebuild only the final COPY layer (seconds, not
minutes).
Also switch from type=local cache (lost on ephemeral runners) to
type=gha (persisted in the GitHub Actions cache and shared across
runs).
Before: every src/ change → full uv sync rebuild (~8-10 min)
After: src/-only change → cached dep layer, ~30s source copy
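The reason the reordering helps can be shown with a toy model of layer caching: a layer is reused only if its own inputs are unchanged and every layer before it was reused. This is a simplified sketch, not BuildKit's actual algorithm; the layer names, file patterns, and edited file are hypothetical, and the package stub is omitted.

```python
from fnmatch import fnmatch

# Toy model of Docker's layer cache: a layer is reused only if none of its
# input files changed AND every earlier layer was reused.
def layers_to_rebuild(layers, changed_files):
    rebuilt = []
    invalidated = False
    for name, patterns in layers:
        touched = any(fnmatch(f, p) for f in changed_files for p in patterns)
        if invalidated or touched:
            invalidated = True  # every later layer is rebuilt too
            rebuilt.append(name)
    return rebuilt

# Before: sources are copied before uv sync, so any src/ edit invalidates it.
OLD_LAYOUT = [
    ("base", []),
    ("copy-all", ["pyproject.toml", "uv.lock", "src/*"]),
    ("uv-sync", []),
]
# After: only the manifests feed uv sync; sources arrive in the last layer.
NEW_LAYOUT = [
    ("base", []),
    ("copy-manifests", ["pyproject.toml", "uv.lock"]),
    ("uv-sync", []),
    ("copy-src", ["src/*"]),
]

change = ["src/lerobot/example.py"]  # hypothetical edited file
print(layers_to_rebuild(OLD_LAYOUT, change))  # ['copy-all', 'uv-sync']
print(layers_to_rebuild(NEW_LAYOUT, change))  # ['copy-src']
```

A change to uv.lock still invalidates uv-sync and everything after it, which is the intended trade-off: dependency changes pay the full cost, source-only changes do not.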
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The task descriptions were never populated in metrics.json because
extract_task_descriptions.py was never invoked. The script exists and
parse_eval_metrics.py already looks for its output — the call was
simply missing from the workflow.
Appends the extraction step to the existing bash -c block (which runs
inside the container where libero/metaworld is installed), so
task_descriptions.json is written to the eval-artifacts dir before
docker cp copies it out.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs accelerate launch --num_processes=1 lerobot-train with:
- steps=1, batch_size=1, dataset.episodes=[0] (episode 0 only)
- eval_freq=1 so the training loop triggers eval after step 1
- eval.n_episodes=1, eval.use_async_envs=false
Exercises the full pipeline (training, then eval triggered from within
training) in the existing libero-benchmark-libero:ci image, with no
extra Docker build cost.
Uploads the eval video from /tmp/train-smoke/eval/ as the
libero-train-smoke-video artifact.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds scripts/ci/parse_eval_metrics.py and wires it into both Libero and
MetaWorld jobs so the dashboard can read pc_success, avg_sum_reward and
eval_s from the metrics artifact instead of relying on GitHub step timing.
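A minimal sketch of the extraction idea, assuming the eval run writes a JSON file with an "aggregated" section holding the three metrics. The input layout and the function names are assumptions for illustration; the real scripts/ci/parse_eval_metrics.py may read different files or keys.

```python
import json
from pathlib import Path

# Hypothetical sketch; the "aggregated" section is an assumed input layout.
METRIC_KEYS = ("pc_success", "avg_sum_reward", "eval_s")

def extract_metrics(eval_info: dict) -> dict:
    aggregated = eval_info.get("aggregated", {})
    # Missing keys become None so the dashboard can show "no data"
    # instead of the job crashing.
    return {key: aggregated.get(key) for key in METRIC_KEYS}

def write_metrics(eval_info_path: Path, out_path: Path) -> dict:
    metrics = extract_metrics(json.loads(eval_info_path.read_text()))
    out_path.write_text(json.dumps(metrics, indent=2))
    return metrics

# Illustrative values, not real benchmark results.
sample = {"aggregated": {"pc_success": 100.0, "avg_sum_reward": 3.5, "eval_s": 42.0}}
print(extract_metrics(sample))
```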
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
user_lerobot cannot create /artifacts at the container root. Write to
/tmp/eval-artifacts instead (writable by any user), then docker cp it out.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bind mounts on these runners don't surface container-written files on
the host path (likely a DinD/socket-mount setup, where the daemon
resolves bind-mount paths on its own filesystem rather than the
runner's). Switch to named containers + docker cp, which copies
directly through the daemon and lands the files in the runner's own
filesystem.
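The named-container pattern boils down to three commands. The sketch below only builds and prints them; the container name, script, and destination path are placeholders, and the helper itself is hypothetical rather than taken from the workflow. The run step deliberately omits --rm so the stopped container still exists when docker cp reads from it.

```python
import shlex

# Hypothetical helper that builds the command sequence for the
# "named container + docker cp" pattern. Names and paths are placeholders.
def eval_commands(name: str, image: str, script: str,
                  src: str, dest: str) -> list[list[str]]:
    return [
        # No --rm: the stopped container must survive for docker cp.
        ["docker", "run", "--name", name, image, "bash", "-c", script],
        # docker cp reads through the daemon, so no bind mount is involved.
        ["docker", "cp", f"{name}:{src}", dest],
        ["docker", "rm", name],
    ]

for cmd in eval_commands("libero-eval", "libero-benchmark-libero:ci",
                         "echo run-eval-here", "/tmp/eval-artifacts",
                         "./eval-artifacts"):
    print(shlex.join(cmd))
```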
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runs on the 1st of every month at 02:00 UTC in addition to the
existing push/PR and manual dispatch triggers.
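This schedule corresponds to the standard cron expression "0 2 1 * *" (minute 0, hour 2, day-of-month 1, any month, any weekday), which GitHub Actions evaluates in UTC. The helper below is a hypothetical sanity check for that expression, not part of the workflow; it expects a timezone-aware datetime and returns the next trigger strictly after it.

```python
from datetime import datetime, timezone

# Next trigger time for the cron expression "0 2 1 * *" (UTC), strictly
# after `now`. `now` must be timezone-aware.
def next_monthly_run(now: datetime) -> datetime:
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=2, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; roll over to the 1st of next month.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        candidate = candidate.replace(year=year, month=month)
    return candidate

print(next_monthly_run(datetime(2025, 3, 15, 12, 0, tzinfo=timezone.utc)))
# 2025-04-01 02:00:00+00:00
```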
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Files that user_lerobot writes inside the eval container are created
under a restrictive umask, making them unreadable by the runner after
the container exits. Add a post-eval 'docker run --user root' chmod
step so upload-artifact can find the video files.
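The failure mode can be reproduced locally: with a restrictive umask such as 077, a newly created file gets owner-only permissions, so a different UID cannot read it. The umask value here is an assumption (the container's actual umask is not recorded above), the file name is hypothetical, and the example is POSIX-only.

```python
import os
import stat
import tempfile

# Create a file under the given umask and return its permission bits.
def create_with_umask(path: str, mask: int) -> int:
    old = os.umask(mask)
    try:
        with open(path, "w") as f:  # open() requests 0o666, minus the umask
            f.write("video bytes")
    finally:
        os.umask(old)  # restore the process umask
    return stat.S_IMODE(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "episode_0.mp4")  # hypothetical file name
    print(oct(create_with_umask(path, 0o077)))  # 0o600: owner only
    # The post-eval fix, done by a privileged user: add read permission for
    # group and others, comparable to `chmod a+r` on this one file.
    os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | 0o044)
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o644
```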
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Running chmod on the host doesn't propagate into Docker due to a
UID/SELinux mismatch. Instead, start the image as root and run mkdir +
chmod from inside the container before the eval run mounts the same path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.
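A minimal sketch of a build-time config writer that never imports libero, so the input() prompt cannot fire. The key names mirror LIBERO's default path dictionary as we understand it, and both the keys and the directory layout should be treated as assumptions; the function name is hypothetical. Writing plain "key: value" lines avoids a PyYAML dependency and is valid YAML for simple paths without special characters.

```python
from pathlib import Path

# Hypothetical build-time step: write ~/.libero/config.yaml without
# importing libero. Key names and layout are assumptions.
def write_libero_config(home: Path, libero_root: Path) -> Path:
    paths = {
        "benchmark_root": libero_root,
        "bddl_files": libero_root / "bddl_files",
        "init_states": libero_root / "init_files",
        "datasets": libero_root.parent / "datasets",
        "assets": libero_root / "assets",
    }
    config_path = home / ".libero" / "config.yaml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # One "key: value" line per entry; fine for paths without YAML specials.
    config_path.write_text("".join(f"{k}: {v}\n" for k, v in paths.items()))
    return config_path
```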
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() to ask about a custom dataset path,
which raises EOFError when stdin is closed inside Docker. Setting
LIBERO_DATA_FOLDER skips the prompt entirely.
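The EOFError is easy to reproduce: when stdin is closed or empty, input() raises EOFError instead of blocking. The second function is a hypothetical guard in the spirit of this fix (consult the environment variable first, prompt only when it is unset); it is not LIBERO's actual code.

```python
import io
import os
import sys

# Reproduce the failure: input() on an empty stdin raises EOFError.
def prompt_with_empty_stdin() -> str:
    saved = sys.stdin
    sys.stdin = io.StringIO("")  # stands in for a closed stdin in Docker
    try:
        input()
        return "answered"
    except EOFError:
        return "EOFError"
    finally:
        sys.stdin = saved

# Hypothetical guard: only prompt when LIBERO_DATA_FOLDER is unset.
def resolve_data_folder(default: str) -> str:
    folder = os.environ.get("LIBERO_DATA_FOLDER")
    if folder:
        return folder
    return input(f"Dataset folder [{default}]: ") or default

print(prompt_with_empty_stdin())  # EOFError
```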
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>