- Remove broken Triton issue link from Dockerfile.benchmark.libero
- Add module-level _safe_int helper to guard n_episodes against NaN
- Move _safe_float to module level alongside _safe_int
- Add # zizmor: ignore[unpinned-uses] to all upload-artifact@v4 steps
- Add if: env.HF_USER_TOKEN != '' to Libero smoke eval for fork PRs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
uv sync --locked validates the entire lockfile across all extras.
Since robomme depends on mani-skill which pins numpy<2.0, and the
base project requires numpy>=2.0, the full lockfile is unsatisfiable.
Switch to uv pip install -e ".[libero,smolvla]" which only resolves
the requested extras for the current Python version and platform,
avoiding the cross-extra numpy conflict entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Remove "Login to Hugging Face" step — it was a no-op (ephemeral
--rm container) that exposed the HF token via CLI argument in
docker inspect / /proc/*/cmdline. The eval step already
re-authenticates via env var.
Functional:
- Remove feat/benchmark-ci from push trigger branches (won't exist
post-merge).
Dockerfiles:
- Pin uv to 0.8.0 (was unpinned, so each build fetched whichever release was latest).
- Add comment explaining the chmod +x ptxas workaround (Triton
packaging bug — ships ptxas without execute bit).
Scripts:
- parse_eval_metrics.py: add note that it runs on bare host and must
stay stdlib-only.
- parse_eval_metrics.py: add NaN guard for avg_sum_reward and eval_s
(was only guarding pc_success).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dep-install layer (uv sync) now only depends on pyproject.toml,
uv.lock, and a minimal package stub — not the full src/ tree. Source
code changes only rebuild the final COPY layer (seconds, not minutes).
Also switch from type=local cache (lost on ephemeral runners) to
type=gha (persisted in GitHub Actions cache, shared across all runs).
Before: every src/ change → full uv sync rebuild (~8-10 min)
After: src/-only change → cached dep layer, ~30s source copy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running chmod on the host doesn't propagate into Docker due to UID/SELinux
mismatch. Instead, spin up the image as root to mkdir+chmod from inside
the container before the eval run mounts the same path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
num2words (required by the SmolVLM processor) is declared in lerobot[smolvla],
not in lerobot[libero] or lerobot[metaworld]. Install the benchmark extra and
smolvla together.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The config was pointing to /tmp/libero_init which doesn't exist.
Use importlib.util.find_spec to locate the hf-libero package directory
and write paths to the actual bundled bddl_files/init_files/assets.
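
A minimal sketch of the lookup described above, assuming the three bundled directories sit directly under the package root (the real hf-libero layout may nest them differently). The helper names `package_dir` and `bundled_paths` are hypothetical and do not come from this change.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the on-disk directory of a top-level package, or None.

    For a top-level name, find_spec() locates the package without
    running its __init__.py, so an input() prompt there never fires.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file/builtin module
    return Path(next(iter(spec.submodule_search_locations)))


def bundled_paths(name, subdirs=("bddl_files", "init_files", "assets")):
    """Map each bundled subdirectory name to its absolute path string.

    Assumption: the subdirectories live directly under the package root.
    """
    root = package_dir(name)
    if root is None:
        raise ModuleNotFoundError(f"package {name!r} is not installed")
    return {sub: str(root / sub) for sub in subdirs}
```

The resulting mapping can then be written into the config file in place of the hard-coded /tmp path.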
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The multiline RUN python -c "..." was being parsed as Dockerfile
instructions. Use printf to write ~/.libero/config.yaml directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(ci): add uv.lock
* feat(ci): use uv.lock in CI PR testing
* chore(ci): rename nightly to docker publish and test
* feat(ci): automated update of uv.lock + remove unbound check + docker images now use uv.lock
* fix(ci): add --force-with-lease + set -e for silent errors