Commit Graph

7 Commits

Author SHA1 Message Date
Pepijn 437014926f feat(ci): add benchmark smoke tests with isolated Docker images
Each benchmark gets its own image (lerobot[<benchmark>,smolvla]) so
incompatible dep trees can never collide. A 1-episode smoke eval runs
per benchmark on GPU runners.

- Libero: pepijn223/smolvla_libero, libero_spatial, camera_name_mapping
- MetaWorld: pepijn223/smolvla_metaworld, metaworld-push-v2
- LIBERO config pre-created at build time to bypass interactive stdin prompt
- Triggers on envs/**, lerobot_eval.py, Dockerfiles, pyproject.toml changes
- Adds docs/source/evaluation.mdx and restores step 7 in adding_benchmarks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 14:44:59 +02:00
Pepijn c4d7e7468b chore: remove out-of-scope benchmark/CI/docs files from PR
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 14:33:48 +02:00
Pepijn 28c5fd0421 fix(ci): pre-create libero config in Dockerfile to bypass stdin prompt
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:34:36 +02:00
Pepijn 1bb62aa0c5 fix(ci): set LIBERO_DATA_FOLDER to bypass interactive stdin prompt
libero/__init__.py calls input() to ask about a custom dataset path,
which raises EOFError when stdin is closed inside Docker. Setting
LIBERO_DATA_FOLDER skips the prompt entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 09:34:08 +02:00
Pepijn 834532f1dc ci(benchmarks): trigger only on envs/ or lerobot_eval.py changes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 22:29:42 +02:00
Pepijn 40757b3481 ci(benchmarks): pin action hashes and use uv sync --locked
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 21:56:47 +02:00
Pepijn 0bc68740f4 ci(benchmarks): add isolated integration tests for libero and metaworld
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 21:55:59 +02:00