feat(ci): add benchmark smoke tests with isolated Docker images

Each benchmark gets its own image (lerobot[<benchmark>,smolvla]) so
incompatible dep trees can never collide. A 1-episode smoke eval runs
per benchmark on GPU runners.

- Libero: pepijn223/smolvla_libero, libero_spatial, camera_name_mapping
- MetaWorld: pepijn223/smolvla_metaworld, metaworld-push-v2
- LIBERO config pre-created at build time to bypass interactive stdin prompt
- Triggers on envs/**, lerobot_eval.py, Dockerfiles, pyproject.toml changes
- Adds docs/source/evaluation.mdx and restores step 7 in adding_benchmarks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
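
For illustration only, the path-based trigger listed in the message can be approximated in Python. This is a sketch under stated assumptions: the glob patterns and the helper name `should_run_smoke_tests` are hypothetical and are not taken from the actual workflow file, whose path filters may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns loosely mirroring the trigger list in the commit
# message (envs/**, lerobot_eval.py, Dockerfiles, pyproject.toml).
# The real workflow's path filters are not shown here and may differ.
TRIGGER_PATTERNS = [
    "*/envs/*",          # any file under an envs/ directory
    "*lerobot_eval.py",  # the eval entry point, wherever it lives
    "*Dockerfile*",      # any Dockerfile variant
    "pyproject.toml",    # dependency definitions at the repo root
]


def should_run_smoke_tests(changed_files):
    """Return True if any changed path matches one of the trigger patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRIGGER_PATTERNS
    )
```

Note that `fnmatchcase` lets `*` match across `/`, so a single `*` here behaves roughly like `**` in a CI path filter.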
2026-07-23 01:41:54 +00:00 · 2026-04-08 14:44:59 +02:00
parent f4ad290067
commit 437014926f
6 changed files with 581 additions and 10 deletions
@@ -73,6 +73,8 @@
    title: Control & Train Robots in Sim (LeIsaac)
  title: "Simulation"
 - sections:
+  - local: evaluation
+    title: Evaluation (lerobot-eval)
  - local: adding_benchmarks
    title: Adding a New Benchmark
  - local: libero