feat(ci): add health dashboard Space + benchmark metrics artifacts

- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub Actions API directly (no extra datastore). Shows benchmark status badges, success-rate and duration trend charts, and embeds the latest rollout video per benchmark. Results cached 5 min in-memory; video files cached on disk by artifact ID so downloads only happen once.
- spaces/health-dashboard/requirements.txt + README.md: Space card with setup instructions for the GITHUB_RO_TOKEN secret (actions:read, metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval, reads eval_info.json written by lerobot-eval, extracts pc_success and n_episodes, and writes metrics.json to the artifacts dir.
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and "Upload … metrics" steps (if: always()) after each eval so the dashboard has data even when the eval fails.

The Space should be deployed as a private Space under the huggingface org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
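
For reviewers, the two caching layers described for app.py (5-minute in-memory results, on-disk videos keyed by artifact ID) could look roughly like the sketch below. This is not the code from this commit: `TTLCache`, `cached_video_path`, the `download` callable, and the `.mp4` file naming are all hypothetical names chosen for illustration.

```python
import time
from pathlib import Path

CACHE_TTL_SECONDS = 5 * 60  # "Results cached 5 min in-memory"


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the API call
        value = compute()
        self._entries[key] = (now, value)
        return value


def cached_video_path(cache_dir, artifact_id, download):
    """Return the on-disk path for an artifact's video, downloading at most once.

    `download` is an assumed callable (artifact_id) -> bytes that returns the
    already-extracted video; the real Space may fetch and unpack differently.
    """
    path = Path(cache_dir) / f"{artifact_id}.mp4"  # assumed naming scheme
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(artifact_id))
    return path
```

Keying the disk cache by artifact ID works because an artifact's content does not change once uploaded, so no expiry is needed there, whereas run statuses do change and get the short TTL.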
2026-07-07 18:11:50 +00:00 · 2026-04-08 17:46:44 +02:00
parent 13ee7009fe
commit 452d9abaa4
5 changed files with 666 additions and 0 deletions
@@ -125,6 +125,15 @@ jobs:
                --output_dir=/artifacts
            "

+      - name: Parse Libero eval metrics
+        if: always()
+        run: |
+          python scripts/ci/parse_eval_metrics.py \
+            --artifacts-dir /tmp/libero-artifacts \
+            --env libero \
+            --task libero_spatial \
+            --policy pepijn223/smolvla_libero
+
      - name: Upload Libero rollout video
        if: always()
        uses: actions/upload-artifact@v4
@@ -133,6 +142,14 @@ jobs:
          path: /tmp/libero-artifacts/videos/
          if-no-files-found: warn

+      - name: Upload Libero eval metrics
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: libero-metrics
+          path: /tmp/libero-artifacts/metrics.json
+          if-no-files-found: warn
+
  # ── METAWORLD ─────────────────────────────────────────────────────────────
  # Isolated image: lerobot[metaworld] only (metaworld==3.0.0, mujoco>=3 chain)
  metaworld-integration-test:
@@ -189,6 +206,15 @@ jobs:
                --output_dir=/artifacts
            "

+      - name: Parse MetaWorld eval metrics
+        if: always()
+        run: |
+          python scripts/ci/parse_eval_metrics.py \
+            --artifacts-dir /tmp/metaworld-artifacts \
+            --env metaworld \
+            --task metaworld-push-v3 \
+            --policy pepijn223/smolvla_metaworld
+
      - name: Upload MetaWorld rollout video
        if: always()
        uses: actions/upload-artifact@v4
@@ -196,3 +222,11 @@ jobs:
          name: metaworld-rollout-video
          path: /tmp/metaworld-artifacts/videos/
          if-no-files-found: warn
+
+      - name: Upload MetaWorld eval metrics
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: metaworld-metrics
+          path: /tmp/metaworld-artifacts/metrics.json
+          if-no-files-found: warn
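
The "Parse … eval metrics" steps above call scripts/ci/parse_eval_metrics.py, whose source is not shown in this hunk. As a rough sketch of what the commit message describes (read eval_info.json, extract pc_success and n_episodes, write metrics.json), something like the following would do. The location of eval_info.json directly under the artifacts dir, the `overall`/`aggregated` section names, the output schema, and the null-on-failure behavior are all assumptions, and `write_metrics` and `find_metric` are hypothetical names.

```python
import json
from pathlib import Path


def find_metric(info, key):
    """Look up `key` at the top level, then under assumed section names."""
    if key in info:
        return info[key]
    for section in ("overall", "aggregated"):  # assumed layout of eval_info.json
        nested = info.get(section)
        if isinstance(nested, dict) and key in nested:
            return nested[key]
    return None


def write_metrics(artifacts_dir, env, task, policy):
    """Write metrics.json into `artifacts_dir` and return the written record."""
    artifacts = Path(artifacts_dir)
    info_path = artifacts / "eval_info.json"  # assumed location
    info = {}
    if info_path.is_file():
        try:
            loaded = json.loads(info_path.read_text())
            if isinstance(loaded, dict):
                info = loaded
        except json.JSONDecodeError:
            pass  # treat an unreadable file like a failed eval

    # Assumed schema: identifying fields from the CLI flags plus the two metrics.
    # Missing values become null so a failed eval still yields a record.
    metrics = {
        "env": env,
        "task": task,
        "policy": policy,
        "pc_success": find_metric(info, "pc_success"),
        "n_episodes": find_metric(info, "n_episodes"),
    }
    artifacts.mkdir(parents=True, exist_ok=True)
    (artifacts / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

With this shape, the Libero step would correspond to `write_metrics("/tmp/libero-artifacts", "libero", "libero_spatial", "pepijn223/smolvla_libero")`. Note that the upload steps use `if-no-files-found: warn`, so the real script may instead skip writing metrics.json when the eval produced no output.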