- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub
Actions API directly (no extra datastore). Shows benchmark status
badges, success-rate and duration trend charts, and embeds the latest
rollout video per benchmark. API results are cached in memory for 5 min;
video files are cached on disk by artifact ID, so each is downloaded once.
- spaces/health-dashboard/requirements.txt + README.md: dependencies and
Space card, with setup instructions for the GITHUB_RO_TOKEN secret
(actions:read and metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval,
reads eval_info.json written by lerobot-eval, extracts pc_success and
n_episodes, and writes metrics.json to the artifacts dir.
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and
"Upload … metrics" steps (if: always()) after each eval so the
dashboard has data even when the eval fails.
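
For reviewers, the metrics parsing step described above could look roughly
like the sketch below. It is not the committed script. The function name,
the lookup under "overall"/"aggregated", and the null fallback when
eval_info.json is missing are assumptions for illustration; only the file
names and the pc_success / n_episodes fields come from this change.

```python
import json
from pathlib import Path


def parse_eval_metrics(eval_info_path, out_path):
    """Extract pc_success and n_episodes into a small metrics.json.

    Hypothetical helper: tolerates a missing eval_info.json so that a
    failed eval still produces a metrics file (with null values).
    """
    metrics = {"pc_success": None, "n_episodes": None}
    path = Path(eval_info_path)
    if path.is_file():
        info = json.loads(path.read_text())
        if isinstance(info, dict):
            # Assumption: the fields sit at the top level or under one
            # nested summary dict; the real layout may differ.
            sections = [info] + [
                info[k] for k in ("overall", "aggregated")
                if isinstance(info.get(k), dict)
            ]
            for key in metrics:
                for section in sections:
                    if key in section:
                        metrics[key] = section[key]
                        break
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))
    return metrics
```

Writing the file even when nothing could be parsed matches the intent of
the if: always() steps: the dashboard gets an entry for every run.
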

The Space should be deployed as a private Space under the huggingface
org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>