fix(ci): address PR review feedback for benchmark smoke tests

Security:
- Remove "Login to Hugging Face" step. It was a no-op (ephemeral --rm
  container) that exposed the HF token via CLI argument in
  docker inspect / /proc/*/cmdline. The eval step already
  re-authenticates via env var.

Functional:
- Remove feat/benchmark-ci from push trigger branches (won't exist
  post-merge).

Dockerfiles:
- Pin uv to 0.8.0 (was unpinned, fetching whatever latest ships).
- Add comment explaining the chmod +x ptxas workaround (Triton
  packaging bug: ships ptxas without execute bit).

Scripts:
- parse_eval_metrics.py: add note that it runs on bare host and must
  stay stdlib-only.
- parse_eval_metrics.py: add NaN guard for avg_sum_reward and eval_s
  (was only guarding pc_success).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-16 00:59:46 +00:00 · 2026-04-10 12:47:58 +02:00
parent 58d4ecd304
commit c505a71f78
4 changed files with 20 additions and 13 deletions
@@ -19,6 +19,9 @@ Reads eval_info.json written by lerobot-eval --output_dir and extracts the
 key metrics needed by the health dashboard. Handles both single-task and
 multi-task eval output formats.

+NOTE: This script runs on the bare CI runner (not inside Docker), so it
+must use only Python stdlib modules. Do not add third-party imports.
+
 Usage:
    python scripts/ci/parse_eval_metrics.py \\
        --artifacts-dir /tmp/libero-artifacts \\
@@ -54,12 +57,19 @@ def _extract_metrics(info: dict) -> tuple[float | None, int | None, float | None
        n = agg.get("n_episodes")
        reward = agg.get("avg_sum_reward")
        eval_s = agg.get("eval_s")
+
+        def _safe_float(v: float | int | None) -> float | None:
+            if v is None:
+                return None
+            f = float(v)
+            return None if math.isnan(f) else f
+
        if pc is not None and not math.isnan(pc):
            return (
                float(pc),
                int(n) if n is not None else None,
-                float(reward) if reward is not None else None,
-                float(eval_s) if eval_s is not None else None,
+                _safe_float(reward),
+                _safe_float(eval_s),
            )

    return None, None, None, None
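
The NaN guard added in this hunk can be exercised in isolation. The following is a minimal stdlib-only sketch, not the script itself: `safe_float` mirrors the `_safe_float` helper from the diff, while the sample payload and its values are hypothetical and only assume that `avg_sum_reward` or `eval_s` may arrive as NaN in eval_info.json.

```python
import json
import math


def safe_float(v):
    """Return v as a float, or None when v is missing or NaN."""
    if v is None:
        return None
    f = float(v)
    return None if math.isnan(f) else f


# Hypothetical aggregate block; the values are made up for illustration.
# Python's json module accepts the non-standard NaN literal by default,
# so a NaN written by the eval run survives json.loads as float("nan").
agg = json.loads('{"pc_success": 80.0, "avg_sum_reward": NaN, "eval_s": 12}')

# Apply the guard to every field: NaN and missing values become None.
cleaned = {key: safe_float(value) for key, value in agg.items()}

# allow_nan=False makes json.dumps raise ValueError on NaN, so a successful
# call shows the cleaned metrics serialize to strict JSON (None -> null).
serialized = json.dumps(cleaned, allow_nan=False)
print(serialized)
```

Mapping NaN to None rather than to 0.0 keeps "no value" distinguishable from a genuine zero reward, which is presumably what a downstream consumer of the metrics needs.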