feat(ci): extract task descriptions and embed in metrics artifact

- Add scripts/ci/extract_task_descriptions.py: runs inside the benchmark Docker container (with LIBERO/MetaWorld installed) after lerobot-eval and writes task_descriptions.json, mapping task keys to natural-language instructions.
  - LIBERO: uses libero.libero.benchmark to get suite.get_task(i).language.
  - MetaWorld: formats the task name as a human-readable label.
- Call the extraction at the end of each eval bash -c (with || true, so it is never fatal).
- parse_eval_metrics.py reads task_descriptions.json and includes it in metrics.json so the health dashboard Space can label videos by task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
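The extraction script itself is not shown in the hunk below, so here is a minimal sketch of what it might do. It assumes LIBERO's documented entry point `benchmark.get_benchmark_dict()` and a suite object exposing `n_tasks` and `get_task(i).language`. The `<suite>_<index>` key scheme, the MetaWorld label rule (drop a trailing `-vN`, join with spaces, capitalize), and the helper names are hypothetical, not taken from the actual script.

```python
import json
from pathlib import Path
from typing import Protocol


class TaskSuite(Protocol):
    # Assumed shape of a LIBERO task suite: a task count plus get_task(i),
    # which returns an object carrying a `language` attribute.
    n_tasks: int

    def get_task(self, i: int): ...


def load_libero_suite(suite_name: str) -> TaskSuite:
    # Imported lazily so the MetaWorld path works without LIBERO installed.
    from libero.libero import benchmark

    return benchmark.get_benchmark_dict()[suite_name]()


def libero_descriptions(suite_name: str, suite: TaskSuite) -> dict[str, str]:
    # Hypothetical key scheme "<suite>_<index>"; the real script may differ.
    return {f"{suite_name}_{i}": suite.get_task(i).language for i in range(suite.n_tasks)}


def metaworld_label(task_name: str) -> str:
    # Hypothetical formatting rule: "pick-place-v2" -> "Pick place".
    parts = task_name.split("-")
    if parts and parts[-1].startswith("v") and parts[-1][1:].isdigit():
        parts = parts[:-1]  # drop the version suffix
    return " ".join(parts).capitalize()


def write_descriptions(artifacts_dir: Path, descriptions: dict[str, str]) -> Path:
    # Write the key -> instruction map where parse_eval_metrics.py looks for it.
    out = artifacts_dir / "task_descriptions.json"
    out.write_text(json.dumps(descriptions, indent=2, sort_keys=True))
    return out
```

Because the eval command appends `|| true`, any exception raised here (for example, a missing suite name) only loses the labels and does not fail the CI job.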
2026-07-13 21:11:59 +00:00 · 2026-04-09 12:50:04 +02:00
parent d39a6211b7
commit 192a53d41e
2 changed files with 101 additions and 0 deletions
@@ -95,6 +95,17 @@ def main() -> int:
            file=sys.stderr,
        )

+    task_descriptions: dict[str, str] = {}
+    task_desc_path = artifacts_dir / "task_descriptions.json"
+    if task_desc_path.exists():
+        try:
+            task_descriptions = json.loads(task_desc_path.read_text())
+        except json.JSONDecodeError as exc:
+            print(
+                f"[parse_eval_metrics] Warning: could not parse task_descriptions.json: {exc}",
+                file=sys.stderr,
+            )
+
    metrics = {
        "env": args.env,
        "task": args.task,
@@ -103,6 +114,7 @@ def main() -> int:
        "n_episodes": n_episodes,
        "avg_sum_reward": avg_sum_reward,
        "eval_s": eval_s,
+        "task_descriptions": task_descriptions,
    }

    out_path = artifacts_dir / "metrics.json"
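The hunk above catches only `json.JSONDecodeError`, so an unreadable file (an `OSError`) would still raise, and a well-formed but non-object payload (for example a JSON array) would be copied into metrics.json despite the `dict[str, str]` annotation. The following is a hedged sketch of a more defensive loader, plus a hypothetical dashboard-side helper (`label_for` is an illustrative name, not an existing API) that falls back to the raw task key when no description is available.

```python
import json
import sys
from pathlib import Path


def load_task_descriptions(artifacts_dir: Path) -> dict[str, str]:
    """Return the task-key -> instruction map, or {} if missing or malformed."""
    path = artifacts_dir / "task_descriptions.json"
    if not path.exists():
        return {}
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        print(f"[parse_eval_metrics] Warning: could not parse {path.name}: {exc}", file=sys.stderr)
        return {}
    if not isinstance(data, dict):
        return {}
    # JSON object keys are always strings; keep only string values so
    # metrics.json stays well-typed for downstream consumers.
    return {k: v for k, v in data.items() if isinstance(v, str)}


def label_for(metrics: dict, task_key: str) -> str:
    # Hypothetical dashboard helper: use the description if present, else the key.
    return metrics.get("task_descriptions", {}).get(task_key, task_key)
```

Returning an empty dict rather than raising keeps metrics.json generation non-fatal, in line with the `|| true` policy applied to the extraction step.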