feat(robotwin): eval 5 diverse tasks per CI run with NL descriptions

Widen the smoke eval from a single task (beat_block_hammer) to five by
adding click_bell, handover_block, open_laptop, and stack_blocks_two.
Each task gets its own rollout video in videos/<task>_0/ so the
dashboard can surface visually distinct behaviours.
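
As a rough illustration of the per-task layout, the sketch below splits
the comma-separated task list and derives the expected videos/<task>_0/
directory for each task. The helper name `expected_video_dirs` is
hypothetical, and placing `videos/` under the eval `--output_dir` is an
assumption, not something this commit confirms.

```python
from pathlib import Path

# Task list as passed to --env.task in the CI workflow.
TASKS = "beat_block_hammer,click_bell,handover_block,open_laptop,stack_blocks_two"


def expected_video_dirs(task_arg, output_dir="/tmp/eval-artifacts"):
    """Map each task to its expected rollout-video directory.

    Hypothetical helper. Assumes videos/<task>_0/ sits directly under
    output_dir, which is not confirmed by the commit itself.
    """
    # Split the comma-separated list, ignoring empty or padded entries.
    tasks = [name.strip() for name in task_arg.split(",") if name.strip()]
    return {task: Path(output_dir) / "videos" / f"{task}_0" for task in tasks}


if __name__ == "__main__":
    for task, video_dir in expected_video_dirs(TASKS).items():
        print(f"{task}: {video_dir}")
```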

extract_task_descriptions.py now has a RoboTwin branch that reads
`description/task_instruction/<task>.json` (already shipped in the clone
at /opt/robotwin) and pulls the `full_description` field. CI cds into
the clone before invoking the script so the relative path resolves.
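
A minimal sketch of what such a lookup could look like, assuming each
`<task>.json` is a JSON object with a top-level `full_description`
string. The function name `load_robotwin_descriptions`, the `root`
parameter, and the skip-on-missing behaviour are illustrative
assumptions, not the script's actual code.

```python
import json
from pathlib import Path


def load_robotwin_descriptions(tasks, root="."):
    """Return {task: full_description} for every task with an instruction file.

    Hypothetical helper. `root` stands in for the RoboTwin clone, which
    the CI job makes the working directory by cd-ing into it.
    """
    descriptions = {}
    for task in tasks:
        path = Path(root) / "description" / "task_instruction" / f"{task}.json"
        if not path.is_file():
            # Assumed behaviour: skip tasks that ship no instruction file.
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        text = data.get("full_description")
        if text:
            descriptions[task] = text
    return descriptions
```

Resolving the path against the current directory is why the relative
`description/task_instruction/` lookup only works when the script runs
from inside the clone.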

parse_eval_metrics.py is invoked with the same five-task list so that
metrics.json contains one entry per task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pepijn
2026-04-14 21:03:15 +02:00
parent 793f52e360
commit e67ceb213d
2 changed files with 26 additions and 2 deletions
+6 -2
@@ -361,13 +361,17 @@ jobs:
cd /opt/robotwin && lerobot-eval \
--policy.path=pepijn223/smolvla_robotwin \
--env.type=robotwin \
- --env.task=beat_block_hammer \
+ --env.task=beat_block_hammer,click_bell,handover_block,open_laptop,stack_blocks_two \
--eval.batch_size=1 \
--eval.n_episodes=1 \
--eval.use_async_envs=false \
--policy.device=cuda \
'--rename_map={\"observation.images.head_camera\": \"observation.images.camera1\", \"observation.images.left_camera\": \"observation.images.camera2\", \"observation.images.right_camera\": \"observation.images.camera3\"}' \
--output_dir=/tmp/eval-artifacts
+ python /lerobot/scripts/ci/extract_task_descriptions.py \
+ --env robotwin \
+ --task beat_block_hammer,click_bell,handover_block,open_laptop,stack_blocks_two \
+ --output /tmp/eval-artifacts/task_descriptions.json
"
- name: Copy RoboTwin artifacts from container
@@ -383,7 +387,7 @@ jobs:
python3 scripts/ci/parse_eval_metrics.py \
--artifacts-dir /tmp/robotwin-artifacts \
--env robotwin \
- --task beat_block_hammer \
+ --task beat_block_hammer,click_bell,handover_block,open_laptop,stack_blocks_two \
--policy pepijn223/smolvla_robotwin
- name: Upload RoboTwin rollout video