feat(annotations): enforce imperative verb-first subtask phrasing

Rewrite the module_1_subtasks prompt to produce short imperative commands
("pick up the orange") instead of third-person narration ("the robot
arm moves to the orange"). Drop the verbose "how, not what" rule and
add a good/bad few-shot table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-05-19 13:52:54 +02:00
parent f72b28738a
commit 26013da699
@@ -6,15 +6,18 @@ You are shown the entire demonstration as a single video. Watch the
 whole clip, then segment it into a list of consecutive atomic subtasks
 the robot performs.
-Authoring rules — based on Hi Robot (Shi 2025) atom granularity and
-Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
+Authoring rules — based on Hi Robot (Shi 2025) atom granularity:
 - Each subtask is one atomic skill the low-level policy can execute,
-e.g. "pick up one piece of lettuce", "place the bowl into the box",
-"move the right arm to the left".
-- Capture HOW the subtask is performed, not only WHAT — e.g. prefer
-"grasp the handle of the sponge with the left hand" to "pick up the
-sponge".
+e.g. "pick up the orange", "place the bowl into the box".
+- Write each subtask as an IMPERATIVE COMMAND to the robot, starting
+with a verb: move, reach, pick up, grasp, place, put, push, pull,
+open, close, turn, press, lift, insert, pour...
+- NEVER use third person. Never write "the robot", "the arm", "the
+gripper moves", "it picks up". Command the robot, do not describe it.
+- Keep it SHORT — 3 to 8 words. Add a "how" detail (which hand, which
+grasp point) ONLY when it is needed to disambiguate.
+- Lower-case, no trailing period.
 - Subtasks are non-overlapping and cover the full episode in order.
 Choose the cut points yourself based on what you see in the video
 (gripper open/close events, contact, regrasps, transitions).
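The per-subtask style rules in the first hunk (verb first, no third person, 3 to 8 words, lower-case, no trailing period) are mechanical enough to check after generation. The sketch below is hypothetical and not part of this commit: `style_violations` is an illustrative name, the verb list copies only the verbs the prompt spells out (the prompt ends that list with "..."), and the forbidden phrases are exactly the four the prompt quotes. Run against the prompt's own style examples, it accepts every "Good" entry and flags every "Bad" one.

```python
import re

# Hypothetical post-generation checker for the subtask style rules.
# Only the verbs the prompt lists explicitly; the prompt ends the list
# with "...", so a real list would likely be longer (assumption).
IMPERATIVE_VERBS = (
    "move", "reach", "pick up", "grasp", "place", "put", "push", "pull",
    "open", "close", "turn", "press", "lift", "insert", "pour",
)

# The third-person phrases the prompt forbids, matched on word boundaries.
THIRD_PERSON = re.compile(r"\b(the robot|the arm|the gripper moves|it picks up)\b")


def style_violations(text: str) -> list[str]:
    """Return the style rules one subtask string breaks (empty list if none)."""
    problems = []
    words = text.split()
    if not 3 <= len(words) <= 8:
        problems.append(f"expected 3-8 words, got {len(words)}")
    if text != text.lower():
        problems.append("not lower-case")
    if text.endswith("."):
        problems.append("trailing period")
    # Verb-first: the text must begin with a listed verb followed by a space.
    if not any(text == v or text.startswith(v + " ") for v in IMPERATIVE_VERBS):
        problems.append("does not start with a listed imperative verb")
    if THIRD_PERSON.search(text):
        problems.append("third-person phrasing")
    return problems
```

Because the verb list is closed, a valid command that starts with an unlisted verb (for example "wipe the table") is flagged too, so in practice the list would have to grow with the task set.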
@@ -23,11 +26,22 @@ Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
 - Every subtask's [start_time, end_time] must lie within
 [0.0, {episode_duration}] seconds.
+Style examples:
+Good                              Bad (do NOT produce these)
+"pick up the orange"              "the robot arm moves to the orange"
+"move to the yellow block"        "the gripper approaches the block"
+"close gripper to grasp           "close the gripper to grasp the
+the yellow cube"                  yellow cube so it can lift it"
+"open the toaster oven"           "it opens the toaster oven door"
+"put the bagel on the             "the white plate now has the bagel
+white plate"                      placed on it by the arm"
 Output strictly valid JSON of shape:
 {{
 "subtasks": [
-{{"text": "<how-not-what>", "start": <float>, "end": <float>}},
+{{"text": "<short imperative command>", "start": <float>, "end": <float>}},
 ...
 ]
 }}
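The output contract in the prompt (a `subtasks` list of `text`/`start`/`end` objects, non-overlapping, covering the episode in order, inside `[0.0, {episode_duration}]`) can likewise be enforced on the model's reply. This is a minimal sketch under two assumptions the commit does not state: that the template is rendered with Python's `str.format`, which would explain why the JSON braces are doubled while `{episode_duration}` stays a single-brace field, and that "cover the full episode in order" means consecutive subtasks share their boundaries within a small tolerance `tol`. `PROMPT_TAIL` and `parse_subtasks` are hypothetical names.

```python
import json

# Assumption: the prompt is rendered with str.format, so "{{" / "}}" become
# literal braces and {episode_duration} is substituted.
PROMPT_TAIL = (
    "Every subtask's [start_time, end_time] must lie within\n"
    "[0.0, {episode_duration}] seconds.\n"
    'Output strictly valid JSON of shape: {{"subtasks": [...]}}'
)


def parse_subtasks(reply: str, episode_duration: float, tol: float = 1e-3) -> list[dict]:
    """Parse a reply of shape {"subtasks": [{"text", "start", "end"}, ...]}.

    Raises ValueError on the first violated timing rule. Treating coverage
    as "contiguous within tol" is an assumption, not part of the prompt.
    """
    subtasks = json.loads(reply)["subtasks"]
    if not subtasks:
        raise ValueError("no subtasks returned")
    prev_end = 0.0  # so the first subtask must start at (about) 0.0
    for i, st in enumerate(subtasks):
        start, end = float(st["start"]), float(st["end"])
        if not (0.0 <= start < end <= episode_duration):
            raise ValueError(
                f"subtask {i}: [{start}, {end}] is empty or outside [0.0, {episode_duration}]"
            )
        if start < prev_end - tol:
            raise ValueError(f"subtask {i}: overlaps the previous subtask")
        if start > prev_end + tol:
            raise ValueError(f"subtask {i}: leaves a gap after {prev_end}")
        prev_end = end
    if abs(prev_end - episode_duration) > tol:
        raise ValueError(f"last subtask ends at {prev_end}, not {episode_duration}")
    return subtasks
```

For example, `parse_subtasks(reply, episode_duration=12.5)` returns the list unchanged when the segments tile `[0.0, 12.5]`, and otherwise raises a `ValueError` naming the first offending subtask, which a caller could use to trigger a re-query.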