feat(annotations): enforce imperative verb-first subtask phrasing

Rewrite the module_1_subtasks prompt to produce short imperative
commands ("pick up the orange") instead of third-person narration
("the robot arm moves to the orange"). Drop the verbose "how, not
what" rule and add a good/bad few-shot table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-05-19 13:52:54 +02:00
parent f72b28738a
commit 26013da699
@@ -6,15 +6,18 @@ You are shown the entire demonstration as a single video. Watch the
whole clip, then segment it into a list of consecutive atomic subtasks
the robot performs.
Authoring rules — based on Hi Robot (Shi 2025) atom granularity and
Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
Authoring rules — based on Hi Robot (Shi 2025) atom granularity:
- Each subtask is one atomic skill the low-level policy can execute,
e.g. "pick up one piece of lettuce", "place the bowl into the box",
"move the right arm to the left".
- Capture HOW the subtask is performed, not only WHAT — e.g. prefer
"grasp the handle of the sponge with the left hand" to "pick up the
sponge".
e.g. "pick up the orange", "place the bowl into the box".
- Write each subtask as an IMPERATIVE COMMAND to the robot, starting
with a verb: move, reach, pick up, grasp, place, put, push, pull,
open, close, turn, press, lift, insert, pour...
- NEVER use third person. Never write "the robot", "the arm", "the
gripper moves", "it picks up". Command the robot, do not describe it.
- Keep it SHORT — 3 to 8 words. Add a "how" detail (which hand, which
grasp point) ONLY when it is needed to disambiguate.
- Lower-case, no trailing period.
- Subtasks are non-overlapping and cover the full episode in order.
Choose the cut points yourself based on what you see in the video
(gripper open/close events, contact, regrasps, transitions).
@@ -23,11 +26,22 @@ Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
- Every subtask's [start_time, end_time] must lie within
[0.0, {episode_duration}] seconds.
Style examples:
Good                          Bad (do NOT produce these)
"pick up the orange"          "the robot arm moves to the orange"
"move to the yellow block"    "the gripper approaches the block"
"close gripper to grasp       "close the gripper to grasp the
 the yellow cube"              yellow cube so it can lift it"
"open the toaster oven"       "it opens the toaster oven door"
"put the bagel on the         "the white plate now has the bagel
 white plate"                  placed on it by the arm"
Output strictly valid JSON of shape:
{{
"subtasks": [
{{"text": "<how-not-what>", "start": <float>, "end": <float>}},
{{"text": "<short imperative command>", "start": <float>, "end": <float>}},
...
]
}}
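
The authoring rules in the prompt above can be checked mechanically on the model's JSON output. The following is a minimal sketch, not part of this commit: the function name `validate_subtasks` and the `THIRD_PERSON` pattern are hypothetical, and the third-person check is only a heuristic that looks at how the text starts. It assumes the output has the shape shown above, with `start` and `end` given in seconds.

```python
import json
import re

# Hypothetical heuristic: openers the prompt forbids ("the robot", "the arm",
# "the gripper", "it"). It only inspects the start of the text.
THIRD_PERSON = re.compile(r"^(the (robot|arm|gripper)|it)\b")


def validate_subtasks(raw: str, episode_duration: float) -> list[str]:
    """Return a list of rule violations; an empty list means the output passed."""
    problems = []
    subtasks = json.loads(raw)["subtasks"]
    prev_end = 0.0
    for i, st in enumerate(subtasks):
        text, start, end = st["text"], st["start"], st["end"]
        n_words = len(text.split())
        if not 3 <= n_words <= 8:
            problems.append(f"{i}: {n_words} words, expected 3 to 8")
        if text != text.lower():
            problems.append(f"{i}: not lower-case")
        if text.endswith("."):
            problems.append(f"{i}: trailing period")
        if THIRD_PERSON.match(text):
            problems.append(f"{i}: third-person phrasing")
        if not 0.0 <= start < end <= episode_duration:
            problems.append(f"{i}: times outside [0.0, {episode_duration}]")
        if start < prev_end:
            problems.append(f"{i}: overlaps previous subtask")
        prev_end = end
    return problems


good = (
    '{"subtasks": ['
    '{"text": "pick up the orange", "start": 0.0, "end": 2.5}, '
    '{"text": "put the bagel on the white plate", "start": 2.5, "end": 5.0}]}'
)
bad = (
    '{"subtasks": ['
    '{"text": "the robot arm moves to the orange", "start": 0.0, "end": 6.0}]}'
)
print(validate_subtasks(good, 5.0))
print(validate_subtasks(bad, 5.0))
```

A check like this could run after generation to reject or re-prompt on outputs that slip back into narration. It does not verify that the subtasks cover the full episode, since the prompt leaves the cut points to the model.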