diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
index 5d7c9cc8d..cd1303cbe 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
@@ -6,15 +6,18 @@ You are shown the entire demonstration as a single video. Watch the
 whole clip, then segment it into a list of consecutive atomic subtasks
 the robot performs.
 
-Authoring rules — based on Hi Robot (Shi 2025) atom granularity and
-Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
+Authoring rules — based on Hi Robot (Shi 2025) atom granularity:
 
 - Each subtask is one atomic skill the low-level policy can execute,
-  e.g. "pick up one piece of lettuce", "place the bowl into the box",
-  "move the right arm to the left".
-- Capture HOW the subtask is performed, not only WHAT — e.g. prefer
-  "grasp the handle of the sponge with the left hand" to "pick up the
-  sponge".
+  e.g. "pick up the orange", "place the bowl into the box".
+- Write each subtask as an IMPERATIVE COMMAND to the robot, starting
+  with a verb: move, reach, pick up, grasp, place, put, push, pull,
+  open, close, turn, press, lift, insert, pour...
+- NEVER use third person. Never write "the robot", "the arm", "the
+  gripper moves", "it picks up". Command the robot, do not describe it.
+- Keep it SHORT — 3 to 8 words. Add a "how" detail (which hand, which
+  grasp point) ONLY when it is needed to disambiguate.
+- Lower-case, no trailing period.
 - Subtasks are non-overlapping and cover the full episode in order.
   Choose the cut points yourself based on what you see in the video
   (gripper open/close events, contact, regrasps, transitions).
@@ -23,11 +26,22 @@ Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
 - Every subtask's [start_time, end_time] must lie within
   [0.0, {episode_duration}] seconds.
 
+Style examples:
+
+  Good                          Bad (do NOT produce these)
+  "pick up the orange"          "the robot arm moves to the orange"
+  "move to the yellow block"    "the gripper approaches the block"
+  "close gripper to grasp       "close the gripper to grasp the
+   the yellow cube"              yellow cube so it can lift it"
+  "open the toaster oven"       "it opens the toaster oven door"
+  "put the bagel on the         "the white plate now has the bagel
+   white plate"                  placed on it by the arm"
+
 Output strictly valid JSON of shape:
 
 {{
   "subtasks": [
-    {{"text": "", "start": , "end": }},
+    {{"text": "", "start": , "end": }},
     ...
   ]
 }}
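
Review note: the style rules in this prompt (3 to 8 words, lower-case, no trailing period, no third person, non-overlapping spans inside the episode) are mechanical enough to check after generation. The sketch below is one possible post-hoc check, not code from this PR; `check_subtasks` and `THIRD_PERSON` are hypothetical names, the third-person pattern is a rough heuristic built from the phrases the prompt forbids, and full-episode coverage is not checked.

```python
import json
import re

# Hypothetical helper, not part of lerobot: flags output that breaks the
# style rules stated in the prompt. The pattern is a heuristic assumption.
THIRD_PERSON = re.compile(r"^(the robot|the arm|the gripper|it)\b")


def check_subtasks(raw_json: str, episode_duration: float) -> list[str]:
    """Return human-readable rule violations (empty list if none)."""
    problems = []
    subtasks = json.loads(raw_json)["subtasks"]
    prev_end = 0.0
    for i, st in enumerate(subtasks):
        text, start, end = st["text"], st["start"], st["end"]
        n_words = len(text.split())
        if not 3 <= n_words <= 8:
            problems.append(f"{i}: expected 3 to 8 words, got {n_words}")
        if text != text.lower():
            problems.append(f"{i}: not lower-case")
        if text.endswith("."):
            problems.append(f"{i}: trailing period")
        if THIRD_PERSON.match(text.lower()):
            problems.append(f"{i}: third-person description")
        if not 0.0 <= start < end <= episode_duration:
            problems.append(f"{i}: times outside [0.0, {episode_duration}]")
        if start < prev_end:
            problems.append(f"{i}: overlaps previous subtask")
        prev_end = max(prev_end, end)
    return problems
```

A check like this could gate a retry of the annotation call when the list is non-empty; whether that is worth the extra model calls depends on how often the new prompt wording is violated in practice.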