mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-21 19:49:49 +00:00
feat(annotations): enforce imperative verb-first subtask phrasing
Rewrite module_1_subtasks prompt to produce short imperative commands
("pick up the orange") instead of third-person narration ("the robot
arm moves to the orange"). Drops the verbose "how, not what" rule and
adds a good/bad few-shot table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -6,15 +6,18 @@ You are shown the entire demonstration as a single video. Watch the
 whole clip, then segment it into a list of consecutive atomic subtasks
 the robot performs.
 
-Authoring rules — based on Hi Robot (Shi 2025) atom granularity and
-Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
+Authoring rules — based on Hi Robot (Shi 2025) atom granularity:
 
 - Each subtask is one atomic skill the low-level policy can execute,
-  e.g. "pick up one piece of lettuce", "place the bowl into the box",
-  "move the right arm to the left".
-- Capture HOW the subtask is performed, not only WHAT — e.g. prefer
-  "grasp the handle of the sponge with the left hand" to "pick up the
-  sponge".
+  e.g. "pick up the orange", "place the bowl into the box".
+- Write each subtask as an IMPERATIVE COMMAND to the robot, starting
+  with a verb: move, reach, pick up, grasp, place, put, push, pull,
+  open, close, turn, press, lift, insert, pour...
+- NEVER use third person. Never write "the robot", "the arm", "the
+  gripper moves", "it picks up". Command the robot, do not describe it.
+- Keep it SHORT — 3 to 8 words. Add a "how" detail (which hand, which
+  grasp point) ONLY when it is needed to disambiguate.
+- Lower-case, no trailing period.
 - Subtasks are non-overlapping and cover the full episode in order.
   Choose the cut points yourself based on what you see in the video
   (gripper open/close events, contact, regrasps, transitions).
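The phrasing rules in this hunk are mechanical enough to check after generation. The following is a minimal sketch of such a check, not code from the repository: the helper name `check_subtask` and the `VERBS` tuple are hypothetical, and because the prompt's verb list ends with "..." the tuple is only an assumed, non-exhaustive allow-list.

```python
import re

# Assumption: the verbs named in the prompt, treated as an allow-list.
# The prompt's list is open-ended ("..."), so extend this as needed.
VERBS = (
    "move", "reach", "pick up", "grasp", "place", "put", "push", "pull",
    "open", "close", "turn", "press", "lift", "insert", "pour",
)


def check_subtask(text: str) -> list[str]:
    """Return the names of the phrasing rules that `text` violates."""
    problems = []
    lowered = text.lower()

    # "Keep it SHORT": 3 to 8 words.
    if not 3 <= len(text.split()) <= 8:
        problems.append("length")

    # "Lower-case, no trailing period."
    if text != lowered:
        problems.append("case")
    if text.endswith("."):
        problems.append("trailing period")

    # "NEVER use third person": flag text that opens with a subject
    # such as "the robot ..." or "it ...".
    if re.match(r"(the|it)\b", lowered):
        problems.append("third person")

    # "starting with a verb": must begin with one of the listed verbs.
    if not any(lowered == v or lowered.startswith(v + " ") for v in VERBS):
        problems.append("verb-first")

    return problems
```

A check like this only catches surface violations; whether a command matches what happens in the video still has to be judged separately.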
@@ -23,11 +26,22 @@ Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
 - Every subtask's [start_time, end_time] must lie within
   [0.0, {episode_duration}] seconds.
 
+Style examples:
+
+  Good                        Bad (do NOT produce these)
+  "pick up the orange"        "the robot arm moves to the orange"
+  "move to the yellow block"  "the gripper approaches the block"
+  "close gripper to grasp     "close the gripper to grasp the
+   the yellow cube"            yellow cube so it can lift it"
+  "open the toaster oven"     "it opens the toaster oven door"
+  "put the bagel on the       "the white plate now has the bagel
+   white plate"                placed on it by the arm"
+
 Output strictly valid JSON of shape:
 
 {{
   "subtasks": [
-    {{"text": "<how-not-what>", "start": <float>, "end": <float>}},
+    {{"text": "<short imperative command>", "start": <float>, "end": <float>}},
     ...
   ]
 }}
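The prompt constrains the model's output in three ways: the JSON shape, the time bounds, and the ordered, non-overlapping segments. A consumer could verify those constraints roughly as follows. This is a sketch under stated assumptions: `validate_subtasks` is a hypothetical helper rather than a lerobot function, and it does not check the "cover the full episode" rule, since that would need a gap tolerance the prompt does not specify.

```python
import json


def validate_subtasks(raw: str, episode_duration: float) -> list[dict]:
    """Parse the model output and check the timing constraints from the prompt.

    Raises ValueError if a segment is out of bounds, empty, or overlaps
    the previous one.
    """
    data = json.loads(raw)
    subtasks = data["subtasks"]

    prev_end = 0.0
    for i, subtask in enumerate(subtasks):
        start = float(subtask["start"])
        end = float(subtask["end"])

        # Each [start, end] must lie within [0.0, episode_duration].
        if not (0.0 <= start < end <= episode_duration):
            raise ValueError(f"subtask {i}: [{start}, {end}] is out of bounds")

        # Segments must be in order and must not overlap.
        if start < prev_end:
            raise ValueError(f"subtask {i}: starts before the previous one ends")

        prev_end = end

    return subtasks
```

The doubled braces in the prompt's JSON shape suggest the template is rendered with Python's `str.format` or a similar mechanism, where `{{` and `}}` escape literal braces and `{episode_duration}` is substituted. That is an inference from the diff, not something the commit states.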