Mirror of https://github.com/huggingface/lerobot.git (synced 2026-05-23 04:30:10 +00:00)
feat(annotate): compact steerable annotation prompts
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -21,9 +21,7 @@ from huggingface_hub import get_token, run_job
 
 token = os.environ.get("HF_TOKEN") or get_token()
 if not token:
-    raise RuntimeError(
-        "No HF token. Run `huggingface-cli login` or `export HF_TOKEN=hf_...`"
-    )
+    raise RuntimeError("No HF token. Run `huggingface-cli login` or `export HF_TOKEN=hf_...`")
 
 CMD = (
     "apt-get update -qq && apt-get install -y -qq git ffmpeg && "
@@ -46,13 +44,16 @@ CMD = (
     "--vlm.client_concurrency=256 "
     "--vlm.max_new_tokens=512 "
     "--executor.episode_parallelism=32 "
-    "--vlm.chat_template_kwargs='{enable_thinking: false}' "
+    "--vlm.chat_template_kwargs='{\"enable_thinking\": false}' "
     "--vlm.camera_key=observation.images.wrist "
     "--module_1.frames_per_second=1.0 "
     "--module_1.use_video_url=true "
     "--module_1.use_video_url_fps=1.0 "
-    "--module_3.K=1 --module_3.vqa_emission_hz=0.2 "
-    "--push_to_hub=pepijn223/super_poulain_qwen36moe-3"
+    "--module_1.derive_task_from_video=always "
+    "--module_1.n_task_rephrasings=10 "
+    "--module_3.K=1 "
+    "--module_3.vqa_emission_hz=1.0 "
+    "--push_to_hub=pepijn223/super_poulain_full_tool2"
 )
 
 job = run_job(
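Note on the `--vlm.chat_template_kwargs` change: the old value `{enable_thinking: false}` has an unquoted key, so it is not valid JSON, while the new value is. The diff does not show how the CLI parses this flag, so the sketch below only assumes that a strict JSON parser is involved somewhere downstream; `parses_as_json` is a hypothetical helper, not part of lerobot.

```python
import json

# Old flag value: the key is unquoted, so strict JSON parsers reject it.
old_value = "{enable_thinking: false}"
# New flag value as the shell sees it once Python resolves the \" escapes.
new_value = '{"enable_thinking": false}'


def parses_as_json(text: str) -> bool:
    """Return True if `text` is valid JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(old_value))  # False
print(parses_as_json(new_value))  # True
print(json.loads(new_value))      # {'enable_thinking': False}
```

If the flag is instead parsed by a YAML-style loader, the old form may also have been accepted; the quoted form is the one that works under both.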
@@ -18,8 +18,14 @@ Previous memory: {prior_memory}
 Just-completed subtask: "{completed_subtask}"
 Remaining subtasks (for relevance judgement only): {remaining_subtasks}
 
-Update the memory. Drop irrelevant detail. Compress completed steps.
-Keep WHAT happened, drop HOW. Shorter is better.
+Update the memory as a compact state note.
+
+Rules:
+- Keep only facts needed later.
+- Keep WHAT changed; drop HOW it was done.
+- Use fragments when clear.
+- Prefer: "bowl in box; lid still open"
+- Avoid: "The robot placed the bowl into the box and the lid remains open."
 
 Output strictly valid JSON:
-{{ "memory": "<one or two short sentences>" }}
+{{ "memory": "<brief state note>" }}
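Note on the brace style in this prompt: single braces such as `{completed_subtask}` look like format placeholders, and the doubled braces around the JSON example look like escapes for literal braces. A minimal sketch, assuming the template is rendered with Python's `str.format` (the diff does not show the rendering code, and `TEMPLATE` here is a shortened stand-in, not the real file):

```python
# Shortened stand-in for the memory prompt; not the full template.
TEMPLATE = (
    'Just-completed subtask: "{completed_subtask}"\n'
    "Output strictly valid JSON:\n"
    '{{ "memory": "<brief state note>" }}'
)

# str.format fills {completed_subtask} and turns {{ / }} into literal braces.
rendered = TEMPLATE.format(completed_subtask="place the bowl into the box")
print(rendered)
```

Under that assumption, the model sees a normal single-brace JSON example, so editing the placeholder text inside the doubled braces (as this hunk does) does not affect rendering.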
@@ -1,18 +1,18 @@
 You are the high-level planner for a robot demonstrating: "{episode_task}".
 
-Given the subtask decomposition below, write a concise hierarchical PLAN
-the robot should follow. Format the plan as a numbered list, one line per
-high-level step. The plan describes the full task; subtasks are the atomic
-skills used to execute it.
+Given the subtask decomposition below, write a compact hierarchical PLAN.
+Use short imperative fragments, like pi0.7 context prompts.
 
 Subtasks for context:
 {subtasks_text}
 
 Authoring rules:
 - 3 to {plan_max_steps} steps.
-- Each step describes one logical chunk of the task, not one motion.
+- Each step is one logical chunk, not one motion.
 - Steps must be in execution order.
-- Plain prose, no JSON, no markdown headers.
+- Brief commands, not full sentences.
+- Prefer: "open air fryer"; avoid: "The robot should open the air fryer."
+- Plain text, no markdown headers.
 
 Output strictly valid JSON:
 {{ "plan": "1. ...\n2. ...\n3. ..." }}
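Note on checking replies to this prompt: the new wording drops the explicit "numbered list, one line per step" instruction and relies on the JSON example to convey the format, so a consumer may want to verify the step count itself. The sketch below is one hypothetical way to do that; `parse_plan` and the sample reply are illustrative and do not come from the repository.

```python
import json
import re


def parse_plan(reply: str, plan_max_steps: int) -> list[str]:
    """Split a JSON plan reply into step strings and check the step count."""
    plan = json.loads(reply)["plan"]
    steps = []
    for line in plan.splitlines():
        # Accept lines of the form "<number>. <text>".
        match = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if match:
            steps.append(match.group(2))
    if not 3 <= len(steps) <= plan_max_steps:
        raise ValueError(f"expected 3 to {plan_max_steps} steps, got {len(steps)}")
    return steps


# Illustrative reply written in the compact style the prompt asks for.
reply = '{"plan": "1. open air fryer\\n2. load basket\\n3. close air fryer"}'
print(parse_plan(reply, plan_max_steps=6))
```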
@@ -4,17 +4,18 @@ The user originally asked: "{episode_task}"
 
 You are shown the entire demonstration as a single video. Watch the
 whole clip, then segment it into a list of consecutive atomic subtasks
-the robot performs.
+the robot performs. Write compact action labels, not prose.
 
 Authoring rules — based on Hi Robot (Shi 2025) atom granularity and
-Pi0.7 (Physical Intelligence 2025) "how, not what" detail:
+pi0.7 (Physical Intelligence 2025) compact context prompts:
 
 - Each subtask is one atomic skill the low-level policy can execute,
   e.g. "pick up one piece of lettuce", "place the bowl into the box",
   "move the right arm to the left".
-- Capture HOW the subtask is performed, not only WHAT — e.g. prefer
+- Capture HOW when useful, but keep it brief — e.g. prefer
   "grasp the handle of the sponge with the left hand" to "pick up the
   sponge".
+- Use verb phrases, not full sentences.
 - Subtasks are non-overlapping and cover the full episode in order.
   Choose the cut points yourself based on what you see in the video
   (gripper open/close events, contact, regrasps, transitions).
@@ -9,7 +9,7 @@ Original task:
 Generate exactly {n} alternative phrasings of the same task. Vary:
 
 - formality (casual / polite / curt)
-- verbosity (short imperative vs longer polite request)
+- verbosity (mostly short imperative; occasional polite request)
 - word choice (synonyms, different verbs)
 - sentence structure (imperative / question / suggestion)
 
@@ -17,7 +17,7 @@ Hard rules:
 - Each phrasing MUST preserve the exact meaning of the original task.
   Do not change which object is involved, the destination, or the
   action. Do not add extra steps. Do not invent new objects.
-- Each phrasing must be a single short sentence, plain prose, no
+- Each phrasing must be a short phrase or sentence, plain prose, no
   markdown, no quotes, no list numbers.
 - Phrasings must be distinct — no near-duplicates.
 - Output exactly {n} entries.
@@ -1,10 +1,12 @@
 The user just asked the robot: "{episode_task}".
 
 Generate a short verbal acknowledgement the robot would speak back before
-beginning the task. Style: confident, friendly, single short sentence.
+beginning the task. Style: compact, confident, friendly.
 
 Examples (Hi Robot, Shi 2025): "Sure, I won't put cheese on it.",
 "OK, starting with the sponge.", "Got it.".
 
+Prefer very short replies: "Got it.", "On it.", "OK."
+
 Output strictly valid JSON:
 {{ "text": "<the spoken acknowledgement>" }}
@@ -14,12 +14,10 @@ subtask boundary in the demonstration:
 - Subtask the robot is about to start: "{next_subtask}"
 - Time into episode: {timestamp:.2f}s
 
-Write ONE interjection the user would naturally say at this moment to
-prompt / confirm / encourage the robot to do "{next_subtask}". Phrase it
-like a real human mid-task remark — conversational, varied, sometimes
-just a nudge, sometimes a clarification, sometimes a small constraint
-that the upcoming motion happens to satisfy. Plus the robot's verbal
-acknowledgement.
+Write ONE compact interjection the user would naturally say at this
+moment to prompt / confirm / encourage the robot to do "{next_subtask}".
+Keep it like a mid-task coaching cue, not a full instruction paragraph.
+Also write the robot's compact verbal acknowledgement.
 
 Hard rules:
 
@@ -29,7 +27,9 @@ Hard rules:
   instead", DO NOT — those would contradict the demonstration.
 - The interjection must reference an object, location, or action that
   is plausible given the visible scene and the next subtask text.
-- One sentence each. Conversational, not robotic.
+- One short phrase or sentence each. Conversational, not robotic.
+- Prefer direct cues: "{next_subtask}, please."; "Now {next_subtask}."
+- Keep robot speech very short: "OK.", "On it.", "Doing that."
 
 Style examples (vary the phrasing — don't reuse these verbatim):
 - "Now go ahead and {next_subtask}."
@@ -41,6 +41,6 @@ Style examples (vary the phrasing — don't reuse these verbatim):
 
 Output strictly valid JSON:
 {{
-  "interjection": "<single sentence the user says, asking for the next subtask>",
-  "speech": "<single sentence the robot speaks back, confirming and starting>"
+  "interjection": "<short cue from the user, asking for the next subtask>",
+  "speech": "<short robot acknowledgement>"
 }}
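Note on consuming this output: the prompt asks for "strictly valid JSON" with two string fields, `interjection` and `speech`. A minimal sketch of a validator for such a reply follows, assuming the raw model text is available as a string; `parse_interjection` and the sample reply are hypothetical and not taken from the repository.

```python
import json

REQUIRED_KEYS = ("interjection", "speech")


def parse_interjection(reply: str) -> dict[str, str]:
    """Parse the reply and require non-empty string values for both keys."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    return {key: data[key].strip() for key in REQUIRED_KEYS}


# Illustrative reply in the short style the updated prompt prefers.
reply = '{"interjection": "Now place the bowl into the box.", "speech": "On it."}'
print(parse_interjection(reply))
```

Rejecting empty strings matters more after this change, since very short replies such as "OK." sit close to the empty case.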