annotate: telegraphic subtasks — ≤4 words, verb+object, consistent nouns

Tighten the subtask prompt further per real-data feedback. The old
≤5-word cap still produced things like "release the yellow block
into the green bin" (8 words, articles, destination, and "block"
where the task said "cube").

New rules:
* Hard cap ≤ 4 words, ideally 2-3. Form: VERB + (color) + OBJECT.
* No articles, no destinations, no adverbs, no "robot/arm/gripper".
* Must reuse the exact object nouns from the task — no block/cube,
  bin/box/container drift across the episode.
* Concrete good/bad examples anchored on the cube task.

Shorter, templated, consistent targets are far more robust for the
autoregressive LM head — fewer tokens to drift on, fewer dominant
n-grams to repetition-collapse into.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pepijn
2026-05-15 14:14:42 +02:00
parent f1a0a663cc
commit e727688052
@@ -4,17 +4,24 @@ The user originally asked: "{episode_task}"
You are shown the entire demonstration as a single video. Watch the
whole clip, then segment it into a list of consecutive atomic subtasks
the robot performs. Write **ultra-compact** action labels.
the robot performs. Write **telegraphic** action labels.
Authoring rules — Hi Robot atom granularity, pi0.7-style short prompts:
- Each subtask = one atomic skill the low-level policy can execute.
- **Hard length cap: ≤ 5 words per subtask.** Drop articles, modifiers,
adverbs. Verb + object (+ optional short qualifier) only.
- Prefer: "pick lettuce", "place bowl in box", "open drawer",
"grasp sponge left hand".
- Avoid: "pick up one piece of lettuce", "carefully place the bowl",
"the robot moves its arm to the left".
- **Hard length cap: ≤ 4 words.** Ideally 2-3. Form: VERB + (color) +
OBJECT. No articles ("the", "a"), no destinations, no adverbs, no
"robot"/"arm"/"gripper" — those are implied.
- **Use the exact object nouns from the task above.** If the task says
"cube", every subtask says "cube" — never switch to "block". If it
says "box", never switch to "bin"/"container". Consistent vocabulary
across the whole episode.
- Good: "move to blue cube", "grasp blue cube", "lift blue cube",
"place blue cube", "open drawer", "release yellow cube".
- Bad: "release the yellow block into the green bin" (articles,
destination, "block" instead of "cube"), "the robot arm moves
towards the blue cube" ("the robot arm", too long), "carefully
pick up the cube" (adverb, article).
- Subtasks are non-overlapping and cover the full episode in order.
Choose the cut points yourself based on what you see in the video
(gripper open/close events, contact, regrasps, transitions).
@@ -27,7 +34,7 @@ Output strictly valid JSON of shape:
{{
"subtasks": [
{{"text": "<≤5-word verb phrase>", "start": <float>, "end": <float>}},
{{"text": "<≤4-word verb phrase>", "start": <float>, "end": <float>}},
...
]
}}