From e7276880524d46c9b6e915774c521112fcdfe847 Mon Sep 17 00:00:00 2001
From: Pepijn <pepijn@huggingface.co>
Date: Fri, 15 May 2026 14:14:42 +0200
Subject: [PATCH] =?UTF-8?q?annotate:=20telegraphic=20subtasks=20=E2=80=94?=
 =?UTF-8?q?=20=E2=89=A44=20words,=20verb+object,=20consistent=20nouns?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Tighten the subtask prompt further, based on feedback from real data.
The old ≤5-word cap still produced labels such as "release the yellow
block into the green bin": 8 words, with articles, a destination, and
"block" where the task said "cube".

New rules:
* Hard cap ≤ 4 words, ideally 2-3. Form: VERB + (color) + OBJECT.
* No articles, no destinations, no adverbs, no "robot/arm/gripper".
* Must reuse the exact object nouns from the task: no drift between
  block/cube or bin/box/container across the episode.
* Concrete good/bad examples anchored on the cube task.
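
The rules above could also be checked after generation. The sketch
below is illustrative only and is not part of this patch: the name
check_subtask, the banned_nouns parameter, and the word lists are
hypothetical, and the adverb and destination rules are not covered.

```python
from __future__ import annotations

import re

# Assumed word lists, taken from the wording of the rules above.
ARTICLES = {"the", "a", "an"}
IMPLIED = {"robot", "arm", "gripper"}
MAX_WORDS = 4


def check_subtask(text: str, banned_nouns: frozenset[str] = frozenset()) -> list[str]:
    """Return the rule violations found in one subtask label (empty if clean).

    banned_nouns is a hypothetical, caller-supplied set of synonyms the
    task did NOT use (e.g. {"block", "bin"} when the task says "cube").
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    seen = set(words)
    problems = []
    if not words:
        problems.append("empty")
    if len(words) > MAX_WORDS:
        problems.append(f"too long ({len(words)} words)")
    if ARTICLES & seen:
        problems.append("article: " + ", ".join(sorted(ARTICLES & seen)))
    if IMPLIED & seen:
        problems.append("implied actor: " + ", ".join(sorted(IMPLIED & seen)))
    if banned_nouns & seen:
        problems.append("noun drift: " + ", ".join(sorted(banned_nouns & seen)))
    return problems


if __name__ == "__main__":
    drift = frozenset({"block", "bin", "container"})
    for label in ("grasp blue cube",
                  "release the yellow block into the green bin"):
        print(repr(label), "->", check_subtask(label, drift))
```

A check like this only flags labels; whether to retry the annotation
or drop the episode is left to the caller.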

Shorter, templated, consistent targets should be more robust for the
autoregressive LM head: fewer tokens on which to drift, and fewer
dominant n-grams to collapse into through repetition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../prompts/module_1_subtasks.txt             | 23 ++++++++++++-------
 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
index 10044fc0b..56d14d42f 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_subtasks.txt
@@ -4,17 +4,24 @@ The user originally asked: "{episode_task}"
 
 You are shown the entire demonstration as a single video. Watch the
 whole clip, then segment it into a list of consecutive atomic subtasks
-the robot performs. Write **ultra-compact** action labels.
+the robot performs. Write **telegraphic** action labels.
 
 Authoring rules — Hi Robot atom granularity, pi0.7-style short prompts:
 
 - Each subtask = one atomic skill the low-level policy can execute.
-- **Hard length cap: ≤ 5 words per subtask.** Drop articles, modifiers,
-  adverbs. Verb + object (+ optional short qualifier) only.
-- Prefer: "pick lettuce", "place bowl in box", "open drawer",
-  "grasp sponge left hand".
-- Avoid: "pick up one piece of lettuce", "carefully place the bowl",
-  "the robot moves its arm to the left".
+- **Hard length cap: ≤ 4 words.** Ideally 2-3. Form: VERB + (color) +
+  OBJECT. No articles ("the", "a"), no destinations, no adverbs, no
+  "robot"/"arm"/"gripper" — those are implied.
+- **Use the exact object nouns from the task above.** If the task says
+  "cube", every subtask says "cube" — never switch to "block". If it
+  says "box", never switch to "bin"/"container". Consistent vocabulary
+  across the whole episode.
+- Good: "move to blue cube", "grasp blue cube", "lift blue cube",
+  "place blue cube", "open drawer", "release yellow cube".
+- Bad: "release the yellow block into the green bin" (articles,
+  destination, "block" instead of "cube"), "the robot arm moves
+  towards the blue cube" ("the robot arm", too long), "carefully
+  pick up the cube" (adverb, article).
 - Subtasks are non-overlapping and cover the full episode in order.
   Choose the cut points yourself based on what you see in the video
   (gripper open/close events, contact, regrasps, transitions).
@@ -27,7 +34,7 @@ Output strictly valid JSON of shape:
 
   {{
     "subtasks": [
-      {{"text": "<≤5-word verb phrase>", "start": <float>, "end": <float>}},
+      {{"text": "<≤4-word verb phrase>", "start": <float>, "end": <float>}},
       ...
     ]
   }}