From 7b64e5498d7b404af1e1275da672f0aaa6b7809b Mon Sep 17 00:00:00 2001
From: Pepijn <pepijn@huggingface.co>
Date: Tue, 19 May 2026 14:17:52 +0200
Subject: [PATCH] revert(annotate): move memory + speech prompts to base PR
 (#3471)

The first-person memory narrative, task-rephrasing, and initial-speech
prompt tweaks belong in the annotation pipeline itself. They are now
applied to feat/language-annotation-pipeline (#3471), so this commit
reverts the three prompt files to their merge-base state and the tweaks
drop out of this PR's diff. general_vqa.py keeps its docstring fix
because it references a recipe this PR introduces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../prompts/module_1_memory.txt               | 28 ++++++-------------
 .../prompts/module_1_task_rephrasings.txt     |  4 +--
 .../prompts/module_2_initial_speech.txt       |  4 +--
 3 files changed, 12 insertions(+), 24 deletions(-)
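
The prompt files in this patch use `{name}` placeholders and doubled
braces around the JSON output example, which suggests they are rendered
with Python's `str.format`. The sketch below illustrates that assumption
only: `TEMPLATE`, `render_memory_prompt`, and `parse_memory_reply` are
hypothetical names, and the pipeline's actual loader and parser are not
shown in this patch.

```python
import json

# Hypothetical excerpt mirroring the tail of module_1_memory.txt after the
# revert. Doubled braces survive str.format as literal single braces.
TEMPLATE = (
    'Episode task: "{episode_task}"\n'
    "Previous memory: {prior_memory}\n"
    'Just-completed subtask: "{completed_subtask}"\n'
    "Remaining subtasks (for relevance judgement only): {remaining_subtasks}\n"
    "\n"
    "Output strictly valid JSON:\n"
    '  {{ "memory": "<one or two short sentences>" }}\n'
)


def render_memory_prompt(episode_task, prior_memory, completed_subtask, remaining_subtasks):
    """Fill the placeholders; assumes str.format is the rendering mechanism."""
    return TEMPLATE.format(
        episode_task=episode_task,
        prior_memory=prior_memory,
        completed_subtask=completed_subtask,
        remaining_subtasks=remaining_subtasks,
    )


def parse_memory_reply(reply):
    """Parse a model reply of the form {"memory": "..."} and return the text."""
    data = json.loads(reply)
    memory = data["memory"]
    if not isinstance(memory, str):
        raise ValueError("memory must be a string")
    return memory.strip()


prompt = render_memory_prompt(
    episode_task="put the bowls in the cabinet",
    prior_memory="",
    completed_subtask="place the third bowl in the top right cabinet",
    remaining_subtasks=["close the cabinet"],
)
```

If the templates are rendered some other way, the doubled braces in the
output line would need to be revisited, since they would then reach the
model verbatim.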
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
index f62fa3582..6a89ecefa 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
@@ -1,35 +1,25 @@
 You are updating the robot's compressed semantic memory at the boundary of
 a completed subtask.
 
-Reference (MEM, Torne 2026):
+Reference (verbatim from MEM, Torne 2026):
 "Remove or compress information in the language memory whenever
 appropriate. Keep ONLY the minimal set of relevant information for future
 task execution. Specific object attributes (colors, precise quantities of
 each item) get discarded when their details won't affect subsequent
 actions. Functional outcomes (where items went, how many) are preserved."
 
+Concrete example from MEM:
+  Before: "I put a light green bowl, a dark blue bowl and a bright yellow
+           bowl into the top right cabinet"
+  After:  "I placed three bowls in the top right cabinet"
+
 Episode task: "{episode_task}"
 Previous memory: {prior_memory}
 Just-completed subtask: "{completed_subtask}"
 Remaining subtasks (for relevance judgement only): {remaining_subtasks}
 
-Write the **shortest possible** state note that future subtasks could
-need. Telegraphic style.
-
-**Hard caps**
-- ≤ 10 words total.
-- No articles. No verbs in past tense ("placed", "moved"). Use
-  comma-separated noun→location fragments.
-- Drop colors/sizes/counts unless a later subtask depends on them.
-- If nothing material changed for downstream subtasks, emit "" (empty
-  string).
-
-Examples
-- Good: "bowl in box, lid open"
-- Good: "3 bowls in cabinet"
-- Good: "cup on tray, drawer closed"
-- Bad:  "The bowl is now in the box and the lid is still open."
-- Bad:  "I placed the green bowl carefully into the cardboard box."
+Update the memory. Drop irrelevant detail. Compress completed steps.
+Keep WHAT happened, drop HOW. Shorter is better.
 
 Output strictly valid JSON:
-  {{ "memory": "<≤10-word telegraphic state, or empty>" }}
+  {{ "memory": "<one or two short sentences>" }}
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
index 602892bd3..d03a6bf8b 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
@@ -9,7 +9,7 @@ Original task:
 Generate exactly {n} alternative phrasings of the same task. Vary:
 
 - formality (casual / polite / curt)
-- verbosity (mostly short imperative; occasional polite request)
+- verbosity (short imperative vs longer polite request)
 - word choice (synonyms, different verbs)
 - sentence structure (imperative / question / suggestion)
 
@@ -17,7 +17,7 @@ Hard rules:
 - Each phrasing MUST preserve the exact meaning of the original task.
   Do not change which object is involved, the destination, or the
   action. Do not add extra steps. Do not invent new objects.
-- Each phrasing must be a short phrase or sentence, plain prose, no
+- Each phrasing must be a single short sentence, plain prose, no
   markdown, no quotes, no list numbers.
 - Phrasings must be distinct — no near-duplicates.
 - Output exactly {n} entries.
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
index 625ce920c..6058b1f5c 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
@@ -1,12 +1,10 @@
 The user just asked the robot: "{episode_task}".
 
 Generate a short verbal acknowledgement the robot would speak back before
-beginning the task. Style: compact, confident, friendly.
+beginning the task. Style: confident, friendly, single short sentence.
 
 Examples (Hi Robot, Shi 2025): "Sure, I won't put cheese on it.",
 "OK, starting with the sponge.", "Got it.".
 
-Prefer very short replies: "Got it.", "On it.", "OK."
-
 Output strictly valid JSON:
   {{ "text": "<the spoken acknowledgement>" }}