From 7b64e5498d7b404af1e1275da672f0aaa6b7809b Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Tue, 19 May 2026 14:17:52 +0200
Subject: [PATCH] revert(annotate): move memory + speech prompts to base PR (#3471)

The first-person memory narrative, task-rephrasing and initial-speech
prompt tweaks belong in the annotation pipeline itself. Applied to
feat/language-annotation-pipeline (#3471); reverting them here to the
merge-base so they drop out of this PR's diff.

general_vqa.py keeps its docstring fix since it references a recipe
this PR introduces.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../prompts/module_1_memory.txt               | 28 ++++++-------------
 .../prompts/module_1_task_rephrasings.txt     |  4 +--
 .../prompts/module_2_initial_speech.txt       |  4 +--
 3 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
index f62fa3582..6a89ecefa 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_memory.txt
@@ -1,35 +1,25 @@
 You are updating the robot's compressed semantic memory at the boundary
 of a completed subtask.
 
-Reference (MEM, Torne 2026):
+Reference (verbatim from MEM, Torne 2026):
   "Remove or compress information in the language memory whenever appropriate.
   Keep ONLY the minimal set of relevant information for future task execution.
   Specific object attributes (colors, precise quantities of each item) get
   discarded when their details won't affect subsequent actions. Functional
   outcomes (where items went, how many) are preserved."
 
+Concrete example from MEM:
+  Before: "I put a light green bowl, a dark blue bowl and a bright yellow
+          bowl into the top right cabinet"
+  After:  "I placed three bowls in the top right cabinet"
+
 Episode task: "{episode_task}"
 Previous memory: {prior_memory}
 Just-completed subtask: "{completed_subtask}"
 Remaining subtasks (for relevance judgement only): {remaining_subtasks}
 
-Write the **shortest possible** state note that future subtasks could
-need. Telegraphic style.
-
-**Hard caps**
-- ≤ 10 words total.
-- No articles. No verbs in past tense ("placed", "moved"). Use
-  comma-separated noun→location fragments.
-- Drop colors/sizes/counts unless a later subtask depends on them.
-- If nothing material changed for downstream subtasks, emit "" (empty
-  string).
-
-Examples
-- Good: "bowl in box, lid open"
-- Good: "3 bowls in cabinet"
-- Good: "cup on tray, drawer closed"
-- Bad: "The bowl is now in the box and the lid is still open."
-- Bad: "I placed the green bowl carefully into the cardboard box."
+Update the memory. Drop irrelevant detail. Compress completed steps.
+Keep WHAT happened, drop HOW. Shorter is better.
 
 Output strictly valid JSON:
-  {{ "memory": "<≤10-word telegraphic state, or empty>" }}
+  {{ "memory": "" }}
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
index 602892bd3..d03a6bf8b 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_rephrasings.txt
@@ -9,7 +9,7 @@ Original task:
 Generate exactly {n} alternative phrasings of the same task.
 Vary:
 - formality (casual / polite / curt)
-- verbosity (mostly short imperative; occasional polite request)
+- verbosity (short imperative vs longer polite request)
 - word choice (synonyms, different verbs)
 - sentence structure (imperative / question / suggestion)
 
@@ -17,7 +17,7 @@ Hard rules:
 - Each phrasing MUST preserve the exact meaning of the original task.
   Do not change which object is involved, the destination, or the
   action. Do not add extra steps. Do not invent new objects.
-- Each phrasing must be a short phrase or sentence, plain prose, no
+- Each phrasing must be a single short sentence, plain prose, no
   markdown, no quotes, no list numbers.
 - Phrasings must be distinct — no near-duplicates.
 - Output exactly {n} entries.
diff --git a/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt b/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
index 625ce920c..6058b1f5c 100644
--- a/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
+++ b/src/lerobot/annotations/steerable_pipeline/prompts/module_2_initial_speech.txt
@@ -1,12 +1,10 @@
 The user just asked the robot: "{episode_task}".
 
 Generate a short verbal acknowledgement the robot would speak back before
-beginning the task. Style: compact, confident, friendly.
+beginning the task. Style: confident, friendly, single short sentence.
 
 Examples (Hi Robot, Shi 2025): "Sure, I won't put cheese on it.",
 "OK, starting with the sponge.", "Got it.".
 
-Prefer very short replies: "Got it.", "On it.", "OK."
-
 Output strictly valid JSON:
   {{ "text": "" }}