revert(annotate): move memory + speech prompts to base PR (#3471)

The first-person memory narrative, task-rephrasing and initial-speech prompt tweaks belong in the annotation pipeline itself. They are applied to feat/language-annotation-pipeline (#3471) and reverted here to the merge-base so they drop out of this PR's diff.

general_vqa.py keeps its docstring fix, since it references a recipe this PR introduces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-07-06 09:37:06 +00:00 · 2026-05-19 14:17:52 +02:00
parent 182f10184f
commit 7b64e5498d
3 changed files with 12 additions and 24 deletions
@@ -1,35 +1,25 @@
 You are updating the robot's compressed semantic memory at the boundary of
 a completed subtask.

-Reference (MEM, Torne 2026):
+Reference (verbatim from MEM, Torne 2026):
 "Remove or compress information in the language memory whenever
 appropriate. Keep ONLY the minimal set of relevant information for future
 task execution. Specific object attributes (colors, precise quantities of
 each item) get discarded when their details won't affect subsequent
 actions. Functional outcomes (where items went, how many) are preserved."

+Concrete example from MEM:
+  Before: "I put a light green bowl, a dark blue bowl and a bright yellow
+           bowl into the top right cabinet"
+  After:  "I placed three bowls in the top right cabinet"
+
 Episode task: "{episode_task}"
 Previous memory: {prior_memory}
 Just-completed subtask: "{completed_subtask}"
 Remaining subtasks (for relevance judgement only): {remaining_subtasks}

-Write the **shortest possible** state note that future subtasks could
-need. Telegraphic style.
-
-**Hard caps**
- ≤ 10 words total.
- No articles. No verbs in past tense ("placed", "moved"). Use
-  comma-separated noun→location fragments.
- Drop colors/sizes/counts unless a later subtask depends on them.
- If nothing material changed for downstream subtasks, emit "" (empty
-  string).
-
-Examples
- Good: "bowl in box, lid open"
- Good: "3 bowls in cabinet"
- Good: "cup on tray, drawer closed"
- Bad:  "The bowl is now in the box and the lid is still open."
- Bad:  "I placed the green bowl carefully into the cardboard box."
+Update the memory. Drop irrelevant detail. Compress completed steps.
+Keep WHAT happened, drop HOW. Shorter is better.

 Output strictly valid JSON:
-  {{ "memory": "<≤10-word telegraphic state, or empty>" }}
+  {{ "memory": "<one or two short sentences>" }}
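The memory prompt is a template with named fields and doubled braces. That suggests, though the loader is not part of this diff, that it is filled with Python's `str.format`, where `{{` and `}}` render as literal braces. Below is a minimal sketch, under that assumption, that renders the tail of the restored prompt and validates the `{"memory": ...}` reply. `MEMORY_TEMPLATE`, `render_memory_prompt` and `parse_memory_reply` are hypothetical names.

```python
import json

# Hypothetical excerpt of the restored memory prompt; the real file name
# and loader are not shown in this diff.
MEMORY_TEMPLATE = (
    'Episode task: "{episode_task}"\n'
    "Previous memory: {prior_memory}\n"
    'Just-completed subtask: "{completed_subtask}"\n'
    "Remaining subtasks (for relevance judgement only): {remaining_subtasks}\n"
    "\n"
    "Output strictly valid JSON:\n"
    '  {{ "memory": "<one or two short sentences>" }}\n'
)


def render_memory_prompt(episode_task, prior_memory, completed_subtask, remaining_subtasks):
    # str.format fills the named fields and turns "{{" / "}}" into "{" / "}".
    return MEMORY_TEMPLATE.format(
        episode_task=episode_task,
        prior_memory=prior_memory,
        completed_subtask=completed_subtask,
        remaining_subtasks=remaining_subtasks,
    )


def parse_memory_reply(reply: str) -> str:
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON;
    # a well-formed reply without a string "memory" field is rejected too.
    data = json.loads(reply)
    if not isinstance(data, dict) or not isinstance(data.get("memory"), str):
        raise ValueError('expected {"memory": <string>}')
    return data["memory"].strip()
```

Because the restored wording asks for one or two short sentences instead of a ≤10-word telegraphic note, this sketch checks only the JSON shape and enforces no word cap.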
@@ -9,7 +9,7 @@ Original task:
 Generate exactly {n} alternative phrasings of the same task. Vary:

 - formality (casual / polite / curt)
- verbosity (mostly short imperative; occasional polite request)
+- verbosity (short imperative vs longer polite request)
 - word choice (synonyms, different verbs)
 - sentence structure (imperative / question / suggestion)

@@ -17,7 +17,7 @@ Hard rules:
 - Each phrasing MUST preserve the exact meaning of the original task.
  Do not change which object is involved, the destination, or the
  action. Do not add extra steps. Do not invent new objects.
- Each phrasing must be a short phrase or sentence, plain prose, no
+- Each phrasing must be a single short sentence, plain prose, no
  markdown, no quotes, no list numbers.
 - Phrasings must be distinct — no near-duplicates.
 - Output exactly {n} entries.
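Several of the rephrasing rules are mechanical enough to check after generation. The sketch below assumes the model's reply has already been parsed into a list of strings, because the prompt's output key is outside these hunks. `check_phrasings` is a hypothetical helper. Its duplicate test only catches phrasings that are identical after lower-casing and stripping punctuation, so real near-duplicates would need a fuzzier comparison. Its single-sentence test is a heuristic that abbreviations such as "e.g." would trip. Meaning preservation is not checked at all.

```python
import re

# Hypothetical post-generation checks for the rephrasing rules.
_LIST_MARKER = re.compile(r"^\s*(\d+[.)]|[-*])\s+")  # "1. ", "2) ", "- ", "* "
_FORBIDDEN = set('"`*#')  # double quotes and common markdown characters


def _normalize(text: str) -> str:
    # Lower-case, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def check_phrasings(phrasings: list[str], n: int) -> list[str]:
    """Return a list of rule violations (empty if none were detected)."""
    problems = []
    if len(phrasings) != n:
        problems.append(f"expected {n} entries, got {len(phrasings)}")
    seen = set()
    for i, p in enumerate(phrasings):
        if _LIST_MARKER.match(p):
            problems.append(f"entry {i}: starts with a list marker")
        if any(ch in _FORBIDDEN for ch in p):
            problems.append(f"entry {i}: contains quotes or markdown")
        # Single sentence: no terminator before the final character.
        if re.search(r"[.!?]", p.strip()[:-1]):
            problems.append(f"entry {i}: more than one sentence")
        key = _normalize(p)
        if key in seen:
            problems.append(f"entry {i}: duplicate of an earlier entry")
        seen.add(key)
    return problems
```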
@@ -1,12 +1,10 @@
 The user just asked the robot: "{episode_task}".

 Generate a short verbal acknowledgement the robot would speak back before
-beginning the task. Style: compact, confident, friendly.
+beginning the task. Style: confident, friendly, single short sentence.

 Examples (Hi Robot, Shi 2025): "Sure, I won't put cheese on it.",
 "OK, starting with the sponge.", "Got it.".

-Prefer very short replies: "Got it.", "On it.", "OK."
-
 Output strictly valid JSON:
  {{ "text": "<the spoken acknowledgement>" }}
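The restored acknowledgement prompt asks for a single short sentence wrapped in `{"text": ...}`. The following is a minimal validation sketch. `parse_acknowledgement` and its `max_words=12` cap are hypothetical, since the prompt itself only says "short".

```python
import json
import re


def parse_acknowledgement(reply: str, max_words: int = 12) -> str:
    """Parse {"text": ...} and enforce one short sentence.

    max_words is an assumed cap; the prompt does not specify one.
    """
    data = json.loads(reply)
    text = data.get("text") if isinstance(data, dict) else None
    if not isinstance(text, str) or not text.strip():
        raise ValueError('expected {"text": <non-empty string>}')
    text = text.strip()
    if len(text.split()) > max_words:
        raise ValueError(f"acknowledgement longer than {max_words} words")
    # A sentence terminator before the last character means 2+ sentences.
    if re.search(r"[.!?]", text[:-1]):
        raise ValueError("acknowledgement is more than one sentence")
    return text
```

All three Hi Robot examples quoted in the prompt pass this check.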