revert(annotate): move memory + speech prompts to base PR (#3471)

The first-person memory narrative, task-rephrasing and initial-speech
prompt tweaks belong in the annotation pipeline itself. Applied to
feat/language-annotation-pipeline (#3471); reverting them here to the
merge-base so they drop out of this PR's diff. general_vqa.py keeps its
docstring fix since it references a recipe this PR introduces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-05-19 14:17:52 +02:00
parent 182f10184f
commit 7b64e5498d
3 changed files with 12 additions and 24 deletions
@@ -1,35 +1,25 @@
 You are updating the robot's compressed semantic memory at the boundary of
 a completed subtask.
-Reference (MEM, Torne 2026):
+Reference (verbatim from MEM, Torne 2026):
 "Remove or compress information in the language memory whenever
 appropriate. Keep ONLY the minimal set of relevant information for future
 task execution. Specific object attributes (colors, precise quantities of
 each item) get discarded when their details won't affect subsequent
 actions. Functional outcomes (where items went, how many) are preserved."
-Concrete example from MEM:
-Before: "I put a light green bowl, a dark blue bowl and a bright yellow
-bowl into the top right cabinet"
-After: "I placed three bowls in the top right cabinet"
 Episode task: "{episode_task}"
 Previous memory: {prior_memory}
 Just-completed subtask: "{completed_subtask}"
 Remaining subtasks (for relevance judgement only): {remaining_subtasks}
-Write the **shortest possible** state note that future subtasks could
-need. Telegraphic style. Keep WHAT happened, drop HOW. Shorter is better.
-**Hard caps**
-- ≤ 10 words total.
-- No articles. No verbs in past tense ("placed", "moved"). Use
-comma-separated noun→location fragments.
-- Drop colors/sizes/counts unless a later subtask depends on them.
-- If nothing material changed for downstream subtasks, emit "" (empty
-string).
-Examples
-- Good: "bowl in box, lid open"
-- Good: "3 bowls in cabinet"
-- Good: "cup on tray, drawer closed"
-- Bad: "The bowl is now in the box and the lid is still open."
-- Bad: "I placed the green bowl carefully into the cardboard box."
+Update the memory. Drop irrelevant detail. Compress completed steps.
 Output strictly valid JSON:
-{{ "memory": "<≤10-word telegraphic state, or empty>" }}
+{{ "memory": "<one or two short sentences>" }}
@@ -9,7 +9,7 @@ Original task:
 Generate exactly {n} alternative phrasings of the same task. Vary:
 - formality (casual / polite / curt)
-- verbosity (mostly short imperative; occasional polite request)
+- verbosity (short imperative vs longer polite request)
 - word choice (synonyms, different verbs)
 - sentence structure (imperative / question / suggestion)
@@ -17,7 +17,7 @@ Hard rules:
 - Each phrasing MUST preserve the exact meaning of the original task.
 Do not change which object is involved, the destination, or the
 action. Do not add extra steps. Do not invent new objects.
-- Each phrasing must be a short phrase or sentence, plain prose, no
+- Each phrasing must be a single short sentence, plain prose, no
 markdown, no quotes, no list numbers.
 - Phrasings must be distinct — no near-duplicates.
 - Output exactly {n} entries.
@@ -1,12 +1,10 @@
 The user just asked the robot: "{episode_task}".
 Generate a short verbal acknowledgement the robot would speak back before
-beginning the task. Style: compact, confident, friendly.
+beginning the task. Style: confident, friendly, single short sentence.
 Examples (Hi Robot, Shi 2025): "Sure, I won't put cheese on it.",
 "OK, starting with the sponge.", "Got it.".
-Prefer very short replies: "Got it.", "On it.", "OK."
 Output strictly valid JSON:
 {{ "text": "<the spoken acknowledgement>" }}
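The doubled braces around the JSON skeletons and the single-brace placeholders such as {episode_task} suggest these prompt files are filled in with Python's str.format, although the diff does not show the loading code. The sketch below works under that assumption: it renders a shortened stand-in for the speech prompt and parses a reply. SPEECH_PROMPT, render_prompt and parse_reply are hypothetical names used for illustration only, not functions from the pipeline.

```python
import json

# Hypothetical, shortened stand-in for the speech prompt shown in the diff.
SPEECH_PROMPT = (
    'The user just asked the robot: "{episode_task}".\n'
    "Output strictly valid JSON:\n"
    '{{ "text": "<the spoken acknowledgement>" }}'
)


def render_prompt(template: str, **fields: str) -> str:
    """Fill {placeholders}; str.format turns '{{' and '}}' into literal braces."""
    return template.format(**fields)


def parse_reply(raw: str, key: str) -> str:
    """Parse a JSON reply and return the string stored under `key`."""
    payload = json.loads(raw)
    value = payload[key]
    if not isinstance(value, str):
        raise ValueError(f"expected a string under {key!r}")
    return value.strip()


prompt = render_prompt(SPEECH_PROMPT, episode_task="put the cup on the tray")
reply = parse_reply('{ "text": "Got it." }', key="text")
```

If the templates are rendered this way, the braces in the JSON skeleton have to stay doubled in the prompt files. A single brace there would be read as a placeholder and make str.format raise.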