feat(annotate): let the VLM decide vocabulary size

Hardcoding ``n_subtask_target=10`` and ``n_memory_target=6`` baked task complexity into the config: a simple pick-and-place needs ~6, a multi-step recipe needs ~20. The VLM already sees the clips, so let it pick the count itself from what's recurring across episodes.

Drop both knobs from ``VocabularyConfig`` and the ``module_0_vocabulary`` prompt template. The prompt now says "decide the count yourself based on what you see — the smallest set that still covers every recurring phase" and adds an "each label must recur across the demos" rule so the VLM filters out one-off motions.

Update the launcher script + docs to remove the old knobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
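
For orientation, here is a rough sketch of what the change could look like on the Python side. It is not the actual code from this commit. Only the names ``VocabularyConfig``, ``sample_episodes``, and ``module_0_vocabulary`` and the two quoted prompt rules come from the message and diff. The dataclass layout, the default of 3 (copied from the launcher flag), and the ``build_vocabulary_prompt`` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VocabularyConfig:
    """Hypothetical post-change shape: the two count targets are gone."""

    # Number of sample episodes shown to the VLM. The value 3 mirrors the
    # launcher flag; the real default is not visible in this diff.
    sample_episodes: int = 3


def build_vocabulary_prompt(task: str) -> str:
    """Hypothetical stand-in for the module_0_vocabulary prompt template."""
    return (
        f"Task: {task}\n"
        "List the subtask and memory labels for these demos.\n"
        # No target count is interpolated; the VLM chooses the size itself.
        "Decide the count yourself based on what you see: the smallest set "
        "that still covers every recurring phase.\n"
        "Each label must recur across the demos.\n"
    )


config_fields = [f.name for f in fields(VocabularyConfig)]
prompt = build_vocabulary_prompt("pick and place")
```

In this sketch the prompt carries no numeric target at all, so vocabulary size depends only on what recurs in the sampled episodes.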
2026-07-11 12:01:52 +00:00 · 2026-05-22 11:46:31 +00:00
parent 369ab17110
commit 54221ceea2
5 changed files with 31 additions and 27 deletions
@@ -55,12 +55,11 @@ CMD = (
    "--vlm.chat_template_kwargs='{\"enable_thinking\": false}' "
    "--vlm.camera_key=observation.images.wrist "
    # Phase 0 — canonical vocabulary discovery from the first N sample
-    # episodes. The resulting meta/canonical_vocabulary.json constrains
-    # every subtask + memory string to a small repeatable target
-    # distribution; tune the counts for your task complexity.
+    # episodes. The VLM picks the right number of subtask + memory
+    # entries itself from what it sees; the resulting
+    # meta/canonical_vocabulary.json constrains every subtask + memory
+    # string to a small repeatable target distribution.
    "--vocabulary.sample_episodes=3 "
-    "--vocabulary.n_subtask_target=10 "
-    "--vocabulary.n_memory_target=6 "
    # Phase 1 — plan module (subtasks + plan + memory + task_aug).
    "--plan.frames_per_second=1.0 "
    "--plan.use_video_url=true "