mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-23 20:50:02 +00:00
feat(annotate): let the VLM decide vocabulary size
Hardcoding ``n_subtask_target=10`` and ``n_memory_target=6`` baked task complexity into the config: a simple pick-and-place needs ~6, a multi-step recipe needs ~20. The VLM already sees the clips, so let it pick the count itself from what's recurring across episodes.

Drop both knobs from ``VocabularyConfig`` and the ``module_0_vocabulary`` prompt template. The prompt now says "decide the count yourself based on what you see — the smallest set that still covers every recurring phase" and adds an "each label must recur across the demos" rule so the VLM filters out one-off motions.

Update the launcher script + docs to remove the old knobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -55,12 +55,11 @@ CMD = (
     "--vlm.chat_template_kwargs='{\"enable_thinking\": false}' "
     "--vlm.camera_key=observation.images.wrist "
     # Phase 0 — canonical vocabulary discovery from the first N sample
-    # episodes. The resulting meta/canonical_vocabulary.json constrains
-    # every subtask + memory string to a small repeatable target
-    # distribution; tune the counts for your task complexity.
+    # episodes. The VLM picks the right number of subtask + memory
+    # entries itself from what it sees; the resulting
+    # meta/canonical_vocabulary.json constrains every subtask + memory
+    # string to a small repeatable target distribution.
     "--vocabulary.sample_episodes=3 "
-    "--vocabulary.n_subtask_target=10 "
-    "--vocabulary.n_memory_target=6 "
     # Phase 1 — plan module (subtasks + plan + memory + task_aug).
     "--plan.frames_per_second=1.0 "
     "--plan.use_video_url=true "
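The "each label must recur across the demos" rule from the commit message is enforced through the prompt, not through code shown in this diff. As a rough illustration of what the rule means, the sketch below applies the same idea as a post-hoc check on per-episode label lists. The function name ``recurring_labels``, the ``min_episodes`` threshold, and the sample labels are all hypothetical and are not part of lerobot.

```python
from collections import Counter


def recurring_labels(per_episode_labels, min_episodes=2):
    """Keep labels that appear in at least `min_episodes` distinct episodes.

    Hypothetical helper for illustration only; not part of lerobot.
    """
    counts = Counter()
    for labels in per_episode_labels:
        # Count each label once per episode, so repeats inside a single
        # demo do not make a one-off motion look recurrent.
        counts.update(set(labels))

    # Walk the episodes again to preserve first-seen order in the result.
    kept = []
    for labels in per_episode_labels:
        for label in labels:
            if counts[label] >= min_episodes and label not in kept:
                kept.append(label)
    return kept


# Made-up labels for three sample episodes (cf. --vocabulary.sample_episodes=3).
episodes = [
    ["reach for cup", "grasp cup", "place cup"],
    ["reach for cup", "grasp cup", "wipe table", "place cup"],
    ["reach for cup", "grasp cup", "place cup"],
]
vocab = recurring_labels(episodes)
```

Here ``wipe table`` occurs in only one episode, so it is dropped, while the three labels shared by every demo are kept. The resulting vocabulary size falls out of the data instead of a fixed target, which is the behaviour the commit asks the VLM to produce on its own.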