examples(annotate): switch run_hf_job to Qwen3.6-27B (dense VLM)

Swap the annotation VLM from Qwen3.6-35B-A3B-FP8 (sparse MoE, ~3B active params) to Qwen3.6-27B (dense, all 27B params active).

Per Scale's dense-captioning study, model capacity is the biggest lever and visual grounding is the dominant failure mode. Moving from ~3B to 27B active params (~9x) targets both.

Qwen3.6-27B is a vision-language model (vision encoder, image + video input) from the same family, so the chat template, video handling, and enable_thinking=false flag are unchanged. At 27B dense it still fits on one H200 per server, so the two-parallel-server layout (TP=1, one server per GPU) is preserved. The throughput layout does not change; only the model gets much stronger.

Kept: parallel_servers=2, num_gpus=2, --max-model-len 32768 (the embedded 32-frame budget is ~10k tokens, well under the limit), and --gpu-memory-utilization 0.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
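The capacity and memory claims in the message reduce to a few lines of arithmetic. The sketch below is a back-of-the-envelope check and is not part of the commit. It assumes bf16 weights (2 bytes/param) and 141 GB of HBM per H200, treats 27B as the total param count, and counts weight memory only, ignoring KV cache, activations, and CUDA-graph overhead.

```python
"""Back-of-the-envelope checks for the Qwen3.6-35B-A3B -> Qwen3.6-27B swap.

Assumptions (not stated in the commit): weights are served in bf16
(2 bytes/param), an H200 has 141 GB of HBM, and only weight memory is
counted (KV cache, activations, and CUDA-graph overhead are ignored).
"""

OLD_ACTIVE_PARAMS_B = 3.0     # Qwen3.6-35B-A3B: ~3B active params per token (MoE)
NEW_ACTIVE_PARAMS_B = 27.0    # Qwen3.6-27B: dense, every param active

H200_MEMORY_GB = 141.0        # assumed HBM per H200
GPU_MEMORY_UTILIZATION = 0.8  # --gpu-memory-utilization 0.8
BYTES_PER_PARAM = 2           # assumed bf16 weights

MAX_MODEL_LEN = 32_768        # --max-model-len 32768
FRAME_BUDGET_TOKENS = 10_000  # "~10k tokens" for the 32-frame budget


def active_param_ratio() -> float:
    """How many times more params each token passes through after the swap."""
    return NEW_ACTIVE_PARAMS_B / OLD_ACTIVE_PARAMS_B


def weight_memory_gb(params_b: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Weight footprint in GB: (params_b * 1e9 params) * bytes / 1e9 bytes-per-GB."""
    return params_b * bytes_per_param


def vllm_memory_budget_gb() -> float:
    """Share of one H200 that vllm is allowed to claim."""
    return H200_MEMORY_GB * GPU_MEMORY_UTILIZATION


def fits_on_one_gpu(params_b: float) -> bool:
    """True if the weights alone fit inside vllm's per-GPU memory budget."""
    return weight_memory_gb(params_b) < vllm_memory_budget_gb()


if __name__ == "__main__":
    print(f"active-param ratio: {active_param_ratio():.1f}x")
    weights = weight_memory_gb(NEW_ACTIVE_PARAMS_B)
    budget = vllm_memory_budget_gb()
    print(f"weights: {weights:.1f} GB of a {budget:.1f} GB budget "
          f"({budget - weights:.1f} GB left for KV cache etc.)")
    print(f"context headroom: {MAX_MODEL_LEN - FRAME_BUDGET_TOKENS} tokens")
```

Under these assumptions the weights take 54 GB of a ~112.8 GB budget, leaving roughly 59 GB per GPU for KV cache and activations, and the ~10k-token frame budget leaves ~22.8k tokens of the 32768 context for prompt text and output.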
2026-08-02 14:39:57 +00:00 · 2026-06-02 16:16:26 +02:00
parent cd128cbbd5
commit 3236c6ee4a
1 changed file with 4 additions and 4 deletions
@@ -1,10 +1,10 @@
 #!/usr/bin/env python
-"""Launch ``lerobot-annotate`` on a Hugging Face job (vllm + Qwen3.6 MoE).
+"""Launch ``lerobot-annotate`` on a Hugging Face job (vllm + Qwen3.6-27B VLM).

 Spawns one ``h200x2`` job that:

  1. installs this branch of ``lerobot`` plus the annotation extras,
-  2. boots two vllm servers (one per GPU) with Qwen3.6-35B-A3B-FP8,
+  2. boots two vllm servers (one per GPU) with Qwen3.6-27B (dense VLM),
  3. runs the plan / interjections / vqa modules across the dataset
     in free-form mode (each episode generates its own subtasks +
     memory),
@@ -40,10 +40,10 @@ CMD = (
    "--dest_repo_id=pepijn223/robocasa_smoke_2atomic_v3_ann "
    "--push_to_hub=true "
    "--vlm.backend=openai "
-    "--vlm.model_id=Qwen/Qwen3.6-35B-A3B-FP8 "
+    "--vlm.model_id=Qwen/Qwen3.6-27B "
    "--vlm.parallel_servers=2 "
    "--vlm.num_gpus=2 "
-    '--vlm.serve_command="vllm serve Qwen/Qwen3.6-35B-A3B-FP8 '
+    '--vlm.serve_command="vllm serve Qwen/Qwen3.6-27B '
    "--tensor-parallel-size 1 --max-model-len 32768 "
    '--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}" '
    "--vlm.serve_ready_timeout_s=1800 "
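The `--vlm.serve_command` template, combined with `--vlm.parallel_servers=2` and `--vlm.num_gpus=2`, implies one vllm process per GPU, each on its own port. The following is a hypothetical sketch of how such a template could be expanded; it is not lerobot's actual launcher. The helper `plan_servers`, the `ServerLaunch` record, the base port 8000, and pinning each server with `CUDA_VISIBLE_DEVICES` are all assumptions for illustration. It only builds the command lines and does not start any processes.

```python
"""Hypothetical expansion of --vlm.serve_command into per-GPU vllm launches.

Not lerobot's implementation: plan_servers, ServerLaunch, base_port=8000, and
CUDA_VISIBLE_DEVICES pinning are illustrative assumptions.
"""
import shlex
from dataclasses import dataclass

# Inner value of --vlm.serve_command from the job's CMD, with its {port} placeholder.
SERVE_COMMAND = (
    "vllm serve Qwen/Qwen3.6-27B "
    "--tensor-parallel-size 1 --max-model-len 32768 "
    "--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}"
)


@dataclass(frozen=True)
class ServerLaunch:
    cuda_visible_devices: str  # GPUs this server may see, e.g. "0"
    port: int                  # value substituted into {port}
    argv: tuple[str, ...]      # tokenized command line


def plan_servers(
    serve_command: str,
    parallel_servers: int,
    num_gpus: int,
    base_port: int = 8000,
) -> list[ServerLaunch]:
    """Split num_gpus evenly across parallel_servers and give each server a port."""
    if parallel_servers <= 0 or num_gpus <= 0 or num_gpus % parallel_servers != 0:
        raise ValueError("num_gpus must be a positive multiple of parallel_servers")
    gpus_per_server = num_gpus // parallel_servers  # 1 in this job, matching TP=1
    launches = []
    for i in range(parallel_servers):
        gpus = range(i * gpus_per_server, (i + 1) * gpus_per_server)
        port = base_port + i
        launches.append(
            ServerLaunch(
                cuda_visible_devices=",".join(str(g) for g in gpus),
                port=port,
                argv=tuple(shlex.split(serve_command.format(port=port))),
            )
        )
    return launches


if __name__ == "__main__":
    for launch in plan_servers(SERVE_COMMAND, parallel_servers=2, num_gpus=2):
        print(f"CUDA_VISIBLE_DEVICES={launch.cuda_visible_devices} {shlex.join(launch.argv)}")
```

In this picture, swapping the model only changes the model id inside the template; the per-GPU fan-out is untouched, which is the sense in which the commit keeps the two-server layout.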