examples(annotate): switch run_hf_job to Qwen3.6-27B (dense VLM)

Swap the annotation VLM from Qwen3.6-35B-A3B (sparse MoE, ~3B active)
to Qwen3.6-27B (dense, all 27B parameters active). Per Scale's
dense-captioning study, model capacity is the #1 lever and the dominant
failure is visual grounding; ~9x more active params should help both.

Qwen3.6-27B is a vision-language model (vision encoder, image + video)
from the same family, so the chat template, video handling, and
enable_thinking=false flag are unchanged. At 27B dense it still fits
one H200 per server, so the two-parallel-server layout (TP=1, one per
GPU) is preserved: no throughput-layout change, just a much stronger
model.
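
The two-server layout described above can be sketched as follows. This is
an illustration only, not lerobot's actual launcher: `plan_servers`, the
base port 8000, and the use of `CUDA_VISIBLE_DEVICES` to pin each server
to one GPU are assumptions. Only the serve command string comes from this
commit.

```python
import shlex

# Serve command from this commit; {port} is filled in per server.
SERVE_TEMPLATE = (
    "vllm serve Qwen/Qwen3.6-27B "
    "--tensor-parallel-size 1 --max-model-len 32768 "
    "--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}"
)


def plan_servers(template, parallel_servers, base_port=8000):
    """Hypothetical helper: build one (env, argv) pair per server.

    Server i is pinned to GPU i and listens on base_port + i.
    """
    plans = []
    for i in range(parallel_servers):
        env = {"CUDA_VISIBLE_DEVICES": str(i)}  # assumption: one GPU per server
        argv = shlex.split(template.format(port=base_port + i))
        plans.append((env, argv))
    return plans


plans = plan_servers(SERVE_TEMPLATE, parallel_servers=2)
for env, argv in plans:
    print(env, argv[-2:])
```

With TP=1 each server is an independent replica, so requests can be split
across the two ports without any cross-GPU communication.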

Kept: parallel_servers=2, num_gpus=2, max-model-len 32768 (the 32-frame
embedded budget is ~10k tokens, well under the limit), and
gpu-memory-utilization 0.8.
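
A rough back-of-the-envelope check of the "fits one H200" and "well
under" claims, as a sketch. It assumes 2 bytes per parameter (BF16
weights) and 141 GB of HBM per H200, and it ignores the exact
vision-encoder size and any runtime overhead. The 27B, 0.8, 32768, and
~10k figures are from this message.

```python
H200_HBM_GB = 141      # assumption: H200 HBM capacity in GB
GPU_MEM_UTIL = 0.8     # --gpu-memory-utilization from the serve command
PARAMS_BILLIONS = 27   # dense parameter count
BYTES_PER_PARAM = 2    # assumption: BF16 weights, no quantization

# Memory vLLM may claim on one GPU, and how much the weights take of it.
budget_gb = round(H200_HBM_GB * GPU_MEM_UTIL, 1)
weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM
headroom_gb = round(budget_gb - weights_gb, 1)  # left for KV cache etc.

MAX_MODEL_LEN = 32768
FRAME_TOKENS = 10_000  # ~10k tokens for the 32-frame budget
remaining_tokens = MAX_MODEL_LEN - FRAME_TOKENS  # left for prompt + output

print(f"budget={budget_gb} GB, weights={weights_gb} GB, headroom={headroom_gb} GB")
print(f"tokens left after frames: {remaining_tokens}")
```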

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-06-02 16:16:26 +02:00
parent cd128cbbd5
commit 3236c6ee4a
+4 -4
@@ -1,10 +1,10 @@
#!/usr/bin/env python
-"""Launch ``lerobot-annotate`` on a Hugging Face job (vllm + Qwen3.6 MoE).
+"""Launch ``lerobot-annotate`` on a Hugging Face job (vllm + Qwen3.6-27B VLM).
Spawns one ``h200x2`` job that:
1. installs this branch of ``lerobot`` plus the annotation extras,
-2. boots two vllm servers (one per GPU) with Qwen3.6-35B-A3B-FP8,
+2. boots two vllm servers (one per GPU) with Qwen3.6-27B (dense VLM),
3. runs the plan / interjections / vqa modules across the dataset
in free-form mode (each episode generates its own subtasks +
memory),
@@ -40,10 +40,10 @@ CMD = (
"--dest_repo_id=pepijn223/robocasa_smoke_2atomic_v3_ann "
"--push_to_hub=true "
"--vlm.backend=openai "
-"--vlm.model_id=Qwen/Qwen3.6-35B-A3B-FP8 "
+"--vlm.model_id=Qwen/Qwen3.6-27B "
"--vlm.parallel_servers=2 "
"--vlm.num_gpus=2 "
-'--vlm.serve_command="vllm serve Qwen/Qwen3.6-35B-A3B-FP8 '
+'--vlm.serve_command="vllm serve Qwen/Qwen3.6-27B '
"--tensor-parallel-size 1 --max-model-len 32768 "
'--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}" '
"--vlm.serve_ready_timeout_s=1800 "