mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-16 09:09:48 +00:00
aa749d4947
Last bump combined ``module_3.K=3`` with ``vqa_emission_hz=2.0`` and
``executor.episode_parallelism=32``. With 2 cameras per dataset that
produced ~12× the original VQA call volume, all submitted concurrently.
Module 3 latency went from ~30s/phase to ~490s per episode, vLLM's
KV cache pegged at 94% with 800+ in-flight requests, and the
multimodal cache corrupted with ``AssertionError: Expected a cached
item for mm_hash='...'`` (a known vLLM bug under image-heavy
concurrency). Modules 1 and 2 ran fine; Module 3 was the bottleneck.
Pull back the multipliers to land in a sustainable spot:
* module_3.K: 3 (kept) — three diverse questions per emission,
where the diversity actually helps the LM head.
* module_3.vqa_emission_hz: 2.0 → 1.0 — back to the original
emission rate. Net VQA volume is now ~3× original (K alone) on
a single camera, ~6× across both cameras — manageable.
* module_2.max_interjections_per_episode: 9 → 6 — still 2× the
default, fewer than the prior 3× to keep total request volume
in check.
* vlm.client_concurrency: 256 → 128 — gives vLLM headroom on the
multimodal request path so the mm_cache doesn't desync.
* executor.episode_parallelism: 32 → 16 — half the episodes
in flight at once, so peak vLLM load is ~half.
n_task_rephrasings stays at 30 (text-only, doesn't load the image
path) and vlm.temperature stays at 0.7. The diversity gains are
preserved; only the throughput knobs come down.
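
The volume arithmetic above can be sanity-checked in a few lines. This is a
back-of-envelope sketch, not pipeline code — the ``vqa_multiplier`` helper
and the 1 Hz single-camera baseline are illustrative assumptions taken from
the numbers in this message:

```python
def vqa_multiplier(K, emission_hz, cameras, baseline_hz=1.0):
    """Relative VQA call volume vs. the K=1, 1 Hz, single-camera baseline."""
    return K * (emission_hz / baseline_hz) * cameras

# Overloaded config: K=3, 2 Hz emissions, 2 cameras -> ~12x baseline.
before = vqa_multiplier(K=3, emission_hz=2.0, cameras=2)
# Pulled-back config: K=3, 1 Hz emissions, 2 cameras -> ~6x baseline.
after = vqa_multiplier(K=3, emission_hz=1.0, cameras=2)
print(before, after)  # 12.0 6.0
```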
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
82 lines
3.2 KiB
Python
#!/usr/bin/env python
"""Launch ``lerobot-annotate`` on a Hugging Face job (vllm + Qwen3.6 MoE).

Spawns one ``h200x2`` job that:

1. installs this branch of ``lerobot`` plus the annotation extras,
2. boots two vllm servers (one per GPU) with Qwen3.6-35B-A3B-FP8,
3. runs Module 1/2/3 across the dataset (per-camera VQA via PR 3471),
4. uploads the annotated dataset to ``--push_to_hub``.

Usage:

    HF_TOKEN=hf_... uv run python examples/annotation/run_hf_job.py

Adjust ``CMD`` below to point at your own dataset / target hub repo.
"""

import os

from huggingface_hub import get_token, run_job

token = os.environ.get("HF_TOKEN") or get_token()
if not token:
    raise RuntimeError("No HF token. Run `huggingface-cli login` or `export HF_TOKEN=hf_...`")

# --- Diversity knobs (Pi0.7-style prompt expansion) -----------------------
# Bumped roughly 3x across the board to fight memorization on small datasets.
# A single dataset trained for many epochs with deterministic atom wording
# converges to perfect recall on training prompts but produces JSON-token
# garbage at inference for any wording that drifts slightly. More atom
# variants per episode + higher sampling temperature widens the training
# distribution so the model has to actually use its language head, not
# just memorize.
#
# Pushes to a *new* hub repo (``_tool3``) so the previous annotation pass
# (``_tool2``) stays intact — re-train from scratch on the new dataset and
# compare loss-curve shapes to verify the diversity bump is doing something.
CMD = (
    "apt-get update -qq && apt-get install -y -qq git ffmpeg && "
    "pip install --no-deps "
    "'lerobot @ git+https://github.com/huggingface/lerobot.git@feat/language-annotation-pipeline' && "
    "pip install --upgrade-strategy only-if-needed "
    "datasets pyarrow av jsonlines draccus gymnasium torchcodec mergedeep pyyaml-include toml typing-inspect && "
    "export VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0 && "
    "export VLLM_VIDEO_BACKEND=pyav && "
    "lerobot-annotate "
    "--repo_id=imstevenpmwork/super_poulain_draft "
    "--vlm.backend=openai "
    "--vlm.model_id=Qwen/Qwen3.6-35B-A3B-FP8 "
    "--vlm.parallel_servers=2 "
    "--vlm.num_gpus=2 "
    '--vlm.serve_command="vllm serve Qwen/Qwen3.6-35B-A3B-FP8 '
    "--tensor-parallel-size 1 --max-model-len 32768 "
    '--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}" '
    "--vlm.serve_ready_timeout_s=1800 "
    "--vlm.client_concurrency=128 "
    "--vlm.max_new_tokens=512 "
    "--vlm.temperature=0.7 "
    "--executor.episode_parallelism=16 "
    "--vlm.chat_template_kwargs='{\"enable_thinking\": false}' "
    "--vlm.camera_key=observation.images.wrist "
    "--module_1.frames_per_second=1.0 "
    "--module_1.use_video_url=true "
    "--module_1.use_video_url_fps=1.0 "
    "--module_1.derive_task_from_video=always "
    "--module_1.n_task_rephrasings=30 "
    "--module_2.max_interjections_per_episode=6 "
    "--module_3.K=3 "
    "--module_3.vqa_emission_hz=1.0 "
    "--push_to_hub=pepijn223/super_poulain_full_tool3"
)

job = run_job(
    image="vllm/vllm-openai:latest",
    command=["bash", "-c", CMD],
    flavor="h200x2",
    secrets={"HF_TOKEN": token},
    timeout="2h",
)
print(f"Job URL: {job.url}")
print(f"Job ID: {job.id}")