chore(annotate): throttle Module 3 + executor parallelism to fix vLLM stall

The last bump combined ``module_3.K=3`` with ``vqa_emission_hz=2.0`` and
``executor.episode_parallelism=32``. With 2 cameras per dataset that
produced ~12× the original VQA call volume, all submitted concurrently.
Module 3 latency went from ~30s per phase to ~490s per episode, vLLM's
KV cache pegged at 94% with 800+ in-flight requests, and the
multimodal cache got corrupted with ``AssertionError: Expected a cached
item for mm_hash='...'`` (a known vLLM bug under image-heavy
concurrency). Modules 1 and 2 ran fine; Module 3 was the bottleneck.

Pull back the multipliers to land in a sustainable spot:

  * module_3.K: 3 (kept) — three diverse questions per emission,
    where the diversity actually helps the LM head.
  * module_3.vqa_emission_hz: 2.0 → 1.0 — back to the original
    emission rate. Net VQA volume is now ~3× original (K alone) on
    a single camera, ~6× across both cameras — manageable.
  * module_2.max_interjections_per_episode: 9 → 6 — still 2× the
    default, fewer than the prior 3× to keep total request volume
    in check.
  * vlm.client_concurrency: 256 → 128 — gives vLLM headroom on the
    multimodal request path so the mm_cache doesn't desync.
  * executor.episode_parallelism: 32 → 16 — half the episodes
    in flight at once, so peak vLLM load is ~half.

``n_task_rephrasings`` stays at 30 (text-only, so it doesn't exercise the
image path) and ``vlm.temperature`` stays at 0.7. The diversity gains are
preserved; only the throughput knobs come down.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Pepijn
Date:   2026-05-05 15:07:18 +02:00
parent 1394a6ab5d
commit aa749d4947
@@ -53,10 +53,10 @@ CMD = (
 "--tensor-parallel-size 1 --max-model-len 32768 "
 '--gpu-memory-utilization 0.8 --uvicorn-log-level warning --port {port}" '
 "--vlm.serve_ready_timeout_s=1800 "
-"--vlm.client_concurrency=256 "
+"--vlm.client_concurrency=128 "
 "--vlm.max_new_tokens=512 "
 "--vlm.temperature=0.7 "
-"--executor.episode_parallelism=32 "
+"--executor.episode_parallelism=16 "
 "--vlm.chat_template_kwargs='{\"enable_thinking\": false}' "
 "--vlm.camera_key=observation.images.wrist "
 "--module_1.frames_per_second=1.0 "
@@ -64,9 +64,9 @@ CMD = (
 "--module_1.use_video_url_fps=1.0 "
 "--module_1.derive_task_from_video=always "
 "--module_1.n_task_rephrasings=30 "
-"--module_2.max_interjections_per_episode=9 "
+"--module_2.max_interjections_per_episode=6 "
 "--module_3.K=3 "
-"--module_3.vqa_emission_hz=2.0 "
+"--module_3.vqa_emission_hz=1.0 "
 "--push_to_hub=pepijn223/super_poulain_full_tool3"
 )