fix(smolvla2): align flow_loss_weight default with Pi 0.5 paper's α=10

Pi 0.5 paper §IV.D Eq. (1) sets the loss balance to α=10 between text
CE and flow MSE: actions are the primary output and the flow head
should dominate the gradient signal. SmolVLA2 was defaulting both
weights to 1.0, which inverts that: text CE (~0.5-2.0 nats) ends up
larger than flow MSE (~0.1-1.0), so the action expert gets less
gradient than the LM head despite being the primary task.

Match the paper's split: text_loss_weight=1.0, flow_loss_weight=10.0.
Same as ``pi052`` (the new full reproduction policy).

Also pin the values explicitly in the SLURM launcher so the choice is
visible and overridable per-run rather than buried in the config
default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
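
To make the effect of the weighting concrete, here is a minimal sketch of the combined objective. It assumes the two weights simply scale each loss term before they are summed. The names `combined_loss` and `flow_share`, and the sample loss values, are hypothetical: the values are picked from the ranges quoted in the message, and nothing here is taken from the SmolVLA2 code.

```python
def combined_loss(text_ce, flow_mse, text_loss_weight=1.0, flow_loss_weight=10.0):
    """Weighted sum of the two loss terms (assumed form, not the actual policy code)."""
    return text_loss_weight * text_ce + flow_loss_weight * flow_mse


def flow_share(text_ce, flow_mse, text_loss_weight=1.0, flow_loss_weight=10.0):
    """Fraction of the total loss contributed by the weighted flow term."""
    total = combined_loss(text_ce, flow_mse, text_loss_weight, flow_loss_weight)
    return (flow_loss_weight * flow_mse) / total


# Illustrative values only, chosen inside the ranges quoted above.
text_ce, flow_mse = 1.0, 0.5

# Old defaults: both weights at 1.0.
old_share = flow_share(text_ce, flow_mse, text_loss_weight=1.0, flow_loss_weight=1.0)

# New defaults: text_loss_weight=1.0, flow_loss_weight=10.0.
new_share = flow_share(text_ce, flow_mse)

print(f"flow share with equal weights: {old_share:.2f}")
print(f"flow share with flow weight 10: {new_share:.2f}")
```

For these sample values the flow term goes from one third of the total loss to five sixths. The share of the loss is only a rough proxy for the share of the gradient, but it shows the direction of the change.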
2026-07-12 04:21:45 +00:00 · 2026-05-13 11:02:17 +02:00
parent ef5879a02a
commit 12cce8f2cc
2 changed files with 15 additions and 2 deletions
@@ -63,6 +63,8 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
    --policy.compile_model=false \
    --policy.device=cuda \
    --policy.tokenizer_max_length=512 \
+    --policy.text_loss_weight=1.0 \
+    --policy.flow_loss_weight=10.0 \
    --steps="$STEPS" \
    --policy.scheduler_decay_steps="$STEPS" \
    --batch_size="$BATCH_SIZE" \