From 1d24301b67446d61dbc8790b96803ace37ef0b9d Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Wed, 13 May 2026 10:46:19 +0200
Subject: [PATCH] chore(training): STEPS=15000 default + dropout walked back
 to 0.30/0.30/0.20
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After _tool-good (2000 steps, 0.50/0.50/0.20 dropout) the LM head's
distribution at position 0 shifted from EOS to subtask-vocabulary
tokens, but the model emitted bag-of-words output ("cube arm and")
rather than well-formed sentences. That is the expected
mid-fine-tuning phase: token-level supervision has landed,
sequence-level grammar hasn't.

Two changes for the next retrain:

* STEPS=15000 (from 2000): chat-pretrained backbones need O(10k+)
  steps to walk their pretraining priors down far enough to commit to
  the fine-tuned distribution structurally, not just at the token
  level. _tool-g2's bag-of-words output shows the model is on the
  right path; it just needs more gradient signal.

* plan/memory dropout 0.50 -> 0.30: 0.50 was probably too aggressive
  for a small dataset. With independent 0.50 dropout, each of plan and
  memory was missing from half the training samples, which slows down
  learning the full conditional structure. 0.30 still regularises
  against prompt leakage but lets the model learn proper grammar
  first; the higher dropout can be revisited once the head is solid.
  Subtask dropout stays at 0.20 since subtask isn't in the high-level
  prompt anyway (the recipe fix removed the "Current subtask:"
  message).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 examples/training/smolvla2_hirobot.slurm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/examples/training/smolvla2_hirobot.slurm b/examples/training/smolvla2_hirobot.slurm
index ee5060005..c1f950e8b 100644
--- a/examples/training/smolvla2_hirobot.slurm
+++ b/examples/training/smolvla2_hirobot.slurm
@@ -39,7 +39,7 @@ POLICY_REPO_ID="${POLICY_REPO_ID:-pepijn223/smolvla2_hirobot_super_poulain_tool6
 JOB_NAME="${JOB_NAME:-smolvla2-hirobot-super-poulain-tool6}"
 NUM_PROCESSES="${NUM_PROCESSES:-8}"
 BATCH_SIZE="${BATCH_SIZE:-32}"
-STEPS="${STEPS:-2000}"
+STEPS="${STEPS:-15000}"
 RUN_ID="${SLURM_JOB_ID:-$(date +%Y%m%d_%H%M%S)}"
 OUTPUT_DIR="${OUTPUT_DIR:-/fsx/pepijn/outputs/train/smolvla2_hirobot_super_poulain_tool3_${STEPS}_${RUN_ID}}"
 
@@ -48,7 +48,7 @@ echo " GPUs: $NUM_PROCESSES"
 echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
 echo " steps: $STEPS"
 echo " output: $OUTPUT_DIR"
-echo " augmentation: image_transforms ON, prompt dropout {plan:0.50 memory:0.50 subtask:0.20}"
+echo " augmentation: image_transforms ON, prompt dropout {plan:0.30 memory:0.30 subtask:0.20}"
 
 accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   -m lerobot.scripts.lerobot_train \
@@ -75,6 +75,6 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   --dataset.image_transforms.enable=true \
   --dataset.image_transforms.max_num_transforms=3 \
   --dataset.image_transforms.random_order=true \
-  --policy.plan_dropout_prob=0.50 \
-  --policy.memory_dropout_prob=0.50 \
+  --policy.plan_dropout_prob=0.30 \
+  --policy.memory_dropout_prob=0.30 \
   --policy.subtask_dropout_prob=0.20
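
Note on the mechanism being tuned: prompt dropout here means randomly
omitting the plan/memory context fields from the training prompt, so
the policy stays usable when those fields are absent at inference
time. Below is a minimal sketch of such a scheme in Python; the
function and field names are hypothetical and this is not lerobot's
actual implementation:

    import random

    # Illustrative defaults mirroring the patched flags
    # (--policy.plan_dropout_prob / --policy.memory_dropout_prob).
    PLAN_DROPOUT_PROB = 0.30
    MEMORY_DROPOUT_PROB = 0.30

    def build_high_level_prompt(task: str, plan: str | None,
                                memory: str | None,
                                training: bool = True) -> str:
        """Assemble the prompt, independently dropping each optional
        context field with its configured probability during training.
        Subtask is omitted: per the commit message, it no longer
        appears in the high-level prompt."""
        parts = [f"Task: {task}"]
        if plan and not (training and random.random() < PLAN_DROPOUT_PROB):
            parts.append(f"Plan: {plan}")
        if memory and not (training and random.random() < MEMORY_DROPOUT_PROB):
            parts.append(f"Memory: {memory}")
        return "\n".join(parts)

Under independent dropout, the chance that a sample keeps both fields
is (1-p)^2: 25% at 0.50/0.50 versus 49% at 0.30/0.30, which is the
"crucial context missing" effect the commit message describes.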