From 1d24301b67446d61dbc8790b96803ace37ef0b9d Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Wed, 13 May 2026 10:46:19 +0200
Subject: [PATCH] chore(training): STEPS=15000 default + dropout walked back
 to 0.30/0.30/0.20
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After _tool-good (2000 steps, 0.50/0.50/0.20 dropout) the LM head's
distribution at position 0 shifted from EOS to subtask-vocabulary
tokens, but the model emitted bag-of-words output ("cube arm and")
rather than well-formed sentences. That is the expected
mid-fine-tuning phase: token-level supervision has landed,
sequence-level grammar hasn't.

Two changes for the next retrain:

* STEPS=15000 (from 2000): chat-pretrained backbones need O(10k+)
  steps to walk their pretraining priors down far enough to commit to
  the fine-tuned distribution structurally, not just at the token
  level. _tool-g2's bag-of-words output shows the model is on the
  right path; it just needs more gradient signal.

* plan/memory dropout 0.50 -> 0.30: 0.50 was probably too aggressive
  for a small dataset. With independent 0.50 dropout, each of plan and
  memory was missing from half the training samples, which slows down
  learning the full conditional structure. 0.30 still regularises
  against prompt leakage but lets the model learn proper grammar
  first; the higher dropout can be revisited once the head is solid.
  Subtask dropout stays at 0.20 since subtask isn't in the high-level
  prompt anyway (the recipe fix removed the "Current subtask:"
  message).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 examples/training/smolvla2_hirobot.slurm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/examples/training/smolvla2_hirobot.slurm b/examples/training/smolvla2_hirobot.slurm
index ee5060005..c1f950e8b 100644
--- a/examples/training/smolvla2_hirobot.slurm
+++ b/examples/training/smolvla2_hirobot.slurm
@@ -39,7 +39,7 @@ POLICY_REPO_ID="${POLICY_REPO_ID:-pepijn223/smolvla2_hirobot_super_poulain_tool6
 JOB_NAME="${JOB_NAME:-smolvla2-hirobot-super-poulain-tool6}"
 NUM_PROCESSES="${NUM_PROCESSES:-8}"
 BATCH_SIZE="${BATCH_SIZE:-32}"
-STEPS="${STEPS:-2000}"
+STEPS="${STEPS:-15000}"
 RUN_ID="${SLURM_JOB_ID:-$(date +%Y%m%d_%H%M%S)}"
 OUTPUT_DIR="${OUTPUT_DIR:-/fsx/pepijn/outputs/train/smolvla2_hirobot_super_poulain_tool3_${STEPS}_${RUN_ID}}"
 
@@ -48,7 +48,7 @@ echo " GPUs: $NUM_PROCESSES"
 echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
 echo " steps: $STEPS"
 echo " output: $OUTPUT_DIR"
-echo " augmentation: image_transforms ON, prompt dropout {plan:0.50 memory:0.50 subtask:0.20}"
+echo " augmentation: image_transforms ON, prompt dropout {plan:0.30 memory:0.30 subtask:0.20}"
 
 accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   -m lerobot.scripts.lerobot_train \
@@ -75,6 +75,6 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   --dataset.image_transforms.enable=true \
   --dataset.image_transforms.max_num_transforms=3 \
   --dataset.image_transforms.random_order=true \
-  --policy.plan_dropout_prob=0.50 \
-  --policy.memory_dropout_prob=0.50 \
+  --policy.plan_dropout_prob=0.30 \
+  --policy.memory_dropout_prob=0.30 \
   --policy.subtask_dropout_prob=0.20
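
Note on the mechanism being tuned: prompt dropout here means randomly
omitting the plan/memory context fields from the training prompt, so
the policy stays usable when those fields are absent at inference
time. Below is a minimal sketch of such a scheme in Python; the
function and field names are hypothetical and this is not lerobot's
actual implementation:

    import random

    # Illustrative defaults mirroring the patched flags
    # (--policy.plan_dropout_prob / --policy.memory_dropout_prob).
    PLAN_DROPOUT_PROB = 0.30
    MEMORY_DROPOUT_PROB = 0.30

    def build_high_level_prompt(task: str, plan: str | None,
                                memory: str | None,
                                training: bool = True) -> str:
        """Assemble the prompt, independently dropping each optional
        context field with its configured probability during training.
        Subtask is omitted: per the commit message, it no longer
        appears in the high-level prompt."""
        parts = [f"Task: {task}"]
        if plan and not (training and random.random() < PLAN_DROPOUT_PROB):
            parts.append(f"Plan: {plan}")
        if memory and not (training and random.random() < MEMORY_DROPOUT_PROB):
            parts.append(f"Memory: {memory}")
        return "\n".join(parts)

Under independent dropout, the chance that a sample keeps both fields
is (1-p)^2: 25% at 0.50/0.50 versus 49% at 0.30/0.30, which is the
"crucial context missing" effect the commit message describes.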