From b6fb536460f1c1e7770fa5b6cf9924cefab9c78f Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Tue, 12 May 2026 21:30:51 +0200
Subject: [PATCH] chore(training): bump plan/memory dropout to 0.50 to force
 vision-grounding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After the recipe fix (target=${subtask} at every frame) the model can
still reach low text_loss by reading the answer off the plan in the
prompt: at training time the prompt contains the 6-step plan, and the
current subtask is one of those steps, so the model just learns "active
step N matches subtask N" and never needs to look at the image. The
symptom at inference is that the subtask string is set but never
updates, because the model is not actually conditioning on visual
progress.

Drop plan and memory with p=0.50 each: for half of the training frames
the prompt is just "${task}" (constant for this dataset) plus the
visual prefix, which is then the only place the answer can come from.
This forces the LM head to actually use vision.

``subtask_dropout`` stays at 0.20 because the subtask is no longer in
the high-level prompt (the recipe fix removed the "Current subtask: X"
message); the knob still affects other sub-recipes that reference it
as context.
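For intuition, a minimal sketch of the per-frame dropout described
above (function and argument names here are illustrative, not the
actual lerobot API):

```python
import random

def build_prompt(task, plan=None, memory=None,
                 plan_dropout_prob=0.50, memory_dropout_prob=0.50):
    """Illustrative sketch of per-frame prompt dropout.

    Each optional prompt component is independently dropped with its
    own probability. At p=0.50 for both plan and memory, roughly a
    quarter of frames keep neither, leaving only the (constant) task
    string, so the target must be read off the visual prefix.
    """
    parts = [task]
    if plan is not None and random.random() >= plan_dropout_prob:
        parts.append(plan)
    if memory is not None and random.random() >= memory_dropout_prob:
        parts.append(memory)
    return "\n".join(parts)
```

Note the dropout is per component, not all-or-nothing, so the model
also sees plan-only and memory-only prompts during training.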
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 examples/training/smolvla2_hirobot.slurm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/training/smolvla2_hirobot.slurm b/examples/training/smolvla2_hirobot.slurm
index c03022ce3..ee5060005 100644
--- a/examples/training/smolvla2_hirobot.slurm
+++ b/examples/training/smolvla2_hirobot.slurm
@@ -48,7 +48,7 @@
 echo " GPUs: $NUM_PROCESSES"
 echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
 echo " steps: $STEPS"
 echo " output: $OUTPUT_DIR"
-echo " augmentation: image_transforms ON, prompt dropout {plan:0.15 memory:0.15 subtask:0.20}"
+echo " augmentation: image_transforms ON, prompt dropout {plan:0.50 memory:0.50 subtask:0.20}"
 accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   -m lerobot.scripts.lerobot_train \
@@ -75,6 +75,6 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   --dataset.image_transforms.enable=true \
   --dataset.image_transforms.max_num_transforms=3 \
   --dataset.image_transforms.random_order=true \
-  --policy.plan_dropout_prob=0.15 \
-  --policy.memory_dropout_prob=0.15 \
+  --policy.plan_dropout_prob=0.50 \
+  --policy.memory_dropout_prob=0.50 \
   --policy.subtask_dropout_prob=0.20