From b6fb536460f1c1e7770fa5b6cf9924cefab9c78f Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Tue, 12 May 2026 21:30:51 +0200
Subject: [PATCH] chore(training): bump plan/memory dropout to 0.50 to force
 vision-grounding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After the recipe fix (target=${subtask} at every frame) the model can
still reach low text_loss by reading the answer off the plan in the
prompt: at training time the prompt contains the 6-step plan, and the
current subtask is one of those steps, so the model just learns "active
step N matches subtask N" and never needs to look at the image. The
symptom at inference is that the subtask string is set but never
updates, because the model is not actually conditioning on visual
progress.

Drop plan and memory with p=0.50 each: for half of the training frames
the prompt is just "${task}" (constant for this dataset) plus the
visual prefix, which is then the only place the answer can come from.
This forces the LM head to actually use vision.

``subtask_dropout`` stays at 0.20 because the subtask is no longer in
the high-level prompt (the recipe fix removed the "Current subtask: X"
message); the knob still affects other sub-recipes that reference it
as context.
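For intuition, a minimal sketch of the per-frame dropout described
above (function and argument names here are illustrative, not the
actual lerobot API):

```python
import random

def build_prompt(task, plan=None, memory=None,
                 plan_dropout_prob=0.50, memory_dropout_prob=0.50):
    """Illustrative sketch of per-frame prompt dropout.

    Each optional prompt component is independently dropped with its
    own probability. At p=0.50 for both plan and memory, roughly a
    quarter of frames keep neither, leaving only the (constant) task
    string, so the target must be read off the visual prefix.
    """
    parts = [task]
    if plan is not None and random.random() >= plan_dropout_prob:
        parts.append(plan)
    if memory is not None and random.random() >= memory_dropout_prob:
        parts.append(memory)
    return "\n".join(parts)
```

Note the dropout is per component, not all-or-nothing, so the model
also sees plan-only and memory-only prompts during training.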
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 examples/training/smolvla2_hirobot.slurm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/training/smolvla2_hirobot.slurm b/examples/training/smolvla2_hirobot.slurm
index c03022ce3..ee5060005 100644
--- a/examples/training/smolvla2_hirobot.slurm
+++ b/examples/training/smolvla2_hirobot.slurm
@@ -48,7 +48,7 @@
 echo " GPUs: $NUM_PROCESSES"
 echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
 echo " steps: $STEPS"
 echo " output: $OUTPUT_DIR"
-echo " augmentation: image_transforms ON, prompt dropout {plan:0.15 memory:0.15 subtask:0.20}"
+echo " augmentation: image_transforms ON, prompt dropout {plan:0.50 memory:0.50 subtask:0.20}"
 accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   -m lerobot.scripts.lerobot_train \
@@ -75,6 +75,6 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   --dataset.image_transforms.enable=true \
   --dataset.image_transforms.max_num_transforms=3 \
   --dataset.image_transforms.random_order=true \
-  --policy.plan_dropout_prob=0.15 \
-  --policy.memory_dropout_prob=0.15 \
+  --policy.plan_dropout_prob=0.50 \
+  --policy.memory_dropout_prob=0.50 \
   --policy.subtask_dropout_prob=0.20