chore(training): bump plan/memory dropout to 0.50 to force vision-grounding

After the recipe fix (target=${subtask} at every frame) the model
can still reach low text_loss by reading the answer off the plan in
the prompt: at training time the prompt contains the 6-step plan,
and the current subtask is one of those steps, so the model just
learns "active step N matches subtask N" and never needs to look at
the image. Symptom at inference: the subtask string is set but never
updates, because the model isn't actually conditioning on visual
progress.

Drop plan and memory with p=0.50 each: half of training frames lose
the plan (and, independently, half lose memory), so the prompt often
reduces to "${task}" (constant for this dataset) + visual prefix,
leaving the visual prefix as the only place the answer can come
from. Forces the LM head to actually use vision.
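The augmentation amounts to an independent Bernoulli draw per optional
prompt field per frame. A minimal sketch of the idea, assuming the
function name and field dict are hypothetical illustrations, not
lerobot's actual API:

```python
import random

def apply_prompt_dropout(fields, plan_p=0.50, memory_p=0.50,
                         subtask_p=0.20, rng=random):
    """Independently drop optional prompt fields for one training frame.

    Hypothetical helper (not lerobot's API): "task" is always kept;
    "plan"/"memory"/"subtask" are each dropped with their own prob.
    """
    kept = dict(fields)
    for key, p in (("plan", plan_p), ("memory", memory_p),
                   ("subtask", subtask_p)):
        if key in kept and rng.random() < p:
            del kept[key]
    return kept

# With independent p=0.50 drops, ~50% of frames lose the plan and
# ~25% lose both plan and memory, leaving only the constant task string.
rng = random.Random(0)
frames = [apply_prompt_dropout({"task": "t", "plan": "p", "memory": "m"},
                               rng=rng)
          for _ in range(10_000)]
no_plan = sum("plan" not in f for f in frames) / len(frames)
bare = sum(set(f) == {"task"} for f in frames) / len(frames)
print(f"no_plan≈{no_plan:.2f}  task-only≈{bare:.2f}")
```

Note the drops are independent, so a frame with the plan removed may
still carry memory; only about a quarter of frames are task-only.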

``subtask_dropout`` stays at 0.20 because subtask isn't in the
high-level prompt anymore (recipe fix removed the "Current subtask:
X" message); the knob still affects other sub-recipes that reference
it as context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit b6fb536460
parent bfd3bb1791
Author: Pepijn
Date:   2026-05-12 21:30:51 +02:00
3 insertions, 3 deletions
@@ -48,7 +48,7 @@ echo " GPUs: $NUM_PROCESSES"
 echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
 echo " steps: $STEPS"
 echo " output: $OUTPUT_DIR"
-echo " augmentation: image_transforms ON, prompt dropout {plan:0.15 memory:0.15 subtask:0.20}"
+echo " augmentation: image_transforms ON, prompt dropout {plan:0.50 memory:0.50 subtask:0.20}"
 accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   -m lerobot.scripts.lerobot_train \
@@ -75,6 +75,6 @@ accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
   --dataset.image_transforms.enable=true \
   --dataset.image_transforms.max_num_transforms=3 \
   --dataset.image_transforms.random_order=true \
-  --policy.plan_dropout_prob=0.15 \
-  --policy.memory_dropout_prob=0.15 \
+  --policy.plan_dropout_prob=0.50 \
+  --policy.memory_dropout_prob=0.50 \
   --policy.subtask_dropout_prob=0.20