The tensor-level comparison between dry-run (dataset frame) and live-
robot inference proved the runtime is bug-free — same shape, dtype,
device, channel order, batch dim, and normalization on both paths.
The remaining variable: front-camera mean brightness was 0.26 live vs
0.39 on the dataset frame, ~33% darker. Training augmentation only
covered ±20% brightness, so the live scene sits just outside the
supervised envelope and the LM head collapses to its dominant prior.
Widen the augmentation knobs for the next retrain:
* brightness 0.8–1.2 → 0.5–1.6 (covers ~30% darker / 60% lighter)
* contrast 0.8–1.2 → 0.6–1.5
* saturation 0.5–1.5 → 0.3–1.7
* hue ±0.05 → ±0.10
* affine ±5°/±5% → ±15°/±15% (covers cube placement / camera drift)
* max_num_transforms 3 → 4
And bump prompt-component dropout (subtask 0.20 → 0.30) so the LM
can't lean on stale memorised plan/memory at inference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two complementary regularisers to attack the
``text_loss=6e-6 = memorised one dataset`` failure mode that's
making the model collapse on real-robot input:
1. **Per-component prompt dropout** (Pi0.7 §V.E / plan's
``feat/pi05-prompt-dropout`` follow-up).
``SmolVLA2ChatTokenizerStep`` gains
``plan_dropout_prob`` / ``memory_dropout_prob`` /
``subtask_dropout_prob`` knobs (default 0.0 — opt-in). At training,
non-target messages whose rendered content starts with
``Plan:`` / ``Memory:`` / ``Current subtask:`` etc. are dropped
with their respective probability before tokenisation, with a
deterministic per-sample RNG keyed off the dataset ``index``.
``target_message_indices`` is re-mapped so the supervision still
lands on the right turn. Forces the model to handle missing
plan/memory/subtask context — directly attacks the real-robot
collapse where a stale or empty plan field puts the prompt OOD.
Surfaced on ``SmolVLA2Config`` as three floats so they're
``--policy.<knob>=<value>``-controllable from the train CLI;
plumbed through ``make_smolvla2_pre_post_processors``.
2. **Image augmentation** is already wired in lerobot via
``--dataset.image_transforms.enable=true`` (torchvision v2
ColorJitter + SharpnessJitter + RandomAffine, default 3 of 6
sampled per frame). No code change needed — just a CLI flag.
``examples/training/smolvla2_hirobot.slurm`` shows the full
training command with both enabled. Drop-in replacement for the
ad-hoc SLURM script Pepijn was using locally; same args, plus the
three dropout probs and the image-transforms flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Remove unused scripts, add docs for image transforms and add example
* fix(examples): move train_policy.py under examples, remove outdated readme parts
* remove script thats copied to train folder
* remove outdated links to examples and example tests