Commit Graph

14 Commits

Author SHA1 Message Date
Pepijn b2aa372fcf refactor(recipes): fold memory into action_execution, drop interjection, fuse smolvla2 forward
Recipe changes:
* action_execution now bundles the memory update as a second
  assistant target gated on a new ``new_memory`` binding (fires
  only at subtask-boundary frames). No "Completed subtask: X"
  filler — the model emits the new subtask AND the updated
  memory back-to-back in one prefix.
* user_interjection_response sub-recipe removed (current
  datasets don't have interjection / say() annotations).
* Standalone memory_update sub-recipe removed (folded above).
* Weights rebalanced: action_execution 0.85, ask_vqa_top/wrist
  0.075 each (sums to 1.0).

Runtime ``_msgs_for_memory`` updated to match the new
boundary-frame prompt layout.

Modeling:
* SmolVLA2Policy now fuses the flow + text losses into a SINGLE
  backbone forward via ``_compute_fused_loss`` (one
  vlm_with_expert pass with [prefix, suffix] embeds, then both
  lm_head CE on lang slice + action_out_proj MSE on suffix).
  Mirrors pi052's existing ``_compute_all_losses_fused`` —
  saves one backbone pass per training step.

Examples:
* Removed the two training SLURM scaffolds; they were
  out-of-date with the recipe refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:51:09 +02:00
Pepijn 8eba704f15 Revert "chore(training): align pi052_hirobot.slurm with the operator's actual command"
This reverts commit ecbac17196.
2026-05-13 11:03:58 +02:00
Pepijn ecbac17196 chore(training): align pi052_hirobot.slurm with the operator's actual command
Match the working SmolVLA2 launch pattern so the two SLURM scripts
are interchangeable:

  * literal NUM_PROCESSES / BATCH_SIZE / STEPS (no env-var defaults)
  * STEPS=10000 to match the next SmolVLA2 run
  * save_freq=$STEPS so only the final checkpoint is saved
  * dropouts 0.1/0.1/0.1 (mild — matches the operator's iteration)
  * flow_loss_weight / text_loss_weight come from the PI052Config
    defaults (10.0 / 1.0 per Pi 0.5 paper §IV.D), no need to pass
    them explicitly

Job name and policy_repo_id mirror the SmolVLA2 ``_tool-g2`` naming
so the two runs can be compared side-by-side in WandB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:03:09 +02:00
Pepijn 12cce8f2cc fix(smolvla2): align flow_loss_weight default with Pi 0.5 paper's α=10
Pi 0.5 paper §IV.D Eq. (1) sets the loss balance to α=10 between text
CE and flow MSE: actions are the primary output and the flow head
should dominate the gradient signal. SmolVLA2 was defaulting both
weights to 1.0, which inverts that — text CE (~0.5-2.0 nats) ends up
larger than flow MSE (~0.1-1.0), so the action expert gets less
gradient than the LM head despite being the primary task.

Match the paper's split: text_loss_weight=1.0, flow_loss_weight=10.0.
Same as ``pi052`` (the new full reproduction policy).

Also pin the values explicitly in the SLURM launcher so the choice is
visible and overridable per-run rather than buried in the config
default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:02:17 +02:00
Pepijn ef5879a02a feat(pi052): π0.5 v2 — full reproduction of the π0.5 paper recipe
New ``lerobot.policies.pi052`` (parallel to ``smolvla2``) that adds
text-prediction + hierarchical-inference on top of the existing π0.5
implementation. Mirrors the paper's §IV.D dual-head training:

  L = H(text) + α * ‖ω - a - f_θ_action(...)‖²,  α = 10

Components:

  * ``configuration_pi052.py``     thin PI05Config subclass; adds
                                    recipe_path, text/flow loss weights
                                    (default α=10 per paper), prompt
                                    dropout knobs, ``unfreeze_lm_head``.
  * ``text_processor_pi052.py``    PI052TextTokenizerStep — concatenates
                                    rendered messages as ``Role: ...``
                                    plain text (PaliGemma has no chat
                                    template), tokenises with the
                                    PaliGemma tokenizer, builds a label
                                    mask covering supervised target
                                    spans. Includes Pi 0.7 §V.E
                                    per-component prompt dropout.
  * ``processor_pi052.py``         make_pi052_pre_post_processors —
                                    Rename + Batch + Relative +
                                    Normalize + RenderMessagesStep +
                                    PI052TextTokenizerStep + Device.
                                    Falls back to π0.5's plain pipeline
                                    when recipe_path is unset.
  * ``modeling_pi052.py``          PI052Policy(PI05Policy) — re-enables
                                    PaliGemma ``lm_head``, computes
                                    text_loss via CE on the supervised
                                    span, sums with flow_loss in
                                    forward(), and adds select_message
                                    for AR text generation at inference
                                    (same surface as
                                    SmolVLA2Policy.select_message so
                                    SmolVLA2Runtime drives it unchanged).

Plus the supporting plumbing:

  * recipe ``configs/recipes/pi052_hirobot.yaml`` — same Hi-Robot blend
    as smolvla2_hirobot.yaml, with the same ``${subtask}`` /
    ``if_present`` supervision fix (current span at every frame, not
    ``${next_subtask}``).
  * SLURM ``examples/training/pi052_hirobot.slurm`` — full training
    command matching the SmolVLA2 launcher.
  * factory registration: ``--policy.type=pi052`` resolves to
    PI052Policy with the new processor.

Same multi-rate runtime (``lerobot.policies.smolvla2.inference``)
drives this policy too — both expose ``predict_action_chunk`` for the
action expert and ``select_message`` for the LM head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 10:59:26 +02:00
Pepijn 1d24301b67 chore(training): STEPS=15000 default + dropout walked back to 0.30/0.30/0.20
After _tool-good (2000 steps, 0.50/0.50/0.20 dropout) the LM head's
distribution at position 0 shifted from EOS to subtask-vocabulary
tokens but emitted bag-of-words ("cube arm and") rather than well-
formed sentences. That's the expected mid-fine-tuning phase: token-
level supervision has landed, sequence-level grammar hasn't.

Two changes for the next retrain:

  * STEPS=15000 (from 2000) — chat-pretrained backbones need O(10k+)
    steps to walk their pretraining priors down far enough to commit
    to the fine-tuned distribution structurally, not just at the
    token level. _tool-g2's bag-of-words output proves the model is
    on the right path; it just needs more gradient signal.

  * plan/memory dropout 0.50 -> 0.30 — 0.50 was probably too
    aggressive for a small dataset. Half the training samples had
    crucial context missing, which slows down learning the full
    conditional structure. 0.30 still regularises against prompt
    leakage but lets the model learn proper grammar first; the
    higher dropout can be revisited once the head is solid.

Subtask dropout stays at 0.20 since subtask isn't in the high-level
prompt anyway (recipe fix removed the "Current subtask:" message).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 10:46:19 +02:00
Pepijn b6fb536460 chore(training): bump plan/memory dropout to 0.50 to force vision-grounding
After the recipe fix (target=${subtask} at every frame) the model
can still reach low text_loss by reading the answer off the plan in
the prompt: at training the prompt contains the 6-step plan, and the
current subtask is one of those steps, so the model just learns
"active step N matches subtask N" and never needs to look at the
image. Symptom at inference: subtask string is set but never updates
because the model isn't really conditioning on the visual progress.

Drop plan and memory with p=0.50 each — half of training frames the
prompt is just "${task}" (constant for this dataset) + visual prefix,
which is the only place the answer can come from. Forces the LM head
to actually use vision.

``subtask_dropout`` stays at 0.20 because subtask isn't in the
high-level prompt anymore (recipe fix removed the "Current subtask:
X" message); the knob still affects other sub-recipes that reference
it as context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 21:31:00 +02:00
Pepijn 4908433f9a chore(training): align smolvla2_hirobot.slurm with what's actually run
Match the operator's current training command for the _tool6 retrain:

  * default DATASET / POLICY_REPO_ID / JOB_NAME point at the tool6
    iteration (super_poulain_full_tool3 → smolvla2_hirobot_super_poulain_tool6)
  * STEPS default 2000 (short enough to iterate; bump to 10k for full)
  * save_freq=$STEPS so the only checkpoint is the final one
  * OUTPUT_DIR includes step count so successive runs don't clobber
  * Drop the wider augmentation envelope I added earlier — back to
    default ColorJitter ranges (brightness ±20% etc) since the
    high_level_subtask recipe fix (current-subtask supervision) is
    expected to fix the LM-head collapse on its own; the augmentation
    is just the standard regulariser, not a load-bearing widener.
  * prompt-dropout fractions stay at the original 0.15 / 0.15 / 0.20.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:45:38 +02:00
Pepijn 47fb8318b1 chore(training): widen augmentation envelope after live-robot diagnostic
The tensor-level comparison between dry-run (dataset frame) and live-
robot inference proved the runtime is bug-free — same shape, dtype,
device, channel order, batch dim, and normalization on both paths.
The remaining variable: front-camera mean brightness was 0.26 live vs
0.39 on the dataset frame, ~33% darker. Training augmentation only
covered ±20% brightness, so the live scene sits just outside the
supervised envelope and the LM head collapses to its dominant prior.

Widen the augmentation knobs for the next retrain:

  * brightness    0.8–1.2  → 0.5–1.6   (covers ~30% darker / 60% lighter)
  * contrast      0.8–1.2  → 0.6–1.5
  * saturation    0.5–1.5  → 0.3–1.7
  * hue          ±0.05    → ±0.10
  * affine        ±5°/±5%  → ±15°/±15% (covers cube placement / camera drift)
  * max_num_transforms 3 → 4

And bump prompt-component dropout (subtask 0.20 → 0.30) so the LM
can't lean on stale memorised plan/memory at inference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:25:41 +02:00
Pepijn 01e2228b24 feat(smolvla2): per-component prompt dropout + augmented training script
Two complementary regularisers to attack the
``text_loss=6e-6 = memorised one dataset`` failure mode that's
making the model collapse on real-robot input:

1. **Per-component prompt dropout** (Pi0.7 §V.E / plan's
   ``feat/pi05-prompt-dropout`` follow-up).
   ``SmolVLA2ChatTokenizerStep`` gains
   ``plan_dropout_prob`` / ``memory_dropout_prob`` /
   ``subtask_dropout_prob`` knobs (default 0.0 — opt-in). At training,
   non-target messages whose rendered content starts with
   ``Plan:`` / ``Memory:`` / ``Current subtask:`` etc. are dropped
   with their respective probability before tokenisation, with a
   deterministic per-sample RNG keyed off the dataset ``index``.
   ``target_message_indices`` is re-mapped so the supervision still
   lands on the right turn. Forces the model to handle missing
   plan/memory/subtask context — directly attacks the real-robot
   collapse where a stale or empty plan field puts the prompt OOD.

   Surfaced on ``SmolVLA2Config`` as three floats so they're
   ``--policy.<knob>=<value>``-controllable from the train CLI;
   plumbed through ``make_smolvla2_pre_post_processors``.

2. **Image augmentation** is already wired in lerobot via
   ``--dataset.image_transforms.enable=true`` (torchvision v2
   ColorJitter + SharpnessJitter + RandomAffine, default 3 of 6
   sampled per frame). No code change needed — just a CLI flag.

``examples/training/smolvla2_hirobot.slurm`` shows the full
training command with both enabled. Drop-in replacement for the
ad-hoc SLURM script Pepijn was using locally; same args, plus the
three dropout probs and the image-transforms flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:52:32 +02:00
Steven Palma df0763a2bc feat(dependencies): minimal default tag install (#3362) 2026-04-12 20:03:04 +02:00
Steven Palma d90e4bcfd3 refactor(dataset): modular files (#3171)
* refactor(dataset): modular files

* refactor(dataset): update imports across the codebase
2026-03-15 23:58:09 -07:00
Steven Palma 7cf04a5ec3 chore: move constants to utils (#2016) 2025-09-24 11:11:53 +02:00
Pepijn d65668ff3c Add docs for LeRobot Image transforms (#1972)
* Remove unused scripts, add docs for image transforms and add example

* fix(examples): move train_policy.py under examples, remove outdated readme parts

* remove script thats copied to train folder

* remove outdated links to examples and example tests
2025-09-19 15:19:49 +02:00