Files
lerobot/tests/policies
Pepijn 426d48dbbf fix(pi052): port the smolvla2 text-head fixes to pi052
pi052 had the same text-CE collapse bug smolvla2 had — PaliGemma's
embed_prefix flags the language block att=0, so make_att_2d_masks makes
it fully bidirectional and the text cross-entropy degenerates into a
copy task. Ported the three model-specific fixes:

- _mark_target_span_causal: set att=1 on supervised target language
  positions so the text-CE is genuine causal next-token prediction.
  Applied in both _compute_all_losses_fused and _compute_text_and_fast_loss.
- flow_loss_weight 10.0 -> 5.0: the paper's a=10 swamps the LM head once
  the flow-only low_level recipe fires often (matches SmolVLA2Config).
- _flatten_say_tool_calls in the text tokenizer: serialize `say` tool
  calls into a <say>...</say> marker so the spoken reply is tokenized
  and supervised (PaliGemma's flat prompt has no structured calls, so
  they were dropped entirely).

select_message needed no change: pi052's prefix is [images, language]
with no trailing state token, so it already decodes from the last
language token.

Regression tests mirror the smolvla2 attention-masking + tool-call suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:42:19 +02:00
..
2025-12-18 12:50:32 +01:00