lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-17 15:01:54 +00:00

Files

T

Pepijn 426d48dbbf fix(pi052): port the smolvla2 text-head fixes to pi052

pi052 had the same text-CE collapse bug smolvla2 had — PaliGemma's
embed_prefix flags the language block att=0, so make_att_2d_masks makes
it fully bidirectional and the text cross-entropy degenerates into a
copy task. Ported the three model-specific fixes:

- _mark_target_span_causal: set att=1 on supervised target language
  positions so the text-CE is genuine causal next-token prediction.
  Applied in both _compute_all_losses_fused and _compute_text_and_fast_loss.
- flow_loss_weight 10.0 -> 5.0: the paper's a=10 swamps the LM head once
  the flow-only low_level recipe fires often (matches SmolVLA2Config).
- _flatten_say_tool_calls in the text tokenizer: serialize `say` tool
  calls into a <say>...</say> marker so the spoken reply is tokenized
  and supervised (PaliGemma's flat prompt has no structured calls, so
  they were dropped entirely).

select_message needed no change: pi052's prefix is [images, language]
with no trailing state token, so it already decodes from the last
language token.

Regression tests mirror the smolvla2 attention-masking + tool-call suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 15:42:19 +02:00

groot

feat(dependencies): minimal default tag install (#3362 )

2026-04-12 20:03:04 +02:00

hilserl

feat(dependencies): minimal default tag install (#3362 )

2026-04-12 20:03:04 +02:00

multi_task_dit

fix(test): add missing device placement in multi-task DiT tests (#3349 )

2026-04-14 12:25:29 +02:00

pi0_fast

chore(dependecies): untangle dependecies across internal modules (#3149 )