lerobot

Mirror of https://github.com/huggingface/lerobot.git, synced 2026-07-07 18:11:50 +00:00

Files


Pepijn 426d48dbbf fix(pi052): port the smolvla2 text-head fixes to pi052

pi052 had the same text-CE collapse bug as smolvla2: PaliGemma's
embed_prefix flags the language block att=0, so make_att_2d_masks makes
it fully bidirectional and the text cross-entropy degenerates into a
copy task. Ported the three model-specific fixes:

- _mark_target_span_causal: set att=1 on supervised target language
  positions so the text-CE is genuine causal next-token prediction.
  Applied in both _compute_all_losses_fused and _compute_text_and_fast_loss.
- flow_loss_weight 10.0 -> 5.0: the paper's a=10 swamps the LM head once
  the flow-only low_level recipe fires often (matches SmolVLA2Config).
- _flatten_say_tool_calls in the text tokenizer: serialize `say` tool
  calls into a <say>...</say> marker so the spoken reply is tokenized
  and supervised (PaliGemma's flat prompt has no structured calls, so
  they were dropped entirely).
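To illustrate the collapse and the fix, here is a minimal pure-Python sketch. It assumes the pi0-style cumulative-sum rule for `make_att_2d_masks`: a position with att=1 opens a new block, and query i may attend to key j when j's cumulative block index is no greater than i's. The real helpers operate on batched torch tensors. `mark_target_span_causal` is a hypothetical stand-in for `_mark_target_span_causal`, and the 7-token layout is invented for illustration.

```python
from typing import List


def make_att_2d_masks(pad_mask: List[bool], att_mask: List[int]) -> List[List[bool]]:
    # Assumed pi0-style rule: cumulative sum of att flags gives each position a block index.
    cum, total = [], 0
    for flag in att_mask:
        total += flag
        cum.append(total)
    n = len(att_mask)
    # Query i attends to key j if j's block index <= i's and neither position is padding.
    return [
        [cum[j] <= cum[i] and pad_mask[i] and pad_mask[j] for j in range(n)]
        for i in range(n)
    ]


def mark_target_span_causal(att_mask: List[int], target: List[bool]) -> List[int]:
    # Hypothetical stand-in for _mark_target_span_causal: att=1 on supervised
    # target positions, so each one starts its own block (causal over targets).
    return [1 if is_target else flag for flag, is_target in zip(att_mask, target)]


# 2 image tokens + 2 prompt tokens + 3 supervised target tokens, no padding.
pad = [True] * 7
target = [False] * 4 + [True] * 3

bidir = make_att_2d_masks(pad, [0] * 7)
print(bidir[4][6])   # True: the first target token sees a later target token

causal = make_att_2d_masks(pad, mark_target_span_causal([0] * 7, target))
print(causal[4][6])  # False: future targets are hidden
print(causal[6][4])  # True: later targets still see earlier ones
```

With every flag at 0, predicting a target token only requires copying it from a later position, which is the copy task described above. After marking the target span, each target sees only the prefix and earlier targets, while image and prompt tokens remain mutually bidirectional.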

select_message needed no change: pi052's prefix is [images, language]
with no trailing state token, so it already decodes from the last
language token.
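The reasoning behind leaving `select_message` alone can be sketched as follows: the next text token is read from the logits at the last language position. With an [images, language] prefix, that is simply the final prefix position. A trailing state token would push it back by one; this is assumed here to be the layout that needed special handling elsewhere. Token kinds are modeled as plain strings, and both function names are illustrative, not lerobot APIs.

```python
from typing import List


def last_position(kinds: List[str]) -> int:
    # Naive choice: decode from whatever token ends the prefix.
    return len(kinds) - 1


def last_language_position(kinds: List[str]) -> int:
    # Decode from the last language token, skipping trailing non-language tokens.
    return max(i for i, kind in enumerate(kinds) if kind == "lang")


pi052_prefix = ["img"] * 3 + ["lang"] * 4             # [images, language]
with_state = ["img"] * 3 + ["lang"] * 4 + ["state"]   # hypothetical trailing state token

print(last_position(pi052_prefix), last_language_position(pi052_prefix))  # 6 6
print(last_position(with_state), last_language_position(with_state))      # 7 6
```

For the pi052 layout the two choices coincide, so no change is needed. They diverge only when a non-language token trails the prefix.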

Regression tests mirror the smolvla2 attention-masking + tool-call suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 15:42:19 +02:00

annotations

refactor(annotate): drop dataset-level `tools` parquet column

2026-04-30 18:48:36 +02:00

artifacts

feat(dataset): 2x faster dataloader via parallel decode, uint8 transport, and persistent workers (#3406)

2026-04-19 00:08:22 +02:00

async_inference

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

cameras

test(cameras): skip flaky async_read test (#3106)