lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-07 18:11:50 +00:00

Files

Pepijn 34269a5d78 fix(pi052): register PaliGemma <loc> tokens so they tokenize as single ids

THE bug behind the <loc>-salad. PaliGemma's vocab reserves ids
[256000, 257023] for <locDDDD> detection / pointing tokens, but the
stock AutoTokenizer does NOT match them on raw text — it BPE-splits
<loc0162> into SEVEN pieces (<, loc, 0, 1, 6, 2, >). So a VQA target
like "<loc0162><loc0759> green box<eos>" tokenized to 16 pieces, not
5, and training supervised the LM head on those generic BPE pieces
instead of on one detection-vocab id. The piece logits got pumped up
across ~25% of supervised positions; at inference they dominated
every turn — even subtask prompts produced <loc>-salad followed by
the actual answer.

Register the 1024 <locDDDD> tokens via tokenizer.add_tokens once on
load, in every path the policy uses: PI052TextTokenizerStep (training
encode), _build_text_batch_pi052 (runtime encode), and
select_message's default tokenizer (runtime decode). Verified
empirically with the real PaliGemma tokenizer: VQA target now
tokenizes to 5 ids matching the loc-vocab range (256162, 256759, ...)
with correct offset_mapping.

This unlocks PaliGemma's actual detection prior; <loc>-salad cannot
recur because each <locDDDD> is a single class on the LM head, not a
character sequence the head accidentally learns to extend.
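
A minimal sketch of the registration step described above, assuming only what the message states: 1024 zero-padded <locDDDD> strings, ids starting at 256000, and a tokenizer object exposing add_tokens. The helper names (loc_token, loc_tokens, expected_loc_id, register_loc_tokens) are hypothetical and are not the functions in the pi052 code; the tokenizer is duck-typed so the sketch runs without downloading a checkpoint.

```python
# Constants taken from the commit message: ids [256000, 257023] are
# reserved for <locDDDD> tokens, i.e. 1024 tokens in total.
LOC_BASE_ID = 256000
NUM_LOC_TOKENS = 1024


def loc_token(n: int) -> str:
    """Format a location index as its token string, e.g. 162 -> '<loc0162>'."""
    if not 0 <= n < NUM_LOC_TOKENS:
        raise ValueError(f"loc index out of range: {n}")
    return f"<loc{n:04d}>"


def loc_tokens() -> list[str]:
    """All 1024 token strings, in id order."""
    return [loc_token(n) for n in range(NUM_LOC_TOKENS)]


def expected_loc_id(token: str) -> int:
    """Id the token should map to once it tokenizes as a single piece.

    Assumes id = 256000 + DDDD, which matches the ids quoted in the
    commit message (256162 for <loc0162>, 256759 for <loc0759>).
    """
    return LOC_BASE_ID + int(token[4:8])


def register_loc_tokens(tokenizer):
    """Register the <locDDDD> strings once, right after the tokenizer loads.

    `tokenizer` is any object with an add_tokens(list_of_str) method,
    such as a Hugging Face tokenizer. The return value is passed
    through unchanged.
    """
    return tokenizer.add_tokens(loc_tokens())
```

With a real PaliGemma tokenizer, a sanity check in the same spirit as the one reported above would be to encode "<loc0162>" after registration and compare the single resulting id against expected_loc_id("<loc0162>").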

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 11:41:41 +02:00

annotations

refactor(recipes): rename recipes, drop pi05_hirobot

2026-05-18 16:02:15 +02:00

artifacts

feat(dataset): 2x faster dataloader via parallel decode, uint8 transport, and persistent workers (#3406)

2026-04-19 00:08:22 +02:00

async_inference

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

cameras

test(cameras): skip flaky async_read test (#3106)