mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-19 10:40:04 +00:00
bfb8cfb432
The chat tokenizer passed assistant `tool_calls` straight to `apply_chat_template`, which renders them as a structured JSON `<tool_call>` block — so the LM head was trained to emit JSON. But the inference parser `_split_plan_and_say` looks for a `<say>...</say>` marker, which the model never saw in training, so the `say` tool never fired at inference. `_flatten_say_tool_calls` is the missing training-time serializer (the one `_split_plan_and_say`'s docstring already assumed existed): it rewrites a `say` tool call into a `<say>...</say>` marker inside the content text before the chat template runs, so the template only tokenizes plain text and the supervised target span trains the model to emit exactly the marker the runtime parses back (Pi 0.5-style flat tool-call serialization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>