mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-24 13:09:43 +00:00
fix(smolvla2): coerce str content to list-of-blocks for chat template
SmolVLM's chat template (and many other multimodal templates) declares
``message['content']`` as a list of typed blocks and iterates it
expecting dicts with a ``'type'`` field:
{% for line in message['content'] %}
{% if line['type'] == 'text' %}{{ line['text'] }}
{% elif line['type'] == 'image' %}{{ '<image>' }}
{% endif %}
{% endfor %}
When the caller passes ``content`` as a plain ``str`` (which we did
throughout ``_msgs_for_subtask`` / ``_msgs_for_memory`` etc.), Jinja
silently iterates the string character-by-character. ``'P'['type']``
returns nothing; neither branch fires; *no text tokens get emitted*.
The model receives a prompt containing only role markers
(``User:<end_of_utterance>\nAssistant:``) and predictably continues by
emitting ``Assistant:`` fragments — the gibberish ``subtask: Ass\n::``
on the runtime panel.
Before calling ``apply_chat_template``, walk the messages and rewrite
any string ``content`` into ``[{'type': 'text', 'text': content}]``.
The template's text branch then fires correctly and the model sees
the actual user/assistant text, not just structural tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -171,6 +171,17 @@ def _build_text_batch(policy: Any, prompt_messages: list[dict[str, Any]]) -> dic
|
||||
tokenizer.pad_token = tokenizer.eos_token
|
||||
|
||||
text_messages = [_strip_recipe_keys(m) for m in prompt_messages]
|
||||
# SmolVLM's chat template iterates ``message['content']`` expecting
|
||||
# a list of typed blocks (``[{type: 'text', text: ...}, ...]``).
|
||||
# When ``content`` is a plain ``str`` it silently iterates characters,
|
||||
# no branch matches, and *no content tokens are emitted* — the model
|
||||
# receives only role markers and starts hallucinating ``Assistant:``
|
||||
# fragments. Coerce string content to the list-of-blocks form the
|
||||
# template expects.
|
||||
for _m in text_messages:
|
||||
_c = _m.get("content")
|
||||
if isinstance(_c, str):
|
||||
_m["content"] = [{"type": "text", "text": _c}]
|
||||
encoded = tokenizer.apply_chat_template(
|
||||
text_messages,
|
||||
add_generation_prompt=True,
|
||||
|
||||
Reference in New Issue
Block a user