fix(smolvla2): coerce str content to list-of-blocks for chat template

SmolVLM's chat template (and many other multimodal templates) declares
``message['content']`` as a list of typed blocks and iterates it
expecting dicts with a ``'type'`` field:

    {% for line in message['content'] %}
      {% if line['type'] == 'text' %}{{ line['text'] }}
      {% elif line['type'] == 'image' %}{{ '<image>' }}
      {% endif %}
    {% endfor %}

When the caller passes ``content`` as a plain ``str`` (which we did
throughout ``_msgs_for_subtask`` / ``_msgs_for_memory`` etc.), Jinja
silently iterates the string character-by-character. ``'P'['type']``
returns nothing; neither branch fires; *no text tokens get emitted*.
The model receives a prompt containing only role markers
(``User:<end_of_utterance>\nAssistant:``) and predictably continues by
emitting ``Assistant:`` fragments — the gibberish ``subtask: Ass\n::``
on the runtime panel.

Before calling ``apply_chat_template``, walk the messages and rewrite
any string ``content`` into ``[{'type': 'text', 'text': content}]``.
The template's text branch then fires correctly and the model sees
the actual user/assistant text, not just structural tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pepijn
2026-05-12 15:01:53 +02:00
parent fe4bd2b6ba
commit fc715db4a3
@@ -171,6 +171,17 @@ def _build_text_batch(policy: Any, prompt_messages: list[dict[str, Any]]) -> dic
tokenizer.pad_token = tokenizer.eos_token
text_messages = [_strip_recipe_keys(m) for m in prompt_messages]
# SmolVLM's chat template iterates ``message['content']`` expecting
# a list of typed blocks (``[{type: 'text', text: ...}, ...]``).
# When ``content`` is a plain ``str`` it silently iterates characters,
# no branch matches, and *no content tokens are emitted* — the model
# receives only role markers and starts hallucinating ``Assistant:``
# fragments. Coerce string content to the list-of-blocks form the
# template expects.
for _m in text_messages:
_c = _m.get("content")
if isinstance(_c, str):
_m["content"] = [{"type": "text", "text": _c}]
encoded = tokenizer.apply_chat_template(
text_messages,
add_generation_prompt=True,