mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-18 10:10:08 +00:00
feat(language): task_aug style + automatic ${task} rephrasing rotation
Adds task-prompt diversity (Xiao 2022 / CAST) without touching
``meta/tasks.parquet`` or forcing recipes to opt in. The plan reserved
``task_aug`` as a future style; this lands it now.
- ``language.py``: add ``task_aug`` to ``CORE_STYLES`` and
``PERSISTENT_STYLES``. ``column_for_style("task_aug")`` returns
``language_persistent`` so PR 2 writers route it correctly.
- ``language_render.py``: ``_resolve_task`` now consults the persistent
slice for rows of ``style="task_aug", role="user"``. When any exist
it picks one deterministically by ``sample_idx`` (blake2b-keyed, not
Python's randomized hash) so an epoch sees every rephrasing of every
episode while the same sample still resolves identically across
reruns. Falls back to the canonical ``meta/tasks.parquet`` task when
no rephrasings are present, so existing datasets and unannotated runs
keep their behaviour. Explicit ``task=`` overrides still win.
- Tests: rephrasing coverage across samples, determinism on repeat
``sample_idx``, fallback when persistent has no ``task_aug`` rows,
and explicit override priority.
Recipes get this for free: any ``${task}`` placeholder rotates through
the available rephrasings. Recipes that want the literal canonical task
can override the binding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -43,7 +43,7 @@ def test_language_arrow_schema_has_expected_fields():
|
||||
|
||||
|
||||
def test_style_registry_routes_columns():
|
||||
assert {"subtask", "plan", "memory", "motion"} == PERSISTENT_STYLES
|
||||
assert {"subtask", "plan", "memory", "motion", "task_aug"} == PERSISTENT_STYLES
|
||||
assert {"interjection", "vqa", "trace"} == EVENT_ONLY_STYLES
|
||||
assert PERSISTENT_STYLES | EVENT_ONLY_STYLES <= STYLE_REGISTRY
|
||||
|
||||
@@ -51,6 +51,7 @@ def test_style_registry_routes_columns():
|
||||
assert column_for_style("plan") == LANGUAGE_PERSISTENT
|
||||
assert column_for_style("memory") == LANGUAGE_PERSISTENT
|
||||
assert column_for_style("motion") == LANGUAGE_PERSISTENT
|
||||
assert column_for_style("task_aug") == LANGUAGE_PERSISTENT
|
||||
assert column_for_style("interjection") == LANGUAGE_EVENTS
|
||||
assert column_for_style("vqa") == LANGUAGE_EVENTS
|
||||
assert column_for_style("trace") == LANGUAGE_EVENTS
|
||||
|
||||
Reference in New Issue
Block a user