Merge remote-tracking branch 'origin/feat/language-annotation-pipeline' into feat/smolvla-on-steerable

# Conflicts: # src/lerobot/datasets/__init__.py # src/lerobot/policies/__init__.py # src/lerobot/policies/factory.py # src/lerobot/processor/render_messages_processor.py # uv.lock
2026-07-24 10:16:09 +00:00 · 2026-05-25 16:56:22 +02:00
parent 83d0c390da c37b1fc7d0
commit 1e9a6d044d
184 changed files with 13830 additions and 5021 deletions
@@ -40,7 +40,7 @@ frame the row sits on already provides it):
 role: string
 content: string | null
 style: string | null
-timestamp: float64        # persistent rows only
+timestamp: float32        # persistent rows only
 camera: string | null     # observation.images.* feature key, view-dependent rows only
 tool_calls: list[Json] | null
 ```
@@ -64,6 +64,23 @@ The language stack itself has three internal modules backing layer 1:

 `LeRobotDataset` stays recipe-agnostic. It passes `language_persistent` and `language_events` through when present, and unannotated datasets keep their existing behavior.

+## Layer 2 — recipe anatomy
+
+Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`. They
+declare which annotation rows to pull (via `bindings`) and how to compose them
+into chat turns (`messages`).
+
+```yaml
+messages:
+  - { role: user, content: "${task}", stream: high_level }
+  - { role: assistant, content: "${subtask}", stream: low_level, target: true }
+```
+
+A recipe can also branch into a weighted **blend** of sub-recipes. At sample
+time, exactly one branch is selected deterministically from the sample index,
+so different frames train different objectives (e.g. memory updates vs.
+low-level execution vs. VQA) without any Python wiring.
+
 ### Temporal semantics

 Persistent styles are active after emission until replaced:
@@ -112,23 +129,6 @@ ask_vqa_top:

 Add one such sub-recipe per camera the dataset records.

-## Layer 2 — recipe anatomy
-
-Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`. They
-declare which annotation rows to pull (via `bindings`) and how to compose them
-into chat turns (`messages`).
-
-```yaml
-messages:
-  - { role: user, content: "${task}", stream: high_level }
-  - { role: assistant, content: "${subtask}", stream: low_level, target: true }
-```
-
-A recipe can also branch into a weighted **blend** of sub-recipes. At sample
-time, exactly one branch is selected deterministically from the sample index,
-so different frames train different objectives (e.g. memory updates vs.
-low-level execution vs. VQA) without any Python wiring.
-
 ## Layer 3 — training format

 Rendered samples use HF-style chat messages plus LeRobot sidecars:
@@ -139,16 +139,12 @@ sample["message_streams"]
 sample["target_message_indices"]
 ```

-<<<<<<< HEAD
-The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone.
+The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone, which keeps the same dataset usable across SmolVLA, Pi0.5, and any future VLM that expects OpenAI-style chat messages.

 ## Blends

 Blend recipes select one weighted sub-recipe deterministically from the sample index.
 `recipes/subtasks_vqa.yaml` trains the core blend — high-level subtask prediction, low-level execution, and VQA. `recipes/subtask_mem_vqa_speech.yaml` is the fuller variant that also adds memory updates and spoken interjection responses.
-=======
-The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone, which keeps the same dataset usable across SmolVLA, Pi0.5, and any future VLM that expects OpenAI-style chat messages.
->>>>>>> origin/feat/language-annotation-pipeline

 ## Graceful absence