**#1 Plan-update phase reports correct skip count.**
``_run_plan_update_phase`` only ran ``run_plan_updates`` for episodes
with at least one interjection, but hardcoded ``episodes_skipped=0``,
so the summary undercounted skipped episodes. It now returns
``len(records) - processed`` so that processed + skipped == total.
**#2 ``run_hf_job.py`` installs ``openai``.**
The ``CMD`` block does ``pip install --no-deps lerobot[branch]`` then
explicitly lists transitive deps. ``openai`` was missing — and since
``VlmConfig.backend`` defaults to ``"openai"``, the job would have
raised ``ImportError`` when ``vlm_client._make_openai_client`` ran.
**#3 Dedupe subtask-span reconstruction.**
Module 1's ``_reconstruct_subtasks_from_rows`` (no ``and spans`` guard)
and Module 2's ``_read_subtask_spans`` (with the guard) had near-
identical logic. Promoted to ``reconstruct_subtask_spans`` in
``reader.py`` using the safer guarded form. Both modules now import
the single helper.
**#5 Atomic staging.py JSONL writes.**
Mirroring the parquet-writer fix from an earlier review round:
``EpisodeStaging.write`` now writes to a sibling ``.tmp`` file and
swaps it in atomically via ``Path.replace``. A crash mid-write can no
longer leave a half-written JSONL that ``read()`` would then fail to
parse.
**#6 Atomic ``info.json`` write.**
Same pattern in ``executor._ensure_annotation_metadata_in_info`` —
``info.json`` is load-bearing for dataset metadata, so partial writes
brick the dataset.
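A minimal sketch of the tmp-then-replace pattern both fixes use
(function name hypothetical):

    from pathlib import Path

    def atomic_write_text(path: Path, payload: str) -> None:
        # Write the full payload to a sibling .tmp, then swap it in.
        # Path.replace is atomic for same-filesystem moves, so readers
        # see either the old file or the complete new one -- never a
        # half-written JSONL or info.json.
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(payload)
        tmp.replace(path)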
**#7 Writer's role-key guard.**
``_normalize_persistent_row`` and ``_normalize_event_row`` accessed
``row["role"]`` directly while every other field used ``.get()``.
Pre-validate ``"role" in row`` and raise a friendly ``ValueError``
naming the row, so a future module that accidentally drops ``role``
fails with a triageable message instead of a bare ``KeyError`` deep in the
writer.
**#8 Last subtask span's ``end`` extends to episode end.**
``reconstruct_subtask_spans`` (the new shared helper) takes an optional
``episode_end_t``. When provided, the final span's ``end`` is closed
to that timestamp instead of equalling its own ``start`` (zero
duration). Both Module 1's plan-update pass and Module 2's interjection
anchoring now pass ``record.frame_timestamps[-1]``, so downstream "current
subtask at refresh_t" lookups no longer miss refreshes that land
inside the final span.
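A minimal sketch of the shared helper's shape (row/field names assumed;
the real signature lives in ``reader.py``):

    def reconstruct_subtask_spans(rows, episode_end_t=None):
        # Guarded form: bail out early when no subtask rows exist.
        if not rows:
            return []
        ordered = sorted(rows, key=lambda r: r["timestamp"])
        spans = []
        for i, row in enumerate(ordered):
            start = row["timestamp"]
            if i + 1 < len(ordered):
                end = ordered[i + 1]["timestamp"]  # next start closes this span
            elif episode_end_t is not None:
                end = episode_end_t  # close the final span at episode end
            else:
                end = start  # previous zero-duration behavior
            spans.append({"text": row.get("text"), "start": start, "end": end})
        return spans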
Sweep: 66 passed, 0 failed. Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both tests were stale relative to design changes that landed earlier on
this branch. Update the tests to match the current production contract.
**``test_module1_attaches_video_block_to_subtask_prompt``**
The test took ``captured[0]`` and asserted on its content blocks, but
Module 1 issues several sub-prompts and the rephrasings call (which is
text-only, no video block) usually lands first. Two fixes:
* The test's intent is "the subtask prompt carries the video block" —
not "the first prompt carries it". Pick the call by content
(``"atomic subtasks"`` keyword in the text block) so the test is
resilient to future reordering of unrelated sub-prompts.
* Set ``n_task_rephrasings=0`` so the rephrasings call is skipped
entirely — keeps the test focused on ``_generate_subtasks``.
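A sketch of the content-based call selection (captured-message shape
assumed):

    def _subtask_call(captured):
        # Pick the sub-prompt by content, not position: the subtask prompt
        # is the one whose text block mentions "atomic subtasks".
        for messages in captured:
            texts = [
                block["text"]
                for message in messages
                for block in message["content"]
                if block.get("type") == "text"
            ]
            if any("atomic subtasks" in text for text in texts):
                return messages
        raise AssertionError("no subtask prompt captured")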
**``test_module2_mid_episode_emits_paired_interjection_and_speech``**
Two issues both rooted in design changes on the branch:
1. ``InterjectionsAndSpeechModule._mid_episode_interjections`` now
anchors interjections on subtask boundaries from Module 1's staging
tree, bailing out with zero rows when no spans exist. The production
executor runs Module 1 first; the test ran Module 2 in isolation.
Reproduce the contract by seeding two ``style=subtask`` rows in the
staging before calling Module 2 — gives it the single ``0 → 1``
boundary it needs.
2. The test's stub responder used the marker ``"ONE realistic
interruption"`` to match the interjection prompt, but that string is
from a previous prompt version. The current
``module_2_interjection.txt`` says ``"Write ONE interjection..."`` —
the old prompt asked for counterfactual interjections (e.g. "skip the
wipe"), the new one anchors on the upcoming subtask. Marker updated
to ``"Write ONE interjection"``; canned response wording aligned to
the new design.
Sweep on the language stack: 66 passed, 0 failed (was 64 passed, 2
failed). Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Critical: video_for_episode was unreachable dead code.**
``video_for_episode`` was indented inside ``_decode_pyav_direct``, after
its ``return`` statement — Python parsed it as a nested function that
never executed. Module 1's ``_episode_video_block`` calls
``self.frame_provider.video_for_episode(record, target_count)`` on the
``use_video_url=False`` path, which would have raised
``AttributeError`` on any real dataset. Tests passed only because they
used ``_StubFrameProvider``
/ ``_NullProvider`` which have the method. Moved it to be a proper
method of ``VideoFrameProvider`` (right after ``frames_at``).
**Thread safety on VideoFrameProvider.**
The executor runs Module 1/2/3 phases under a ``ThreadPoolExecutor``, so
the per-instance ``_cache`` dict and the one-shot ``_warned_decode_fail``
flag were exposed to concurrent reads/writes. Added a ``threading.Lock``
field, wrapped cache reads/writes and the warn-flag check-and-set in
``with self._lock:``. Stub fixtures unaffected.
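A sketch of the locking discipline (class and method names illustrative
only):

    import threading

    class _LockedFrameCache:
        def __init__(self):
            self._cache = {}
            self._warned_decode_fail = False
            self._lock = threading.Lock()

        def get_or_decode(self, key, decode):
            with self._lock:                   # guarded read
                if key in self._cache:
                    return self._cache[key]
            frames = decode()                  # heavy work outside the lock
            with self._lock:                   # guarded write
                return self._cache.setdefault(key, frames)

        def warn_once(self, logger, msg):
            with self._lock:                   # check-and-set as one unit
                if self._warned_decode_fail:
                    return
                self._warned_decode_fail = True
            logger.warning(msg)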
**episode_clip_path is now a method of VideoFrameProvider.**
Used to be a free function reaching into ``provider._meta.episodes`` and
``provider._meta.get_video_file_path`` from outside the class. As a
method it just uses ``self._meta``. The only caller (Module 1) updated;
no external callers.
**Atomic write in LanguageColumnsWriter.**
``pq.write_table(new_table, path)`` was overwriting the parquet shard
in place — a crash mid-write would corrupt the file. Now writes to a
sibling ``.tmp`` and ``Path.replace`` atomically.
**Smaller items:**
* ``executor.py`` docstring opened with "four phases" but listed six.
Now says "six phases" to match.
* ``[annotations]`` extra in ``pyproject.toml`` now includes
``openai>=1.40,<2.0``. Default ``VlmConfig.backend`` is ``"openai"``,
so without it ``_make_openai_client`` would ImportError on a fresh
``uv sync --extra annotations``.
* ``_snap_to_frame`` was duplicated identically in
``plan_subtasks_memory.py`` and ``interjections_and_speech.py``.
Promoted to ``snap_to_frame`` in ``reader.py`` (next to
``EpisodeRecord``); both modules now import it. Backwards-compat alias
not needed — no external callers.
* ``EpisodeRecord.frames_df()`` was re-reading the full parquet on every
call. Now memoizes via a private dataclass field so repeat calls from
different modules pay the cost once (see the sketch after this list).
Method signature unchanged.
* ``_extract_first_json_object`` had a redundant ``and not escape`` guard
that was dead because the prior block already handled and reset
``escape``. Replaced with a comment explaining the invariant.
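A sketch of the ``frames_df()`` memoization from the list above (field
name hypothetical):

    from dataclasses import dataclass, field
    from pathlib import Path

    import pandas as pd

    @dataclass
    class EpisodeRecord:
        parquet_path: Path
        _frames_df: pd.DataFrame | None = field(
            default=None, repr=False, compare=False
        )

        def frames_df(self) -> pd.DataFrame:
            # Read the shard once; repeat calls from other modules reuse it.
            if self._frames_df is None:
                self._frames_df = pd.read_parquet(self.parquet_path)
            return self._frames_df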
**Pre-existing lint cleanups surfaced once these files entered
pre-commit's scope:**
* dead local ``client = clients[0]`` in ``_make_openai_client`` (the
real round-robin uses ``clients[rr_counter[...]]``).
* ``cmd = ... if "{port}" in cmd else f"...{port}"`` ternary collapse in
``_spawn_parallel_inference_servers``.
* ``seek_pts = 0 if stream.time_base is None else int(...)`` ternary
collapse in ``_decode_pyav_direct``.
* ``# nosec B310`` on the localhost ``urllib.request.urlopen`` probe in
``_server_is_up`` — the URL is the user-configured local-server endpoint
the CLI itself spawned, not arbitrary user input.
**Test added.**
``tests/annotations/test_frames.py`` pins the regression on
``VideoFrameProvider``: asserts ``video_for_episode`` and
``episode_clip_path`` are callable methods (not nested dead code or
free functions), and that the ``_lock`` field is a real
``threading.Lock``.
Sweep: 64 passed, 2 failed (same pre-existing module-impl bugs as
before this commit). Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five Module 1 sub-prompts (`_derive_task_from_video`,
`_generate_task_rephrasings`, `_generate_subtasks`, `_generate_plan`,
`_generate_memory`) all repeated the same shape:
    result = self.vlm.generate_json([messages])[0]
    if isinstance(result, dict) and isinstance(result.get(<field>), <type>):
        ...
…each spelled with slightly different field names + post-processing.
Three small helpers replace it:
* `_vlm_field(messages, field)` — single VLM call, returns
``result[field]`` or ``None``. Centralizes the
``generate_json([m])[0]`` + ``isinstance(dict)`` dance.
* `_text_message(text)` — wraps a string in the canonical user-message
shape every text-only prompt builds inline.
* `_video_message(record, prompt)` — combines the episode video block
with a prompt; replaces the duplicated video-block construction
inside `_generate_subtasks` (which previously inlined the same
``use_video_url``/``frames_per_second``/``max_video_frames`` branches
that `_episode_video_block` already implements).
Net -35 LOC. Each call site is now 3-5 lines instead of 10-20. The
public method signatures are unchanged so tests don't move.
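A sketch of what the three helpers might look like (message shape
assumed from the inline snippet above):

    def _vlm_field(self, messages, field):
        # One VLM call; None on any shape mismatch instead of a KeyError.
        result = self.vlm.generate_json([messages])[0]
        return result.get(field) if isinstance(result, dict) else None

    def _text_message(self, text):
        return {"role": "user", "content": [{"type": "text", "text": text}]}

    def _video_message(self, record, prompt):
        # Episode video block plus prompt text in one user message.
        return {
            "role": "user",
            "content": [
                self._episode_video_block(record),
                {"type": "text", "text": prompt},
            ],
        }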
Drive-by: `_task_seems_bad` collapsed via a SIM103 fix; `zip` in
`run_plan_updates` now passes `strict=True` per ruff B905.
Tests: same 2 pre-existing module-impl failures
(`test_module1_attaches_video_block_to_subtask_prompt`,
`test_module2_mid_episode_emits_paired_interjection_and_speech`) —
they were failing on `origin/feat/language-annotation-pipeline` before
this commit and continue to do so for the same reasons. 61/63 in the
language stack pass; pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflicts and pull in the latest PR 1 fixes.
Conflicts:
- pyproject.toml: PR 1 added `lerobot-rollout` and PR 2 added
`lerobot-annotate` to the same `[project.scripts]` block. Kept both.
- uv.lock: dropped both sides and regenerated against the merged
`pyproject.toml` (PR 2 dropped the `datatrove` dep when distribution
moved to HF Jobs; PR 1's lock didn't have it).
Test follow-up:
- `tests/annotations/test_pipeline_recipe_render.py` — PR 1 deleted
`src/lerobot/configs/recipes/pi05_hirobot.yaml` (review feedback:
remove the canonical-recipe file; recipes are user-supplied). The
cross-PR contract this test guards is "the recipe DSL renders
non-empty messages from pipeline output", which doesn't depend on
any specific YAML, so the test now builds an inline blend recipe
with the same coverage. Passes.
Sweep: 82 passed, 2 failed (pre-existing module-impl bugs:
`test_module1_attaches_video_block_to_subtask_prompt`,
`test_module2_mid_episode_emits_paired_interjection_and_speech`).
The PR 1 carryover (`test_emitted_at_raises_on_ambiguous_per_camera_vqa`)
is now passing — the merge brought in PR 1's tightened `_select_one`
ambiguity check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): run multi-task benchmark evals 5-at-a-time in parallel
The eval script supports running tasks concurrently via a
ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four
multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus
— 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of
running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra)
are unchanged.
* fix(ci): cap VLABench smoke eval at 50 steps per task
VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s
the smoke eval took ~80 minutes of rollouts on top of the image build.
The eval is a pipeline smoke test (running_success_rate stays at 0% on
this short rollout anyway), so we don't need full episodes — cap each
task at 50 steps to bring total rollout time down ~10x.
* fix(ci): run VLABench tasks 5-at-a-time in parallel
The eval script already supports running multiple tasks concurrently via
a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10
VLABench tasks finish in ~2 waves instead of running sequentially.
* feat: add pretrained vision encoder weights for diffusion and vqbet
* fix test by re-generating artifacts
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and
cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3
on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA
repo, so the build failed at apt-get install. Bumping both to -12-6 to
match the base image.
Reword the two callouts in `tools.mdx` to describe the runtime layer
in present tense ("not part of the catalog layer shipped today",
"those modules don't yet exist in the tree") instead of pointing at a
specific follow-up PR. Keeps the doc honest about what works now
without coupling it to a particular release order.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* **Float tolerance in `emitted_at` for persistent styles.** The
``_timestamp(row) == t`` exact-equality check silently missed any
caller that derived ``t`` arithmetically (e.g. ``frame_idx / fps``)
even though the parquet timestamp would only differ by ULPs. Added
``EMITTED_AT_TOLERANCE_S = 0.1`` and check ``abs(...) <= tolerance``
instead, with a docstring explaining why exact equality wasn't
enough and why 0.1 s is safe at typical 30–100 Hz control rates.
Test asserts the new behavior at half-window (matches) and
double-window (no match) using the constant so it stays in sync.
* **`MessageTurn.stream` is required at construction.** It was typed
``MessageStream | None = None`` so YAML could omit ``stream:`` and
still construct the dataclass — but ``_validate_rendered`` rejected
``None`` streams later, surfacing the error at the first sample
instead of at recipe load. Now ``__post_init__`` raises
``ValueError`` if ``stream`` is ``None``, with the list of valid
streams in the message. The redundant late-stage check in
``_validate_rendered`` is replaced with a one-line comment that
cites the upstream invariant. Test pins the new construction-time
rejection.
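A sketch of the construction-time check (the valid-stream names are
placeholders):

    from dataclasses import dataclass

    VALID_STREAMS = ("persistent", "events")  # assumed names

    @dataclass
    class MessageTurn:
        stream: "MessageStream | None" = None

        def __post_init__(self):
            if self.stream is None:
                raise ValueError(
                    "MessageTurn requires 'stream'; valid streams: "
                    + ", ".join(VALID_STREAMS)
                )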
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* **#2 — dedupe `_PLACEHOLDER_RE`.** The same regex was compiled in
`recipe.py` and `language_render.py`. Promote to module-level
`PLACEHOLDER_RE` in `recipe.py` (its primary owner — declares
template syntax) and import from `language_render.py`.
* **#3 — centralize language column names.** `io_utils.py` had
hardcoded `{"language_persistent", "language_events"}` literals at
two sites. Replace with `LANGUAGE_COLUMNS` import so a future column
rename can't silently desync.
* **#4 — defensive collate preserved-keys.** `lerobot_collate_fn`
silently filtered language fields from samples that didn't have
them, which would hand downstream consumers a preserved list
shorter than the tensor batch. Now: if any sample carries a key,
every sample in the batch must carry it; otherwise raise a
`ValueError` so the upstream rendering bug surfaces at the boundary
(see the sketch after this list).
* **#5 — `_scalar` rejects non-singleton lists.** Previously a zero-
or multi-element list fell through and triggered confusing
`float([])` errors downstream. Now raises `ValueError` with the
actual length.
* **#6 — refactor `_extract_complementary_data`.** Replace 11 lines
of `key = {... if ... else {}}` plus an 11-line splat dict with a
single `_COMPLEMENTARY_KEYS` tuple iterated once.
* **#7 — document `EXTENDED_STYLES`.** Was an empty `set()` with no
comment. Add a docstring explaining it's an intentional extension
point: downstream modules append project-local styles before
`column_for_style` is called.
* **#9 — `tools.mdx` notes the runtime layer is future work.** The
page referenced `src/lerobot/tools/`, `registry.py`, and
`get_tools(meta)` — none exist in this PR. Added a callout at the
start of "How to add your own tool" plus a note on the
implementations paragraph.
* **#10 — tests for YAML round-trip, malformed rows, blend
validation.** `test_recipe.py` grew from 1 case to 12 covering:
blend-or-messages exclusivity, target-turn requirement, blend
emptiness, weight presence/positivity, nested-blend rejection,
`from_dict` with nested blends, `from_yaml` / `load_recipe`
agreement, top-level non-mapping rejection. Added a malformed-row
test for `_normalize_rows` that asserts non-dict entries raise
`TypeError`.
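A sketch of the #4 all-or-none check (names hypothetical):

    def _check_language_keys(samples, keys=("language_persistent", "language_events")):
        # If any sample carries a language key, every sample must; otherwise
        # the preserved list would be shorter than the tensor batch.
        for key in keys:
            have = [key in sample for sample in samples]
            if any(have) and not all(have):
                missing = [i for i, h in enumerate(have) if not h]
                raise ValueError(
                    f"collate: samples {missing} are missing '{key}' while "
                    "others carry it; fix the upstream rendering step"
                )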
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(policies): add EO-1 model
* chore(eo1): adjust policy_eo1_README.md to avoid duplication with eo1.mdx
* chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* **`meta.tools` actually reads `info.json["tools"]`.** `DatasetInfo`
had no `tools` field, so `from_dict` silently dropped the key (it
warned about unknown fields then discarded them) and the property
always returned `DEFAULT_TOOLS`. Added `tools: list[dict] | None`
to the dataclass; `to_dict()` drops it when unset so existing
datasets keep a clean `info.json`. Fixed the accessor to read
`self.info.tools` (the previous `.get(...)` would have raised
AttributeError on the dataclass anyway). Added regression tests:
fallback when absent, round-trip from disk, and round-trip
through `DatasetInfo.from_dict` / `to_dict`.
* **`motion` is not view-dependent — fix the docs.** The mdx claimed
rows of style `motion` must carry `camera`, but `VIEW_DEPENDENT_STYLES
= {"vqa", "trace"}` and the validator agrees: motion primitives are
joint/Cartesian-frame, not pixel-space. Updated both call-out
paragraphs in `language_and_recipes.mdx`.
* **Conditional `collate_fn` swap.** Added `meta.has_language_columns`
and gate the `lerobot_collate_fn` swap in `lerobot_train.py` on it,
so non-language datasets keep PyTorch's `default_collate`. Also
added a pass-through test in `test_collate.py` asserting that, on a
plain tensor batch, the custom collate matches `default_collate`
key-for-key, plus a test for the `None`-sample drop path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`lerobot.processor` re-exported `RenderMessagesStep` at the package
level, so importing anything from `lerobot.processor` pulled in
`lerobot.datasets.language` → `lerobot.datasets/__init__.py` →
`require_package("datasets")`, which fails in the Tier 1 base install
that intentionally omits the `[dataset]` extra. The chain bricked
collection for unrelated suites (`tests/policies/pi0_pi05/...`,
`tests/envs/...`, etc.).
* Stop re-exporting `RenderMessagesStep` from `lerobot.processor`. The
only consumer (the test) already imports from the submodule.
Document the deliberate omission in the module docstring.
* Add `pytest.importorskip("datasets", ...)` (and `pandas` where
needed) at the top of the four PR-added tests that exercise the
language stack:
- tests/datasets/test_language.py
- tests/datasets/test_language_render.py
- tests/processor/test_render_messages_processor.py
- tests/utils/test_collate.py
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* `ruff format` on CI (newer version) wants the short `camera=None`
ValueError on a single line.
* `uv.lock` was stale relative to `pyproject.toml`'s `datasets>=4.7.0`
pin (and picked up upstream `s390x` marker fixes for cuda packages).
CI runs `uv sync --locked` which rejected the divergence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_select_one` previously skipped its ambiguity check whenever any of
`role`/`tool_name`/`camera` was set, on the assumption that the caller
had already pinned down a unique row. That left a real ambiguity hole
for VQA: with two cameras emitting `(vqa, assistant)` at the same
frame, `emitted_at(..., role="assistant")` silently picked the first
sorted row instead of telling the recipe to add `camera=...`. The
existing `test_emitted_at_raises_on_ambiguous_per_camera_vqa` test
already encoded the desired behavior.
Tighten the check: any time `len(rows) > 1` we now raise with the
selectors echoed back, so users see exactly which fields they passed
and that more is needed to disambiguate.
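A sketch of the tightened check (the selector echoing is the point;
exact wording assumed):

    def _raise_if_ambiguous(rows, selectors):
        if len(rows) > 1:
            given = {k: v for k, v in selectors.items() if v is not None}
            raise ValueError(
                f"{len(rows)} rows match selectors {given}; pass more "
                "selectors (e.g. camera=...) to disambiguate"
            )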
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Drop the unused `events` kwarg from `active_at`/`nth_prev`/`nth_next`;
only `emitted_at` actually consults events. The dispatcher in
`_resolve_spec` now passes events conditionally.
* Replace the dual `_persistent_sort_key`/`_event_sort_key` pair with a
single `_row_sort_key` and drop the `sort_key` parameter from
`_select_one`. Event rows lack `timestamp` (it is implicit in the
frame) and now default to `0.0` for sort purposes — the
`(style, role)` tiebreaker is unchanged.
* Inline `_select_latest` into `active_at` (its only caller).
* Collapse `emitted_at`'s dual-branch into one `_select_one` call.
* Tighten `_validate_persistent_resolver` to a single
`column_for_style(style) != LANGUAGE_PERSISTENT` check.
* Parameterize `test_per_camera_blend_renders_both_views` over the two
cameras and factor the sub-recipe builder into `_vqa_subrecipe` so
the test no longer hand-rolls two near-identical recipe blocks.
Net -98 LOC; behavior, public resolver names, and test expectations
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ensure annotated datasets advertise language columns in meta/info.json so non-streaming dataset loads cast against the rewritten parquet schema.
Co-authored-by: Cursor <cursoragent@cursor.com>
After every ``lerobot-annotate`` run, the executor ensures
``meta/info.json["tools"]`` contains at minimum the canonical ``say``
schema, while preserving any tools the user pre-declared on the
dataset. Chat-template consumers (PR 3 SmolVLA2 / Pi0.5 / dataset
visualizer) read the catalog through
``LeRobotDatasetMetadata.tools`` and pass it to
``apply_chat_template(messages, tools=meta.tools, ...)``.
- ``executor.py``: new ``_ensure_tools_in_info`` helper called
after the parquet rewrite. Idempotent and additive — merges by
``function.name``, only writes back if the list changed.
- ``writer.py``: drops the duplicated ``SAY_TOOL_SCHEMA`` /
``DEFAULT_TOOLS`` constants in favour of importing from
``lerobot.datasets.language`` (PR 1's single source of truth).
Re-exported so existing imports keep working.
- ``annotation_pipeline.mdx``: replace the "code constant only" note
with a pointer to the new Tools doc and a description of the
meta/info.json behaviour, including how to pre-declare custom
tools before annotation runs.
This is the storage half of the tools work; PR 3 ships the runnable
implementations under ``src/lerobot/tools/`` (one file per tool,
first up: ``say.py`` wired to Kyutai's pocket-tts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module 1 now produces ``task_aug`` rows (registered in PR 1) so the
PR-1 ``${task}`` resolver can rotate phrasings deterministically per
``sample_idx``. Plus an opt-in video-derived task that bypasses the
canonical ``meta/tasks.parquet`` task when it's empty, low-quality, or
explicitly disabled — every downstream Module-1 prompt then uses the
derived task as its grounding.
- ``Module1Config``: adds ``n_task_rephrasings`` (default 10) and
``derive_task_from_video`` ∈ ``{off, if_short, always}`` (default
``if_short``: triggers when the canonical task is empty, < 3 words, or
matches a placeholder string like ``debug`` / ``unnamed`` / ``tbd``).
See the config sketch after this list.
- ``plan_subtasks_memory.py``: ``run_episode`` now resolves an
``effective_task`` (canonical OR video-derived) and threads it
through ``_generate_subtasks`` / ``_generate_plan`` /
``_generate_memory`` so subtasks, plans, and memory are all grounded
in the same task string. Then generates ``n`` rephrasings of the
effective task and writes them as ``task_aug`` rows at ``t=0`` with
``role=user``. The effective task itself is included as the first
variant so the rotation is guaranteed to cover the source-of-truth
phrasing.
- New prompts: ``module_1_video_task.txt`` (one-shot video → task),
``module_1_task_rephrasings.txt`` (text-only paraphraser, ``n`` per
call).
- ``meta/tasks.parquet`` is NOT modified — derived tasks live only in
``language_persistent``.
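A minimal sketch of the new config knobs described in the first bullet:

    from dataclasses import dataclass

    @dataclass
    class Module1Config:
        n_task_rephrasings: int = 10
        # "off" | "if_short" | "always"; "if_short" triggers when the
        # canonical task is empty, under 3 words, or a placeholder
        # string like "debug" / "unnamed" / "tbd".
        derive_task_from_video: str = "if_short"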
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qwen36moe-11 surfaced a deeper semantic problem with mid-episode
interjections: they were generated as *counterfactual* user requests
("actually skip the wipe", "use the blue one instead") but teleop data
is frozen — the robot in the video already executed everything,
including the steps the user "asked to skip". The training signal was
therefore self-contradictory: interjection text said one thing, the
robot's subsequent action stream did the opposite.
Flip the framing. Anchor every interjection at a subtask boundary and
write it as a natural user request for the *upcoming* subtask. The
robot's visible next behavior IS the interjection's effect, so:
interjection text → plan refresh → action stream
are all consistent with the same observed video.
Concretely:
- ``interjections_and_speech.py``: instead of sampling random
timestamps from ``frame_timestamps``, walk Module 1's subtask spans
and sample from the (subtask N → subtask N+1) transitions (see the
sketch after this list). Pass both the just-finished and the upcoming
subtask texts into the prompt.
- ``_window_timestamps``: re-center the multi-frame video window on
the boundary itself (half the frames cover the end of the previous
subtask, half cover the start of the next one) so the VLM has the
same visual conditioning the policy will see at training time.
- ``module_2_interjection.txt``: rewritten. The prompt now states
explicitly that this is offline data, the robot already committed to
the next subtask, and the interjection must be a natural request
that aligns with — not contradicts — the next subtask. Removes the
"negative task / situated correction" Hi Robot framing because those
scenarios require online execution to be coherent.
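A sketch of the boundary sampling from the first bullet (span shape
assumed):

    import random

    def _boundary_anchors(spans, max_interjections):
        # Candidates are (subtask N -> subtask N+1) transitions, each
        # carrying both neighboring subtask texts for the prompt.
        transitions = [
            {
                "t": spans[i]["end"],
                "prev_subtask": spans[i]["text"],
                "next_subtask": spans[i + 1]["text"],
            }
            for i in range(len(spans) - 1)
        ]
        return random.sample(transitions, min(max_interjections, len(transitions)))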
Plan-refresh logic from the previous commit (forwarding interjection
text into the refresh prompt) is unchanged and now reinforces the same
direction: the refreshed plan emphasizes the upcoming subtask the
interjection just asked for.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qwen36moe-10 showed three Module-2 / plan-refresh quality issues that
are not architecture problems — they're prompt-grounding bugs:
1. Interjection prompt passed ``current_subtask = record.episode_task``
(the WHOLE-episode task), not the actual subtask in force at the
chosen timestamp. The VLM had no signal about what was visible at
that moment, so its interjections were generic ("actually skip X"
where X had nothing to do with the visible activity).
2. Interjection prompt only attached a single frame
(``frames_at(record, [t_snap])``). With one frozen image the VLM
couldn't read the ongoing motion. Module 1 already gets the whole
episode video for subtask decomposition, which is why subtasks are
well-grounded; Module 2 was the outlier.
3. The plan-refresh prompt told the model "a plan refresh after a user
interjection at t=X.YZs" but never showed it the interjection
*text*. So the refreshed plan couldn't actually reflect the user's
correction — at best it recombined the same step list.
Fix:
- ``interjections_and_speech.py``: Module 2 reads Module 1's subtask
rows from the same staging tree (executor orders module_1 → module_2
so they're already there) and resolves the actual ``current_subtask``
at each chosen timestamp. Pulls a small clip
(``interjection_window_seconds`` × ``interjection_window_frames``,
defaulting to 4 frames over the leading 2 s) instead of one frame.
Drops the silently-zeroing ``len(candidate_ts) // 4`` cap on the
interjection count.
- ``module_2_interjection.txt``: prompt is rewritten to reference the
multi-frame visual context and require the interjection to mention
something visible OR named in the current subtask, not invented.
- ``plan_subtasks_memory.py``: ``run_plan_updates`` now accepts and
threads through interjection texts. ``_generate_plan(refresh_t,
interjection)`` injects both the current subtask AND the interjection
text into the prompt so the refreshed plan can drop / reorder /
constrain steps to match the user's correction. (Plan still refreshes
ONLY at user interjections — subtask generation runs ~1 Hz at
inference, plan re-emission is event-driven.)
- ``executor.py``: forwards ``interjection_texts`` alongside
``interjection_times`` to ``run_plan_updates``.
- ``Module2Config``: bumps ``max_interjections_per_episode`` default
from 1 to 3 and exposes ``interjection_window_seconds`` /
``interjection_window_frames``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 2 used to write a top-level ``tools`` column on every parquet shard
holding the JSON schema for the ``say`` tool, broadcast identically
across every row. That extends PR 1's schema for no real information
gain — the schema is a fixed code constant, parquet's RLE/dict encoding
collapses it on disk anyway, and HF/TRL chat-template consumers can
just import the constant directly.
PR 2 should fill in PR 1's existing schema, not add to it. So:
- ``writer.py``: stop emitting the ``tools`` column. Strip any legacy
``tools`` column from older shards on rerun so the schema converges to
v3.1. ``SAY_TOOL_SCHEMA`` stays as a public constant (now joined by
``DEFAULT_TOOLS = [SAY_TOOL_SCHEMA]``); chat-template policies and the
visualizer import them directly.
- ``test_writer.py``: replace the "tools column present" assertion with
one that explicitly checks the column is absent, plus a new test
asserting the constant's shape.
- ``test_pipeline_recipe_render.py``: drop the tools-column read; assert
it's not present in the rewritten parquet.
- ``annotation_pipeline.mdx``: update the writer description to note the
parquet stays small and the schema lives as a code constant.
If multi-tool-set support ever becomes real (datasets with different
tool inventories), the right home is ``meta/info.json["tools"]`` —
adding it later is non-breaking; ripping out a parquet column already
shipped is not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``lerobot.datasets.video_utils.decode_video_frames`` routes
``backend="pyav"`` through ``decode_video_frames_torchvision`` →
``torchvision.io.VideoReader``, but ``VideoReader`` was removed in
torchvision >= 0.22 (the vllm/vllm-openai:latest container ships with
torchvision 0.25). That made every Module 3 frame decode raise
``AttributeError: module 'torchvision.io' has no attribute 'VideoReader'``,
which the previous catch-all silently turned into an empty image list,
which then made every Module 3 prompt skip via the
``not _has_image_block(messages)`` branch and produce zero VQA rows.
Bypass ``video_utils`` entirely. The annotation pipeline only needs
a handful of PIL frames per (episode, ts), so a direct PyAV decode is
both simpler and insulated from torchvision API churn. ``av`` is already
in the install set, no new dependency.
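A minimal sketch of a direct PyAV decode (helper name hypothetical; the
real implementation lives in ``frames.py``):

    import av

    def decode_pil_frames(video_path, timestamps):
        # Seek to each requested timestamp and take the first frame at or
        # after it. No torchvision involved, so its API churn can't break us.
        images = []
        with av.open(str(video_path)) as container:
            stream = container.streams.video[0]
            for t in timestamps:
                container.seek(int(t / stream.time_base), stream=stream)
                for frame in container.decode(stream):
                    if frame.time is None or frame.time >= t:
                        images.append(frame.to_image())  # PIL.Image
                        break
        return images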
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VideoFrameProvider._decode used to swallow every exception silently and
return []. That made Module 3 (VQA) produce zero rows whenever local
video decoding broke (codec, backend, missing file, ...) because every
prompt got skipped via the ``not _has_image_block(messages)`` branch in
general_vqa.py — without any signal in the job log.
Log the first failure with full exception info (subsequent failures
stay quiet to avoid log spam) so this fast-path is debuggable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Print the default and full camera list once at the top of every run so a
silent Module-3-no-op (cam_keys=[]) is visible in the job log instead of
only being discoverable by counting parquet rows after upload.
Also warn loudly when Module 3 is enabled but no cameras resolved, with
a hint about the --vlm.camera_key fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module 3 fast-pathed out (50 episodes in 0.6s) when
``frame_provider.camera_keys`` came back empty even though Module 1/2
worked, because they use ``frame_provider.camera_key`` (singular) and
were happy with the explicit ``--vlm.camera_key=...`` override.
Two fixes:
- ``frames.py``: read ``meta.camera_keys`` (covers both video- and
image-stored cameras) instead of ``meta.video_keys`` (video-only),
matching :class:`LeRobotDatasetMetadata`'s canonical accessor. If
metadata still surfaces nothing but the caller explicitly passed
``--vlm.camera_key=<key>``, fall back to ``[<key>]`` — the key is by
definition known to exist on the dataset (see the sketch after this
list).
- ``general_vqa.py``: emit a one-time WARNING log when Module 3 sees
zero cameras so this never silently produces zero VQA again.
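A sketch of the fallback from the first bullet (attribute names
assumed):

    @property
    def camera_keys(self) -> list[str]:
        # meta.camera_keys covers both video- and image-stored cameras.
        keys = list(self._meta.camera_keys)
        if not keys and self._explicit_camera_key:
            # --vlm.camera_key=<key> was passed explicitly; by definition
            # it names a key that exists on the dataset.
            keys = [self._explicit_camera_key]
        return keys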
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A ready-to-run example of launching the annotation pipeline on a
Hugging Face job (h200x2) with two vllm replicas serving
Qwen3.6-35B-A3B-FP8. Lives next to other end-to-end recipes under
examples/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module 3 now produces one (vqa, user) + (vqa, assistant) pair per
emission tick *per camera* rather than only against the dataset's first
camera. Each emitted row carries the `camera` field added in PR 1
(language-columns), so the resolver can disambiguate per-camera VQA via
`emitted_at(t, style=vqa, role=assistant, camera=...)` without ambiguity.
- `frames.py`: `FrameProvider` Protocol gains a `camera_keys` property
and a `camera_key=` argument on `frames_at` / `video_for_episode`.
`VideoFrameProvider` exposes every `observation.images.*` key the
dataset declares (not just the first) and keys its decode cache on
`(episode, camera, timestamp)` so per-camera reads don't collide.
Module 1 / 2 keep their old single-camera behaviour by leaving
`camera_key=None` (falls back to the default camera).
- `modules/general_vqa.py`: `run_episode` iterates `frame_provider
.camera_keys` for each emission tick, builds one prompt per camera,
batches all of them through the VLM, and stamps the resulting rows
with `camera=<that key>`. Empty `camera_keys` (null provider) makes
the module a no-op rather than silently emitting untagged rows.
- `writer.py`: `_normalize_persistent_row` / `_normalize_event_row`
carry `camera` through and call `validate_camera_field` so the
invariant is enforced at the writer boundary. Event sort key now
includes `camera` for deterministic ordering when several cameras
share `(timestamp, style, role)`. `speech_atom` sets `camera=None`.
- `validator.py`: `StagingValidator` gains a `dataset_camera_keys`
field; `_check_camera_field` enforces the invariant and cross-checks
every view-dependent row's `camera` against the dataset's known video
keys. New `_check_vqa_uniqueness_per_frame_camera` flags duplicate
`(vqa, role)` pairs at the same `(t, camera)`.
- `lerobot_annotate.py`: passes the live frame provider's
`camera_keys` into the validator so the cross-check uses the actual
dataset camera set.
- Tests: `_StubFrameProvider` exposes `camera_keys` and accepts the new
`camera_key=` kwarg. `test_module3_vqa_unique_per_frame_and_camera`
configures two cameras and asserts both are represented, that every
emitted row has a `camera` tag, and that uniqueness holds per
`(timestamp, camera, role)`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Modern LeRobot datasets store videos in AV1, which vllm's libav build
cannot decode (the video processor returns 0 frames and downstream
chokes with ZeroDivisionError). Re-encode each per-episode subclip
with libx264 (preset ultrafast, crf 23) so the resulting mp4 is
universally decodable. Strip audio with -an for a smaller payload.
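A sketch of the re-encode with the flags named above, assuming an
ffmpeg shell-out (paths hypothetical):

    import subprocess

    def reencode_h264(src, dst):
        # AV1 -> H.264 so vllm's libav build can decode it; -an strips audio.
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src),
             "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
             "-an", str(dst)],
            check=True, capture_output=True,
        )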
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds VlmConfig.num_gpus so parallel_servers can exceed the physical
GPU count. Replicas are round-robin-assigned to GPUs (e.g.
parallel_servers=4 + num_gpus=2 → replicas pinned to GPUs 0,1,0,1).
Backward-compatible: num_gpus=0 keeps the existing 1-replica-per-GPU
behavior.
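The assignment rule, as a sketch:

    def gpu_for_replica(replica_idx, num_gpus):
        # num_gpus=0 keeps the old 1-replica-per-GPU behavior.
        return replica_idx if num_gpus == 0 else replica_idx % num_gpus

    # parallel_servers=4, num_gpus=2 -> GPUs 0, 1, 0, 1 (pinned per replica,
    # e.g. via CUDA_VISIBLE_DEVICES in each server's environment)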
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets callers pass per-request template flags such as
{"enable_thinking": false} for Qwen3.5/Qwen3.6 models, where the
default thinking preamble otherwise consumes the entire max_new_tokens
budget before any JSON is emitted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The setuptools package-data declaration only listed envs/*.json, so
pip-installed wheels (including HF Jobs runs) were missing the
module_1_subtasks/plan/memory and module_2/3 prompt templates,
causing FileNotFoundError at runtime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Default backend is now a local OpenAI-compatible server (vllm /
transformers) which auto_serve spawns. Removes the
use_hf_inference_providers config flag and the router.huggingface.co
routing branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the pipeline completes, optionally create/locate a dataset repo
and upload the dataset root (excluding .annotate_staging/). Add
push_private and push_commit_message knobs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Saturates parallel_servers + client_concurrency. Previously the
executor processed one episode at a time, so each Module 1 episode's
3-5 dependent VLM calls hit a single server with the others idle. Now
defaults to 16 episodes in flight; configurable via
ExecutorConfig.episode_parallelism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vllm with --uvicorn-log-level warning suppresses the "Uvicorn running"
banner that the readiness watcher waited for, so the spawn helper hung
forever even after the API was live. Add an HTTP probe in parallel with
the log watcher and broaden the log markers to include vllm's own
"Starting vLLM API server" / "Available routes are" lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight server-streaming threads wrote characters unsynchronized, so
UTF-8 sequences from different servers could interleave mid-byte,
garbling the terminal output. Switch to line-buffered reads with a
single shared
print lock — output stays readable, ready-marker detection still works
on the line containing 'Uvicorn running' / 'Application startup
complete'.
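A sketch of the line-buffered streaming (marker handling simplified):

    import threading

    _print_lock = threading.Lock()

    def _stream_server_logs(proc, name, on_ready):
        # Whole lines are printed under one shared lock, so UTF-8 output
        # from different servers can no longer interleave mid-byte.
        for raw in iter(proc.stdout.readline, b""):
            line = raw.decode("utf-8", errors="replace").rstrip()
            with _print_lock:
                print(f"[{name}] {line}")
            if "Uvicorn running" in line or "Application startup complete" in line:
                on_ready()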
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds vlm.client_concurrency (default 16) which uses a ThreadPoolExecutor
to fan out batched chat.completions calls. vllm batches them internally
on the server side, giving big throughput wins on a single TP=1 server
without needing DP/TP and the NCCL setup it requires.
Module 3 now batches all per-episode VQA calls into a single
generate_json invocation so they fire in parallel.
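A sketch of the fan-out (method names hypothetical):

    from concurrent.futures import ThreadPoolExecutor

    def generate_json_batch(self, message_batches):
        # Fan the batch out over threads; vllm batches the concurrent
        # requests server-side, so a single TP=1 server stays saturated.
        workers = max(1, self.config.client_concurrency)  # default 16
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self._generate_json_single, message_batches))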
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>