lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-24 10:16:09 +00:00

Author	SHA1	Message	Date
Pepijn	53172873e3	chore(smolvla2-runtime): probe obs once at dry-run startup The dry-run REPL only fires a tick when the user types, so the ``_log_obs_tensors_once`` diagnostic never reached stdout (the provider was never called). Probe the provider once at startup — the result is discarded; we only care about the obs log it triggers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:21:58 +02:00
Pepijn	fcdae0ce8e	chore(smolvla2-runtime): tensor-level obs print for both inference paths Helper that prints (once per provider lifetime) every ``observation.`` tensor the policy is about to see, with its shape, dtype, device, and per-channel min/max/mean/std. Wired into both the dry-run dataset path and the live-robot path. Now we can bisect train/inference mismatch at the tensor level* — if the same checkpoint produces coherent text on one path's tensors and ``\n`` on the other's, and the printed tensor stats differ materially, the bug is in the observation prep, not in the model or the training distribution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:19:18 +02:00
Pepijn	4852b9f952	feat(smolvla2-runtime): --dataset.augment_at_inference for the bisection test Apply the training-time torchvision-v2 ColorJitter / SharpnessJitter / RandomAffine pipeline to dataset frames in dry-run, so we can isolate whether the LM head's collapse to '\n' on live frames is: * pure scene-content OOD (unaugmented dataset frames work, mildly augmented ones still work — model has learned the augmentation distribution, only fails when the scene content itself diverges) * hyper-specific memorisation (dry-run with augmentation also collapses to '\n' — head is nailed to the exact unperturbed training samples and only the retrain helps) Usage: lerobot-smolvla2-runtime --no_robot --policy.path=... \ --dataset.repo_id=... --dataset.episode=0 \ --dataset.start_frame=1000 \ --dataset.augment_at_inference Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:14:57 +02:00
Pepijn	0410705aff	chore(smolvla2-runtime): print live state vector once at startup So the operator can compare live joint values to the dataset's ``observation.state`` mean/std and spot when the robot's home pose is several σ off the supervised support region. State OOD is the remaining viable hypothesis for why the live LM head collapses to ``\n`` even though images are pixel-shape-matched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:12:27 +02:00
Pepijn	398a8cf730	chore(smolvla2-runtime): log first-tick resize so train/inference match is verifiable Print one warning the first time the robot observation provider runs through, showing live camera resolution and the dataset's training resolution, plus whether we resized. Lets the operator confirm at a glance that the visual prefix really is being fed at the same shape the model saw at training — instead of guessing whether the resize fired silently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:06:00 +02:00
Pepijn	ab5c1dc392	fix(smolvla2-runtime): match training visual distribution on robot frames Root cause for the LM head's empty-completion symptom on the live robot (while the same checkpoint produced sensible subtask/plan/memory in ``--no_robot`` dry-run on dataset frames): the camera observation was flowing into the model at its native resolution. A Mac/USB webcam hands us 1280×720 or 1920×1080; the dataset was recorded at the feature schema's ``observation.images.['shape']`` resolution (typically 480×640). SmolVLA's internal ``resize_with_pad(512, 512)`` does* fit both — but with very different pad geometry, so visual tokens at each tile carry different content than at training. Action expert tolerates this; the tightly-supervised LM head goes OOD and the head's distribution at position 0 collapses to its dominant mode (``\n`` ×N then ``<end_of_utterance>`` for this checkpoint). The fix: in ``_build_robot_observation_provider``, pre-compute the camera-key → (H, W) target from ``ds_features`` and ``cv2.resize`` each live frame to that shape before tensorising. The downstream ``resize_with_pad`` then sees the same input geometry as training and the LM head returns to producing readable subtask text under plain greedy decoding — the same as dry-run. Also drops the inference-time patches (``min_new_tokens``, ``temperature``, ``top_p`` overrides) on the four high-level callers. They were band-aids around the visual-distribution shift, not a real LM problem, and they drift inference off the training distribution. Greedy argmax is what training matched. The ``select_message`` signature still accepts the knobs for callers that want them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:59:24 +02:00
Pepijn	1292304c42	fix(smolvla2): suppress all special tokens during min_new_tokens window Previous attempt only masked the tokenizer's eos_token_id during the min_new_tokens prefix. The empty-completion symptom persisted because a memorised SmolVLM head doesn't just want EOS — its top-1 at position 0 is some special token, and when EOS is masked the argmax shifts to a sibling (``<\|im_end\|>``, ``<image>``, ``<fake_token_around_image>``, ``<row_X_col_Y>``, …). Those tokens survive generation but then get stripped by ``decode(skip_special_tokens=True)``, so the runtime still saw ``last_raw='(empty)'`` every chunk boundary. Mask the full ``tokenizer.all_special_ids`` set instead. Forces the head to commit to a normal vocabulary token before it can close or quietly poison the turn. Also: when decode returns empty but tokens were generated, expose the raw token ids and the special-tokens-included decoded string via ``policy._last_select_message_debug``. The runtime surfaces this in the scrollback so the operator can see what the head is actually emitting — distinguishing "head EOS-ing" from "head emitting image placeholders" from "head emitting chat-template fragments". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:49:53 +02:00
Pepijn	b95eebff77	fix(smolvla2): force min_new_tokens + sampling so memorised LM emits something Real-robot run confirmed the LM head is producing 0 tokens at every chunk boundary (empty:N counter climbing, no exception in scrollback): the model EOS-es at decode step 0. That's the memorisation collapse — training reached text_loss=6e-6 by overfitting one trajectory whose supervised subtask turn ended in EOS, and at inference the head's argmax for token 0 is EOS regardless of the actual frame. Two changes in select_message: * ``min_new_tokens`` parameter masks the EOS logit to -inf until at least N real tokens have been decoded. Without this the head's "EOS first" prior produces an empty completion every single time. * The runtime callers now pass ``min_new_tokens=5..10`` plus ``temperature=0.4..0.5`` + ``top_p=0.9``. Sampling at moderate temperature with nucleus filtering also helps break the greedy argmax collapse — when the model has memorised one continuation, greedy keeps replaying it; nucleus sampling forces it to commit to some coherent continuation that's well-supported by the prefix even when greedy's top-1 is degenerate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:48:08 +02:00
Pepijn	fbcac95662	feat(smolvla2-runtime): scrollback in autonomous panel + empty-gen counter Two improvements for diagnosing why ``last_raw`` stays empty: 1. The autonomous panel-redraw thread calls console.clear() every 0.5 s, wiping any log lines the runtime printed since the last redraw. So warnings from generation (``[warn] subtask gen failed: ...``, ``[info] subtask gen rejected (gibberish): ...``) flashed for milliseconds and disappeared, leaving the operator blind. Capture log_lines from each tick into a bounded scrollback (last 12 entries) and render them inside the panel itself, below the diag row. They now stick across redraws until rotated out. 2. ``empty`` counter for subtask gen. Persistent empty completions are their own failure mode — the LM head EOS-es immediately from the chat-template generation prompt, distinct from "generated something but filter rejected it". The diag row now reads: subtask diag repeat:0 gibberish:0 empty:14 last_raw: '(empty)' ^^^^^^^ plus a periodic log line every 10 empties so the cause is also surfaced in the scrollback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:42:13 +02:00
Pepijn	b9db4d21a2	fix(smolvla2): high-level steps must run before LowLevelForward refills Both HighLevelSubtaskFwd and LowLevelForward are gated on 'action queue is empty'. With LowLevelForward listed first, it refilled the queue on the empty-queue tick before HighLevelSubtaskFwd got to check — so the gate I added in the previous commit made the high-level step a permanent no-op after the initial bootstrap. Visible symptom: subtask string never advances past whatever bootstrap seeded, no subtask_change events, memory stays unset, and the new overfit diagnostics never appear on the panel because last_subtask_raw is never written. Move all high-level steps (subtask, memory, interjection, vqa) ahead of LowLevelForward. On an empty-queue tick the subtask refreshes first, the new string flows into the next chunk's prompt, then LowLevelForward generates the chunk, then DispatchAction drains it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:38:06 +02:00
Pepijn	aecb80a9d2	feat(smolvla2-runtime): overfit/memorisation diagnostics on the panel The autonomous-mode panel now surfaces what the model is actually producing at every chunk boundary, not just what got accepted: * last_subtask_raw most recent generation (accepted or not) * subtask_repeat_count times the same accepted string regenerated * subtask_gibberish_count rejections by the gibberish filter * memory_gibberish_count / plan_gibberish_count for the other heads These let the operator see memorisation collapse without scrolling back through logs: subtask diag repeat:8 gibberish:0 last_raw: '<same string>' ^^^^^^^^^^ → model can't move past current phase subtask diag repeat:0 gibberish:14 last_raw: 'Ass:::' ^^^^^^^^^^^^^^^^^^^^^^ → LM collapsed to template salad Also silences the per-action ``Relative goal position magnitude had to be clamped`` warning. The clamp fires every dispatch tick when the model emits stale joint targets, flooding the panel at ctrl_hz=30. Replaced the bare ``logging.warning`` call in robots/utils.py with a module logger so it can be selectively raised to ERROR. Operators who need the per-tick clamp detail can use ``-v``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:31:04 +02:00
Pepijn	c98c695127	feat(smolvla2-runtime): 'rephrase:' prefix to swap task string in place Adds a third stdin channel alongside 'task:' and bare interjections: rephrase: <text> Swaps state['task'] with the new string while preserving plan/memory/ subtask. Lets the operator probe how robust the model is to wording variations of the same task — the trained augmentation provided n_task_rephrasings≈30 task wordings per dataset task, and this is the direct way to exercise that distribution at inference without generating a fresh plan via user_interjection_response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:26:59 +02:00
Pepijn	d528078aca	fix(smolvla2-runtime): allow task switching mid-run via 'task:' prefix Both stdin handlers (autonomous mode and rich REPL) gated 'task:' to 'only if no task is set yet' — once the initial task existed, typing 'task: <new task>' silently fell through to the interjection branch. Make 'task:' always override the active task and clear stale plan/memory/subtask so the next high-level pass regenerates context from scratch for the new task. For rephrasings within the same task, the interjection path (user_interjection_response recipe) is still the right channel — it refreshes the plan and emits a paired <say> in one trained call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:24:16 +02:00
Pepijn	a648da0455	fix(smolvla2): unblock action dispatch when high-level LLM stalls loop The runtime is single-threaded. `HighLevelSubtaskFwd` at HzTrigger(1.0) fires every loop iteration on MPS because each `select_message` call takes ~2 s, longer than its 1/hz period. The whole tick stretches to ~2.5 s, so `DispatchAction` (HzTrigger 30) only pops a single action per loop iteration — the queue drains at ~0.4 actions/sec instead of 30 and the robot barely moves between chunk refreshes. Two changes, both purely about scheduling — no threading: * Gate `HighLevelSubtaskFwd` to fire only when the action queue is empty, matching `LowLevelForward`'s refresh condition. The slow LLM call now happens during the "think" phase between chunks, not on every dispatch tick. Restores a clean sense → think → act cycle. * `DispatchAction` catches up via wall-clock: when the trigger fires after a stall, pop `round(elapsed * hz)` entries and send only the most recent. Open-loop chunks are timestamped at ctrl_hz; sending stale joint targets one-by-one would just lag the robot further behind. The dynamixel smooths to the latest goal anyway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:23:09 +02:00
Pepijn	d866c2c9fd	fix(smolvla2): only regenerate chunk when queue is fully drained The previous refresh threshold (queue > chunk_size // 2) made each new chunk telescope past the previous one: at queue=25, we kicked off a new chunk forward from the current observation, but by the time the new chunk's first action was actually dispatched, the robot had executed the remaining 25 actions of the previous chunk — so the new chunk was planned from an observation 25+ steps stale. Canonical sense → think → act loop: execute the full chunk, then re-observe and replan. Refresh only when the queue is empty. Every step of every chunk still gets dispatched to the robot (no behaviour change there), but each chunk is now planned from an observation that's at most one chunk's worth of dispatch latency old, not "previous chunk's worth of stale state on top of that". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:15:02 +02:00
Steven Palma	04125492e4	fix(datasets): expand torchcodec platform coverage + rewrite pyav fallback for torchvision >0.26 (#3588 ) * fix(deps): better versioning control for torchcodec * refactor(video_utils): replace torchvision with pyav * adding Torchcodec version to lerobot-info * chore(benchmarks): delete video benchmark --------- Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co>	2026-05-12 16:59:11 +02:00
Pepijn	01e2228b24	feat(smolvla2): per-component prompt dropout + augmented training script Two complementary regularisers to attack the ``text_loss=6e-6 = memorised one dataset`` failure mode that's making the model collapse on real-robot input: 1. Per-component prompt dropout (Pi0.7 §V.E / plan's ``feat/pi05-prompt-dropout`` follow-up). ``SmolVLA2ChatTokenizerStep`` gains ``plan_dropout_prob`` / ``memory_dropout_prob`` / ``subtask_dropout_prob`` knobs (default 0.0 — opt-in). At training, non-target messages whose rendered content starts with ``Plan:`` / ``Memory:`` / ``Current subtask:`` etc. are dropped with their respective probability before tokenisation, with a deterministic per-sample RNG keyed off the dataset ``index``. ``target_message_indices`` is re-mapped so the supervision still lands on the right turn. Forces the model to handle missing plan/memory/subtask context — directly attacks the real-robot collapse where a stale or empty plan field puts the prompt OOD. Surfaced on ``SmolVLA2Config`` as three floats so they're ``--policy.<knob>=<value>``-controllable from the train CLI; plumbed through ``make_smolvla2_pre_post_processors``. 2. Image augmentation is already wired in lerobot via ``--dataset.image_transforms.enable=true`` (torchvision v2 ColorJitter + SharpnessJitter + RandomAffine, default 3 of 6 sampled per frame). No code change needed — just a CLI flag. ``examples/training/smolvla2_hirobot.slurm`` shows the full training command with both enabled. Drop-in replacement for the ad-hoc SLURM script Pepijn was using locally; same args, plus the three dropout probs and the image-transforms flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:52:32 +02:00
Khalil Meftah	e963e5a0c4	RL stack refactoring (#3075 ) * refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring * chore: clarify torch.compile disabled note in SACAlgorithm * fix(teleop): keyboard EE teleop not registering special keys and losing intervention state Fixes #2345 Co-authored-by: jpizarrom <jpizarrom@gmail.com> * fix: remove leftover normalization calls from reward classifier predict_reward Fixes #2355 * fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample() * refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference * perf: remove redundant CPU→GPU→CPU transition move in learner * Fix: add kwargs in reward classifier __init__() * fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer * fix: add try/finally to control_loop to ensure image writer cleanup on exit * fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error * fix: skip tests that require grpc if not available * fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests * fix(tests): skip tests that require grpc if not available * refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages * fix(config): update vision encoder model name to lerobot/resnet10 * fix(sac): clarify torch.compile status * refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity * refactor(sac): simplify optimizer return structure * perf(rl): use async iterators in OnlineOfflineMixer.get_iterator * refactor(sac): decouple algorithm hyperparameters from policy config * update losses names in tests * fix docstring * remove unused type alias * fix test for flat dict structure * refactor(policies): rename policies/sac → policies/gaussian_actor * refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic * perf(observation_processor): add CUDA support for image processing * fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline (cherry picked from commit `9c2af818ff`) * fix(rl): add time limit processor to environment pipeline (cherry picked from commit `cd105f65cb`) * fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100 (cherry picked from commit `494f469a2b`) * fix(rl): update neutral gripper action (cherry picked from commit `9c9064e5be`) * fix(rl): merge environment and action-processor info in transition processing (cherry picked from commit `30e1886b64`) * fix(rl): mirror gym_manipulator in actor (cherry picked from commit `d2a046dfc5`) * fix(rl): postprocess action in actor (cherry picked from commit `c2556439e5`) * fix(rl): improve action processing for discrete and continuous actions (cherry picked from commit `f887ab3f6a`) * fix(rl): enhance intervention handling in actor and learner (cherry picked from commit `ef8bfffbd7`) * Revert "perf(observation_processor): add CUDA support for image processing" This reverts commit `38b88c414c`. * refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable * refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation * refactor(rl): add type property to RLAlgorithmConfig for better clarity * refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility * refactor(tests): remove grpc import checks from test files for cleaner code * fix(tests): gate RL tests on the `datasets` extra * refactor: simplify docstrings for clarity and conciseness across multiple files * fix(rl): update gripper position key and handle action absence during reset * fix(rl): record pre-step observation so (obs, action, next.reward) align in gym_manipulator dataset * refactor: clean up import statements * chore: address reviewer comments * chore: improve visual stats reshaping logic and update docstring for clarity * refactor: enforce mandatory config_class and name attributes in RLAlgorithm * refactor: implement NotImplementedError for abstract methods in RLAlgorithm and DataMixer * refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests * refactor: add require_package calls for grpcio and gym-hil in relevant modules * refactor(rl): move grpcio guards to runtime entry points * feat(rl): consolidate HIL-SERL checkpoint into HF-style components Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract `state_dict()` / `load_state_dict()` for critic ensemble, target nets and `log_alpha`, and persist them as a sibling `algorithm/` component next to `pretrained_model/`. Replace the pickled `training_state.pt` with an enriched `training_step.json` carrying `step` and `interaction_step`, so resume restores actor + critics + target nets + temperature + optimizers + RNG + counters from HF-standard files. * refactor(rl): move actor weight-sync wire format from policy to algorithm * refactor(rl): update type hints for learner and actor functions * refactor(rl): hoist grpcio guard to module top in actor/learner * chore(rl): manage import pattern in actor (#3564) * chore(rl): manage import pattern in actor * chore(rl): optional grpc imports in learner; quote grpc ServicerContext types --------- Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> * update uv.lock * chore(doc): update doc --------- Co-authored-by: jpizarrom <jpizarrom@gmail.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-12 15:49:54 +02:00
Pepijn	c36de3a3e8	fix(smolvla2): enqueue full chunk via predict_action_chunk ``LowLevelForward`` was calling ``select_action()`` once per ``chunk_hz`` tick. SmolVLA's ``select_action`` is a thin queue-pop: it returns one action per call and only re-runs the expensive flow-matching forward when its private internal queue empties. Result: we got one action back per chunk_hz tick (1Hz default), ``DispatchAction`` at ctrl_hz=30 popped it instantly, then queue sat empty for ~1s waiting for the next tick. Net throughput was 1 dispatched action/sec instead of the 30 we wanted. Switch to ``predict_action_chunk`` and enqueue every step of the returned ``(batch, n_action_steps, action_dim)`` chunk. Refresh only when the queue is below half a chunk so we don't burn one flow-matching forward per chunk_hz tick — saves ~5x inference cost on this hot path. At ctrl_hz=30, chunk_size=50, the queue drains in ~1.7s before the next refresh, giving smooth dispatch at the control rate the robot was trained on. Side effect: ``state['last_chunk_size']`` records how many actions the most recent chunk produced — useful for the panel later if we want to surface "chunks generated" alongside "dispatched". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:27:23 +02:00
Pepijn	cbfaf2c544	feat(smolvla2): action-dispatch counter + tighter gibberish filter Real-robot run was unreadable for two reasons: 1. The panel surfaced ``queued actions: 0`` (always zero — dispatch pops faster than chunk_hz generates) and gave no signal that actions were actually reaching the robot. The only sign of life was the safety-clamp warning lines scrolling past. 2. The text head consistently collapses to ``the`` / ``Ass`` fragments on real-camera input (memorisation wall). The old gibberish filter caught ``":":":"`` JSON salad but let single-token fragments through, and the ``[info] subtask gen produced no text this tick`` line flooded the panel every second. Changes: * ``DispatchAction`` bumps ``state["actions_dispatched"]`` each tick; panel renders it next to queue depth. Operator can see the policy IS issuing actions even when text is broken. * ``_looks_like_gibberish`` now also rejects: - too few unique alphabetic tokens (``the``, ``the the``, ...) - chat-template marker leakage (``Assistant:``, ``Ass\\n::``) catching the actual failure mode on real-robot frames. * Gibberish rejections log only the first occurrence + every 30th after that, with a count, so the panel stays legible. * Empty completions no longer log at all (was every tick). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:22:36 +02:00
Pepijn	d0278ea093	feat(smolvla2): render state panel in autonomous mode too Dry-run REPL had a clean ANSI-clear-+-rich-panel layout via ``_redraw`` showing task / subtask / plan / memory / queued-actions / pending-tool-calls; autonomous mode just had bare ``> `` plus log lines scrolling past the user. Same data, two presentations. Extract ``_make_state_panel_renderer(runtime, mode_label=...)`` and use it from both ``_run_repl`` (called per user input) and ``_run_autonomous`` (called both on user input and on a 0.5s background timer so subtask / plan / memory refreshes from the runtime's own loop become visible without the user typing anything). Title bar shows ``dry-run`` vs ``autonomous`` so it's obvious which mode you're in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:16:28 +02:00
Pepijn	15f6b08b0e	fix(smolvla2): use canonical _strip_lerobot_blocks for inference msgs Training tokenises messages through ``_strip_lerobot_blocks`` (in ``chat_processor_smolvla2.py``), which normalises every variant of ``message['content']`` into the ``[{type:text, text:...}]`` list shape SmolVLM's chat template expects: * ``list[block]`` → keep text blocks, drop images * ``None`` → ``[{type:text, text:""}]`` * ``str`` / other → ``[{type:text, text:str(content)}]`` Inference was doing a partial inline conversion that only handled the ``str`` case — ``None`` and pre-formatted ``list`` content slipped through unchanged. ``memory_update``'s ``Previous memory: ...`` assistant turn ends up with ``None`` content when there's no prior memory, which then renders as no-content / role-marker-only and the model hallucinates ``Assistant:`` fragments. Subtask gen got further because its prompt always has at least the task string. Reuse ``_strip_lerobot_blocks`` directly. Now the inference prompt shape matches the exact tokenisation training did — no more "trained on shape X, asked to predict shape Y" mismatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:07:39 +02:00
Pepijn	fc715db4a3	fix(smolvla2): coerce str content to list-of-blocks for chat template SmolVLM's chat template (and many other multimodal templates) declares ``message['content']`` as a list of typed blocks and iterates it expecting dicts with a ``'type'`` field: {% for line in message['content'] %} {% if line['type'] == 'text' %}{{ line['text'] }} {% elif line['type'] == 'image' %}{{ '<image>' }} {% endif %} {% endfor %} When the caller passes ``content`` as a plain ``str`` (which we did throughout ``_msgs_for_subtask`` / ``_msgs_for_memory`` etc.), Jinja silently iterates the string character-by-character. ``'P'['type']`` returns nothing; neither branch fires; no text tokens get emitted. The model receives a prompt containing only role markers (``User:<end_of_utterance>\nAssistant:``) and predictably continues by emitting ``Assistant:`` fragments — the gibberish ``subtask: Ass\n::`` on the runtime panel. Before calling ``apply_chat_template``, walk the messages and rewrite any string ``content`` into ``[{'type': 'text', 'text': content}]``. The template's text branch then fires correctly and the model sees the actual user/assistant text, not just structural tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:01:53 +02:00
Pepijn	fe4bd2b6ba	fix(smolvla2): pass flat batch dict to preprocessor (no manual wrap) ``PolicyProcessorPipeline.__call__`` already wraps its input via ``to_transition`` (defaulting to ``batch_to_transition``) before running the steps, and unwraps via ``to_output`` (defaulting to ``transition_to_batch``) afterwards. The input format is therefore a flat batch dict keyed by ``observation.`` / ``action`` / etc., not an ``EnvTransition``. Previous attempt pre-wrapped the observation into a transition with ``TransitionKey.OBSERVATION`` as the key, then handed that* to the pipeline — which fed it to ``batch_to_transition``, which looked for top-level ``observation.*`` entries, found none (they were nested inside the enum key), and produced an empty observation. Every step then bailed with ``ObservationProcessorStep requires an observation in the transition.`` Pass the flat dict from ``build_inference_frame`` straight to the preprocessor — it does the wrap/unwrap itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:54:48 +02:00
Pepijn	3f7436ff8a	fix(smolvla2): use TransitionKey enum (not .value) as transition keys ``EnvTransition`` is declared as a ``TypedDict`` keyed by ``TransitionKey.OBSERVATION.value`` (the string ``'observation'``), but every concrete ``ProcessorStep`` in the pipeline indexes the transition with the enum member (``transition[TransitionKey. OBSERVATION]`` / ``transition.get(TransitionKey.OBSERVATION)``). Those are two different keys in a Python dict — string key vs enum key — so steps couldn't find the observation we'd placed under the string variant, and bailed every tick with ``ObservationProcessorStep requires an observation in the transition``. Build the transition with the enum members directly. Matches how ``BatchProcessor``, ``RelativeActionProcessor``, ``HilProcessor``, etc. read the dict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:50:22 +02:00
Pepijn	992d13d4e9	fix(smolvla2): use build_inference_frame for raw robot observations ``robot.get_observation()`` on omx_follower (and most lerobot robots) returns: * per-joint scalar floats with ``.pos`` suffix (``shoulder_pan.pos: 0.123``, ``shoulder_lift.pos: 0.456``, ...) * per-camera ndarrays keyed by the camera config name (``wrist: ndarray(H,W,3)``) But the trained policy expects: * single ``observation.state: tensor[N_joints]`` vector * image keys prefixed: ``observation.images.<cam_key>: tensor[1, 3, H, W]`` ``prepare_observation_for_inference`` only handles the tensor / batch-dim / device step — it crashes on scalar floats with ``expected np.ndarray (got float)``. The right helper is ``build_inference_frame`` which uses the dataset's feature schema (``ds_meta.features``) to: 1. extract the right raw keys per dataset feature, 2. fold ``shoulder_pan.pos`` / ``shoulder_lift.pos`` / ... into a single ``observation.state`` ndarray, 3. prefix camera keys with ``observation.images.``, 4. delegate to ``prepare_observation_for_inference`` for the tensor / batch / device step. Pass ``ds_meta.features`` into the observation provider and switch to ``build_inference_frame`` when available; fall back to the bare ``prepare_observation_for_inference`` only when no dataset is provided (rare — autonomous mode already requires it). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:47:59 +02:00
Pepijn	afe40a016b	fix(smolvla2): wrap robot obs in EnvTransition before preprocessor The policy preprocessor pipeline is transition-shaped — its steps read ``TransitionKey.OBSERVATION`` off an ``EnvTransition`` dict, not a flat ``RobotObservation`` dict. Passing the raw observation through made every step bail with ``ObservationProcessorStep requires an observation in the transition``, which the runtime swallowed at warning level. ``select_message`` then got called with no ``observation.images.*`` features and crashed with ``All image features are missing from the batch``. Mirror ``lerobot-record``'s preamble: 1. ``prepare_observation_for_inference`` → numpy → torch, ``CHW`` image layout, ``[0,1]`` scaling, add batch dim, move to device. 2. Wrap into an ``EnvTransition`` (``{TransitionKey.OBSERVATION.value: ...}`` plus ``COMPLEMENTARY_DATA: {}`` and ``None``s for the rest) so transition-aware steps see the keys they expect. 3. Run preprocessor. 4. Unwrap the transition's ``OBSERVATION`` slot to get the final flat dict the policy's ``select_action`` / ``select_message`` consume. Image features now reach the policy; the autonomous loop produces real actions instead of swallowing warnings every tick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:44:24 +02:00
Pepijn	41095e3cc3	fix(smolvla2): instantiate CameraConfig subclasses from JSON dicts ``--robot.cameras`` parses the JSON into ``dict[str, dict]``, but ``RobotConfig`` expects ``dict[str, CameraConfig]`` — each inner value must be the actual ``CameraConfig`` subclass instance for the chosen backend (e.g. ``OpenCVCameraConfig``). Passing raw dicts blew up in ``RobotConfig.__post_init__`` with ``AttributeError: 'dict' object has no attribute 'width'`` when it iterated cameras and tried to read attributes. Look up the right subclass per-camera by its ``"type"`` field via ``CameraConfig.get_choice_class(...)`` (mirroring the lazy-import dance we already do for ``RobotConfig``: eagerly walk ``lerobot.cameras``'s submodules so the registry is populated before lookup). Construct an instance with the rest of the dict's fields. On an unknown camera type, raise a clean ``ValueError`` listing the available choices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:39:28 +02:00
Pepijn	e0fa957569	fix(smolvla2): eagerly import robot submodules before get_choice_class ``RobotConfig._choice_registry`` is populated as a side-effect of each robot's ``@RobotConfig.register_subclass`` decorator running, and those decorators only fire when the corresponding ``lerobot.robots.<name>`` module is imported. The package's ``__init__.py`` doesn't import them — instead ``make_robot_from_config`` does it lazily in its big if/elif chain. ``_build_robot`` jumped the gun: called ``RobotConfig.get_choice_class (robot_type)`` before any robot module had been imported, so the registry was empty and every ``--robot.type=<X>`` produced ``KeyError: 'X'`` (e.g. ``KeyError: 'omx_follower'``). Walk ``lerobot.robots``'s submodules via ``pkgutil.iter_modules`` and ``importlib.import_module`` each one before the lookup. ~200ms on the first invocation, negligible for an autonomous run. On a real ``KeyError`` (typo / unsupported robot), raise a clean ``ValueError`` listing the registry's available choices instead of a bare KeyError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:31:58 +02:00
Pepijn	c661d81409	fix(smolvla2): use RobotConfig.max_relative_target, drop --max_action_norm The hand-rolled action-norm safety clip duplicated what every ``RobotConfig`` already exposes — ``max_relative_target`` — and at the wrong layer (after postprocess but before send_action, instead of inside the robot driver where every other lerobot entry point puts it). The norm clip also rejected entire actions instead of clipping per-motor relative motion, so a single rogue joint would kill the whole tick. Replace with ``--robot.max_relative_target``: a string parsed as either a bare float (uniform per-motor cap) or a JSON object mapping motor name → cap. Passed through to ``RobotConfig(max_relative_target=...)`` at robot construction; the driver's ``send_action`` clips each commanded joint position relative to the current measured one before issuing it on the bus — same behaviour ``lerobot-record`` ships. Also bump ``--chunk_hz`` default from ``4.0`` to ``1.0``. One new chunk per second is what the trained checkpoint can comfortably keep up with on common hardware and gives smoother motion than sub-second chunk regenerations (no RTC interpolation between chunks yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:41:57 +02:00
Steven Palma	26ff40ddd7	chore(deps): cap torch ceiling at <2.12, pin Linux wheels to cu128 (#3570 ) * chore(deps): ceiling + cuda * ci: bump cuda version docker image * ci: add cpu wheel to release workflow * chore(deps): update uv.lock * docs: update installation with cuda note	2026-05-11 19:47:55 +02:00
Maxime Ellerbach	6d269b28c8	docs(omx): adding some examples and scripts (#3566 ) * docs(omx): adding some examples and scripts * cleaning up and reviewing the cli args * adding __init__.py to example folder, adjusting the examples * adding reference to pretrained act policy * moving `.send_action` before `dataset.add_frame` for consistency Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * adjusting docstring Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * adressing hardcoded dataset fps * removed init as it worked without --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>	2026-05-11 15:36:32 +02:00
Steven Palma	b607c8458e	docs: add policy & compute guide (#3534 ) * docs(policy): contributing a policy guide * docs(training): HW compute guide * chore(docs): add to readme and index * Apply suggestions from code review Co-authored-by: Haoming Song <1847575517@qq.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * chore(docs): slight improvements * refactor(docs): consolidate add policy docs * chore(style): fix pre-commit --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Haoming Song <1847575517@qq.com>	2026-05-11 15:19:12 +02:00
Jash Shah	9e83510c99	fix(datasets): close file handle on VideoDecoder init failure in cache (#3542 ) If VideoDecoder() raises during initialization, the fsspec file handle was leaked since it was opened via __enter__() but never closed on the exception path. Now explicitly closes the handle before re-raising.	2026-05-10 17:30:37 +02:00
Anthony Shoumikhin	1f7b03f5f2	chore(deps): allow torch 2.11/2.12 and fix autocast deprecation (#3435 ) * chore(deps): allow torch 2.11/2.12 and fix autocast deprecation - Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26), and torchcodec to <0.13 (was <0.11) to allow installs against the latest stable torch 2.11 and the upcoming 2.12 line. - Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda") in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+). - Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130, torchcodec 0.11.1, full CUDA 13 stack). Verified locally with `uv sync --locked` from a clean .venv and the lerobot test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is identical to the pre-bump baseline: 18 pre-existing failures (test_sac_policy, test_pi0_rtc, test_pi05_rtc, test_replay_buffer), 0 new, 0 fixed. AI assistance: this change was authored with Claude Code per AI_POLICY.md. * fix(policies): use device-agnostic autocast dtype lookup Pass query_states.device.type to torch.get_autocast_dtype() instead of hardcoding 'cuda', so the cast matches the active autocast context when running under CPU/MPS/XPU autocast. --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-10 13:05:35 +02:00
Steven Palma	cb8edf17e6	chore(dependencies): update uv.lock (#3475 )	2026-05-10 12:24:22 +02:00
Steven Palma	5699f6cbf4	chore(ci): disable auto-stale (#3550 )	2026-05-10 11:49:31 +02:00
masato-ka	0e6114ac36	fix(train): restrict legacy RA-BC migration to JSON checkpoints only (#3490 ) * fix(train): restrict legacy RA-BC migration to JSON checkpoints only _migrate_legacy_rabc_fields was called for all config files, causing json.load to raise DecodeError when a YAML/TOML config was passed to lerobot-train for a new training run. Guard the block with an .endswith(".json") check so migration only runs when resuming from a JSON checkpoint.	2026-05-08 20:27:01 +02:00
Pepijn	965d42825f	review: skip-count fix, atomic writes, dedupe span reconstruction, role guards #1 Plan-update phase reports correct skip count. ``_run_plan_update_phase`` only ran ``run_plan_updates`` for episodes with at least one interjection but hardcoded ``episodes_skipped=0``. The summary undercounted skipped episodes. Now returns ``len(records) - processed`` so processed + skipped == total. #2 ``run_hf_job.py`` installs ``openai``. The ``CMD`` block does ``pip install --no-deps lerobot[branch]`` then explicitly lists transitive deps. ``openai`` was missing — and since ``VlmConfig.backend`` defaults to ``"openai"``, the job would have ``ImportError``'d when ``vlm_client._make_openai_client`` ran. #3 Dedupe subtask-span reconstruction. Module 1's ``_reconstruct_subtasks_from_rows`` (no ``and spans`` guard) and Module 2's ``_read_subtask_spans`` (with the guard) had near- identical logic. Promoted to ``reconstruct_subtask_spans`` in ``reader.py`` using the safer guarded form. Both modules now import the single helper. #5 Atomic staging.py JSONL writes. Mirroring the parquet-writer fix from an earlier review round: ``EpisodeStaging.write`` now writes to a sibling ``.tmp`` and ``Path.replace`` atomically. A crash mid-write can no longer leave a half-written JSONL that ``read()`` would then fail to parse. #6 Atomic ``info.json`` write. Same pattern in ``executor._ensure_annotation_metadata_in_info`` — ``info.json`` is load-bearing for dataset metadata, so partial writes brick the dataset. #7 Writer's role-key guard. ``_normalize_persistent_row`` and ``_normalize_event_row`` accessed ``row["role"]`` directly while every other field used ``.get()``. Pre-validate ``"role" in row`` and raise a friendly ``ValueError`` naming the row, so a future module that accidentally drops ``role`` fails with a triagable message instead of a bare KeyError deep in the writer. #8 Last subtask span's ``end`` extends to episode end. ``reconstruct_subtask_spans`` (the new shared helper) takes an optional ``episode_end_t``. When provided, the final span's ``end`` is closed to that timestamp instead of equalling its own ``start`` (zero duration). Both Module 1's plan-update pass and Module 2's interjection anchoring pass ``record.frame_timestamps[-1]``, so downstream "current subtask at refresh_t" lookups no longer miss refreshes that land inside the final span. Sweep: 66 passed, 0 failed. Pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:18:09 +02:00
Pepijn	1238a0cd47	test(annotate): unstale the two failing module tests Both tests were stale relative to design changes that landed earlier on this branch. Update the tests to match the current production contract. ``test_module1_attaches_video_block_to_subtask_prompt`` The test took ``captured[0]`` and asserted on its content blocks, but Module 1 issues several sub-prompts and the rephrasings call (which is text-only, no video block) usually lands first. Two fixes: * The test's intent is "the subtask prompt carries the video block" — not "the first prompt carries it". Pick the call by content (``"atomic subtasks"`` keyword in the text block) so the test is resilient to future reordering of unrelated sub-prompts. * Set ``n_task_rephrasings=0`` so the rephrasings call is skipped entirely — keeps the test focused on ``_generate_subtasks``. ``test_module2_mid_episode_emits_paired_interjection_and_speech`` Two issues both rooted in design changes on the branch: 1. ``InterjectionsAndSpeechModule._mid_episode_interjections`` now anchors interjections on subtask boundaries from Module 1's staging tree, bailing out with zero rows when no spans exist. The production executor runs Module 1 first; the test ran Module 2 in isolation. Reproduce the contract by seeding two ``style=subtask`` rows in the staging before calling Module 2 — gives it the single ``0 → 1`` boundary it needs. 2. The test's stub responder used the marker ``"ONE realistic interruption"`` to match the interjection prompt, but that string is from a previous prompt version. The current ``module_2_interjection.txt`` says ``"Write ONE interjection..."`` — the old prompt asked for counterfactual interjections (e.g. "skip the wipe"), the new one anchors on the upcoming subtask. Marker updated to ``"Write ONE interjection"``; canned response wording aligned to the new design. Sweep on the language stack: 66 passed, 0 failed (was 64 passed, 2 failed). Pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:59:27 +02:00
Pepijn	53c7641885	review: fix dead-code bug, add thread safety, atomic writes, smaller cleanups Critical: video_for_episode was unreachable dead code. ``video_for_episode`` was indented inside ``_decode_pyav_direct``, after its ``return`` statement — Python parsed it as a nested function that never executed. Module 1's ``_episode_video_block`` calls ``self.frame_provider.video_for_episode(record, target_count)`` on the ``use_video_url=False`` path, which would have AttributeError'd on any real dataset. Tests passed only because they used ``_StubFrameProvider`` / ``_NullProvider`` which have the method. Moved it to be a proper method of ``VideoFrameProvider`` (right after ``frames_at``). Thread safety on VideoFrameProvider. The executor runs Module 1/2/3 phases under a ``ThreadPoolExecutor``, so the per-instance ``_cache`` dict and the one-shot ``_warned_decode_fail`` flag were exposed to concurrent reads/writes. Added a ``threading.Lock`` field, wrapped cache reads/writes and the warn-flag check-and-set in ``with self._lock:``. Stub fixtures unaffected. episode_clip_path is now a method of VideoFrameProvider. Used to be a free function reaching into ``provider._meta.episodes`` and ``provider._meta.get_video_file_path`` from outside the class. As a method it just uses ``self._meta``. The only caller (Module 1) updated; no external callers. Atomic write in LanguageColumnsWriter. ``pq.write_table(new_table, path)`` was overwriting the parquet shard in place — a crash mid-write would corrupt the file. Now writes to a sibling ``.tmp`` and ``Path.replace`` atomically. Smaller items: * ``executor.py`` docstring opened with "four phases" but listed six. Now says "six phases" to match. * ``[annotations]`` extra in ``pyproject.toml`` now includes ``openai>=1.40,<2.0``. Default ``VlmConfig.backend`` is ``"openai"``, so without it ``_make_openai_client`` would ImportError on a fresh ``uv sync --extra annotations``. * ``_snap_to_frame`` was duplicated identically in ``plan_subtasks_memory.py`` and ``interjections_and_speech.py``. Promoted to ``snap_to_frame`` in ``reader.py`` (next to ``EpisodeRecord``); both modules now import it. Backwards-compat alias not needed — no external callers. * ``EpisodeRecord.frames_df()`` was re-reading the full parquet on every call. Now memoizes via a private dataclass field so repeat calls from different modules pay the cost once. Method signature unchanged. * ``_extract_first_json_object`` had a redundant ``and not escape`` guard that was dead because the prior block already handled and reset ``escape``. Replaced with a comment explaining the invariant. Pre-existing lint cleanups surfaced once these files entered pre-commit's scope: * dead local ``client = clients[0]`` in ``_make_openai_client`` (the real round-robin uses ``clients[rr_counter[...]]``). * ``cmd = ... if "{port}" in cmd else f"...{port}"`` ternary collapse in ``_spawn_parallel_inference_servers``. * ``seek_pts = 0 if stream.time_base is None else int(...)`` ternary collapse in ``_decode_pyav_direct``. * ``# nosec B310`` on the localhost ``urllib.request.urlopen`` probe in ``_server_is_up`` — the URL is the user-configured local-server endpoint the CLI itself spawned, not arbitrary user input. Test added. ``tests/annotations/test_frames.py`` pins the regression on ``VideoFrameProvider``: asserts ``video_for_episode`` and ``episode_clip_path`` are callable methods (not nested dead code or free functions), and that the ``_lock`` field is a real ``threading.Lock``. Sweep: 64 passed, 2 failed (same pre-existing module-impl bugs as before this commit). Pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:53:43 +02:00
Pepijn	088c8371df	refactor(annotate): consolidate Module 1's prompt → VLM → JSON-extract pattern Five Module 1 sub-prompts (`_derive_task_from_video`, `_generate_task_rephrasings`, `_generate_subtasks`, `_generate_plan`, `_generate_memory`) all repeated the same shape: result = self.vlm.generate_json([messages])[0] if isinstance(result, dict) and isinstance(result.get(<field>), <type>): ... …each spelled with slightly different field names + post-processing. Three small helpers replace it: * `_vlm_field(messages, field)` — single VLM call, returns ``result[field]`` or ``None``. Centralizes the ``generate_json([m])[0]`` + ``isinstance(dict)`` dance. * `_text_message(text)` — wraps a string in the canonical user-message shape every text-only prompt builds inline. * `_video_message(record, prompt)` — combines the episode video block with a prompt; replaces the duplicated video-block construction inside `_generate_subtasks` (which previously inlined the same ``use_video_url``/``frames_per_second``/``max_video_frames`` branches that `_episode_video_block` already implements). Net -35 LOC. Each call site now is 3-5 lines instead of 10-20. The public method signatures are unchanged so tests don't move. Drive-by: `_task_seems_bad` collapsed via SIM103 fix; `zip` in `run_plan_updates` annotated `strict=True` per ruff B905. Tests: same 2 pre-existing module-impl failures (`test_module1_attaches_video_block_to_subtask_prompt`, `test_module2_mid_episode_emits_paired_interjection_and_speech`) — they were failing on `origin/feat/language-annotation-pipeline` before this commit and continue to do so for the same reasons. 61/63 in the language stack pass; pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:29:45 +02:00
Pepijn	3a52a18b0e	Merge branch 'feat/language-columns' into feat/language-annotation-pipeline Resolve conflicts and pull in the latest PR 1 fixes. Conflicts: - pyproject.toml: PR 1 added `lerobot-rollout` and PR 2 added `lerobot-annotate` to the same `[project.scripts]` block. Kept both. - uv.lock: dropped both sides and regenerated against the merged `pyproject.toml` (PR 2 dropped the `datatrove` dep when distribution moved to HF Jobs; PR 1's lock didn't have it). Test follow-up: - `tests/annotations/test_pipeline_recipe_render.py` — PR 1 deleted `src/lerobot/configs/recipes/pi05_hirobot.yaml` (review feedback: remove the canonical-recipe file; recipes are user-supplied). The cross-PR contract this test guards is "the recipe DSL renders non-empty messages from pipeline output", which doesn't depend on any specific YAML, so the test now builds an inline blend recipe with the same coverage. Passes. Sweep: 82 passed, 2 failed (pre-existing module-impl bugs: `test_module1_attaches_video_block_to_subtask_prompt`, `test_module2_mid_episode_emits_paired_interjection_and_speech`). The PR 1 carryover (`test_emitted_at_raises_on_ambiguous_per_camera_vqa`) is now passing — the merge brought in PR 1's tightened `_select_one` ambiguity check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:13:11 +02:00
Pepijn	dad2cf1178	refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch The executor previously claimed it would "optionally hand off" to datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it already runs phases inline in every code path, and HF Jobs (see ``examples/annotation/run_hf_job.py``) is the actual distribution strategy. Stop pretending we have an executor selector. * `executor.py`: drop `select_executor_class`, the "kind" log line, and the references to LocalPipelineExecutor / SlurmPipelineExecutor. Module docstring now says distribution is delegated to HF Jobs. * `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`, `slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only `episode_parallelism`. While here, prune the longer "why" docstrings on every field down to the load-bearing bits — full story moves to `docs/source/annotation_pipeline.mdx`. * `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the `[annotations]` extra; the dep was only there for the (never used) cluster executors. Comment block notes the new HF-Jobs delegation. * `reader.py`, `lerobot_annotate.py`: drop their own datatrove / flavor-namespace mentions. * `docs/source/annotation_pipeline.mdx`: - remove the flavor-namespace / sidecar paragraph (out of scope — "multiple revisions = multiple copies" is dataset-level policy); - remove the "writer drops the legacy `subtask_index` column" note (already covered by PR 1's intentional-break call-out); - remove the chat-template + `apply_chat_template(messages, tools=...)` line (covered by Tools doc); - replace the "executor picks Local vs Slurm" paragraph with `--executor.episode_parallelism` and a pointer to HF Jobs; - rewrite the style→recipe section to talk about "recipes" generically instead of pinning a specific YAML; - add a "Running on Hugging Face Jobs" section pointing at `examples/annotation/run_hf_job.py`; - add a "Running locally" example matching the CLI's docstring (`uv run lerobot-annotate --root=... --vlm.model_id=...`); - extend the paper-inspirations list with Pi0.7 and Steerable VLA Policies (Zhao 2025) for Module 3. Tests: same 3 pre-existing failures as before this commit (2 module assertions still in flight; 1 carryover from PR 1). 41/44 pass. Pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:09:22 +02:00
Pepijn	bce5387e04	Merge branch 'main' into feat/language-columns	2026-05-08 10:29:49 +02:00
Steven Palma	c8ce413d73	fix(robots): allign lekiwi default with so100 use_degrees (#3531 )	2026-05-07 17:52:34 +02:00
Pepijn	82dffde7fa	fix(ci): speed up multi-task benchmark evals (parallelize + cap VLABench steps) (#3529 ) * fix(ci): run multi-task benchmark evals 5-at-a-time in parallel The eval script supports running tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus — 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra) are unchanged. * fix(ci): cap VLABench smoke eval at 50 steps per task VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s the smoke eval took ~80 minutes of rollouts on top of the image build. The eval is a pipeline smoke test (running_success_rate stays at 0% on this short rollout anyway), so we don't need full episodes — cap each task at 50 steps to bring total rollout time down ~10x. * fix(ci): run VLABench tasks 5-at-a-time in parallel The eval script already supports running multiple tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10 VLABench tasks finish in ~2 waves instead of running sequentially.	2026-05-07 13:37:16 +02:00
Ville Kuosmanen	eaf0218bc8	feat(policy): use pretrained vision encoder weights by default for diffusion and vqbet (#3202 ) * feat: add pretrained vision encoder weights for diffusion and vqbet * fix test by re-generating artifacts --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-07 12:10:38 +02:00
Pepijn	a0e52d52fe	fix(ci): bump robotwin benchmark image to CUDA 12.6 (#3525 ) The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3 on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA repo, so the build failed at apt-get install. Bumping both to -12-6 to match the base image.	2026-05-07 11:11:12 +02:00
Pepijn	85576acc29	docs(tools): drop follow-up-PR references Reword the two callouts in `tools.mdx` to describe the runtime layer in present tense ("not part of the catalog layer shipped today", "those modules don't yet exist in the tree") instead of pointing at a specific follow-up PR. Keeps the doc honest about what works now without coupling it to a particular release order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 20:29:42 +02:00

... 2 3 4 5 6 ...

1734 Commits