1513 Commits

Author SHA1 Message Date
Steven Palma 4688b9c27f refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config)
Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and
style standards. No change to policy algorithm/numerics; only UX/CLI and packaging
changes. Tests are intentionally left untouched (out of scope) and need updating
for the removed `model_version` field.

Cleanup & consolidation:
- Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value
  coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the
  config and processor layers.
- Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver
  cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper.

Imports & execution guards:
- Hoist nested imports to module top; relative imports within the package, absolute
  for external modules. The version-gated Qwen3-VL classes import under the single
  `_transformers_available` guard (transformers is pinned >=5.4, which ships them).
- No import-time side effects: `_register_with_transformers()` now runs in
  `GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step
  stubs register lazily before pipeline deserialization (idempotent via the
  registry, no run-once globals).
- Gate optional deps at the point of use with `require_package(..., extra="groot")`.

Dependencies & docs:
- Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default
  to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`.
  Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`.
- Rewrite the `groot.mdx` install section: flash-attn is a purely optional,
  user-managed optimization that LeRobot neither installs nor requires.

Config & CLI:
- Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`;
  no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`,
  `use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning)
  and `tune_vlln` (previously hardwired to True).
- Convert the single-valued `model_version` and `n1_7_backbone_model` fields to
  internal constants.
- Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA
  checkpoints have no LeRobot `type` field and load only via `base_model_path`) and
  is genuinely user-tunable.
- Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a
  back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is
  rejected with the friendly N1.5 removal message instead of an opaque decode error.
2026-06-16 14:45:37 +02:00
Steven Palma 5753f8c18b fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution
GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth).
GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images
on the single CPU main-loop thread, and that cost is timed inside dataloading_s
(preprocessor(batch) runs in the main process, not the dataloader workers), so
adding workers cannot hide it.

- Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead
  of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify
  on config.device (GPU) when available. Bit-identical on CPU when no resize is
  configured; with a resize only the PIL->torchvision bicubic backend differs
  (<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box
  without the saved device falls back to CPU.

- Default image_target_size/crop to the N1.7 backbone's training geometry
  (256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets
  is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new
  embodiment). Previously image_target_size=None disabled the resize, so
  full-resolution frames were patchified into ~4.7x more vision tokens than the
  model was trained on -- inflating dataloading_s (patchify) and update_s (VLM
  sequence) and skewing the input distribution. Checkpoints that pin their own
  sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS.
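
The fallback described in the bullet above could be sketched as follows. The function name, the dict-like `checkpoint_assets`, and its key names are assumptions made for illustration. Only the 256x256 and 230x230 values come from the commit message.

```python
# Defaults quoted in the commit message (target 256x256, crop 230x230).
DEFAULT_TARGET_SIZE = (256, 256)
DEFAULT_CROP_SIZE = (230, 230)


def resolve_image_sizing(checkpoint_assets):
    """Return (target_size, crop_size), preferring values pinned by the checkpoint.

    checkpoint_assets is a hypothetical dict-like object, or None when the
    checkpoint ships no image sizing.
    """
    assets = checkpoint_assets or {}
    target = assets.get("image_target_size") or DEFAULT_TARGET_SIZE
    crop = assets.get("image_crop_size") or DEFAULT_CROP_SIZE
    return tuple(target), tuple(crop)
```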

Net: preprocessing leaves the CPU critical path and the VLM sees the resolution
it was trained on -- faster training/inference and a correct train/serve
distribution. Affects inference too (shared preprocessor); existing checkpoints
still load (backward compatible) but must be retrained to gain the benefits.
2026-06-15 18:20:49 +02:00
Kartik 97bd373d15 Merge pull request #15 from huggingface/fix/groot_n17_core
fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes
2026-06-13 23:05:51 +02:00
Kartik 10a73e3c95 Merge pull request #14 from huggingface/fix/groot_n17_backbone
fix(groot): N1.7 backbone loading and DiT parameter-count logging
2026-06-13 21:47:35 +02:00
Kartik 27c9288b24 Merge pull request #13 from huggingface/fix/groot_n17_docs
docs(groot): document the N1.5 removal and the N1.7 parity test
2026-06-13 21:47:05 +02:00
Steven Palma 378897800a fix(groot): skip normalization overrides for training 2026-06-13 19:51:29 +02:00
Steven Palma fcb371eddd fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes
Covers the GR00T N1.7 source trio (configuration, processor, model wrapper).

Config:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era
  values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning
  instead of being remapped silently.
- action_decode_transform gains an 'auto' sentinel so an explicit 'none'
  opt-out wins over the libero_sim default and survives save/load round-trips.
- action_delta_indices is cached on the inputs that determine it.
- Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/
  architectures/eagle backbone markers) are rejected with a single clear
  error pointing to lerobot==0.5.1.
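
A minimal sketch of the 'auto' sentinel idea from the config list above, assuming a simplified config class. `DecodeConfig`, `resolved_transform`, and the `"libero_gripper"` transform name are hypothetical. The log only states that 'auto' exists, that an explicit 'none' wins, and that libero_sim has a default transform.

```python
from dataclasses import dataclass

# Hypothetical transform name; the real identifier is not given in the log.
LIBERO_GRIPPER = "libero_gripper"


@dataclass
class DecodeConfig:
    embodiment_tag: str
    # "auto" means "not chosen by the user"; "none" is an explicit opt-out.
    action_decode_transform: str = "auto"

    def resolved_transform(self) -> str:
        """Resolve the sentinel without overwriting the stored value."""
        if self.action_decode_transform != "auto":
            return self.action_decode_transform
        return LIBERO_GRIPPER if self.embodiment_tag == "libero_sim" else "none"
```

Because the stored field keeps the user's original choice ("auto" or "none"), serializing and reloading the config does not turn a default into an explicit setting or lose an opt-out.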

Processor:
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync
  select_action (relative eef/non-eef decode in eval/record flows).
- Postprocessor falls back to dataset stats when a raw checkpoint lacks the
  configured embodiment tag; raw-state cache is per-instance, not
  process-global; caller overrides (device, rename_map) are honored on the
  raw-checkpoint branch.
- Camera/modality-key mismatches warn (including the zero-match fallback);
  deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor;
  removed N1.5 processor steps are stubbed to raise the removal guidance and
  the action-unpack step is re-registered as _v2.

Model:
- Flash-attention probe is diagnostic-only; forward raises on a missing loss;
  print() replaced with logging; N1.5 base-path mismatch includes the
  removal guidance.
2026-06-13 18:30:21 +02:00
Steven Palma 895eaf0d7c fix(groot): N1.7 backbone loading and DiT parameter-count logging
- select_layer default tracks the N1.7-3B checkpoint value (16); real
  checkpoint loads still override it from config.json.
- get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and
  warns (instead of silently assuming) when an unrecognized backbone is loaded
  only on the strength of backbone_model_type='qwen'.
- 'revision' pins the GR00T checkpoint repo only and is no longer forwarded
  into the unrelated backbone repo load; pin the backbone via
  transformers_loading_kwargs instead.
- DiT / SelfAttentionTransformer parameter counts go through logging.debug
  instead of print().
2026-06-12 23:55:33 +02:00
Steven Palma edda8552ec docs(groot): document the N1.5 removal and the N1.7 parity test
- groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to
  keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced
  with `hf download`.
- policy_groot_README.md: N1.5 removal note, updated paper / model-card links,
  and the two-comparison (model parity + preprocessor parity) description of
  the original-vs-LeRobot test, including the raw-observation artifacts and
  recorded seed.
2026-06-12 23:40:36 +02:00
Kartik c8225d749a Merge pull request #12 from acwrenn53/exp/groot-n17-test-groot-lerobot
Adopt test_groot_lerobot for GR00T N1.7, drop N1.5
2026-06-12 11:01:25 +02:00
nv-sachdevkartik 68f869b7a0 test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5
The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5
checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On
load, model_version defaults to n1.7 while the base path infers n1.5, so the
version-consistency guard in GrootConfig.__post_init__ raised ValueError and both
test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no
longer a supported model_version.

Adopt the test for N1.7:
- MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via
  GrootPolicy.from_pretrained as a base N1.7 model).
- Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint
  embodiment_id.json), via a single EMBODIMENT_TAG constant.
- DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size.
- Docstrings/labels updated to 'GR00T N1.7'.

Both tests run and pass on CUDA; full tests/policies/groot/ suite is
73 passed / 0 failed / 0 skipped.
2026-06-12 08:42:45 +00:00
Kartik 4119ad4d10 Merge pull request #11 from acwrenn53/exp/groot-n17-logit-parity
GR00T N1.7 logit parity
2026-06-12 10:14:05 +02:00
nv-sachdevkartik 750358895b test(groot): move parity producer into utils/ package
Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into
a tests/policies/groot/utils/ package (with __init__.py) and update all path
references in the test docstring/skip-message and the policy README.
2026-06-12 08:10:03 +00:00
nv-sachdevkartik bc4d0db8f4 docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring 2026-06-12 08:06:33 +00:00
nv-sachdevkartik 45e273b806 test(groot): self-contained parity test + in-repo producer + docs
- Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py
- Make the test self-contained: producer script (dump_original_n1_7.py) now lives
  next to the test; default artifact dir is repo-relative
  (tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The
  test only reads artifacts and skips if absent -- it never creates external dirs.
- Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer;
  never committed.
- Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note).
- Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md).
- Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity.

9/9 embodiments still pass (max|diff| < 3e-6, fp32 eps).
2026-06-12 07:47:11 +00:00
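
The `max|diff|` criterion quoted in the parity commits can be illustrated with a small pure-Python check. This is a sketch under the assumption that both outputs are flattened to equal-length float sequences. The helper names are hypothetical and the actual test may compare arrays differently.

```python
def max_abs_diff(reference, candidate) -> float:
    """Largest elementwise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("shape mismatch")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def assert_parity(reference, candidate, tol: float = 3e-6) -> None:
    """Fail when the two outputs differ by tol or more anywhere."""
    diff = max_abs_diff(reference, candidate)
    assert diff < tol, f"max|diff|={diff:.3e} exceeds {tol:.1e}"
```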
nv-sachdevkartik 8b5f56b63c test(groot): parametrize N1.7 parity across all checkpoint embodiments
Generalize the original-vs-LeRobot N1.7 output-parity test from a single
libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid,
real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built
generically from checkpoint metadata; the test discovers per-tag .npz artifacts
and runs one parametrized case each, loading the LeRobot model once via a fixture.

All 9 embodiments match the original to fp32 epsilon (max|diff| < 3e-6), confirming
the integration is correct across the model's full embodiment space and not overfit
to libero_sim.
2026-06-11 21:41:30 +00:00
nv-sachdevkartik 9f1ee224cb test(groot): add N1.7 original-vs-LeRobot output parity test
Verifies the LeRobot GR00T N1.7 integration produces equivalent raw
action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed,
precision (fp32) and attention kernel (SDPA): max|diff|=8.9e-7 on the
libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10).

The two impls pin incompatible transformers majors (orig 4.57.3 vs
LeRobot 5.x) and cannot share a process, so the original outputs + exact
collated inputs are produced out-of-process and loaded from an .npz. The
test skips on CI / when the checkpoint or artifact are absent.
2026-06-11 20:59:14 +00:00
nv-sachdevkartik 885f55ef04 groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone)
Addresses checker nit: processor_groot.py docstring still described the N1.5
Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code.
2026-06-11 18:10:46 +00:00
nv-sachdevkartik bba996ef8d groot: reuse lerobot get_device_from_parameters instead of inline lookup
modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot
ships get_device_from_parameters in policies/utils.py (used by diffusion,
vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework.
2026-06-11 18:03:28 +00:00
nv-sachdevkartik 162b07512a groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder)
N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration,
not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir.

GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder +
CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so
flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole
dependency action_encoder.py are dead. Verified: no src/ or tests/ reference.

Removed (~2037 LOC):
- eagle2_hg_model/ (4 files, ~1575 LOC)
- action_head/flow_matching_action_head.py (408 LOC)
- action_head/action_encoder.py (54 LOC)

cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7).
2026-06-11 17:49:12 +00:00
acwrenn53 0509ea05df Merge pull request #10 from acwrenn53/nvidia-gr00t-n17-lerobot-cleanup
Remove GR00T N1.5 support and fix LIBERO gripper action transform
2026-06-05 12:15:10 -07:00
Andrew Wrenn de1a9e5ad9 Reconnect GR00T relative action processors 2026-06-05 09:31:04 -07:00
groot-validation 6803439f22 groot: auto-enable LIBERO gripper action transform for libero_sim
GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode
transform existed but was never auto-enabled for embodiment_tag=libero_sim,
so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still
overridable). LIBERO Spatial eval: 0% -> 98%.
2026-06-05 00:56:11 +00:00
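
The range mismatch described in the commit above ([0,1] emitted, [-1,1] expected) can be illustrated with a linear rescale. This is only a sketch: the function name is hypothetical, and whether the real decode transform also flips the sign or thresholds the value is not stated in the log.

```python
def unit_to_symmetric(gripper: float) -> float:
    """Linearly map a value in [0, 1] to [-1, 1], clamping out-of-range input."""
    clamped = min(max(gripper, 0.0), 1.0)
    return 2.0 * clamped - 1.0
```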
nv-sachdevkartik 90d1e70da2 removed remaining N1.5 traces 2026-06-05 00:11:37 +00:00
nv-sachdevkartik a35ac22afd removed n1.5 dependency 2026-06-04 22:14:07 +00:00
Kartik fd7fed08e2 Merge branch 'huggingface:main' into nvidia-gr00t-n17-lerobot 2026-06-04 23:41:09 +02:00
Maxime Ellerbach 2e9cd87bbd feat(policies): add VLA-JEPA (#3568)
* first commit

* feat(policies): add VLA-JEPA

* feat(policies): add VLA-JEPA

* support vla_jepa

* (feat)policies: add VLA-JEPA

* linting

* adding deps to pyproject.toml

* updating uv lock

* adding guards to avoid needing transformers and diffusers for type checking and basic tests

* fixing action and state dim

* fix warnings with qwen processor kwargs

* fixing wm_loss not propagating

* adjusting obs steps, tubelet size to match original implementation

* some more fixes to be closer to the original implementation

* adding more tests to ensure good coverage

* align VLA-JEPA architecture with original checkpoint

- Remove stale `action_num_heads` / `action_attention_head_dim` config fields;
  DiT head dimensions are now always derived from the preset (DiT-B/L/test).
- Add `num_target_vision_tokens` and `action_max_seq_len` config fields required
  by the action head's future-token embedding and positional embedding tables.
- Fix default `qwen_model_name` to 2B (matches all released checkpoints).
- Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match
  checkpoint key names; replace `nn.Sequential` decoder/state-encoder with
  `_MLP2` (layer1/layer2 naming).
- Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim`
  (DiT input width) rather than `action_hidden_size` (DiT output width).
- Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match
  checkpoint; add alternating cross/self attention (even blocks cross-attend to
  Qwen context, odd blocks self-attend).
- Add `DiT-test` preset for unit tests.
- Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks
  (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename
  `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`.

* propagate action_is_pad masking through VLA-JEPA policy pipeline

Pass the `action_is_pad` tensor from the batch through to the action head
so padded timesteps are excluded from the flow-matching loss.
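
A minimal pure-Python sketch of excluding padded timesteps from a loss. It uses a plain squared error as a stand-in for the flow-matching loss, and the function name and list-based inputs are assumptions. The real action head operates on tensors.

```python
def masked_mse(pred, target, is_pad=None):
    """Mean squared error over timesteps, skipping those flagged as padding.

    pred/target: lists of per-timestep action vectors (lists of floats).
    is_pad: optional list of bools, True where the timestep is padding.
    """
    if is_pad is None:
        is_pad = [False] * len(pred)
    total, count = 0.0, 0
    for p_t, y_t, pad in zip(pred, target, is_pad):
        if pad:
            continue
        total += sum((p - y) ** 2 for p, y in zip(p_t, y_t))
        count += len(p_t)
    # A fully padded batch contributes zero loss rather than dividing by zero.
    return total / count if count else 0.0
```

With this shape, passing `is_pad=None` and passing an all-False mask give the same value, and a fully padded input yields zero.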

* update VLA-JEPA tests for arch changes and action_is_pad

- Switch conftest to use `action_model_type="DiT-test"` now that
  `action_num_heads` / `action_attention_head_dim` have been removed.
- Add action_head tests covering fully-padded loss (zero) and equivalence
  of action_is_pad=None vs all-zeros mask.
- Remove obsolete `test_native_to_lerobot_wm_only` test.

* add VLA-JEPA documentation

Covers architecture overview, pretrained checkpoints, config reference,
training/eval commands for LIBERO-10, and guidance on fine-tuning for
single-camera datasets.

* add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated)

* make default params more aligned with paper and pretrained models
- adding possibility of freezing qwen backbone and world model
- added tests for weight loading

* trying out to re-init the action head to avoid pretraining dimension mismatch

* allow different state dim and action dim

* removing misleading future_action_window_size to just use chunk_size

* lots of changes to make existing weights work, need to massively refactor the pre and post processing

* refactoring into using pre and post processor

* pre-commit cleanup

* fixing doc defaults args

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>

* addressing dtype zeros issue

* adding guard for diffusers

* fixing training and eval examples

* trying to close success rate gap

* fix qwen norm layer output; libero eval is now as expected

* adding instructions for different embodiment + fixing some tests

* smol fix to avoid having default CPU device when training

* fixing misconception about multiview / singleview handling

* removing conversion script

* adding licences

* adding .mdx docs and shortening policy_vla_jepa_README.md

* removing useless pre-processor

* cleanup

* removing swish in favor of silu

* adding configuration gripper index and threshold

* fixing symlink

---------

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: ginwind <ginwind@mail.ustc.edu.cn>
2026-06-04 19:22:51 +02:00
acwrenn53 0c3cc4c9d6 Merge pull request #6 from acwrenn53/nvidia-gr00t-n17-lerobot-rtc-2
Nvidia gr00t n17 lerobot rtc 2
2026-06-03 16:10:49 -07:00
Andrew Wrenn 6caeac9d07 Ignore padded GR00T N1.7 RTC prefix rows 2026-06-03 14:04:31 -07:00
Andrew Wrenn 1d6810b814 Trim GR00T N1.7 RTC chunks to valid horizon 2026-06-03 13:51:35 -07:00
Andrew Wrenn de9af57475 Fix GR00T N1.7 RTC action decoding 2026-06-03 13:43:13 -07:00
Jaimin d1b1c5c8cf docs: fix broken dataset script paths (datasets/v30 -> scripts) (#3695)
The docs pointed at src/lerobot/datasets/v30/, which does not exist.
Both scripts actually live in src/lerobot/scripts/:

- convert_dataset_v21_to_v30.py
- augment_dataset_quantile_stats.py

Updated the four references (one python -m module path and three
file-path invocations) to the correct location, matching each
script's own usage docstring.
2026-06-03 14:48:19 +02:00
Nikodem Bartnik 741c2d0a39 Docs/add lelab (#3707)
* first text draft (no images)

* simplified docs

* fix formatting

* add youtube video

* add a tip about compatibility

* fix broken link
2026-06-03 14:22:05 +02:00
Haoming Song 19fe315971 fix(train): enable relative action overrides for pretrained processors (#3711)
* fix(train): enable relative action overrides for pretrained processors
Keep pretrained processor pipelines when use_relative_actions is enabled and
apply relative/absolute action processor settings through overrides. Rename the
relative action processor registry key to relative_actions_processor.

* fix(config): reject rename_map without pretrained checkpoint

Fail fast when rename_map is set during fresh initialization, since fresh
configs derive feature names from the current dataset and no rename is applied.

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-06-03 11:46:35 +02:00
Andrew Wrenn 364750ada2 Allow Groot fake RTC chunk prefetch 2026-06-02 14:20:00 -07:00
Andrew Wrenn 342d223706 Restore GR00T Flash Attention install guidance 2026-06-02 13:26:08 -07:00
Andrew Wrenn e3b203e5a7 Move Groot processor compatibility into Groot loader 2026-06-02 13:19:12 -07:00
Khalil Meftah 906b585826 fix(datasets): default private to None in push_to_hub to respect Hub org visibility settings (#3713) 2026-06-02 19:25:13 +02:00
Andrew Wrenn b568c41355 Add GR00T N1.7 support
Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests.

Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
2026-06-01 08:57:04 -07:00
Khalil Meftah b8ad81bf39 feat(rewards): add ROBOMETER reward model (#3627)
* feat/add ROBOMETER reward model

* feat(rewards): add Robometer offline progress labeling script

* fix(rewards/robometer): add missing input keys mm_token_type_ids

* chore(rewards/robometer): default to lerobot/Robometer-4b model

* doc(rewards/robometer): update citation and original github link

* feat(rewards/robometer): add image key argument to compute Robometer progress
2026-05-29 21:45:39 +02:00
Haoquan Fang 24017e960c Add MolmoAct2 policy (#3604)
* add molmoact2 policy

* add apache headers to molmoact2 files

* simplify molmoact2 package imports

* align molmoact2 feature validation with eo pattern

* remove molmoact2 processor override from factory

* guard molmoact2 transformers imports

* guard molmoact2 processor transformers import

* add scipy dependency to molmoact2 extra

* use a single molmoact2 action queue

* move molmoact2 config logic into config

* fix molmoact2 hf image key resolution

* load molmoact2 without remote code

* lazy import molmoact2 scipy

* format molmoact2 files

* skip molmoact2 tests without optional deps

* fix molmoact2 pre-commit checks

* validate molmoact2 gripper range
2026-05-27 18:58:37 +02:00
Khalil Meftah e86f5af5bf feat(rewards): add TOPReward reward model (#3629)
* feat(rewards): add TOPReward reward model

* refactor(rewards): clean up TOPReward processor/model

* fix(rewards/topreward): add missing input keys mm_token_type_ids

* fix(rewards/topreward): fix pyproject extra typo and simplify processor (#3653)

Add lerobot[topreward] extra to all in
pyproject.toml, drop the redundant labels arg in scoring, and
collapse the dead-branch shape check in the encoder processor.

* optimize topreward input processing (#3660)

---------

Co-authored-by: Cole <91766445+jcoleharrison@users.noreply.github.com>
Co-authored-by: Haoming Song <haomingsong24@gmail.com>
2026-05-27 14:24:31 +02:00
Haoming Song 5c98e80430 fix(gr00t): fix Eagle25VL model and processor crash in transformers>=5.4.0, <5.6.0 (#3652)
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-05-26 14:04:22 +02:00
Reece O'Mahoney f65f3f7a4a Fix policy.path in YAML configs (PR #3145 followup) (#3597)
PR #3145 added YAML support for policy.path but left two bugs:

1. extract_path_fields_from_config only deleted config_data[field] when
   no sibling overrides existed. With siblings, the dict stayed in place
   and draccus crashed decoding it as PreTrainedConfig (no 'type' key).
   Sibling overrides go into _config_yaml_overrides and are applied later
   by from_pretrained(), so the field can always be removed.

2. wrap() updated config_path_cli to the cleaned temp file path but
   never propagated it to the draccus.parse fallback branch. cli_args
   still contained --config_path=<original>, so draccus read the
   original YAML with path: still present.
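
The first fix can be sketched as "always remove the field once its path has been extracted". The helper below is a hypothetical simplification written for illustration, not the actual `extract_path_fields_from_config` implementation. Its name, signature, and return shape are assumptions.

```python
def extract_path_field(config_data: dict, field: str):
    """Pop config_data[field] when it carries a 'path' key.

    Returns (path, sibling_overrides); (None, {}) if there is nothing to extract.
    """
    value = config_data.get(field)
    if not isinstance(value, dict) or "path" not in value:
        return None, {}
    overrides = {k: v for k, v in value.items() if k != "path"}
    # Remove the field unconditionally: siblings are returned separately, so
    # leaving the dict behind would only confuse the downstream parser.
    del config_data[field]
    return value["path"], overrides
```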

Tests passed because they (a) called extract_path_fields_from_config
directly and (b) included type: alongside path: in the YAML, sidestepping
both bugs.

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-05-26 14:01:19 +02:00
Pepijn 8194897994 fix(deps): cap placo below 0.9.16 and harden kinematics import (#3647)
* fix(deps): cap placo below 0.9.16 and harden kinematics import

placo 0.9.16 links against liburdfdom_sensor.so.4, which is unavailable
on Ubuntu 24.04 (noble ships urdfdom 3.x). Importing placo on that base
crashes with:

  ImportError: liburdfdom_sensor.so.4.0: cannot open shared object file

This broke nightly Latest Deps tests (CPU and GPU) when the lockfile
upgrade picked placo 0.9.16, since lerobot.model.kinematics
unconditionally imports placo when _placo_available is true, and that
check (importlib.util.find_spec) cannot detect dlopen failures of
transitive shared libraries — so unrelated subsystems (RL actor,
gym_manipulator) became unimportable.

Two changes:

1. Pin placo to <0.9.16 in pyproject.toml + regenerate uv.lock
   (0.9.16 → 0.9.15). Short-term unblock for nightly CI until system
   urdfdom 4.x is broadly available.

2. Harden the import guard in src/lerobot/model/kinematics.py:
   wrap 'import placo' in try/except ImportError so a missing
   transitive .so no longer crashes module import. RobotKinematics
   instantiation now raises an informative ImportError citing the
   underlying dlopen failure via _raise_if_placo_unusable().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(kinematics): hoist _placo_runtime_error to module scope for mypy

Mypy walks the TYPE_CHECKING branch in which the runtime else-block is
not executed, so _placo_runtime_error was only defined at runtime and
mypy reported 'Name "_placo_runtime_error" is not defined' on the
three references inside _raise_if_placo_unusable. Declare the symbol
unconditionally at module scope with a default of None; the runtime
import-failure branch still assigns to it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(kinematics): drop verbose comments around placo import guard

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 12:03:07 +02:00
Haoming Song 9f437d86b6 fix(groot): align GR00TN15Config with transformers config dataclasses (#3606)
* fix(gr00t): fix gr00t config dataclass init TypeError

* fix(groot): guard strict config decorator without transformers for passing CI

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-05-22 10:31:04 +02:00
Haoming Song b74a551d38 fix(pi0, pi05): stabilize torch.compile and expand test coverage (#3610)
* chore(gr00t): sync with #3606 for fixing gr00t config crash

* fix(pi0&pi05): fix graph break caused by deepcopy of past_key_values in sample_actions

* fix(pi0&pi05): fix frequent recompile caused by compute_layer_complete

* feat(test): add compile test and benchmark for pi0 and pi05

* feat(test): add comprehensive testing for pi0 and pi05. Including processor, forward, sample action, etc.
2026-05-22 10:29:34 +02:00
Nikodem Bartnik c0a2e9814d fix examples (#3623)
- Fixed broken API examples in Lerobot Imitation Learning Documentation
- Teleoperation with cameras improved by adding a fixed frequency in the loop (without it the camera feed gets very slow)
- Wrapped record example script in main() to avoid problems on Mac
- Previously teleoperation example was using SO-ARM and teleoperation with cameras was using Koch. I changed it to use SO-ARM in all of the examples.
- Added section on how to train with HF Jobs - CLI and Python examples
- Replaced lerobot-record with lerobot-rollout in policies examples
2026-05-21 22:14:07 +02:00
Khalil Meftah bac4f61eae refactor: support custom progress parquet overlays (#3640) 2026-05-21 14:32:10 +02:00
Virgileboat f4b834844e Feat/clean can bus (#3526)
* change timeout for handshake

* enforce last state read on query

* change import order

* fix(motors): flush stale robstride RX and harden feedback drain

* robstride: remove redundant timeout and max_messages casts

* bugfix + %-style

* update exception catch
2026-05-21 11:44:04 +02:00