lerobot

admin/lerobot

Fork 0

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-05 17:17:01 +00:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Steven Palma	708fa1d189	feat(policies): add Gr00t N1.7 policy (#3922 ) * Add GR00T N1.7 support Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests. Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com> * Move Groot processor compatibility into Groot loader * Restore GR00T Flash Attention install guidance * Allow Groot fake RTC chunk prefetch * Fix GR00T N1.7 RTC action decoding * Trim GR00T N1.7 RTC chunks to valid horizon * Ignore padded GR00T N1.7 RTC prefix rows * removed n1.5 dependency * removed remaining N1.5 traces * groot: auto-enable LIBERO gripper action transform for libero_sim GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode transform existed but was never auto-enabled for embodiment_tag=libero_sim, so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still overridable). LIBERO Spatial eval: 0% -> 98%. * Reconnect GR00T relative action processors * groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder) N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration, not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir. GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder + CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole dependency action_encoder.py are dead. Verified: no src/ or tests/ reference. Removed (~2037 LOC): - eagle2_hg_model/ (4 files, ~1575 LOC) - action_head/flow_matching_action_head.py (408 LOC) - action_head/action_encoder.py (54 LOC) cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7). * groot: reuse lerobot get_device_from_parameters instead of inline lookup modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot ships get_device_from_parameters in policies/utils.py (used by diffusion, vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework. * groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone) Addresses checker nit: processor_groot.py docstring still described the N1.5 Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code. * test(groot): add N1.7 original-vs-LeRobot output parity test Verifies the LeRobot GR00T N1.7 integration produces equivalent raw action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed, precision (fp32) and attention kernel (SDPA): max\|diff\|=8.9e-7 on the libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10). The two impls pin incompatible transformers majors (orig 4.57.3 vs LeRobot 5.x) and cannot share a process, so the original outputs + exact collated inputs are produced out-of-process and loaded from an .npz. The test skips on CI / when the checkpoint or artifact are absent. * test(groot): parametrize N1.7 parity across all checkpoint embodiments Generalize the original-vs-LeRobot N1.7 output-parity test from a single libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid, real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built generically from checkpoint metadata; the test discovers per-tag .npz artifacts and runs one parametrized case each, loading the LeRobot model once via a fixture. All 9 embodiments match the original to fp32 epsilon (max\|diff\| < 3e-6), confirming the integration is correct across the model's full embodiment space and not overfit to libero_sim. * test(groot): self-contained parity test + in-repo producer + docs - Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py - Make the test self-contained: producer script (dump_original_n1_7.py) now lives next to the test; default artifact dir is repo-relative (tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The test only reads artifacts and skips if absent -- it never creates external dirs. - Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer; never committed. - Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note). - Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md). - Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity. 9/9 embodiments still pass (max\|diff\| < 3e-6, fp32 eps). * docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring * test(groot): move parity producer into utils/ package Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into a tests/policies/groot/utils/ package (with __init__.py) and update all path references in the test docstring/skip-message and the policy README. * test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5 The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5 checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On load, model_version defaults to n1.7 while the base path infers n1.5, so the version-consistency guard in GrootConfig.__post_init__ raised ValueError and both test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no longer a supported model_version. Adopt the test for N1.7: - MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via GrootPolicy.from_pretrained as a base N1.7 model). - Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint embodiment_id.json), via a single EMBODIMENT_TAG constant. - DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size. - Docstrings/labels updated to 'GR00T N1.7'. Both tests run and pass on CUDA; full tests/policies/groot/ suite is 73 passed / 0 failed / 0 skipped. * docs(groot): document the N1.5 removal and the N1.7 parity test - groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced with `hf download`. - policy_groot_README.md: N1.5 removal note, updated paper / model-card links, and the two-comparison (model parity + preprocessor parity) description of the original-vs-LeRobot test, including the raw-observation artifacts and recorded seed. * fix(groot): N1.7 backbone loading and DiT parameter-count logging - select_layer default tracks the N1.7-3B checkpoint value (16); real checkpoint loads still override it from config.json. - get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and warns (instead of silently assuming) when an unrecognized backbone is loaded only on the strength of backbone_model_type='qwen'. - 'revision' pins the GR00T checkpoint repo only and is no longer forwarded into the unrelated backbone repo load; pin the backbone via transformers_loading_kwargs instead. - DiT / SelfAttentionTransformer parameter counts go through logging.debug instead of print(). * fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes Covers the GR00T N1.7 source trio (configuration, processor, model wrapper). Config: - GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning instead of silently. - action_decode_transform gains an 'auto' sentinel so an explicit 'none' opt-out wins over the libero_sim default and survives save/load round-trips. - action_delta_indices is cached on the inputs that determine it. - Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/ architectures/eagle backbone markers) are rejected with a single clear error pointing to lerobot==0.5.1. Processor: - GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync select_action (relative eef/non-eef decode in eval/record flows). - Postprocessor falls back to dataset stats when a raw checkpoint lacks the configured embodiment tag; raw-state cache is per-instance, not process-global; caller overrides (device, rename_map) are honored on the raw-checkpoint branch. - Camera/modality-key mismatches warn (including the zero-match fallback); deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor; removed N1.5 processor steps are stubbed to raise the removal guidance and the action-unpack step is re-registered as _v2. Model: - Flash-attention probe is diagnostic-only; forward raises on a missing loss; print() replaced with logging; N1.5 base-path mismatch includes the removal guidance. * fix(groot): skip normalization overrides for training * fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth). GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images on the single CPU main-loop thread, and that cost is timed inside dataloading_s (preprocessor(batch) runs in the main process, not the dataloader workers), so adding workers cannot hide it. - Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify on config.device (GPU) when available. Bit-identical on CPU when no resize is configured; with a resize only the PIL->torchvision bicubic backend differs (<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box without the saved device falls back to CPU. - Default image_target_size/crop to the N1.7 backbone's training geometry (256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new embodiment). Previously image_target_size=None disabled the resize, so full-resolution frames were patchified into ~4.7x more vision tokens than the model was trained on -- inflating dataloading_s (patchify) and update_s (VLM sequence) and skewing the input distribution. Checkpoints that pin their own sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS. Net: preprocessing leaves the CPU critical path and the VLM sees the resolution it was trained on -- faster training/inference and a correct train/serve distribution. Affects inference too (shared preprocessor); existing checkpoints still load (backward compatible) but must be retrained to gain the benefits. * refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config) Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and style standards. No change to policy algorithm/numerics; only UX/CLI and packaging changes. Tests are intentionally left untouched (out of scope) and need updating for the removed `model_version` field. Cleanup & consolidation: - Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the config and processor layers. - Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper. Imports & execution guards: - Hoist nested imports to module top; relative imports within the package, absolute for external modules. The version-gated Qwen3-VL classes import under the single `_transformers_available` guard (transformers is pinned >=5.4, which ships them). - No import-time side effects: `_register_with_transformers()` now runs in `GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step stubs register lazily before pipeline deserialization (idempotent via the registry, no run-once globals). - Gate optional deps at the point of use with `require_package(..., extra="groot")`. Dependencies & docs: - Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`. Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`. - Rewrite the `groot.mdx` install section: flash-attn is a purely optional, user-managed optimization that LeRobot neither installs nor requires. Config & CLI: - Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`; no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`, `use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning) and `tune_vlln` (previously hardwired to True). - Convert the single-valued `model_version` and `n1_7_backbone_model` fields to internal constants. - Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA checkpoints have no LeRobot `type` field and load only via `base_model_path`) and is genuinely user-tunable. - Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is rejected with the friendly N1.5 removal message instead of an opaque decode error. * Optimize GR00T N1.7 image preprocessing * Remove PIL fallback from GR00T preprocessing * Fix GROOT relative action training stats * Address GROOT relative action review feedback * Fix GROOT N1.7 relative action stats * Fix GROOT relative action training stats * Fix GROOT relative action padding and RTC leftovers * Reset rollout state after robot episode end * Revert "Reset rollout state after robot episode end" This reverts commit `1322f45aec`. * Move GROOT relative stats out of train script * Guard GR00T relative action stepwise decode * Match GR00T N1.7 OSS preprocessing and relative actions * Apply LIBERO action decode override after loading * Format GR00T OSS parity changes * chore(policies): add guards, warnings and comments + recover tests n1.5 check * fix(style): pre-commit * fix(ci): guard dependecy checks * chore(groot): move cv2 to the top as its in the default install tag * chore(policies): add explicit dataset dependecy to gr00t implementation * fix(test): add guard * fix(groot): make N1.7 letterbox opt-in * feat(groot): activate checkpoint-configured N1.7 raw-state dropout during training Isaac-GR00T applies dual state regularization during fine-tuning: raw-state zeroing driven by the processor sidecar's state_dropout_prob (0.2 for the inspected N1.7 checkpoint) plus encoded-feature dropout. Baseline LeRobot kept the processor in deterministic mode, so the raw-state dropout never activated (RCA Tier-2 contributor to the LeRobot-trained SO-101 failures). - GrootN17PackInputsStep: runtime-only 'training' flag + state_dropout_prob; whole-sample state zeroing gated on torch.is_grad_enabled() so eval and no_grad validation paths are unaffected - sidecar loader reads state_dropout_prob from processor_config.json - state_dropout_prob serializes with the step; the training flag intentionally does not (reloaded pipelines default to eval, re-enabled only when processors are rebuilt with dataset_meta) - _set_groot_preprocessor_training toggles any dataclass step exposing a 'training' field on serialized-pipeline reloads Verification: tests/policies/groot/test_groot_state_dropout.py (4 passed) on RTX PRO 6000 / CUDA 13.3. * fix(groot): align N1.7 fine-tuning optimizer/scheduler/precision with Isaac-GR00T Evidence from the LeRobot-vs-OSS checkpoint comparison: the LeRobot/HF 8k checkpoint's DiT moved only ~19% as far from base as the OSS-trained one (0.0547 vs 0.285 relative L2) - undertrained because the scheduler decayed over a hardcoded 10k steps regardless of --steps, on top of beta1/clip mismatches. - AdamW betas (0.95, 0.999) -> (0.9, 0.999) and grad_clip_norm 10.0 -> 1.0 (Isaac defaults) - scheduler: hardcoded CosineDecayWithWarmup(10k decay, floor 10% peak) -> DiffuserSchedulerConfig HF cosine with ceil(max_steps * warmup_ratio) warmup, deriving num_training_steps from the outer --steps at runtime - model_params_fp32 (default true): keep master weights in FP32 and compute under BF16 autocast like the native N1.7 recipe (fixes optimizer-update numerics vs pure-BF16 params) - weight-decay grouping via transformers get_parameter_names: biases and norm parameters excluded from decay - restore the TF4 lm_head/embedding weight tie so the unused Qwen LM head stays frozen and deduplicated in checkpoints - action_mask kept in native dtype for the masked flow-matching loss - drop_n_last_frames: exclude episode tails that cannot supply a complete action chunk (Isaac sampler behavior) Verification: tests/policies/groot/test_groot_training_optim_contract.py (7 passed) + remaining groot suite 11 passed/5 skipped on RTX PRO 6000 / CUDA 13.3. Note: tests/policies/groot/test_groot_n1_7.py does not collect on the base branch (pre-existing ImportError, fixed in PR #37). * feat(groot): train-time random crop for N1.7 (eval keeps center crop) Isaac-GR00T crops a random crop_fraction window during training and the deterministic center window at eval, replaying the sampled window across all camera views of a sample. This contract is unchanged since the N1.5 release (gr00t/data/transform/video.py: "If mode is 'train', return a random crop transform. If mode is 'eval', return a center crop transform.") and mirrors LeRobot's own Diffusion/VQBeT crop_is_random pattern. The LeRobot N1.7 port used the eval center crop for training too, so the fine-tuned projector/DiT never sees frame borders and trains on a single fixed appearance point. Scope: crop geometry ONLY - no color jitter, no new dependencies. The random window is plain numpy slicing inside the existing cv2 eval transform: - _transform_n1_7_image_for_vlm_albumentations gains crop_position=(y, x) fractions; None keeps the center crop byte-identical to before (verified by test) - GrootN17VLMEncodeStep gains a runtime-only 'training' flag (never serialized; reloaded pipelines default to eval); training samples ONE window per sample and reuses it across (timestep, view) frames - Isaac's cross-view consistency - gated on torch.is_grad_enabled() so no_grad validation and frozen-eval paths are unaffected - wired via dataset_meta is not None in make_groot_pre_post_processors and the existing _set_groot_preprocessor_training on serialized reloads Verification: tests/policies/groot/test_groot_train_random_crop.py (8 passed: center-crop bit-exactness with crop_position=None, corner/center windows, cross-view replay, train!=eval, no_grad gating, seed reproducibility, serialization contract) + groot suite 23 passed / 5 skipped on RTX PRO 6000 / CUDA 13.3. * docs(groot): update Training & hardware Evaluation commands Replace the multi-GPU accelerate-launch Training snippet with the current single-command 'uv run lerobot-train' N1.7 recipe (relative actions excluding gripper, bf16, flash attention, chunk/n_action_steps=16, bs64/20k steps). Replace the bimanual 'Evaluate in your hardware setup' rollout example with the SO-101 follower RTC 'uv run lerobot-rollout' command (strategy.type=base, inference.type=rtc, wrist+front cameras, place-the-vial task). Docs-only; no source/test changes. * docs(groot): parameterize commands with env vars + fill LIBERO results - Introduce BASE_MODEL / DATASET_ID / REPO_ID / JOB_NAME / OUTPUT_DIR env vars in the training command and reuse OUTPUT_DIR + BASE_MODEL in the rollout cmd. - Fill the LIBERO benchmark table with GR00T-LeRobot success rates (Spatial 94%, Object 98%, Goal 93%, LIBERO 10/Long 90%; avg 93.75%), drop the OSS column and XX placeholders. LeRobot-focused. * docs(groot): drop export block, reference env vars directly Use $DATASET_ID / $BASE_MODEL / $REPO_ID / $OUTPUT_DIR / $JOB_NAME as bare placeholders in the commands without concrete export assignments. * docs(groot): keep BASE_MODEL export in training command * docs(groot): use literal HF repo IDs for dataset/policy repo_id Public-facing Hub references (--dataset.repo_id, --policy.repo_id) shown as concrete IDs; local-only values ($OUTPUT_DIR, $JOB_NAME) stay as placeholders. * docs(groot): add LIBERO training command example * docs(groot): remove LIBERO checkpoints subdirectory section * docs(groot): use $BASE_MODEL for base_model_path in LIBERO eval * docs(groot): drop hf download step from LIBERO eval, fix intro * docs(groot): restore suite checkpoint download intro sentence * docs(groot): remove checkpoint download note above LIBERO eval * docs(groot): update training and rollout commands with new parameters and dependencies * Add sample so101 training command * Remove sample so101 training command * docs(groot): remove optional Flash Attention setup instructions and update base model path for evaluation * docs(groot): update training command with image transformation parameters * docs(groot): add note on inference.queue_threshold value for stable inference * chore(style): pre-commit gr00t * docs(groot): update * chore(policies): minor details * fix(groot): license headers + test guards * chore(policies): fix tests * docs(groot): relative actions param doc * chore(policy): address some of the AI review items --------- Co-authored-by: Andrew Wrenn <awrenn@nvidia.com> Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com> Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com> Co-authored-by: groot-validation <groot-validation@localhost> Co-authored-by: johnnynunez <johnnynuca14@gmail.com> Co-authored-by: lbenhorin <lbenhorin@nvidia.com>	2026-07-03 21:15:09 +02:00
Steven Palma	be46bdea8f	feat(policies): add Nvidia Gr00t N1.5 model (#2292 ) * feat(policies): add Nvidia Gr00t N1.5 model Co-authored-by: lbenhorin <lbenhorin@nvidia.com> Co-authored-by: Aravindh <aravindhs@nvidia.com> Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com> Co-authored-by: youliangt <youliangt@nvidia.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Jade Choghari <chogharijade@gmail.com> * fix(docs): add groot to index Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com> --------- Co-authored-by: lbenhorin <lbenhorin@nvidia.com> Co-authored-by: Aravindh <aravindhs@nvidia.com> Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com> Co-authored-by: youliangt <youliangt@nvidia.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Jade Choghari <chogharijade@gmail.com> Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>	2025-10-23 13:50:30 +02:00

Steven Palma

708fa1d189

feat(policies): add Gr00t N1.7 policy (#3922 )

* Add GR00T N1.7 support

Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests.

Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>

* Move Groot processor compatibility into Groot loader

* Restore GR00T Flash Attention install guidance

* Allow Groot fake RTC chunk prefetch

* Fix GR00T N1.7 RTC action decoding

* Trim GR00T N1.7 RTC chunks to valid horizon

* Ignore padded GR00T N1.7 RTC prefix rows

* removed n1.5 dependency

* removed remaining N1.5 traces

* groot: auto-enable LIBERO gripper action transform for libero_sim

GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode
transform existed but was never auto-enabled for embodiment_tag=libero_sim,
so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still
overridable). LIBERO Spatial eval: 0% -> 98%.

* Reconnect GR00T relative action processors

* groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder)

N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration,
not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir.

GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder +
CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so
flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole
dependency action_encoder.py are dead. Verified: no src/ or tests/ reference.

Removed (~2037 LOC):
- eagle2_hg_model/ (4 files, ~1575 LOC)
- action_head/flow_matching_action_head.py (408 LOC)
- action_head/action_encoder.py (54 LOC)

cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7).

* groot: reuse lerobot get_device_from_parameters instead of inline lookup

modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot
ships get_device_from_parameters in policies/utils.py (used by diffusion,
vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework.

* groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone)

Addresses checker nit: processor_groot.py docstring still described the N1.5
Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code.

* test(groot): add N1.7 original-vs-LeRobot output parity test

Verifies the LeRobot GR00T N1.7 integration produces equivalent raw
action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed,
precision (fp32) and attention kernel (SDPA): max|diff|=8.9e-7 on the
libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10).

The two impls pin incompatible transformers majors (orig 4.57.3 vs
LeRobot 5.x) and cannot share a process, so the original outputs + exact
collated inputs are produced out-of-process and loaded from an .npz. The
test skips on CI / when the checkpoint or artifact are absent.

* test(groot): parametrize N1.7 parity across all checkpoint embodiments

Generalize the original-vs-LeRobot N1.7 output-parity test from a single
libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid,
real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built
generically from checkpoint metadata; the test discovers per-tag .npz artifacts
and runs one parametrized case each, loading the LeRobot model once via a fixture.

All 9 embodiments match the original to fp32 epsilon (max|diff| < 3e-6), confirming
the integration is correct across the model's full embodiment space and not overfit
to libero_sim.

* test(groot): self-contained parity test + in-repo producer + docs

- Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py
- Make the test self-contained: producer script (dump_original_n1_7.py) now lives
  next to the test; default artifact dir is repo-relative
  (tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The
  test only reads artifacts and skips if absent -- it never creates external dirs.
- Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer;
  never committed.
- Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note).
- Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md).
- Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity.

9/9 embodiments still pass (max|diff| < 3e-6, fp32 eps).

* docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring

* test(groot): move parity producer into utils/ package

Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into
a tests/policies/groot/utils/ package (with __init__.py) and update all path
references in the test docstring/skip-message and the policy README.

* test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5

The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5
checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On
load, model_version defaults to n1.7 while the base path infers n1.5, so the
version-consistency guard in GrootConfig.__post_init__ raised ValueError and both
test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no
longer a supported model_version.

Adopt the test for N1.7:
- MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via
  GrootPolicy.from_pretrained as a base N1.7 model).
- Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint
  embodiment_id.json), via a single EMBODIMENT_TAG constant.
- DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size.
- Docstrings/labels updated to 'GR00T N1.7'.

Both tests run and pass on CUDA; full tests/policies/groot/ suite is
73 passed / 0 failed / 0 skipped.

* docs(groot): document the N1.5 removal and the N1.7 parity test

- groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to
  keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced
  with `hf download`.
- policy_groot_README.md: N1.5 removal note, updated paper / model-card links,
  and the two-comparison (model parity + preprocessor parity) description of
  the original-vs-LeRobot test, including the raw-observation artifacts and
  recorded seed.

* fix(groot): N1.7 backbone loading and DiT parameter-count logging

- select_layer default tracks the N1.7-3B checkpoint value (16); real
  checkpoint loads still override it from config.json.
- get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and
  warns (instead of silently assuming) when an unrecognized backbone is loaded
  only on the strength of backbone_model_type='qwen'.
- 'revision' pins the GR00T checkpoint repo only and is no longer forwarded
  into the unrelated backbone repo load; pin the backbone via
  transformers_loading_kwargs instead.
- DiT / SelfAttentionTransformer parameter counts go through logging.debug
  instead of print().

* fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes

Covers the GR00T N1.7 source trio (configuration, processor, model wrapper).

Config:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era
  values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning
  instead of silently.
- action_decode_transform gains an 'auto' sentinel so an explicit 'none'
  opt-out wins over the libero_sim default and survives save/load round-trips.
- action_delta_indices is cached on the inputs that determine it.
- Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/
  architectures/eagle backbone markers) are rejected with a single clear
  error pointing to lerobot==0.5.1.

Processor:
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync
  select_action (relative eef/non-eef decode in eval/record flows).
- Postprocessor falls back to dataset stats when a raw checkpoint lacks the
  configured embodiment tag; raw-state cache is per-instance, not
  process-global; caller overrides (device, rename_map) are honored on the
  raw-checkpoint branch.
- Camera/modality-key mismatches warn (including the zero-match fallback);
  deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor;
  removed N1.5 processor steps are stubbed to raise the removal guidance and
  the action-unpack step is re-registered as _v2.

Model:
- Flash-attention probe is diagnostic-only; forward raises on a missing loss;
  print() replaced with logging; N1.5 base-path mismatch includes the
  removal guidance.

* fix(groot): skip normalization overrides for training

* fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution

GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth).
GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images
on the single CPU main-loop thread, and that cost is timed inside dataloading_s
(preprocessor(batch) runs in the main process, not the dataloader workers), so
adding workers cannot hide it.

- Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead
  of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify
  on config.device (GPU) when available. Bit-identical on CPU when no resize is
  configured; with a resize only the PIL->torchvision bicubic backend differs
  (<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box
  without the saved device falls back to CPU.

- Default image_target_size/crop to the N1.7 backbone's training geometry
  (256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets
  is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new
  embodiment). Previously image_target_size=None disabled the resize, so
  full-resolution frames were patchified into ~4.7x more vision tokens than the
  model was trained on -- inflating dataloading_s (patchify) and update_s (VLM
  sequence) and skewing the input distribution. Checkpoints that pin their own
  sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS.

Net: preprocessing leaves the CPU critical path and the VLM sees the resolution
it was trained on -- faster training/inference and a correct train/serve
distribution. Affects inference too (shared preprocessor); existing checkpoints
still load (backward compatible) but must be retrained to gain the benefits.

* refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config)

Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and
style standards. No change to policy algorithm/numerics; only UX/CLI and packaging
changes. Tests are intentionally left untouched (out of scope) and need updating
for the removed `model_version` field.

Cleanup & consolidation:
- Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value
  coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the
  config and processor layers.
- Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver
  cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper.

Imports & execution guards:
- Hoist nested imports to module top; relative imports within the package, absolute
  for external modules. The version-gated Qwen3-VL classes import under the single
  `_transformers_available` guard (transformers is pinned >=5.4, which ships them).
- No import-time side effects: `_register_with_transformers()` now runs in
  `GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step
  stubs register lazily before pipeline deserialization (idempotent via the
  registry, no run-once globals).
- Gate optional deps at the point of use with `require_package(..., extra="groot")`.

Dependencies & docs:
- Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default
  to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`.
  Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`.
- Rewrite the `groot.mdx` install section: flash-attn is a purely optional,
  user-managed optimization that LeRobot neither installs nor requires.

Config & CLI:
- Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`;
  no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`,
  `use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning)
  and `tune_vlln` (previously hardwired to True).
- Convert the single-valued `model_version` and `n1_7_backbone_model` fields to
  internal constants.
- Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA
  checkpoints have no LeRobot `type` field and load only via `base_model_path`) and
  is genuinely user-tunable.
- Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a
  back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is
  rejected with the friendly N1.5 removal message instead of an opaque decode error.

* Optimize GR00T N1.7 image preprocessing

* Remove PIL fallback from GR00T preprocessing

* Fix GROOT relative action training stats

* Address GROOT relative action review feedback

* Fix GROOT N1.7 relative action stats

* Fix GROOT relative action training stats

* Fix GROOT relative action padding and RTC leftovers

* Reset rollout state after robot episode end

* Revert "Reset rollout state after robot episode end"

This reverts commit 1322f45aec.

* Move GROOT relative stats out of train script

* Guard GR00T relative action stepwise decode

* Match GR00T N1.7 OSS preprocessing and relative actions

* Apply LIBERO action decode override after loading

* Format GR00T OSS parity changes

* chore(policies): add guards, warnings and comments + recover tests n1.5 check

* fix(style): pre-commit

* fix(ci): guard dependecy checks

* chore(groot): move cv2 to the top as its in the default install tag

* chore(policies): add explicit dataset dependecy to gr00t implementation

* fix(test): add guard

* fix(groot): make N1.7 letterbox opt-in

* feat(groot): activate checkpoint-configured N1.7 raw-state dropout during training

Isaac-GR00T applies dual state regularization during fine-tuning: raw-state
zeroing driven by the processor sidecar's state_dropout_prob (0.2 for the
inspected N1.7 checkpoint) plus encoded-feature dropout. Baseline LeRobot kept
the processor in deterministic mode, so the raw-state dropout never activated
(RCA Tier-2 contributor to the LeRobot-trained SO-101 failures).

- GrootN17PackInputsStep: runtime-only 'training' flag + state_dropout_prob;
  whole-sample state zeroing gated on torch.is_grad_enabled() so eval and
  no_grad validation paths are unaffected
- sidecar loader reads state_dropout_prob from processor_config.json
- state_dropout_prob serializes with the step; the training flag intentionally
  does not (reloaded pipelines default to eval, re-enabled only when processors
  are rebuilt with dataset_meta)
- _set_groot_preprocessor_training toggles any dataclass step exposing a
  'training' field on serialized-pipeline reloads

Verification: tests/policies/groot/test_groot_state_dropout.py (4 passed) on
RTX PRO 6000 / CUDA 13.3.

* fix(groot): align N1.7 fine-tuning optimizer/scheduler/precision with Isaac-GR00T

Evidence from the LeRobot-vs-OSS checkpoint comparison: the LeRobot/HF 8k
checkpoint's DiT moved only ~19% as far from base as the OSS-trained one
(0.0547 vs 0.285 relative L2) - undertrained because the scheduler decayed over
a hardcoded 10k steps regardless of --steps, on top of beta1/clip mismatches.

- AdamW betas (0.95, 0.999) -> (0.9, 0.999) and grad_clip_norm 10.0 -> 1.0
  (Isaac defaults)
- scheduler: hardcoded CosineDecayWithWarmup(10k decay, floor 10% peak) ->
  DiffuserSchedulerConfig HF cosine with ceil(max_steps * warmup_ratio) warmup,
  deriving num_training_steps from the outer --steps at runtime
- model_params_fp32 (default true): keep master weights in FP32 and compute
  under BF16 autocast like the native N1.7 recipe (fixes optimizer-update
  numerics vs pure-BF16 params)
- weight-decay grouping via transformers get_parameter_names: biases and norm
  parameters excluded from decay
- restore the TF4 lm_head/embedding weight tie so the unused Qwen LM head stays
  frozen and deduplicated in checkpoints
- action_mask kept in native dtype for the masked flow-matching loss
- drop_n_last_frames: exclude episode tails that cannot supply a complete
  action chunk (Isaac sampler behavior)

Verification: tests/policies/groot/test_groot_training_optim_contract.py
(7 passed) + remaining groot suite 11 passed/5 skipped on RTX PRO 6000 /
CUDA 13.3. Note: tests/policies/groot/test_groot_n1_7.py does not collect on
the base branch (pre-existing ImportError, fixed in PR #37).

* feat(groot): train-time random crop for N1.7 (eval keeps center crop)

Isaac-GR00T crops a random crop_fraction window during training and the
deterministic center window at eval, replaying the sampled window across all
camera views of a sample. This contract is unchanged since the N1.5 release
(gr00t/data/transform/video.py: "If mode is 'train', return a random crop
transform. If mode is 'eval', return a center crop transform.") and mirrors
LeRobot's own Diffusion/VQBeT crop_is_random pattern. The LeRobot N1.7 port
used the eval center crop for training too, so the fine-tuned projector/DiT
never sees frame borders and trains on a single fixed appearance point.

Scope: crop geometry ONLY - no color jitter, no new dependencies. The random
window is plain numpy slicing inside the existing cv2 eval transform:

- _transform_n1_7_image_for_vlm_albumentations gains crop_position=(y, x)
  fractions; None keeps the center crop byte-identical to before (verified
  by test)
- GrootN17VLMEncodeStep gains a runtime-only 'training' flag (never
  serialized; reloaded pipelines default to eval); training samples ONE
  window per sample and reuses it across (timestep, view) frames - Isaac's
  cross-view consistency
- gated on torch.is_grad_enabled() so no_grad validation and frozen-eval
  paths are unaffected
- wired via dataset_meta is not None in make_groot_pre_post_processors and
  the existing _set_groot_preprocessor_training on serialized reloads

Verification: tests/policies/groot/test_groot_train_random_crop.py (8 passed:
center-crop bit-exactness with crop_position=None, corner/center windows,
cross-view replay, train!=eval, no_grad gating, seed reproducibility,
serialization contract) + groot suite 23 passed / 5 skipped on RTX PRO 6000 /
CUDA 13.3.

* docs(groot): update Training & hardware Evaluation commands

Replace the multi-GPU accelerate-launch Training snippet with the current
single-command 'uv run lerobot-train' N1.7 recipe (relative actions excluding
gripper, bf16, flash attention, chunk/n_action_steps=16, bs64/20k steps).

Replace the bimanual 'Evaluate in your hardware setup' rollout example with the
SO-101 follower RTC 'uv run lerobot-rollout' command (strategy.type=base,
inference.type=rtc, wrist+front cameras, place-the-vial task).

Docs-only; no source/test changes.

* docs(groot): parameterize commands with env vars + fill LIBERO results

- Introduce BASE_MODEL / DATASET_ID / REPO_ID / JOB_NAME / OUTPUT_DIR env vars
  in the training command and reuse OUTPUT_DIR + BASE_MODEL in the rollout cmd.
- Fill the LIBERO benchmark table with GR00T-LeRobot success rates
  (Spatial 94%, Object 98%, Goal 93%, LIBERO 10/Long 90%; avg 93.75%),
  drop the OSS column and XX placeholders. LeRobot-focused.

* docs(groot): drop export block, reference env vars directly

Use $DATASET_ID / $BASE_MODEL / $REPO_ID / $OUTPUT_DIR / $JOB_NAME as
bare placeholders in the commands without concrete export assignments.

* docs(groot): keep BASE_MODEL export in training command

* docs(groot): use literal HF repo IDs for dataset/policy repo_id

Public-facing Hub references (--dataset.repo_id, --policy.repo_id) shown as
concrete IDs; local-only values ($OUTPUT_DIR, $JOB_NAME) stay as placeholders.

* docs(groot): add LIBERO training command example

* docs(groot): remove LIBERO checkpoints subdirectory section

* docs(groot): use $BASE_MODEL for base_model_path in LIBERO eval

* docs(groot): drop hf download step from LIBERO eval, fix intro

* docs(groot): restore suite checkpoint download intro sentence

* docs(groot): remove checkpoint download note above LIBERO eval

* docs(groot): update training and rollout commands with new parameters and dependencies

* Add sample so101 training command

* Remove sample so101 training command

* docs(groot): remove optional Flash Attention setup instructions and update base model path for evaluation

* docs(groot): update training command with  image transformation parameters

* docs(groot): add note on inference.queue_threshold value for stable inference

* chore(style): pre-commit gr00t

* docs(groot): update

* chore(policies): minor details

* fix(groot): license headers + test guards

* chore(policies): fix tests

* docs(groot): relative actions param doc

* chore(policy): address some of the AI review items

---------

Co-authored-by: Andrew Wrenn <awrenn@nvidia.com>
Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: groot-validation <groot-validation@localhost>
Co-authored-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: lbenhorin <lbenhorin@nvidia.com>

2026-07-03 21:15:09 +02:00

Steven Palma

be46bdea8f

feat(policies): add Nvidia Gr00t N1.5 model (#2292 )

* feat(policies): add Nvidia Gr00t N1.5 model

Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
Co-authored-by: Aravindh <aravindhs@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: youliangt <youliangt@nvidia.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>

* fix(docs): add groot to index

Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>

---------

Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
Co-authored-by: Aravindh <aravindhs@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: youliangt <youliangt@nvidia.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>

2025-10-23 13:50:30 +02:00

2 Commits