mirror of
https://github.com/huggingface/lerobot.git
synced 2026-07-05 17:17:01 +00:00
feat/dynamixel-protocol-1
2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
708fa1d189 |
feat(policies): add Gr00t N1.7 policy (#3922)
* Add GR00T N1.7 support
Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests.
Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
* Move Groot processor compatibility into Groot loader
* Restore GR00T Flash Attention install guidance
* Allow Groot fake RTC chunk prefetch
* Fix GR00T N1.7 RTC action decoding
* Trim GR00T N1.7 RTC chunks to valid horizon
* Ignore padded GR00T N1.7 RTC prefix rows
* removed n1.5 dependency
* removed remaining N1.5 traces
* groot: auto-enable LIBERO gripper action transform for libero_sim
GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode
transform existed but was never auto-enabled for embodiment_tag=libero_sim,
so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still
overridable). LIBERO Spatial eval: 0% -> 98%.
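As an illustration of the range mismatch described above, a minimal sketch of an affine rescale from [0,1] to [-1,1] follows. The function name, the assumption that the gripper is the last action entry, and the sign convention are hypothetical; the actual decode transform in the policy may differ (for example in sign or clipping).

```python
def to_libero_gripper(action, gripper_index=-1):
    """Rescale one action entry from [0, 1] to [-1, 1]; leave the rest untouched.

    Hypothetical helper: the index and sign convention are assumptions,
    not the transform shipped in the commit.
    """
    out = list(action)  # copy so the caller's sequence is not mutated
    out[gripper_index] = 2.0 * out[gripper_index] - 1.0
    return out


# A fully "0" gripper command maps to -1, a fully "1" command maps to +1.
closed = to_libero_gripper([0.1, -0.2, 0.0])
opened = to_libero_gripper([0.1, -0.2, 1.0])
```

Without such a rescale, every emitted gripper value lands in the non-negative half of the range the environment expects, which is consistent with the 0% score reported above.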
* Reconnect GR00T relative action processors
* groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder)
N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration,
not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir.
GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder +
CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so
flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole
dependency action_encoder.py are dead. Verified: no src/ or tests/ reference.
Removed (~2037 LOC):
- eagle2_hg_model/ (4 files, ~1575 LOC)
- action_head/flow_matching_action_head.py (408 LOC)
- action_head/action_encoder.py (54 LOC)
cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7).
* groot: reuse lerobot get_device_from_parameters instead of inline lookup
modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot
ships get_device_from_parameters in policies/utils.py (used by diffusion,
vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework.
* groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone)
Addresses checker nit: processor_groot.py docstring still described the N1.5
Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code.
* test(groot): add N1.7 original-vs-LeRobot output parity test
Verifies the LeRobot GR00T N1.7 integration produces equivalent raw
action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed,
precision (fp32) and attention kernel (SDPA): max|diff|=8.9e-7 on the
libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10).
The two impls pin incompatible transformers majors (orig 4.57.3 vs
LeRobot 5.x) and cannot share a process, so the original outputs + exact
collated inputs are produced out-of-process and loaded from an .npz. The
test skips on CI or when the checkpoint or artifact is absent.
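The max|diff| figure above is an element-wise comparison of two action predictions. A minimal sketch on plain Python sequences follows; the helper names and the default tolerance are hypothetical, and the real test presumably compares arrays loaded from the .npz rather than lists.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def check_parity(reference, candidate, atol=1e-5):
    """Return the max |diff| and fail if it reaches atol (a hypothetical tolerance)."""
    diff = max_abs_diff(reference, candidate)
    if diff >= atol:
        raise AssertionError(f"max|diff|={diff:.3e} is not below atol={atol:.1e}")
    return diff


# Two predictions that agree to roughly fp32 precision pass the check.
diff = check_parity([0.25, -0.5, 1.0], [0.25, -0.5, 1.0000005])
```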
* test(groot): parametrize N1.7 parity across all checkpoint embodiments
Generalize the original-vs-LeRobot N1.7 output-parity test from a single
libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid,
real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built
generically from checkpoint metadata; the test discovers per-tag .npz artifacts
and runs one parametrized case each, loading the LeRobot model once via a fixture.
All 9 embodiments match the original to fp32 epsilon (max|diff| < 3e-6), confirming
the integration is correct across the model's full embodiment space and not overfit
to libero_sim.
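Discovering one artifact per embodiment tag can be sketched as a directory scan. The file-naming convention used here (`<tag>.npz`) and the function name are assumptions for illustration; the commit does not state how the artifacts are named.

```python
import tempfile
from pathlib import Path


def discover_artifacts(artifact_dir):
    """Map embodiment tag -> artifact path, assuming files are named <tag>.npz."""
    root = Path(artifact_dir)
    if not root.is_dir():
        return {}  # nothing to parametrize over; a test would skip
    return {path.stem: path for path in sorted(root.glob("*.npz"))}


# Demo on a throwaway directory: only the .npz files become test cases.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("libero_sim.npz", "oxe_droid.npz", "notes.txt"):
        (Path(tmp) / name).touch()
    tags = list(discover_artifacts(tmp))
```

The resulting tag list is the kind of value one would hand to a parametrized test so that each embodiment runs as its own case.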
* test(groot): self-contained parity test + in-repo producer + docs
- Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py
- Make the test self-contained: producer script (dump_original_n1_7.py) now lives
next to the test; default artifact dir is repo-relative
(tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The
test only reads artifacts and skips if absent -- it never creates external dirs.
- Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer;
never committed.
- Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note).
- Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md).
- Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity.
9/9 embodiments still pass (max|diff| < 3e-6, fp32 eps).
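The repo-relative default with an environment override can be sketched as below. `GROOT_N1_7_PARITY_DIR` and the default path come from the message above; the function name and the injectable `environ` argument are hypothetical.

```python
import os
from pathlib import Path

# Default location named in the commit message (relative to the repo root).
DEFAULT_ARTIFACT_DIR = Path("tests/policies/groot/artifacts")


def parity_artifact_dir(environ=None):
    """Resolve the artifact directory; an unset or empty override uses the default."""
    env = os.environ if environ is None else environ
    override = env.get("GROOT_N1_7_PARITY_DIR")
    return Path(override) if override else DEFAULT_ARTIFACT_DIR


def artifacts_present(environ=None):
    """Read-only check: never creates the directory, so a test can skip cleanly."""
    return parity_artifact_dir(environ).is_dir()
```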
* docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring
* test(groot): move parity producer into utils/ package
Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into
a tests/policies/groot/utils/ package (with __init__.py) and update all path
references in the test docstring/skip-message and the policy README.
* test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5
The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5
checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On
load, model_version defaults to n1.7 while the base path infers n1.5, so the
version-consistency guard in GrootConfig.__post_init__ raised ValueError and both
test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no
longer a supported model_version.
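The failure mode above can be reproduced with a small stand-in for the guard. `SketchConfig` and `infer_version` are hypothetical names and the substring-based inference is an assumption; only the behavior (a default of n1.7 clashing with an N1.5 base path and raising ValueError) is taken from the message.

```python
from dataclasses import dataclass


def infer_version(base_model_path):
    """Guess the model version from the path; returns None when it is not encoded."""
    lowered = base_model_path.lower()
    for version in ("n1.7", "n1.5"):
        if version in lowered:
            return version
    return None


@dataclass
class SketchConfig:
    base_model_path: str
    model_version: str = "n1.7"  # default applied when the config omits the field

    def __post_init__(self):
        inferred = infer_version(self.base_model_path)
        if inferred is not None and inferred != self.model_version:
            raise ValueError(
                f"model_version={self.model_version!r} conflicts with "
                f"{inferred!r} inferred from {self.base_model_path!r}"
            )
```

Loading a config whose base path names N1.5 while the version field is absent therefore fails at construction time, before any weights are touched.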
Adopt the test for N1.7:
- MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via
GrootPolicy.from_pretrained as a base N1.7 model).
- Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint
embodiment_id.json), via a single EMBODIMENT_TAG constant.
- DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size.
- Docstrings/labels updated to 'GR00T N1.7'.
Both tests run and pass on CUDA; full tests/policies/groot/ suite is
73 passed / 0 failed / 0 skipped.
* docs(groot): document the N1.5 removal and the N1.7 parity test
- groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to
keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced
with `hf download`.
- policy_groot_README.md: N1.5 removal note, updated paper / model-card links,
and the two-comparison (model parity + preprocessor parity) description of
the original-vs-LeRobot test, including the raw-observation artifacts and
recorded seed.
* fix(groot): N1.7 backbone loading and DiT parameter-count logging
- select_layer default tracks the N1.7-3B checkpoint value (16); real
checkpoint loads still override it from config.json.
- get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and
warns (instead of silently assuming) when an unrecognized backbone is loaded
only on the strength of backbone_model_type='qwen'.
- 'revision' pins the GR00T checkpoint repo only and is no longer forwarded
into the unrelated backbone repo load; pin the backbone via
transformers_loading_kwargs instead.
- DiT / SelfAttentionTransformer parameter counts go through logging.debug
instead of print().
* fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes
Covers the GR00T N1.7 source trio (configuration, processor, model wrapper).
Config:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era
values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning
instead of silently.
- action_decode_transform gains an 'auto' sentinel so an explicit 'none'
opt-out wins over the libero_sim default and survives save/load round-trips.
- action_delta_indices is cached on the inputs that determine it.
- Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/
architectures/eagle backbone markers) are rejected with a single clear
error pointing to lerobot==0.5.1.
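One way to read the 'auto' sentinel: the stored value stays 'auto' and is resolved only at use time, so an explicit 'none' is never overwritten by the libero_sim default. A minimal sketch follows; the function name and the transform name "libero_gripper" are hypothetical placeholders, not values from the codebase.

```python
def resolve_decode_transform(setting, embodiment_tag):
    """Resolve the effective transform without mutating the stored setting.

    'auto'  -> pick the per-embodiment default (placeholder names below)
    other   -> the caller's explicit choice wins, including 'none'
    """
    if setting == "auto":
        return "libero_gripper" if embodiment_tag == "libero_sim" else "none"
    return setting


# The saved config keeps "auto" or "none" verbatim, so a reload resolves identically.
auto_on_libero = resolve_decode_transform("auto", "libero_sim")
opt_out = resolve_decode_transform("none", "libero_sim")
```

Had the default been written into the field itself, a reloaded config could no longer tell a user's explicit choice from the auto-selected one.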
Processor:
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync
select_action (relative eef/non-eef decode in eval/record flows).
- Postprocessor falls back to dataset stats when a raw checkpoint lacks the
configured embodiment tag; raw-state cache is per-instance, not
process-global; caller overrides (device, rename_map) are honored on the
raw-checkpoint branch.
- Camera/modality-key mismatches warn (including the zero-match fallback);
deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor;
removed N1.5 processor steps are stubbed to raise the removal guidance and
the action-unpack step is re-registered as _v2.
Model:
- Flash-attention probe is diagnostic-only; forward raises on a missing loss;
print() replaced with logging; N1.5 base-path mismatch includes the
removal guidance.
* fix(groot): skip normalization overrides for training
* fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution
GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth).
GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images
on the single CPU main-loop thread, and that cost is timed inside dataloading_s
(preprocessor(batch) runs in the main process, not the dataloader workers), so
adding workers cannot hide it.
- Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead
of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify
on config.device (GPU) when available. Bit-identical on CPU when no resize is
configured; with a resize only the PIL->torchvision bicubic backend differs
(<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box
without the saved device falls back to CPU.
- Default image_target_size/crop to the N1.7 backbone's training geometry
(256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets
is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new
embodiment). Previously image_target_size=None disabled the resize, so
full-resolution frames were patchified into ~4.7x more vision tokens than the
model was trained on -- inflating dataloading_s (patchify) and update_s (VLM
sequence) and skewing the input distribution. Checkpoints that pin their own
sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS.
Net: preprocessing leaves the CPU critical path and the VLM sees the resolution
it was trained on -- faster training/inference and a correct train/serve
distribution. Affects inference too (shared preprocessor); existing checkpoints
still load (backward compatible) but must be retrained to gain the benefits.
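A back-of-the-envelope check of the "~4.7x more vision tokens" figure, assuming the token count scales with pixel area for a fixed patch size. The 640x480 source resolution is an assumption chosen for illustration; the commit does not state the original frame size.

```python
def pixel_area_ratio(full_hw, target_hw):
    """Ratio of pixel areas, a proxy for the vision-token ratio at fixed patch size."""
    full_h, full_w = full_hw
    target_h, target_w = target_hw
    return (full_h * full_w) / (target_h * target_w)


# Assumed 480x640 camera frames versus the 256x256 target size named above.
ratio = pixel_area_ratio((480, 640), (256, 256))
print(round(ratio, 2))  # 4.69
```

Under that assumption the ratio is 307200 / 65536 = 4.6875, in line with the figure quoted in the message.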
* refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config)
Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and
style standards. No change to policy algorithm/numerics; only UX/CLI and packaging
changes. Tests are intentionally left untouched (out of scope) and need updating
for the removed `model_version` field.
Cleanup & consolidation:
- Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value
coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the
config and processor layers.
- Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver
cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper.
Imports & execution guards:
- Hoist nested imports to module top; relative imports within the package, absolute
for external modules. The version-gated Qwen3-VL classes import under the single
`_transformers_available` guard (transformers is pinned >=5.4, which ships them).
- No import-time side effects: `_register_with_transformers()` now runs in
`GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step
stubs register lazily before pipeline deserialization (idempotent via the
registry, no run-once globals).
- Gate optional deps at the point of use with `require_package(..., extra="groot")`.
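The "no import-time side effects, idempotent registration" pattern can be sketched with a toy registry. `Registry` and `SketchModel` are hypothetical stand-ins; they mimic the `register(..., exist_ok=True)` shape mentioned above and are not the transformers or LeRobot classes.

```python
class Registry:
    """Toy name -> class registry with an exist_ok escape hatch."""

    def __init__(self):
        self._entries = {}

    def register(self, name, obj, exist_ok=False):
        if name in self._entries and not exist_ok:
            raise ValueError(f"{name!r} is already registered")
        self._entries[name] = obj

    def names(self):
        return sorted(self._entries)


REGISTRY = Registry()  # creating the empty registry is the only import-time work


class SketchModel:
    def __init__(self):
        # Registration happens on first construction and is safe to repeat.
        REGISTRY.register("sketch_model", SketchModel, exist_ok=True)
```

Because registration is deferred to `__init__` and tolerates repeats, importing the module changes no global state and constructing the model twice does not raise.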
Dependencies & docs:
- Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default
to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`.
Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`.
- Rewrite the `groot.mdx` install section: flash-attn is a purely optional,
user-managed optimization that LeRobot neither installs nor requires.
Config & CLI:
- Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`;
no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`,
`use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning)
and `tune_vlln` (previously hardwired to True).
- Convert the single-valued `model_version` and `n1_7_backbone_model` fields to
internal constants.
- Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA
checkpoints have no LeRobot `type` field and load only via `base_model_path`) and
is genuinely user-tunable.
- Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a
back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is
rejected with the friendly N1.5 removal message instead of an opaque decode error.
* Optimize GR00T N1.7 image preprocessing
* Remove PIL fallback from GR00T preprocessing
* Fix GROOT relative action training stats
* Address GROOT relative action review feedback
* Fix GROOT N1.7 relative action stats
* Fix GROOT relative action training stats
* Fix GROOT relative action padding and RTC leftovers
* Reset rollout state after robot episode end
* Revert "Reset rollout state after robot episode end"
This reverts commit
|
||
|
|
be46bdea8f |
feat(policies): add Nvidia Gr00t N1.5 model (#2292)
* feat(policies): add Nvidia Gr00t N1.5 model
Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
Co-authored-by: Aravindh <aravindhs@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: youliangt <youliangt@nvidia.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
* fix(docs): add groot to index
Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>
---------
Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
Co-authored-by: Aravindh <aravindhs@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: youliangt <youliangt@nvidia.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>
|