mirror of
https://github.com/huggingface/lerobot.git
synced 2026-07-05 09:07:03 +00:00
708fa1d189
* Add GR00T N1.7 support
Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests.
Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
* Move Groot processor compatibility into Groot loader
* Restore GR00T Flash Attention install guidance
* Allow Groot fake RTC chunk prefetch
* Fix GR00T N1.7 RTC action decoding
* Trim GR00T N1.7 RTC chunks to valid horizon
* Ignore padded GR00T N1.7 RTC prefix rows
* removed n1.5 dependency
* removed remaining N1.5 traces
* groot: auto-enable LIBERO gripper action transform for libero_sim
GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode
transform existed but was never auto-enabled for embodiment_tag=libero_sim,
so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still
overridable). LIBERO Spatial eval: 0% -> 98%.
* Reconnect GR00T relative action processors
* groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder)
N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration,
not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir.
GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder +
CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so
flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole
dependency action_encoder.py are dead. Verified: no src/ or tests/ reference.
Removed (~2037 LOC):
- eagle2_hg_model/ (4 files, ~1575 LOC)
- action_head/flow_matching_action_head.py (408 LOC)
- action_head/action_encoder.py (54 LOC)
cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7).
* groot: reuse lerobot get_device_from_parameters instead of inline lookup
modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot
ships get_device_from_parameters in policies/utils.py (used by diffusion,
vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework.
* groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone)
Addresses checker nit: processor_groot.py docstring still described the N1.5
Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code.
* test(groot): add N1.7 original-vs-LeRobot output parity test
Verifies the LeRobot GR00T N1.7 integration produces equivalent raw
action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed,
precision (fp32) and attention kernel (SDPA): max|diff|=8.9e-7 on the
libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10).
The two impls pin incompatible transformers majors (orig 4.57.3 vs
LeRobot 5.x) and cannot share a process, so the original outputs + exact
collated inputs are produced out-of-process and loaded from an .npz. The
test skips on CI / when the checkpoint or artifact are absent.
* test(groot): parametrize N1.7 parity across all checkpoint embodiments
Generalize the original-vs-LeRobot N1.7 output-parity test from a single
libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid,
real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built
generically from checkpoint metadata; the test discovers per-tag .npz artifacts
and runs one parametrized case each, loading the LeRobot model once via a fixture.
All 9 embodiments match the original to fp32 epsilon (max|diff| < 3e-6), confirming
the integration is correct across the model's full embodiment space and not overfit
to libero_sim.
* test(groot): self-contained parity test + in-repo producer + docs
- Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py
- Make the test self-contained: producer script (dump_original_n1_7.py) now lives
next to the test; default artifact dir is repo-relative
(tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The
test only reads artifacts and skips if absent -- it never creates external dirs.
- Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer;
never committed.
- Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note).
- Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md).
- Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity.
9/9 embodiments still pass (max|diff| < 3e-6, fp32 eps).
* docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring
* test(groot): move parity producer into utils/ package
Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into
a tests/policies/groot/utils/ package (with __init__.py) and update all path
references in the test docstring/skip-message and the policy README.
* test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5
The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5
checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On
load, model_version defaults to n1.7 while the base path infers n1.5, so the
version-consistency guard in GrootConfig.__post_init__ raised ValueError and both
test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no
longer a supported model_version.
Adopt the test for N1.7:
- MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via
GrootPolicy.from_pretrained as a base N1.7 model).
- Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint
embodiment_id.json), via a single EMBODIMENT_TAG constant.
- DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size.
- Docstrings/labels updated to 'GR00T N1.7'.
Both tests run and pass on CUDA; full tests/policies/groot/ suite is
73 passed / 0 failed / 0 skipped.
* docs(groot): document the N1.5 removal and the N1.7 parity test
- groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to
keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced
with `hf download`.
- policy_groot_README.md: N1.5 removal note, updated paper / model-card links,
and the two-comparison (model parity + preprocessor parity) description of
the original-vs-LeRobot test, including the raw-observation artifacts and
recorded seed.
* fix(groot): N1.7 backbone loading and DiT parameter-count logging
- select_layer default tracks the N1.7-3B checkpoint value (16); real
checkpoint loads still override it from config.json.
- get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and
warns (instead of silently assuming) when an unrecognized backbone is loaded
only on the strength of backbone_model_type='qwen'.
- 'revision' pins the GR00T checkpoint repo only and is no longer forwarded
into the unrelated backbone repo load; pin the backbone via
transformers_loading_kwargs instead.
- DiT / SelfAttentionTransformer parameter counts go through logging.debug
instead of print().
* fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes
Covers the GR00T N1.7 source trio (configuration, processor, model wrapper).
Config:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era
values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning
instead of silently.
- action_decode_transform gains an 'auto' sentinel so an explicit 'none'
opt-out wins over the libero_sim default and survives save/load round-trips.
- action_delta_indices is cached on the inputs that determine it.
- Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/
architectures/eagle backbone markers) are rejected with a single clear
error pointing to lerobot==0.5.1.
Processor:
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync
select_action (relative eef/non-eef decode in eval/record flows).
- Postprocessor falls back to dataset stats when a raw checkpoint lacks the
configured embodiment tag; raw-state cache is per-instance, not
process-global; caller overrides (device, rename_map) are honored on the
raw-checkpoint branch.
- Camera/modality-key mismatches warn (including the zero-match fallback);
deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor;
removed N1.5 processor steps are stubbed to raise the removal guidance and
the action-unpack step is re-registered as _v2.
Model:
- Flash-attention probe is diagnostic-only; forward raises on a missing loss;
print() replaced with logging; N1.5 base-path mismatch includes the
removal guidance.
* fix(groot): skip normalization overrides for training
* fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution
GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth).
GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images
on the single CPU main-loop thread, and that cost is timed inside dataloading_s
(preprocessor(batch) runs in the main process, not the dataloader workers), so
adding workers cannot hide it.
- Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead
of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify
on config.device (GPU) when available. Bit-identical on CPU when no resize is
configured; with a resize only the PIL->torchvision bicubic backend differs
(<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box
without the saved device falls back to CPU.
- Default image_target_size/crop to the N1.7 backbone's training geometry
(256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets
is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new
embodiment). Previously image_target_size=None disabled the resize, so
full-resolution frames were patchified into ~4.7x more vision tokens than the
model was trained on -- inflating dataloading_s (patchify) and update_s (VLM
sequence) and skewing the input distribution. Checkpoints that pin their own
sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS.
Net: preprocessing leaves the CPU critical path and the VLM sees the resolution
it was trained on -- faster training/inference and a correct train/serve
distribution. Affects inference too (shared preprocessor); existing checkpoints
still load (backward compatible) but must be retrained to gain the benefits.
* refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config)
Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and
style standards. No change to policy algorithm/numerics; only UX/CLI and packaging
changes. Tests are intentionally left untouched (out of scope) and need updating
for the removed `model_version` field.
Cleanup & consolidation:
- Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value
coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the
config and processor layers.
- Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver
cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper.
Imports & execution guards:
- Hoist nested imports to module top; relative imports within the package, absolute
for external modules. The version-gated Qwen3-VL classes import under the single
`_transformers_available` guard (transformers is pinned >=5.4, which ships them).
- No import-time side effects: `_register_with_transformers()` now runs in
`GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step
stubs register lazily before pipeline deserialization (idempotent via the
registry, no run-once globals).
- Gate optional deps at the point of use with `require_package(..., extra="groot")`.
Dependencies & docs:
- Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default
to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`.
Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`.
- Rewrite the `groot.mdx` install section: flash-attn is a purely optional,
user-managed optimization that LeRobot neither installs nor requires.
Config & CLI:
- Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`;
no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`,
`use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning)
and `tune_vlln` (previously hardwired to True).
- Convert the single-valued `model_version` and `n1_7_backbone_model` fields to
internal constants.
- Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA
checkpoints have no LeRobot `type` field and load only via `base_model_path`) and
is genuinely user-tunable.
- Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a
back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is
rejected with the friendly N1.5 removal message instead of an opaque decode error.
* Optimize GR00T N1.7 image preprocessing
* Remove PIL fallback from GR00T preprocessing
* Fix GROOT relative action training stats
* Address GROOT relative action review feedback
* Fix GROOT N1.7 relative action stats
* Fix GROOT relative action training stats
* Fix GROOT relative action padding and RTC leftovers
* Reset rollout state after robot episode end
* Revert "Reset rollout state after robot episode end"
This reverts commit 1322f45aec.
* Move GROOT relative stats out of train script
* Guard GR00T relative action stepwise decode
* Match GR00T N1.7 OSS preprocessing and relative actions
* Apply LIBERO action decode override after loading
* Format GR00T OSS parity changes
* chore(policies): add guards, warnings and comments + recover tests n1.5 check
* fix(style): pre-commit
* fix(ci): guard dependency checks
* chore(groot): move cv2 to the top as it's in the default install tag
* chore(policies): add explicit dataset dependency to gr00t implementation
* fix(test): add guard
* fix(groot): make N1.7 letterbox opt-in
* feat(groot): activate checkpoint-configured N1.7 raw-state dropout during training
Isaac-GR00T applies dual state regularization during fine-tuning: raw-state
zeroing driven by the processor sidecar's state_dropout_prob (0.2 for the
inspected N1.7 checkpoint) plus encoded-feature dropout. Baseline LeRobot kept
the processor in deterministic mode, so the raw-state dropout never activated
(RCA Tier-2 contributor to the LeRobot-trained SO-101 failures).
- GrootN17PackInputsStep: runtime-only 'training' flag + state_dropout_prob;
whole-sample state zeroing gated on torch.is_grad_enabled() so eval and
no_grad validation paths are unaffected
- sidecar loader reads state_dropout_prob from processor_config.json
- state_dropout_prob serializes with the step; the training flag intentionally
does not (reloaded pipelines default to eval, re-enabled only when processors
are rebuilt with dataset_meta)
- _set_groot_preprocessor_training toggles any dataclass step exposing a
'training' field on serialized-pipeline reloads
Verification: tests/policies/groot/test_groot_state_dropout.py (4 passed) on
RTX PRO 6000 / CUDA 13.3.
* fix(groot): align N1.7 fine-tuning optimizer/scheduler/precision with Isaac-GR00T
Evidence from the LeRobot-vs-OSS checkpoint comparison: the LeRobot/HF 8k
checkpoint's DiT moved only ~19% as far from base as the OSS-trained one
(0.0547 vs 0.285 relative L2) - undertrained because the scheduler decayed over
a hardcoded 10k steps regardless of --steps, on top of beta1/clip mismatches.
- AdamW betas (0.95, 0.999) -> (0.9, 0.999) and grad_clip_norm 10.0 -> 1.0
(Isaac defaults)
- scheduler: hardcoded CosineDecayWithWarmup(10k decay, floor 10% peak) ->
DiffuserSchedulerConfig HF cosine with ceil(max_steps * warmup_ratio) warmup,
deriving num_training_steps from the outer --steps at runtime
- model_params_fp32 (default true): keep master weights in FP32 and compute
under BF16 autocast like the native N1.7 recipe (fixes optimizer-update
numerics vs pure-BF16 params)
- weight-decay grouping via transformers get_parameter_names: biases and norm
parameters excluded from decay
- restore the TF4 lm_head/embedding weight tie so the unused Qwen LM head stays
frozen and deduplicated in checkpoints
- action_mask kept in native dtype for the masked flow-matching loss
- drop_n_last_frames: exclude episode tails that cannot supply a complete
action chunk (Isaac sampler behavior)
Verification: tests/policies/groot/test_groot_training_optim_contract.py
(7 passed) + remaining groot suite 11 passed/5 skipped on RTX PRO 6000 /
CUDA 13.3. Note: tests/policies/groot/test_groot_n1_7.py does not collect on
the base branch (pre-existing ImportError, fixed in PR #37).
* feat(groot): train-time random crop for N1.7 (eval keeps center crop)
Isaac-GR00T crops a random crop_fraction window during training and the
deterministic center window at eval, replaying the sampled window across all
camera views of a sample. This contract is unchanged since the N1.5 release
(gr00t/data/transform/video.py: "If mode is 'train', return a random crop
transform. If mode is 'eval', return a center crop transform.") and mirrors
LeRobot's own Diffusion/VQBeT crop_is_random pattern. The LeRobot N1.7 port
used the eval center crop for training too, so the fine-tuned projector/DiT
never sees frame borders and trains on a single fixed appearance point.
Scope: crop geometry ONLY - no color jitter, no new dependencies. The random
window is plain numpy slicing inside the existing cv2 eval transform:
- _transform_n1_7_image_for_vlm_albumentations gains crop_position=(y, x)
fractions; None keeps the center crop byte-identical to before (verified
by test)
- GrootN17VLMEncodeStep gains a runtime-only 'training' flag (never
serialized; reloaded pipelines default to eval); training samples ONE
window per sample and reuses it across (timestep, view) frames - Isaac's
cross-view consistency
- gated on torch.is_grad_enabled() so no_grad validation and frozen-eval
paths are unaffected
- wired via dataset_meta is not None in make_groot_pre_post_processors and
the existing _set_groot_preprocessor_training on serialized reloads
Verification: tests/policies/groot/test_groot_train_random_crop.py (8 passed:
center-crop bit-exactness with crop_position=None, corner/center windows,
cross-view replay, train!=eval, no_grad gating, seed reproducibility,
serialization contract) + groot suite 23 passed / 5 skipped on RTX PRO 6000 /
CUDA 13.3.
* docs(groot): update Training & hardware Evaluation commands
Replace the multi-GPU accelerate-launch Training snippet with the current
single-command 'uv run lerobot-train' N1.7 recipe (relative actions excluding
gripper, bf16, flash attention, chunk/n_action_steps=16, bs64/20k steps).
Replace the bimanual 'Evaluate in your hardware setup' rollout example with the
SO-101 follower RTC 'uv run lerobot-rollout' command (strategy.type=base,
inference.type=rtc, wrist+front cameras, place-the-vial task).
Docs-only; no source/test changes.
* docs(groot): parameterize commands with env vars + fill LIBERO results
- Introduce BASE_MODEL / DATASET_ID / REPO_ID / JOB_NAME / OUTPUT_DIR env vars
in the training command and reuse OUTPUT_DIR + BASE_MODEL in the rollout cmd.
- Fill the LIBERO benchmark table with GR00T-LeRobot success rates
(Spatial 94%, Object 98%, Goal 93%, LIBERO 10/Long 90%; avg 93.75%),
drop the OSS column and XX placeholders. LeRobot-focused.
* docs(groot): drop export block, reference env vars directly
Use $DATASET_ID / $BASE_MODEL / $REPO_ID / $OUTPUT_DIR / $JOB_NAME as
bare placeholders in the commands without concrete export assignments.
* docs(groot): keep BASE_MODEL export in training command
* docs(groot): use literal HF repo IDs for dataset/policy repo_id
Public-facing Hub references (--dataset.repo_id, --policy.repo_id) shown as
concrete IDs; local-only values ($OUTPUT_DIR, $JOB_NAME) stay as placeholders.
* docs(groot): add LIBERO training command example
* docs(groot): remove LIBERO checkpoints subdirectory section
* docs(groot): use $BASE_MODEL for base_model_path in LIBERO eval
* docs(groot): drop hf download step from LIBERO eval, fix intro
* docs(groot): restore suite checkpoint download intro sentence
* docs(groot): remove checkpoint download note above LIBERO eval
* docs(groot): update training and rollout commands with new parameters and dependencies
* Add sample so101 training command
* Remove sample so101 training command
* docs(groot): remove optional Flash Attention setup instructions and update base model path for evaluation
* docs(groot): update training command with image transformation parameters
* docs(groot): add note on inference.queue_threshold value for stable inference
* chore(style): pre-commit gr00t
* docs(groot): update
* chore(policies): minor details
* fix(groot): license headers + test guards
* chore(policies): fix tests
* docs(groot): relative actions param doc
* chore(policy): address some of the AI review items
---------
Co-authored-by: Andrew Wrenn <awrenn@nvidia.com>
Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com>
Co-authored-by: groot-validation <groot-validation@localhost>
Co-authored-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
139 lines
7.1 KiB
Markdown
## Research Paper

GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734

GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B

GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/

> GR00T N1.5 support was removed from LeRobot; the last release supporting it is `lerobot==0.5.1`.
> Current releases support GR00T N1.7 only.

## Repository

Code: https://github.com/NVIDIA/Isaac-GR00T

## Citation

```bibtex
@inproceedings{gr00tn1_2025,
  archivePrefix = {arxiv},
  eprint = {2503.14734},
  title = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
  author = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
  month = {March},
  year = {2025},
  booktitle = {ArXiv Preprint},
}
```

## Additional Resources

Blog: https://developer.nvidia.com/isaac/gr00t

Hugging Face Models:

- GR00T N1.7: https://huggingface.co/nvidia/GR00T-N1.7-3B
- GR00T N1.7 LIBERO checkpoints: https://huggingface.co/nvidia/GR00T-N1.7-LIBERO

<details>
<summary><b>Original-vs-LeRobot parity test</b></summary>

## Original-vs-LeRobot parity test

`tests/policies/groot/test_groot_vs_original.py` verifies this LeRobot
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
against NVIDIA's original `gr00t` package with two comparisons, each parametrized
over every embodiment tag present in the checkpoint:

1. **Model parity**: given byte-identical pre-processed inputs and the same
   flow-matching seed (recorded in each artifact), both implementations must produce
   the **same raw model output** (`get_action(...)["action_pred"]`, the normalized
   flow-matching prediction). Output shapes must match exactly; any action-horizon
   or action-dim mismatch fails the test.
2. **Preprocessor parity**: given the identical raw observations (per-camera
   frames, state vectors, language instruction), LeRobot's own preprocessor pipeline
   (real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven
   state normalization, no mocks) must produce the **same collated model inputs**
   (`input_ids`, `attention_mask`, `pixel_values`, `image_grid_thw`, `state`,
   `embodiment_id`) as the original package's processor.

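Both comparisons reduce to the same kind of check: exact shape equality first, then element-wise closeness within a tolerance. The sketch below illustrates that idea on plain nested lists using only the standard library. The helper names (`shape_of`, `flatten`, `check_parity`) are hypothetical, and the `atol + rtol * |reference|` rule is an assumption borrowed from the usual NumPy/PyTorch convention; the actual test may compare tensors differently.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def flatten(nested):
    """Yield the scalar leaves of a nested list in row-major order."""
    if isinstance(nested, list):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested


def check_parity(reference, candidate, atol=1e-3, rtol=1e-3):
    """Raise AssertionError on a shape or value mismatch; return the max |diff|."""
    ref_shape, cand_shape = shape_of(reference), shape_of(candidate)
    if ref_shape != cand_shape:
        # An action-horizon or action-dim mismatch must fail, not broadcast.
        raise AssertionError(f"shape mismatch: {ref_shape} vs {cand_shape}")
    max_diff = 0.0
    for ref, cand in zip(flatten(reference), flatten(candidate)):
        diff = abs(ref - cand)
        if diff > atol + rtol * abs(ref):  # assumed allclose-style rule
            raise AssertionError(f"value mismatch: {ref} vs {cand}")
        max_diff = max(max_diff, diff)
    return max_diff


# Illustrative values only: a (horizon=2, action_dim=2) prediction pair.
original_pred = [[0.10, -0.20], [0.30, 0.40]]
lerobot_pred = [[0.10, -0.2000005], [0.30, 0.40]]
max_abs_diff = check_parity(original_pred, lerobot_pred)
```

Checking shapes before values keeps a horizon or action-dimension mismatch from being masked by broadcasting or truncation.
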
### Why two environments

The original `gr00t` package pins `transformers==4.57.3` (Python 3.10); this
integration requires `transformers>=5.x` (Qwen3-VL). Under 5.x, `PretrainedConfig`
is itself a defaulted dataclass, so the original config dataclasses fail to import
(`non-default argument follows default argument`). The two implementations therefore
**cannot be imported in the same Python process**.

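The dataclass rule behind that import error can be reproduced without either package. The sketch below uses stand-in classes (`BaseConfig` and `ChildConfig` are hypothetical, not the real `PretrainedConfig` or the original `gr00t` configs) to show how a subclass field without a default fails once the base class contributes defaulted fields.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    # Stand-in for a base class whose dataclass fields all carry defaults.
    name: str = "base"


def define_child():
    # Defining the subclass is what fails; no instance is ever created.
    @dataclass
    class ChildConfig(BaseConfig):
        hidden_size: int  # no default, yet it follows the inherited defaulted field

    return ChildConfig


try:
    define_child()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Because the error is raised while the class body is processed, it surfaces at import time, which is why the failure cannot be worked around by importing the module lazily.
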
So the test uses a **producer / consumer** split across two venvs:

1. **Producer**: `tests/policies/groot/utils/dump_original_n1_7.py`, run in the _original_
   gr00t venv. For each embodiment it builds dummy inputs generically from the
   checkpoint metadata (state dims from `statistics.json`; camera/language keys from
   the processor modality configs), runs the original model, and saves to one `.npz`
   per tag: the raw observations (`raw::` keys), the exact collated inputs
   (`in::` keys), the seed, and the raw `action_pred`.
2. **Consumer**: the pytest above, run in the _LeRobot_ venv. It discovers every
   `.npz`; the model-parity case replays the byte-identical collated inputs through
   the LeRobot model with the recorded seed and asserts the outputs match, and the
   preprocessor-parity case replays the raw observations through LeRobot's full
   preprocessor pipeline and asserts the collated tensors match.

> Artifacts generated by older versions of the dump script contain no `raw::`
> fields; the preprocessor-parity case then **skips** with a regeneration hint.
> Re-run the producer to refresh them.

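The artifact layout described above can be pictured as one flat mapping whose key prefixes decide which case can run. The following sketch works on a plain dictionary rather than a real `.npz` file. Only the `raw::` and `in::` prefixes come from the text; the helper names and the concrete keys (`raw::state`, `seed`, and so on) are assumptions for illustration.

```python
RAW_PREFIX = "raw::"
INPUT_PREFIX = "in::"


def split_artifact(artifact):
    """Split a flat artifact mapping into raw observations, collated inputs, and extras."""
    raw, inputs, extras = {}, {}, {}
    for key, value in artifact.items():
        if key.startswith(RAW_PREFIX):
            raw[key[len(RAW_PREFIX):]] = value
        elif key.startswith(INPUT_PREFIX):
            inputs[key[len(INPUT_PREFIX):]] = value
        else:
            extras[key] = value
    return raw, inputs, extras


def plan_cases(artifact):
    """Decide which parity cases an artifact supports."""
    raw, inputs, _ = split_artifact(artifact)
    return {
        "model_parity": "run" if inputs else "skip",
        # Older artifacts carry no raw observations, so this case skips.
        "preprocessor_parity": "run" if raw else "skip",
    }


# Hypothetical artifact contents; real artifacts hold arrays, not short lists.
new_artifact = {
    "raw::state": [0.0, 0.1],
    "in::input_ids": [1, 2, 3],
    "seed": 42,
    "action_pred": [[0.0, 0.0]],
}
old_artifact = {key: value for key, value in new_artifact.items()
                if not key.startswith(RAW_PREFIX)}

new_plan = plan_cases(new_artifact)
old_plan = plan_cases(old_artifact)
```

Keying the skip decision on the presence of `raw::` entries lets an old artifact still serve the model-parity case while signalling that it should be regenerated.
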
### Fairness controls

- **Same pre-processed inputs (model parity)**: the original processor's `input_ids`,
  `pixel_values`, `image_grid_thw`, `attention_mask`, `state`, `embodiment_id` are
  fed verbatim to the LeRobot model (no re-tokenization / re-normalization), so the
  model comparison isolates the model. LeRobot's own tokenization / image packing is
  covered separately by the preprocessor-parity case, which compares its output
  against those same collated tensors from identical raw observations.
- **Same precision + attention kernel**: both sides run **fp32 + SDPA**. The
  original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
  producer forces SDPA + fp32. (With the defaults the gap is ~3e-2, which is pure
  kernel/rounding noise, not an implementation difference.)
- **Same flow-matching seed**: fixed right before sampling on both sides; the
  producer records it in each artifact (`--seed`, default 42) and the consumer
  replays the recorded value.

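Why the seed has to be fixed right before sampling can be shown with the standard library alone. This is a minimal sketch, not the test's code: `sample_noise` is a hypothetical stand-in for the flow-matching noise draw, and Python's `random` module stands in for the framework RNG.

```python
import random


def sample_noise(rng, horizon, action_dim):
    """Draw a (horizon, action_dim) grid of Gaussian noise from the given generator."""
    return [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]


SEED = 42  # the producer's documented default

# Producer and consumer each seed immediately before sampling: identical noise.
producer_noise = sample_noise(random.Random(SEED), horizon=4, action_dim=3)
consumer_noise = sample_noise(random.Random(SEED), horizon=4, action_dim=3)

# Counter-example: same seed, but the generator is advanced by an unrelated
# draw before sampling, so the noise no longer lines up.
drifted_rng = random.Random(SEED)
drifted_rng.random()
drifted_noise = sample_noise(drifted_rng, horizon=4, action_dim=3)

seeds_match = producer_noise == consumer_noise
drift_breaks_match = producer_noise != drifted_noise
```

Any random draw between seeding and sampling shifts the stream, so seeding at process start would not be enough to make the two sides comparable.
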
### How to run

```bash
# Resolve a local checkpoint (GR00T-N1.7-LIBERO / libero_10)
CKPT=$(python - <<'PY'
import os
from huggingface_hub import snapshot_download
print(os.path.join(snapshot_download("nvidia/GR00T-N1.7-LIBERO",
                                     allow_patterns=["libero_10/*"]), "libero_10"))
PY
)

# 1) Produce the original-side artifacts for all embodiments (original gr00t venv, CUDA)
CUDA_VISIBLE_DEVICES=0 /path/to/Isaac-GR00T/.venv-original/bin/python \
  tests/policies/groot/utils/dump_original_n1_7.py \
  --ckpt "$CKPT" --out-dir tests/policies/groot/artifacts --device cuda --seed 42

# 2) Run the parity test (LeRobot venv); one parametrized case per embodiment
CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
  uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
```

The `.npz` artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by
the producer; they are never committed. The tests **skip** (do not fail) on CI or
when the checkpoint / artifacts are absent.

#### Env knobs (all optional)

| Var                                       | Default                          | Purpose                               |
| ----------------------------------------- | -------------------------------- | ------------------------------------- |
| `GROOT_N1_7_PARITY_DIR`                   | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT`                  | auto (HF cache)                  | override checkpoint dir               |
| `GROOT_PARITY_DEVICE`                     | `cuda` if available              | `cpu` or `cuda`                       |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3`                           | comparison tolerance                  |

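As a rough illustration of how these knobs resolve, the sketch below reads them from a mapping with the defaults listed in the table. `read_parity_settings` and its `cuda_available` argument are hypothetical; the real test presumably probes CUDA itself and may parse the variables differently.

```python
import os
from pathlib import Path


def read_parity_settings(env=None, cuda_available=False):
    """Resolve the parity-test knobs from an environment mapping (defaults per the table)."""
    env = os.environ if env is None else env
    return {
        "artifact_dir": Path(env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts")),
        # None means "resolve the checkpoint from the Hugging Face cache".
        "checkpoint_dir": env.get("GROOT_N1_7_LIBERO_CKPT"),
        "device": env.get("GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu"),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


# With nothing set, every knob falls back to its documented default.
default_settings = read_parity_settings(env={})

# Tighter tolerance and an explicit device, as one might export before pytest.
strict_settings = read_parity_settings(
    env={"GROOT_PARITY_ATOL": "1e-5", "GROOT_PARITY_DEVICE": "cuda"}
)
```

Passing the environment as an argument keeps the helper easy to exercise without mutating `os.environ`.
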
</details>