Research Paper
Paper: https://research.nvidia.com/labs/gear/gr00t-n1_5/
Repository
Code: https://github.com/NVIDIA/Isaac-GR00T
Citation
@inproceedings{gr00tn1_2025,
archivePrefix = {arxiv},
eprint = {2503.14734},
title = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
author = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
month = {March},
year = {2025},
booktitle = {ArXiv Preprint},
}
Additional Resources
Blog: https://developer.nvidia.com/isaac/gr00t
Hugging Face Models:
- GR00T N1.7: https://huggingface.co/nvidia/GR00T-N1.7-3B
- GR00T N1.7 LIBERO checkpoints: https://huggingface.co/nvidia/GR00T-N1.7-LIBERO
Original-vs-LeRobot parity test
tests/policies/groot/test_groot_vs_original.py verifies that this LeRobot
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
produces the same raw model output (get_action(...)["action_pred"], the
normalized flow-matching prediction) as NVIDIA's original gr00t package, given
byte-identical pre-processed inputs and the same flow-matching seed. It is
parametrized over every embodiment tag present in the checkpoint.
Why two environments
The original gr00t package pins transformers==4.57.3 (Python 3.10); this
integration requires transformers>=5.x (Qwen3-VL). Under 5.x, PretrainedConfig
is itself a defaulted dataclass, so the original config dataclasses fail to import
(non-default argument follows default argument). The two implementations therefore
cannot be imported in the same Python process.
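The import failure described above is ordinary dataclass field ordering, and it can be reproduced without transformers installed. In this minimal sketch, BaseConfig and ChildConfig are hypothetical stand-ins (not PretrainedConfig or any class from the gr00t package): a dataclass base whose fields all have defaults, and a subclass that adds a required field.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    # Hypothetical stand-in for a base config whose fields all carry defaults.
    name: str = "base"


def define_child_config():
    # Defining the subclass is what fails: the inherited defaulted field
    # precedes the new required field in the generated __init__ signature.
    @dataclass
    class ChildConfig(BaseConfig):
        hidden_size: int  # required (non-default) field

    return ChildConfig


try:
    define_child_config()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The TypeError is raised when the class is defined, so it surfaces at import time, before any model code runs.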
So the test uses a producer / consumer split across two venvs:
- Producer: tests/policies/groot/utils/dump_original_n1_7.py, run in the original gr00t venv. For each embodiment it builds dummy inputs generically from the checkpoint metadata (state dims from statistics.json; camera/language keys from the processor modality configs), runs the original model, and saves the exact collated inputs + raw action_pred to one .npz per tag.
- Consumer: the pytest above, run in the LeRobot venv. It discovers every .npz, replays the byte-identical inputs through the LeRobot model with the same seed, and asserts the outputs match.
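The producer / consumer hand-off can be sketched as an .npz round trip. This is an illustration under stated assumptions, not the actual script or test: toy_model is a hypothetical deterministic stand-in for both implementations, the array keys are illustrative, and both halves run in one process here, whereas the real ones run in separate venvs.

```python
import tempfile
from pathlib import Path

import numpy as np


def toy_model(state: np.ndarray) -> np.ndarray:
    # Hypothetical deterministic stand-in for either implementation.
    return np.tanh(state * 0.5).astype(np.float32)


def produce(out_dir: Path, tag: str) -> Path:
    # Producer side: save the exact inputs and the reference output for one tag.
    state = np.linspace(-1.0, 1.0, 8, dtype=np.float32)
    path = out_dir / f"{tag}.npz"
    np.savez(path, state=state, action_pred=toy_model(state))
    return path


def consume(path: Path, atol: float = 1e-3, rtol: float = 1e-3) -> None:
    # Consumer side: replay the saved inputs and compare with the saved output.
    with np.load(path) as data:
        replayed = toy_model(data["state"])
        np.testing.assert_allclose(replayed, data["action_pred"], atol=atol, rtol=rtol)


with tempfile.TemporaryDirectory() as tmp:
    artifact = produce(Path(tmp), "example_tag")
    consume(artifact)  # raises AssertionError if the outputs diverge
    artifact_name = artifact.name

print(artifact_name)
```

Saving the collated inputs alongside the output is what lets the consumer skip the original preprocessing entirely.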
Fairness controls
- Same pre-processed inputs: the original processor's input_ids, pixel_values, image_grid_thw, attention_mask, state, embodiment_id are fed verbatim to the LeRobot model (no re-tokenization / re-normalization).
- Same precision + attention kernel: both sides run fp32 + SDPA. The original defaults to use_flash_attention=True (flash_attention_2 + bf16); the producer forces SDPA + fp32. (With the defaults the gap is ~3e-2, which is pure kernel/rounding noise, not an implementation difference.)
- Same flow-matching seed: fixed (42) right before sampling on both sides.
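The seed control matters because flow-matching sampling starts from random noise, so two correct implementations only agree when they draw the same noise. The toy sampler below is an assumption-laden illustration, not the GR00T action head: it uses the standard library's random module and a hypothetical velocity field v(x) = -x integrated with Euler steps.

```python
import random


def sample_actions(seed: int, steps: int = 4, dim: int = 3) -> list:
    # Toy Euler integration of a hypothetical velocity field v(x) = -x,
    # starting from Gaussian noise drawn with the given seed.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for _ in range(steps):
        x = [xi + dt * (-xi) for xi in x]
    return x


same_a = sample_actions(seed=42)
same_b = sample_actions(seed=42)
other = sample_actions(seed=43)

print(same_a == same_b)  # identical seeds reproduce the same trajectory
print(same_a == other)   # a different seed starts from different noise
```

Seeding immediately before sampling, as both sides do here, keeps earlier random draws from shifting the generator state.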
How to run
# Resolve a local checkpoint (GR00T-N1.7-LIBERO / libero_10)
CKPT=$(python - <<'PY'
import os
from huggingface_hub import snapshot_download
print(os.path.join(snapshot_download("nvidia/GR00T-N1.7-LIBERO",
allow_patterns=["libero_10/*"]), "libero_10"))
PY
)
# 1) Produce the original-side artifacts for all embodiments (original gr00t venv, CUDA)
CUDA_VISIBLE_DEVICES=0 /path/to/Isaac-GR00T/.venv-original/bin/python \
tests/policies/groot/utils/dump_original_n1_7.py \
--ckpt "$CKPT" --out-dir tests/policies/groot/artifacts --device cuda --seed 42
# 2) Run the parity test (LeRobot venv) — one parametrized case per embodiment
CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
The .npz artifacts are local-only (gitignored, ~6–9 MB each) and are regenerated by
the producer; they are never committed. The test skips (does not fail) on CI or
when the checkpoint / artifacts are absent.
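The discover-or-skip behaviour can be sketched with the standard library alone. The helper names are hypothetical, and detecting CI through a non-empty CI environment variable is an assumption about the mechanism, not something this README specifies.

```python
import tempfile
from pathlib import Path
from typing import Optional


def discover_artifacts(artifact_dir: Path) -> list:
    # One artifact per embodiment tag; sorted for a stable case order.
    return sorted(artifact_dir.glob("*.npz"))


def skip_reason(artifact_dir: Path, env: dict) -> Optional[str]:
    # Return a reason to skip, or None when the parity test can run.
    if env.get("CI"):  # assumption: CI is signalled by a non-empty CI variable
        return "parity test is local-only; skipped on CI"
    if not discover_artifacts(artifact_dir):
        return f"no .npz artifacts in {artifact_dir}; run the producer first"
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    reason_empty = skip_reason(root, env={})
    (root / "example_tag.npz").write_bytes(b"")  # placeholder, not a real artifact
    reason_ready = skip_reason(root, env={})
    reason_ci = skip_reason(root, env={"CI": "true"})
    tags = [p.stem for p in discover_artifacts(root)]

print(tags)
```

In a pytest module, a non-None reason would typically be passed to pytest.skip, and the discovered paths to pytest.mark.parametrize.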
Env knobs (all optional)
| Var | Default | Purpose |
|---|---|---|
| GROOT_N1_7_PARITY_DIR | tests/policies/groot/artifacts | directory of per-tag .npz artifacts |
| GROOT_N1_7_LIBERO_CKPT | auto (HF cache) | override checkpoint dir |
| GROOT_PARITY_DEVICE | cuda if available | cpu or cuda |
| GROOT_PARITY_ATOL / GROOT_PARITY_RTOL | 1e-3 | comparison tolerance |
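One way these variables could be resolved is sketched below. read_parity_settings is a hypothetical helper, not a function from the test module, and CUDA availability is passed in as a flag so the sketch does not need torch.

```python
import os


def read_parity_settings(cuda_available: bool, env=None) -> dict:
    # Hypothetical helper: resolve each knob, falling back to the documented default.
    env = os.environ if env is None else env
    return {
        "artifact_dir": env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts"),
        "ckpt_dir": env.get("GROOT_N1_7_LIBERO_CKPT"),  # None: resolve from the HF cache
        "device": env.get("GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu"),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


defaults = read_parity_settings(cuda_available=False, env={})
overridden = read_parity_settings(
    cuda_available=True,
    env={"GROOT_PARITY_DEVICE": "cpu", "GROOT_PARITY_ATOL": "1e-4"},
)

print(defaults["device"], overridden["device"], overridden["atol"])
```

Passing env explicitly keeps the helper testable; calling it with env=None reads the real process environment.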