lerobot/docs/source/policy_groot_README.md

## Research Paper

Paper: https://research.nvidia.com/labs/gear/gr00t-n1_5/

## Repository

Code: https://github.com/NVIDIA/Isaac-GR00T

## Citation

```bibtex
@inproceedings{gr00tn1_2025,
  archivePrefix = {arxiv},
  eprint     = {2503.14734},
  title      = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
  author     = {NVIDIA and Johan Bjorck andFernando Castañeda, Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
  month      = {March},
  year       = {2025},
  booktitle  = {ArXiv Preprint},
}
```

## Additional Resources

Blog: https://developer.nvidia.com/isaac/gr00t

Hugging Face Models:

- GR00T N1.7: https://huggingface.co/nvidia/GR00T-N1.7-3B
- GR00T N1.7 LIBERO checkpoints: https://huggingface.co/nvidia/GR00T-N1.7-LIBERO

## Original-vs-LeRobot parity test

`tests/policies/groot/test_groot_vs_original.py` verifies that this LeRobot
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
produces the **same raw model output** (`get_action(...)["action_pred"]`, the
normalized flow-matching prediction) as NVIDIA's original `gr00t` package, given
byte-identical pre-processed inputs and the same flow-matching seed. It is
parametrized over every embodiment tag present in the checkpoint.

### Why two environments

The original `gr00t` package pins `transformers==4.57.3` (Python 3.10); this
integration requires `transformers>=5.x` (Qwen3-VL). Under 5.x, `PretrainedConfig`
is itself a defaulted dataclass, so the original config dataclasses fail to import
(`non-default argument follows default argument`). The two implementations therefore
**cannot be imported in the same Python process**.

So the test uses a **producer / consumer** split across two venvs:

1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the *original*
   gr00t venv. For each embodiment it builds dummy inputs generically from the
   checkpoint metadata (state dims from `statistics.json`; camera/language keys from
   the processor modality configs), runs the original model, and saves the exact
   collated inputs + raw `action_pred` to one `.npz` per tag.
2. **Consumer** — the pytest above, run in the *LeRobot* venv. It discovers every
   `.npz`, replays the byte-identical inputs through the LeRobot model with the same
   seed, and asserts the outputs match.

### Fairness controls

- **Same pre-processed inputs** — the original processor's `input_ids`,
  `pixel_values`, `image_grid_thw`, `attention_mask`, `state`, `embodiment_id` are
  fed verbatim to the LeRobot model (no re-tokenization / re-normalization).
- **Same precision + attention kernel** — both sides run **fp32 + SDPA**. The
  original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
  producer forces SDPA + fp32. (With the defaults the gap is ~3e-2 — pure
  kernel/rounding noise, not an implementation difference.)
- **Same flow-matching seed** — fixed (42) right before sampling on both sides.

### How to run

```bash
# Resolve a local checkpoint (GR00T-N1.7-LIBERO / libero_10)
CKPT=$(python - <<'PY'
import os
from huggingface_hub import snapshot_download
print(os.path.join(snapshot_download("nvidia/GR00T-N1.7-LIBERO",
      allow_patterns=["libero_10/*"]), "libero_10"))
PY
)

# 1) Produce the original-side artifacts for all embodiments (original gr00t venv, CUDA)
CUDA_VISIBLE_DEVICES=0 /path/to/Isaac-GR00T/.venv-original/bin/python \
    tests/policies/groot/utils/dump_original_n1_7.py \
    --ckpt "$CKPT" --out-dir tests/policies/groot/artifacts --device cuda --seed 42

# 2) Run the parity test (LeRobot venv) — one parametrized case per embodiment
CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
    uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
```

The `.npz` artifacts are local-only (gitignored, ~6–9 MB each) and are regenerated by
the producer; they are never committed. The test **skips** (does not fail) on CI or
when the checkpoint / artifacts are absent.

#### Env knobs (all optional)

| Var | Default | Purpose |
|---|---|---|
| `GROOT_N1_7_PARITY_DIR` | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT` | auto (HF cache) | override checkpoint dir |
| `GROOT_PARITY_DEVICE` | `cuda` if available | `cpu` or `cuda` |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3` | comparison tolerance |