
## Research Paper

GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734

GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B

GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/

> GR00T N1.5 support was removed from LeRobot; the last release supporting it is `lerobot==0.5.1`.
> Current releases support GR00T N1.7 only.

## Repository

Code: https://github.com/NVIDIA/Isaac-GR00T

## Citation

```bibtex
@inproceedings{gr00tn1_2025,
  archivePrefix = {arxiv},
  eprint     = {2503.14734},
  title      = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
  author     = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
  month      = {March},
  year       = {2025},
  booktitle  = {ArXiv Preprint},
}
```

## Additional Resources

Blog: https://developer.nvidia.com/isaac/gr00t

Hugging Face Models:

- GR00T N1.7: https://huggingface.co/nvidia/GR00T-N1.7-3B
- GR00T N1.7 LIBERO checkpoints: https://huggingface.co/nvidia/GR00T-N1.7-LIBERO

<details>
<summary><b>Original-vs-LeRobot parity test</b></summary>

## Original-vs-LeRobot parity test

`tests/policies/groot/test_groot_vs_original.py` verifies this LeRobot
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
against NVIDIA's original `gr00t` package with two comparisons, each parametrized
over every embodiment tag present in the checkpoint:

1. **Model parity** — given byte-identical pre-processed inputs and the same
   flow-matching seed (recorded in each artifact), both implementations must produce
   the **same raw model output** (`get_action(...)["action_pred"]`, the normalized
   flow-matching prediction). Output shapes must match exactly; any action-horizon
   or action-dim mismatch fails the test.
2. **Preprocessor parity** — given the identical raw observations (per-camera
   frames, state vectors, language instruction), LeRobot's own preprocessor pipeline
   (real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven
   state normalization, no mocks) must produce the **same collated model inputs**
   (`input_ids`, `attention_mask`, `pixel_values`, `image_grid_thw`, `state`,
   `embodiment_id`) as the original package's processor.
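
The comparison rule shared by both cases (exact shape match first, then values within tolerance) can be sketched as below. This is a minimal illustration, not the test's actual code: `assert_parity` is a hypothetical helper, the `(1, 16, 7)` shape is a made-up stand-in for `action_pred`, and the `1e-3` defaults mirror the tolerance listed under the env knobs.

```python
import numpy as np


def assert_parity(expected, actual, atol=1e-3, rtol=1e-3):
    """Hypothetical helper: fail on any shape mismatch, then compare values."""
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    # An action-horizon or action-dim mismatch is a failure, never a broadcast.
    if expected.shape != actual.shape:
        raise AssertionError(f"shape mismatch: {expected.shape} vs {actual.shape}")
    np.testing.assert_allclose(actual, expected, atol=atol, rtol=rtol)


# Toy stand-ins for action_pred, shaped (batch, horizon, action_dim).
reference = np.zeros((1, 16, 7), dtype=np.float32)
candidate = reference + 1e-5  # small numerical noise, inside the tolerance
assert_parity(reference, candidate)
```

Checking shapes explicitly before the value comparison keeps a horizon or dimension mismatch from being masked by NumPy broadcasting.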

### Why two environments

The original `gr00t` package pins `transformers==4.57.3` (Python 3.10); this
integration requires `transformers>=5.x` (Qwen3-VL). Under 5.x, `PretrainedConfig`
is itself a defaulted dataclass, so the original config dataclasses fail to import
(`non-default argument follows default argument`). The two implementations therefore
**cannot be imported in the same Python process**.
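
The underlying dataclass rule can be reproduced with the standard library alone. The sketch below is a generic illustration: `BaseConfig` and `ChildConfig` are hypothetical stand-ins, not the real `transformers` or `gr00t` classes. A dataclass that inherits defaulted fields cannot declare a required field after them, so the class body fails at definition time, which for a module means at import.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    # Stand-in for a base config whose fields all carry defaults.
    hidden_size: int = 1024


def define_child():
    # Defining the subclass is what fails, before any instance is created.
    @dataclass
    class ChildConfig(BaseConfig):
        action_dim: int  # required field after inherited defaulted fields

    return ChildConfig


try:
    define_child()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```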

So the test uses a **producer / consumer** split across two venvs:

1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the _original_
   gr00t venv. For each embodiment it builds dummy inputs generically from the
   checkpoint metadata (state dims from `statistics.json`; camera/language keys from
   the processor modality configs), runs the original model, and saves to one `.npz`
   per tag: the raw observations (`raw::` keys), the exact collated inputs
   (`in::` keys), the seed, and the raw `action_pred`.
2. **Consumer** — the pytest above, run in the _LeRobot_ venv. It discovers every
   `.npz`; the model-parity case replays the byte-identical collated inputs through
   the LeRobot model with the recorded seed and asserts the outputs match, and the
   preprocessor-parity case replays the raw observations through LeRobot's full
   preprocessor pipeline and asserts the collated tensors match.

> Artifacts generated by older versions of the dump script contain no `raw::`
> fields; the preprocessor-parity case then **skips** with a regeneration hint.
> Re-run the producer to refresh them.
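
A rough sketch of the artifact round trip, assuming NumPy's `.npz` format with prefixed keys. Only the `raw::` and `in::` prefixes come from the description above; the key suffixes, the `seed` and `action_pred` key names, and all shapes are hypothetical, and an in-memory buffer stands in for the per-tag file.

```python
import io

import numpy as np

# Producer side: one artifact per embodiment tag (toy contents).
artifact = {
    "raw::state": np.zeros((1, 8), dtype=np.float32),
    "in::input_ids": np.arange(6, dtype=np.int64).reshape(1, 6),
    "seed": np.array(42),
    "action_pred": np.zeros((1, 16, 7), dtype=np.float32),
}
buffer = io.BytesIO()  # in-memory stand-in for one .npz file on disk
np.savez(buffer, **artifact)
buffer.seek(0)

# Consumer side: split the keys by prefix and recover the recorded seed.
with np.load(buffer) as loaded:
    raw_obs = {k[len("raw::"):]: loaded[k] for k in loaded.files if k.startswith("raw::")}
    collated = {k[len("in::"):]: loaded[k] for k in loaded.files if k.startswith("in::")}
    seed = int(loaded["seed"])
    expected = loaded["action_pred"]

# An artifact without raw:: entries would lead the preprocessor case to skip.
has_raw = bool(raw_obs)
```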

### Fairness controls

- **Same pre-processed inputs (model parity)** — the original processor's `input_ids`,
  `pixel_values`, `image_grid_thw`, `attention_mask`, `state`, `embodiment_id` are
  fed verbatim to the LeRobot model (no re-tokenization / re-normalization), so the
  model comparison isolates the model. LeRobot's own tokenization / image packing is
  covered separately by the preprocessor-parity case, which compares its output
  against those same collated tensors from identical raw observations.
- **Same precision + attention kernel** — both sides run **fp32 + SDPA**. The
  original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
  producer forces SDPA + fp32. (With the defaults the gap is ~3e-2 — pure
  kernel/rounding noise, not an implementation difference.)
- **Same flow-matching seed** — fixed right before sampling on both sides; the
  producer records it in each artifact (`--seed`, default 42) and the consumer
  replays the recorded value.
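
Why the seed is fixed right before sampling, rather than once at start-up, can be shown with a toy example. NumPy's global RNG is used here purely as a stand-in for whatever generator the models draw their flow-matching noise from, and the `(16, 7)` shape is made up.

```python
import numpy as np

SEED = 42

# Seeding early: an unrelated draw consumes RNG state before sampling.
np.random.seed(SEED)
_ = np.random.standard_normal(3)
drifted = np.random.standard_normal((16, 7))

# Seeding right before sampling: both sides start from the same state.
np.random.seed(SEED)
side_a = np.random.standard_normal((16, 7))
np.random.seed(SEED)
side_b = np.random.standard_normal((16, 7))

same_noise = np.array_equal(side_a, side_b)
drift_differs = not np.array_equal(drifted, side_a)
```

If either implementation drew random numbers between seeding and sampling, the initial noise would differ and the outputs would diverge for reasons unrelated to the model.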

### How to run

```bash
# Resolve a local checkpoint (GR00T-N1.7-LIBERO / libero_10)
CKPT=$(python - <<'PY'
import os
from huggingface_hub import snapshot_download
print(os.path.join(snapshot_download("nvidia/GR00T-N1.7-LIBERO",
      allow_patterns=["libero_10/*"]), "libero_10"))
PY
)

# 1) Produce the original-side artifacts for all embodiments (original gr00t venv, CUDA)
CUDA_VISIBLE_DEVICES=0 /path/to/Isaac-GR00T/.venv-original/bin/python \
    tests/policies/groot/utils/dump_original_n1_7.py \
    --ckpt "$CKPT" --out-dir tests/policies/groot/artifacts --device cuda --seed 42

# 2) Run the parity test (LeRobot venv) — one parametrized case per embodiment
CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
    uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
```

The `.npz` artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by
the producer; they are never committed. The tests **skip** (do not fail) on CI or
when the checkpoint / artifacts are absent.

#### Env knobs (all optional)

| Var                                       | Default                          | Purpose                               |
| ----------------------------------------- | -------------------------------- | ------------------------------------- |
| `GROOT_N1_7_PARITY_DIR`                   | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT`                  | auto (HF cache)                  | override checkpoint dir               |
| `GROOT_PARITY_DEVICE`                     | `cuda` if available              | `cpu` or `cuda`                       |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3`                           | comparison tolerance                  |
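
As a sketch of how these knobs might be resolved (the variable names and defaults come from the table; `read_parity_settings`, the `cuda_available` flag, and the `cpu` fallback are assumptions for illustration, not the test's actual code):

```python
import os


def read_parity_settings(env=os.environ, cuda_available=False):
    """Hypothetical helper: resolve the optional knobs with their defaults."""
    return {
        "artifact_dir": env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts"),
        # None means "resolve the checkpoint from the HF cache".
        "ckpt_dir": env.get("GROOT_N1_7_LIBERO_CKPT"),
        # Assumed fallback: use cpu when CUDA is not available.
        "device": env.get("GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu"),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


# Example: tighten only the absolute tolerance, leave everything else default.
settings = read_parity_settings(env={"GROOT_PARITY_ATOL": "1e-4"})
```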

</details>