From 1c660feda423f84e37b38458769400df9dde865a Mon Sep 17 00:00:00 2001
From: Steven Palma
Date: Fri, 12 Jun 2026 23:38:08 +0200
Subject: [PATCH] docs(groot): document the N1.5 removal and the N1.7 parity test

- groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced with `hf download`.
- policy_groot_README.md: N1.5 removal note, updated paper / model-card links, and the two-comparison (model parity + preprocessor parity) description of the original-vs-LeRobot test, including the raw-observation artifacts and recorded seed.
---
 docs/source/groot.mdx              |  5 +-
 docs/source/policy_groot_README.md | 75 +++++++++++++++++++++---------
 2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/docs/source/groot.mdx b/docs/source/groot.mdx
index 2c26dbe45..4c2b5e22e 100644
--- a/docs/source/groot.mdx
+++ b/docs/source/groot.mdx
@@ -4,6 +4,9 @@ GR00T is an NVIDIA foundation model family for generalized humanoid robot reason
 
 LeRobot integrates GR00T N1.7 through the `groot` policy type.
 
+> [!WARNING]
+> **Breaking change:** GR00T N1.5 support was removed from LeRobot, and current releases support GR00T N1.7 only. N1.5 checkpoints, configs, and `--policy.model_version=n1.5` are rejected with a clear error. To keep using an N1.5 checkpoint, pin the last release that supports it: `pip install 'lerobot==0.5.1'`. To use the current release, migrate to GR00T N1.7 (`model_version='n1.7'`, base model [`nvidia/GR00T-N1.7-3B`](https://huggingface.co/nvidia/GR00T-N1.7-3B)).
+
 ## Model Overview
 
 GR00T N1.7 uses a Cosmos-Reason2/Qwen3-VL backbone and provides checkpoints for SimplerEnv, DROID, and LIBERO.
@@ -133,7 +136,7 @@ Replace the `XX` placeholders with final eval artifacts before merge.
 Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.
 
 ```bash
-huggingface-cli download nvidia/GR00T-N1.7-LIBERO \
+hf download nvidia/GR00T-N1.7-LIBERO \
   --include "libero_spatial/*" \
   --local-dir ./GR00T-N1.7-LIBERO
 
diff --git a/docs/source/policy_groot_README.md b/docs/source/policy_groot_README.md
index 200a3872b..49e61e98c 100644
--- a/docs/source/policy_groot_README.md
+++ b/docs/source/policy_groot_README.md
@@ -1,6 +1,13 @@
 ## Research Paper
 
-Paper: https://research.nvidia.com/labs/gear/gr00t-n1_5/
+GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734
+
+GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B
+
+GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/
+
+> GR00T N1.5 support was removed from LeRobot; the last release supporting it is `lerobot==0.5.1`.
+> Current releases support GR00T N1.7 only.
 
 ## Repository
 
@@ -31,12 +38,22 @@ Hugging Face Models:
 
 ## Original-vs-LeRobot parity test
 
-`tests/policies/groot/test_groot_vs_original.py` verifies that this LeRobot
+`tests/policies/groot/test_groot_vs_original.py` verifies this LeRobot
 reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
-produces the **same raw model output** (`get_action(...)["action_pred"]`, the
-normalized flow-matching prediction) as NVIDIA's original `gr00t` package, given
-byte-identical pre-processed inputs and the same flow-matching seed. It is
-parametrized over every embodiment tag present in the checkpoint.
+against NVIDIA's original `gr00t` package with two comparisons, each parametrized
+over every embodiment tag present in the checkpoint:
+
+1. **Model parity** — given byte-identical pre-processed inputs and the same
+   flow-matching seed (recorded in each artifact), both implementations must produce
+   the **same raw model output** (`get_action(...)["action_pred"]`, the normalized
+   flow-matching prediction). Output shapes must match exactly; any action-horizon
+   or action-dim mismatch fails the test.
+2. **Preprocessor parity** — given the identical raw observations (per-camera
+   frames, state vectors, language instruction), LeRobot's own preprocessor pipeline
+   (real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven
+   state normalization, no mocks) must produce the **same collated model inputs**
+   (`input_ids`, `attention_mask`, `pixel_values`, `image_grid_thw`, `state`,
+   `embodiment_id`) as the original package's processor.
 
 ### Why two environments
 
@@ -48,25 +65,37 @@ is itself a defaulted dataclass, so the original config dataclasses fail to impo
 
 So the test uses a **producer / consumer** split across two venvs:
 
-1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the *original*
+1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the _original_
    gr00t venv. For each embodiment it builds dummy inputs generically from the
    checkpoint metadata (state dims from `statistics.json`; camera/language keys from
-   the processor modality configs), runs the original model, and saves the exact
-   collated inputs + raw `action_pred` to one `.npz` per tag.
-2. **Consumer** — the pytest above, run in the *LeRobot* venv. It discovers every
-   `.npz`, replays the byte-identical inputs through the LeRobot model with the same
-   seed, and asserts the outputs match.
+   the processor modality configs), runs the original model, and saves to one `.npz`
+   per tag: the raw observations (`raw::` keys), the exact collated inputs
+   (`in::` keys), the seed, and the raw `action_pred`.
+2. **Consumer** — the pytest above, run in the _LeRobot_ venv. It discovers every
+   `.npz`; the model-parity case replays the byte-identical collated inputs through
+   the LeRobot model with the recorded seed and asserts the outputs match, and the
+   preprocessor-parity case replays the raw observations through LeRobot's full
+   preprocessor pipeline and asserts the collated tensors match.
+
+> Artifacts generated by older versions of the dump script contain no `raw::`
+> fields; the preprocessor-parity case then **skips** with a regeneration hint.
+> Re-run the producer to refresh them.
 
 ### Fairness controls
 
-- **Same pre-processed inputs** — the original processor's `input_ids`,
+- **Same pre-processed inputs (model parity)** — the original processor's `input_ids`,
   `pixel_values`, `image_grid_thw`, `attention_mask`, `state`, `embodiment_id` are
-  fed verbatim to the LeRobot model (no re-tokenization / re-normalization).
+  fed verbatim to the LeRobot model (no re-tokenization / re-normalization), so the
+  model comparison isolates the model. LeRobot's own tokenization / image packing is
+  covered separately by the preprocessor-parity case, which compares its output
+  against those same collated tensors from identical raw observations.
 - **Same precision + attention kernel** — both sides run **fp32 + SDPA**. The
   original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
   producer forces SDPA + fp32. (With the defaults the gap is ~3e-2 — pure
   kernel/rounding noise, not an implementation difference.)
-- **Same flow-matching seed** — fixed (42) right before sampling on both sides.
+- **Same flow-matching seed** — fixed right before sampling on both sides; the
+  producer records it in each artifact (`--seed`, default 42) and the consumer
+  replays the recorded value.
 
 ### How to run
 
@@ -90,15 +119,15 @@ CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
   uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
 ```
 
-The `.npz` artifacts are local-only (gitignored, ~6–9 MB each) and are regenerated by
-the producer; they are never committed. The test **skips** (does not fail) on CI or
+The `.npz` artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by
+the producer; they are never committed. The tests **skip** (do not fail) on CI or
 when the checkpoint / artifacts are absent.
 
 #### Env knobs (all optional)
 
-| Var | Default | Purpose |
-|---|---|---|
-| `GROOT_N1_7_PARITY_DIR` | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
-| `GROOT_N1_7_LIBERO_CKPT` | auto (HF cache) | override checkpoint dir |
-| `GROOT_PARITY_DEVICE` | `cuda` if available | `cpu` or `cuda` |
-| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3` | comparison tolerance |
+| Var                                       | Default                          | Purpose                               |
+| ----------------------------------------- | -------------------------------- | ------------------------------------- |
+| `GROOT_N1_7_PARITY_DIR`                   | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
+| `GROOT_N1_7_LIBERO_CKPT`                  | auto (HF cache)                  | override checkpoint dir               |
+| `GROOT_PARITY_DEVICE`                     | `cuda` if available              | `cpu` or `cuda`                       |
+| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3`                           | comparison tolerance                  |
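
Reviewer sketch (not part of the patch): the optional knobs in the table above could be consumed roughly as follows. This is a minimal illustration under stated assumptions, not the test's actual code. The helper names `read_parity_knobs` and `within_tolerance` are hypothetical, `None` stands in for the "auto (HF cache)" lookup, and the tolerance rule `|a - e| <= atol + rtol * |e|` follows NumPy's `allclose` convention, which the patch does not confirm.

```python
from typing import Mapping, Sequence


def read_parity_knobs(env: Mapping[str, str], cuda_available: bool = False) -> dict:
    """Hypothetical helper: resolve the documented knobs with their documented defaults."""
    return {
        "artifact_dir": env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts"),
        # None is a placeholder for "auto (HF cache)"; the real lookup is not shown here.
        "checkpoint_dir": env.get("GROOT_N1_7_LIBERO_CKPT"),
        "device": env.get("GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu"),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


def within_tolerance(
    actual: Sequence[float], expected: Sequence[float], atol: float, rtol: float
) -> bool:
    """allclose-style check (assumed convention): |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False  # a length mismatch is treated as a failure, like a shape mismatch
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Usage: pass os.environ in practice; an explicit dict keeps this example deterministic.
knobs = read_parity_knobs({"GROOT_PARITY_DEVICE": "cpu"})
print(knobs["device"], knobs["atol"], knobs["rtol"])
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the helper, keeps the defaults easy to check without touching the process environment.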