Mirror of https://github.com/huggingface/lerobot.git, synced 2026-06-17 16:27:04 +00:00.

Commit `edda8552ec`

@@ -4,6 +4,9 @@ GR00T is an NVIDIA foundation model family for generalized humanoid robot reason

LeRobot integrates GR00T N1.7 through the `groot` policy type.

> [!WARNING]
> **Breaking change:** GR00T N1.5 support was removed from LeRobot, and current releases support GR00T N1.7 only. N1.5 checkpoints, configs, and `--policy.model_version=n1.5` are rejected with a clear error. To keep using an N1.5 checkpoint, pin the last release that supports it: `pip install 'lerobot==0.5.1'`. To use the current release, migrate to GR00T N1.7 (`model_version='n1.7'`, base model [`nvidia/GR00T-N1.7-3B`](https://huggingface.co/nvidia/GR00T-N1.7-3B)).
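
The rejection described in the warning works like a version gate. The sketch below is minimal and hypothetical: `validate_model_version`, the constant names, and the error wording are illustrative assumptions. They are not LeRobot's actual validation code or message.

```python
# Hypothetical version gate: illustrative only, not LeRobot's validation code.
SUPPORTED_MODEL_VERSIONS = ("n1.7",)
LAST_RELEASE_WITH_N1_5 = "0.5.1"


def validate_model_version(model_version: str) -> str:
    """Return the normalized version, or raise ValueError for unsupported ones."""
    version = model_version.strip().lower()
    if version == "n1.5":
        # Removed version: point the user at both ways forward.
        raise ValueError(
            "GR00T N1.5 is no longer supported. Pin the last release that supports it "
            f"(pip install 'lerobot=={LAST_RELEASE_WITH_N1_5}') or migrate to "
            "model_version='n1.7' (base model nvidia/GR00T-N1.7-3B)."
        )
    if version not in SUPPORTED_MODEL_VERSIONS:
        raise ValueError(
            f"Unknown GR00T model_version {model_version!r}; "
            f"supported: {', '.join(SUPPORTED_MODEL_VERSIONS)}."
        )
    return version
```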
## Model Overview

GR00T N1.7 uses a Cosmos-Reason2/Qwen3-VL backbone and provides checkpoints for SimplerEnv, DROID, and LIBERO.

@@ -133,7 +136,7 @@ Replace the `XX` placeholders with final eval artifacts before merge.

Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.
```bash
hf download nvidia/GR00T-N1.7-LIBERO \
  --include "libero_spatial/*" \
  --local-dir ./GR00T-N1.7-LIBERO
```
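
You can also run the same download from Python with `huggingface_hub.snapshot_download`. Its `allow_patterns` argument does the job of the CLI's `--include` filter. This is a minimal sketch. `download_suite` and `base_model_path` are hypothetical helper names. The assumption that the suite lands in `<local_dir>/<suite>/` follows from the `--include "libero_spatial/*"` pattern above.

```python
from pathlib import Path

# Repo and local layout taken from the CLI example above.
REPO_ID = "nvidia/GR00T-N1.7-LIBERO"
LOCAL_DIR = Path("./GR00T-N1.7-LIBERO")


def base_model_path(suite: str, local_dir: Path = LOCAL_DIR) -> Path:
    """Directory to pass as --policy.base_model_path for one LIBERO suite."""
    return local_dir / suite


def download_suite(suite: str, local_dir: Path = LOCAL_DIR) -> Path:
    """Fetch only `<suite>/*` from the Hub, like `hf download ... --include "<suite>/*"`."""
    # Imported lazily so base_model_path() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{suite}/*"], local_dir=local_dir)
    return base_model_path(suite, local_dir)


# Example (needs network access, so it is left commented out):
# print(f"--policy.base_model_path={download_suite('libero_spatial')}")
```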
@@ -1,6 +1,13 @@

## Research Paper

GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734

GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B

GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/

> GR00T N1.5 support was removed from LeRobot; the last release supporting it is `lerobot==0.5.1`.
> Current releases support GR00T N1.7 only.

## Repository

@@ -31,12 +38,22 @@ Hugging Face Models:

## Original-vs-LeRobot parity test
`tests/policies/groot/test_groot_vs_original.py` verifies this LeRobot
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
against NVIDIA's original `gr00t` package with two comparisons, each parametrized
over every embodiment tag present in the checkpoint:

1. **Model parity** — given byte-identical pre-processed inputs and the same
   flow-matching seed (recorded in each artifact), both implementations must produce
   the **same raw model output** (`get_action(...)["action_pred"]`, the normalized
   flow-matching prediction). Output shapes must match exactly; any action-horizon
   or action-dim mismatch fails the test.
2. **Preprocessor parity** — given identical raw observations (per-camera
   frames, state vectors, language instruction), LeRobot's own preprocessor pipeline
   (real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven
   state normalization, no mocks) must produce the **same collated model inputs**
   (`input_ids`, `attention_mask`, `pixel_values`, `image_grid_thw`, `state`,
   `embodiment_id`) as the original package's processor.
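
Both checks can be sketched with NumPy's testing helpers. This is an illustration under assumptions, not the test file's code. The helper names are hypothetical, and the tolerances default to the `1e-3` listed under the env knobs. Floating-point collated tensors are compared within a tolerance and integer ones exactly, which is a guessed policy.

```python
import numpy as np


def _check_same_shape(name: str, ours: np.ndarray, ref: np.ndarray) -> None:
    # Never broadcast or crop: a horizon or action-dim mismatch is a real failure.
    if ours.shape != ref.shape:
        raise AssertionError(f"{name}: shape {ours.shape} != reference {ref.shape}")


def assert_model_parity(ours, ref, atol: float = 1e-3, rtol: float = 1e-3) -> None:
    """Compare LeRobot's action_pred against the original package's action_pred."""
    ours, ref = np.asarray(ours), np.asarray(ref)
    _check_same_shape("action_pred", ours, ref)
    np.testing.assert_allclose(ours, ref, atol=atol, rtol=rtol)


def assert_inputs_parity(ours: dict, ref: dict, atol: float = 1e-3, rtol: float = 1e-3) -> None:
    """Compare LeRobot's collated model inputs against the original processor's."""
    if ours.keys() != ref.keys():
        raise AssertionError(f"collated keys differ: {sorted(ours.keys() ^ ref.keys())}")
    for key in sorted(ref):
        a, b = np.asarray(ours[key]), np.asarray(ref[key])
        _check_same_shape(key, a, b)
        if np.issubdtype(b.dtype, np.floating):
            np.testing.assert_allclose(a, b, atol=atol, rtol=rtol, err_msg=key)
        else:
            # Token ids, grid sizes, embodiment ids: must be identical.
            np.testing.assert_array_equal(a, b, err_msg=key)
```

The shape check runs before the value check, so a horizon or action-dim mismatch gets a clear message instead of a confusing element-wise error.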
### Why two environments

@@ -48,25 +65,37 @@ is itself a defaulted dataclass, so the original config dataclasses fail to impo

So the test uses a **producer / consumer** split across two venvs:

1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the _original_
   gr00t venv. For each embodiment it builds dummy inputs generically from the
   checkpoint metadata (state dims from `statistics.json`; camera/language keys from
   the processor modality configs), runs the original model, and saves one `.npz`
   per tag containing the raw observations (`raw::` keys), the exact collated inputs
   (`in::` keys), the seed, and the raw `action_pred`.
2. **Consumer** — the pytest above, run in the _LeRobot_ venv. It discovers every
   `.npz` artifact. The model-parity case replays the byte-identical collated inputs
   through the LeRobot model with the recorded seed and asserts that the outputs match;
   the preprocessor-parity case replays the raw observations through LeRobot's full
   preprocessor pipeline and asserts that the collated tensors match.
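
One way to picture an artifact is as a flat dictionary of namespaced arrays. This minimal sketch assumes NumPy's `.npz` format as named above. The `raw::` and `in::` prefixes come from the description. The `seed` and `action_pred` key names and both helper functions are hypothetical.

```python
import io
from typing import Dict, Tuple

import numpy as np

RAW_PREFIX = "raw::"
IN_PREFIX = "in::"


def pack_artifact(raw: dict, collated: dict, seed: int, action_pred) -> Dict[str, np.ndarray]:
    """Producer side: flatten one embodiment tag's record into namespaced keys."""
    arrays = {f"{RAW_PREFIX}{k}": np.asarray(v) for k, v in raw.items()}
    arrays.update({f"{IN_PREFIX}{k}": np.asarray(v) for k, v in collated.items()})
    arrays["seed"] = np.asarray(seed, dtype=np.int64)  # hypothetical key name
    arrays["action_pred"] = np.asarray(action_pred)  # hypothetical key name
    return arrays


def unpack_artifact(npz) -> Tuple[dict, dict, int, np.ndarray]:
    """Consumer side: split a loaded .npz back into raw obs, collated inputs, seed, output."""
    raw = {k[len(RAW_PREFIX):]: npz[k] for k in npz.files if k.startswith(RAW_PREFIX)}
    collated = {k[len(IN_PREFIX):]: npz[k] for k in npz.files if k.startswith(IN_PREFIX)}
    return raw, collated, int(npz["seed"]), npz["action_pred"]


# Round trip through an in-memory file (a real producer would write <tag>.npz to disk).
buf = io.BytesIO()
np.savez(buf, **pack_artifact(
    raw={"state": np.ones(8, dtype=np.float32), "instruction": "pick up the bowl"},
    collated={"input_ids": np.arange(5), "embodiment_id": np.array([3])},
    seed=42,
    action_pred=np.zeros((16, 7), dtype=np.float32),
))
buf.seek(0)
with np.load(buf) as npz:
    raw_obs, collated_inputs, seed, action_pred = unpack_artifact(npz)
```

A single file per tag that holds both key families can drive both parity cases.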

> Artifacts generated by older versions of the dump script contain no `raw::`
> fields; the preprocessor-parity case then **skips** with a regeneration hint.
> Re-run the producer to refresh them.
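
The skip rule for older artifacts comes down to checking for any `raw::` key. The helper below is hypothetical, not the test's actual code. In a pytest case, a non-`None` return value would typically be passed to `pytest.skip(...)`.

```python
from typing import Iterable, Optional

RAW_PREFIX = "raw::"


def preprocessor_skip_reason(artifact_keys: Iterable[str], artifact_name: str) -> Optional[str]:
    """Return a skip message for artifacts without raw observations, else None."""
    if any(key.startswith(RAW_PREFIX) for key in artifact_keys):
        return None  # new-style artifact: the preprocessor-parity case can run
    return (
        f"{artifact_name} has no '{RAW_PREFIX}' fields (written by an older dump script); "
        "re-run tests/policies/groot/utils/dump_original_n1_7.py to regenerate it"
    )
```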
### Fairness controls

- **Same pre-processed inputs (model parity)** — the original processor's `input_ids`,
  `pixel_values`, `image_grid_thw`, `attention_mask`, `state`, and `embodiment_id` are
  fed verbatim to the LeRobot model (no re-tokenization or re-normalization), so the
  model comparison exercises the model alone. LeRobot's own tokenization and image
  packing are covered separately by the preprocessor-parity case, which compares
  LeRobot's output against those same collated tensors, starting from identical raw
  observations.
- **Same precision + attention kernel** — both sides run **fp32 + SDPA**. The
  original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
  producer overrides this to SDPA + fp32. With the original defaults the gap is
  ~3e-2, which is kernel/rounding noise rather than an implementation difference.
- **Same flow-matching seed** — fixed right before sampling on both sides; the
  producer records it in each artifact (`--seed`, default 42) and the consumer
  replays the recorded value.
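
The seed matters because flow-matching sampling starts from Gaussian noise and integrates a velocity field from there. Two correct implementations therefore agree only if they draw the same initial noise. This toy sketch uses NumPy's generator and a hand-written velocity field as stand-ins for the action head's own RNG and learned network. The step count and shapes are arbitrary assumptions.

```python
import numpy as np


def toy_velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for the learned velocity field (ignores t): pulls every action toward 0.5."""
    return 0.5 - x


def sample_actions(seed: int, horizon: int = 16, action_dim: int = 7, steps: int = 4) -> np.ndarray:
    """Euler-integrate the velocity field from seeded Gaussian noise (t goes 0 -> 1)."""
    rng = np.random.default_rng(seed)  # seeded right before sampling
    x = rng.standard_normal((horizon, action_dim)).astype(np.float32)
    dt = 1.0 / steps
    for step in range(steps):
        x = x + dt * toy_velocity(x, step * dt)
    return x
```

The artifact records the seed instead of each side hard-coding it. As a result, the consumer stays in sync even when the producer is run with a different `--seed`.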
### How to run

@@ -90,15 +119,15 @@ CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \

```bash
uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
```
The `.npz` artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by
the producer; they are never committed. The tests **skip** (do not fail) on CI or
when the checkpoint / artifacts are absent.
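
The skip behaviour can be sketched as two small checks. Both helpers are hypothetical. Detecting CI through a `CI` environment variable is an assumption: many CI services set it, but the real test may use a different signal.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def discover_artifacts(parity_dir: Path) -> List[Path]:
    """One .npz per embodiment tag; a missing directory simply yields no artifacts."""
    return sorted(parity_dir.glob("*.npz")) if parity_dir.is_dir() else []


def skip_reason(env: Mapping[str, str], checkpoint_dir: Optional[Path], artifacts: List[Path]) -> Optional[str]:
    """Why the parity tests should skip (not fail), or None if they can run."""
    if env.get("CI", "").lower() in {"1", "true"}:  # assumption: CI exports CI=true
        return "local-only parity test; skipped on CI"
    if checkpoint_dir is None or not checkpoint_dir.is_dir():
        return "GR00T N1.7 LIBERO checkpoint not found"
    if not artifacts:
        return "no .npz artifacts found; run the producer (dump_original_n1_7.py) first"
    return None
```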
#### Env knobs (all optional)

| Var | Default | Purpose |
| ----------------------------------------- | -------------------------------- | ------------------------------------- |
| `GROOT_N1_7_PARITY_DIR` | `tests/policies/groot/artifacts` | Directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT` | auto (HF cache) | Override the checkpoint directory |
| `GROOT_PARITY_DEVICE` | `cuda` if available | Device to run on: `cpu` or `cuda` |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3` | Absolute / relative comparison tolerance |
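
One way to resolve these knobs with their documented defaults is sketched below. `ParityConfig` and `resolve_parity_config` are hypothetical. `cuda_available` stands in for a call such as `torch.cuda.is_available()`. Falling back to `cpu` when CUDA is unavailable is an assumption about what "`cuda` if available" means.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ParityConfig:
    parity_dir: Path
    checkpoint_dir: Optional[Path]  # None means "auto": look in the HF cache
    device: str
    atol: float
    rtol: float


def resolve_parity_config(env: Mapping[str, str], cuda_available: bool) -> ParityConfig:
    """Read the optional env knobs, falling back to the documented defaults."""
    device = env.get("GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu")
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"GROOT_PARITY_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    ckpt = env.get("GROOT_N1_7_LIBERO_CKPT")
    return ParityConfig(
        parity_dir=Path(env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts")),
        checkpoint_dir=Path(ckpt) if ckpt else None,
        device=device,
        atol=float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        rtol=float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    )
```

With these tolerances, a value passes when `|ours - ref| <= atol + rtol * |ref|`, which is the rule `numpy.testing.assert_allclose` applies.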