lerobot/docs/source/policy_groot_README.md at f81fc7956487c0e8cb27230b4f7c7b50cfef050d

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-18 00:37:10 +00:00


Steven Palma 9ce6633518 fix(groot): address review findings for the N1.7 port

N1.5 removal is now explicit and actionable:
- Legacy N1.5 checkpoint configs (tokenizer_assets_repo) parse and fail
  with a single clear error pointing to lerobot==0.5.1 instead of a
  cryptic draccus DecodingError
- Removed N1.5 processor registry names (groot_pack_inputs_v3,
  groot_eagle_encode_v3, groot_eagle_collate_v3) are stubbed to raise the
  same guidance; groot_action_unpack_unnormalize_v1 changed semantics, so
  the step is re-registered as _v2 and _v1 is stubbed
- N1.5 detection also recognizes checkpoint config.json content
  (model_type/architectures/eagle backbone), not just path names; every
  rejection surface includes the migration guidance
- groot.mdx documents the breaking change and migration path

Runtime fixes:
- use_bf16=False no longer crashes (compute_dtype only set when used)
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by
  sync select_action (relative eef/non-eef decode was broken in
  lerobot-eval/record flows)
- Postprocessor falls back to dataset stats when a raw checkpoint lacks
  the configured embodiment tag instead of silently emitting normalized
  [-1, 1] actions
- Hub-hosted finetuned N1.7 checkpoints load: the processor config is
  resolved via hf_hub_download for non-local paths, with a tolerant
  retry when inspection fails
- Raw-checkpoint processor branch honors caller overrides (device,
  rename_map) instead of dropping them
- Relative-action raw-state cache is per-instance instead of
  process-global (cross-instance contamination)
- Camera/modality-key mismatches warn, including the zero-match
  fallback; checkpoint revision is no longer forwarded into backbone
  loading; deprecated Qwen2VLImageProcessorFast replaced with
  Qwen2VLImageProcessor

Config/UX:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy
  N1.5-era values (chunk_size=50, max_state_dim=64, ...) are remapped
  with a warning instead of silently
- Explicit action_decode_transform='none' wins over the libero_sim
  default (new 'auto' sentinel) and survives save/load round-trips

Tests/CI:
- pytest.importorskip guards so fast_tests tiers pass without
  transformers (was 10 failures, now 0)
- Regression tests for every fix; from_pretrained rejection tests now
  actually exercise from_pretrained
- Parity test reads the artifact seed, fails on shape mismatch instead
  of silently truncating, and a new case runs LeRobot's real Qwen3-VL
  preprocessing on raw observations dumped by the producer
- docs: dead huggingface-cli download replaced with hf download

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-06-12 16:51:14 +02:00

7.0 KiB


Research Paper

GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734

GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B

GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/

GR00T N1.5 support was removed from LeRobot; the last release supporting it is lerobot==0.5.1. Current releases support GR00T N1.7 only.

Repository

Code: https://github.com/NVIDIA/Isaac-GR00T

Citation

@inproceedings{gr00tn1_2025,
  archivePrefix = {arxiv},
  eprint     = {2503.14734},
  title      = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
  author     = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
  month      = {March},
  year       = {2025},
  booktitle  = {ArXiv Preprint},
}

Additional Resources

Blog: https://developer.nvidia.com/isaac/gr00t

Hugging Face Models:

- GR00T N1.7: https://huggingface.co/nvidia/GR00T-N1.7-3B
- GR00T N1.7 LIBERO checkpoints: https://huggingface.co/nvidia/GR00T-N1.7-LIBERO

Original-vs-LeRobot parity test

tests/policies/groot/test_groot_vs_original.py verifies this LeRobot reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head) against NVIDIA's original gr00t package with two comparisons, each parametrized over every embodiment tag present in the checkpoint:

- Model parity: given byte-identical pre-processed inputs and the same flow-matching seed (recorded in each artifact), both implementations must produce the same raw model output (get_action(...)["action_pred"], the normalized flow-matching prediction). Output shapes must match exactly; any action-horizon or action-dim mismatch fails the test.
- Preprocessor parity: given identical raw observations (per-camera frames, state vectors, language instruction), LeRobot's own preprocessor pipeline (real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven state normalization, no mocks) must produce the same collated model inputs (input_ids, attention_mask, pixel_values, image_grid_thw, state, embodiment_id) as the original package's processor.
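The "shapes must match exactly, values must match within tolerance" rule can be illustrated with a small sketch. This is not the test's actual code: it uses plain nested lists instead of tensors, the helper names (`shape_of`, `flatten`, `assert_parity`) are hypothetical, and the `atol + rtol * |expected|` criterion is an assumption borrowed from the usual allclose-style check.

```python
def shape_of(x):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def flatten(x):
    """Yield the scalar leaves of a nested list in row-major order."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def assert_parity(expected, actual, atol=1e-3, rtol=1e-3):
    """Fail on any shape mismatch, then compare values within tolerance."""
    if shape_of(expected) != shape_of(actual):
        # No truncation to a common shape: a mismatch is itself a failure.
        raise AssertionError(
            f"shape mismatch: {shape_of(expected)} vs {shape_of(actual)}"
        )
    for i, (e, a) in enumerate(zip(flatten(expected), flatten(actual))):
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"value mismatch at flat index {i}: {e} vs {a}")


# Hypothetical (horizon=2, action_dim=2) predictions from the two sides.
original_pred = [[0.10, -0.20], [0.30, 0.40]]
lerobot_pred = [[0.1004, -0.2001], [0.30, 0.40]]
assert_parity(original_pred, lerobot_pred)  # passes: within 1e-3
```

Checking the shape first matters because comparing only the overlapping prefix would hide an action-horizon or action-dim regression.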

Why two environments

The original gr00t package pins transformers==4.57.3 (Python 3.10); this integration requires transformers>=5.x (Qwen3-VL). Under 5.x, PretrainedConfig is itself a defaulted dataclass, so the original config dataclasses fail to import (non-default argument follows default argument). The two implementations therefore cannot be imported in the same Python process.
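The dataclass error quoted above can be reproduced with the standard library alone. The sketch below uses hypothetical stand-in classes (`BaseConfig`, `ChildConfig`), not the real transformers or gr00t classes; it only shows why a subclass that adds a required field breaks once its base class is a dataclass whose fields all have defaults.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    # Stand-in for a base config whose fields all carry defaults.
    name_or_path: str = ""


def define_child():
    """Define a dataclass subclass that adds a field without a default."""

    @dataclass
    class ChildConfig(BaseConfig):
        # Required field placed after an inherited defaulted field.
        hidden_size: int

    return ChildConfig


try:
    define_child()
    error_message = None
except TypeError as exc:
    # Raised while the class is being defined, not when it is instantiated.
    error_message = str(exc)

print(error_message)
```

Because the `TypeError` is raised at class-definition time, a module containing such a class cannot even be imported, which is the failure mode described above.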

So the test uses a producer / consumer split across two venvs:

- Producer: tests/policies/groot/utils/dump_original_n1_7.py, run in the original gr00t venv. For each embodiment it builds dummy inputs generically from the checkpoint metadata (state dims from statistics.json; camera/language keys from the processor modality configs), runs the original model, and saves one .npz per tag containing the raw observations (raw:: keys), the exact collated inputs (in:: keys), the seed, and the raw action_pred.
- Consumer: the pytest above, run in the LeRobot venv. It discovers every .npz. The model-parity case replays the byte-identical collated inputs through the LeRobot model with the recorded seed and asserts the outputs match; the preprocessor-parity case replays the raw observations through LeRobot's full preprocessor pipeline and asserts the collated tensors match.

Artifacts generated by older versions of the dump script contain no raw:: fields; the preprocessor-parity case then skips with a regeneration hint. Re-run the producer to refresh them.
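As a rough illustration of the artifact layout, the sketch below partitions an artifact's entries by the raw:: and in:: prefixes and decides whether the preprocessor-parity case has anything to replay. The helper names and the example keys after each prefix (`state`, `input_ids`), as well as the literal `seed` / `action_pred` key names, are assumptions for illustration; a real consumer would presumably obtain the mapping by loading the .npz file rather than building a dict by hand.

```python
RAW_PREFIX = "raw::"
INPUT_PREFIX = "in::"


def split_artifact(arrays):
    """Partition artifact entries into raw observations, collated inputs, extras."""
    raw = {
        key[len(RAW_PREFIX):]: value
        for key, value in arrays.items()
        if key.startswith(RAW_PREFIX)
    }
    inputs = {
        key[len(INPUT_PREFIX):]: value
        for key, value in arrays.items()
        if key.startswith(INPUT_PREFIX)
    }
    extras = {
        key: value
        for key, value in arrays.items()
        if not key.startswith((RAW_PREFIX, INPUT_PREFIX))
    }
    return raw, inputs, extras


def can_run_preprocessor_parity(arrays):
    """Older artifacts carry no raw:: entries, so that case has to skip."""
    raw, _, _ = split_artifact(arrays)
    return bool(raw)


# Hypothetical artifact contents (plain lists stand in for arrays).
new_artifact = {
    "raw::state": [0.0, 0.1],
    "in::input_ids": [101, 102],
    "seed": 42,
    "action_pred": [0.5, -0.5],
}
old_artifact = {"in::input_ids": [101, 102], "seed": 42, "action_pred": [0.5, -0.5]}

print(can_run_preprocessor_parity(new_artifact))
print(can_run_preprocessor_parity(old_artifact))
```

Model parity only needs the in:: entries plus the seed and prediction, so an old artifact still supports that case.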

Fairness controls

- Same pre-processed inputs (model parity): the original processor's input_ids, pixel_values, image_grid_thw, attention_mask, state, embodiment_id are fed verbatim to the LeRobot model (no re-tokenization / re-normalization), so the model comparison isolates the model. LeRobot's own tokenization / image packing is covered separately by the preprocessor-parity case, which compares its output against those same collated tensors from identical raw observations.
- Same precision + attention kernel: both sides run fp32 + SDPA. The original defaults to use_flash_attention=True (flash_attention_2 + bf16); the producer forces SDPA + fp32. (With the defaults the gap is ~3e-2, which is pure kernel/rounding noise, not an implementation difference.)
- Same flow-matching seed: fixed right before sampling on both sides; the producer records it in each artifact (--seed, default 42) and the consumer replays the recorded value.
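Why the seed is fixed right before sampling, rather than once at startup, can be shown with a toy example. The sketch below uses Python's `random` module as a stand-in for the framework RNG, and `sample_noise` with its `warmup_draws` parameter is hypothetical: it models two implementations that consume a different number of random draws before they reach the sampling step.

```python
import random


def sample_noise(seed, warmup_draws, reseed_before_sampling, n=4):
    """Draw n Gaussian samples after some implementation-specific RNG use."""
    rng = random.Random(seed)
    # Each implementation may consume a different number of draws
    # (initialisation, dropout, etc.) before it starts sampling.
    for _ in range(warmup_draws):
        rng.random()
    if reseed_before_sampling:
        # Resetting the state here makes the noise independent of the warmup.
        rng.seed(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Seeded right before sampling: identical noise despite different warmups.
same = sample_noise(42, 0, True) == sample_noise(42, 7, True)

# Seeded only at startup: the differing warmups desynchronise the streams.
differs = sample_noise(42, 0, False) != sample_noise(42, 7, False)

print(same, differs)
```

With the seed reset at the last moment, any remaining output gap can be attributed to the model computation instead of to a different noise draw.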

How to run

# Resolve a local checkpoint (GR00T-N1.7-LIBERO / libero_10)
CKPT=$(python - <<'PY'
import os
from huggingface_hub import snapshot_download
print(os.path.join(snapshot_download("nvidia/GR00T-N1.7-LIBERO",
      allow_patterns=["libero_10/*"]), "libero_10"))
PY
)

# 1) Produce the original-side artifacts for all embodiments (original gr00t venv, CUDA)
CUDA_VISIBLE_DEVICES=0 /path/to/Isaac-GR00T/.venv-original/bin/python \
    tests/policies/groot/utils/dump_original_n1_7.py \
    --ckpt "$CKPT" --out-dir tests/policies/groot/artifacts --device cuda --seed 42

# 2) Run the parity test (LeRobot venv) — one parametrized case per embodiment
CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
    uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s

The .npz artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by the producer; they are never committed. The tests skip (do not fail) on CI or when the checkpoint / artifacts are absent.

Env knobs (all optional)

| Var | Default | Purpose |
| --- | --- | --- |
| `GROOT_N1_7_PARITY_DIR` | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT` | auto (HF cache) | override checkpoint dir |
| `GROOT_PARITY_DEVICE` | `cuda` if available | `cpu` or `cuda` |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3` | comparison tolerance |
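A minimal sketch of how these variables could be resolved to their documented defaults is shown below. The function `read_parity_settings` and its `cuda_available` parameter are hypothetical and not part of the test suite, and falling back to `cpu` when CUDA is unavailable is an assumption; the variable names and default values are taken from the table.

```python
import os


def read_parity_settings(env=None, cuda_available=False):
    """Resolve the optional parity-test knobs, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "artifact_dir": env.get(
            "GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts"
        ),
        # None means "resolve the checkpoint automatically from the HF cache".
        "ckpt_dir": env.get("GROOT_N1_7_LIBERO_CKPT"),
        # Assumption: fall back to CPU when CUDA is not available.
        "device": env.get(
            "GROOT_PARITY_DEVICE", "cuda" if cuda_available else "cpu"
        ),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


# Defaults only, then an override of the device and the absolute tolerance.
defaults = read_parity_settings(env={})
custom = read_parity_settings(
    env={"GROOT_PARITY_DEVICE": "cuda", "GROOT_PARITY_ATOL": "5e-4"}
)
print(defaults["device"], defaults["atol"])
print(custom["device"], custom["atol"])
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment.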
