lerobot/docs/source/groot.mdx
Steven Palma 4688b9c27f refactor(groot): N1.7 style cleanup (utils, imports, flash-attn, config)
Mechanical refactor of the GR00T N1.7 policy to match the repo's architecture and
style standards. No change to policy algorithm/numerics; only UX/CLI and packaging
changes. Tests are intentionally left untouched (out of scope) and need updating
for the removed `model_version` field.

Cleanup & consolidation:
- Add `groot/utils.py` holding the pure, side-effect-free helpers (JSON I/O, value
  coercion, stat flattening, rot6d/SE3 math, language/batch prep) shared by the
  config and processor layers.
- Remove dead code: the unused `resolve_groot_n1_7_backbone_model` cache-resolver
  cluster, `GR00TN17Config.to_filtered_dict/json`, and the `_copy_default` wrapper.

Imports & execution guards:
- Hoist nested imports to module top; relative imports within the package, absolute
  for external modules. The version-gated Qwen3-VL classes import under the single
  `_transformers_available` guard (transformers is pinned >=5.4, which ships them).
- No import-time side effects: `_register_with_transformers()` now runs in
  `GR00TN17.__init__` (idempotent via `register(exist_ok=True)`), and the N1.5 step
  stubs register lazily before pipeline deserialization (idempotent via the
  registry, no run-once globals).
- Gate optional deps at the point of use with `require_package(..., extra="groot")`.

Dependencies & docs:
- Drop `flash-attn` (and its build-only dep `ninja`) from the `groot` extra; default
  to SDPA (numerically equivalent) with opt-in via `--policy.use_flash_attention`.
  Un-comment `lerobot[groot]` in the `all` extra and regenerate `uv.lock`.
- Rewrite the `groot.mdx` install section: flash-attn is a purely optional,
  user-managed optimization that LeRobot neither installs nor requires.

Config & CLI:
- Surface previously-frozen knobs on `GrootConfig` (plumbed into `GR00TN17Config`;
  no-ops at their defaults): inference — `num_inference_timesteps`, `rtc_ramp_rate`,
  `use_flash_attention`; fine-tuning — `tune_top_llm_layers` (partial-LLM tuning)
  and `tune_vlln` (previously hardwired to True).
- Convert the single-valued `model_version` and `n1_7_backbone_model` fields to
  internal constants.
- Keep `base_model_path`: it is NOT equivalent to `pretrained_path` (raw NVIDIA
  checkpoints have no LeRobot `type` field and load only via `base_model_path`) and
  is genuinely user-tunable.
- Keep the deprecated Isaac-GR00T/N1.5 fields (and the dead LoRA fields) as a
  back-compat block so a v0.5.1 N1.5 `config.json` still parses under draccus and is
  rejected with the friendly N1.5 removal message instead of an opaque decode error.
2026-06-16 14:45:37 +02:00


# GR00T Policy
GR00T is an NVIDIA foundation model family for generalized humanoid robot reasoning and skills. It is a cross-embodiment policy that accepts multimodal input, including language, images, and proprioception, to perform manipulation tasks in diverse environments.
LeRobot integrates GR00T N1.7 through the `groot` policy type.
> [!WARNING]
> **Breaking change:** GR00T N1.5 support was removed from LeRobot, and current releases support GR00T N1.7 only. N1.5 checkpoints and configs are rejected with a migration note. To keep using an N1.5 checkpoint, pin the last release that supports it: `pip install 'lerobot==0.5.1'`. To use the current release, migrate to GR00T N1.7 (base model [`nvidia/GR00T-N1.7-3B`](https://huggingface.co/nvidia/GR00T-N1.7-3B)).
## Model Overview
GR00T N1.7 uses a Cosmos-Reason2/Qwen3-VL backbone and provides checkpoints for SimplerEnv, DROID, and LIBERO.
Developers and researchers can post-train GR00T with their own real or synthetic data to adapt it for specific humanoid robots or tasks.
GR00T uses pre-trained vision and language encoders with a flow matching action transformer to model a chunk of actions conditioned on vision, language, and proprioception.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png"
alt="An overview of GR00T"
width="80%"
/>
Its strong performance comes from being trained on an expansive and diverse humanoid dataset, which includes:
- Real captured data from robots.
- Synthetic data generated using NVIDIA Isaac GR00T Blueprint.
- Internet-scale video data.
This approach allows the model to be highly adaptable through post-training for specific embodiments, tasks, and environments.
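
As a rough illustration of the flow matching idea mentioned above, the sketch below samples an action chunk by integrating a velocity field from noise (t=0) to actions (t=1) with fixed Euler steps. It is a generic toy, not GR00T's implementation: `euler_sample`, `toy_velocity`, `TARGET_CHUNK`, and `num_steps` are hypothetical names, and the closed-form velocity stands in for the learned action transformer, which would condition on vision, language, and proprioception.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float], Vector],
    noise: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical flattened action chunk that the toy field flows toward.
TARGET_CHUNK = [0.1, -0.2, 0.3, 0.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Stand-in for a learned network: straight-line velocity toward the target."""
    return [(a - xi) / (1.0 - t) for a, xi in zip(TARGET_CHUNK, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in TARGET_CHUNK]
actions = euler_sample(toy_velocity, noise, num_steps=4)
print(actions)
```

With this straight-line toy field, the final Euler step lands on the target chunk regardless of the starting noise. A learned velocity field is only approximate, so the number of integration steps typically trades latency against sample quality.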
## Installation Requirements
GR00T is intended for NVIDIA GPU-accelerated systems. Install LeRobot with the GR00T extra:
```bash
pip install "lerobot[groot]"
```
For a source checkout:
```bash
pip install -e ".[groot]"
```
### Optional: Flash Attention acceleration
Flash Attention is a purely optional performance optimization. **LeRobot neither installs nor requires it.** Setup is left to the user because `flash-attn` has environment-specific build requirements (it must match your PyTorch/CUDA toolchain). To enable it:
1. Install a `flash-attn` build matching your PyTorch/CUDA environment (see the [Flash Attention project](https://github.com/Dao-AILab/flash-attention)):
```bash
# Check https://pytorch.org/get-started/locally/ for the right CUDA wheel index for your system.
pip install "torch>=2.7,<2.12.0" "torchvision>=0.22.0,<0.27.0" \
--index-url https://download.pytorch.org/whl/cu128
pip install "ninja>=1.11.1,<2.0.0" "packaging>=24.2,<26.0"
pip install "flash-attn>=2.5.9,<3.0.0" --no-build-isolation
python -c "import flash_attn; print(f'Flash Attention {flash_attn.__version__} imported successfully')"
```
2. Install LeRobot with the `groot` extra, as shown in the installation commands above.
3. Opt in by passing `--policy.use_flash_attention=true` when training/evaluating GR00T. If the kernel is missing or fails to import, the backbone transparently falls back to SDPA.
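
The fallback described in step 3 can be pictured with the minimal sketch below. It assumes the backend is chosen by a helper like `pick_attention_backend`, which is a hypothetical name and not LeRobot's actual code. The returned strings follow the `attn_implementation` naming used by Hugging Face Transformers (`"sdpa"`, `"flash_attention_2"`).

```python
import importlib
import importlib.util


def pick_attention_backend(use_flash_attention: bool) -> str:
    """Return an attention implementation name, defaulting to SDPA.

    Hypothetical helper: illustrates opt-in plus graceful fallback only.
    """
    if not use_flash_attention:
        return "sdpa"
    # The package may simply not be installed.
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    # A mismatched CUDA/PyTorch build can be installed yet still fail to import.
    try:
        importlib.import_module("flash_attn")
    except Exception:
        return "sdpa"
    return "flash_attention_2"


print(pick_attention_backend(use_flash_attention=False))
print(pick_attention_backend(use_flash_attention=True))
```

Checking both that the package is present and that it imports matters here, because a `flash-attn` wheel built against a different toolchain can be installed but unusable.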
## Usage
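To use GR00T N1.7, set the policy type on any LeRobot command that takes a policy (for example `lerobot-train` or `lerobot-eval`):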
```bash
--policy.type=groot
```
## Training
### Training Command Example
Here is a complete training command for fine-tuning the base GR00T model on your own dataset:
```bash
# Using a multi-GPU setup
accelerate launch \
--multi_gpu \
--num_processes=$NUM_GPUS \
$(which lerobot-train) \
--output_dir=$OUTPUT_DIR \
--save_checkpoint=true \
--batch_size=$BATCH_SIZE \
--steps=$NUM_STEPS \
--save_freq=$SAVE_FREQ \
--log_freq=$LOG_FREQ \
--policy.push_to_hub=true \
--policy.type=groot \
--policy.repo_id=$REPO_ID \
--policy.tune_diffusion_model=false \
--dataset.repo_id=$DATASET_ID \
--wandb.enable=true \
--wandb.disable_artifact=true \
--job_name=$JOB_NAME
```
## Performance Results
### LIBERO Benchmark Results
> [!NOTE]
> Follow the [LIBERO](./libero) setup instructions before running `lerobot-eval`.
GR00T N1.7 has demonstrated strong performance on the LIBERO benchmark suite. To reproduce LeRobot results, follow the instructions in the [LIBERO](./libero) section.
### GR00T N1.7 LIBERO Checkpoints
NVIDIA publishes GR00T N1.7 LIBERO checkpoints at [`nvidia/GR00T-N1.7-LIBERO`](https://huggingface.co/nvidia/GR00T-N1.7-LIBERO), with one subdirectory per LIBERO suite:
| Suite | Checkpoint subdirectory |
| -------------- | ----------------------- |
| LIBERO Spatial | `libero_spatial` |
| LIBERO Object | `libero_object` |
| LIBERO Goal | `libero_goal` |
| LIBERO 10 | `libero_10` |

Preliminary LeRobot integration results:

| Suite | Status | Success rate | n_episodes |
| -------------- | ------ | -----------: | ---------: |
| LIBERO Spatial | ✓ | ~95% | XX |
| LIBERO Object | ✓ | XX% | XX |
| LIBERO Goal | ✓ | XX% | XX |
| LIBERO 10 | ✓ | XX% | XX |
| **Average** | ✓ | **XX%** | **XX** |

Replace the `XX` placeholders with final eval artifacts before merge.

Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.
```bash
hf download nvidia/GR00T-N1.7-LIBERO \
--include "libero_spatial/*" \
--local-dir ./GR00T-N1.7-LIBERO
lerobot-eval \
--policy.type=groot \
--policy.base_model_path=./GR00T-N1.7-LIBERO/libero_spatial \
--policy.embodiment_tag=libero_sim \
--env.type=libero \
--env.task=libero_spatial \
--eval.n_episodes=50
```
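
To make the distinction between the two flags concrete, the sketch below inspects a local checkpoint directory and suggests a flag based on whether `config.json` has a `type` field. `suggest_policy_flag` is a hypothetical helper written for this page, not a LeRobot API, and the example configs are made-up stand-ins for real checkpoint contents.

```python
import json
import tempfile
from pathlib import Path


def suggest_policy_flag(checkpoint_dir: str) -> str:
    """Suggest a CLI flag for a local checkpoint directory (hypothetical helper)."""
    config_path = Path(checkpoint_dir) / "config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text())
        except json.JSONDecodeError:
            config = {}
        # LeRobot checkpoints carry a `type` field in config.json.
        if isinstance(config, dict) and "type" in config:
            return "--policy.path"
    # Raw checkpoints without a LeRobot `type` field load via the base model path.
    return "--policy.base_model_path"


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw_checkpoint"
    raw_dir.mkdir()
    # Made-up config with no `type` key, standing in for a raw checkpoint.
    (raw_dir / "config.json").write_text(json.dumps({"hypothetical_key": "value"}))

    lerobot_dir = Path(tmp) / "lerobot_checkpoint"
    lerobot_dir.mkdir()
    (lerobot_dir / "config.json").write_text(json.dumps({"type": "groot"}))

    raw_flag = suggest_policy_flag(str(raw_dir))
    lerobot_flag = suggest_policy_flag(str(lerobot_dir))

print(raw_flag)
print(lerobot_flag)
```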
Use `eval.n_episodes >= 50` per suite when reporting success rates.
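
The page does not say why 50 episodes is the recommended minimum. One plausible reading is statistical: a success rate estimated from few episodes has a wide uncertainty band. The sketch below computes a Wilson score interval for a binomial success rate to show how the band narrows as the episode count grows. `wilson_interval` is an illustrative helper, and the success counts are made-up examples, not measured results.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n_episodes: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 is roughly 95% coverage)."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    p = successes / n_episodes
    denom = 1.0 + z * z / n_episodes
    center = (p + z * z / (2 * n_episodes)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / n_episodes + z * z / (4 * n_episodes**2)) / denom
    )
    return center - half_width, center + half_width


# Same observed success rate (90%) at two episode counts; counts are made up.
for successes, n in [(9, 10), (45, 50)]:
    low, high = wilson_interval(successes, n)
    print(f"n={n}: {successes / n:.0%} success, interval width {high - low:.3f}")
```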
### Evaluate in your hardware setup
Once you have trained a model, you can run inference on your downstream task. Follow the instructions in [Policy Deployment (lerobot-rollout)](./inference). For example:
```bash
# Replace <user> with your Hugging Face username; --policy.path points to your trained model.
# Optionally add --dataset.camera_encoder.vcodec=auto (left disabled here).
lerobot-rollout \
  --strategy.type=sentry \
  --strategy.upload_every_n_episodes=5 \
  --robot.type=bi_so_follower \
  --robot.left_arm_port=/dev/ttyACM1 \
  --robot.right_arm_port=/dev/ttyACM0 \
  --robot.id=bimanual_follower \
  --robot.cameras='{ right: {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
    left: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
    top: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
  }' \
  --display_data=true \
  --dataset.repo_id=<user>/eval_groot-bimanual \
  --dataset.single_task="Grab and handover the red cube to the other arm" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
  --policy.path=<user>/groot-bimanual \
  --duration=600
```
## License
GR00T N1.7 is released under the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).