mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-25 04:07:02 +00:00
9ce6633518
N1.5 removal is now explicit and actionable:
- Legacy N1.5 checkpoint configs (tokenizer_assets_repo) parse and fail with a single clear error pointing to lerobot==0.5.1 instead of a cryptic draccus DecodingError
- Removed N1.5 processor registry names (groot_pack_inputs_v3, groot_eagle_encode_v3, groot_eagle_collate_v3) are stubbed to raise the same guidance; groot_action_unpack_unnormalize_v1 changed semantics, so the step is re-registered as _v2 and _v1 is stubbed
- N1.5 detection also recognizes checkpoint config.json content (model_type/architectures/eagle backbone), not just path names; every rejection surface includes the migration guidance
- groot.mdx documents the breaking change and migration path

Runtime fixes:
- use_bf16=False no longer crashes (compute_dtype only set when used)
- GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync select_action (relative eef/non-eef decode was broken in lerobot-eval/record flows)
- Postprocessor falls back to dataset stats when a raw checkpoint lacks the configured embodiment tag instead of silently emitting normalized [-1, 1] actions
- Hub-hosted finetuned N1.7 checkpoints load: the processor config is resolved via hf_hub_download for non-local paths, with a tolerant retry when inspection fails
- Raw-checkpoint processor branch honors caller overrides (device, rename_map) instead of dropping them
- Relative-action raw-state cache is per-instance instead of process-global (cross-instance contamination)
- Camera/modality-key mismatches warn, including the zero-match fallback; checkpoint revision is no longer forwarded into backbone loading; deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor

Config/UX:
- GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning instead of silently
- Explicit action_decode_transform='none' wins over the libero_sim default (new 'auto' sentinel) and survives save/load round-trips

Tests/CI:
- pytest.importorskip guards so fast_tests tiers pass without transformers (was 10 failures, now 0)
- Regression tests for every fix; from_pretrained rejection tests now actually exercise from_pretrained
- Parity test reads the artifact seed, fails on shape mismatch instead of silently truncating, and a new case runs LeRobot's real Qwen3-VL preprocessing on raw observations dumped by the producer
- docs: dead huggingface-cli download replaced with hf download

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
184 lines
7.1 KiB
Plaintext
# GR00T Policy

GR00T is an NVIDIA foundation model family for generalized humanoid robot reasoning and skills. It is a cross-embodiment policy that accepts multimodal input, including language, images, and proprioception, to perform manipulation tasks in diverse environments.

LeRobot integrates GR00T N1.7 through the `groot` policy type.

> [!WARNING]
> **Breaking change:** GR00T N1.5 support was removed from LeRobot, and current releases support GR00T N1.7 only. N1.5 checkpoints, configs, and `--policy.model_version=n1.5` are rejected with a clear error. To keep using an N1.5 checkpoint, pin the last release that supports it: `pip install 'lerobot==0.5.1'`. To use the current release, migrate to GR00T N1.7 (`model_version='n1.7'`, base model [`nvidia/GR00T-N1.7-3B`](https://huggingface.co/nvidia/GR00T-N1.7-3B)).

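If you are unsure whether a local checkpoint predates the change, a rough pre-flight check on its `config.json` can save a failed run. The sketch below is a heuristic only and is not LeRobot's own detection logic: the helper names are hypothetical, and it assumes that a `tokenizer_assets_repo` key or a `model_version` of `n1.5` marks a legacy N1.5 config.

```python
import json
from pathlib import Path

# Assumption: this key appears only in legacy N1.5 checkpoint configs.
LEGACY_N15_KEYS = ("tokenizer_assets_repo",)


def looks_like_n15_config(config: dict) -> bool:
    """Heuristic: return True if the config dict carries an N1.5 marker."""
    if str(config.get("model_version", "")).lower() == "n1.5":
        return True
    return any(key in config for key in LEGACY_N15_KEYS)


def checkpoint_looks_like_n15(checkpoint_dir: str) -> bool:
    """Read <checkpoint_dir>/config.json, if present, and apply the heuristic."""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return False  # nothing to inspect; let LeRobot decide at load time
    with config_path.open() as f:
        return looks_like_n15_config(json.load(f))
```

A `True` result suggests pinning `lerobot==0.5.1` or migrating to N1.7; a `False` result is not a guarantee that the checkpoint will load.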
## Model Overview

GR00T N1.7 uses a Cosmos-Reason2/Qwen3-VL backbone and provides checkpoints for SimplerEnv, DROID, and LIBERO.

Developers and researchers can post-train GR00T with their own real or synthetic data to adapt it for specific humanoid robots or tasks.

GR00T uses pre-trained vision and language encoders with a flow matching action transformer to model a chunk of actions conditioned on vision, language, and proprioception.

<img
  src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png"
  alt="An overview of GR00T"
  width="80%"
/>

Its strong performance comes from being trained on an expansive and diverse humanoid dataset, which includes:

- Real captured data from robots.
- Synthetic data generated using NVIDIA Isaac GR00T Blueprint.
- Internet-scale video data.

This approach allows the model to be highly adaptable through post-training for specific embodiments, tasks, and environments.

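To give some intuition for the flow matching action head mentioned above, the toy sketch below integrates a velocity field from Gaussian noise to an action chunk with Euler steps. It is not GR00T's implementation: `toy_velocity` is a hand-written stand-in for the learned, observation-conditioned network, `target` plays the role of the chunk that network would steer toward, and all names are hypothetical.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for the learned network: velocity of the straight-line
    path x_t = (1 - t) * noise + t * target, written in terms of x_t."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action_chunk(target, num_steps=4, seed=0):
    """Euler-integrate the velocity field from t=0 (noise) to t=1 (actions)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # flattened chunk of pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real policy the velocity comes from a transformer conditioned on vision, language, and proprioception features, so the integration ends at a predicted chunk rather than a known target.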
## Installation Requirements

GR00T is intended for NVIDIA GPU-accelerated systems. The `groot` extra includes Flash Attention on non-macOS platforms, and Flash Attention requires a compatible PyTorch/CUDA environment to be in place before it is installed. Install the dependencies in this order:

1. Follow the Environment Setup in the [Installation Guide](./installation). Do not install `lerobot` yet.
2. Install PyTorch, TorchVision, and the build dependencies used by Flash Attention:

```bash
# Check https://pytorch.org/get-started/locally/ for the right CUDA wheel index for your system.
pip install "torch>=2.7,<2.12.0" "torchvision>=0.22.0,<0.27.0" \
  --index-url https://download.pytorch.org/whl/cu128
pip install "ninja>=1.11.1,<2.0.0" "packaging>=24.2,<26.0"
```

3. Install and verify Flash Attention:

```bash
pip install "flash-attn>=2.5.9,<3.0.0" --no-build-isolation
python -c "import flash_attn; print(f'Flash Attention {flash_attn.__version__} imported successfully')"
```

4. Install LeRobot with the GR00T extra:

```bash
pip install "lerobot[groot]"
```

For a source checkout, use the same order, then install the local package with:

```bash
pip install -e ".[groot]"
```

If your CUDA/PyTorch build needs a different Flash Attention wheel or source build, follow the [Flash Attention project](https://github.com/Dao-AILab/flash-attention) instructions, but keep the same ordering: PyTorch first, Flash Attention next, then `lerobot[groot]`.

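After the steps above, a quick check that each package is discoverable can catch an out-of-order install. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it only tests that the modules can be found, not that their versions or CUDA builds are compatible.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ("torch", "torchvision", "flash_attn", "lerobot")


def check_groot_environment(modules=REQUIRED_MODULES):
    """Return {module_name: is_discoverable} without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_groot_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

If `flash_attn` is reported missing on a non-macOS machine, repeat the Flash Attention step before reinstalling `lerobot[groot]`.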
## Usage

To use GR00T N1.7, set the policy type and model version on the command line:

```bash
--policy.type=groot \
--policy.model_version=n1.7
```

## Training

### Training Command Example

Here's a complete training command for finetuning the base GR00T model on your own dataset:

```bash
# Using a multi-GPU setup
accelerate launch \
  --multi_gpu \
  --num_processes=$NUM_GPUS \
  $(which lerobot-train) \
  --output_dir=$OUTPUT_DIR \
  --save_checkpoint=true \
  --batch_size=$BATCH_SIZE \
  --steps=$NUM_STEPS \
  --save_freq=$SAVE_FREQ \
  --log_freq=$LOG_FREQ \
  --policy.push_to_hub=true \
  --policy.type=groot \
  --policy.repo_id=$REPO_ID \
  --policy.tune_diffusion_model=false \
  --dataset.repo_id=$DATASET_ID \
  --wandb.enable=true \
  --wandb.disable_artifact=true \
  --job_name=$JOB_NAME
```

## Performance Results

### LIBERO Benchmark Results

> [!NOTE]
> Follow the [LIBERO](./libero) setup instructions before running `lerobot-eval`.

GR00T N1.7 has demonstrated strong performance on the LIBERO benchmark suite. To reproduce LeRobot results, follow the instructions in the [LIBERO](./libero) section.

### GR00T N1.7 LIBERO Checkpoints

NVIDIA publishes GR00T N1.7 LIBERO checkpoints at [`nvidia/GR00T-N1.7-LIBERO`](https://huggingface.co/nvidia/GR00T-N1.7-LIBERO), with one subdirectory per LIBERO suite:

| Suite          | Checkpoint subdirectory |
| -------------- | ----------------------- |
| LIBERO Spatial | `libero_spatial`        |
| LIBERO Object  | `libero_object`         |
| LIBERO Goal    | `libero_goal`           |
| LIBERO 10      | `libero_10`             |

Preliminary LeRobot integration results:

| Suite          | Status | Success rate | n_episodes |
| -------------- | ------ | -----------: | ---------: |
| LIBERO Spatial | ✓      |         ~95% |         XX |
| LIBERO Object  | ✓      |          XX% |         XX |
| LIBERO Goal    | ✓      |          XX% |         XX |
| LIBERO 10      | ✓      |          XX% |         XX |
| **Average**    | ✓      |      **XX%** |     **XX** |

Replace the `XX` placeholders with final eval artifacts before merge.

Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.

```bash
hf download nvidia/GR00T-N1.7-LIBERO \
  --include "libero_spatial/*" \
  --local-dir ./GR00T-N1.7-LIBERO

lerobot-eval \
  --policy.type=groot \
  --policy.model_version=n1.7 \
  --policy.base_model_path=./GR00T-N1.7-LIBERO/libero_spatial \
  --policy.embodiment_tag=libero_sim \
  --env.type=libero \
  --env.task=libero_spatial \
  --eval.n_episodes=50
```

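To evaluate all four suites, the same command can be generated per suite. The sketch below only assembles argument lists from the flags shown above and prints them; `build_eval_command` is a hypothetical helper, and it assumes that each suite's `--env.task` value matches its checkpoint subdirectory name, which this page confirms only for `libero_spatial`.

```python
import shlex

# Checkpoint subdirectory names from the table above.
LIBERO_SUITES = ("libero_spatial", "libero_object", "libero_goal", "libero_10")


def build_eval_command(suite, checkpoint_root="./GR00T-N1.7-LIBERO", n_episodes=50):
    """Return the lerobot-eval argument list for one LIBERO suite."""
    if suite not in LIBERO_SUITES:
        raise ValueError(f"unknown suite {suite!r}; expected one of {LIBERO_SUITES}")
    return [
        "lerobot-eval",
        "--policy.type=groot",
        "--policy.model_version=n1.7",
        f"--policy.base_model_path={checkpoint_root}/{suite}",
        "--policy.embodiment_tag=libero_sim",
        "--env.type=libero",
        f"--env.task={suite}",  # assumption: task name equals the subdirectory name
        f"--eval.n_episodes={n_episodes}",
    ]


if __name__ == "__main__":
    for suite in LIBERO_SUITES:
        print(shlex.join(build_eval_command(suite)))
```

Each suite's subdirectory must be downloaded first, for example by repeating the `hf download` command with the matching `--include` pattern.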
Use `eval.n_episodes >= 50` per suite when reporting success rates.

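One way to see why the episode count matters, offered here as an illustration rather than as the stated rationale for the threshold: if each episode is treated as an independent success/failure trial, the standard error of the measured success rate shrinks with the square root of the number of episodes. The helper name below is hypothetical.

```python
import math


def success_rate_standard_error(success_rate, n_episodes):
    """Binomial standard error of a success rate measured over n_episodes."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    return math.sqrt(success_rate * (1.0 - success_rate) / n_episodes)


if __name__ == "__main__":
    for n in (10, 50, 200):
        se = success_rate_standard_error(0.95, n)
        print(f"n_episodes={n:4d}  standard error ~ {100 * se:.1f} percentage points")
```

Under this simple model, a 95% success rate measured over 50 episodes carries a standard error of roughly 3 percentage points, and noticeably more with only 10 episodes.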
### Evaluate on your hardware setup

Once you have trained your model, you can run inference on your downstream task. Follow the instructions in [Policy Deployment (lerobot-rollout)](./inference). For example:

```bash
# Replace <user> with your Hugging Face username.
# --policy.path points at your trained model.
# Optionally add --dataset.camera_encoder.vcodec=auto to the command.
lerobot-rollout \
  --strategy.type=sentry \
  --strategy.upload_every_n_episodes=5 \
  --robot.type=bi_so_follower \
  --robot.left_arm_port=/dev/ttyACM1 \
  --robot.right_arm_port=/dev/ttyACM0 \
  --robot.id=bimanual_follower \
  --robot.cameras='{ right: {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
    left: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
    top: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
  }' \
  --display_data=true \
  --dataset.repo_id="<user>/eval_groot-bimanual" \
  --dataset.single_task="Grab and handover the red cube to the other arm" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
  --policy.path="<user>/groot-bimanual" \
  --duration=600
```

## License

GR00T N1.7 is released under the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).