# GR00T Policy

GR00T is an NVIDIA foundation model family for generalized humanoid robot reasoning and skills. It is a cross-embodiment policy that accepts multimodal input, including language, images, and proprioception, to perform manipulation tasks in diverse environments.

LeRobot integrates GR00T through the `groot` policy type. The default model family is GR00T N1.5, and GR00T N1.7 can be selected with `policy.model_version=n1.7`.

## Model Overview

NVIDIA Isaac GR00T N1.5 is an upgraded version of the GR00T N1 foundation model. GR00T N1.7 extends the family with a Cosmos-Reason2/Qwen3-VL backbone and N1.7 checkpoints for SimplerEnv, DROID, and LIBERO.

Developers and researchers can post-train GR00T with their own real or synthetic data to adapt it for specific humanoid robots or tasks.

GR00T uses pre-trained vision and language encoders with a flow matching action transformer to model a chunk of actions conditioned on vision, language, and proprioception.

<img
  src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png"
  alt="An overview of GR00T"
  width="80%"
/>

Its strong performance comes from being trained on an expansive and diverse humanoid dataset, which includes:

- Real captured data from robots.
- Synthetic data generated using NVIDIA Isaac GR00T Blueprint.
- Internet-scale video data.

This approach allows the model to be highly adaptable through post-training for specific embodiments, tasks, and environments.

## Installation Requirements

GR00T is intended for NVIDIA GPU-accelerated systems. The `groot` extra still includes Flash Attention on non-macOS platforms, and Flash Attention needs a compatible PyTorch/CUDA environment before it is installed. Install the dependencies in this order:

1. Follow the Environment Setup in the [Installation Guide](./installation). Do not install `lerobot` yet.
2. Install PyTorch, TorchVision, and the build dependencies used by Flash Attention:

```bash
# Check https://pytorch.org/get-started/locally/ for the right CUDA wheel index for your system.
pip install "torch>=2.7,<2.12.0" "torchvision>=0.22.0,<0.27.0" \
  --index-url https://download.pytorch.org/whl/cu128
pip install "ninja>=1.11.1,<2.0.0" "packaging>=24.2,<26.0"
```

3. Install and verify Flash Attention:

```bash
pip install "flash-attn>=2.5.9,<3.0.0" --no-build-isolation
python -c "import flash_attn; print(f'Flash Attention {flash_attn.__version__} imported successfully')"
```

4. Install LeRobot with the GR00T extra:

```bash
pip install "lerobot[groot]"
```

For a source checkout, use the same order, then install the local package with:

```bash
pip install -e ".[groot]"
```

If your CUDA/PyTorch build needs a different Flash Attention wheel or source build, follow the [Flash Attention project](https://github.com/Dao-AILab/flash-attention) instructions, but keep the same ordering: PyTorch first, Flash Attention next, then `lerobot[groot]`.

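Because Flash Attention is built against the PyTorch install from step 2, it can help to confirm that PyTorch is importable and sees CUDA before running step 3. The following is a minimal sketch, assuming `python` resolves to the interpreter of the active environment; the helper name `check_torch_cuda` is hypothetical and not part of LeRobot.

```bash
# Hypothetical helper: report the PyTorch version, the CUDA version it was
# built with, and whether a GPU is visible to it.
check_torch_cuda() {
  # Return early with a hint if PyTorch cannot be imported at all.
  if ! python -c "import torch" 2>/dev/null; then
    echo "PyTorch is not importable; install it first (step 2)."
    return 1
  fi
  python -c "import torch; print('torch', torch.__version__, '| CUDA build:', torch.version.cuda, '| GPU available:', torch.cuda.is_available())"
}

check_torch_cuda || echo "Fix the PyTorch install before building flash-attn."
```

A `CUDA build` of `None` or `GPU available: False` suggests a CPU-only wheel or a driver mismatch, which is worth resolving before attempting the Flash Attention install.
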
## Usage

To use GR00T N1.5 in your LeRobot configuration, specify the policy type:

```bash
--policy.type=groot
```

To use GR00T N1.7:

```bash
--policy.type=groot \
--policy.model_version=n1.7
```

## Training

### Training Command Example

Here's a complete training command for finetuning the base GR00T model on your own dataset:

```bash
# Using a multi-GPU setup
accelerate launch \
  --multi_gpu \
  --num_processes=$NUM_GPUS \
  $(which lerobot-train) \
  --output_dir=$OUTPUT_DIR \
  --save_checkpoint=true \
  --batch_size=$BATCH_SIZE \
  --steps=$NUM_STEPS \
  --save_freq=$SAVE_FREQ \
  --log_freq=$LOG_FREQ \
  --policy.push_to_hub=true \
  --policy.type=groot \
  --policy.repo_id=$REPO_ID \
  --policy.tune_diffusion_model=false \
  --dataset.repo_id=$DATASET_ID \
  --wandb.enable=true \
  --wandb.disable_artifact=true \
  --job_name=$JOB_NAME
```

For N1.7, add:

```bash
--policy.model_version=n1.7
```

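The training command in this section expands several shell variables, so they need to be defined in the same shell before launching. The sketch below shows one way to set them. Every value is an illustrative placeholder chosen for this example (the repository names in particular are hypothetical), not a recommended setting.

```bash
# Illustrative placeholder values; adjust all of them to your hardware and data.
export NUM_GPUS=2                                  # processes started by accelerate
export OUTPUT_DIR=./outputs/groot_finetune         # where checkpoints are written
export BATCH_SIZE=32
export NUM_STEPS=30000
export SAVE_FREQ=5000                              # steps between checkpoints
export LOG_FREQ=100                                # steps between log lines
export REPO_ID=your_hf_user/groot-finetuned        # hypothetical Hub repo for the policy
export DATASET_ID=your_hf_user/your_dataset        # hypothetical Hub dataset repo
export JOB_NAME=groot_finetune

echo "Training $JOB_NAME for $NUM_STEPS steps on $NUM_GPUS GPUs"
```

An unset variable expands to an empty string, which produces flags such as `--steps=` with no value, so defining all nine up front avoids a confusing argument-parsing failure.
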
## Performance Results

### LIBERO Benchmark Results

> [!NOTE]
> Follow the [LIBERO](./libero) setup instructions before running `lerobot-eval`.

GR00T has demonstrated strong performance on the LIBERO benchmark suite. To compare and test its LeRobot implementation, we finetuned the GR00T N1.5 model for 30k steps on the LIBERO dataset and compared the results to the GR00T reference results.

| Benchmark          | LeRobot Implementation | GR00T Reference |
| ------------------ | ---------------------- | --------------- |
| **Libero Spatial** | 82.0%                  | 92.0%           |
| **Libero Object**  | 99.0%                  | 92.0%           |
| **Libero Long**    | 82.0%                  | 76.0%           |
| **Average**        | 87.0%                  | 87.0%           |

These results demonstrate GR00T's strong generalization capabilities across diverse robotic manipulation tasks. To reproduce these results, follow the instructions in the [LIBERO](./libero) section.

### GR00T N1.7 LIBERO Checkpoints

NVIDIA publishes GR00T N1.7 LIBERO checkpoints at [`nvidia/GR00T-N1.7-LIBERO`](https://huggingface.co/nvidia/GR00T-N1.7-LIBERO), with one subdirectory per LIBERO suite:

| Suite          | Checkpoint subdirectory |
| -------------- | ----------------------- |
| LIBERO Spatial | `libero_spatial`        |
| LIBERO Object  | `libero_object`         |
| LIBERO Goal    | `libero_goal`           |
| LIBERO 10      | `libero_10`             |

Preliminary LeRobot integration results:

| Suite          | Status | Success rate | n_episodes |
| -------------- | ------ | -----------: | ---------: |
| LIBERO Spatial | ✓      |         ~95% |         XX |
| LIBERO Object  | ✓      |          XX% |         XX |
| LIBERO Goal    | ✓      |          XX% |         XX |
| LIBERO 10      | ✓      |          XX% |         XX |
| **Average**    | ✓      |      **XX%** |     **XX** |

Replace the `XX` placeholders with final eval artifacts before merge.

Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.

```bash
huggingface-cli download nvidia/GR00T-N1.7-LIBERO \
  --include "libero_spatial/*" \
  --local-dir ./GR00T-N1.7-LIBERO

lerobot-eval \
  --policy.type=groot \
  --policy.model_version=n1.7 \
  --policy.base_model_path=./GR00T-N1.7-LIBERO/libero_spatial \
  --policy.embodiment_tag=libero_sim \
  --env.type=libero \
  --env.task=libero_spatial \
  --eval.n_episodes=50
```

Use `eval.n_episodes >= 50` per suite when reporting success rates.

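The document does not state why 50 episodes is the suggested minimum. One way to see how episode count affects a reported success rate, assuming episodes are independent with a fixed success probability, is the binomial standard error sqrt(p(1-p)/n). The helper name `success_rate_stderr` below is hypothetical and the sketch is illustrative only.

```bash
# Hypothetical helper: standard error, in percentage points, of a success
# rate p (first argument) measured over n independent episodes (second argument).
success_rate_stderr() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * sqrt(p * (1 - p) / n) }'
}

success_rate_stderr 0.95 50   # prints 3.1
success_rate_stderr 0.95 10   # prints 6.9
```

Under these assumptions, a 95% success rate measured over 50 episodes carries roughly 3 percentage points of standard error, while the same rate over 10 episodes carries about 7, so differences of a few points between runs are hard to interpret at low episode counts.
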
### Evaluate in your hardware setup

Once you have trained your model with your own parameters, you can run inference on your downstream task. Follow the instructions in [Policy Deployment (lerobot-rollout)](./inference). For example:

```bash
# Replace <user> with your Hugging Face Hub namespace before running.
# --policy.path points to your trained model.
# Optional: append --dataset.camera_encoder.vcodec=auto to the command.
lerobot-rollout \
  --strategy.type=sentry \
  --strategy.upload_every_n_episodes=5 \
  --robot.type=bi_so_follower \
  --robot.left_arm_port=/dev/ttyACM1 \
  --robot.right_arm_port=/dev/ttyACM0 \
  --robot.id=bimanual_follower \
  --robot.cameras='{ right: {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
    left: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
    top: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
  }' \
  --display_data=true \
  --dataset.repo_id=<user>/eval_groot-bimanual \
  --dataset.single_task="Grab and handover the red cube to the other arm" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
  --policy.path=<user>/groot-bimanual \
  --duration=600
```

## License

GR00T N1.5 follows NVIDIA's license terms, consistent with the original [GR00T repository](https://github.com/NVIDIA/Isaac-GR00T). GR00T N1.7 is released under the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).