Pepijn
2025-09-16 10:09:42 +02:00
parent d883c78a94
commit 70624da239
5 changed files with 251 additions and 3 deletions
+5 -1
@@ -22,7 +22,11 @@
title: "Tutorials"
- sections:
- local: smolvla
title: Finetune SmolVLA
title: SmolVLA
- local: pi0
title: π₀ (Pi0)
- local: pi05
title: π₀.₅ (Pi05)
title: "Policies"
- sections:
- local: hope_jr
+109
@@ -0,0 +1,109 @@
# π₀ (Pi0)
π₀ is a **Vision-Language-Action model for general robot control**, from Physical Intelligence. The LeRobot implementation is adapted from their open source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.
## Model Overview
π₀ represents a breakthrough in robotics as the first general-purpose robot foundation model developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi0). Unlike traditional robots that are narrow specialists programmed for repetitive motions, π₀ is designed to be a generalist policy that can understand visual inputs, interpret natural language instructions, and control a variety of different robots across diverse tasks.
### The Vision for Physical Intelligence
As described by Physical Intelligence, while AI has achieved remarkable success in digital domains, from chess-playing to drug discovery, human intelligence still dramatically outpaces AI in the physical world. To paraphrase Moravec's paradox, winning a game of chess represents an "easy" problem for AI, but folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived. π₀ represents a first step toward developing artificial physical intelligence that enables users to simply ask robots to perform any task they want, just like they can with large language models.
### Architecture and Approach
π₀ combines several key innovations:
- **Flow Matching**: Augments a pre-trained VLM with continuous action outputs via flow matching, a variant of diffusion models (a minimal sampling sketch follows this list)
- **Cross-Embodiment Training**: Trained on data from 8 distinct robot platforms including UR5e, Bimanual UR5e, Franka, Bimanual Trossen, Bimanual ARX, Mobile Trossen, and Mobile Fibocom
- **Internet-Scale Pre-training**: Inherits semantic knowledge from a pre-trained 3B parameter Vision-Language Model
- **High-Frequency Control**: Outputs motor commands at up to 50 Hz for real-time dexterous manipulation
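At inference time, flow matching generates an action chunk by integrating a learned velocity field from Gaussian noise toward the action distribution, conditioned on the model's vision-language prefix. The sketch below is purely illustrative and is not the LeRobot implementation: the `velocity_model` callable, the 50-step action chunk, the 32-dimensional action space, and the 10-step Euler schedule are all assumptions.
```python
# Illustrative flow-matching sampler (not the LeRobot implementation).
# `velocity_model`, the chunk length, and the 10-step schedule are assumptions.
import torch

def sample_action_chunk(velocity_model, prefix, horizon=50, action_dim=32, num_steps=10):
    """Integrate a learned velocity field from noise to a chunk of continuous actions."""
    actions = torch.randn(1, horizon, action_dim)  # start from Gaussian noise
    dt = 1.0 / num_steps
    t = torch.zeros(1)
    for _ in range(num_steps):
        # The model predicts the instantaneous velocity toward the action
        # distribution, conditioned on the cached vision-language prefix.
        v = velocity_model(actions, t, prefix)
        actions = actions + dt * v                 # forward Euler step
        t = t + dt
    return actions
```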
## Installation Requirements
⚠️ **Warning**: This policy requires patching the Hugging Face `transformers` library.
### Prerequisites
1. Ensure you have the exact version installed:
```bash
pip show transformers
```
It must be version **4.53.2**.
2. Apply the custom patches (an optional verification sketch follows below):
```bash
cp -r ./src/lerobot/policies/pi0_openpi/transformers_replace/* \
$(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
```
### What the patches do:
- Support the **adaRMS** (adaptive RMSNorm) layers
- Correctly control the precision of activations
- Allow the KV cache to be used without updates
**Important Notes:**
- This modifies the installed `transformers` files in place
- The changes persist until you remove the patched files, reinstall `transformers`, or recreate the environment
### Restoring Clean State
To undo the patches and restore a clean state:
```bash
pip uninstall transformers
pip install transformers==4.53.2
```
## Training Data and Capabilities
π₀ is trained on the largest robot interaction dataset to date, combining three key data sources:
1. **Internet-Scale Pre-training**: Vision-language data from the web for semantic understanding
2. **Open X-Embodiment Dataset**: Open-source robot manipulation datasets
3. **Physical Intelligence Dataset**: Large and diverse dataset of dexterous tasks across 8 distinct robots
## Usage
To use π₀ in LeRobot, specify the policy type as:
```bash
--policy.type=pi0_openpi
```
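For quick experimentation outside the training script, a π₀ checkpoint can in principle be loaded like any other LeRobot policy. The snippet below is only a rough sketch: the class name, import path, observation keys, and tensor shapes are assumptions and should be checked against `src/lerobot/policies/pi0_openpi` in your checkout.
```python
# Rough sketch only: the class name, import path, and observation keys below
# are assumptions; check src/lerobot/policies/pi0_openpi for the actual API.
import torch
from lerobot.policies.pi0_openpi.modeling_pi0_openpi import PI0OpenPIPolicy  # hypothetical path

policy = PI0OpenPIPolicy.from_pretrained("pepijn223/pi0_base_fp32")
policy.to("cuda").eval()

observation = {
    # Placeholder keys and shapes; use the camera/state keys of your own dataset.
    "observation.images.top": torch.zeros(1, 3, 224, 224, device="cuda"),
    "observation.state": torch.zeros(1, 32, device="cuda"),
    "task": ["pick up the cube"],
}
with torch.no_grad():
    action = policy.select_action(observation)  # one action from the predicted chunk
```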
## Training
For training π₀, you can use the standard LeRobot training script with the appropriate configuration:
```bash
python src/lerobot/scripts/train.py \
--dataset.repo_id=your_dataset \
--policy.type=pi0_openpi \
--output_dir=./outputs/pi0_training \
--job_name=pi0_training \
--policy.pretrained_path=pepijn223/pi0_base_fp32 \
--policy.repo_id=your_repo_id \
--policy.compile_model=true \
--policy.gradient_checkpointing=true \
--policy.dtype=bfloat16 \
--steps=3000 \
--policy.scheduler_decay_steps=3000 \
--policy.device=cuda \
--batch_size=32
```
### Key Training Parameters
- **`--policy.compile_model=true`**: Enables model compilation for faster training
- **`--policy.gradient_checkpointing=true`**: Reduces memory usage significantly during training
- **`--policy.dtype=bfloat16`**: Uses bfloat16 mixed precision for efficiency (a rough memory sketch follows this list)
- **`--policy.pretrained_path=pepijn223/pi0_base_fp32`**: The base π₀ model to finetune; options are `pepijn223/pi0_base_fp32`, `pepijn223/pi0_libero_fp32`, `pepijn223/pi0_droid_fp32`
- **`--batch_size=32`**: Batch size for training, adapt this based on your GPU memory
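For a rough sense of why `bfloat16` and gradient checkpointing matter, parameter memory alone scales with dtype size; the estimate below uses the ~3B-parameter VLM backbone mentioned above and ignores gradients, optimizer state, and activations.
```python
# Back-of-envelope parameter memory for a ~3B-parameter model
params = 3e9
for name, bytes_per_param in [("fp32", 4), ("bf16", 2)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB just for the weights")
# fp32: 12 GB just for the weights
# bf16: 6 GB just for the weights
```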
## License
This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).
+135
@@ -0,0 +1,135 @@
# π₀.₅ (Pi05) Policy
π₀.₅ is a **Vision-Language-Action model with open-world generalization**, from Physical Intelligence. The LeRobot implementation is adapted from their open source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.
## Model Overview
π₀.₅ represents a significant evolution from π₀, developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi05) to address the biggest challenge in robotics: **open-world generalization**. While robots can perform impressive feats in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
### The Generalization Challenge
As Physical Intelligence explains, the fundamental challenge isn't performing feats of agility or dexterity, but generalization, the ability to correctly perform tasks in new settings with new objects. Consider a robot cleaning different homes: each home has different objects in different places. Generalization must occur at multiple levels:
- **Physical Level**: Understanding how to pick up a spoon (by the handle) or plate (by the edge), even with unseen objects in cluttered environments
- **Semantic Level**: Understanding task semantics, where to put clothes and shoes (laundry hamper, not on the bed), what tools are appropriate for cleaning spills
- **Environmental Level**: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals
### Co-Training on Heterogeneous Data
The breakthrough innovation in π₀.₅ is **co-training on heterogeneous data sources**. The model learns from:
1. **Multimodal Web Data**: Image captioning, visual question answering, object detection
2. **Verbal Instructions**: Humans coaching robots through complex tasks step-by-step
3. **Subtask Commands**: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
4. **Cross-Embodiment Robot Data**: Data from various robot platforms with different capabilities
5. **Multi-Environment Data**: Static robots deployed across many different homes
6. **Mobile Manipulation Data**: ~400 hours of mobile robot demonstrations
This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously.
## Installation Requirements
⚠️ **Warning**: This policy requires patching the Hugging Face `transformers` library.
### Prerequisites
1. Ensure you have the exact version installed:
```bash
pip show transformers
```
It must be version **4.53.2**.
2. Apply the custom patches:
```bash
cp -r ./src/lerobot/policies/pi05_openpi/transformers_replace/* \
$(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
```
### What the patches do:
- Support the **adaRMS** (adaptive RMSNorm) layers
- Correctly control the precision of activations
- Allow the KV cache to be used without updates
**Important Notes:**
- This modifies the installed `transformers` files in place
- The changes persist until you remove the patched files, reinstall `transformers`, or recreate the environment
### Restoring Clean State
To undo the patches and restore a clean state:
```bash
pip uninstall transformers
pip install transformers==4.53.2
```
## Usage
To use π₀.₅ in your LeRobot configuration, specify the policy type as:
```bash
--policy.type=pi05_openpi
```
## Training
### Training Command Example
Here's a complete training command for finetuning the base π₀.₅ model on your own dataset:
```bash
python src/lerobot/scripts/train.py \
--dataset.repo_id=your_dataset \
--policy.type=pi05_openpi \
--output_dir=./outputs/pi05_training \
--job_name=pi05_training \
--policy.repo_id=your_repo_id \
--policy.pretrained_path=pepijn223/pi05_base_fp32 \
--policy.compile_model=true \
--policy.gradient_checkpointing=true \
--wandb.enable=true \
--policy.dtype=bfloat16 \
--steps=3000 \
--policy.scheduler_decay_steps=3000 \
--policy.device=cuda \
--batch_size=32
```
### Key Training Parameters
- **`--policy.compile_model=true`**: Enables model compilation for faster training
- **`--policy.gradient_checkpointing=true`**: Reduces memory usage significantly during training
- **`--policy.dtype=bfloat16`**: Use mixed precision training for efficiency
- **`--policy.pretrained_path=pepijn223/pi05_base_fp32`**: The base π₀.₅ model to finetune, options are: `pepijn223/pi05_base_fp32`, `pepijn223/pi05_libero_fp32`, `pepijn223/pi05_droid_fp32`
- **`--batch_size=32`**: Batch size for training, adapt this based on your GPU memory
## Performance Results
### Libero Benchmark Results
π₀.₅ has demonstrated strong performance on the Libero benchmark suite:
#### Our Results (LeRobot Implementation)
- **Libero Spatial**: 98.0% success rate
- **Libero Object**: 99.0% success rate
- **Libero Goal**: 97.0% success rate
- **Libero 10**: 93.0% success rate
#### OpenPI Reference Results (30k finetuned)
- **Libero Spatial**: 98.8% success rate
- **Libero Object**: 98.2% success rate
- **Libero Goal**: 98.0% success rate
- **Libero 10**: 92.4% success rate
- **Average**: 96.85% success rate
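For a like-for-like comparison, the per-suite numbers above average out as follows:
```python
# Averages of the per-suite success rates listed above
lerobot = [98.0, 99.0, 97.0, 93.0]       # Spatial, Object, Goal, 10
openpi_30k = [98.8, 98.2, 98.0, 92.4]
print(f"LeRobot average: {sum(lerobot) / len(lerobot):.2f}%")        # 96.75%
print(f"OpenPI average:  {sum(openpi_30k) / len(openpi_30k):.2f}%")  # 96.85%
```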
These results demonstrate π₀.₅'s strong generalization capabilities across diverse robotic manipulation tasks.
## License
This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).
+1 -1
@@ -1,4 +1,4 @@
# Finetune SmolVLA
# SmolVLA
SmolVLA is Hugging Face's lightweight foundation model for robotics. Designed for easy fine-tuning on LeRobot datasets, it helps accelerate your development!