Mirror of https://github.com/huggingface/lerobot.git (synced 2026-05-16 17:20:05 +00:00)

Add docs
```diff
@@ -22,7 +22,11 @@
   title: "Tutorials"
 - sections:
   - local: smolvla
-    title: Finetune SmolVLA
+    title: SmolVLA
+  - local: pi0
+    title: π₀ (Pi0)
+  - local: pi05
+    title: π₀.₅ (Pi05)
   title: "Policies"
 - sections:
   - local: hope_jr
```
@@ -0,0 +1,109 @@
# π₀ (Pi0)

π₀ is a **Vision-Language-Action model for general robot control** from Physical Intelligence. The LeRobot implementation is adapted from their open-source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.

## Model Overview

π₀ is a breakthrough in robotics as the first general-purpose robot foundation model developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi0). Unlike traditional robots, which are narrow specialists programmed for repetitive motions, π₀ is designed as a generalist policy that can understand visual inputs, interpret natural language instructions, and control a variety of different robots across diverse tasks.

### The Vision for Physical Intelligence

As Physical Intelligence describes it, AI has achieved remarkable success in digital domains, from chess-playing to drug discovery, yet human intelligence still dramatically outpaces AI in the physical world. To paraphrase Moravec's paradox, winning a game of chess is an "easy" problem for AI, while folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived. π₀ is a first step toward artificial physical intelligence that lets users simply ask robots to perform any task they want, just as they can with large language models.
### Architecture and Approach

π₀ combines several key innovations:

- **Flow Matching**: Augments a pre-trained VLM with continuous action outputs via flow matching, a variant of diffusion modeling (see the sketch after this list)
- **Cross-Embodiment Training**: Trained on data from 8 distinct robot platforms, including UR5e, Bimanual UR5e, Franka, Bimanual Trossen, Bimanual ARX, Mobile Trossen, and Mobile Fibocom
- **Internet-Scale Pre-training**: Inherits semantic knowledge from a pre-trained 3B-parameter Vision-Language Model
- **High-Frequency Control**: Outputs motor commands at up to 50 Hz for real-time dexterous manipulation
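
To make the flow-matching idea concrete, here is a minimal sketch of generating an action chunk by integrating a learned velocity field from noise toward data. It is a toy illustration under assumed names and shapes (`velocity_net` and `obs_embedding` are hypothetical), not the LeRobot implementation:

```python
import torch

# Toy flow-matching sampler (NOT the LeRobot implementation): integrate a
# learned velocity field v(a, t, obs) with forward Euler steps, carrying a
# Gaussian noise sample at t=0 to an action chunk at t=1.
def sample_action_chunk(velocity_net, obs_embedding, horizon=50, action_dim=32, num_steps=10):
    a = torch.randn(horizon, action_dim)  # start from pure noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = torch.tensor(step * dt)            # flow time in [0, 1]
        v = velocity_net(a, t, obs_embedding)  # predicted velocity (hypothetical module)
        a = a + dt * v                         # Euler step toward the action distribution
    return a  # e.g. a 50-step chunk of motor commands, matching 50 Hz control
```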
## Installation Requirements

⚠️ **Warning**: This policy requires patching the Hugging Face `transformers` library.

### Prerequisites

1. Check which `transformers` version is installed:

   ```bash
   pip show transformers
   ```

   It must be version **4.53.2**.
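
   A non-interactive alternative that fails loudly on a version mismatch:

   ```bash
   python -c "import transformers; assert transformers.__version__ == '4.53.2', transformers.__version__"
   ```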

2. Apply the custom patches:

   ```bash
   cp -r ./src/lerobot/policies/pi0_openpi/transformers_replace/* \
     $(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
   ```

### What the patches do

- Add support for **AdaRMS** (adaptive RMSNorm) conditioning
- Correctly control the precision of activations
- Allow the KV cache to be used without updates

**Important Notes:**

- This permanently modifies your `transformers` installation
- The changes persist until you explicitly remove the patched files, reinstall `transformers`, or recreate the environment

### Restoring Clean State

To undo the patches and restore a clean state:

```bash
pip uninstall transformers
pip install transformers==4.53.2
```

## Training Data and Capabilities

π₀ is trained on the largest robot interaction dataset to date, combining three key data sources:

1. **Internet-Scale Pre-training**: Vision-language data from the web for semantic understanding
2. **Open X-Embodiment Dataset**: Open-source robot manipulation datasets
3. **Physical Intelligence Dataset**: A large and diverse dataset of dexterous tasks across 8 distinct robots

## Usage

To use π₀ in LeRobot, specify the policy type in your command-line configuration:

```bash
--policy.type=pi0_openpi
```
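
For loading the policy in Python, something like the following sketch should work; the import path, class name, and observation keys below are assumptions that may differ across LeRobot versions:

```python
import torch

# Hypothetical loading sketch -- import path, class name, and observation keys
# are assumptions; check your LeRobot version for the actual API.
from lerobot.policies.pi0_openpi.modeling_pi0_openpi import PI0OpenPIPolicy

policy = PI0OpenPIPolicy.from_pretrained("pepijn223/pi0_base_fp32")
policy.to("cuda").eval()

observation = {
    "observation.images.top": torch.zeros(1, 3, 224, 224, device="cuda"),  # placeholder camera frame
    "observation.state": torch.zeros(1, 32, device="cuda"),                # placeholder robot state
    "task": "pick up the cube",                                            # language instruction
}
action = policy.select_action(observation)  # next action (or action chunk) to execute
```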

## Training

For training π₀, you can use the standard LeRobot training script with the appropriate configuration:

```bash
python src/lerobot/scripts/train.py \
  --dataset.repo_id=your_dataset \
  --policy.type=pi0_openpi \
  --output_dir=./outputs/pi0_training \
  --job_name=pi0_training \
  --policy.pretrained_path=pepijn223/pi0_base_fp32 \
  --policy.repo_id=your_repo_id \
  --policy.compile_model=true \
  --policy.gradient_checkpointing=true \
  --policy.dtype=bfloat16 \
  --steps=3000 \
  --policy.scheduler_decay_steps=3000 \
  --policy.device=cuda \
  --batch_size=32
```

### Key Training Parameters

- **`--policy.compile_model=true`**: Enables model compilation for faster training
- **`--policy.gradient_checkpointing=true`**: Significantly reduces memory usage during training
- **`--policy.dtype=bfloat16`**: Uses mixed-precision training for efficiency
- **`--policy.pretrained_path=pepijn223/pi0_base_fp32`**: The base π₀ model to finetune; options are `pepijn223/pi0_base_fp32`, `pepijn223/pi0_libero_fp32`, and `pepijn223/pi0_droid_fp32`
- **`--batch_size=32`**: Batch size for training; adapt it to your GPU memory

## License

This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).

@@ -0,0 +1,135 @@

# π₀.₅ (Pi05) Policy

π₀.₅ is a **Vision-Language-Action model with open-world generalization** from Physical Intelligence. The LeRobot implementation is adapted from their open-source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.

## Model Overview

π₀.₅ is a significant evolution of π₀, developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi05) to address the biggest challenge in robotics: **open-world generalization**. While robots can perform impressive feats in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.

### The Generalization Challenge

As Physical Intelligence explains, the fundamental challenge isn't performing feats of agility or dexterity but generalization: the ability to correctly perform tasks in new settings with new objects. Consider a robot cleaning different homes: each home has different objects in different places. Generalization must occur at multiple levels:

- **Physical Level**: Understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
- **Semantic Level**: Understanding task semantics, such as where clothes and shoes go (the laundry hamper, not the bed) and which tools are appropriate for cleaning a spill
- **Environmental Level**: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals

### Co-Training on Heterogeneous Data

The breakthrough innovation in π₀.₅ is **co-training on heterogeneous data sources**. The model learns from:

1. **Multimodal Web Data**: Image captioning, visual question answering, object detection
2. **Verbal Instructions**: Humans coaching robots through complex tasks step-by-step
3. **Subtask Commands**: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
4. **Cross-Embodiment Robot Data**: Data from various robot platforms with different capabilities
5. **Multi-Environment Data**: Static robots deployed across many different homes
6. **Mobile Manipulation Data**: ~400 hours of mobile robot demonstrations

This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously.

## Installation Requirements

⚠️ **Warning**: This policy requires patching the Hugging Face `transformers` library.

### Prerequisites

1. Check which `transformers` version is installed:

   ```bash
   pip show transformers
   ```

   It must be version **4.53.2**.

2. Apply the custom patches:

   ```bash
   cp -r ./src/lerobot/policies/pi05_openpi/transformers_replace/* \
     $(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
   ```

### What the patches do

- Add support for **AdaRMS** (adaptive RMSNorm) conditioning
- Correctly control the precision of activations
- Allow the KV cache to be used without updates

**Important Notes:**

- This permanently modifies your `transformers` installation
- The changes persist until you explicitly remove the patched files, reinstall `transformers`, or recreate the environment

### Restoring Clean State

To undo the patches and restore a clean state:

```bash
pip uninstall transformers
pip install transformers==4.53.2
```

## Usage

To use π₀.₅ in LeRobot, specify the policy type in your command-line configuration:

```bash
--policy.type=pi05_openpi
```

## Training

### Training Command Example

Here's a complete training command for finetuning the base π₀.₅ model on your own dataset:

```bash
python src/lerobot/scripts/train.py \
  --dataset.repo_id=your_dataset \
  --policy.type=pi05_openpi \
  --output_dir=./outputs/pi05_training \
  --job_name=pi05_training \
  --policy.pretrained_path=pepijn223/pi05_base_fp32 \
  --policy.repo_id=your_repo_id \
  --policy.compile_model=true \
  --policy.gradient_checkpointing=true \
  --wandb.enable=true \
  --policy.dtype=bfloat16 \
  --steps=3000 \
  --policy.scheduler_decay_steps=3000 \
  --policy.device=cuda \
  --batch_size=32
```

### Key Training Parameters

- **`--policy.compile_model=true`**: Enables model compilation for faster training
- **`--policy.gradient_checkpointing=true`**: Significantly reduces memory usage during training
- **`--policy.dtype=bfloat16`**: Uses mixed-precision training for efficiency
- **`--policy.pretrained_path=pepijn223/pi05_base_fp32`**: The base π₀.₅ model to finetune; options are `pepijn223/pi05_base_fp32`, `pepijn223/pi05_libero_fp32`, and `pepijn223/pi05_droid_fp32`
- **`--batch_size=32`**: Batch size for training; adapt it to your GPU memory

## Performance Results

### Libero Benchmark Results

π₀.₅ has demonstrated strong performance on the Libero benchmark suite. All numbers are task success rates:

| Benchmark | LeRobot implementation | OpenPI reference (30k finetuned) |
| --- | --- | --- |
| Libero Spatial | 98.0% | 98.8% |
| Libero Object | 99.0% | 98.2% |
| Libero Goal | 97.0% | 98.0% |
| Libero 10 | 93.0% | 92.4% |
| **Average** | **96.75%** | **96.85%** |

These results demonstrate π₀.₅'s strong generalization capabilities across diverse robotic manipulation tasks.

## License

This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).

```diff
@@ -1,4 +1,4 @@
-# Finetune SmolVLA
+# SmolVLA
 
 SmolVLA is Hugging Face’s lightweight foundation model for robotics. Designed for easy fine-tuning on LeRobot datasets, it helps accelerate your development!
```