Add docs

2026-07-18 15:31:47 +00:00 · 2025-09-16 10:09:42 +02:00
parent d883c78a94
commit 70624da239
5 changed files with 251 additions and 3 deletions
@@ -0,0 +1,109 @@
+# π₀ (Pi0)
+
+π₀ is a **Vision-Language-Action model for general robot control** from Physical Intelligence. The LeRobot implementation is adapted from their open-source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.
+
+## Model Overview
+
+π₀ is the first general-purpose robot foundation model developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi0). Unlike traditional robot controllers, which are narrow specialists programmed for repetitive motions, π₀ is designed as a generalist policy that understands visual inputs, interprets natural language instructions, and controls a variety of robots across diverse tasks.
+
+### The Vision for Physical Intelligence
+
+As described by Physical Intelligence, while AI has achieved remarkable success in digital domains, from chess-playing to drug discovery, human intelligence still dramatically outpaces AI in the physical world. To paraphrase Moravec's paradox, winning a game of chess is an "easy" problem for AI, but folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived. π₀ is a first step toward artificial physical intelligence that lets users simply ask robots to perform any task they want, just as they can with large language models.
+
+### Architecture and Approach
+
+π₀ combines several key innovations:
+
+- **Flow Matching**: Augments a pre-trained VLM with continuous action outputs via flow matching (a variant of diffusion models)
+- **Cross-Embodiment Training**: Trained on data from 8 distinct robot platforms including UR5e, Bimanual UR5e, Franka, Bimanual Trossen, Bimanual ARX, Mobile Trossen, and Mobile Fibocom
+- **Internet-Scale Pre-training**: Inherits semantic knowledge from a pre-trained 3B parameter Vision-Language Model
+- **High-Frequency Control**: Outputs motor commands at up to 50 Hz for real-time dexterous manipulation
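
The flow-matching bullet above can be illustrated with a toy sampler. This is a minimal sketch, not the LeRobot or OpenPI implementation: `toy_velocity` is a hypothetical stand-in for the learned network, the `target` vector plays the role of the action the model would produce, and both the 10 Euler steps and the time direction (noise at t = 0, action at t = 1) are illustrative assumptions. It shows how an action vector can be generated by integrating a velocity field starting from Gaussian noise.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned velocity network.

    For a straight-line path that must reach `target` at t = 1, the
    velocity at point x and time t is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_actions(target, num_steps=10, seed=0):
    """Integrate the velocity field from noise (t = 0) towards t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [0.1, -0.4, 0.25]  # made-up 3-dimensional action
actions = sample_actions(target)
print(actions)
```

Because the toy field points straight at the target, the final Euler step lands on it. A trained model would instead predict the velocity from its visual, language, and robot-state inputs.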
+
+## Installation Requirements
+
+⚠️ **Warning**: This policy requires patching the Hugging Face `transformers` library.
+
+### Prerequisites
+
+1. Ensure you have the required version of `transformers` installed:
+
+   ```bash
+   pip show transformers
+   ```
+
+   It must be version **4.53.2**.
+
+2. Apply the custom patches:
+   ```bash
+   cp -r ./src/lerobot/policies/pi0_openpi/transformers_replace/* \
+     $(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
+   ```
+
+### What the Patches Do
+
+- Support **AdaRMS** (adaptive RMS normalization)
+- Correctly control the precision of activations
+- Allow the KV cache to be used without updates
+
+**Important Notes:**
+
+- This permanently modifies your `transformers` installation
+- The changes persist until you explicitly remove the patched files (see the next section) or recreate the environment
+
+### Restoring Clean State
+
+To undo the patches and restore a clean state:
+
+```bash
+pip uninstall transformers
+pip install transformers==4.53.2
+```
+
+## Training Data and Capabilities
+
+According to Physical Intelligence, π₀ was trained on the largest robot interaction dataset available at the time of its release, combining three key data sources:
+
+1. **Internet-Scale Pre-training**: Vision-language data from the web for semantic understanding
+2. **Open X-Embodiment Dataset**: Open-source robot manipulation datasets
+3. **Physical Intelligence Dataset**: Large and diverse dataset of dexterous tasks across 8 distinct robots
+
+## Usage
+
+To use π₀ in LeRobot, set the policy type on the command line:
+
+```bash
+--policy.type=pi0_openpi
+```
+
+## Training
+
+To train π₀, use the standard LeRobot training script with the appropriate configuration:
+
+```bash
+python src/lerobot/scripts/train.py \
+    --dataset.repo_id=your_dataset \
+    --policy.type=pi0_openpi \
+    --output_dir=./outputs/pi0_training \
+    --job_name=pi0_training \
+    --policy.pretrained_path=pepijn223/pi0_base_fp32 \
+    --policy.repo_id=your_repo_id \
+    --policy.compile_model=true \
+    --policy.gradient_checkpointing=true \
+    --policy.dtype=bfloat16 \
+    --steps=3000 \
+    --policy.scheduler_decay_steps=3000 \
+    --policy.device=cuda \
+    --batch_size=32
+```
+
+### Key Training Parameters
+
+- **`--policy.compile_model=true`**: Enables model compilation for faster training
+- **`--policy.gradient_checkpointing=true`**: Reduces memory usage significantly during training
+- **`--policy.dtype=bfloat16`**: Uses bfloat16 mixed-precision training for efficiency
+- **`--policy.pretrained_path=pepijn223/pi0_base_fp32`**: The base π₀ model to fine-tune. Available options are `pepijn223/pi0_base_fp32`, `pepijn223/pi0_libero_fp32`, and `pepijn223/pi0_droid_fp32`
+- **`--batch_size=32`**: Batch size for training; adjust it to fit your GPU memory
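
When sweeping parameters such as the batch size, it can be convenient to assemble the command programmatically. The sketch below is a hypothetical helper (`build_train_command` is not part of LeRobot) that reuses only flags from the command above and renders booleans as `true`/`false`.

```python
import shlex


def build_train_command(overrides=None, script="src/lerobot/scripts/train.py"):
    """Return the training command as an argument list, with optional overrides."""
    options = {
        "dataset.repo_id": "your_dataset",
        "policy.type": "pi0_openpi",
        "policy.pretrained_path": "pepijn223/pi0_base_fp32",
        "policy.compile_model": True,
        "policy.gradient_checkpointing": True,
        "policy.dtype": "bfloat16",
        "policy.device": "cuda",
        "batch_size": 32,
    }
    options.update(overrides or {})
    args = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # CLI expects lowercase booleans
        args.append(f"--{key}={value}")
    return args


# Example: lower the batch size for a GPU with less memory.
cmd = build_train_command({"batch_size": 8})
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.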
+
+## License
+
+This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).