For more details, see the [Physical Intelligence π₀.₅ blog post](https://www.physicalintelligence.company/blog/pi05).
{% elif model_name == "gaussian_actor" %}
This is a Gaussian Actor policy (a Gaussian policy with a tanh squash), the policy-side component used by [Soft Actor-Critic (SAC)](https://huggingface.co/papers/1801.01290) and related maximum-entropy continuous-control algorithms.
{% elif model_name == "pi0_fast" %}
[π₀-FAST (Pi0-FAST)](https://www.physicalintelligence.company/research/fast) is a Vision-Language-Action model from Physical Intelligence for general robot control. It models continuous robot actions with autoregressive next-token prediction using FAST (Frequency-space Action Sequence Tokenization) and trains up to 5x faster than the diffusion-based π₀.
{% elif model_name == "eo1" %}
[EO-1](https://huggingface.co/papers/2508.21112) is a Vision-Language-Action model for general robot control. It pairs a Qwen2.5-VL backbone for vision-language understanding with a continuous flow-matching action head that denoises action chunks.
{% elif model_name == "groot" %}
[GR00T N1.5](https://github.com/NVIDIA/Isaac-GR00T) is an open, cross-embodiment foundation model from NVIDIA for generalized humanoid robot reasoning and skills. It takes language and images as input and uses a flow-matching action transformer to predict actions conditioned on vision, language, and proprioception.
{% elif model_name == "multi_task_dit" %}
[Multi-Task Diffusion Transformer (DiT)](https://huggingface.co/papers/2507.05331) extends Diffusion Policy with a large Diffusion Transformer and text and vision conditioning for multi-task robot learning. It supports both diffusion and flow-matching objectives and reaches high dexterity with only ~450M parameters.
{% elif model_name == "wall_x" %}
[WALL-OSS](https://huggingface.co/papers/2509.11766) is an open-source foundation model for embodied intelligence from XSquare Robot. Built on Qwen2.5-VL, it uses a tightly coupled multimodal architecture with flow matching to unify semantic reasoning and high-frequency action generation for cross-embodiment control.
{% elif model_name == "xvla" %}
[X-VLA](https://huggingface.co/papers/2510.10274) is a soft-prompted, flow-matching Vision-Language-Action framework that treats each robot or hardware setup as a "task" encoded with a small set of learnable Soft Prompt embeddings, letting a single model reconcile diverse robot morphologies, sensors, and action spaces.
{% else %}
This is a **{{ model_name }}** policy trained with [LeRobot](https://github.com/huggingface/lerobot).
{% endif %}
{% set diagrams = {
"smolvla": "https://cdn-uploads.huggingface.co/production/uploads/640e21ef3c82bd463ee5a76d/aooU0a3DMtYmy_1IWMaIM.png",
"pi0": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-pi0%20(1).png",
"pi0_fast": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-pifast.png",
"eo1": "https://huggingface.co/datasets/HaomingSong/lerobot-documentation-images/resolve/main/lerobot/eo_pipeline.png",
"groot": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png",
"wall_x": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/walloss-lerobot-paper.png",
"xvla": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/xvla-architecture.png"
} %}
{% if diagrams.get(model_name) %}
<p align="center">
<img src="{{ diagrams[model_name] }}" alt="{{ model_name }} architecture" width="85%"/>
</p>
{% endif %}

This policy has been trained and pushed to the Hub using [LeRobot](https://github.com/huggingface/lerobot).
{% set policy_docs = {"act": "act", "smolvla": "smolvla", "pi0": "pi0", "pi0_fast": "pi0fast", "pi05": "pi05", "eo1": "eo1", "groot": "groot"} %}
{% if policy_docs.get(model_name) %}Learn how to train and run it in the [LeRobot {{ model_name }} guide](https://huggingface.co/docs/lerobot/main/en/{{ policy_docs[model_name] }}), or browse the [full documentation](https://huggingface.co/docs/lerobot/index).
{% else %}See the [full LeRobot documentation](https://huggingface.co/docs/lerobot/index).
{% endif %}

---

## Model Details

- **License:** {{ license | default("\[More Information Needed]", true) }}
{% if robot_type %}- **Robot type:** `{{ robot_type }}`
{% endif %}{% if cameras %}- **Cameras:** {% for camera in cameras %}`{{ camera }}`{% if not loop.last %}, {% endif %}{% endfor %}
{% endif %}
{% if input_features or output_features %}
## Inputs & Outputs

The policy consumes these observation features and produces these action features.
{% if input_features %}
**Inputs**

| Feature | Type | Shape |
| --- | --- | --- |
{% for name, feature in input_features.items() %}| `{{ name }}` | {{ feature.type.value }} | `{{ feature.shape }}` |
{% endfor %}{% endif %}{% if output_features %}
**Outputs**

| Feature | Type | Shape |
| --- | --- | --- |
{% for name, feature in output_features.items() %}| `{{ name }}` | {{ feature.type.value }} | `{{ feature.shape }}` |
{% endfor %}{% endif %}{% endif %}
{% if dataset %}
## Training Dataset

- **Repository:** [{{ dataset.repo_id }}](https://huggingface.co/datasets/{{ dataset.repo_id }})
- **Episodes:** {{ dataset.episodes }}
- **Frames:** {{ dataset.frames }}
- **Frame rate:** {{ dataset.fps }} FPS
{% if dataset.tasks %}- **Task(s):** {% for task in dataset.tasks %}"{{ task }}"{% if not loop.last %}, {% endif %}{% endfor %}
{% endif %}
<a class="flex" href="https://huggingface.co/spaces/lerobot/visualize_dataset?path={{ dataset.repo_id }}">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/badges/resolve/main/visualize-this-dataset-xl.svg"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/badges/resolve/main/visualize-this-dataset-xl-dark.svg"/>
</a>
{% endif %}
{% if training %}
## Training Configuration

| Setting | Value |
| --- | --- |
| Training steps | {{ training.steps }} |
| Batch size | {{ training.batch_size }} |
{% if training.optimizer %}| Optimizer | {{ training.optimizer }} |
{% endif %}{% if training.lr %}| Learning rate | {{ training.lr }} |
{% endif %}{% if training.seed is not none %}| Seed | {{ training.seed }} |
{% endif %}| LeRobot version | {{ training.lerobot_version }} |
{% endif %}
---

## How to Get Started with the Model

New to LeRobot? These guides cover the full workflow:

- **[Install LeRobot](https://huggingface.co/docs/lerobot/main/en/installation)** — set up the `lerobot` package.
- **[Hardware setup](https://huggingface.co/docs/lerobot/main/en/hardware_guide)** — assemble, wire, and calibrate your robot and cameras.
- **[Record data & train a policy](https://huggingface.co/docs/lerobot/en/il_robots)** — the end-to-end imitation-learning walkthrough.
- **[CLI cheat-sheet](https://huggingface.co/docs/lerobot/main/en/cheat-sheet)** — quick reference for the `lerobot-*` commands.

The short version to train and run this policy:

### Train from scratch

```bash
lerobot-train \
  --dataset.repo_id=${HF_USER}/<dataset> \
  --policy.type={{ model_name }} \
  --output_dir=outputs/train/<desired_policy_repo_id> \
  --job_name=lerobot_training \
  --policy.device=cuda \
  --policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
  --wandb.enable=true
```
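The training command above hard-codes `--policy.device=cuda`. The shell check below is a minimal sketch for choosing the device on machines without an NVIDIA GPU. It rests on two assumptions: that a working `nvidia-smi` means CUDA is usable, and that your PyTorch build accepts `mps` on Apple-silicon Macs and `cpu` everywhere else. `POLICY_DEVICE` is a hypothetical variable name, not part of LeRobot.

```bash
# Assumption: a working nvidia-smi implies a CUDA-capable GPU is available.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  POLICY_DEVICE=cuda
# Assumption: Apple-silicon Macs (Darwin/arm64) expose their GPU to PyTorch as "mps".
elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
  POLICY_DEVICE=mps
# Fallback: run on the CPU (slow for training, but works everywhere).
else
  POLICY_DEVICE=cpu
fi
echo "Using --policy.device=${POLICY_DEVICE}"
# Then replace --policy.device=cuda in the training command with:
#   --policy.device="${POLICY_DEVICE}"
```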
_Training writes checkpoints to `outputs/train/<desired_policy_repo_id>/checkpoints/`._

### Evaluate the policy/run inference

```bash
lerobot-rollout \
  --strategy.type=base \
  --robot.type=<your_robot_type> \
  --robot.port=<your_robot_port> \
  --robot.cameras="{ <camera_1>: {type: opencv, index_or_path: <index_or_path>, width: 640, height: 480, fps: 30}, <camera_2>: {type: opencv, index_or_path: <index_or_path>, width: 640, height: 480, fps: 30}}" \
  --policy.path=<hf_user>/<desired_policy_repo_id> \
  --task="<your_task_description>" \
  --duration=60
```
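The inline `--robot.cameras` dictionary above is easy to mistype. The snippet below is a minimal sketch that assembles the same string from shell variables. It assumes two OpenCV cameras with the hypothetical names `front` and `wrist` at the hypothetical indices `0` and `1`. Replace the names with the camera keys this policy was trained on, and the indices with your own devices.

```bash
# Hypothetical camera keys; they must match the camera names used during training.
CAM1_NAME=front
CAM2_NAME=wrist
# Hypothetical OpenCV device indices (or video paths) on this machine.
CAM1_INDEX=0
CAM2_INDEX=1
# Capture settings shared by both cameras, copied from the example command.
CAM_OPTS="width: 640, height: 480, fps: 30"

# Build a string with the same layout as the --robot.cameras value in the example command.
ROBOT_CAMERAS="{ ${CAM1_NAME}: {type: opencv, index_or_path: ${CAM1_INDEX}, ${CAM_OPTS}}, ${CAM2_NAME}: {type: opencv, index_or_path: ${CAM2_INDEX}, ${CAM_OPTS}}}"
echo "$ROBOT_CAMERAS"
# Then pass it to lerobot-rollout as: --robot.cameras="$ROBOT_CAMERAS"
```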
Replace every `<...>` placeholder with your own values. `--robot.type` and the camera names must match the robot and observation keys this policy was trained on. `--robot.port` and the camera indices must point at your own hardware. `--task` should describe what you want the policy to do.

With `--strategy.type=base`, the script runs the policy without recording episodes. If you omit `--duration`, the policy runs indefinitely. For more information, see the [rollout documentation](https://huggingface.co/docs/lerobot/main/en/inference).

---
## Evaluation

<!-- Add evaluation results here: success rate, number of trials, and the conditions (robot, task, environment). -->

_No evaluation results have been provided for this policy yet._

---
## Citation

If you use this policy, please cite the method linked in the description above, along with LeRobot:

```bibtex
@misc{cadene2024lerobot,
    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
    title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
    howpublished = "\url{https://github.com/huggingface/lerobot}",
    year = {2024}
}
```