# Meta-World

Meta-World is an open-source simulation benchmark for **multi-task and meta reinforcement learning** in continuous-control robotic manipulation. It bundles 50 diverse manipulation tasks using everyday objects and a common tabletop Sawyer arm, providing a standardized playground to test whether algorithms can learn many different tasks and generalize quickly to new ones.

- Paper: [Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning](https://arxiv.org/abs/1910.10897)
- GitHub: [Farama-Foundation/Metaworld](https://github.com/Farama-Foundation/Metaworld)
- Project website: [metaworld.farama.org](https://metaworld.farama.org)

![Meta-World tasks](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/metaworld.png)

## Available tasks

Meta-World provides 50 tasks organized into difficulty groups. In LeRobot, you can evaluate on individual tasks, on difficulty groups, or on the full MT50 suite:

| Group      | CLI name             | Tasks | Description                                            |
| ---------- | -------------------- | ----- | ------------------------------------------------------ |
| Easy       | `easy`               | 28    | Tasks with simple dynamics and single-step goals       |
| Medium     | `medium`             | 11    | Tasks requiring multi-step reasoning                   |
| Hard       | `hard`               | 6     | Tasks with complex contacts and precise manipulation   |
| Very Hard  | `very_hard`          | 5     | The most challenging tasks in the suite                |
| MT50 (all) | Comma-separated list | 50    | All 50 tasks: the most challenging multi-task setting  |

You can also pass individual task names directly (e.g., `assembly-v3`, `dial-turn-v3`).

We provide a LeRobot-ready dataset for Meta-World MT50 on the Hugging Face Hub: [lerobot/metaworld_mt50](https://huggingface.co/datasets/lerobot/metaworld_mt50). The dataset is formatted for the MT50 evaluation, which uses all 50 tasks with fixed object/goal positions and one-hot task vectors for consistency.
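For scripting, the comma-separated task lists that `--env.task` accepts can be assembled programmatically. A minimal sketch, assuming you already have the task names as a Python list (the helper below is illustrative, not a LeRobot API):

```python
def make_task_arg(tasks):
    """Join Meta-World task names into the comma-separated string
    expected by `--env.task` (hypothetical helper for illustration)."""
    return ",".join(tasks)

# Task names taken from the examples in this doc.
print(make_task_arg(["assembly-v3", "dial-turn-v3", "handle-press-side-v3"]))
# → assembly-v3,dial-turn-v3,handle-press-side-v3
```

Group names such as `easy` or `medium` are expanded to their member tasks by LeRobot itself, so they can be passed as-is.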

## Installation

After following the LeRobot installation instructions, install the Meta-World extra:

```bash
pip install -e ".[metaworld]"
```

<Tip warning={true}>

If you encounter an `AssertionError: ['human', 'rgb_array', 'depth_array']` when running Meta-World environments, this is a mismatch between Meta-World and your Gymnasium version. Fix it with:

```bash
pip install "gymnasium==1.1.0"
```

</Tip>

## Evaluation

### Default evaluation (recommended)

Evaluate on the medium difficulty split (a good balance of coverage and compute):

```bash
lerobot-eval \
    --policy.path="your-policy-id" \
    --env.type=metaworld \
    --env.task=medium \
    --eval.batch_size=1 \
    --eval.n_episodes=10
```

### Single-task evaluation

Evaluate on a specific task:

```bash
lerobot-eval \
    --policy.path="your-policy-id" \
    --env.type=metaworld \
    --env.task=assembly-v3 \
    --eval.batch_size=1 \
    --eval.n_episodes=10
```

### Multi-task evaluation

Evaluate across multiple tasks or difficulty groups:

```bash
lerobot-eval \
    --policy.path="your-policy-id" \
    --env.type=metaworld \
    --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3 \
    --eval.batch_size=1 \
    --eval.n_episodes=10
```

- `--env.task` accepts explicit task lists (comma-separated) or difficulty groups (e.g., `easy`, `medium`, `hard`, `very_hard`).
- `--eval.batch_size` controls how many environments run in parallel.
- `--eval.n_episodes` sets how many episodes to run per task.
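As a rough sanity check on evaluation cost, the total number of rollouts is the number of tasks times `--eval.n_episodes`. A minimal sketch (the helper is illustrative, not part of LeRobot):

```python
def total_rollouts(task_arg: str, n_episodes: int) -> int:
    """Number of evaluation rollouts for a comma-separated task list
    (hypothetical helper; group names like 'medium' would need expansion)."""
    tasks = [t for t in task_arg.split(",") if t]
    return len(tasks) * n_episodes

# Three explicit tasks, 10 episodes each.
print(total_rollouts("assembly-v3,dial-turn-v3,handle-press-side-v3", 10))  # → 30
```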

### Policy inputs and outputs

**Observations:**

- `observation.image`: single camera view (`corner2`), 480x480 HWC uint8
- `observation.state`: 4-dim proprioceptive state (end-effector position + gripper)

**Actions:**

- Continuous control in `Box(-1, 1, shape=(4,))`: 3D end-effector delta + 1D gripper
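The shapes above can be sanity-checked with a standalone numpy sketch (an illustration of the documented spec, not LeRobot code):

```python
import numpy as np

# Dummy observation matching the documented spec.
obs = {
    "observation.image": np.zeros((480, 480, 3), dtype=np.uint8),  # HWC uint8 camera frame
    "observation.state": np.zeros(4, dtype=np.float32),            # EE position (3) + gripper (1)
}

# Dummy action: 3D end-effector delta + 1D gripper, clipped into Box(-1, 1).
action = np.clip(np.array([0.1, -0.2, 0.05, 1.5], dtype=np.float32), -1.0, 1.0)

assert obs["observation.image"].shape == (480, 480, 3)
assert obs["observation.state"].shape == (4,)
assert action.shape == (4,) and float(action.max()) <= 1.0
```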

### Recommended evaluation episodes

For reproducible benchmarking, use **10 episodes per task**. For the full MT50 suite this gives 500 total episodes. If you care about generalization, run on the full MT50: it is intentionally challenging and reveals strengths and weaknesses better than a few narrow tasks.

## Training

### Example training command

Train a SmolVLA policy on a subset of Meta-World tasks:

```bash
lerobot-train \
    --policy.type=smolvla \
    --policy.repo_id=${HF_USER}/metaworld-test \
    --policy.load_vlm_weights=true \
    --dataset.repo_id=lerobot/metaworld_mt50 \
    --env.type=metaworld \
    --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3 \
    --output_dir=./outputs/ \
    --steps=100000 \
    --batch_size=4 \
    --eval.batch_size=1 \
    --eval.n_episodes=1 \
    --eval_freq=1000
```

## Practical tips

- Use one-hot task conditioning for multi-task training (following the MT10/MT50 conventions) so policies have explicit task context.
- Inspect the dataset task descriptions and the `info["is_success"]` key when writing post-processing or logging, so that your success metrics line up with the benchmark.
- Adjust `batch_size`, `steps`, and `eval_freq` to match your compute budget.
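The one-hot task conditioning from the first tip can be sketched as follows (illustrative numpy code; the exact vector layout used in the MT50 dataset may differ):

```python
import numpy as np

def one_hot_task(task_index: int, n_tasks: int = 50) -> np.ndarray:
    """One-hot vector identifying a task within the MT50 suite."""
    vec = np.zeros(n_tasks, dtype=np.float32)
    vec[task_index] = 1.0
    return vec

v = one_hot_task(3)
assert v.shape == (50,) and v.sum() == 1.0 and v[3] == 1.0
```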