mirror of https://github.com/huggingface/lerobot.git (synced 2026-05-11 14:49:43 +00:00, commit 0f1c9b0851)
* feat(envs): add RoboTwin 2.0 benchmark integration
- RoboTwinEnvConfig with 4-camera setup (head/front/left_wrist/right_wrist)
- Docker image with SAPIEN, mplib, CuRobo, pytorch3d (Python 3.12)
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robotwin
- RoboTwinProcessorStep for state float32 casting
- Camera rename_map: head_camera/front_camera/left_wrist -> camera1/2/3
* fix(robotwin): re-enable autograd for CuRobo planner warmup and take_action
lerobot_eval wraps the full rollout in torch.no_grad() (lerobot_eval.py:566),
but RoboTwin's setup_demo → load_robot → CuroboPlanner(...) runs
motion_gen.warmup(), which invokes Newton's-method trajectory optimization.
That optimizer calls cost.backward() internally, which raises
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
when autograd is disabled. take_action() hits the same planner path at every
step. Wrap both setup_demo and take_action in torch.enable_grad() so CuRobo's
optimizer can build its computation graph. Policy inference is unaffected —
rollout()'s inner torch.inference_mode() block around select_action() is
untouched, so we still don't allocate grad buffers during policy forward.
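The no_grad / enable_grad interaction can be reproduced with a toy optimizer — `planner_step` below is a stand-in for CuRobo's internal Newton step, not its real API:

```python
import torch

def planner_step(x: torch.Tensor) -> torch.Tensor:
    # Toy stand-in for CuRobo's trajectory optimizer: it calls backward()
    # internally, which needs autograd even though the caller disabled it.
    with torch.enable_grad():                  # re-enables inside no_grad
        x = x.detach().requires_grad_(True)
        cost = (x ** 2).sum()                  # has a grad_fn here
        cost.backward()                        # would raise under plain no_grad
        return (x - 0.1 * x.grad).detach()

with torch.no_grad():                          # what lerobot_eval wraps rollouts in
    out = planner_step(torch.ones(3))

# out carries no grad history, so the surrounding rollout stays grad-free.
```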
* fix(robotwin): read nested get_obs() output and use aloha-agilex camera names
RoboTwin's base_task.get_obs() returns a nested dict:
{"observation": {cam: {"rgb": ..., "intrinsic_matrix": ...}},
"joint_action": {"left_arm": ..., "left_gripper": ...,
"right_arm": ..., "right_gripper": ...,
"vector": np.ndarray},
"endpose": {...}}
Our _get_obs was reading raw["{cam}_rgb"] / raw["{cam}"] and raw["joint_action"]
as if they were flat, so np.asarray(raw["joint_action"], dtype=float64) tripped
on a dict and raised
TypeError: float() argument must be a string or a real number, not 'dict'
Fix:
- Pull images from raw["observation"][cam]["rgb"]
- Pull joint state from raw["joint_action"]["vector"] (the flat array)
- Update the default camera tuple to (head_camera, left_camera, right_camera)
to match RoboTwin's actual wrist-camera names (envs/camera/camera.py:135-151)
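A minimal sketch of the fixed extraction, using a hand-built dict that mirrors the nested `get_obs()` shape quoted above (shapes and values are illustrative):

```python
import numpy as np

# Hand-built mirror of RoboTwin's nested get_obs() payload.
raw = {
    "observation": {
        "head_camera": {"rgb": np.zeros((240, 320, 3), dtype=np.uint8)},
    },
    "joint_action": {
        "left_arm": np.zeros(6), "left_gripper": 0.0,
        "right_arm": np.zeros(6), "right_gripper": 0.0,
        "vector": np.zeros(14, dtype=np.float32),
    },
}

# Old (broken): np.asarray(raw["joint_action"], ...) saw a dict -> TypeError.
# Fixed: read the nested leaves directly.
rgb = raw["observation"]["head_camera"]["rgb"]
joint_state = np.asarray(raw["joint_action"]["vector"], dtype=np.float32)
```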
* refactor(robotwin): drop defensive dict guards, cache black fallback frame
_get_obs was guarding every dict access with isinstance(..., dict) in case
RoboTwin's get_obs returned something else — but the API contract
(envs/_base_task.py:437) always returns a dict, so the guards were silently
masking real failures behind plausible-looking zero observations. Drop them.
Also:
- Cache a single black fallback frame in __init__ instead of allocating
a fresh np.zeros((H, W, 3), uint8) for every missing camera on every
step — the "camera not exposed" set is static per env.
- Only allocate the zero joint_state on the fallback path (not unconditionally
before the real value overwrites it).
- Replace .flatten() with .ravel() (no copy when already 1-D).
- Fold the nested-dict schema comment and two identical torch.enable_grad()
rationales into a single Autograd section in the class docstring.
- Fix stale `left_wrist` camera name in the observation docstring.
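The ravel-vs-flatten point and the cached fallback frame can be checked directly in NumPy:

```python
import numpy as np

# ravel() returns a view when the array is already contiguous 1-D;
# flatten() always copies.
a = np.arange(14, dtype=np.float32)
assert np.shares_memory(a, a.ravel())
assert not np.shares_memory(a, a.flatten())

# Build the black fallback frame once (e.g. in __init__) and reuse it,
# instead of allocating np.zeros((H, W, 3), uint8) on every step.
H, W = 240, 320
fallback_frame = np.zeros((H, W, 3), dtype=np.uint8)
```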
* fix(robotwin): align observation_space dims with D435 camera output
lerobot_eval crashed in gym.vector's SyncVectorEnv.reset with:
ValueError: Output array is the wrong shape
because RoboTwinEnvConfig declared observation_space = (480, 640, 3) but
task_config/demo_clean.yml specifies head_camera_type=D435, which renders
(240, 320, 3). gym.vector.concatenate pre-allocates a buffer from the
declared space, so the first np.stack raises on shape mismatch.
Changes:
- Config defaults now 240×320 (the D435 dims in _camera_config.yml), with
a comment pointing at the source of truth.
- RoboTwinEnv.__init__ accepts observation_height/width as Optional and
falls back to setup_kwargs["head_camera_h/w"] so the env is self-consistent
even if the config is not in sync.
- Config camera_names / features_map use the actual aloha-agilex camera
names (head_camera, left_camera, right_camera). Drops the stale
"front_camera" and "left_wrist"/"right_wrist" entries that never matched
anything RoboTwin exposes.
- CI workflow's rename_map updated to match the new camera names.
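The failure mode can be sketched in plain NumPy — gym.vector pre-allocates its observation buffer from the declared space and then copies each frame into it; the copy path below is an analogy, not gym's actual code:

```python
import numpy as np

declared_buf = np.empty((1, 480, 640, 3), dtype=np.uint8)  # config said 480x640
frame = np.zeros((240, 320, 3), dtype=np.uint8)            # D435 renders 240x320

try:
    np.stack([frame], out=declared_buf)   # copy into the pre-allocated buffer
    error = None
except ValueError as exc:
    error = str(exc)                      # numpy rejects the out-array shape

assert error is not None
```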
* fix(robotwin): expose _max_episode_steps for lerobot_eval.rollout
rollout() does `env.call("_max_episode_steps")` (lerobot_eval.py:157) to
know when to stop stepping. LiberoEnv and MetaworldEnv set this attribute;
RoboTwinEnv was tracking the limit under `episode_length` only, so the call
raised AttributeError once CuRobo finished warming up.
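The attribute contract is tiny; a sketch (class name and body are illustrative, not the real `RoboTwinEnv`):

```python
class RoboTwinEnvSketch:
    def __init__(self, episode_length: int = 300):
        self.episode_length = episode_length
        # Mirror the limit under the name that env.call("_max_episode_steps")
        # resolves, matching LiberoEnv / MetaworldEnv:
        self._max_episode_steps = episode_length

env = RoboTwinEnvSketch()
assert env._max_episode_steps == env.episode_length == 300
```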
* fix(robotwin): install av-dep so lerobot_eval can write rollout MP4s
write_video (utils/io_utils.py:53) lazily imports PyAV via require_package
and raises silently inside the video-writing thread when the extra is not
installed — so the eval itself succeeds with pc_success=100 but no MP4
ever lands in videos/, and the artifact upload reports "No files were
found". Add av-dep to the install line (same pattern as the RoboMME image).
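Why the eval "succeeds" anyway: an exception raised inside a worker thread is printed to stderr but never propagates to the main thread. A minimal reproduction (the function body is a stand-in for the lazy PyAV import):

```python
import threading

def write_video_sketch():
    raise ImportError("av is required to write videos")  # stand-in message

t = threading.Thread(target=write_video_sketch)
t.start()
t.join()                 # returns normally despite the ImportError
print("eval finished")   # main thread is none the wiser; no MP4 exists
```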
* feat(robotwin): eval 5 diverse tasks per CI run with NL descriptions
Widen the smoke eval from a single task (beat_block_hammer) to five:
click_bell, handover_block, open_laptop, stack_blocks_two on top of the
original. Each gets its own rollout video in videos/<task>_0/ so the
dashboard can surface visually distinct behaviours.
extract_task_descriptions.py now has a RoboTwin branch that reads
`description/task_instruction/<task>.json` (already shipped in the clone
at /opt/robotwin) and pulls the `full_description` field. CI cds into
the clone before invoking the script so the relative path resolves.
parse_eval_metrics.py is invoked with the same 5-task list so the
metrics.json embeds one entry per task.
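The lookup the RoboTwin branch performs can be sketched as follows — the directory layout and the `full_description` key come from the commit message; the helper name and sample text are hypothetical:

```python
import json
import pathlib
import tempfile

# Fabricate a clone-like layout for the sketch.
root = pathlib.Path(tempfile.mkdtemp())
task_dir = root / "description" / "task_instruction"
task_dir.mkdir(parents=True)
(task_dir / "click_bell.json").write_text(
    json.dumps({"full_description": "Press the bell with the gripper."})
)

def task_description(clone_root: pathlib.Path, task: str) -> str:
    # Resolve the per-task JSON relative to the clone root and pull the field.
    path = clone_root / "description" / "task_instruction" / f"{task}.json"
    return json.loads(path.read_text())["full_description"]

print(task_description(root, "click_bell"))
```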
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
* refactor(robotwin): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/libero_plus/robomme pattern: start from the
nightly GPU image (apt deps, python, uv, venv, lerobot[all] already
there) and layer on only what RoboTwin 2.0 uniquely needs —
cuda-nvcc + cuda-cudart-dev (CuRobo builds from source), Vulkan libs +
NVIDIA ICD (SAPIEN renderer), sapien/mplib/open3d/pytorch3d/curobo
installs, the mplib + sapien upstream patches, and the TianxingChen
asset download.
  Drops ~90 lines of duplicated base setup (CUDA FROM, apt python, uv
  install, user creation, venv init, base lerobot install); the Dockerfile
  shrinks from 199 lines to 110.
Also repoint the docs + env docstring dataset link from
hxma/RoboTwin-LeRobot-v3.0 to the canonical lerobot/robotwin_unified.
* docs(robotwin): add robotwin to _toctree.yml under Benchmarks
doc-builder's TOC integrity check was rejecting the branch because
docs/source/robotwin.mdx existed but wasn't listed in _toctree.yml.
* fix(robotwin): defer YAML lookup and realign tests with current API
__init__ was eagerly calling _load_robotwin_setup_kwargs just to read
head_camera_h/w from the YAML. That import (`from envs import CONFIGS_PATH`)
required a real RoboTwin install, so constructing the env — and thus every
test in tests/envs/test_robotwin.py — blew up with ModuleNotFoundError
on fast-tests where RoboTwin isn't installed.
Replace the eager lookup with DEFAULT_CAMERA_H/W constants (240×320, the
D435 dims baked into task_config/demo_clean.yml). reset() still resolves
the full setup_kwargs lazily — that's fine because reset() is only
called inside the benchmark Docker image where RoboTwin is present.
Also resync the test file with the current env API:
- mock get_obs() as the real nested {"observation": {cam: {"rgb": …}},
"joint_action": {"vector": …}} shape
- patch both _load_robotwin_task and _load_robotwin_setup_kwargs
(_patch_load → _patch_runtime)
- drop `front_camera` / `left_wrist` from assertions — aloha-agilex
exposes head_camera + left_camera + right_camera, not those
- black-frame test now uses left_camera as the missing camera
- setup_demo call check loosened to the caller-provided seed/is_test
bits (full kwargs include the YAML-derived blob)
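The eager-vs-lazy split can be sketched like this — names mirror the commit message, but the bodies are illustrative, not the real lerobot code:

```python
# D435 dims baked into task_config/demo_clean.yml.
DEFAULT_CAMERA_H, DEFAULT_CAMERA_W = 240, 320

class LazyEnvSketch:
    def __init__(self):
        # No `from envs import CONFIGS_PATH` here, so fast tests can
        # construct the env without a RoboTwin install.
        self.height, self.width = DEFAULT_CAMERA_H, DEFAULT_CAMERA_W
        self._setup_kwargs = None

    def reset(self):
        if self._setup_kwargs is None:
            # Deferred: only runs inside the benchmark image where the
            # RoboTwin YAML actually exists.
            self._setup_kwargs = self._load_setup_kwargs()
        return self._setup_kwargs

    def _load_setup_kwargs(self):
        return {"head_camera_h": self.height, "head_camera_w": self.width}

env = LazyEnvSketch()   # constructing never touches RoboTwin
assert env.height == 240
```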
* fix: integrate PR #3315 review feedback
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- docker: tie patches to pinned versions with removal guidance, remove
unnecessary HF_TOKEN for public dataset, fix hadolint warnings
- docs: fix paper link to arxiv, add teaser image, fix camera names
(4→3 cameras), fix observation dims (480x640→240x320)
* fix(docs): correct RoboTwin 2.0 paper arxiv link
* fix(docs): use correct RoboTwin 2.0 teaser image URL
* fix(docs): use plain markdown image to fix MDX build
* ci(robotwin): smoke-eval 10 tasks instead of 5
Broader coverage on the RoboTwin 2.0 benchmark CI job: bump the smoke
eval from 5 tasks to 10 (one episode each). Added tasks are all drawn
from ROBOTWIN_TASKS and mirror the shape/complexity of the existing
set (simple single-object or single-fixture manipulations).
Tasks now run: beat_block_hammer, click_bell, handover_block,
open_laptop, stack_blocks_two, click_alarmclock, close_laptop,
close_microwave, open_microwave, place_block.
`parse_eval_metrics.py` reads `overall` for multi-task runs so no
parser change is needed. Bumped the step name and the metrics label
to reflect the 10-task layout.
* fix(ci): swap 4 broken RoboTwin tasks in smoke eval
The smoke eval hit two upstream issues:
- `open_laptop`: bug in OpenMOSS/RoboTwin main — `check_success()` uses
`self.arm_tag`, but that attribute is only set inside `play_once()`
(the scripted-expert path). During eval `take_action()` calls
`check_success()` directly, hitting `AttributeError: 'open_laptop'
object has no attribute 'arm_tag'`.
- `close_laptop`, `close_microwave`, `place_block`: not present in
upstream RoboTwin `envs/` at all — our ROBOTWIN_TASKS tuple drifted
from upstream and these names leaked into CI.
Replace the four broken tasks with upstream-confirmed equivalents
that exist both in ROBOTWIN_TASKS and in RoboTwin's `envs/`:
`adjust_bottle`, `lift_pot`, `stamp_seal`, `turn_switch`.
New 10-task smoke set: beat_block_hammer, click_bell, handover_block,
stack_blocks_two, click_alarmclock, open_microwave, adjust_bottle,
lift_pot, stamp_seal, turn_switch.
* fix(robotwin): sync ROBOTWIN_TASKS + doc with upstream (50 tasks)
The local ROBOTWIN_TASKS tuple drifted from upstream
RoboTwin-Platform/RoboTwin. Users passing names like `close_laptop`,
`close_microwave`, `dump_bin`, `place_block`, `pour_water`,
`fold_cloth`, etc. got past our validator (the names were in the
  tuple) but then crashed inside the simulator with a confusing error,
because those tasks don't exist in upstream `envs/`.
- Replace ROBOTWIN_TASKS with a verbatim mirror of upstream's
`envs/` directory: 50 tasks as of main (was 60 with many
stale entries). Added a `gh api`-based one-liner comment so
future bumps are mechanical.
- Update the `60 tasks` claims in robotwin.mdx and
RoboTwinEnvConfig's docstring to `50`.
- Replace the stale example-task table in robotwin.mdx with ten
upstream-confirmed examples, and flag `open_laptop` as
temporarily broken (its `check_success()` uses `self.arm_tag`
which is only set inside `play_once()`; eval-mode callers hit
AttributeError).
- Rebuild the "Full benchmark" command with the actual 50-task
list, omitting `open_laptop`.
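Failing fast on drifted names is a one-tuple check; an illustrative validator (the tuple below is a stand-in subset, not the real 50-entry `ROBOTWIN_TASKS`):

```python
ROBOTWIN_TASKS = ("adjust_bottle", "beat_block_hammer", "click_bell", "lift_pot")

def validate_tasks(task_arg: str) -> list[str]:
    # Split the comma-separated --env.task value and reject unknown names
    # before the simulator ever loads, with a readable error.
    tasks = [t.strip() for t in task_arg.split(",") if t.strip()]
    unknown = sorted(set(tasks) - set(ROBOTWIN_TASKS))
    if unknown:
        raise ValueError(f"Unknown RoboTwin task(s): {unknown}")
    return tasks

print(validate_tasks("click_bell,lift_pot"))  # ['click_bell', 'lift_pot']
```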
* test(robotwin): lower task-count floor from 60 to 50
ROBOTWIN_TASKS was trimmed to 50 tasks (see comment in
`src/lerobot/envs/robotwin.py:48`), but the assertion still
required ≥60, causing CI failures. Align the test with the
current upstream task count.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch.
* ci: gate Docker Hub login on secret availability
* fix: integrate PR #3315 review feedback
- envs(robotwin): default `observation_height/width` in
`create_robotwin_envs` to `DEFAULT_CAMERA_H/W` (240/320) so they
match the D435 dims baked into `task_config/demo_clean.yml`.
- envs(robotwin): resolve `task_config/demo_clean.yml` via
`CONFIGS_PATH` instead of a cwd-relative path; works regardless
of where `lerobot-eval` is invoked.
- envs(robotwin): replace `print()` calls in `create_robotwin_envs`
with `logger.info(...)` (module-level `logger = logging.getLogger`).
- envs(robotwin): use `_LazyAsyncVectorEnv` for the async path so
async workers start lazily (matches LIBERO / RoboCasa / VLABench).
- envs(robotwin): cast `agent_pos` space + joint-state output to
float32 end-to-end (was mixed float64/float32).
- envs(configs): use the existing `_make_vec_env_cls(use_async,
n_envs)` helper in `RoboTwinEnvConfig.create_envs`; drop the
`get_env_processors` override so RoboTwin uses the identity
processor inherited from `EnvConfig`.
- processor: delete `RoboTwinProcessorStep` — the float32 cast now
happens in the wrapper itself, so the processor is redundant.
- tests: drop the `TestRoboTwinProcessorStep` suite; update the
mock obs fixture to use float32 `joint_action.vector`.
- ci: hoist `ROBOTWIN_POLICY` and `ROBOTWIN_TASKS` to job-level
env vars so the task list and policy aren't duplicated across
eval / extract / parse steps.
- docker: pin RoboTwin + CuRobo upstream clones to commit SHAs
(`RoboTwin@0aeea2d6`, `curobo@ca941586`) for reproducibility.
# RoboTwin 2.0

RoboTwin 2.0 is a **large-scale dual-arm manipulation benchmark** built on the SAPIEN physics engine. It provides a standardized evaluation protocol for bimanual robotic policies across 50 tasks (as of upstream `main`) with strong domain randomization (clutter, lighting, background, tabletop height, and language instructions).

- Paper: [RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation](https://arxiv.org/abs/2506.18088)
- GitHub: [RoboTwin-Platform/RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin)
- Leaderboard: [robotwin-platform.github.io/leaderboard](https://robotwin-platform.github.io/leaderboard)
- Dataset: [lerobot/robotwin_unified](https://huggingface.co/datasets/lerobot/robotwin_unified)



## Overview

| Property      | Value                                                    |
| ------------- | -------------------------------------------------------- |
| Tasks         | 50 dual-arm manipulation tasks                           |
| Robot         | Aloha-AgileX bimanual (14 DOF, 7 per arm)                |
| Action space  | 14-dim joint-space, continuous in `[-1, 1]`              |
| Cameras       | `head_camera`, `left_camera`, `right_camera`             |
| Simulator     | SAPIEN (not MuJoCo)                                      |
| Eval protocol | 100 episodes/task, 50 demo_clean demonstrations          |
| Eval settings | **Easy** (`demo_clean`) and **Hard** (`demo_randomized`) |

## Available tasks

RoboTwin 2.0 ships 50 dual-arm manipulation tasks in its upstream `envs/` directory. The canonical list is the `ROBOTWIN_TASKS` tuple in `src/lerobot/envs/robotwin.py`, mirrored verbatim from the upstream repo. Example tasks:

| Task                     | CLI name                 | Category          |
| ------------------------ | ------------------------ | ----------------- |
| Beat block with hammer   | `beat_block_hammer`      | Tool use          |
| Click bell / alarm clock | `click_bell`             | Precision press   |
| Stack blocks (2 / 3)     | `stack_blocks_two/three` | Stacking          |
| Stack bowls (2 / 3)      | `stack_bowls_two/three`  | Stacking          |
| Handover block / mic     | `handover_block`         | Bimanual coord.   |
| Lift pot                 | `lift_pot`               | Bimanual lift     |
| Shake bottle             | `shake_bottle`           | Continuous motion |
| Turn switch              | `turn_switch`            | Articulated obj   |
| Stamp seal               | `stamp_seal`             | Precision place   |
| Scan object              | `scan_object`            | Mobile manip.     |

Pass a comma-separated list to `--env.task` to run multiple tasks in a single eval sweep.


<Tip warning={true}>

`open_laptop` is currently broken upstream (its `check_success()` uses `self.arm_tag`, which is only set inside the scripted-expert `play_once()` path and therefore unavailable during normal policy eval). Avoid it until the upstream bug is fixed, or patch the task to default `self.arm_tag = "left"` in `load_actors()`.

</Tip>

## Dataset

The RoboTwin 2.0 dataset is available in **LeRobot v3.0 format** on the Hugging Face Hub:

```
lerobot/robotwin_unified
```

It contains over 100,000 pre-collected trajectories across all 50 tasks (79.6 GB, Apache 2.0 license). No format conversion is needed: it is already in the correct LeRobot v3.0 schema with video observations and action labels.

You can load it directly with the HF Datasets library:

```python
from datasets import load_dataset

ds = load_dataset("lerobot/robotwin_unified", split="train")
```

## Installation

RoboTwin 2.0 requires **Linux** with an NVIDIA GPU (CUDA 12.1 recommended). Installation takes approximately 20 minutes.

### 1. Create a conda environment

```bash
conda create -n robotwin python=3.10 -y
conda activate robotwin
```

### 2. Install LeRobot

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .
```

### 3. Install RoboTwin 2.0

```bash
git clone https://github.com/RoboTwin-Platform/RoboTwin.git
cd RoboTwin
bash script/_install.sh
bash script/_download_assets.sh
```

The install script handles all Python dependencies, including SAPIEN, CuRobo, mplib, and pytorch3d.

<Tip warning={true}>

If the automated install fails, install manually:

```bash
pip install -r requirements.txt
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
cd envs && git clone https://github.com/NVlabs/curobo.git && cd curobo
pip install -e . --no-build-isolation
```

Then apply the required mplib fix: in `mplib/planner.py` line 807, remove `or collide` from the conditional.

</Tip>

### 4. Add RoboTwin to PYTHONPATH

The RoboTwin task modules must be importable by LeRobot. From within the `RoboTwin/` directory:

```bash
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
```

Add this to your shell profile to make it permanent.

## Evaluation

### Standard evaluation (recommended)

Evaluate a policy on a single task with the official protocol (100 episodes):

```bash
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer \
  --eval.batch_size=1 \
  --eval.n_episodes=100
```

### Single-task quick check

```bash
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer \
  --eval.batch_size=1 \
  --eval.n_episodes=5
```

### Multi-task sweep

Evaluate on several tasks in one run:

```bash
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer,click_bell,handover_block,stack_blocks_two \
  --eval.batch_size=1 \
  --eval.n_episodes=100
```

### Full benchmark (all 50 tasks)

```bash
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=adjust_bottle,beat_block_hammer,blocks_ranking_rgb,blocks_ranking_size,click_alarmclock,click_bell,dump_bin_bigbin,grab_roller,handover_block,handover_mic,hanging_mug,lift_pot,move_can_pot,move_pillbottle_pad,move_playingcard_away,move_stapler_pad,open_microwave,pick_diverse_bottles,pick_dual_bottles,place_a2b_left,place_a2b_right,place_bread_basket,place_bread_skillet,place_burger_fries,place_can_basket,place_cans_plasticbox,place_container_plate,place_dual_shoes,place_empty_cup,place_fan,place_mouse_pad,place_object_basket,place_object_scale,place_object_stand,place_phone_stand,place_shoe,press_stapler,put_bottles_dustbin,put_object_cabinet,rotate_qrcode,scan_object,shake_bottle,shake_bottle_horizontally,stack_blocks_three,stack_blocks_two,stack_bowls_three,stack_bowls_two,stamp_seal,turn_switch \
  --eval.batch_size=1 \
  --eval.n_episodes=100
```

<Tip>

`open_laptop` is intentionally omitted above because of the upstream `self.arm_tag` bug (see the **Available tasks** section). Re-add it once the upstream fix lands.

</Tip>

## Camera configuration

By default, all three cameras are included:

| Camera key     | Description                    |
| -------------- | ------------------------------ |
| `head_camera`  | Torso-mounted overhead view    |
| `left_camera`  | Left arm wrist-mounted camera  |
| `right_camera` | Right arm wrist-mounted camera |

To use a subset of cameras, override `--env.camera_names`:

```bash
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer \
  --env.camera_names="head_camera,left_camera" \
  --eval.batch_size=1 \
  --eval.n_episodes=10
```

## Environment config reference

Key parameters for `RoboTwinEnvConfig`:

| Parameter            | Default                                  | Description                        |
| -------------------- | ---------------------------------------- | ---------------------------------- |
| `task`               | `"beat_block_hammer"`                    | Comma-separated task name(s)       |
| `fps`                | `25`                                     | Simulation FPS                     |
| `episode_length`     | `300`                                    | Max steps per episode              |
| `obs_type`           | `"pixels_agent_pos"`                     | `"pixels"` or `"pixels_agent_pos"` |
| `camera_names`       | `"head_camera,left_camera,right_camera"` | Comma-separated active cameras     |
| `observation_height` | `240`                                    | Camera pixel height                |
| `observation_width`  | `320`                                    | Camera pixel width                 |

## Leaderboard submission

Results can be submitted to the [RoboTwin 2.0 leaderboard](https://robotwin-platform.github.io/leaderboard). The official protocol requires:

- Training on 50 `demo_clean` demonstrations per task
- Evaluating 100 episodes per task
- Reporting success rate separately for **Easy** (`demo_clean`) and **Hard** (`demo_randomized`) settings

For submission instructions, refer to the [RoboTwin 2.0 documentation](https://robotwin-platform.github.io/doc/).