mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-23 04:30:10 +00:00
feat(envs): add RoboTwin 2.0 benchmark (#3315)
* feat(envs): add RoboTwin 2.0 benchmark integration
- RoboTwinEnvConfig with 4-camera setup (head/front/left_wrist/right_wrist)
- Docker image with SAPIEN, mplib, CuRobo, pytorch3d (Python 3.12)
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robotwin
- RoboTwinProcessorStep for state float32 casting
- Camera rename_map: head_camera/front_camera/left_wrist -> camera1/2/3
* fix(robotwin): re-enable autograd for CuRobo planner warmup and take_action
lerobot_eval wraps the full rollout in torch.no_grad() (lerobot_eval.py:566),
but RoboTwin's setup_demo → load_robot → CuroboPlanner(...) runs
motion_gen.warmup(), which invokes Newton's-method trajectory optimization.
That optimizer calls cost.backward() internally, which raises
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
when autograd is disabled. take_action() hits the same planner path at every
step. Wrap both setup_demo and take_action in torch.enable_grad() so CuRobo's
optimizer can build its computation graph. Policy inference is unaffected —
rollout()'s inner torch.inference_mode() block around select_action() is
untouched, so we still don't allocate grad buffers during policy forward.
* fix(robotwin): read nested get_obs() output and use aloha-agilex camera names
RoboTwin's base_task.get_obs() returns a nested dict:
{"observation": {cam: {"rgb": ..., "intrinsic_matrix": ...}},
"joint_action": {"left_arm": ..., "left_gripper": ...,
"right_arm": ..., "right_gripper": ...,
"vector": np.ndarray},
"endpose": {...}}
Our _get_obs was reading raw["{cam}_rgb"] / raw["{cam}"] and raw["joint_action"]
as if they were flat, so np.asarray(raw["joint_action"], dtype=float64) tripped
on a dict and raised
TypeError: float() argument must be a string or a real number, not 'dict'
Fix:
- Pull images from raw["observation"][cam]["rgb"]
- Pull joint state from raw["joint_action"]["vector"] (the flat array)
- Update the default camera tuple to (head_camera, left_camera, right_camera)
to match RoboTwin's actual wrist-camera names (envs/camera/camera.py:135-151)
* refactor(robotwin): drop defensive dict guards, cache black fallback frame
_get_obs was guarding every dict access with isinstance(..., dict) in case
RoboTwin's get_obs returned something else — but the API contract
(envs/_base_task.py:437) always returns a dict, so the guards were silently
masking real failures behind plausible-looking zero observations. Drop them.
Also:
- Cache a single black fallback frame in __init__ instead of allocating
a fresh np.zeros((H, W, 3), uint8) for every missing camera on every
step — the "camera not exposed" set is static per env.
- Only allocate the zero joint_state on the fallback path (not unconditionally
before the real value overwrites it).
- Replace .flatten() with .ravel() (no copy when already 1-D).
- Fold the nested-dict schema comment and two identical torch.enable_grad()
rationales into a single Autograd section in the class docstring.
- Fix stale `left_wrist` camera name in the observation docstring.
* fix(robotwin): align observation_space dims with D435 camera output
lerobot_eval crashed in gym.vector's SyncVectorEnv.reset with:
ValueError: Output array is the wrong shape
because RoboTwinEnvConfig declared observation_space = (480, 640, 3) but
task_config/demo_clean.yml specifies head_camera_type=D435, which renders
(240, 320, 3). gym.vector.concatenate pre-allocates a buffer from the
declared space, so the first np.stack raises on shape mismatch.
Changes:
- Config defaults now 240×320 (the D435 dims in _camera_config.yml), with
a comment pointing at the source of truth.
- RoboTwinEnv.__init__ accepts observation_height/width as Optional and
falls back to setup_kwargs["head_camera_h/w"] so the env is self-consistent
even if the config is not in sync.
- Config camera_names / features_map use the actual aloha-agilex camera
names (head_camera, left_camera, right_camera). Drops the stale
"front_camera" and "left_wrist"/"right_wrist" entries that never matched
anything RoboTwin exposes.
- CI workflow's rename_map updated to match the new camera names.
* fix(robotwin): expose _max_episode_steps for lerobot_eval.rollout
rollout() does `env.call("_max_episode_steps")` (lerobot_eval.py:157) to
know when to stop stepping. LiberoEnv and MetaworldEnv set this attribute;
RoboTwinEnv was tracking the limit under `episode_length` only, so the call
raised AttributeError once CuRobo finished warming up.
* fix(robotwin): install av-dep so lerobot_eval can write rollout MP4s
write_video (utils/io_utils.py:53) lazily imports PyAV via require_package
and raises silently inside the video-writing thread when the extra is not
installed — so the eval itself succeeds with pc_success=100 but no MP4
ever lands in videos/, and the artifact upload reports "No files were
found". Add av-dep to the install line (same pattern as the RoboMME image).
* feat(robotwin): eval 5 diverse tasks per CI run with NL descriptions
Widen the smoke eval from a single task (beat_block_hammer) to five:
click_bell, handover_block, open_laptop, stack_blocks_two on top of the
original. Each gets its own rollout video in videos/<task>_0/ so the
dashboard can surface visually distinct behaviours.
extract_task_descriptions.py now has a RoboTwin branch that reads
`description/task_instruction/<task>.json` (already shipped in the clone
at /opt/robotwin) and pulls the `full_description` field. CI cds into
the clone before invoking the script so the relative path resolves.
parse_eval_metrics.py is invoked with the same 5-task list so the
metrics.json embeds one entry per task.
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
* refactor(robotwin): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/libero_plus/robomme pattern: start from the
nightly GPU image (apt deps, python, uv, venv, lerobot[all] already
there) and layer on only what RoboTwin 2.0 uniquely needs —
cuda-nvcc + cuda-cudart-dev (CuRobo builds from source), Vulkan libs +
NVIDIA ICD (SAPIEN renderer), sapien/mplib/open3d/pytorch3d/curobo
installs, the mplib + sapien upstream patches, and the TianxingChen
asset download.
Drops ~90 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init, base lerobot install). 199 → 110.
Also repoint the docs + env docstring dataset link from
hxma/RoboTwin-LeRobot-v3.0 to the canonical lerobot/robotwin_unified.
* docs(robotwin): add robotwin to _toctree.yml under Benchmarks
doc-builder's TOC integrity check was rejecting the branch because
docs/source/robotwin.mdx existed but wasn't listed in _toctree.yml.
* fix(robotwin): defer YAML lookup and realign tests with current API
__init__ was eagerly calling _load_robotwin_setup_kwargs just to read
head_camera_h/w from the YAML. That import (`from envs import CONFIGS_PATH`)
required a real RoboTwin install, so constructing the env — and thus every
test in tests/envs/test_robotwin.py — blew up with ModuleNotFoundError
on fast-tests where RoboTwin isn't installed.
Replace the eager lookup with DEFAULT_CAMERA_H/W constants (240×320, the
D435 dims baked into task_config/demo_clean.yml). reset() still resolves
the full setup_kwargs lazily — that's fine because reset() is only
called inside the benchmark Docker image where RoboTwin is present.
Also resync the test file with the current env API:
- mock get_obs() as the real nested {"observation": {cam: {"rgb": …}},
"joint_action": {"vector": …}} shape
- patch both _load_robotwin_task and _load_robotwin_setup_kwargs
(_patch_load → _patch_runtime)
- drop `front_camera` / `left_wrist` from assertions — aloha-agilex
exposes head_camera + left_camera + right_camera, not those
- black-frame test now uses left_camera as the missing camera
- setup_demo call check loosened to the caller-provided seed/is_test
bits (full kwargs include the YAML-derived blob)
* fix: integrate PR #3315 review feedback
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- docker: tie patches to pinned versions with removal guidance, remove
unnecessary HF_TOKEN for public dataset, fix hadolint warnings
- docs: fix paper link to arxiv, add teaser image, fix camera names
(4→3 cameras), fix observation dims (480x640→240x320)
* fix(docs): correct RoboTwin 2.0 paper arxiv link
* fix(docs): use correct RoboTwin 2.0 teaser image URL
* fix(docs): use plain markdown image to fix MDX build
* ci(robotwin): smoke-eval 10 tasks instead of 5
Broader coverage on the RoboTwin 2.0 benchmark CI job: bump the smoke
eval from 5 tasks to 10 (one episode each). Added tasks are all drawn
from ROBOTWIN_TASKS and mirror the shape/complexity of the existing
set (simple single-object or single-fixture manipulations).
Tasks now run: beat_block_hammer, click_bell, handover_block,
open_laptop, stack_blocks_two, click_alarmclock, close_laptop,
close_microwave, open_microwave, place_block.
`parse_eval_metrics.py` reads `overall` for multi-task runs so no
parser change is needed. Bumped the step name and the metrics label
to reflect the 10-task layout.
* fix(ci): swap 4 broken RoboTwin tasks in smoke eval
The smoke eval hit two upstream issues:
- `open_laptop`: bug in OpenMOSS/RoboTwin main — `check_success()` uses
`self.arm_tag`, but that attribute is only set inside `play_once()`
(the scripted-expert path). During eval `take_action()` calls
`check_success()` directly, hitting `AttributeError: 'open_laptop'
object has no attribute 'arm_tag'`.
- `close_laptop`, `close_microwave`, `place_block`: not present in
upstream RoboTwin `envs/` at all — our ROBOTWIN_TASKS tuple drifted
from upstream and these names leaked into CI.
Replace the four broken tasks with upstream-confirmed equivalents
that exist both in ROBOTWIN_TASKS and in RoboTwin's `envs/`:
`adjust_bottle`, `lift_pot`, `stamp_seal`, `turn_switch`.
New 10-task smoke set: beat_block_hammer, click_bell, handover_block,
stack_blocks_two, click_alarmclock, open_microwave, adjust_bottle,
lift_pot, stamp_seal, turn_switch.
* fix(robotwin): sync ROBOTWIN_TASKS + doc with upstream (50 tasks)
The local ROBOTWIN_TASKS tuple drifted from upstream
RoboTwin-Platform/RoboTwin. Users passing names like `close_laptop`,
`close_microwave`, `dump_bin`, `place_block`, `pour_water`,
`fold_cloth`, etc. got past our validator (the names were in the
tuple) but then crashed inside robosuite with a confusing error,
because those tasks don't exist in upstream `envs/`.
- Replace ROBOTWIN_TASKS with a verbatim mirror of upstream's
`envs/` directory: 50 tasks as of main (was 60 with many
stale entries). Added a `gh api`-based one-liner comment so
future bumps are mechanical.
- Update the `60 tasks` claims in robotwin.mdx and
RoboTwinEnvConfig's docstring to `50`.
- Replace the stale example-task table in robotwin.mdx with ten
upstream-confirmed examples, and flag `open_laptop` as
temporarily broken (its `check_success()` uses `self.arm_tag`
which is only set inside `play_once()`; eval-mode callers hit
AttributeError).
- Rebuild the "Full benchmark" command with the actual 50-task
list, omitting `open_laptop`.
* test(robotwin): lower task-count floor from 60 to 50
ROBOTWIN_TASKS was trimmed to 50 tasks (see comment in
`src/lerobot/envs/robotwin.py:48`), but the assertion still
required ≥60, causing CI failures. Align the test with the
current upstream task count.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch.
* ci: gate Docker Hub login on secret availability
* fix: integrate PR #3315 review feedback
- envs(robotwin): default `observation_height/width` in
`create_robotwin_envs` to `DEFAULT_CAMERA_H/W` (240/320) so they
match the D435 dims baked into `task_config/demo_clean.yml`.
- envs(robotwin): resolve `task_config/demo_clean.yml` via
`CONFIGS_PATH` instead of a cwd-relative path; works regardless
of where `lerobot-eval` is invoked.
- envs(robotwin): replace `print()` calls in `create_robotwin_envs`
with `logger.info(...)` (module-level `logger = logging.getLogger`).
- envs(robotwin): use `_LazyAsyncVectorEnv` for the async path so
async workers start lazily (matches LIBERO / RoboCasa / VLABench).
- envs(robotwin): cast `agent_pos` space + joint-state output to
float32 end-to-end (was mixed float64/float32).
- envs(configs): use the existing `_make_vec_env_cls(use_async,
n_envs)` helper in `RoboTwinEnvConfig.create_envs`; drop the
`get_env_processors` override so RoboTwin uses the identity
processor inherited from `EnvConfig`.
- processor: delete `RoboTwinProcessorStep` — the float32 cast now
happens in the wrapper itself, so the processor is redundant.
- tests: drop the `TestRoboTwinProcessorStep` suite; update the
mock obs fixture to use float32 `joint_action.vector`.
- ci: hoist `ROBOTWIN_POLICY` and `ROBOTWIN_TASKS` to job-level
env vars so the task list and policy aren't duplicated across
eval / extract / parse steps.
- docker: pin RoboTwin + CuRobo upstream clones to commit SHAs
(`RoboTwin@0aeea2d6`, `curobo@ca941586`) for reproducibility.
This commit is contained in:
@@ -118,7 +118,7 @@ jobs:
|
|||||||
bash -c "
|
bash -c "
|
||||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||||
lerobot-eval \
|
lerobot-eval \
|
||||||
--policy.path=pepijn223/smolvla_libero \
|
--policy.path=lerobot/smolvla_libero \
|
||||||
--env.type=libero \
|
--env.type=libero \
|
||||||
--env.task=libero_spatial \
|
--env.task=libero_spatial \
|
||||||
--eval.batch_size=1 \
|
--eval.batch_size=1 \
|
||||||
@@ -147,7 +147,7 @@ jobs:
|
|||||||
--artifacts-dir /tmp/libero-artifacts \
|
--artifacts-dir /tmp/libero-artifacts \
|
||||||
--env libero \
|
--env libero \
|
||||||
--task libero_spatial \
|
--task libero_spatial \
|
||||||
--policy pepijn223/smolvla_libero
|
--policy lerobot/smolvla_libero
|
||||||
|
|
||||||
- name: Upload Libero rollout video
|
- name: Upload Libero rollout video
|
||||||
if: always()
|
if: always()
|
||||||
@@ -270,7 +270,7 @@ jobs:
|
|||||||
bash -c "
|
bash -c "
|
||||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||||
lerobot-eval \
|
lerobot-eval \
|
||||||
--policy.path=pepijn223/smolvla_metaworld \
|
--policy.path=lerobot/smolvla_metaworld \
|
||||||
--env.type=metaworld \
|
--env.type=metaworld \
|
||||||
--env.task=metaworld-push-v3 \
|
--env.task=metaworld-push-v3 \
|
||||||
--eval.batch_size=1 \
|
--eval.batch_size=1 \
|
||||||
@@ -299,7 +299,7 @@ jobs:
|
|||||||
--artifacts-dir /tmp/metaworld-artifacts \
|
--artifacts-dir /tmp/metaworld-artifacts \
|
||||||
--env metaworld \
|
--env metaworld \
|
||||||
--task metaworld-push-v3 \
|
--task metaworld-push-v3 \
|
||||||
--policy pepijn223/smolvla_metaworld
|
--policy lerobot/smolvla_metaworld
|
||||||
|
|
||||||
- name: Upload MetaWorld rollout video
|
- name: Upload MetaWorld rollout video
|
||||||
if: always()
|
if: always()
|
||||||
@@ -317,6 +317,115 @@ jobs:
|
|||||||
path: /tmp/metaworld-artifacts/metrics.json
|
path: /tmp/metaworld-artifacts/metrics.json
|
||||||
if-no-files-found: warn
|
if-no-files-found: warn
|
||||||
|
|
||||||
|
# ── ROBOTWIN 2.0 ──────────────────────────────────────────────────────────
|
||||||
|
# Isolated image: full RoboTwin 2.0 stack — SAPIEN, mplib, CuRobo,
|
||||||
|
# pytorch3d, + simulation assets (~4 GB).
|
||||||
|
# Build takes ~20 min on first run; subsequent runs hit the layer cache.
|
||||||
|
# Requires an NVIDIA GPU runner with CUDA 12.1 drivers.
|
||||||
|
robotwin-integration-test:
|
||||||
|
name: RoboTwin 2.0 — build image + 1-episode eval
|
||||||
|
runs-on:
|
||||||
|
group: aws-g6-4xlarge-plus
|
||||||
|
env:
|
||||||
|
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
||||||
|
ROBOTWIN_POLICY: lerobot/smolvla_robotwin
|
||||||
|
ROBOTWIN_TASKS: beat_block_hammer,click_bell,handover_block,stack_blocks_two,click_alarmclock,open_microwave,adjust_bottle,lift_pot,stamp_seal,turn_switch
|
||||||
|
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||||
|
with:
|
||||||
|
persist-credentials: false
|
||||||
|
lfs: true
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
||||||
|
with:
|
||||||
|
cache-binary: false
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
if: ${{ env.DOCKERHUB_USERNAME != '' }}
|
||||||
|
uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
|
||||||
|
env:
|
||||||
|
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
|
||||||
|
|
||||||
|
# Build the full-install image: SAPIEN, mplib, CuRobo, pytorch3d +
|
||||||
|
# simulation assets (~4 GB). Layer cache lives in the runner's local
|
||||||
|
# Docker daemon — reused across re-runs on the same machine.
|
||||||
|
- name: Build RoboTwin 2.0 benchmark image
|
||||||
|
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
||||||
|
with:
|
||||||
|
context: .
|
||||||
|
file: docker/Dockerfile.benchmark.robotwin
|
||||||
|
push: false
|
||||||
|
load: true
|
||||||
|
tags: lerobot-benchmark-robotwin:ci
|
||||||
|
cache-from: type=local,src=/tmp/.buildx-cache-robotwin
|
||||||
|
cache-to: type=local,dest=/tmp/.buildx-cache-robotwin,mode=max
|
||||||
|
|
||||||
|
- name: Run RoboTwin 2.0 smoke eval (10 tasks, 1 episode each)
|
||||||
|
if: env.HF_USER_TOKEN != ''
|
||||||
|
run: |
|
||||||
|
# Named container (no --rm) so we can docker cp artifacts out.
|
||||||
|
docker run --name robotwin-eval --gpus all \
|
||||||
|
--shm-size=4g \
|
||||||
|
-e HF_HOME=/tmp/hf \
|
||||||
|
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
||||||
|
-e ROBOTWIN_POLICY="${ROBOTWIN_POLICY}" \
|
||||||
|
-e ROBOTWIN_TASKS="${ROBOTWIN_TASKS}" \
|
||||||
|
lerobot-benchmark-robotwin:ci \
|
||||||
|
bash -c "
|
||||||
|
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||||
|
cd /opt/robotwin && lerobot-eval \
|
||||||
|
--policy.path=\"\$ROBOTWIN_POLICY\" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=\"\$ROBOTWIN_TASKS\" \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=1 \
|
||||||
|
--eval.use_async_envs=false \
|
||||||
|
--policy.device=cuda \
|
||||||
|
'--rename_map={\"observation.images.head_camera\": \"observation.images.camera1\", \"observation.images.left_camera\": \"observation.images.camera2\", \"observation.images.right_camera\": \"observation.images.camera3\"}' \
|
||||||
|
--output_dir=/tmp/eval-artifacts
|
||||||
|
python /lerobot/scripts/ci/extract_task_descriptions.py \
|
||||||
|
--env robotwin \
|
||||||
|
--task \"\$ROBOTWIN_TASKS\" \
|
||||||
|
--output /tmp/eval-artifacts/task_descriptions.json
|
||||||
|
"
|
||||||
|
|
||||||
|
- name: Copy RoboTwin artifacts from container
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
mkdir -p /tmp/robotwin-artifacts
|
||||||
|
docker cp robotwin-eval:/tmp/eval-artifacts/. /tmp/robotwin-artifacts/ 2>/dev/null || true
|
||||||
|
docker rm -f robotwin-eval || true
|
||||||
|
|
||||||
|
- name: Parse RoboTwin eval metrics
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
python3 scripts/ci/parse_eval_metrics.py \
|
||||||
|
--artifacts-dir /tmp/robotwin-artifacts \
|
||||||
|
--env robotwin \
|
||||||
|
--task "${ROBOTWIN_TASKS}" \
|
||||||
|
--policy "${ROBOTWIN_POLICY}"
|
||||||
|
|
||||||
|
- name: Upload RoboTwin rollout video
|
||||||
|
if: always()
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: robotwin-rollout-video
|
||||||
|
path: /tmp/robotwin-artifacts/videos/
|
||||||
|
if-no-files-found: warn
|
||||||
|
|
||||||
|
- name: Upload RoboTwin eval metrics
|
||||||
|
if: always()
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: robotwin-metrics
|
||||||
|
path: /tmp/robotwin-artifacts/metrics.json
|
||||||
|
if-no-files-found: warn
|
||||||
|
|
||||||
# ── ROBOCASA365 ──────────────────────────────────────────────────────────
|
# ── ROBOCASA365 ──────────────────────────────────────────────────────────
|
||||||
# Isolated image: robocasa + robosuite installed manually as editable
|
# Isolated image: robocasa + robosuite installed manually as editable
|
||||||
# clones (no `lerobot[robocasa]` extra — robocasa's setup.py pins
|
# clones (no `lerobot[robocasa]` extra — robocasa's setup.py pins
|
||||||
|
|||||||
@@ -0,0 +1,122 @@
|
|||||||
|
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
# Benchmark image for RoboTwin 2.0 integration tests.
|
||||||
|
# Extends the nightly GPU image with the RoboTwin simulator stack:
|
||||||
|
# sapien/mplib/pytorch3d + NVlabs CuRobo + embodiments.zip + objects.zip
|
||||||
|
# (~3.96 GB of assets; background_texture.zip ~11 GB skipped for smoke eval).
|
||||||
|
#
|
||||||
|
# Build: docker build -f docker/Dockerfile.benchmark.robotwin -t lerobot-benchmark-robotwin .
|
||||||
|
# Run: docker run --gpus all --rm lerobot-benchmark-robotwin \
|
||||||
|
# lerobot-eval --env.type=robotwin --env.task=beat_block_hammer ...
|
||||||
|
|
||||||
|
FROM huggingface/lerobot-gpu:latest
|
||||||
|
|
||||||
|
ENV NVIDIA_DRIVER_CAPABILITIES=all \
|
||||||
|
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
|
||||||
|
ROBOTWIN_ROOT=/opt/robotwin
|
||||||
|
|
||||||
|
# The nightly base is CUDA -base (no compiler, no Vulkan loader). CuRobo's
|
||||||
|
# `pip install -e .` runs nvcc, and SAPIEN renders via Vulkan — add both.
|
||||||
|
USER root
|
||||||
|
# Pinned upstream SHA for reproducible benchmark runs. Bump when we need
|
||||||
|
# an upstream fix; don't rely on `main` drift.
|
||||||
|
ARG ROBOTWIN_SHA=0aeea2d669c0f8516f4d5785f0aa33ba812c14b4
|
||||||
|
RUN apt-get update \
|
||||||
|
&& apt-get install -y --no-install-recommends \
|
||||||
|
cuda-nvcc-12-4 cuda-cudart-dev-12-4 \
|
||||||
|
libvulkan1 vulkan-tools \
|
||||||
|
&& mkdir -p /usr/share/vulkan/icd.d \
|
||||||
|
&& echo '{"file_format_version":"1.0.0","ICD":{"library_path":"libGLX_nvidia.so.0","api_version":"1.3.0"}}' \
|
||||||
|
> /usr/share/vulkan/icd.d/nvidia_icd.json \
|
||||||
|
&& git clone https://github.com/RoboTwin-Platform/RoboTwin.git ${ROBOTWIN_ROOT} \
|
||||||
|
&& git -C ${ROBOTWIN_ROOT} checkout ${ROBOTWIN_SHA} \
|
||||||
|
&& chown -R user_lerobot:user_lerobot ${ROBOTWIN_ROOT} \
|
||||||
|
&& apt-get clean && rm -rf /var/lib/apt/lists/*
|
||||||
|
USER user_lerobot
|
||||||
|
|
||||||
|
# RoboTwin runtime deps (av is already in the base via [av-dep]).
|
||||||
|
RUN uv pip install --no-cache \
|
||||||
|
"sapien==3.0.0b1" "mplib==0.2.1" "transforms3d==0.4.2" "trimesh==4.4.3" \
|
||||||
|
"open3d==0.19.0" "imageio==2.34.2" termcolor zarr pydantic h5py
|
||||||
|
|
||||||
|
# pytorch3d has no universal wheel; must be built from source (~10 min, cached).
|
||||||
|
RUN uv pip install --no-cache --no-build-isolation \
|
||||||
|
"git+https://github.com/facebookresearch/pytorch3d.git@stable"
|
||||||
|
|
||||||
|
# CuRobo — NVlabs motion generator; TORCH_CUDA_ARCH_LIST must be set or the
|
||||||
|
# build aborts on an empty arch list. Pinned SHA for reproducibility.
|
||||||
|
ARG CUROBO_SHA=ca941586c33b8482ed9c0e74d60f23efd64b516a
|
||||||
|
RUN cd ${ROBOTWIN_ROOT}/envs \
|
||||||
|
&& git clone https://github.com/NVlabs/curobo.git \
|
||||||
|
&& git -C curobo checkout ${CUROBO_SHA} \
|
||||||
|
&& cd curobo \
|
||||||
|
&& TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0" \
|
||||||
|
uv pip install -e . --no-build-isolation --no-cache
|
||||||
|
|
||||||
|
# Upstream patches (mirror RoboTwin's script/_install.sh).
|
||||||
|
# These patches target the exact versions pinned above; re-check when upgrading.
|
||||||
|
# mplib==0.2.1: drop a broken `or collide` clause in planner.py.
|
||||||
|
# Safe to remove once mplib > 0.2.1 ships with the fix upstream.
|
||||||
|
# sapien==3.0.0b1: fix URDF loader encoding + .srdf extension check.
|
||||||
|
# Safe to remove once sapien > 3.0.0b1 ships with the fix upstream.
|
||||||
|
RUN python - <<'EOF'
|
||||||
|
import pathlib, re, site
|
||||||
|
for d in site.getsitepackages():
|
||||||
|
p = pathlib.Path(d) / "mplib" / "planner.py"
|
||||||
|
if p.exists():
|
||||||
|
p.write_text(re.sub(r"\bor collide\b", "", p.read_text(), count=1))
|
||||||
|
print(f"mplib patch applied: {p}")
|
||||||
|
p = pathlib.Path(d) / "sapien" / "wrapper" / "urdf_loader.py"
|
||||||
|
if p.exists():
|
||||||
|
src = p.read_text().replace(
|
||||||
|
"with open(srdf_path) as f:", 'with open(srdf_path, encoding="utf-8") as f:'
|
||||||
|
).replace('"srdf"', '".srdf"')
|
||||||
|
p.write_text(src)
|
||||||
|
print(f"sapien patch applied: {p}")
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Simulation assets from TianxingChen/RoboTwin2.0: embodiments (~220 MB) +
|
||||||
|
# objects (~3.74 GB). background_texture (~11 GB) is intentionally skipped.
|
||||||
|
# The dataset is public — no auth token needed.
|
||||||
|
RUN python - <<'EOF'
|
||||||
|
import os, pathlib, zipfile
|
||||||
|
from huggingface_hub import hf_hub_download
|
||||||
|
|
||||||
|
assets_dir = pathlib.Path(os.environ["ROBOTWIN_ROOT"]) / "assets"
|
||||||
|
assets_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
for fname in ("embodiments.zip", "objects.zip"):
|
||||||
|
local = hf_hub_download(
|
||||||
|
repo_id="TianxingChen/RoboTwin2.0",
|
||||||
|
repo_type="dataset",
|
||||||
|
filename=fname,
|
||||||
|
local_dir=str(assets_dir),
|
||||||
|
)
|
||||||
|
with zipfile.ZipFile(local, "r") as z:
|
||||||
|
z.extractall(str(assets_dir))
|
||||||
|
pathlib.Path(local).unlink()
|
||||||
|
EOF
|
||||||
|
|
||||||
|
WORKDIR ${ROBOTWIN_ROOT}
|
||||||
|
RUN python script/update_embodiment_config_path.py
|
||||||
|
|
||||||
|
ENV PYTHONPATH="${ROBOTWIN_ROOT}:${PYTHONPATH}"
|
||||||
|
|
||||||
|
# Return to the lerobot source directory (set by base image) before overlaying.
|
||||||
|
WORKDIR /lerobot
|
||||||
|
|
||||||
|
# Overlay the PR's source code on top of the nightly image.
|
||||||
|
COPY --chown=user_lerobot:user_lerobot . .
|
||||||
|
|
||||||
|
CMD ["/bin/bash"]
|
||||||
@@ -79,6 +79,8 @@
|
|||||||
title: LIBERO
|
title: LIBERO
|
||||||
- local: metaworld
|
- local: metaworld
|
||||||
title: Meta-World
|
title: Meta-World
|
||||||
|
- local: robotwin
|
||||||
|
title: RoboTwin 2.0
|
||||||
- local: robocasa
|
- local: robocasa
|
||||||
title: RoboCasa365
|
title: RoboCasa365
|
||||||
- local: envhub_isaaclab_arena
|
- local: envhub_isaaclab_arena
|
||||||
|
|||||||
@@ -0,0 +1,223 @@
|
|||||||
|
# RoboTwin 2.0
|
||||||
|
|
||||||
|
RoboTwin 2.0 is a **large-scale dual-arm manipulation benchmark** built on the SAPIEN physics engine. It provides a standardized evaluation protocol for bimanual robotic policies across 50 tasks (as of upstream `main`) with strong domain randomization (clutter, lighting, background, tabletop height, and language instructions).
|
||||||
|
|
||||||
|
- Paper: [RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation](https://arxiv.org/abs/2506.18088)
|
||||||
|
- GitHub: [RoboTwin-Platform/RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin)
|
||||||
|
- Leaderboard: [robotwin-platform.github.io/leaderboard](https://robotwin-platform.github.io/leaderboard)
|
||||||
|
- Dataset: [lerobot/robotwin_unified](https://huggingface.co/datasets/lerobot/robotwin_unified)
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
| ------------- | -------------------------------------------------------- |
|
||||||
|
| Tasks | 50 dual-arm manipulation tasks |
|
||||||
|
| Robot | Aloha-AgileX bimanual (14 DOF, 7 per arm) |
|
||||||
|
| Action space | 14-dim joint-space, continuous in `[-1, 1]` |
|
||||||
|
| Cameras | `head_camera`, `left_camera`, `right_camera` |
|
||||||
|
| Simulator | SAPIEN (not MuJoCo) |
|
||||||
|
| Eval protocol | 100 episodes/task, 50 demo_clean demonstrations |
|
||||||
|
| Eval settings | **Easy** (`demo_clean`) and **Hard** (`demo_randomized`) |
|
||||||
|
|
||||||
|
## Available tasks
|
||||||
|
|
||||||
|
RoboTwin 2.0 ships 50 dual-arm manipulation tasks in its upstream `envs/` directory. The canonical list is the `ROBOTWIN_TASKS` tuple in `src/lerobot/envs/robotwin.py`, mirrored verbatim from the upstream repo. Example tasks:
|
||||||
|
|
||||||
|
| Task | CLI name | Category |
|
||||||
|
| ------------------------ | ------------------------ | ----------------- |
|
||||||
|
| Beat block with hammer | `beat_block_hammer` | Tool use |
|
||||||
|
| Click bell / alarm clock | `click_bell` | Precision press |
|
||||||
|
| Stack blocks (2 / 3) | `stack_blocks_two/three` | Stacking |
|
||||||
|
| Stack bowls (2 / 3) | `stack_bowls_two/three` | Stacking |
|
||||||
|
| Handover block / mic | `handover_block` | Bimanual coord. |
|
||||||
|
| Lift pot | `lift_pot` | Bimanual lift |
|
||||||
|
| Shake bottle | `shake_bottle` | Continuous motion |
|
||||||
|
| Turn switch | `turn_switch` | Articulated obj |
|
||||||
|
| Stamp seal | `stamp_seal` | Precision place |
|
||||||
|
| Scan object | `scan_object` | Mobile manip. |
|
||||||
|
|
||||||
|
Pass a comma-separated list to `--env.task` to run multiple tasks in a single eval sweep.
|
||||||
|
|
||||||
|
<Tip warning={true}>
|
||||||
|
`open_laptop` is currently broken upstream (its `check_success()` uses
|
||||||
|
`self.arm_tag`, which is only set inside the scripted-expert `play_once()`
|
||||||
|
path and therefore unavailable during normal policy eval). Avoid it until the
|
||||||
|
upstream bug is fixed, or patch the task to default `self.arm_tag = "left"` in
|
||||||
|
`load_actors()`.
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
## Dataset
|
||||||
|
|
||||||
|
The RoboTwin 2.0 dataset is available in **LeRobot v3.0 format** on the Hugging Face Hub:
|
||||||
|
|
||||||
|
```
|
||||||
|
lerobot/robotwin_unified
|
||||||
|
```
|
||||||
|
|
||||||
|
It contains over 100,000 pre-collected trajectories across all 50 tasks (79.6 GB, Apache 2.0 license). No format conversion is needed — it is already in the correct LeRobot v3.0 schema with video observations and action labels.
|
||||||
|
|
||||||
|
You can load it directly with the HF Datasets library:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from datasets import load_dataset
|
||||||
|
|
||||||
|
ds = load_dataset("lerobot/robotwin_unified", split="train")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
RoboTwin 2.0 requires **Linux** with an NVIDIA GPU (CUDA 12.1 recommended). Installation takes approximately 20 minutes.
|
||||||
|
|
||||||
|
### 1. Create a conda environment
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n robotwin python=3.10 -y
|
||||||
|
conda activate robotwin
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Install LeRobot
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/huggingface/lerobot.git
|
||||||
|
cd lerobot
|
||||||
|
pip install -e "."
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Install RoboTwin 2.0
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/RoboTwin-Platform/RoboTwin.git
|
||||||
|
cd RoboTwin
|
||||||
|
bash script/_install.sh
|
||||||
|
bash script/_download_assets.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
The install script handles all Python dependencies including SAPIEN, CuRobo, mplib, and pytorch3d.
|
||||||
|
|
||||||
|
<Tip warning={true}>
|
||||||
|
If the automated install fails, install manually:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install -r requirements.txt
|
||||||
|
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
|
||||||
|
cd envs && git clone https://github.com/NVlabs/curobo.git && cd curobo
|
||||||
|
pip install -e . --no-build-isolation
|
||||||
|
```
|
||||||
|
|
||||||
|
Then apply the required mplib fix: in `mplib/planner.py` line 807, remove `or collide` from the conditional.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
### 4. Add RoboTwin to PYTHONPATH
|
||||||
|
|
||||||
|
The RoboTwin task modules must be importable by LeRobot. From within the `RoboTwin/` directory:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add this to your shell profile to make it permanent.
|
||||||
|
|
||||||
|
## Evaluation
|
||||||
|
|
||||||
|
### Standard evaluation (recommended)
|
||||||
|
|
||||||
|
Evaluate a policy on a single task with the official protocol (100 episodes):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lerobot-eval \
|
||||||
|
--policy.path="your-hf-policy-id" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=beat_block_hammer \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=100
|
||||||
|
```
|
||||||
|
|
||||||
|
### Single-task quick check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lerobot-eval \
|
||||||
|
--policy.path="your-hf-policy-id" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=beat_block_hammer \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=5
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multi-task sweep
|
||||||
|
|
||||||
|
Evaluate on several tasks in one run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lerobot-eval \
|
||||||
|
--policy.path="your-hf-policy-id" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=beat_block_hammer,click_bell,handover_block,stack_blocks_two \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=100
|
||||||
|
```
|
||||||
|
|
||||||
|
### Full benchmark (all 50 tasks)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lerobot-eval \
|
||||||
|
--policy.path="your-hf-policy-id" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=adjust_bottle,beat_block_hammer,blocks_ranking_rgb,blocks_ranking_size,click_alarmclock,click_bell,dump_bin_bigbin,grab_roller,handover_block,handover_mic,hanging_mug,lift_pot,move_can_pot,move_pillbottle_pad,move_playingcard_away,move_stapler_pad,open_microwave,pick_diverse_bottles,pick_dual_bottles,place_a2b_left,place_a2b_right,place_bread_basket,place_bread_skillet,place_burger_fries,place_can_basket,place_cans_plasticbox,place_container_plate,place_dual_shoes,place_empty_cup,place_fan,place_mouse_pad,place_object_basket,place_object_scale,place_object_stand,place_phone_stand,place_shoe,press_stapler,put_bottles_dustbin,put_object_cabinet,rotate_qrcode,scan_object,shake_bottle,shake_bottle_horizontally,stack_blocks_three,stack_blocks_two,stack_bowls_three,stack_bowls_two,stamp_seal,turn_switch \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=100
|
||||||
|
```
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
`open_laptop` is intentionally omitted above because of the upstream
|
||||||
|
`self.arm_tag` bug (see the **Available tasks** section). Re-add it once the
|
||||||
|
upstream fix lands.
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
## Camera configuration
|
||||||
|
|
||||||
|
By default, all three cameras are included:
|
||||||
|
|
||||||
|
| Camera key | Description |
|
||||||
|
| -------------- | ------------------------------ |
|
||||||
|
| `head_camera` | Torso-mounted overhead view |
|
||||||
|
| `left_camera` | Left arm wrist-mounted camera |
|
||||||
|
| `right_camera` | Right arm wrist-mounted camera |
|
||||||
|
|
||||||
|
To use a subset of cameras, override `--env.camera_names`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lerobot-eval \
|
||||||
|
--policy.path="your-hf-policy-id" \
|
||||||
|
--env.type=robotwin \
|
||||||
|
--env.task=beat_block_hammer \
|
||||||
|
--env.camera_names="head_camera,left_camera" \
|
||||||
|
--eval.batch_size=1 \
|
||||||
|
--eval.n_episodes=10
|
||||||
|
```
|
||||||
|
|
||||||
|
## Environment config reference
|
||||||
|
|
||||||
|
Key parameters for `RoboTwinEnvConfig`:
|
||||||
|
|
||||||
|
| Parameter | Default | Description |
|
||||||
|
| -------------------- | ---------------------------------------- | ---------------------------------- |
|
||||||
|
| `task` | `"beat_block_hammer"` | Comma-separated task name(s) |
|
||||||
|
| `fps` | `25` | Simulation FPS |
|
||||||
|
| `episode_length` | `300` | Max steps per episode |
|
||||||
|
| `obs_type` | `"pixels_agent_pos"` | `"pixels"` or `"pixels_agent_pos"` |
|
||||||
|
| `camera_names` | `"head_camera,left_camera,right_camera"` | Comma-separated active cameras |
|
||||||
|
| `observation_height` | `240` | Camera pixel height |
|
||||||
|
| `observation_width` | `320` | Camera pixel width |
|
||||||
|
|
||||||
|
## Leaderboard submission
|
||||||
|
|
||||||
|
Results can be submitted to the [RoboTwin 2.0 leaderboard](https://robotwin-platform.github.io/leaderboard). The official protocol requires:
|
||||||
|
|
||||||
|
- Training on 50 `demo_clean` demonstrations per task
|
||||||
|
- Evaluating 100 episodes per task
|
||||||
|
- Reporting success rate separately for **Easy** (`demo_clean`) and **Hard** (`demo_randomized`) settings
|
||||||
|
|
||||||
|
For submission instructions, refer to the [RoboTwin 2.0 documentation](https://robotwin-platform.github.io/doc/).
|
||||||
@@ -57,6 +57,24 @@ def _metaworld_descriptions(task_name: str) -> dict[str, str]:
|
|||||||
return {f"{task_name}_0": label}
|
return {f"{task_name}_0": label}
|
||||||
|
|
||||||
|
|
||||||
|
def _robotwin_descriptions(task_names: str) -> dict[str, str]:
|
||||||
|
"""Return descriptions for each requested RoboTwin task. Reads
|
||||||
|
`description/task_instruction/<task>.json` from the RoboTwin clone
|
||||||
|
(cwd is /opt/robotwin in CI). Falls back to the task name if missing."""
|
||||||
|
out: dict[str, str] = {}
|
||||||
|
root = Path("description/task_instruction")
|
||||||
|
for name in (t.strip() for t in task_names.split(",") if t.strip()):
|
||||||
|
desc_file = root / f"{name}.json"
|
||||||
|
desc = name.replace("_", " ")
|
||||||
|
if desc_file.is_file():
|
||||||
|
data = json.loads(desc_file.read_text())
|
||||||
|
full = data.get("full_description") or desc
|
||||||
|
# Strip the schema placeholders ({A}, {a}) — keep the sentence readable.
|
||||||
|
desc = full.replace("<", "").replace(">", "")
|
||||||
|
out[f"{name}_0"] = desc
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
def _robocasa_descriptions(task_spec: str) -> dict[str, str]:
|
def _robocasa_descriptions(task_spec: str) -> dict[str, str]:
|
||||||
"""For each task in the comma-separated list, emit a cleaned-name label.
|
"""For each task in the comma-separated list, emit a cleaned-name label.
|
||||||
|
|
||||||
@@ -87,6 +105,8 @@ def main() -> int:
|
|||||||
descriptions = _libero_descriptions(args.task)
|
descriptions = _libero_descriptions(args.task)
|
||||||
elif args.env == "metaworld":
|
elif args.env == "metaworld":
|
||||||
descriptions = _metaworld_descriptions(args.task)
|
descriptions = _metaworld_descriptions(args.task)
|
||||||
|
elif args.env == "robotwin":
|
||||||
|
descriptions = _robotwin_descriptions(args.task)
|
||||||
elif args.env == "robocasa":
|
elif args.env == "robocasa":
|
||||||
descriptions = _robocasa_descriptions(args.task)
|
descriptions = _robocasa_descriptions(args.task)
|
||||||
else:
|
else:
|
||||||
|
|||||||
@@ -649,3 +649,90 @@ class IsaaclabArenaEnv(HubEnvConfig):
|
|||||||
),
|
),
|
||||||
PolicyProcessorPipeline(steps=[]),
|
PolicyProcessorPipeline(steps=[]),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@EnvConfig.register_subclass("robotwin")
|
||||||
|
@dataclass
|
||||||
|
class RoboTwinEnvConfig(EnvConfig):
|
||||||
|
"""Configuration for RoboTwin 2.0 benchmark environments.
|
||||||
|
|
||||||
|
RoboTwin 2.0 is a dual-arm manipulation benchmark with 50 tasks built on the
|
||||||
|
SAPIEN simulator. The robot is an Aloha-AgileX bimanual platform with 14 DOF
|
||||||
|
(7 per arm). All three cameras are enabled by default.
|
||||||
|
|
||||||
|
See: https://robotwin-platform.github.io
|
||||||
|
Dataset: https://huggingface.co/datasets/lerobot/robotwin_unified
|
||||||
|
"""
|
||||||
|
|
||||||
|
task: str = "beat_block_hammer" # single task or comma-separated list
|
||||||
|
fps: int = 25
|
||||||
|
episode_length: int = 300
|
||||||
|
obs_type: str = "pixels_agent_pos"
|
||||||
|
render_mode: str = "rgb_array"
|
||||||
|
# Available cameras from RoboTwin's aloha-agilex embodiment: head_camera
|
||||||
|
# (torso-mounted) + left_camera / right_camera (wrists).
|
||||||
|
camera_names: str = "head_camera,left_camera,right_camera"
|
||||||
|
# Match the D435 dims in task_config/demo_clean.yml (_camera_config.yml).
|
||||||
|
# Gym's vector-env concatenate pre-allocates buffers of this shape, so it
|
||||||
|
# must equal what SAPIEN actually renders.
|
||||||
|
observation_height: int = 240
|
||||||
|
observation_width: int = 320
|
||||||
|
features: dict[str, PolicyFeature] = field(
|
||||||
|
default_factory=lambda: {
|
||||||
|
ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(14,)),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
features_map: dict[str, str] = field(
|
||||||
|
default_factory=lambda: {
|
||||||
|
ACTION: ACTION,
|
||||||
|
"pixels/head_camera": f"{OBS_IMAGES}.head_camera",
|
||||||
|
"pixels/left_camera": f"{OBS_IMAGES}.left_camera",
|
||||||
|
"pixels/right_camera": f"{OBS_IMAGES}.right_camera",
|
||||||
|
"agent_pos": OBS_STATE,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
cam_list = [c.strip() for c in self.camera_names.split(",") if c.strip()]
|
||||||
|
for cam in cam_list:
|
||||||
|
self.features[f"pixels/{cam}"] = PolicyFeature(
|
||||||
|
type=FeatureType.VISUAL,
|
||||||
|
shape=(self.observation_height, self.observation_width, 3),
|
||||||
|
)
|
||||||
|
# Keep features_map entry if already set (default_factory); add if missing.
|
||||||
|
key = f"pixels/{cam}"
|
||||||
|
if key not in self.features_map:
|
||||||
|
self.features_map[key] = f"{OBS_IMAGES}.{cam}"
|
||||||
|
|
||||||
|
if self.obs_type == "pixels_agent_pos":
|
||||||
|
self.features["agent_pos"] = PolicyFeature(
|
||||||
|
type=FeatureType.STATE,
|
||||||
|
shape=(14,), # 14 DOF: 7 per arm
|
||||||
|
)
|
||||||
|
elif self.obs_type != "pixels":
|
||||||
|
raise ValueError(
|
||||||
|
f"Unsupported obs_type '{self.obs_type}'. "
|
||||||
|
"RoboTwinEnvConfig supports 'pixels' and 'pixels_agent_pos'."
|
||||||
|
)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def gym_kwargs(self) -> dict:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
def create_envs(self, n_envs: int, use_async_envs: bool = True):
|
||||||
|
from lerobot.envs.robotwin import create_robotwin_envs
|
||||||
|
|
||||||
|
if not self.task:
|
||||||
|
raise ValueError("RoboTwinEnvConfig requires `task` to be specified.")
|
||||||
|
|
||||||
|
env_cls = _make_vec_env_cls(use_async_envs, n_envs)
|
||||||
|
cam_list = [c.strip() for c in self.camera_names.split(",") if c.strip()]
|
||||||
|
return create_robotwin_envs(
|
||||||
|
task=self.task,
|
||||||
|
n_envs=n_envs,
|
||||||
|
env_cls=env_cls,
|
||||||
|
camera_names=cam_list,
|
||||||
|
observation_height=self.observation_height,
|
||||||
|
observation_width=self.observation_width,
|
||||||
|
episode_length=self.episode_length,
|
||||||
|
)
|
||||||
|
|||||||
@@ -0,0 +1,488 @@
|
|||||||
|
#!/usr/bin/env python
|
||||||
|
|
||||||
|
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import importlib
|
||||||
|
import logging
|
||||||
|
from collections import defaultdict
|
||||||
|
from collections.abc import Callable, Sequence
|
||||||
|
from functools import partial
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import gymnasium as gym
|
||||||
|
import numpy as np
|
||||||
|
import torch
|
||||||
|
from gymnasium import spaces
|
||||||
|
|
||||||
|
from lerobot.types import RobotObservation
|
||||||
|
|
||||||
|
from .utils import _LazyAsyncVectorEnv
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Camera names as used by RoboTwin 2.0. The wrapper appends "_rgb" when looking
|
||||||
|
# up keys in get_obs() output (e.g. "head_camera" → "head_camera_rgb").
|
||||||
|
ROBOTWIN_CAMERA_NAMES: tuple[str, ...] = (
|
||||||
|
"head_camera",
|
||||||
|
"left_camera",
|
||||||
|
"right_camera",
|
||||||
|
)
|
||||||
|
|
||||||
|
ACTION_DIM = 14 # 7 DOF × 2 arms
|
||||||
|
ACTION_LOW = -1.0
|
||||||
|
ACTION_HIGH = 1.0
|
||||||
|
DEFAULT_EPISODE_LENGTH = 300
|
||||||
|
# D435 dims from task_config/_camera_config.yml (what demo_clean.yml selects).
|
||||||
|
DEFAULT_CAMERA_H = 240
|
||||||
|
DEFAULT_CAMERA_W = 320
|
||||||
|
|
||||||
|
# Task list from RoboTwin 2.0's `envs/` directory — mirrors upstream exactly
|
||||||
|
# (50 tasks as of main; earlier revisions had 60 with a different split).
|
||||||
|
# Keep this in sync with:
|
||||||
|
# gh api /repos/RoboTwin-Platform/RoboTwin/contents/envs --paginate \
|
||||||
|
# | jq -r '.[].name' | grep -E '\.py$' | grep -v '^_' | sed 's/\.py$//'
|
||||||
|
ROBOTWIN_TASKS: tuple[str, ...] = (
|
||||||
|
"adjust_bottle",
|
||||||
|
"beat_block_hammer",
|
||||||
|
"blocks_ranking_rgb",
|
||||||
|
"blocks_ranking_size",
|
||||||
|
"click_alarmclock",
|
||||||
|
"click_bell",
|
||||||
|
"dump_bin_bigbin",
|
||||||
|
"grab_roller",
|
||||||
|
"handover_block",
|
||||||
|
"handover_mic",
|
||||||
|
"hanging_mug",
|
||||||
|
"lift_pot",
|
||||||
|
"move_can_pot",
|
||||||
|
"move_pillbottle_pad",
|
||||||
|
"move_playingcard_away",
|
||||||
|
"move_stapler_pad",
|
||||||
|
"open_laptop",
|
||||||
|
"open_microwave",
|
||||||
|
"pick_diverse_bottles",
|
||||||
|
"pick_dual_bottles",
|
||||||
|
"place_a2b_left",
|
||||||
|
"place_a2b_right",
|
||||||
|
"place_bread_basket",
|
||||||
|
"place_bread_skillet",
|
||||||
|
"place_burger_fries",
|
||||||
|
"place_can_basket",
|
||||||
|
"place_cans_plasticbox",
|
||||||
|
"place_container_plate",
|
||||||
|
"place_dual_shoes",
|
||||||
|
"place_empty_cup",
|
||||||
|
"place_fan",
|
||||||
|
"place_mouse_pad",
|
||||||
|
"place_object_basket",
|
||||||
|
"place_object_scale",
|
||||||
|
"place_object_stand",
|
||||||
|
"place_phone_stand",
|
||||||
|
"place_shoe",
|
||||||
|
"press_stapler",
|
||||||
|
"put_bottles_dustbin",
|
||||||
|
"put_object_cabinet",
|
||||||
|
"rotate_qrcode",
|
||||||
|
"scan_object",
|
||||||
|
"shake_bottle",
|
||||||
|
"shake_bottle_horizontally",
|
||||||
|
"stack_blocks_three",
|
||||||
|
"stack_blocks_two",
|
||||||
|
"stack_bowls_three",
|
||||||
|
"stack_bowls_two",
|
||||||
|
"stamp_seal",
|
||||||
|
"turn_switch",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
_ROBOTWIN_SETUP_CACHE: dict[str, dict[str, Any]] = {}
|
||||||
|
|
||||||
|
|
||||||
|
def _load_robotwin_setup_kwargs(task_name: str) -> dict[str, Any]:
|
||||||
|
"""Build the kwargs dict RoboTwin's setup_demo expects.
|
||||||
|
|
||||||
|
Mirrors the config loading done by RoboTwin's ``script/eval_policy.py``:
|
||||||
|
reads ``task_config/demo_clean.yml``, resolves the embodiment file from
|
||||||
|
``_embodiment_config.yml``, loads the robot's own ``config.yml``, and
|
||||||
|
reads camera dimensions from ``_camera_config.yml``.
|
||||||
|
|
||||||
|
Uses ``aloha-agilex`` single-robot dual-arm by default (the only embodiment
|
||||||
|
used by beat_block_hammer and most smoke-test tasks).
|
||||||
|
"""
|
||||||
|
if task_name in _ROBOTWIN_SETUP_CACHE:
|
||||||
|
return dict(_ROBOTWIN_SETUP_CACHE[task_name])
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
|
import yaml # type: ignore[import-untyped]
|
||||||
|
from envs import CONFIGS_PATH # type: ignore[import-not-found]
|
||||||
|
|
||||||
|
task_config = "demo_clean"
|
||||||
|
with open(os.path.join(CONFIGS_PATH, f"{task_config}.yml"), encoding="utf-8") as f:
|
||||||
|
args = yaml.safe_load(f)
|
||||||
|
|
||||||
|
# Resolve embodiment — demo_clean.yml uses [aloha-agilex] (dual-arm single robot)
|
||||||
|
with open(os.path.join(CONFIGS_PATH, "_embodiment_config.yml"), encoding="utf-8") as f:
|
||||||
|
embodiment_types = yaml.safe_load(f)
|
||||||
|
embodiment = args.get("embodiment", ["aloha-agilex"])
|
||||||
|
if len(embodiment) == 1:
|
||||||
|
robot_file = embodiment_types[embodiment[0]]["file_path"]
|
||||||
|
args["left_robot_file"] = robot_file
|
||||||
|
args["right_robot_file"] = robot_file
|
||||||
|
args["dual_arm_embodied"] = True
|
||||||
|
elif len(embodiment) == 3:
|
||||||
|
args["left_robot_file"] = embodiment_types[embodiment[0]]["file_path"]
|
||||||
|
args["right_robot_file"] = embodiment_types[embodiment[1]]["file_path"]
|
||||||
|
args["embodiment_dis"] = embodiment[2]
|
||||||
|
args["dual_arm_embodied"] = False
|
||||||
|
else:
|
||||||
|
raise ValueError(f"embodiment must have 1 or 3 items, got {len(embodiment)}")
|
||||||
|
|
||||||
|
with open(os.path.join(args["left_robot_file"], "config.yml"), encoding="utf-8") as f:
|
||||||
|
args["left_embodiment_config"] = yaml.safe_load(f)
|
||||||
|
with open(os.path.join(args["right_robot_file"], "config.yml"), encoding="utf-8") as f:
|
||||||
|
args["right_embodiment_config"] = yaml.safe_load(f)
|
||||||
|
|
||||||
|
# Camera dimensions
|
||||||
|
with open(os.path.join(CONFIGS_PATH, "_camera_config.yml"), encoding="utf-8") as f:
|
||||||
|
camera_config = yaml.safe_load(f)
|
||||||
|
head_cam = args["camera"]["head_camera_type"]
|
||||||
|
args["head_camera_h"] = camera_config[head_cam]["h"]
|
||||||
|
args["head_camera_w"] = camera_config[head_cam]["w"]
|
||||||
|
|
||||||
|
# Headless overrides
|
||||||
|
args["render_freq"] = 0
|
||||||
|
args["task_name"] = task_name
|
||||||
|
args["task_config"] = task_config
|
||||||
|
|
||||||
|
_ROBOTWIN_SETUP_CACHE[task_name] = args
|
||||||
|
return dict(args)
|
||||||
|
|
||||||
|
|
||||||
|
def _load_robotwin_task(task_name: str) -> type:
|
||||||
|
"""Dynamically import and return a RoboTwin 2.0 task class.
|
||||||
|
|
||||||
|
RoboTwin tasks live in ``envs/<task_name>.py`` relative to the repository
|
||||||
|
root and are expected to be on ``sys.path`` after installation.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
module = importlib.import_module(f"envs.{task_name}")
|
||||||
|
except ModuleNotFoundError as e:
|
||||||
|
raise ModuleNotFoundError(
|
||||||
|
f"Could not import RoboTwin task '{task_name}'. "
|
||||||
|
"Ensure RoboTwin 2.0 is installed and its 'envs/' directory is on PYTHONPATH. "
|
||||||
|
"See the RoboTwin installation guide: https://robotwin-platform.github.io/doc/usage/robotwin-install.html"
|
||||||
|
) from e
|
||||||
|
task_cls = getattr(module, task_name, None)
|
||||||
|
if task_cls is None:
|
||||||
|
raise AttributeError(f"Task class '{task_name}' not found in envs/{task_name}.py")
|
||||||
|
return task_cls
|
||||||
|
|
||||||
|
|
||||||
|
class RoboTwinEnv(gym.Env):
|
||||||
|
"""Gymnasium wrapper around a single RoboTwin 2.0 task.
|
||||||
|
|
||||||
|
RoboTwin uses a custom SAPIEN-based API (``setup_demo`` / ``get_obs`` /
|
||||||
|
``take_action`` / ``check_success``) rather than the standard gym interface.
|
||||||
|
This class bridges that API to Gymnasium so that ``lerobot-eval`` can drive
|
||||||
|
RoboTwin exactly like LIBERO or Meta-World.
|
||||||
|
|
||||||
|
The underlying SAPIEN environment is created lazily on the first ``reset()``
|
||||||
|
call *inside the worker process*. This is required for
|
||||||
|
``gym.vector.AsyncVectorEnv`` compatibility: SAPIEN allocates EGL/GPU
|
||||||
|
contexts that must not be forked from the parent process.
|
||||||
|
|
||||||
|
Observations
|
||||||
|
------------
|
||||||
|
The ``pixels`` dict uses the raw RoboTwin camera names as keys (e.g.
|
||||||
|
``"head_camera"``, ``"left_camera"``). ``preprocess_observation`` in
|
||||||
|
``envs/utils.py`` then converts these to ``observation.images.<cam>``.
|
||||||
|
|
||||||
|
Actions
|
||||||
|
-------
|
||||||
|
14-dim float32 array in ``[-1, 1]`` (joint-space, 7 DOF per arm).
|
||||||
|
|
||||||
|
Autograd
|
||||||
|
--------
|
||||||
|
``setup_demo`` and ``take_action`` drive CuRobo's Newton trajectory
|
||||||
|
optimizer, which calls ``cost.backward()`` internally. lerobot_eval wraps
|
||||||
|
the rollout in ``torch.no_grad()``, so both call sites re-enable grad.
|
||||||
|
"""
|
||||||
|
|
||||||
|
metadata = {"render_modes": ["rgb_array"], "render_fps": 25}
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
task_name: str,
|
||||||
|
episode_index: int = 0,
|
||||||
|
n_envs: int = 1,
|
||||||
|
camera_names: Sequence[str] = ROBOTWIN_CAMERA_NAMES,
|
||||||
|
observation_height: int | None = None,
|
||||||
|
observation_width: int | None = None,
|
||||||
|
episode_length: int = DEFAULT_EPISODE_LENGTH,
|
||||||
|
render_mode: str = "rgb_array",
|
||||||
|
):
|
||||||
|
super().__init__()
|
||||||
|
self.task_name = task_name
|
||||||
|
self.task = task_name # used by add_envs_task() in utils.py
|
||||||
|
self.task_description = task_name.replace("_", " ")
|
||||||
|
self.episode_index = episode_index
|
||||||
|
self._reset_stride = n_envs
|
||||||
|
self.camera_names = list(camera_names)
|
||||||
|
# Default to D435 dims (the camera type baked into task_config/demo_clean.yml).
|
||||||
|
# The YAML-driven lookup is deferred to reset() so construction doesn't
|
||||||
|
# import RoboTwin's `envs` module — fast-tests run without RoboTwin installed.
|
||||||
|
self.observation_height = observation_height or DEFAULT_CAMERA_H
|
||||||
|
self.observation_width = observation_width or DEFAULT_CAMERA_W
|
||||||
|
self.episode_length = episode_length
|
||||||
|
self._max_episode_steps = episode_length # lerobot_eval.rollout reads this
|
||||||
|
self.render_mode = render_mode
|
||||||
|
|
||||||
|
self._env: Any | None = None # deferred — created on first reset() inside worker
|
||||||
|
self._step_count: int = 0
|
||||||
|
self._black_frame = np.zeros((self.observation_height, self.observation_width, 3), dtype=np.uint8)
|
||||||
|
|
||||||
|
image_spaces = {
|
||||||
|
cam: spaces.Box(
|
||||||
|
low=0,
|
||||||
|
high=255,
|
||||||
|
shape=(self.observation_height, self.observation_width, 3),
|
||||||
|
dtype=np.uint8,
|
||||||
|
)
|
||||||
|
for cam in self.camera_names
|
||||||
|
}
|
||||||
|
self.observation_space = spaces.Dict(
|
||||||
|
{
|
||||||
|
"pixels": spaces.Dict(image_spaces),
|
||||||
|
"agent_pos": spaces.Box(low=-np.inf, high=np.inf, shape=(ACTION_DIM,), dtype=np.float32),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
self.action_space = spaces.Box(
|
||||||
|
low=ACTION_LOW, high=ACTION_HIGH, shape=(ACTION_DIM,), dtype=np.float32
|
||||||
|
)
|
||||||
|
|
||||||
|
def _ensure_env(self) -> None:
|
||||||
|
"""Create the SAPIEN environment on first use.
|
||||||
|
|
||||||
|
Called inside the worker subprocess after fork(), so each worker gets
|
||||||
|
its own EGL/GPU context rather than inheriting a stale one from the
|
||||||
|
parent process (which causes crashes with AsyncVectorEnv).
|
||||||
|
"""
|
||||||
|
if self._env is not None:
|
||||||
|
return
|
||||||
|
task_cls = _load_robotwin_task(self.task_name)
|
||||||
|
self._env = task_cls()
|
||||||
|
|
||||||
|
def _get_obs(self) -> RobotObservation:
|
||||||
|
assert self._env is not None, "_get_obs called before _ensure_env()"
|
||||||
|
raw = self._env.get_obs()
|
||||||
|
cameras_raw = raw.get("observation", {})
|
||||||
|
|
||||||
|
images: dict[str, np.ndarray] = {}
|
||||||
|
for cam in self.camera_names:
|
||||||
|
cam_data = cameras_raw.get(cam)
|
||||||
|
img = cam_data.get("rgb") if cam_data else None
|
||||||
|
if img is None:
|
||||||
|
images[cam] = self._black_frame
|
||||||
|
continue
|
||||||
|
img = np.asarray(img, dtype=np.uint8)
|
||||||
|
if img.ndim == 2:
|
||||||
|
img = np.stack([img, img, img], axis=-1)
|
||||||
|
elif img.shape[-1] != 3:
|
||||||
|
img = img[..., :3]
|
||||||
|
images[cam] = img
|
||||||
|
|
||||||
|
ja = raw.get("joint_action") or {}
|
||||||
|
vec = ja.get("vector")
|
||||||
|
if vec is not None:
|
||||||
|
arr = np.asarray(vec, dtype=np.float32).ravel()
|
||||||
|
joint_state = (
|
||||||
|
arr[:ACTION_DIM] if arr.size >= ACTION_DIM else np.zeros(ACTION_DIM, dtype=np.float32)
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
joint_state = np.zeros(ACTION_DIM, dtype=np.float32)
|
||||||
|
|
||||||
|
return {"pixels": images, "agent_pos": joint_state}
|
||||||
|
|
||||||
|
def reset(self, seed: int | None = None, **kwargs) -> tuple[RobotObservation, dict]:
|
||||||
|
self._ensure_env()
|
||||||
|
super().reset(seed=seed)
|
||||||
|
assert self._env is not None # set by _ensure_env() above
|
||||||
|
|
||||||
|
actual_seed = self.episode_index if seed is None else seed
|
||||||
|
setup_kwargs = _load_robotwin_setup_kwargs(self.task_name)
|
||||||
|
setup_kwargs.update(seed=actual_seed, is_test=True)
|
||||||
|
with torch.enable_grad():
|
||||||
|
self._env.setup_demo(**setup_kwargs)
|
||||||
|
self.episode_index += self._reset_stride
|
||||||
|
self._step_count = 0
|
||||||
|
|
||||||
|
obs = self._get_obs()
|
||||||
|
return obs, {"is_success": False, "task": self.task_name}
|
||||||
|
|
||||||
|
def step(self, action: np.ndarray) -> tuple[RobotObservation, float, bool, bool, dict[str, Any]]:
|
||||||
|
assert self._env is not None, "step() called before reset()"
|
||||||
|
if action.ndim != 1 or action.shape[0] != ACTION_DIM:
|
||||||
|
raise ValueError(f"Expected 1-D action of shape ({ACTION_DIM},), got {action.shape}")
|
||||||
|
|
||||||
|
with torch.enable_grad():
|
||||||
|
if hasattr(self._env, "take_action"):
|
||||||
|
self._env.take_action(action)
|
||||||
|
else:
|
||||||
|
self._env.step(action)
|
||||||
|
|
||||||
|
self._step_count += 1
|
||||||
|
|
||||||
|
is_success = bool(getattr(self._env, "eval_success", False))
|
||||||
|
if not is_success and hasattr(self._env, "check_success"):
|
||||||
|
is_success = bool(self._env.check_success())
|
||||||
|
|
||||||
|
obs = self._get_obs()
|
||||||
|
reward = float(is_success)
|
||||||
|
terminated = is_success
|
||||||
|
truncated = self._step_count >= self.episode_length
|
||||||
|
|
||||||
|
info: dict[str, Any] = {
|
||||||
|
"task": self.task_name,
|
||||||
|
"is_success": is_success,
|
||||||
|
"step": self._step_count,
|
||||||
|
}
|
||||||
|
if terminated or truncated:
|
||||||
|
info["final_info"] = {
|
||||||
|
"task": self.task_name,
|
||||||
|
"is_success": is_success,
|
||||||
|
}
|
||||||
|
self.reset()
|
||||||
|
|
||||||
|
return obs, reward, terminated, truncated, info
|
||||||
|
|
||||||
|
def render(self) -> np.ndarray:
|
||||||
|
self._ensure_env()
|
||||||
|
obs = self._get_obs()
|
||||||
|
# Prefer head camera for rendering; fall back to first available.
|
||||||
|
if "head_camera" in obs["pixels"]:
|
||||||
|
return obs["pixels"]["head_camera"]
|
||||||
|
return next(iter(obs["pixels"].values()))
|
||||||
|
|
||||||
|
def close(self) -> None:
|
||||||
|
if self._env is not None:
|
||||||
|
if hasattr(self._env, "close_env"):
|
||||||
|
import contextlib
|
||||||
|
|
||||||
|
with contextlib.suppress(TypeError):
|
||||||
|
self._env.close_env()
|
||||||
|
self._env = None
|
||||||
|
|
||||||
|
|
||||||
|
# ---- Multi-task factory --------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _make_env_fns(
|
||||||
|
*,
|
||||||
|
task_name: str,
|
||||||
|
n_envs: int,
|
||||||
|
camera_names: list[str],
|
||||||
|
observation_height: int,
|
||||||
|
observation_width: int,
|
||||||
|
episode_length: int,
|
||||||
|
) -> list[Callable[[], RoboTwinEnv]]:
|
||||||
|
"""Return n_envs factory callables for a single task."""
|
||||||
|
|
||||||
|
def _make_one(episode_index: int) -> RoboTwinEnv:
|
||||||
|
return RoboTwinEnv(
|
||||||
|
task_name=task_name,
|
||||||
|
episode_index=episode_index,
|
||||||
|
n_envs=n_envs,
|
||||||
|
camera_names=camera_names,
|
||||||
|
observation_height=observation_height,
|
||||||
|
observation_width=observation_width,
|
||||||
|
episode_length=episode_length,
|
||||||
|
)
|
||||||
|
|
||||||
|
return [partial(_make_one, i) for i in range(n_envs)]
|
||||||
|
|
||||||
|
|
||||||
|
def create_robotwin_envs(
|
||||||
|
task: str,
|
||||||
|
n_envs: int,
|
||||||
|
env_cls: Callable[[Sequence[Callable[[], Any]]], Any] | None = None,
|
||||||
|
camera_names: Sequence[str] = ROBOTWIN_CAMERA_NAMES,
|
||||||
|
observation_height: int = DEFAULT_CAMERA_H,
|
||||||
|
observation_width: int = DEFAULT_CAMERA_W,
|
||||||
|
episode_length: int = DEFAULT_EPISODE_LENGTH,
|
||||||
|
) -> dict[str, dict[int, Any]]:
|
||||||
|
"""Create vectorized RoboTwin 2.0 environments.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
``dict[task_name][0] -> VectorEnv`` — one entry per task, each wrapping
|
||||||
|
``n_envs`` parallel rollouts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
task: Comma-separated list of task names (e.g. ``"beat_block_hammer"``
|
||||||
|
or ``"beat_block_hammer,click_bell"``).
|
||||||
|
n_envs: Number of parallel rollouts per task.
|
||||||
|
env_cls: Vector env constructor (e.g. ``gym.vector.AsyncVectorEnv``).
|
||||||
|
camera_names: Cameras to include in observations.
|
||||||
|
observation_height: Pixel height for all cameras.
|
||||||
|
observation_width: Pixel width for all cameras.
|
||||||
|
episode_length: Max steps before truncation.
|
||||||
|
"""
|
||||||
|
if env_cls is None or not callable(env_cls):
|
||||||
|
raise ValueError("env_cls must be callable (e.g. gym.vector.AsyncVectorEnv).")
|
||||||
|
if not isinstance(n_envs, int) or n_envs <= 0:
|
||||||
|
raise ValueError(f"n_envs must be a positive int; got {n_envs}.")
|
||||||
|
|
||||||
|
task_names = [t.strip() for t in str(task).split(",") if t.strip()]
|
||||||
|
if not task_names:
|
||||||
|
raise ValueError("`task` must contain at least one RoboTwin task name.")
|
||||||
|
|
||||||
|
unknown = [t for t in task_names if t not in ROBOTWIN_TASKS]
|
||||||
|
if unknown:
|
||||||
|
raise ValueError(f"Unknown RoboTwin tasks: {unknown}. Available tasks: {sorted(ROBOTWIN_TASKS)}")
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
"Creating RoboTwin envs | tasks=%s | n_envs(per task)=%d",
|
||||||
|
task_names,
|
||||||
|
n_envs,
|
||||||
|
)
|
||||||
|
|
||||||
|
is_async = env_cls is gym.vector.AsyncVectorEnv
|
||||||
|
cached_obs_space: spaces.Space | None = None
|
||||||
|
cached_act_space: spaces.Space | None = None
|
||||||
|
cached_metadata: dict[str, Any] | None = None
|
||||||
|
|
||||||
|
out: dict[str, dict[int, Any]] = defaultdict(dict)
|
||||||
|
for task_name in task_names:
|
||||||
|
fns = _make_env_fns(
|
||||||
|
task_name=task_name,
|
||||||
|
n_envs=n_envs,
|
||||||
|
camera_names=list(camera_names),
|
||||||
|
observation_height=observation_height,
|
||||||
|
observation_width=observation_width,
|
||||||
|
episode_length=episode_length,
|
||||||
|
)
|
||||||
|
if is_async:
|
||||||
|
lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)
|
||||||
|
if cached_obs_space is None:
|
||||||
|
cached_obs_space = lazy.observation_space
|
||||||
|
cached_act_space = lazy.action_space
|
||||||
|
cached_metadata = lazy.metadata
|
||||||
|
out[task_name][0] = lazy
|
||||||
|
else:
|
||||||
|
out[task_name][0] = env_cls(fns)
|
||||||
|
logger.info("Built vec env | task=%s | n_envs=%d", task_name, n_envs)
|
||||||
|
|
||||||
|
return {k: dict(v) for k, v in out.items()}
|
||||||
@@ -0,0 +1,282 @@
|
|||||||
|
#!/usr/bin/env python
|
||||||
|
|
||||||
|
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
"""Unit tests for the RoboTwin 2.0 Gymnasium wrapper.
|
||||||
|
|
||||||
|
These tests mock out the SAPIEN-based RoboTwin runtime (task modules +
|
||||||
|
YAML config loader) so they run without the full RoboTwin installation
|
||||||
|
(SAPIEN, CuRobo, mplib, asset downloads, etc.).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from contextlib import contextmanager
|
||||||
|
from unittest.mock import MagicMock, patch
|
||||||
|
|
||||||
|
import gymnasium as gym
|
||||||
|
import numpy as np
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from lerobot.envs.robotwin import (
|
||||||
|
ACTION_DIM,
|
||||||
|
ROBOTWIN_CAMERA_NAMES,
|
||||||
|
ROBOTWIN_TASKS,
|
||||||
|
RoboTwinEnv,
|
||||||
|
create_robotwin_envs,
|
||||||
|
)
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Fixtures / helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _make_mock_task_env(
|
||||||
|
height: int = 240,
|
||||||
|
width: int = 320,
|
||||||
|
cameras: tuple[str, ...] = ROBOTWIN_CAMERA_NAMES,
|
||||||
|
) -> MagicMock:
|
||||||
|
"""Return a mock that mimics the RoboTwin task class API.
|
||||||
|
|
||||||
|
RoboTwin's real get_obs returns
|
||||||
|
{"observation": {cam: {"rgb": img}}, "joint_action": {"vector": np.ndarray}, ...}
|
||||||
|
so the mock follows the same nested shape.
|
||||||
|
"""
|
||||||
|
obs_dict = {
|
||||||
|
"observation": {cam: {"rgb": np.zeros((height, width, 3), dtype=np.uint8)} for cam in cameras},
|
||||||
|
"joint_action": {"vector": np.zeros(ACTION_DIM, dtype=np.float32)},
|
||||||
|
"endpose": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
mock = MagicMock()
|
||||||
|
mock.get_obs.return_value = obs_dict
|
||||||
|
mock.setup_demo.return_value = None
|
||||||
|
mock.take_action.return_value = None
|
||||||
|
mock.eval_success = False
|
||||||
|
mock.check_success.return_value = False
|
||||||
|
mock.close_env.return_value = None
|
||||||
|
return mock
|
||||||
|
|
||||||
|
|
||||||
|
@contextmanager
|
||||||
|
def _patch_runtime(mock_task_instance: MagicMock):
|
||||||
|
"""Patch both the task-class loader and the YAML config loader so the
|
||||||
|
env can construct + reset without a real RoboTwin install."""
|
||||||
|
task_cls = MagicMock(return_value=mock_task_instance)
|
||||||
|
fake_setup = {
|
||||||
|
"head_camera_h": 240,
|
||||||
|
"head_camera_w": 320,
|
||||||
|
"left_embodiment_config": {},
|
||||||
|
"right_embodiment_config": {},
|
||||||
|
"left_robot_file": "",
|
||||||
|
"right_robot_file": "",
|
||||||
|
"dual_arm_embodied": True,
|
||||||
|
"render_freq": 0,
|
||||||
|
"task_name": "beat_block_hammer",
|
||||||
|
"task_config": "demo_clean",
|
||||||
|
}
|
||||||
|
with (
|
||||||
|
patch("lerobot.envs.robotwin._load_robotwin_task", return_value=task_cls),
|
||||||
|
patch("lerobot.envs.robotwin._load_robotwin_setup_kwargs", return_value=fake_setup),
|
||||||
|
):
|
||||||
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# RoboTwinEnv unit tests
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class TestRoboTwinEnv:
|
||||||
|
def test_observation_space_shape(self):
|
||||||
|
"""observation_space should have the configured h×w×3 for every camera."""
|
||||||
|
h, w = 240, 320
|
||||||
|
env = RoboTwinEnv(
|
||||||
|
task_name="beat_block_hammer",
|
||||||
|
observation_height=h,
|
||||||
|
observation_width=w,
|
||||||
|
camera_names=["head_camera", "left_camera"],
|
||||||
|
)
|
||||||
|
pixels_space = env.observation_space["pixels"]
|
||||||
|
assert pixels_space["head_camera"].shape == (h, w, 3)
|
||||||
|
assert pixels_space["left_camera"].shape == (h, w, 3)
|
||||||
|
assert "right_camera" not in pixels_space
|
||||||
|
|
||||||
|
def test_action_space(self):
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
assert env.action_space.shape == (ACTION_DIM,)
|
||||||
|
assert env.action_space.dtype == np.float32
|
||||||
|
|
||||||
|
def test_reset_returns_correct_obs_keys(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
obs, info = env.reset()
|
||||||
|
|
||||||
|
assert "pixels" in obs
|
||||||
|
for cam in ROBOTWIN_CAMERA_NAMES:
|
||||||
|
assert cam in obs["pixels"], f"Missing camera '{cam}' in obs"
|
||||||
|
assert "agent_pos" in obs
|
||||||
|
assert obs["agent_pos"].shape == (ACTION_DIM,)
|
||||||
|
assert info["is_success"] is False
|
||||||
|
|
||||||
|
def test_reset_calls_setup_demo(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset(seed=42)
|
||||||
|
# setup_demo receives the full YAML-derived kwargs plus seed + is_test;
|
||||||
|
# we only assert the caller-provided bits.
|
||||||
|
assert mock_task.setup_demo.call_count == 1
|
||||||
|
call_kwargs = mock_task.setup_demo.call_args.kwargs
|
||||||
|
assert call_kwargs["seed"] == 42
|
||||||
|
assert call_kwargs["is_test"] is True
|
||||||
|
|
||||||
|
def test_step_returns_correct_types(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
action = np.zeros(ACTION_DIM, dtype=np.float32)
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset()
|
||||||
|
obs, reward, terminated, truncated, info = env.step(action)
|
||||||
|
|
||||||
|
assert isinstance(obs, dict)
|
||||||
|
assert isinstance(reward, float)
|
||||||
|
assert isinstance(terminated, bool)
|
||||||
|
assert isinstance(truncated, bool)
|
||||||
|
assert isinstance(info, dict)
|
||||||
|
|
||||||
|
def test_step_wrong_action_shape_raises(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
bad_action = np.zeros(7, dtype=np.float32) # wrong dim
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset()
|
||||||
|
with pytest.raises(ValueError, match="Expected 1-D action"):
|
||||||
|
env.step(bad_action)
|
||||||
|
|
||||||
|
def test_success_terminates_episode(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
mock_task.check_success.return_value = True
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
action = np.zeros(ACTION_DIM, dtype=np.float32)
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset()
|
||||||
|
_, _, terminated, _, info = env.step(action)
|
||||||
|
assert terminated is True
|
||||||
|
assert info["is_success"] is True
|
||||||
|
|
||||||
|
def test_truncation_after_episode_length(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer", episode_length=2)
|
||||||
|
action = np.zeros(ACTION_DIM, dtype=np.float32)
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset()
|
||||||
|
env.step(action) # step 1
|
||||||
|
_, _, _, truncated, _ = env.step(action) # step 2 → truncated
|
||||||
|
assert truncated is True
|
||||||
|
|
||||||
|
def test_close_calls_close_env(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
env.reset()
|
||||||
|
env.close()
|
||||||
|
mock_task.close_env.assert_called_once()
|
||||||
|
|
||||||
|
def test_black_frame_for_missing_camera(self):
|
||||||
|
"""If a camera key is absent from get_obs(), a black frame is returned."""
|
||||||
|
# Mock exposes only head_camera; we ask for both head_camera + left_camera.
|
||||||
|
mock_task = _make_mock_task_env(height=10, width=10, cameras=("head_camera",))
|
||||||
|
env = RoboTwinEnv(
|
||||||
|
task_name="beat_block_hammer",
|
||||||
|
camera_names=["head_camera", "left_camera"],
|
||||||
|
observation_height=10,
|
||||||
|
observation_width=10,
|
||||||
|
)
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
obs, _ = env.reset()
|
||||||
|
assert obs["pixels"]["left_camera"].shape == (10, 10, 3)
|
||||||
|
assert obs["pixels"]["left_camera"].sum() == 0
|
||||||
|
|
||||||
|
def test_task_and_task_description_attributes(self):
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
assert env.task == "beat_block_hammer"
|
||||||
|
assert isinstance(env.task_description, str)
|
||||||
|
|
||||||
|
def test_deferred_init_env_is_none_before_reset(self):
|
||||||
|
env = RoboTwinEnv(task_name="beat_block_hammer")
|
||||||
|
assert env._env is None # noqa: SLF001 (testing internal state)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# create_robotwin_envs tests
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class TestCreateRoboTwinEnvs:
|
||||||
|
def test_returns_correct_structure(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
envs = create_robotwin_envs(
|
||||||
|
task="beat_block_hammer",
|
||||||
|
n_envs=1,
|
||||||
|
env_cls=gym.vector.SyncVectorEnv,
|
||||||
|
)
|
||||||
|
assert "beat_block_hammer" in envs
|
||||||
|
assert 0 in envs["beat_block_hammer"]
|
||||||
|
assert isinstance(envs["beat_block_hammer"][0], gym.vector.SyncVectorEnv)
|
||||||
|
|
||||||
|
def test_multi_task(self):
|
||||||
|
mock_task = _make_mock_task_env()
|
||||||
|
with _patch_runtime(mock_task):
|
||||||
|
envs = create_robotwin_envs(
|
||||||
|
task="beat_block_hammer,click_bell",
|
||||||
|
n_envs=1,
|
||||||
|
env_cls=gym.vector.SyncVectorEnv,
|
||||||
|
)
|
||||||
|
assert set(envs.keys()) == {"beat_block_hammer", "click_bell"}
|
||||||
|
|
||||||
|
def test_unknown_task_raises(self):
|
||||||
|
with pytest.raises(ValueError, match="Unknown RoboTwin tasks"):
|
||||||
|
create_robotwin_envs(
|
||||||
|
task="not_a_real_task",
|
||||||
|
n_envs=1,
|
||||||
|
env_cls=gym.vector.SyncVectorEnv,
|
||||||
|
)
|
||||||
|
|
||||||
|
def test_invalid_n_envs_raises(self):
|
||||||
|
with pytest.raises(ValueError, match="n_envs must be a positive int"):
|
||||||
|
create_robotwin_envs(
|
||||||
|
task="beat_block_hammer",
|
||||||
|
n_envs=0,
|
||||||
|
env_cls=gym.vector.SyncVectorEnv,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ROBOTWIN_TASKS list
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def test_task_list_not_empty():
|
||||||
|
assert len(ROBOTWIN_TASKS) >= 50
|
||||||
|
|
||||||
|
|
||||||
|
def test_all_tasks_are_strings():
|
||||||
|
assert all(isinstance(t, str) and t for t in ROBOTWIN_TASKS)
|
||||||
|
|
||||||
|
|
||||||
|
def test_no_duplicate_tasks():
|
||||||
|
assert len(ROBOTWIN_TASKS) == len(set(ROBOTWIN_TASKS))
|
||||||
Reference in New Issue
Block a user