Resolves conflicts introduced by the RoboCasa365 benchmark merge on main
(PR #3375), which overlaps with this PR's benchmark CI scaffolding.
- .github/workflows/benchmark_tests.yml: keep both the RoboTwin 2.0 job
(from this branch) and the new RoboCasa365 job (from main).
- docs/source/_toctree.yml: list both `robotwin` and `robocasa` pages
under the "Benchmarks" section.
- scripts/ci/extract_task_descriptions.py: keep both
`_robotwin_descriptions` and `_robocasa_descriptions` helpers and wire
them into `main()` alongside each other.
Made-with: Cursor
* feat(envs): add RoboCasa365 benchmark integration
Add RoboCasa365 (arXiv:2603.04356) as a new simulation benchmark with
365 everyday kitchen manipulation tasks across 2,500 diverse environments.
New files:
- src/lerobot/envs/robocasa.py: gym.Env wrapper with deferred env creation
  (sketched below), flat 12D action / 16D state vectors, 3-camera support
- docs/source/robocasa.mdx: user-facing documentation
- docker/Dockerfile.benchmark.robocasa: CI benchmark image
Modified files:
- src/lerobot/envs/configs.py: RoboCasaEnv config (--env.type=robocasa)
- pyproject.toml: robocasa optional dependency group
- docs/source/_toctree.yml: sidebar entry
- .github/workflows/benchmark_tests.yml: integration test job
Refs: https://arxiv.org/abs/2603.04356, https://robocasa.ai
Related: huggingface/lerobot#321
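A minimal sketch of the deferred-creation pattern (the inner-env builder is a
stand-in; the real wrapper is src/lerobot/envs/robocasa.py):

```python
import gymnasium as gym

class LazyRoboCasaEnv(gym.Env):
    """Sketch only — illustrates deferral, not the merged wrapper."""

    def __init__(self, task: str):
        self._task = task
        self._env = None  # the heavy robosuite env is not built yet

    def _make_inner_env(self):
        raise NotImplementedError  # stands in for the robosuite construction

    def reset(self, *, seed=None, options=None):
        if self._env is None:
            self._env = self._make_inner_env()  # deferred until first use
        return self._env.reset(seed=seed, options=options)
```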
* fix(docker): use uv pip to install robocasa in benchmark image
The huggingface/lerobot-gpu base image uses `uv` with a venv at
/lerobot/.venv — `pip` is not on PATH, so `pip install` fails with
"pip: not found". Switch to `uv pip install` which installs into the
existing venv.
Also drop the @v1.0.0 tag pin from the robocasa git URL since the
upstream repo may not have that tag; use default branch instead.
* fix(robocasa): editable install + switch to lerobot/smolvla_robocasa
- pip install from git omits data files like box_links_assets.json
(not declared in package_data). Clone and install editable so the
source tree is used at runtime.
- Download only the tex + fixtures_lw asset types (the smoke test doesn't
  need objaverse/aigen objects). Pipe 'y' to auto-accept the download prompt.
- Switch CI policy from pepijn223/smolvla_robocasa to lerobot/smolvla_robocasa.
* fix(docker): re-install lerobot editably after COPY
The nightly huggingface/lerobot-gpu image predates the RoboCasaEnv
registration — so `lerobot-eval --env.type=robocasa` fails at argparse
with "invalid choice" even after COPY . . overlays the new source.
Force an editable reinstall so the venv picks up the current configs.py.
* fix(ci): add rename_map for robocasa eval (image* -> camera*)
Policy lerobot/smolvla_robocasa expects observation.images.camera1/2/3,
but RoboCasaEnv produces observation.images.image/image2/image3.
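The remapping amounts to a dict like this (illustrative — CI passes the
equivalent mapping via the --rename_map flag on lerobot-eval):

```python
# Keys/values per the commit text; the dict literal itself is illustrative.
rename_map = {
    "observation.images.image": "observation.images.camera1",
    "observation.images.image2": "observation.images.camera2",
    "observation.images.image3": "observation.images.camera3",
}
```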
* fix(robocasa): override RoboCasaGymEnv default split (test -> all)
RoboCasaGymEnv defaults split="test", but create_env only accepts
{None, "all", "pretrain", "target"}, so the out-of-the-box default
crashes with ValueError. Always pass "all" when split is None.
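The fix reduces to a guard like this (a sketch, not the merged wrapper code):

```python
def resolve_split(split: str | None) -> str:
    # create_env accepts {None, "all", "pretrain", "target"}; never forward
    # RoboCasaGymEnv's own default ("test"), which create_env rejects.
    return "all" if split is None else split
```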
* fix(docker): also download objs_lw (lightwheel objects) for robocasa
Kitchen tasks (e.g. CloseFridge) reference lightwheel object meshes
like Stool022/model.xml. fixtures_lw alone isn't enough — we also
need objs_lw. Still skipping objaverse/aigen to keep image size down.
Made-with: Cursor
* feat(robocasa): raw camera names + benchmark-group task shortcuts
Align the LeRobot env with RoboCasa's native conventions so policies
trained on the upstream datasets don't need a --rename_map at eval
time, and expose the standard task groups as first-class --env.task
values.
- Preserve raw RoboCasa camera names (e.g. robot0_agentview_left)
  as observation.images.<name> end-to-end. Drops camera_name_mapping
  and DEFAULT_CAMERA_NAME_MAPPING; features/features_map are now
  built dynamically from the parsed camera list (sketched after this
  list).
- Accept benchmark-group names as --env.task: atomic_seen,
composite_seen, composite_unseen, pretrain50/100/200/300. Expanded
lazily via robocasa.utils.dataset_registry and auto-sets the
split ("target" | "pretrain").
- Update CI smoke-eval rename_map to map raw cam names to the
camera1/2/3 keys expected by lerobot/smolvla_robocasa.
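The dynamic feature construction, in sketch form (key format from the commit;
the shape tuple and helper name are illustrative):

```python
def build_image_features(camera_names: list[str], h: int, w: int) -> dict:
    # One observation.images.<raw_name> entry per parsed camera, e.g.
    # "observation.images.robot0_agentview_left" -> (h, w, 3)
    return {f"observation.images.{name}": (h, w, 3) for name in camera_names}
```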
* docs(robocasa): single-task smolvla train+eval recipe on pepijn223/robocasa_CloseFridge
- Rewrite observation section to use raw RoboCasa camera keys
(observation.images.robot0_agentview_{left,right},
observation.images.robot0_eye_in_hand).
- Add a "Training on a single task" section with a full smolvla
training command on pepijn223/robocasa_CloseFridge, plus matching
single-task eval command.
- Document benchmark-group task shortcuts (atomic_seen, composite_seen,
composite_unseen, pretrain50/100/200/300) as valid --env.task values.
* fix(robocasa): restrict obj_registries to lightwheel by default
CloseFridge (and most kitchen tasks) crashed at reset with
`ValueError: Probabilities contain NaN` coming out of
`sample_kitchen_object_helper`. RoboCasa's upstream default
`obj_registries=("objaverse", "lightwheel")` normalizes per-registry
candidate counts as probabilities; when a sampled category has zero
mjcf paths in every configured registry (because the objaverse asset
pack isn't on disk — ~30GB, skipped by our Docker build), the 0/0
divide yields NaNs and `rng.choice` raises.
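A toy reproduction of the failure mode (plain numpy; RoboCasa's actual
sampler is more involved):

```python
import numpy as np

counts = np.array([0.0, 0.0])  # zero mjcf candidates in every registry
probs = counts / counts.sum()  # 0/0 -> array([nan, nan]), RuntimeWarning
rng = np.random.default_rng(0)
rng.choice(2, p=probs)         # raises ValueError: "Probabilities contain NaN"
```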
- Add `obj_registries: list[str] = ["lightwheel"]` to `RoboCasaEnv`
config; thread it through `create_robocasa_envs`, `_make_env_fns`,
and the gym.Env wrapper to the underlying `RoboCasaGymEnv` (which
forwards to `create_env` → `robosuite.make` → kitchen env).
- Default matches what `download_kitchen_assets --type objs_lw`
actually ships, so the env works out of the box without a 30GB
objaverse download.
- Document the override (`--env.obj_registries='[objaverse,lightwheel]'`)
for users who have downloaded the full asset set.
* fix(docker): also download tex_generative for robocasa benchmark
RoboCasa's lightwheel kitchen fixtures embed references to
`generative_textures/wall/tex*.png` directly in their MuJoCo XML, so
`MjModel.from_xml_string` errors out at reset time with
"No such file or directory" even when the env is constructed with
`generative_textures=None`. The generative textures live under a
separate asset registry key (`tex_generative`) in
`download_kitchen_assets`, distinct from the base `tex` pack we were
already fetching.
- Add `tex_generative` to the download list so the fixture XMLs
resolve.
- Document the remaining omissions (objaverse/aigen, ~30GB) and how
the runtime side pairs this with obj_registries=["lightwheel"] to
avoid sampling from categories whose assets aren't on disk.
* ci(robocasa): smoke-eval 10 atomic tasks instead of 1
Broader coverage in the benchmark CI job: evaluate SmolVLA on ten
fixture-centric atomic RoboCasa tasks (one episode each) instead of
just CloseFridge. The tasks are all drawn from TARGET_TASKS.atomic_seen
and selected to avoid object-manipulation categories that would require
the objaverse/aigen asset packs (we only ship objs_lw in the Docker
image, paired with obj_registries=["lightwheel"] on the runtime side).
Tasks: CloseFridge, OpenCabinet, OpenDrawer, TurnOnMicrowave,
TurnOffStove, CloseToasterOvenDoor, SlideDishwasherRack,
TurnOnSinkFaucet, NavigateKitchen, TurnOnElectricKettle.
`scripts/ci/parse_eval_metrics.py` already handles multi-task output
via the `overall` key, so no parser changes needed. Bumped the metrics
artifact's task label to `atomic_smoke_10` to reflect the grouping.
* fix(pyproject): drop unresolvable robocasa extra
robocasa's upstream setup.py hardcodes `lerobot==0.3.3` in
install_requires. Exposing it as the `lerobot[robocasa]` extra made
uv's dep resolver cycle: `lerobot[robocasa]` -> robocasa -> lerobot
(a different version) -> unsolvable. This broke every `uv sync` — even
invocations with an unrelated extra like `--extra test` — because uv
validates the whole lockfile graph.
- Remove the `robocasa` extra from pyproject.toml. Installation
instructions in docs/source/robocasa.mdx now walk users through the
manual `git clone` + `pip install --no-deps` flow, which matches
what the Docker image already does and sidesteps the cyclic dep
entirely.
- Dockerfile: `uv pip install -e ~/robocasa --no-deps` so the
shadowed lerobot==0.3.3 never lands in the image; install
robocasa's actual runtime deps (numpy, numba, scipy, mujoco,
tianshou, etc.) explicitly.
* docs(robocasa): align page with adding_benchmarks template
Rework docs/source/robocasa.mdx to follow the standard benchmark doc
structure: intro + links + available tasks (with family breakdown and
first-class benchmark-group shortcuts) + installation + eval +
recommended episodes + policy I/O + training + reproducing results.
- Fix the paper link (was pointing at a non-existent arxiv ID).
- Surface lerobot/smolvla_robocasa and pepijn223/robocasa_CloseFridge
in the top-of-page links so they're findable without reading the
training section.
- Add an explicit "Object registries" subsection explaining the
`--env.obj_registries=[objaverse,lightwheel]` override path.
- Add an explicit "Reproducing published results" section pointing
at the CI smoke eval.
* fix: integrate PR #3375 review feedback
- envs(robocasa): hoist the duplicated `_parse_camera_names` helper
out of `libero.py` and `robocasa.py` into `envs/utils.py` as the
public `parse_camera_names`; call sites updated.
- envs(robocasa): give each factory a distinct `episode_index`
  (`0..n_envs-1`) and derive a per-worker seed series in `reset()`
  so n_envs workers don't all roll the same scene under a shared
  outer seed (sketched after this list).
- envs(robocasa): drop the unused `**kwargs` on `_make_env`; declare
`visualization_height` / `visualization_width` on both the wrapper
and the `RoboCasaEnv` config + propagate via `gym_kwargs`.
- envs(robocasa): emit `info["final_info"]` on termination (matching
MetaWorld) so downstream vector-env auto-reset keeps the terminal
task/success flags.
- docs(robocasa): add `--rename_map` (robot0_agentview_left/
eye_in_hand/agentview_right → camera1/2/3) plus CI-parity flags to
all three eval snippets.
- docker(robocasa): pin robocasa + robosuite git SHAs and the pip
dep versions (pygame, Pillow, opencv-python, pyyaml, pynput, tqdm,
termcolor, imageio, h5py, lxml, hidapi, gymnasium) for
reproducible benchmark images.
- ci(robocasa): update the workflow comment — there is no
`lerobot[robocasa]` extra; robocasa/robosuite are installed
manually because upstream's `lerobot==0.3.3` pin shadows ours.
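The per-worker seeding scheme, sketched (attribute name from the bullet above;
the derivation arithmetic is illustrative):

```python
class RoboCasaWorkerSketch:
    def __init__(self, episode_index: int):
        self.episode_index = episode_index  # distinct per factory: 0..n_envs-1

    def inner_seed(self, seed: int | None) -> int:
        # Distinct per worker even under a shared (or absent) outer seed;
        # the exact arithmetic is illustrative.
        if seed is None:
            return self.episode_index
        return seed * 1000 + self.episode_index
```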
* docs(robocasa): add benchmark banner image
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch. Also threads the cached metadata
through the RoboCasa factory so async eval on `--env.type=robocasa`
keeps the same improvement.
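The shape of the fix, roughly (names illustrative; #3416 carries the real
_LazyAsyncVectorEnv):

```python
class LazyMetadataVectorEnvSketch:
    def __init__(self, env_fns):
        self._env_fns = env_fns
        self._metadata = None

    @property
    def metadata(self):
        # Probe one env once instead of spinning up every async worker just
        # to answer a metadata lookup.
        if self._metadata is None:
            probe = self._env_fns[0]()
            self._metadata = probe.metadata
            probe.close()
        return self._metadata
```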
* fix: integrate PR #3375 review feedback (round 2)
- envs(robocasa): when the caller passes `seed=None` to `reset()`,
fall back to `self.episode_index` for the inner env seed so each
worker still samples a distinct trajectory instead of all workers
inheriting the same global RNG state.
- envs(robocasa): replace the two module-level `print()` calls in
`create_robocasa_envs` with `logger.info(...)` via a module-level
`logger = logging.getLogger(__name__)`.
- ci(robocasa): run `scripts/ci/extract_task_descriptions.py` after
  the eval so `metrics.json` carries per-task natural-language
  labels, matching LIBERO / MetaWorld / VLABench jobs. Added a
  `_robocasa_descriptions()` extractor that splits CamelCase task
  names into word-level labels keyed by `<task>_0` (sketched after
  this list).
- envs(robotwin): default `observation_height/width` in
`create_robotwin_envs` to `DEFAULT_CAMERA_H/W` (240/320) so they
match the D435 dims baked into `task_config/demo_clean.yml`.
- envs(robotwin): resolve `task_config/demo_clean.yml` via
`CONFIGS_PATH` instead of a cwd-relative path; works regardless
of where `lerobot-eval` is invoked.
- envs(robotwin): replace `print()` calls in `create_robotwin_envs`
with `logger.info(...)` (module-level `logger = logging.getLogger`).
- envs(robotwin): use `_LazyAsyncVectorEnv` for the async path so
async workers start lazily (matches LIBERO / RoboCasa / VLABench).
- envs(robotwin): cast `agent_pos` space + joint-state output to
float32 end-to-end (was mixed float64/float32).
- envs(configs): use the existing `_make_vec_env_cls(use_async,
n_envs)` helper in `RoboTwinEnvConfig.create_envs`; drop the
`get_env_processors` override so RoboTwin uses the identity
processor inherited from `EnvConfig`.
- processor: delete `RoboTwinProcessorStep` — the float32 cast now
happens in the wrapper itself, so the processor is redundant.
- tests: drop the `TestRoboTwinProcessorStep` suite; update the
mock obs fixture to use float32 `joint_action.vector`.
- ci: hoist `ROBOTWIN_POLICY` and `ROBOTWIN_TASKS` to job-level
env vars so the task list and policy aren't duplicated across
eval / extract / parse steps.
- docker: pin RoboTwin + CuRobo upstream clones to commit SHAs
(`RoboTwin@0aeea2d6`, `curobo@ca941586`) for reproducibility.
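The CamelCase split in `_robocasa_descriptions()` amounts to something like
this (sketch; the CI script's exact regex and keying may differ):

```python
import re

def describe(task: str) -> str:
    # "CloseToasterOvenDoor" -> "close toaster oven door"
    return " ".join(re.findall(r"[A-Z][a-z]*|\d+", task)).lower()

labels = {f"{t}_0": describe(t) for t in ("CloseFridge", "TurnOnSinkFaucet")}
# {'CloseFridge_0': 'close fridge', 'TurnOnSinkFaucet_0': 'turn on sink faucet'}
```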
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): skip Docker Hub login on fork PRs without secrets
On fork PRs, `secrets.DOCKERHUB_LEROBOT_*` expand to empty strings,
which fails `docker/login-action@v3` with `Error: Username and
password required` before any of the actual build/eval work runs.
Gate the login step on the env-var expansion of the username so the
step is skipped (not failed) when secrets are absent. On the main
repo + maintainer-approved fork runs (`pull_request_review` path),
the secrets resolve normally, the step runs, and image pulls get
the authenticated Docker Hub rate limit.
Scope: only `benchmark_tests.yml`, the lone benchmark workflow that
triggers on `pull_request` from forks. `full_tests.yml` and
`latest_deps_tests.yml` run under `pull_request_review` / schedule /
workflow_dispatch, where secrets are already guaranteed.
Context: surfaced on #3416 where an external contributor's PR failed
at the login step before any test could run.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tests): align RoboTwin task-count assertion with the 50-task tuple
ROBOTWIN_TASKS was trimmed to 50 tasks (see comment in
`src/lerobot/envs/robotwin.py:48`), but the assertion still
required ≥60, causing CI failures. Align the test with the
current upstream task count.
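The aligned check, roughly (import path per the robotwin.py reference above):

```python
from lerobot.envs.robotwin import ROBOTWIN_TASKS

def test_robotwin_task_count():
    # Upstream envs/ ships 50 tasks after the trim; the old check wanted >= 60.
    assert len(ROBOTWIN_TASKS) == 50
```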
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(record): pass rename_map to make_policy in lerobot-record
Fixes #3181. The rename_map from the dataset config was used for
preprocessor construction but was not passed to make_policy(), causing
feature-mismatch errors when camera key names differ between the dataset
and the model config.
make_policy() already accepts a rename_map parameter and uses it to skip
visual feature consistency validation when remapping is active, but
lerobot_record.py was not passing it through.
* style: fix ruff format for ternary expression
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* fix(robotwin): resync ROBOTWIN_TASKS with upstream envs/ (60 -> 50)
The local ROBOTWIN_TASKS tuple drifted from upstream
RoboTwin-Platform/RoboTwin. Users passing names like `close_laptop`,
`close_microwave`, `dump_bin`, `place_block`, `pour_water`,
`fold_cloth`, etc. got past our validator (the names were in the
tuple) but then crashed deep inside RoboTwin with a confusing error,
because those tasks don't exist in upstream `envs/`.
- Replace ROBOTWIN_TASKS with a verbatim mirror of upstream's
  `envs/` directory: 50 tasks as of main (was 60 with many
  stale entries). Added a `gh api`-based one-liner comment so
  future bumps are mechanical (a Python equivalent is sketched
  after this list).
- Update the `60 tasks` claims in robotwin.mdx and
RoboTwinEnvConfig's docstring to `50`.
- Replace the stale example-task table in robotwin.mdx with ten
upstream-confirmed examples, and flag `open_laptop` as
temporarily broken (its `check_success()` uses `self.arm_tag`
which is only set inside `play_once()`; eval-mode callers hit
AttributeError).
- Rebuild the "Full benchmark" command with the actual 50-task
list, omitting `open_laptop`.
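The commit's refresh comment uses `gh api`; a Python equivalent under the
same assumptions (public GitHub API, upstream repo/path per the text above):

```python
import json, urllib.request

url = "https://api.github.com/repos/RoboTwin-Platform/RoboTwin/contents/envs"
with urllib.request.urlopen(url) as resp:
    entries = json.load(resp)
tasks = sorted(
    e["name"].removesuffix(".py")
    for e in entries
    if e["type"] == "file" and e["name"].endswith(".py")
    and not e["name"].startswith("_")
)
print(len(tasks), tasks)
```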
Made-with: Cursor
* fix(ci): swap four broken RoboTwin smoke-eval tasks for working ones
The smoke eval hit two upstream issues:
- `open_laptop`: bug in OpenMOSS/RoboTwin main — `check_success()` uses
`self.arm_tag`, but that attribute is only set inside `play_once()`
(the scripted-expert path). During eval `take_action()` calls
`check_success()` directly, hitting `AttributeError: 'open_laptop'
object has no attribute 'arm_tag'`.
- `close_laptop`, `close_microwave`, `place_block`: not present in
upstream RoboTwin `envs/` at all — our ROBOTWIN_TASKS tuple drifted
from upstream and these names leaked into CI.
Replace the four broken tasks with upstream-confirmed equivalents
that exist both in ROBOTWIN_TASKS and in RoboTwin's `envs/`:
`adjust_bottle`, `lift_pot`, `stamp_seal`, `turn_switch`.
New 10-task smoke set: beat_block_hammer, click_bell, handover_block,
stack_blocks_two, click_alarmclock, open_microwave, adjust_bottle,
lift_pot, stamp_seal, turn_switch.
Made-with: Cursor
* ci(robotwin): smoke-eval 10 tasks instead of 5
Broader coverage on the RoboTwin 2.0 benchmark CI job: bump the smoke
eval from 5 tasks to 10 (one episode each). Added tasks are all drawn
from ROBOTWIN_TASKS and mirror the shape/complexity of the existing
set (simple single-object or single-fixture manipulations).
Tasks now run: beat_block_hammer, click_bell, handover_block,
open_laptop, stack_blocks_two, click_alarmclock, close_laptop,
close_microwave, open_microwave, place_block.
`parse_eval_metrics.py` reads `overall` for multi-task runs so no
parser change is needed. Bumped the step name and the metrics label
to reflect the 10-task layout.
Made-with: Cursor
* fix: address benchmark CI, Docker, and docs review feedback
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- docker: tie patches to pinned versions with removal guidance, remove
unnecessary HF_TOKEN for public dataset, fix hadolint warnings
- docs: fix paper link to arxiv, add teaser image, fix camera names
(4→3 cameras), fix observation dims (480x640→240x320)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robotwin): stop importing RoboTwin in __init__
__init__ was eagerly calling _load_robotwin_setup_kwargs just to read
head_camera_h/w from the YAML. That import (`from envs import CONFIGS_PATH`)
required a real RoboTwin install, so constructing the env — and thus every
test in tests/envs/test_robotwin.py — blew up with ModuleNotFoundError
in fast tests, where RoboTwin isn't installed.
Replace the eager lookup with DEFAULT_CAMERA_H/W constants (240×320, the
D435 dims baked into task_config/demo_clean.yml). reset() still resolves
the full setup_kwargs lazily — that's fine because reset() is only
called inside the benchmark Docker image where RoboTwin is present.
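In sketch form (constant names from the text; the class body is illustrative):

```python
DEFAULT_CAMERA_H, DEFAULT_CAMERA_W = 240, 320  # D435 dims per demo_clean.yml

class RoboTwinEnvSketch:
    def __init__(self):
        # No RoboTwin import here, so fast tests can construct the env.
        self.obs_h, self.obs_w = DEFAULT_CAMERA_H, DEFAULT_CAMERA_W

    def _load_robotwin_setup_kwargs(self) -> dict:
        raise NotImplementedError  # real version imports RoboTwin + reads YAML

    def reset(self):
        # First point that needs RoboTwin importable — fine, since reset()
        # only runs inside the benchmark Docker image.
        return self._load_robotwin_setup_kwargs()
```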
Also resync the test file with the current env API:
- mock get_obs() as the real nested {"observation": {cam: {"rgb": …}},
"joint_action": {"vector": …}} shape
- patch both _load_robotwin_task and _load_robotwin_setup_kwargs
(_patch_load → _patch_runtime)
- drop `front_camera` / `left_wrist` from assertions — aloha-agilex
exposes head_camera + left_camera + right_camera, not those
- black-frame test now uses left_camera as the missing camera
- setup_demo call check loosened to the caller-provided seed/is_test
bits (full kwargs include the YAML-derived blob)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(robotwin): add robotwin.mdx to _toctree.yml
doc-builder's TOC integrity check was rejecting the branch because
docs/source/robotwin.mdx existed but wasn't listed in _toctree.yml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(docker): base the RoboTwin image on the nightly GPU image
Mirror the libero/metaworld/libero_plus/robomme pattern: start from the
nightly GPU image (apt deps, python, uv, venv, lerobot[all] already
there) and layer on only what RoboTwin 2.0 uniquely needs —
cuda-nvcc + cuda-cudart-dev (CuRobo builds from source), Vulkan libs +
NVIDIA ICD (SAPIEN renderer), sapien/mplib/open3d/pytorch3d/curobo
installs, the mplib + sapien upstream patches, and the TianxingChen
asset download.
Drops ~90 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init, base lerobot install): 199 → 110 lines.
Also repoint the docs + env docstring dataset link from
hxma/RoboTwin-LeRobot-v3.0 to the canonical lerobot/robotwin_unified.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* ci: switch benchmark policies to the lerobot/ mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* ci(robotwin): smoke-eval 5 tasks instead of 1
Widen the smoke eval from a single task (beat_block_hammer) to five,
adding click_bell, handover_block, open_laptop, and stack_blocks_two on
top of the original. Each gets its own rollout video in videos/<task>_0/
so the dashboard can surface visually distinct behaviours.
extract_task_descriptions.py now has a RoboTwin branch that reads
`description/task_instruction/<task>.json` (already shipped in the clone
at /opt/robotwin) and pulls the `full_description` field. The CI job
cd's into the clone before invoking the script so the relative path
resolves.
parse_eval_metrics.py is invoked with the same 5-task list so the
metrics.json embeds one entry per task.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docker): install PyAV so RoboTwin rollout videos get written
write_video (utils/io_utils.py:53) lazily imports PyAV via require_package;
when the extra is not installed, the error is swallowed inside the
video-writing thread — so the eval itself succeeds with pc_success=100, but
no MP4 ever lands in videos/ and the artifact upload reports "No files were
found". Add the `av` dependency to the install line (same pattern as the
RoboMME image).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robotwin): expose _max_episode_steps on the env wrapper
rollout() does `env.call("_max_episode_steps")` (lerobot_eval.py:157) to
know when to stop stepping. LiberoEnv and MetaworldEnv set this attribute;
RoboTwinEnv was tracking the limit under `episode_length` only, so the call
raised AttributeError once CuRobo finished warming up.
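Sketched, the fix is one extra attribute:

```python
class RoboTwinEnvSketch:
    def __init__(self, episode_length: int):
        self.episode_length = episode_length
        # What env.call("_max_episode_steps") looks up (lerobot_eval.py:157),
        # matching LiberoEnv / MetaworldEnv.
        self._max_episode_steps = episode_length
```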
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robotwin): declare the actual 240x320 D435 observation dims
lerobot_eval crashed in gym.vector's SyncVectorEnv.reset with:
ValueError: Output array is the wrong shape
because RoboTwinEnvConfig declared observation_space = (480, 640, 3) but
task_config/demo_clean.yml specifies head_camera_type=D435, which renders
(240, 320, 3). gym.vector.concatenate pre-allocates a buffer from the
declared space, so the first np.stack raises on shape mismatch.
Changes:
- Config defaults now 240×320 (the D435 dims in _camera_config.yml), with
a comment pointing at the source of truth.
- RoboTwinEnv.__init__ accepts observation_height/width as Optional and
  falls back to setup_kwargs["head_camera_h/w"] so the env is self-consistent
  even if the config is not in sync (sketched after this list).
- Config camera_names / features_map use the actual aloha-agilex camera
names (head_camera, left_camera, right_camera). Drops the stale
"front_camera" and "left_wrist"/"right_wrist" entries that never matched
anything RoboTwin exposes.
- CI workflow's rename_map updated to match the new camera names.
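The Optional fallback from the second bullet, as a standalone sketch (the
helper that produces setup_kwargs is assumed):

```python
def resolve_obs_dims(setup_kwargs: dict,
                     observation_height: int | None = None,
                     observation_width: int | None = None) -> tuple[int, int]:
    # Fall back to the YAML's head-camera dims (240x320 for D435) so the env
    # stays self-consistent even when the config is out of sync.
    h = observation_height or setup_kwargs["head_camera_h"]
    w = observation_width or setup_kwargs["head_camera_w"]
    return h, w
```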
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(robotwin): simplify _get_obs and cache the fallback frame
_get_obs was guarding every dict access with isinstance(..., dict) in case
RoboTwin's get_obs returned something else — but the API contract
(envs/_base_task.py:437) always returns a dict, so the guards were silently
masking real failures behind plausible-looking zero observations. Drop them.
Also:
- Cache a single black fallback frame in __init__ instead of allocating
  a fresh np.zeros((H, W, 3), uint8) for every missing camera on every
  step — the "camera not exposed" set is static per env (sketched after
  this list).
- Only allocate the zero joint_state on the fallback path (not unconditionally
before the real value overwrites it).
- Replace .flatten() with .ravel() (no copy when already 1-D).
- Fold the nested-dict schema comment and two identical torch.enable_grad()
rationales into a single Autograd section in the class docstring.
- Fix stale `left_wrist` camera name in the observation docstring.
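The cached-fallback idea from the first two bullets, sketched (method and
field names illustrative):

```python
import numpy as np

class RoboTwinEnvSketch:
    def __init__(self, h: int, w: int):
        # One allocation, reused — not a fresh np.zeros per missing camera
        # per step; the "camera not exposed" set is static per env.
        self._black_frame = np.zeros((h, w, 3), dtype=np.uint8)

    def _camera_rgb(self, raw: dict, cam: str) -> np.ndarray:
        cams = raw["observation"]
        return cams[cam]["rgb"] if cam in cams else self._black_frame
```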
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robotwin): parse RoboTwin's nested get_obs() schema
RoboTwin's base_task.get_obs() returns a nested dict:
{"observation": {cam: {"rgb": ..., "intrinsic_matrix": ...}},
"joint_action": {"left_arm": ..., "left_gripper": ...,
"right_arm": ..., "right_gripper": ...,
"vector": np.ndarray},
"endpose": {...}}
Our _get_obs was reading raw["{cam}_rgb"] / raw["{cam}"] and raw["joint_action"]
as if they were flat, so np.asarray(raw["joint_action"], dtype=float64) tripped
on a dict and raised
TypeError: float() argument must be a string or a real number, not 'dict'
Fix:
- Pull images from raw["observation"][cam]["rgb"]
- Pull joint state from raw["joint_action"]["vector"] (the flat array)
- Update the default camera tuple to (head_camera, left_camera, right_camera)
  to match the camera names RoboTwin actually exposes — the wrist cameras
  are left_camera / right_camera (envs/camera/camera.py:135-151)
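Putting the fix together (sketch; only the key paths and camera tuple come
from the commit):

```python
import numpy as np

CAMERAS = ("head_camera", "left_camera", "right_camera")

def parse_obs(raw: dict) -> tuple[dict, np.ndarray]:
    images = {cam: raw["observation"][cam]["rgb"] for cam in CAMERAS}
    joint_state = np.asarray(raw["joint_action"]["vector"], dtype=np.float64)
    return images, joint_state
```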
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(robotwin): wrap CuRobo planner calls in torch.enable_grad()
lerobot_eval wraps the full rollout in torch.no_grad() (lerobot_eval.py:566),
but RoboTwin's setup_demo → load_robot → CuroboPlanner(...) runs
motion_gen.warmup(), which invokes Newton's-method trajectory optimization.
That optimizer calls cost.backward() internally, which raises
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
when autograd is disabled. take_action() hits the same planner path at every
step. Wrap both setup_demo and take_action in torch.enable_grad() so CuRobo's
optimizer can build its computation graph. Policy inference is unaffected —
rollout()'s inner torch.inference_mode() block around select_action() is
untouched, so we still don't allocate grad buffers during policy forward.
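Sketched (free functions standing in for the wrapper's methods):

```python
import torch

def env_reset(task, **setup_kwargs):
    with torch.enable_grad():   # CuRobo's warmup runs cost.backward()
        task.setup_demo(**setup_kwargs)

def env_step(task, action):
    with torch.enable_grad():   # the planner re-optimizes at every step
        return task.take_action(action)
```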
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(envs): add RoboTwin 2.0 benchmark integration
- RoboTwinEnvConfig with 4-camera setup (head/front/left_wrist/right_wrist)
- Docker image with SAPIEN, mplib, CuRobo, pytorch3d (Python 3.12)
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robotwin
- RoboTwinProcessorStep for state float32 casting
- Camera rename_map: head_camera/front_camera/left_wrist -> camera1/2/3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>