mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-11 14:49:43 +00:00
main
1 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e699e52388 |
feat(envs): add RoboCasa365 benchmark integration (#3375)
* feat(envs): add RoboCasa365 benchmark integration Add RoboCasa365 (arXiv:2603.04356) as a new simulation benchmark with 365 everyday kitchen manipulation tasks across 2,500 diverse environments. New files: - src/lerobot/envs/robocasa.py: gym.Env wrapper with deferred env creation, flat 12D action / 16D state vectors, 3-camera support - docs/source/robocasa.mdx: user-facing documentation - docker/Dockerfile.benchmark.robocasa: CI benchmark image Modified files: - src/lerobot/envs/configs.py: RoboCasaEnv config (--env.type=robocasa) - pyproject.toml: robocasa optional dependency group - docs/source/_toctree.yml: sidebar entry - .github/workflows/benchmark_tests.yml: integration test job Refs: https://arxiv.org/abs/2603.04356, https://robocasa.ai Related: huggingface/lerobot#321 * fix(docker): use uv pip to install robocasa in benchmark image The huggingface/lerobot-gpu base image uses `uv` with a venv at /lerobot/.venv — `pip` is not on PATH, so `pip install` fails with "pip: not found". Switch to `uv pip install` which installs into the existing venv. Also drop the @v1.0.0 tag pin from the robocasa git URL since the upstream repo may not have that tag; use default branch instead. * fix(robocasa): editable install + switch to lerobot/smolvla_robocasa - pip install from git omits data files like box_links_assets.json (not declared in package_data). Clone and install editable so the source tree is used at runtime. - Download only tex + fixtures_lw asset types (smoke test doesn't need objaverse/aigen objects). Pipe 'y' to auto-accept download prompt. - Switch CI policy from pepijn223/smolvla_robocasa to lerobot/smolvla_robocasa. * fix(docker): re-install lerobot editably after COPY The nightly huggingface/lerobot-gpu image predates the RoboCasaEnv registration — so `lerobot-eval --env.type=robocasa` fails at argparse with "invalid choice" even after COPY . . overlays the new source. Force an editable reinstall so the venv picks up the current configs.py. * fix(ci): add rename_map for robocasa eval (image* -> camera*) Policy lerobot/smolvla_robocasa expects observation.images.camera1/2/3, but RoboCasaEnv produces observation.images.image/image2/image3. * fix(robocasa): override RoboCasaGymEnv default split (test -> all) RoboCasaGymEnv defaults split="test", but create_env only accepts {None, "all", "pretrain", "target"}, so the out-of-the-box default crashes with ValueError. Always pass "all" when split is None. * fix(docker): also download objs_lw (lightwheel objects) for robocasa Kitchen tasks (e.g. CloseFridge) reference lightwheel object meshes like Stool022/model.xml. fixtures_lw alone isn't enough — we also need objs_lw. Still skipping objaverse/aigen to keep image size down. Made-with: Cursor * feat(robocasa): raw camera names + benchmark-group task shortcuts Align the LeRobot env with RoboCasa's native conventions so policies trained on the upstream datasets don't need a --rename_map at eval time, and expose the standard task groups as first-class --env.task values. - Preserve raw RoboCasa camera names (e.g. robot0_agentview_left) as observation.images.<name> end-to-end. Drops camera_name_mapping and DEFAULT_CAMERA_NAME_MAPPING; features/features_map are now built dynamically from the parsed camera list. - Accept benchmark-group names as --env.task: atomic_seen, composite_seen, composite_unseen, pretrain50/100/200/300. Expanded lazily via robocasa.utils.dataset_registry and auto-sets the split ("target" | "pretrain"). - Update CI smoke-eval rename_map to map raw cam names to the camera1/2/3 keys expected by lerobot/smolvla_robocasa. * docs(robocasa): single-task smolvla train+eval recipe on pepijn223/robocasa_CloseFridge - Rewrite observation section to use raw RoboCasa camera keys (observation.images.robot0_agentview_{left,right}, observation.images.robot0_eye_in_hand). - Add a "Training on a single task" section with a full smolvla training command on pepijn223/robocasa_CloseFridge, plus matching single-task eval command. - Document benchmark-group task shortcuts (atomic_seen, composite_seen, composite_unseen, pretrain50/100/200/300) as valid --env.task values. * fix(robocasa): restrict obj_registries to lightwheel by default CloseFridge (and most kitchen tasks) crashed at reset with `ValueError: Probabilities contain NaN` coming out of `sample_kitchen_object_helper`. RoboCasa's upstream default `obj_registries=("objaverse", "lightwheel")` normalizes per-registry candidate counts as probabilities; when a sampled category has zero mjcf paths in every configured registry (because the objaverse asset pack isn't on disk — ~30GB, skipped by our Docker build), the 0/0 divide yields NaNs and `rng.choice` raises. - Add `obj_registries: list[str] = ["lightwheel"]` to `RoboCasaEnv` config; thread it through `create_robocasa_envs`, `_make_env_fns`, and the gym.Env wrapper to the underlying `RoboCasaGymEnv` (which forwards to `create_env` → `robosuite.make` → kitchen env). - Default matches what `download_kitchen_assets --type objs_lw` actually ships, so the env works out of the box without a 30GB objaverse download. - Document the override (`--env.obj_registries='[objaverse,lightwheel]'`) for users who have downloaded the full asset set. * fix(docker): also download tex_generative for robocasa benchmark RoboCasa's lightwheel kitchen fixtures embed references to `generative_textures/wall/tex*.png` directly in their MuJoCo XML, so `MjModel.from_xml_string` errors out at reset time with "No such file or directory" even when the env is constructed with `generative_textures=None`. The generative textures live under a separate asset registry key (`tex_generative`) in `download_kitchen_assets`, distinct from the base `tex` pack we were already fetching. - Add `tex_generative` to the download list so the fixture XMLs resolve. - Document the remaining omissions (objaverse/aigen, ~30GB) and how the runtime side pairs this with obj_registries=["lightwheel"] to avoid sampling from categories whose assets aren't on disk. * ci(robocasa): smoke-eval 10 atomic tasks instead of 1 Broader coverage in the benchmark CI job: evaluate SmolVLA on ten fixture-centric atomic RoboCasa tasks (one episode each) instead of just CloseFridge. The tasks are all drawn from TARGET_TASKS.atomic_seen and selected to avoid object-manipulation categories that would require the objaverse/aigen asset packs (we only ship objs_lw in the Docker image, paired with obj_registries=["lightwheel"] on the runtime side). Tasks: CloseFridge, OpenCabinet, OpenDrawer, TurnOnMicrowave, TurnOffStove, CloseToasterOvenDoor, SlideDishwasherRack, TurnOnSinkFaucet, NavigateKitchen, TurnOnElectricKettle. `scripts/ci/parse_eval_metrics.py` already handles multi-task output via the `overall` key, so no parser changes needed. Bumped the metrics artifact's task label to `atomic_smoke_10` to reflect the grouping. * fix(pyproject): drop unresolvable robocasa extra robocasa's upstream setup.py hardcodes `lerobot==0.3.3` in install_requires. Exposing it as the `lerobot[robocasa]` extra made uv's dep resolver cycle: `lerobot[robocasa]` -> robocasa -> lerobot (a different version) -> unsolvable. This broke every `uv sync` — even invocations with an unrelated extra like `--extra test` — because uv validates the whole lockfile graph. - Remove the `robocasa` extra from pyproject.toml. Installation instructions in docs/source/robocasa.mdx now walk users through the manual `git clone` + `pip install --no-deps` flow, which matches what the Docker image already does and sidesteps the cyclic dep entirely. - Dockerfile: `uv pip install -e ~/robocasa --no-deps` so the shadowed lerobot==0.3.3 never lands in the image; install robocasa's actual runtime deps (numpy, numba, scipy, mujoco, tianshou, etc.) explicitly. * docs(robocasa): align page with adding_benchmarks template Rework docs/source/robocasa.mdx to follow the standard benchmark doc structure: intro + links + available tasks (with family breakdown and first-class benchmark-group shortcuts) + installation + eval + recommended episodes + policy I/O + training + reproducing results. - Fix the paper link (was pointing at a non-existent arxiv ID). - Surface lerobot/smolvla_robocasa and pepijn223/robocasa_CloseFridge in the top-of-page links so they're findable without reading the training section. - Add an explicit "Object registries" subsection explaining the `--env.obj_registries=[objaverse,lightwheel]` override path. - Add an explicit "Reproducing published results" section pointing at the CI smoke eval. * fix: integrate PR #3375 review feedback - envs(robocasa): hoist the duplicated `_parse_camera_names` helper out of `libero.py` and `robocasa.py` into `envs/utils.py` as the public `parse_camera_names`; call sites updated. - envs(robocasa): give each factory a distinct `episode_index` (`0..n_envs-1`) and derive a per-worker seed series in `reset()` so n_envs workers don't all roll the same scene under a shared outer seed. - envs(robocasa): drop the unused `**kwargs` on `_make_env`; declare `visualization_height` / `visualization_width` on both the wrapper and the `RoboCasaEnv` config + propagate via `gym_kwargs`. - envs(robocasa): emit `info["final_info"]` on termination (matching MetaWorld) so downstream vector-env auto-reset keeps the terminal task/success flags. - docs(robocasa): add `--rename_map` (robot0_agentview_left/ eye_in_hand/agentview_right → camera1/2/3) plus CI-parity flags to all three eval snippets. - docker(robocasa): pin robocasa + robosuite git SHAs and the pip dep versions (pygame, Pillow, opencv-python, pyyaml, pynput, tqdm, termcolor, imageio, h5py, lxml, hidapi, gymnasium) for reproducible benchmark images. - ci(robocasa): update the workflow comment — there is no `lerobot[robocasa]` extra; robocasa/robosuite are installed manually because upstream's `lerobot==0.3.3` pin shadows ours. * docs(robocasa): add benchmark banner image * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. Also threads the cached metadata through the RoboCasa factory so async eval on `--env.type=robocasa` keeps the same improvement. * fix: integrate PR #3375 review feedback (round 2) - envs(robocasa): when the caller passes `seed=None` to `reset()`, fall back to `self.episode_index` for the inner env seed so each worker still samples a distinct trajectory instead of all workers inheriting the same global RNG state. - envs(robocasa): replace the two module-level `print()` calls in `create_robocasa_envs` with `logger.info(...)` via a module-level `logger = logging.getLogger(__name__)`. - ci(robocasa): run `scripts/ci/extract_task_descriptions.py` after the eval so `metrics.json` carries per-task natural-language labels, matching LIBERO / MetaWorld / VLABench jobs. Added a `_robocasa_descriptions()` extractor that splits CamelCase task names into word-level labels keyed by `<task>_0`. |