# RoboCasa365

[RoboCasa365](https://robocasa.ai) is a large-scale simulation framework for training and benchmarking **generalist robots** on everyday kitchen tasks. It ships 365 diverse manipulation tasks across 2,500 kitchen environments, 3,200+ object assets and 600+ hours of human demonstration data, all on the PandaOmron, a 12-DOF mobile manipulator (Franka arm on a holonomic base).

- Paper: [RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots](https://arxiv.org/abs/2406.02523)
- GitHub: [robocasa/robocasa](https://github.com/robocasa/robocasa)
- Project website: [robocasa.ai](https://robocasa.ai)
- Pretrained policy: [`lerobot/smolvla_robocasa`](https://huggingface.co/lerobot/smolvla_robocasa)
- Single-task dataset (CloseFridge): [`pepijn223/robocasa_CloseFridge`](https://huggingface.co/datasets/pepijn223/robocasa_CloseFridge)

*Figure: RoboCasa365 benchmark overview.*

## Available tasks

RoboCasa365 organizes its 365 tasks into two families and three upstream benchmark groups that LeRobot exposes as first-class `--env.task` shortcuts:

| Family    | Tasks | Description                                                                     |
| --------- | ----- | ------------------------------------------------------------------------------- |
| Atomic    | ~65   | Single-skill tasks: pick-and-place, door/drawer manipulation, appliance control |
| Composite | ~300  | Multi-step tasks across 60+ categories: cooking, cleaning, organizing, etc.     |

**Atomic task examples:** `CloseFridge`, `OpenDrawer`, `OpenCabinet`, `TurnOnMicrowave`, `TurnOffStove`, `NavigateKitchen`, `PickPlaceCounterToStove`.

**Composite task categories:** baking, boiling, brewing, chopping, clearing table, defrosting food, loading dishwasher, making tea, microwaving food, washing dishes, and more.

`--env.task` accepts three forms:

- a single task name (`CloseFridge`)
- a comma-separated list (`CloseFridge,OpenBlenderLid,PickPlaceCoffee`)
- a benchmark-group shortcut (`atomic_seen`, `composite_seen`, `composite_unseen`, `pretrain50`, `pretrain100`, `pretrain200`, `pretrain300`), which auto-expands to the upstream task list and auto-sets the dataset `split` (`target` or `pretrain`).

## Installation

RoboCasa and its dependency `robosuite` are not published on PyPI, and RoboCasa's own `setup.py` hardcodes `lerobot==0.3.3`, which conflicts with this repo's `lerobot`. LeRobot therefore does **not** expose a `robocasa` extra. Instead, install the two packages manually as editable clones, passing `--no-deps` to the `robocasa` install so pip skips its conflicting `lerobot` pin:

```bash
# After following the standard LeRobot installation instructions:
git clone https://github.com/robocasa/robocasa.git ~/robocasa
git clone https://github.com/ARISE-Initiative/robosuite.git ~/robosuite
pip install -e ~/robocasa --no-deps
pip install -e ~/robosuite

# RoboCasa's runtime deps (the ones its setup.py would have pulled in,
# minus the conflicting lerobot pin).
pip install numpy numba scipy mujoco pygame Pillow opencv-python \
  pyyaml pynput tqdm termcolor imageio h5py lxml hidapi \
  tianshou gymnasium

python -m robocasa.scripts.setup_macros

# Lightweight assets (lightwheel object meshes + textures). Enough for
# the default env out of the box.
python -m robocasa.scripts.download_kitchen_assets \
  --type tex tex_generative fixtures_lw objs_lw

# Optional: full objaverse/aigen registries (~30GB) for richer object
# variety. Enable at eval time via --env.obj_registries (see below).
# python -m robocasa.scripts.download_kitchen_assets --type objs_objaverse
```

RoboCasa requires MuJoCo.
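
Once the steps above finish, a quick import check can confirm that RoboCasa, robosuite, MuJoCo and LeRobot all resolve in the active environment. The helper below is a hypothetical convenience sketch, not a LeRobot or RoboCasa command; it assumes `python` is the interpreter of the environment you installed into (override it with `PY=...`).

```bash
# Hypothetical post-install check (not shipped with LeRobot or RoboCasa):
# try to import each Python module and report the ones that fail.
PY="${PY:-python}"  # interpreter of the environment you installed into

check_imports() {
  local status=0
  local mod
  for mod in "$@"; do
    if "$PY" -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok      ${mod}"
    else
      echo "missing ${mod}"
      status=1
    fi
  done
  return "$status"  # non-zero if any module failed to import
}

check_imports robocasa robosuite mujoco lerobot \
  || echo "Some modules failed to import; revisit the installation steps above."
```

A `missing` line usually points at the editable install or the runtime-dependency step for that package.
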
Set the rendering backend before training or evaluation:

```bash
export MUJOCO_GL=egl  # for headless servers (HPC, cloud)
```

### Object registries

By default the env samples objects only from the `lightwheel` registry (what `--type objs_lw` ships), which avoids a `Probabilities contain NaN` crash when the objaverse/aigen packs aren't on disk. If you've downloaded the full asset set, enable the full registry at runtime:

```bash
--env.obj_registries='[objaverse,lightwheel]'
```

## Evaluation

All eval snippets below mirror the CI command (see `.github/workflows/benchmark_tests.yml`). The `--rename_map` argument maps RoboCasa's native camera keys (`robot0_agentview_left` / `robot0_eye_in_hand` / `robot0_agentview_right`) onto the three-camera input layout (`camera1` / `camera2` / `camera3`) that the released `smolvla_robocasa` policy was trained on.

### Single-task evaluation (recommended for quick iteration)

```bash
lerobot-eval \
  --policy.path=lerobot/smolvla_robocasa \
  --env.type=robocasa \
  --env.task=CloseFridge \
  --eval.batch_size=1 \
  --eval.n_episodes=20 \
  --eval.use_async_envs=false \
  --policy.device=cuda \
  '--rename_map={"observation.images.robot0_agentview_left": "observation.images.camera1", "observation.images.robot0_eye_in_hand": "observation.images.camera2", "observation.images.robot0_agentview_right": "observation.images.camera3"}'
```

### Multi-task evaluation

Pass a comma-separated list of tasks:

```bash
lerobot-eval \
  --policy.path=lerobot/smolvla_robocasa \
  --env.type=robocasa \
  --env.task=CloseFridge,OpenCabinet,OpenDrawer,TurnOnMicrowave,TurnOffStove \
  --eval.batch_size=1 \
  --eval.n_episodes=20 \
  --eval.use_async_envs=false \
  --policy.device=cuda \
  '--rename_map={"observation.images.robot0_agentview_left": "observation.images.camera1", "observation.images.robot0_eye_in_hand": "observation.images.camera2", "observation.images.robot0_agentview_right": "observation.images.camera3"}'
```

### Benchmark-group evaluation

Run an entire upstream group (e.g.
all 18 `atomic_seen` tasks with `split=target`):

```bash
lerobot-eval \
  --policy.path=lerobot/smolvla_robocasa \
  --env.type=robocasa \
  --env.task=atomic_seen \
  --eval.batch_size=1 \
  --eval.n_episodes=20 \
  --eval.use_async_envs=false \
  --policy.device=cuda \
  '--rename_map={"observation.images.robot0_agentview_left": "observation.images.camera1", "observation.images.robot0_eye_in_hand": "observation.images.camera2", "observation.images.robot0_agentview_right": "observation.images.camera3"}'
```

### Recommended evaluation episodes

Use **20 episodes per task** for reproducible benchmarking; this matches the protocol used in published results.

## Policy inputs and outputs

**Observations** (raw RoboCasa camera names are preserved verbatim):

- `observation.state`: 16-dim proprioceptive state (base position, base quaternion, relative end-effector position, relative end-effector quaternion, gripper qpos)
- `observation.images.robot0_agentview_left`: left agent view, 256×256 HWC uint8
- `observation.images.robot0_eye_in_hand`: wrist camera view, 256×256 HWC uint8
- `observation.images.robot0_agentview_right`: right agent view, 256×256 HWC uint8

**Actions:**

- Continuous control in `Box(-1, 1, shape=(12,))`: base motion (4D) + control mode (1D) + end-effector position (3D) + end-effector rotation (3D) + gripper (1D).

## Training

### Single-task example

A ready-to-use single-task dataset is on the Hub: [`pepijn223/robocasa_CloseFridge`](https://huggingface.co/datasets/pepijn223/robocasa_CloseFridge).
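
The training and evaluation commands below read your Hugging Face username from `${HF_USER}`. A minimal sketch for filling it in, assuming the `huggingface-cli` tool from `huggingface_hub` is installed, you are already logged in, and `huggingface-cli whoami` prints your username on its first line. `resolve_hf_user` is a hypothetical helper, and an `HF_USER` you have already exported always takes precedence.

```bash
# Hypothetical helper: print the Hugging Face username to use for ${HF_USER}.
# Keeps an already-exported HF_USER; otherwise takes the first line printed
# by `huggingface-cli whoami` (assumes you are logged in).
resolve_hf_user() {
  if [ -n "${HF_USER:-}" ]; then
    printf '%s\n' "$HF_USER"
  elif command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli whoami 2>/dev/null | head -n 1
  else
    return 1  # no HF_USER and no CLI available
  fi
}

if HF_USER="$(resolve_hf_user)" && [ -n "$HF_USER" ]; then
  export HF_USER
  echo "Using HF_USER=${HF_USER}"
else
  echo "Could not determine HF_USER; export it manually." >&2
fi
```

If the printed value is not your username (for example because you are logged out), run `huggingface-cli login` first or simply `export HF_USER=<your-username>`.
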
Fine-tune a SmolVLA base on `CloseFridge`:

```bash
lerobot-train \
  --policy.type=smolvla \
  --policy.repo_id=${HF_USER}/smolvla_robocasa_CloseFridge \
  --policy.load_vlm_weights=true \
  --policy.push_to_hub=true \
  --dataset.repo_id=pepijn223/robocasa_CloseFridge \
  --env.type=robocasa \
  --env.task=CloseFridge \
  --output_dir=./outputs/smolvla_robocasa_CloseFridge \
  --steps=100000 \
  --batch_size=4 \
  --eval_freq=5000 \
  --eval.batch_size=1 \
  --eval.n_episodes=5 \
  --save_freq=10000
```

Evaluate the resulting checkpoint:

```bash
lerobot-eval \
  --policy.path=${HF_USER}/smolvla_robocasa_CloseFridge \
  --env.type=robocasa \
  --env.task=CloseFridge \
  --eval.batch_size=1 \
  --eval.n_episodes=20
```

## Reproducing published results

The released checkpoint [`lerobot/smolvla_robocasa`](https://huggingface.co/lerobot/smolvla_robocasa) is evaluated with the commands in the [Evaluation](#evaluation) section. On every PR touching the benchmark, CI runs a smoke eval over 10 atomic tasks (one episode each), picking fixture-centric tasks that don't require the objaverse asset pack.
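
To run several benchmark groups in one go, the Evaluation flags can be wrapped in a small script that defines the `--rename_map` JSON once and loops over group shortcuts. This is a hypothetical convenience sketch, not the CI script; it covers only `atomic_seen`, `composite_seen` and `composite_unseen`, and it prints the commands by default (set `DRY_RUN=0` to actually launch `lerobot-eval`).

```bash
# Hypothetical wrapper around the Evaluation commands (not the CI script).
# The camera remapping is defined once instead of being repeated per command.
RENAME_MAP='{"observation.images.robot0_agentview_left": "observation.images.camera1", "observation.images.robot0_eye_in_hand": "observation.images.camera2", "observation.images.robot0_agentview_right": "observation.images.camera3"}'
DRY_RUN="${DRY_RUN:-1}"  # 1 = print commands only, 0 = execute them

run_group_eval() {
  local group="$1"
  # Build the command as an array so the JSON stays a single argument.
  local cmd=(
    lerobot-eval
    --policy.path=lerobot/smolvla_robocasa
    --env.type=robocasa
    "--env.task=${group}"
    --eval.batch_size=1
    --eval.n_episodes=20
    --eval.use_async_envs=false
    --policy.device=cuda
    "--rename_map=${RENAME_MAP}"
  )
  if [ "$DRY_RUN" = "1" ]; then
    printf '%q ' "${cmd[@]}"  # shell-quoted, copy-pasteable command
    echo
  else
    "${cmd[@]}"
  fi
}

for group in atomic_seen composite_seen composite_unseen; do
  run_group_eval "$group"
done
```

Passing `"--rename_map=${RENAME_MAP}"` as one array element is equivalent to the single-quoted `'--rename_map={...}'` argument used in the commands above.
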