🐛 fix v30→v21 converter imports (#101 )

Co-authored-by: Codex <codex@openai.com>
🐛 fix v30_to_v21 ArrowTypeError on pandas extension dtypes
2026-06-26 23:47:01 +00:00 · 2026-05-09 22:31:45 -07:00 · 2026-04-30 07:03:03 +00:00 · 2026-05-01 15:09:36 +08:00 · 2026-04-06 18:25:36 +08:00 · 2026-03-21 15:55:33 +08:00
16 changed files with 1118 additions and 2005 deletions
@@ -20,14 +20,15 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg

 ## 📣 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2026.03.20\]** We have supported Data Conversion from RoboCasa to LeRobot! 🔥🔥🔥
 - **\[2025.10.04\]** We have collected and updated all Dataset Version Conversion Scripts for LeRobot! 🔥🔥🔥
 - **\[2025.09.28\]** We have upgraded LeRobotDataset from v2.1 to v3.0! 🔥🔥🔥
 - **\[2025.06.27\]** We have supported Data Conversion from LIBERO to LeRobot! 🔥🔥🔥
 - **\[2025.05.16\]** We have supported Data Conversion from LeRobot to RLDS! 🔥🔥🔥
- **\[2025.05.12\]** We have supported Data Conversion from RoboMIND to LeRobot! 🔥🔥🔥
 <details>
 <summary>More News</summary>

+- **\[2025.05.12\]** We have supported Data Conversion from RoboMIND to LeRobot! 🔥🔥🔥
 - **\[2025.04.15\]** We add Dataset Merging Tool for merging multi-source lerobot datasets! 🔥🔥🔥
 - **\[2025.04.14\]** We have supported Data Conversion from AgiBotWorld to LeRobot! 🔥🔥🔥
 - **\[2025.04.11\]** We change the repo from `openx2lerobot` to `any4lerobot`, making a universal toolbox for LeRobot! 🔥🔥🔥
@@ -36,6 +37,14 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg

 ## ✨ Features

+- **Dataset Tutorial**:
+
+  - [ ] Dataset Loading
+  - [ ] Dataset Editing
+  - [ ] Dataset Filtering
+  - [ ] Dataset Sampling
+  - [ ] Dataset Merging
+
 - **Data Conversion**:

  - [x] [Open X-Embodiment to LeRobot](./openx2lerobot/README.md)
@@ -43,16 +52,7 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg
  - [x] [RoboMIND to LeRobot](./robomind2lerobot/README.md)
  - [x] [LeRobot to RLDS](./lerobot2rlds/README.md)
  - [x] [LIBERO to LeRobot](./libero2lerobot/README.md)
-
- **Training**:
-
-  - [ ] MultiLeRobotDataset
-
- **Dataset Preprocess**:
-
-  - [x] [Dataset Merging](./dataset_merging/README.md)
-  - [ ] Dataset Filtering
-  - [ ] Dataset Sampling
+  - [x] [RoboCasa to LeRobot](./robocasa2lerobot/README.md)

 - [**Version Conversion**](./ds_version_convert/README.md):

@@ -62,6 +62,10 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg
  - [x] [LeRobotv2.1 to LeRobotv3.0](./ds_version_convert/v21_to_v30/README.md)
  - [x] [LeRobotv3.0 to LeRobotv2.1](./ds_version_convert/v30_to_v21/README.md)

+- **Training**:
+
+  - [ ] MultiLeRobotDataset
+
 - [**Want more features?**](https://github.com/Tavish9/any4lerobot/issues/new?template=feature-request.yml)

 ## 📚 Awesome LeRobot
@@ -73,14 +77,14 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg
 - [OneTwoVLA](https://one-two-vla.github.io/): A Unified Vision-Language-Action Model with Adaptive Reasoning [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/Fanqi-Lin/OneTwoVLA">](https://github.com/Fanqi-Lin/OneTwoVLA)
 - [SmolVLA](https://huggingface.co/blog/smolvla): Efficient Vision-Language-Action Model trained on Lerobot Community Data [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/huggingface/lerobot">](https://github.com/huggingface/lerobot)
 - [SpatialVLA](https://spatialvla.github.io/): a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/SpatialVLA/SpatialVLA">](https://github.com/SpatialVLA/SpatialVLA)
- [openpi](https://www.physicalintelligence.company/blog/pi0): the official implemenation of $π_0$: A Vision-Language-Action Flow Model for General Robot Control [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/Physical-Intelligence/openpi">](https://github.com/Physical-Intelligence/openpi)
+- [openpi](https://www.physicalintelligence.company/blog/pi0): the official implementation of $π_0$: A Vision-Language-Action Flow Model for General Robot Control [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/Physical-Intelligence/openpi">](https://github.com/Physical-Intelligence/openpi)
 - [Isaac-GR00T](https://developer.nvidia.com/isaac/gr00t): NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/NVIDIA/Isaac-GR00T">](https://github.com/NVIDIA/Isaac-GR00T)

 ### Dataset

 - [Official](https://huggingface.co/lerobot): State-of-the-art Machine Learning for real-world robotics.
- [IPEC-COMMUNITY/OpenX](https://huggingface.co/collections/IPEC-COMMUNITY/openx-lerobot-67c29b2ee5911f17dbea635e): Open X-Embodiment datasets in LeRobot format with standard transfomation
- [IPEC-COMMUNITY/LIBERO](https://huggingface.co/collections/IPEC-COMMUNITY/libero-benchmark-dataset-684837af28d465aa8b043950): LIBERO datasets in LeRobot format with standard transfomation and filtering
+- [IPEC-COMMUNITY/OpenX](https://huggingface.co/collections/IPEC-COMMUNITY/openx-lerobot-67c29b2ee5911f17dbea635e): Open X-Embodiment datasets in LeRobot format with standard transformation
+- [IPEC-COMMUNITY/LIBERO](https://huggingface.co/collections/IPEC-COMMUNITY/libero-benchmark-dataset-684837af28d465aa8b043950): LIBERO datasets in LeRobot format with standard transformation and filtering
 - [weijian-sun/agibotworld-lerobot](https://huggingface.co/datasets/weijian-sun/agibotworld-lerobot): AgibotWorld-LeRobot v2.0
 - [GR00T-Dateset](https://huggingface.co/GR00T-Dateset): Isaac-GR00T training dataset
 - [nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim](https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim): Isaac-GR00T training dataset
@@ -105,7 +109,7 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg
 - [LeKiwi](https://github.com/SIGRobotics-UIUC/LeKiwi): Low-Cost Mobile Manipulator [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/SIGRobotics-UIUC/LeKiwi">](https://github.com/SIGRobotics-UIUC/LeKiwi)
 - [XLeRobot](https://github.com/Vector-Wangel/XLeRobot): Fully Autonomous Household Dual-Arm Mobile Robot for $998 [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/Vector-Wangel/XLeRobot">](https://github.com/Vector-Wangel/XLeRobot)
 - [LeRobot-Kinematics](https://github.com/box2ai-robotics/lerobot-kinematics): Simple and Accurate Forward and Inverse Kinematics Examples for the Lerobot SO100 ARM [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/box2ai-robotics/lerobot-kinematics">](https://github.com/box2ai-robotics/lerobot-kinematics)
- [lerobotdepot](https://github.com/maximilienroberti/lerobotdepot): a reoi for hardware, components, and 3D-printable projects compatible with the LeRobot library [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/maximilienroberti/lerobotdepot">](https://github.com/maximilienroberti/lerobotdepot)
+- [lerobotdepot](https://github.com/maximilienroberti/lerobotdepot): a repo for hardware, components, and 3D-printable projects compatible with the LeRobot library [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/maximilienroberti/lerobotdepot">](https://github.com/maximilienroberti/lerobotdepot)
 - [PingTi-Arm](https://github.com/nomorewzx/PingTi-Arm): A human-scale robotic arm compatible with Lerobot, based on SO100 [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/nomorewzx/PingTi-Arm">](https://github.com/nomorewzx/PingTi-Arm)
 - [LeDog](https://github.com/wuxiaoqiang12/LeDog): A mobile manipulator with a Weilan AlphaDog and a LeRobot-compatible SO-101 arm, with support for ROS 1/2 (Official: [ledog_ros2](https://gitcode.com/openeuler/ledog_ros2)) [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/wuxiaoqiang12/LeDog">](https://github.com/wuxiaoqiang12/LeDog)

@@ -130,7 +134,6 @@ A curated collection of utilities for [LeRobot Projects](https://github.com/hugg
 - [LeRobot Dataset Visualizer](https://github.com/huggingface/lerobot-dataset-visualizer): Web application for visualizing robotics datasets in LeRobot format [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/huggingface/lerobot-dataset-visualizer">](https://github.com/huggingface/lerobot-dataset-visualizer)
 - [lerobot_so101_teleop](https://github.com/liorbenhorin/lerobot_so101_teleop): Sample Environment for the LeRobot SO-101 Robot in Isaac Lab to collect demonstrations in a simulation [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/liorbenhorin/lerobot_so101_teleop">](https://github.com/liorbenhorin/lerobot_so101_teleop)
 - [Robot Learning: A Tutorial](https://github.com/fracapuano/robot-learning-tutorial): All the source code for "Robot Learning: A Tutorial". Get involved to be featured in the next iteration [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/fracapuano/robot-learning-tutorial">](https://github.com/fracapuano/robot-learning-tutorial)
- [LERO](https://github.com/masato-ka/lero): LeRobot dataset Operations toolkit [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/masato-ka/lero">](https://github.com/masato-ka/lero)
 - [CRISP](https://utiasdsl.github.io/crisp_controllers/): Record datasets and deploy policies using LeRobot and ROS2-compatible manipulators (Franka Robotics FR3 and more supported) [<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/utiasdsl/crisp_controllers">](https://github.com/utiasdsl/crisp_controllers)

 ## 👷‍♂️ Contributing
@@ -152,7 +155,7 @@ If you find this repository helpful in your research or projects, please conside
 ```bibtex
@misc{any4lerobot,
  title        = {Any4LeRobot: A tool collection for LeRobot},
-  author       = {Qizhi Chen},
+  author       = {Qizhi Chen, Dong Wang, Bin Zhao},
  license      = {MIT},
  url          = {https://github.com/Tavish9/any4lerobot},
  year         = {2025},
@@ -1,68 +1,112 @@
 import argparse
-import gc
+import inspect
 import shutil
 import tempfile
 from pathlib import Path

 import numpy as np
+import pyarrow as pa
+import pyarrow.parquet as pq
 import ray
 import torch
 from agibot_utils.agibot_utils import get_task_info, load_local_dataset
 from agibot_utils.config import AgiBotWorld_TASK_TYPE
 from agibot_utils.lerobot_utils import compute_episode_stats, generate_features_from_config
-from lerobot.datasets.lerobot_dataset import LeRobotDataset
-from lerobot.datasets.utils import validate_episode_buffer, validate_frame
+from lerobot.datasets import LeRobotDataset, LeRobotDatasetMetadata
+from lerobot.datasets.dataset_writer import DatasetWriter
+from lerobot.datasets.feature_utils import get_hf_features_from_features, validate_episode_buffer, validate_frame
+from lerobot.datasets.utils import DEFAULT_EPISODES_PATH
 from ray.runtime_env import RuntimeEnv


-class AgiBotDataset(LeRobotDataset):
+class AgiBotDatasetMetadata(LeRobotDatasetMetadata):
+    def _flush_metadata_buffer(self) -> None:
+        """Write all buffered episode metadata to parquet file."""
+        if not hasattr(self, "_metadata_buffer") or len(self._metadata_buffer) == 0:
+            return
+
+        combined_dict = {}
+        for episode_dict in self._metadata_buffer:
+            for key, value in episode_dict.items():
+                if key not in combined_dict:
+                    combined_dict[key] = []
+                # Extract value and serialize numpy arrays
+                # because PyArrow's from_pydict function doesn't support numpy arrays
+                val = value[0] if isinstance(value, list) else value
+                combined_dict[key].append(val.tolist() if isinstance(val, np.ndarray) else val)
+
+        first_ep = self._metadata_buffer[0]
+        chunk_idx = first_ep["meta/episodes/chunk_index"][0]
+        file_idx = first_ep["meta/episodes/file_index"][0]
+
+        schema = None if not self._pq_writer else self._pq_writer.schema
+        table = pa.Table.from_pydict(combined_dict, schema=schema)
+
+        if not self._pq_writer:
+            path = Path(self.root / DEFAULT_EPISODES_PATH.format(chunk_index=chunk_idx, file_index=file_idx))
+            path.parent.mkdir(parents=True, exist_ok=True)
+            self._pq_writer = pq.ParquetWriter(path, schema=table.schema, compression="snappy", use_dictionary=True)
+
+        self._pq_writer.write_table(table)
+
+        self.latest_episode = self._metadata_buffer[-1]
+        self._metadata_buffer.clear()
+
+
+class AgiBotDatasetWriter(DatasetWriter):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.hf_features = get_hf_features_from_features(self._meta.features)
+
    def add_frame(self, frame: dict) -> None:
        """
-        This function only adds the frame to the episode_buffer. Apart from images — which are written in a
-        temporary directory — nothing is written to disk. To save those frames, the 'save_episode()' method
-        then needs to be called.
+        Add a single frame to the current episode buffer.
+
+        Apart from images written to a temporary directory, nothing is written to disk
+        until ``save_episode()`` is called.
+
+        The caller must provide all user-defined features plus ``"task"``, and must
+        not provide ``"timestamp"`` or ``"frame_index"``; those are computed
+        automatically.
        """
        # Convert torch to numpy if needed
        for name in frame:
            if isinstance(frame[name], torch.Tensor):
                frame[name] = frame[name].numpy()

-        features = {key: value for key, value in self.features.items() if key in self.hf_features}  # remove video keys
+        features = {
+            key: value for key, value in self._meta.features.items() if key in self.hf_features
+        }  # remove video keys
        validate_frame(frame, features)

        if self.episode_buffer is None:
-            self.episode_buffer = self.create_episode_buffer()
+            self.episode_buffer = self._create_episode_buffer()

        # Automatically add frame_index and timestamp to episode buffer
        frame_index = self.episode_buffer["size"]
-        timestamp = frame.pop("timestamp") if "timestamp" in frame else frame_index / self.fps
+        timestamp = frame_index / self._meta.fps
        self.episode_buffer["frame_index"].append(frame_index)
        self.episode_buffer["timestamp"].append(timestamp)
-        self.episode_buffer["task"].append(frame.pop("task"))  # Remove task from frame after processing
+        self.episode_buffer["task"].append(frame.pop("task"))

        # Add frame features to episode_buffer
        for key, value in frame.items():
-            if key not in self.features:
+            if key not in self._meta.features:
                raise ValueError(
-                    f"An element of the frame is not in the features. '{key}' not in '{self.features.keys()}'."
+                    f"An element of the frame is not in the features. '{key}' not in '{self._meta.features.keys()}'."
                )

            self.episode_buffer[key].append(value)

        self.episode_buffer["size"] += 1

-    def save_episode(self, videos: dict, action_config: list, episode_data: dict | None = None) -> None:
-        """
-        This will save to disk the current episode in self.episode_buffer.
-
-        Args:
-            episode_data (dict | None, optional): Dict containing the episode data to save. If None, this will
-                save the current episode in self.episode_buffer, which is filled with 'add_frame'. Defaults to
-                None.
-        """
+    def save_episode(
+        self, videos: dict, action_config: list, episode_data: dict | None = None, parallel_encoding: bool = True
+    ) -> None:
+        """Save the current episode in self.episode_buffer to disk."""
        episode_buffer = episode_data if episode_data is not None else self.episode_buffer

-        validate_episode_buffer(episode_buffer, self.meta.total_episodes, self.features)
+        validate_episode_buffer(episode_buffer, self._meta.total_episodes, self._meta.features)

        # size and task are special cases that won't be added to hf_dataset
        episode_length = episode_buffer.pop("size")
@@ -70,53 +114,49 @@ class AgiBotDataset(LeRobotDataset):
        episode_tasks = list(set(tasks))
        episode_index = episode_buffer["episode_index"]

-        episode_buffer["index"] = np.arange(self.meta.total_frames, self.meta.total_frames + episode_length)
+        episode_buffer["index"] = np.arange(self._meta.total_frames, self._meta.total_frames + episode_length)
        episode_buffer["episode_index"] = np.full((episode_length,), episode_index)

        # Update tasks and task indices with new tasks if any
-        self.meta.save_episode_tasks(episode_tasks)
+        self._meta.save_episode_tasks(episode_tasks)

        # Given tasks in natural language, find their corresponding task indices
-        episode_buffer["task_index"] = np.array([self.meta.get_task_index(task) for task in tasks])
+        episode_buffer["task_index"] = np.array([self._meta.get_task_index(task) for task in tasks])

-        for key, ft in self.features.items():
+        for key, ft in self._meta.features.items():
            # index, episode_index, task_index are already processed above, and image and video
            # are processed separately by storing image path and frame info as meta data
            if key in ["index", "episode_index", "task_index"] or ft["dtype"] in ["video"]:
                continue
            episode_buffer[key] = np.stack(episode_buffer[key]).squeeze()

-        for key in self.meta.video_keys:
-            episode_buffer[key] = str(videos[key])  # PosixPath -> str
+        for key in self._meta.video_keys:
+            episode_buffer[key] = str(videos[key])

-        ep_stats = compute_episode_stats(episode_buffer, self.features)
+        ep_stats = compute_episode_stats(episode_buffer, self._meta.features)

        ep_metadata = self._save_episode_data(episode_buffer)
-        has_video_keys = len(self.meta.video_keys) > 0
-        use_batched_encoding = self.batch_encoding_size > 1
+        has_video_keys = len(self._meta.video_keys) > 0
+        use_batched_encoding = self._batch_encoding_size > 1

        self.current_videos = videos
        if has_video_keys and not use_batched_encoding:
-            for video_key in self.meta.video_keys:
+            for video_key in self._meta.video_keys:
                ep_metadata.update(self._save_episode_video(video_key, episode_index))

-        # `meta.save_episode` be executed after encoding the videos
-        # add action_config to current episode
        ep_metadata.update({"action_config": action_config})
-        self.meta.save_episode(episode_index, episode_length, episode_tasks, ep_stats, ep_metadata)
+        self._meta.save_episode(episode_index, episode_length, episode_tasks, ep_stats, ep_metadata)

        if has_video_keys and use_batched_encoding:
-            # Check if we should trigger batch encoding
-            self.episodes_since_last_encoding += 1
-            if self.episodes_since_last_encoding == self.batch_encoding_size:
-                start_ep = self.num_episodes - self.batch_encoding_size
-                end_ep = self.num_episodes
+            self._episodes_since_last_encoding += 1
+            if self._episodes_since_last_encoding == self._batch_encoding_size:
+                start_ep = self._meta.total_episodes - self._batch_encoding_size
+                end_ep = self._meta.total_episodes
                self._batch_save_episode_video(start_ep, end_ep)
-                self.episodes_since_last_encoding = 0
+                self._episodes_since_last_encoding = 0

        if not episode_data:
-            # Reset episode buffer and clean up temporary images (if not already deleted during video encoding)
-            self.clear_episode_buffer(delete_images=len(self.meta.image_keys) > 0)
+            self.clear_episode_buffer(delete_images=len(self._meta.image_keys) > 0)

    def _encode_temporary_episode_video(self, video_key: str, episode_index: int) -> Path:
        """
@@ -124,11 +164,47 @@ class AgiBotDataset(LeRobotDataset):
        Note: `encode_video_frames` is a blocking call. Making it asynchronous shouldn't speedup encoding,
        since video encoding with ffmpeg is already using multithreading.
        """
-        temp_path = Path(tempfile.mkdtemp(dir=self.root)) / f"{video_key}_{episode_index:03d}.mp4"
+        temp_path = Path(tempfile.mkdtemp(dir=self._root)) / f"{video_key}_{episode_index:03d}.mp4"
        shutil.copy(self.current_videos[video_key], temp_path)
        return temp_path


+class AgiBotDataset(LeRobotDataset):
+    @classmethod
+    def create(cls, *args, **kwargs) -> "AgiBotDataset":
+        sig = inspect.signature(super().create)
+        bound = sig.bind_partial(*args, **kwargs)
+        bound.apply_defaults()
+        params = bound.arguments
+
+        obj = super().create(*args, **kwargs)
+
+        shutil.rmtree(params["root"], ignore_errors=True)
+        obj.meta: AgiBotDatasetMetadata = AgiBotDatasetMetadata.create(
+            repo_id=params["repo_id"],
+            fps=params["fps"],
+            robot_type=params["robot_type"],
+            features=params["features"],
+            root=params["root"],
+            use_videos=params["use_videos"],
+            metadata_buffer_size=params["metadata_buffer_size"],
+        )
+        obj.writer: AgiBotDatasetWriter = AgiBotDatasetWriter(
+            meta=obj.meta,
+            root=obj.root,
+            vcodec=obj._vcodec,
+            encoder_threads=obj._encoder_threads,
+            batch_encoding_size=obj._batch_encoding_size,
+        )
+        return obj
+
+    def save_episode(
+        self, videos: dict, action_config: list, episode_data: dict | None = None, parallel_encoding: bool = True
+    ) -> None:
+        self._require_writer("save_episode")
+        self.writer.save_episode(videos, action_config, episode_data, parallel_encoding)
+
+
 def get_all_tasks(src_path: Path, output_path: Path):
    json_files = src_path.glob("task_info/*.json")
    for json_file in json_files:
@@ -191,11 +267,13 @@ def save_as_lerobot_dataset(agibot_world_config, task: tuple[Path, Path], save_d
            dataset.save_episode(videos=videos, action_config=action_config)
        except Exception as e:
            print(f"{json_file.stem}, episode_{eid}: there are some corrupted mp4s\nException details: {str(e)}")
-            dataset.episode_buffer = None
+            dataset.clear_episode_buffer(delete_images=False)
            continue
-        gc.collect()
+
        print(f"process done for {json_file.stem}, episode_id {eid}, len {len(frames)}")

+    dataset.finalize()
+

 def main(
    src_path: str,
@@ -247,8 +325,6 @@ def main(
                with open("output.txt", "a") as f:
                    f.write(f"{task}, exception details: {str(e)}\n")

-        ray.shutdown()
-

 if __name__ == "__main__":
    parser = argparse.ArgumentParser()
@@ -1,9 +1,11 @@
 import numpy as np
-import torch
-import torchvision
-from lerobot.datasets.compute_stats import auto_downsample_height_width, get_feature_stats, sample_indices
-
-torchvision.set_video_backend("pyav")
+from lerobot.datasets.compute_stats import (
+    DEFAULT_QUANTILES,
+    auto_downsample_height_width,
+    get_feature_stats,
+    sample_indices,
+)
+from torchcodec.decoders import VideoDecoder


 def generate_features_from_config(AgiBotWorld_CONFIG):
@@ -20,9 +22,8 @@ def generate_features_from_config(AgiBotWorld_CONFIG):
 def sample_images(input):
    if type(input) is str:
        video_path = input
-        reader = torchvision.io.VideoReader(video_path, stream="video")
-        frames = [frame["data"] for frame in reader]
-        frames_array = torch.stack(frames).numpy()  # Shape: [T, C, H, W]
+        decoder = VideoDecoder(video_path)
+        frames_array = decoder[0:-1].numpy()  # Shape: [T, C, H, W]

        sampled_indices = sample_indices(len(frames_array))
        images = None
@@ -50,21 +51,31 @@ def sample_images(input):
    return images


-def compute_episode_stats(episode_data: dict[str, list[str] | np.ndarray], features: dict) -> dict:
+def compute_episode_stats(
+    episode_data: dict[str, list[str] | np.ndarray],
+    features: dict,
+    quantile_list: list[float] | None = None,
+) -> dict:
+    if quantile_list is None:
+        quantile_list = DEFAULT_QUANTILES
+
    ep_stats = {}
    for key, data in episode_data.items():
        if features[key]["dtype"] == "string":
-            continue  # HACK: we should receive np.arrays of strings
+            continue
+
        elif features[key]["dtype"] in ["image", "video"]:
            ep_ft_array = sample_images(data)
-            axes_to_reduce = (0, 2, 3)  # keep channel dim
+            axes_to_reduce = (0, 2, 3)
            keepdims = True
        else:
-            ep_ft_array = data  # data is already a np.ndarray
-            axes_to_reduce = 0  # compute stats over the first axis
-            keepdims = data.ndim == 1  # keep as np.array
+            ep_ft_array = data
+            axes_to_reduce = 0
+            keepdims = data.ndim == 1

-        ep_stats[key] = get_feature_stats(ep_ft_array, axis=axes_to_reduce, keepdims=keepdims)
+        ep_stats[key] = get_feature_stats(
+            ep_ft_array, axis=axes_to_reduce, keepdims=keepdims, quantile_list=quantile_list
+        )

        if features[key]["dtype"] in ["image", "video"]:
            value_norm = 1.0 if "depth" in key else 255.0
@@ -1,89 +0,0 @@
-
-# Dataset Merger Tool README
-## 一、简介
-merge_lerobot_dataset是一个功能强大的Python脚本，专门用于合并多个具有相似结构的数据集。它特别适用于包含视频、状态和动作数据的lerobot自采数据集，能够高效地整合分散的episodes和分散的任务资源，为后续的模型训练等工作提供便利。
-
-## 二、功能概述
-1. **数据集合并**：将多个源数据集文件夹合并为一个统一的数据集，简化数据管理。
-2. **索引重编号**：重新编号所有的episode索引和任务索引，确保合并后数据的连续性和一致性。
-3. **向量维度填充**：自动检测并填充向量维度，使所有数据在observation.state和action等方面具有一致性。
-4. **统计信息合并**：智能合并多个数据集的统计信息，正确处理复杂的数据结构，如图像特征的嵌套结构。
-5. **图像视频验证**：如果有图片 `images` 文件夹的话，自动检测视频 / 图像 / 元数据文件之间的数量关系。
-6. **视频文件处理**：正确处理和复制视频文件，保持视频与其他数据之间的正确索引关系，支持多种视频存储结构。
-7. **元数据更新**：更新所有元数据文件，准确反映合并后的数据集结构。
-8. **图像文件处理**：复制图像文件，保持图片与其他数据之间的正确索引关系。
-
-
-## 三、安装
-本脚本依赖于以下Python库：
- numpy
- pandas
-
-安装依赖项的命令如下：
-```bash
-pip install numpy pandas
-```
-
-## 四、使用方法
-### （一）基本用法
-通过命令行运行工具，示例如下：
-```bash
-python dataset_merger.py --sources /path/to/dataset1 /path/to/dataset2 /path/to/dataset3 --output /path/to/output_dataset
-```
-### （二）命令行参数
- **--sources**：源数据集文件夹路径列表，至少需要指定一个源数据集路径。
- **--output**：输出数据集文件夹路径，用于指定合并后数据集的存储位置。
- **--state_max_dim**：状态向量的最大维度，默认值为32。
- **--action_max_dim**：动作向量的最大维度，默认值为32。
- **--fps**：数据集的帧率，默认值为20。
- **--copy_images**: 是否将图像从源文件夹复制且合并到输出文件夹。(default: `False`)
-
-### （三）示例
-```bash
-python dataset_merger.py --sources ./robot_dataset_1 ./robot_dataset_2 --output ./merged_dataset --state_max_dim 32 --action_max_dim 18 --fps 30
-```
-
-## 五、数据集格式
-此工具假设输入数据集具有以下结构：
-```
-dataset/
-├── meta/
-│   ├── episodes.jsonl       # 包含每个episode的元数据
-│   ├── episodes_stats.jsonl # 每个episode的统计数据
-│   ├── info.json           # 数据集全局信息
-│   ├── stats.json          # 全局统计信息
-│   └── tasks.jsonl         # 任务定义
-├── data/                   # 包含parquet格式的episode数据
-│   └── chunk-xxx/
-│       └── episode_xxxxxx.parquet
-├── images/                   # 可选的图片文件
-│   └── episode_xxxxxx/
-│       └── frame_xxxxxx.png
-└── videos/                 # 可选的视频文件
-    └── chunk-xxx/
-        └── video_key/
-            └── episode_xxxxxx.mp4 
-```
-
-## 六、功能细节
-1. **数据一致性处理**：自动检测并填充状态和动作向量维度，确保所有数据具有一致的维度，满足机器学习算法对数据格式的要求。
-2. **索引管理**：重新编号所有episode和任务索引，同时维护帧索引的连续性，避免数据混乱。
-3. **统计信息合并**：智能合并多个数据集的统计数据，能够正确处理复杂的数据结构，如图像特征的嵌套结构，确保统计信息的准确性和完整性。
-4. **视频文件处理**：正确复制视频文件，并保持视频与其他数据之间的正确索引关系，支持多种视频存储结构，保证视频数据与其他数据的同步性。
-5. **任务映射**：自动检测并合并相同的任务描述，创建新的任务索引映射，方便对任务进行统一管理和调用。
-6. **数据预验证**：在执行合并操作前对数据集执行全面的预验证，检查视频帧数、图片数量与元数据中记录的帧长度的一致性，确保合并后数据的准确性和完整性，并自动修复可修复的问题（从图片重新编码视频）。
-7. **图像文件管理**：复制和整理图像文件，保持正确的命名规则和目录结构，确保图像与视频和其他数据保持正确的索引对应关系，支持按需启用图像复制功能来优化存储空间使用。
-
-## 七、注意事项
-1. 确保所有源数据集具有兼容的结构，否则可能导致合并失败或数据错误。
-2. 合并后的数据集可能占用较大磁盘空间，在进行合并操作前，请确保有足够的存储空间。
-3. 对于非常大的数据集，合并过程可能需要较长时间，请耐心等待。
-4. 图像文件夹占用磁盘空间很大，默认不开启 `copy_images` 参数。本工具在处理 `images` 文件夹下的图片时，**仅支持 PNG 格式的图片**，且要求图片文件名为 `frame_XXXXXX.png`（X为6位数字，例如 `frame_000001.png`）。合并过程中会自动检测并处理这些 PNG 图片。
-
-## 八、常见问题
-1. **Q: 合并不同维度的数据集会发生什么？**
-**A**: 工具会自动检测最大维度并用零填充较小维度的向量，确保所有数据具有一致的维度。
-2. **Q: 如何处理不同FPS的数据集？**
-**A**: 暂时只支持相同FPS的数据集合并。
-3. **Q: 能否只合并某些特定episode？**
-**A**: 当前版本会合并所有数据。如需更精细的控制，您可以先筛选数据集，然后再进行合并。也可以全部合并，然后使用lerobot加载特定的episode。
@@ -32,6 +32,11 @@ import pyarrow.parquet as pq
 import tqdm
 from datasets import Dataset
 from huggingface_hub import snapshot_download
+from lerobot.datasets.io_utils import (
+    load_info,
+    load_tasks,
+    write_info,
+)
 from lerobot.datasets.utils import (
    DEFAULT_CHUNK_SIZE,
    DEFAULT_DATA_PATH,
@@ -40,11 +45,8 @@ from lerobot.datasets.utils import (
    LEGACY_EPISODES_PATH,
    LEGACY_EPISODES_STATS_PATH,
    LEGACY_TASKS_PATH,
-    load_info,
-    load_tasks,
    serialize_dict,
    unflatten_dict,
-    write_info,
 )
 from lerobot.utils.constants import HF_LEROBOT_HOME
 from lerobot.utils.utils import init_logging
@@ -52,8 +54,12 @@ from lerobot.utils.utils import init_logging
 V21 = "v2.1"
 V30 = "v3.0"

-LEGACY_DATA_PATH_TEMPLATE = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
-LEGACY_VIDEO_PATH_TEMPLATE = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
+LEGACY_DATA_PATH_TEMPLATE = (
+    "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
+)
+LEGACY_VIDEO_PATH_TEMPLATE = (
+    "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
+)
 MIN_VIDEO_DURATION = 1e-6
 LEGACY_STATS_KEYS = ("mean", "std", "min", "max", "count")

@@ -137,7 +143,9 @@ def convert_info(
        if ft.get("dtype") != "video":
            ft.pop("fps", None)

-    info["total_chunks"] = math.ceil(total_episodes / chunks_size) if total_episodes > 0 else 0
+    info["total_chunks"] = (
+        math.ceil(total_episodes / chunks_size) if total_episodes > 0 else 0
+    )
    info["total_videos"] = total_episodes * len(video_keys)

    write_info(info, new_root)
@@ -156,14 +164,22 @@ def _group_episodes_by_data_file(
    return grouped


-def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any]]) -> None:
+def convert_data(
+    root: Path, new_root: Path, episode_records: list[dict[str, Any]]
+) -> None:
    logging.info("Converting consolidated parquet files back to per-episode files")
    grouped = _group_episodes_by_data_file(episode_records)

-    for (chunk_idx, file_idx), records in tqdm.tqdm(grouped.items(), desc="convert data files"):
-        source_path = root / DEFAULT_DATA_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
+    for (chunk_idx, file_idx), records in tqdm.tqdm(
+        grouped.items(), desc="convert data files"
+    ):
+        source_path = root / DEFAULT_DATA_PATH.format(
+            chunk_index=chunk_idx, file_index=file_idx
+        )
        if not source_path.exists():
-            raise FileNotFoundError(f"Expected source parquet file not found: {source_path}")
+            raise FileNotFoundError(
+                f"Expected source parquet file not found: {source_path}"
+            )

        table = pq.read_table(source_path)
        records = sorted(records, key=lambda rec: int(rec["dataset_from_index"]))
@@ -181,7 +197,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
                    f"episode_index={episode_index}, length={length}"
                )

-            episode_table = table.slice(start, length).to_pandas()
+            episode_table = table.slice(start, length)

            dest_chunk = episode_index // DEFAULT_CHUNK_SIZE
            dest_path = new_root / LEGACY_DATA_PATH_TEMPLATE.format(
@@ -189,7 +205,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
                episode_index=episode_index,
            )
            dest_path.parent.mkdir(parents=True, exist_ok=True)
-            Dataset.from_pandas(episode_table).to_parquet(dest_path)
+            Dataset(episode_table).to_parquet(dest_path)


 def _group_episodes_by_video_file(
@@ -235,10 +251,14 @@ def _validate_video_paths(src: Path, dst: Path) -> None:
    # Validate file extensions for video files
    valid_video_extensions = {".mp4", ".avi", ".mov", ".mkv", ".webm", ".m4v"}
    if src_resolved.suffix.lower() not in valid_video_extensions:
-        raise ValueError(f"Source file does not have a valid video extension: {src_resolved}")
+        raise ValueError(
+            f"Source file does not have a valid video extension: {src_resolved}"
+        )

    if dst_resolved.suffix.lower() not in valid_video_extensions:
-        raise ValueError(f"Destination file does not have a valid video extension: {dst_resolved}")
+        raise ValueError(
+            f"Destination file does not have a valid video extension: {dst_resolved}"
+        )

    # Check for path traversal attempts in the original paths
    src_str = str(src)
@@ -253,11 +273,16 @@ def _validate_video_paths(src: Path, dst: Path) -> None:

    # Additional check: ensure resolved paths don't point to system directories
    system_dirs = {"/etc", "/sys", "/proc", "/dev", "/boot", "/root"}
-    for resolved_path, name in [(src_resolved, "source"), (dst_resolved, "destination")]:
+    for resolved_path, name in [
+        (src_resolved, "source"),
+        (dst_resolved, "destination"),
+    ]:
        path_str = str(resolved_path)
        for sys_dir in system_dirs:
            if path_str.startswith(sys_dir + "/") or path_str == sys_dir:
-                raise ValueError(f"Path points to system directory: {name} path {resolved_path}")
+                raise ValueError(
+                    f"Path points to system directory: {name} path {resolved_path}"
+                )

    # Ensure the destination directory can be created safely
    try:
@@ -324,9 +349,13 @@ def _extract_video_segment(
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
-        raise RuntimeError(f"ffmpeg timed out while processing video '{src}' -> '{dst}'") from exc
+        raise RuntimeError(
+            f"ffmpeg timed out while processing video '{src}' -> '{dst}'"
+        ) from exc
    except FileNotFoundError as exc:
-        raise RuntimeError("ffmpeg executable not found; it is required for video conversion") from exc
+        raise RuntimeError(
+            "ffmpeg executable not found; it is required for video conversion"
+        ) from exc
    except subprocess.CalledProcessError as exc:
        error_msg = f"ffmpeg failed while splitting video '{src}' into '{dst}'"
        if exc.stderr:
@@ -334,7 +363,12 @@ def _extract_video_segment(
        raise RuntimeError(error_msg) from exc


-def convert_videos(root: Path, new_root: Path, episode_records: list[dict[str, Any]], video_keys: list[str]) -> None:
+def convert_videos(
+    root: Path,
+    new_root: Path,
+    episode_records: list[dict[str, Any]],
+    video_keys: list[str],
+) -> None:
    if len(video_keys) == 0:
        logging.info("No video features detected; skipping video conversion")
        return
@@ -347,7 +381,9 @@ def convert_videos(root: Path, new_root: Path, episode_records: list[dict[str, A
            logging.info("No video metadata found for key '%s'; skipping", video_key)
            continue

-        for (chunk_idx, file_idx), records in tqdm.tqdm(grouped.items(), desc=f"convert videos ({video_key})"):
+        for (chunk_idx, file_idx), records in tqdm.tqdm(
+            grouped.items(), desc=f"convert videos ({video_key})"
+        ):
            src_path = root / DEFAULT_VIDEO_PATH.format(
                video_key=video_key,
                chunk_index=chunk_idx,
@@ -356,7 +392,10 @@ def convert_videos(root: Path, new_root: Path, episode_records: list[dict[str, A
            if not src_path.exists():
                raise FileNotFoundError(f"Expected MP4 file not found: {src_path}")

-            records = sorted(records, key=lambda rec: float(rec[f"videos/{video_key}/from_timestamp"]))
+            records = sorted(
+                records,
+                key=lambda rec: float(rec[f"videos/{video_key}/from_timestamp"]),
+            )

            for record in records:
                episode_index = int(record["episode_index"])
@@ -373,7 +412,9 @@ def convert_videos(root: Path, new_root: Path, episode_records: list[dict[str, A
                _extract_video_segment(src_path, dest_path, start=start, end=end)


-def convert_episodes_metadata(new_root: Path, episode_records: list[dict[str, Any]]) -> None:
+def convert_episodes_metadata(
+    new_root: Path, episode_records: list[dict[str, Any]]
+) -> None:
    logging.info("Reconstructing legacy episodes and episodes_stats JSONL files")

    episodes_path = new_root / LEGACY_EPISODES_PATH
@@ -396,7 +437,9 @@ def convert_episodes_metadata(new_root: Path, episode_records: list[dict[str, An
        jsonlines.open(episodes_path, mode="w") as episodes_writer,
        jsonlines.open(stats_path, mode="w") as stats_writer,
    ):
-        for record in sorted(episode_records, key=lambda rec: int(rec["episode_index"])):
+        for record in sorted(
+            episode_records, key=lambda rec: int(rec["episode_index"])
+        ):
            legacy_episode = {
                key: value
                for key, value in record.items()
@@ -407,10 +450,14 @@ def convert_episodes_metadata(new_root: Path, episode_records: list[dict[str, An
                and key not in {"dataset_from_index", "dataset_to_index"}
            }

-            serializable_episode = {key: _to_serializable(value) for key, value in legacy_episode.items()}
+            serializable_episode = {
+                key: _to_serializable(value) for key, value in legacy_episode.items()
+            }
            episodes_writer.write(serializable_episode)

-            stats_flat = {key: record[key] for key in record if key.startswith("stats/")}
+            stats_flat = {
+                key: record[key] for key in record if key.startswith("stats/")
+            }
            stats_nested = unflatten_dict(stats_flat).get("stats", {})
            stats_serialized = serialize_dict(_filter_stats(stats_nested))
            stats_writer.write(
@@ -453,7 +500,11 @@ def convert_dataset(
    new_root.mkdir(parents=True, exist_ok=True)

    episode_records = load_episode_records(root)
-    video_keys = [key for key, ft in load_info(root)["features"].items() if ft.get("dtype") == "video"]
+    video_keys = [
+        key
+        for key, ft in load_info(root)["features"].items()
+        if ft.get("dtype") == "video"
+    ]

    convert_info(root, new_root, episode_records, video_keys)
    convert_tasks(root, new_root)
@@ -8,6 +8,7 @@ import pandas as pd
 import ray
 from datatrove.executor import LocalPipelineExecutor, RayPipelineExecutor
 from datatrove.pipeline.base import PipelineStep
+from lerobot.datasets import LeRobotDataset, LeRobotDatasetMetadata
 from lerobot.datasets.aggregate import (
    aggregate_data,
    aggregate_metadata,
@@ -15,14 +16,11 @@ from lerobot.datasets.aggregate import (
    aggregate_videos,
    validate_all_metadata,
 )
-from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
+from lerobot.datasets.io_utils import write_info, write_stats, write_tasks
 from lerobot.datasets.utils import (
    DEFAULT_CHUNK_SIZE,
    DEFAULT_DATA_FILE_SIZE_IN_MB,
    DEFAULT_VIDEO_FILE_SIZE_IN_MB,
-    write_info,
-    write_stats,
-    write_tasks,
 )
 from libero_utils.config import LIBERO_FEATURES
 from libero_utils.libero_utils import load_local_episodes
@@ -171,7 +169,9 @@ def main(
                )
            )
    if len(src_paths) > 1:
-        aggregate_output_path = output_path / ("_".join([src_path.name for src_path in src_paths]) + "_aggregated_lerobot")
+        aggregate_output_path = output_path / (
+            "_".join([src_path.name for src_path in src_paths]) + "_aggregated_lerobot"
+        )
    else:
        aggregate_output_path = output_path / f"{src_paths[0].name}_lerobot"
    aggregate_output_path = aggregate_output_path.resolve()
@@ -234,7 +234,9 @@ if __name__ == "__main__":
    parser.add_argument("--output-path", type=Path, required=True)
    parser.add_argument("--executor", type=str, choices=["local", "ray"], default="local")
    parser.add_argument("--cpus-per-task", type=int, default=1)
-    parser.add_argument("--tasks-per-job", type=int, default=1, help="number of concurrent tasks per job, only used for ray")
+    parser.add_argument(
+        "--tasks-per-job", type=int, default=1, help="number of concurrent tasks per job, only used for ray"
+    )
    parser.add_argument("--workers", type=int, default=-1, help="number of concurrent jobs to run")
    parser.add_argument("--resume-dir", type=Path, help="logs directory to resume")
    parser.add_argument("--debug", action="store_true")
@@ -0,0 +1,144 @@
+# ROBOCASA TO LEROBOT
+
+## ROBOCASA installation
+
+- Clone this repo: https://github.com/robocasa/robocasa
+- Follow README.md to install packages and download assets
+
+## Data Preparation
+
+- Check files: `robocasa/scripts/download_datasets.py`, `robocasa/utils/dataset_registry.py`
+- Download original datasets by python scripts or wget/curl (recommended)
+
+## Example:
+
+```bash
+wget https://utexas.box.com/shared/static/7y9csrcx6uhhq4p3yctmm2df3rjqpw6g.hdf5 -O PnPCounterToCab.hdf5
+```
+
+- Extract subset data: Original hdf5 files contain about 3000 episodes. However, it contains a key "masks", which contain list of subset demo_ids. For example: 30_demos : `[demo123, demo234, demo345, etc.]`.Run the code in the notebook to extract only chosen subset demos, which is much smaller and easier for later processes.
+
+## Code Modification
+
+- Add functions in `camera_utils.py` to your `robosuite/robosuite/utils/camera_utils.py` for camera parameters extraction (May be useful for experiments which requires multiview rendering)
+
+- Change args to render depth and segmentation masks for new regenerated dataset. Change in `robocasa/environments/kitchen/kitchen.py`
+
+```python
+
+class Kitchen(ManipulationEnv, metaclass=KitchenEnvMeta):
+    ...
+    EXCLUDE_LAYOUTS = []
+    def __init__(
+        self,
+        robots,
+        env_configuration="default",
+        controller_configs=None,
+        gripper_types="default",
+        base_types="default",
+        initialization_noise="default",
+        use_camera_obs=True,
+        use_object_obs=True,  # currently unused variable
+        reward_scale=1.0,  # currently unused variable
+        reward_shaping=False,  # currently unused variables
+        placement_initializer=None,
+        has_renderer=False,
+        has_offscreen_renderer=True,
+        render_camera="robot0_agentview_center",
+        render_collision_mesh=False,
+        render_visual_mesh=True,
+        render_gpu_device_id=-1,
+        control_freq=20,
+        horizon=1000,
+        ignore_done=True,
+        hard_reset=True,
+        camera_names="agentview",
+        camera_heights=256,
+        camera_widths=256,
+        camera_depths=False, # -> True
+        renderer="mjviewer",
+        renderer_config=None,
+        init_robot_base_pos=None,
+        seed=None,
+        layout_and_style_ids=None,
+        layout_ids=None,
+        style_ids=None,
+        scene_split=None,  # unsued, for backwards compatibility
+        generative_textures=None,
+        obj_registries=("objaverse",),
+        obj_instance_split=None,
+        use_distractors=False,
+        translucent_robot=False,
+        randomize_cameras=False,
+        camera_segmentations="instance", # add camera segmentation here: semantic/instance/element
+    ):
+        ...
+
+        super().__init__(
+            robots=robots,
+            env_configuration=env_configuration,
+            controller_configs=controller_configs,
+            base_types=base_types,
+            gripper_types=gripper_types,
+            initialization_noise=initialization_noise,
+            use_camera_obs=use_camera_obs,
+            has_renderer=has_renderer,
+            has_offscreen_renderer=has_offscreen_renderer,
+            render_camera=render_camera,
+            render_collision_mesh=render_collision_mesh,
+            render_visual_mesh=render_visual_mesh,
+            render_gpu_device_id=render_gpu_device_id,
+            control_freq=control_freq,
+            lite_physics=True,
+            horizon=horizon,
+            ignore_done=ignore_done,
+            hard_reset=hard_reset,
+            camera_names=camera_names,
+            camera_heights=camera_heights,
+            camera_widths=camera_widths,
+            camera_depths=camera_depths,
+            camera_segmentations=camera_segmentations, # add camera segmentation here
+            renderer=renderer,
+            renderer_config=renderer_config,
+            seed=seed,
+        )
+```
+
+## Regenerate
+
+- Check file: `regenerate.py`
+- Original dataset contain image in 128x128 resolution and does not contain segmentation mask, depth, etc. We need to rerender it in 256x256 and save segmentation mask, and depth
+- Overall re-render flow:
+  - (1) load hdf5 file and create env
+  - (2) reset env to first state in the dataset
+  - (3) Execute action in action label of original dataset, at each step, we collect observation data, camera parameters, state, etc. from simulation.
+  - (4) Save only successful episode to new hdf5 file (original data contain unsuccessful episode or wrong action)
+- Change `origin_dir` and `regenerate_dir` to your dir in `regenerate.py` then run `python regenerate.py` to regenerate
+
+## Get started
+
+1. Download source code:
+
+   ```bash
+   git clone https://github.com/Tavish9/any4lerobot.git
+   ```
+
+2. Modify path in `convert.sh`:
+
+   ```bash
+   python robocasa_h5.py \
+       --raw-dir /path/to/your/hdf5/files \
+       --repo-id your_hf_id \
+       --local-dir /path/to/your/output/dataset
+   ```
+
+3. Execute the script:
+
+   ```bash
+   bash convert.sh
+   ```
+
+## Example output datasets:
+
+- ROBOCASA 100 demos: https://huggingface.co/datasets/binhng/robocasa_merged_24_tasks_100demos_v1
+- ROBOCASA 30 demos: https://huggingface.co/datasets/binhng/robocasa_merged_24_tasks_30demos_v3
@@ -0,0 +1,4 @@
+python robocasa_h5.py \
+    --raw-dir /path/to/your/hdf5/files \
+    --repo-id your_hf_id \
+    --local-dir /path/to/your/output/dataset
@@ -0,0 +1,107 @@
+import argparse
+import json
+import shutil
+from pathlib import Path
+
+import h5py
+import numpy as np
+from lerobot.datasets.lerobot_dataset import LeRobotDataset
+from tqdm import tqdm
+
+
+def main(raw_dir: Path, repo_id: str, local_dir: Path):
+    if local_dir.exists():
+        shutil.rmtree(local_dir)
+
+    dataset = LeRobotDataset.create(
+        repo_id=repo_id,
+        robot_type="PandaOmron",
+        root=local_dir,
+        fps=20,
+        features={
+            "observation.images.robot0_agentview_right": {
+                "dtype": "video",
+                "shape": (256, 256, 3),
+                "names": ["height", "width", "channel"],
+            },
+            "observation.images.robot0_agentview_left": {
+                "dtype": "video",
+                "shape": (256, 256, 3),
+                "names": ["height", "width", "channel"],
+            },
+            "observation.images.robot0_eye_in_hand": {
+                "dtype": "video",
+                "shape": (256, 256, 3),
+                "names": ["height", "width", "channel"],
+            },
+            "observation.state": {
+                "dtype": "float32",
+                "shape": (9,),
+                "names": ["state"],
+            },
+            "action": {
+                "dtype": "float32",
+                "shape": (12,),
+                "names": ["actions"],
+            },
+        },
+    )
+
+    for dataset_path in raw_dir.glob("**/*.hdf5"):
+         with h5py.File(dataset_path, "r") as raw_dataset:
+            demos = raw_dataset["data"].keys()
+            for demo in tqdm(demos):
+                demo_length = len(raw_dataset["data"][demo]["actions"])
+                demo_data = raw_dataset["data"][demo]
+
+                left_images = demo_data["obs"]["robot0_agentview_left_image"][:]
+                right_images = demo_data["obs"]["robot0_agentview_right_image"][:]
+                wrist_images = demo_data["obs"]["robot0_eye_in_hand_image"][:]
+                states = np.concatenate(
+                    (
+                        demo_data["obs"]["robot0_base_to_eef_pos"][:],
+                        demo_data["obs"]["robot0_base_to_eef_quat"][:],
+                        demo_data["obs"]["robot0_gripper_qpos"][:],
+                    ),
+                    axis=1,
+                )
+                actions = demo_data["actions"][:]
+                for i in range(demo_length):
+                    ep_meta = demo_data.attrs["ep_meta"]
+                    ep_meta = json.loads(ep_meta)
+                    lang = ep_meta["lang"]
+                    dataset.add_frame(
+                        {
+                            "observation.images.robot0_agentview_right": right_images[i],
+                            "observation.images.robot0_agentview_left": left_images[i],
+                            "observation.images.robot0_eye_in_hand": wrist_images[i],
+                            "observation.state": states[i].astype(np.float32),
+                            "action": actions[i].astype(np.float32),
+                            "task": lang,
+                        },
+                    )
+                dataset.save_episode()
+    dataset.finalize()
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--raw-dir",
+        type=Path,
+        required=True,
+        help="Directory containing input raw datasets (e.g. `path/to/dataset` or `path/to/dataset/version).",
+    )
+    parser.add_argument(
+        "--local-dir",
+        type=Path,
+        required=True,
+        help="When provided, writes the dataset converted to LeRobotDataset format in this directory  (e.g. `data/lerobot/aloha_mobile_chair`).",
+    )
+    parser.add_argument(
+        "--repo-id",
+        type=str,
+        help="Repositery identifier on Hugging Face: a community or a user name `/` the name of the dataset, required when push-to-hub is True",
+    )
+    args = parser.parse_args()
+    main(raw_dir=args.raw_dir, repo_id=args.repo_id, local_dir=args.local_dir)
@@ -0,0 +1,78 @@
+import numpy as np
+import robosuite.utils.transform_utils as T
+
+
+def get_camera_intrinsic_matrix(sim, camera_name, camera_height, camera_width):
+    """
+    Obtains camera intrinsic matrix.
+
+    Args:
+        sim (MjSim): simulator instance
+        camera_name (str): name of camera
+        camera_height (int): height of camera images in pixels
+        camera_width (int): width of camera images in pixels
+    Return:
+        K (np.array): 3x3 camera matrix
+    """
+    cam_id = sim.model.camera_name2id(camera_name)
+    fovy = sim.model.cam_fovy[cam_id]
+    f = 0.5 * camera_height / np.tan(fovy * np.pi / 360)
+    K = np.array([[f, 0, camera_width / 2], [0, f, camera_height / 2], [0, 0, 1]])
+    return K
+
+
+def get_camera_extrinsic_matrix(sim, camera_name):
+    """
+    Returns a 4x4 homogenous matrix corresponding to the camera pose in the
+    world frame. MuJoCo has a weird convention for how it sets up the
+    camera body axis, so we also apply a correction so that the x and y
+    axis are along the camera view and the z axis points along the
+    viewpoint.
+    Normal camera convention: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
+
+    Args:
+        sim (MjSim): simulator instance
+        camera_name (str): name of camera
+    Return:
+        R (np.array): 4x4 camera extrinsic matrix
+    """
+    cam_id = sim.model.camera_name2id(camera_name)
+    camera_pos = sim.data.cam_xpos[cam_id]
+    camera_rot = sim.data.cam_xmat[cam_id].reshape(3, 3)
+    R = T.make_pose(camera_pos, camera_rot)
+
+    # IMPORTANT! This is a correction so that the camera axis is set up along the viewpoint correctly.
+    camera_axis_correction = np.array(
+        [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
+    )
+    R = R @ camera_axis_correction
+    return R
+
+def get_camera_extrinsic_matrix_rel(sim, camera_name):
+    """
+    Returns a 4x4 homogenous matrix corresponding to the camera pose in the
+    world frame. MuJoCo has a weird convention for how it sets up the
+    camera body axis, so we also apply a correction so that the x and y
+    axis are along the camera view and the z axis points along the
+    viewpoint.
+    Normal camera convention: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
+
+    Args:
+        sim (MjSim): simulator instance
+        camera_name (str): name of camera
+    Return:
+        R (np.array): 4x4 camera extrinsic matrix
+    """
+    cam_id = sim.model.camera_name2id(camera_name)
+    camera_pos = sim.model.cam_pos[cam_id]
+    camera_quat = sim.model.cam_quat[cam_id]
+    camera_rot = T.quat2mat(camera_quat)
+    R = T.make_pose(camera_pos, camera_rot)
+
+    # IMPORTANT! This is a correction so that the camera axis is set up along the viewpoint correctly.
+    camera_axis_correction = np.array(
+        [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
+    )
+    R = R @ camera_axis_correction
+    return R
+ 
@@ -0,0 +1,70 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "44b6da09",
+   "metadata": {},
+   "source": [
+    "# Extract subset data \n",
+    "\n",
+    "Original hdf5 file contains about 3000 episodes. However, it contains a key \"masks\", which contain list of subset demo_ids. For example: 30_demos : [demo123, demo234, demo 345, etc.]\n",
+    "\n",
+    "Run the code bellow to extract only chosen subset demos, which is much smaller and easier for later process."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6ac64550",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import h5py\n",
+    "\n",
+    "DATA_DIR=\"direction/to/your/hdf5/files/\"\n",
+    "# E.x: DATA_DIR=\"/projects/extern/kisski/kisski-spath/dir.project/VLA_3D/binh/robocasa/test\"\n",
+    "\n",
+    "# file_name = \"PnPCabToCounter.hdf5\"\n",
+    "# file_name = \"PnPCounterToCab.hdf5\"\n",
+    "# file_name = \"CoffeeSetupMug.hdf5\"\n",
+    "# file_name = \"TurnOnMicrowave.hdf5\"\n",
+    "file_name = \"TurnOffStove.hdf5\"\n",
+    "\n",
+    "file_path = DATA_DIR + \"/\" + file_name\n",
+    "\n",
+    "f = h5py.File(file_path, 'r')\n",
+    "chosen_demo_list = []\n",
+    "for i in f['mask']['100_demos'][:]: # or \"30_demos\"\n",
+    "    chosen_demo_list.append(i.decode('utf-8'))\n",
+    "    \n",
+    "chosen_data = []\n",
+    "for k in f['data'].keys():\n",
+    "    if k in chosen_demo_list:\n",
+    "        chosen_data.append(f['data'][k])\n",
+    "        \n",
+    "with h5py.File(f\"direction_to_your_new_extracted_subset/{file_name}\", \"w\") as out:\n",
+    "    out_data = out.create_group(\"data\")\n",
+    "    \n",
+    "    for key, val in f['data'].attrs.items():\n",
+    "            out_data.attrs[key] = val # IMPORTANT: set attributes for new hdf5 files (need for reset env and later re-render)\n",
+    "\n",
+    "    for grp in chosen_data:\n",
+    "        name = grp.name.split(\"/\")[-1]   # demo_xxx\n",
+    "        grp.file.copy(grp, out_data, name=name)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "robocasa",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
@@ -0,0 +1,277 @@
+import json
+import os
+
+import h5py
+import numpy as np
+import robosuite
+from robocasa.scripts.playback_dataset import reset_to
+from robosuite.utils.camera_utils import (
+    get_camera_extrinsic_matrix,
+    get_camera_extrinsic_matrix_rel,
+    get_camera_intrinsic_matrix,
+)
+from tqdm import tqdm
+
+ROBOCASA_DUMMY_ACTION = [0.0] * 6 + [-1.0] + [0.0] * 4 + [-1.0]
+
+
+def get_camera_info(sim, camera_name, camera_height, camera_width):
+    camera_intrinsics = get_camera_intrinsic_matrix(sim, camera_name, camera_height, camera_width)
+    camera_extrinsics = get_camera_extrinsic_matrix(sim, camera_name)
+
+    return camera_intrinsics, camera_extrinsics
+
+
+def creat_env_from_hdf5(f):
+    env_meta = json.loads(f["data"].attrs["env_args"])
+    env_meta["env_kwargs"]["camera_depths"] = True
+    env_meta["env_kwargs"]["camera_heights"] = 256
+    env_meta["env_kwargs"]["camera_widths"] = 256
+    env_meta["env_kwargs"]["camera_segmentations"] = "element"  # element' #'instance'
+    # f.close()
+
+    env_kwargs = env_meta["env_kwargs"]
+    env_kwargs["env_name"] = env_meta["env_name"]
+    env_kwargs["has_renderer"] = False
+    env_kwargs["renderer"] = "mjviewer"
+    env_kwargs["has_offscreen_renderer"] = True  # write_video
+    env_kwargs["use_camera_obs"] = True
+    env_kwargs["ignore_done"] = False
+
+    env = robosuite.make(**env_kwargs)
+
+    return env, env_meta
+
+
+def reset_each_demo(env, demo):
+    # demo = f["data"]["demo_<idx>"]
+    model_xml = demo.attrs["model_file"]
+    init_state = demo["states"][()][0]
+    ep_meta = demo.attrs["ep_meta"]
+
+    state = {"states": init_state, "model": model_xml, "ep_meta": ep_meta}
+    reset_to(env, state)
+
+
+def process_1_demo(env, f, demo_id, grp):
+    demo = f["data"][demo_id]
+    reset_each_demo(env, demo)
+
+    ep_meta = env.get_ep_meta()
+    model_file = env.model.get_xml()
+
+    for _ in range(10):
+        obs, reward, done, info = env.step(ROBOCASA_DUMMY_ACTION)
+
+    obs_keys = list(obs.keys())
+    obs_keys += [
+        "robot0_agentview_left_intrinsics",
+        "robot0_agentview_right_intrinsics",
+        "robot0_eye_in_hand_intrinsics",
+    ]
+    obs_keys += [
+        "robot0_agentview_left_extrinsics",
+        "robot0_agentview_right_extrinsics",
+        "robot0_eye_in_hand_extrinsics",
+    ]
+    obs_keys += [
+        "robot0_agentview_left_extrinsicsR",
+        "robot0_agentview_right_extrinsicsR",
+        "robot0_eye_in_hand_extrinsicsR",
+    ]
+    obs_keys += ["robot0_agentview_left_depthW", "robot0_agentview_right_depthW", "robot0_eye_in_hand_depthW"]
+
+    obs_dict = {key: [] for key in obs_keys}
+    # action_dict = {key: [] for key in act_keys}
+    actions = []
+    actions_abs = []
+    rewards = []
+    dones = []
+    states = []  # env state, not robot. The state for robot is included in obs
+
+    # for key in obs_keys:
+    #     obs_dict[key] = obs[key]
+    orig_actions = demo["actions"][()]
+    orig_actions_abs = demo["actions_abs"][()]
+    # orig_action_dict = demo['action_dict']
+
+    for i, action in enumerate(orig_actions):
+        # for i, action in enumerate(orig_actions_abs):
+        extent = env.sim.model.stat.extent
+        far = env.sim.model.vis.map.zfar * extent
+        near = env.sim.model.vis.map.znear * extent
+        left_depth = obs["robot0_agentview_left_depth"].copy()
+        right_depth = obs["robot0_agentview_right_depth"].copy()
+        wrist_depth = obs["robot0_eye_in_hand_depth"].copy()
+        left_depth = near / (1.0 - left_depth * (1.0 - near / far))[::-1]
+        right_depth = near / (1.0 - right_depth * (1.0 - near / far))[::-1]
+        wrist_depth = near / (1.0 - wrist_depth * (1.0 - near / far))[::-1]
+
+        obs["robot0_agentview_left_depthW"] = left_depth
+        obs["robot0_agentview_right_depthW"] = right_depth
+        obs["robot0_eye_in_hand_depthW"] = wrist_depth
+
+        left_intrinsics, left_extrinsics = get_camera_info(env.sim, "robot0_agentview_left", 256, 256)
+        right_intrinsics, right_extrinsics = get_camera_info(env.sim, "robot0_agentview_right", 256, 256)
+        wrist_intrinsics, wrist_extrinsics = get_camera_info(env.sim, "robot0_eye_in_hand", 256, 256)
+
+        obs["robot0_agentview_left_intrinsics"] = left_intrinsics
+        obs["robot0_agentview_right_intrinsics"] = right_intrinsics
+        obs["robot0_eye_in_hand_intrinsics"] = wrist_intrinsics
+        obs["robot0_agentview_left_extrinsics"] = left_extrinsics
+        obs["robot0_agentview_right_extrinsics"] = right_extrinsics
+        obs["robot0_eye_in_hand_extrinsics"] = wrist_extrinsics
+
+        left_intrinsics_rel = get_camera_extrinsic_matrix_rel(env.sim, "robot0_agentview_left")
+        right_intrinsics_rel = get_camera_extrinsic_matrix_rel(env.sim, "robot0_agentview_right")
+        wrist_intrinsics_rel = get_camera_extrinsic_matrix_rel(env.sim, "robot0_eye_in_hand")
+
+        obs["robot0_agentview_left_extrinsicsR"] = left_intrinsics_rel
+        obs["robot0_agentview_right_extrinsicsR"] = right_intrinsics_rel
+        obs["robot0_eye_in_hand_extrinsicsR"] = wrist_intrinsics_rel
+
+        # append all keys
+        for key in obs_keys:
+            if (
+                ("eye_in_hand" in key or "agentview" in key)
+                and "depthW" not in key
+                and "intrinsics" not in key
+                and "extrinsics" not in key
+            ):
+                obs_dict[key].append(obs[key][::-1, :, :])
+            else:
+                obs_dict[key].append(obs[key])
+
+        # for key in act_keys:
+        #     action_dict[key].append(orig_action_dict[key][i])
+
+        actions.append(action)
+        actions_abs.append(orig_actions_abs[i])
+
+        rewards.append(reward)
+        dones.append(done)
+
+        current_state = env.sim.get_state().flatten()
+        states.append(current_state)
+
+        # step env
+        obs, reward, done, info = env.step(action.tolist())
+
+        done = done or env._check_success()
+        # if done:
+        # print(f" Step {i} done: {done}")
+        # print(f" Step {i} info: {info}")
+        # print(f" Step {i} is_success: {env._check_success()}" )
+
+    # save successful episode only
+    if done:
+        print(f"Demo {demo_id} is done after {i} actions! -> SAVE!!!")
+
+        # save to new hdf5 file here
+        ep_data = grp.create_group(demo_id)
+        # set attribute for ep_data here ...
+        ep_data.attrs["model_file"] = model_file
+        ep_data.attrs["ep_meta"] = json.dumps(ep_meta, indent=4)
+
+        # obs group
+        obs_grp = ep_data.create_group("obs")
+        for key in obs_keys:
+            obs_grp.create_dataset(key, data=np.stack(obs_dict[key], axis=0))
+
+        # actions dataset
+        ep_data.create_dataset("actions", data=np.stack(actions, axis=0))
+
+        ep_data.create_dataset("actions_abs", data=np.stack(actions_abs, axis=0))
+
+        ep_data.create_dataset("dones", data=np.stack(dones, axis=0))
+
+        ep_data.create_dataset("rewards", data=np.stack(rewards, axis=0))
+
+        # state dataset
+        ep_data.create_dataset("states", data=np.stack(states, axis=0))
+
+    elif not done:
+        print(f"Demo {demo_id} not done after all actions executed! -> does not SAVE!")
+
+
+def regenerate_hdf5_dataset(input_path, output_path, debug=False):
+    f = h5py.File(input_path, "r")
+    env, env_meta = creat_env_from_hdf5(f)
+
+    out_f = h5py.File(output_path, "w")
+    out_f.attrs["env_args"] = json.dumps(env_meta)
+
+    grp = out_f.create_group("data")
+
+    all_demo_ids = list(f["data"].keys())
+    if debug:
+        all_demo_ids = all_demo_ids[: min(2, len(all_demo_ids))]
+    for demo_id in tqdm(all_demo_ids):
+        print(f"Processing {demo_id} ...")
+        process_1_demo(env, f, demo_id, grp)
+
+    f.close()
+    if len(out_f["data"].keys()) == 0:
+        print("No demos were processed successfully. Deleting output file.")
+        out_f.close()
+        os.remove(output_path)
+    else:
+        print(f"Processed data saved {len(out_f['data'].keys())} demos to {output_path}")
+        out_f.close()
+
+
+def process_task_wrapper(args):
+    """Wrapper function for multiprocessing to process a single task."""
+    task, origin_dir, regenerate_dir, debug = args
+    input_path = os.path.join(origin_dir, f"{task}.hdf5")
+    output_path = os.path.join(regenerate_dir, f"{task}.hdf5")
+
+    print(f"Regenerating dataset for task {task} ...")
+    regenerate_hdf5_dataset(input_path, output_path, debug=debug)
+    print(f"Completed task {task}")
+
+
+if __name__ == "__main__":
+    n_demo = 100  # 100
+    origin_dir = f"<directory/contain/original/hdf5/files/>"
+    regenerate_dir = f"<directory/contain/regenerated/hdf5/files/>"
+    os.makedirs(regenerate_dir, exist_ok=True)
+
+    task_list = [
+        "PnPCabToCounter",
+        "PnPCounterToCab",
+        "CoffeeSetupMug",
+        "TurnOffStove",
+        "TurnOnMicrowave",
+        # ... add other tasks as needed
+        # "CoffeePressButton",
+        # "CoffeeServeMug",
+        # "TurnOffMicrowave",
+        # "TurnOffSinkFaucet",
+        # "TurnOnSinkFaucet",
+        # "TurnOnStove",
+        # "TurnSinkSpout"
+        # "CloseDoubleDoor",
+        # "CloseDrawer",
+        # "CloseSingleDoor",
+        # "OpenDoubleDoor",
+        # "OpenDrawer",
+        # "OpenSingleDoor"
+        # "PnPCounterToMicrowave",
+        # "PnPCounterToSink",
+        # "PnPCounterToStove",
+        # "PnPMicrowaveToCounter",
+        # "PnPSinkToCounter",
+        # "PnPStoveToCounter"
+    ]  # 24 tasks in robocasa kitchen dataset
+
+    debug = False
+    if debug:
+        task_list = task_list[:2]
+
+    for task in task_list:
+        input_path = os.path.join(origin_dir, f"{task}.hdf5")
+        output_path = os.path.join(regenerate_dir, f"{task}.hdf5")
+
+        print(f"Regenerating dataset for task {task} ...")
+        regenerate_hdf5_dataset(input_path, output_path, debug=False)
@@ -182,6 +182,8 @@ Dataset Structure of `meta/info.json`:
 > │   └── h5_ur_1rgb.json
 > └── RoboMIND_v1_2_instr.csv
 > ```
+>
+> 3. Due the format mismatch in simulation dataset, this script skips `simulation`, `sim_franka_3rgb`, `sim_tienkung_1rgb`.

 > [!NOTE]
 > The conversion speed of this script is limited by the performance of the physical machine running it, including **CPU cores and memory**. We recommend using **2 CPU cores per task** for optimal performance. However, each task requires approximately 10 GiB of memory. To avoid running out of memory, you may need to increase the number of CPU cores per task depending on your system’s available memory.
@@ -1,5 +1,6 @@
 import argparse
-import gc
+import concurrent.futures
+import inspect
 import json
 import logging
 import shutil
@@ -7,11 +8,15 @@ from pathlib import Path

 import numpy as np
 import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
 import ray
+from lerobot.datasets import LeRobotDataset, LeRobotDatasetMetadata
 from lerobot.datasets.compute_stats import aggregate_stats
-from lerobot.datasets.lerobot_dataset import VALID_VIDEO_CODECS, LeRobotDataset, LeRobotDatasetMetadata
-from lerobot.datasets.utils import flatten_dict, validate_episode_buffer, write_info, write_stats
-from lerobot.datasets.video_utils import get_safe_default_codec
+from lerobot.datasets.dataset_writer import DatasetWriter, _encode_video_worker
+from lerobot.datasets.feature_utils import validate_episode_buffer
+from lerobot.datasets.io_utils import write_info, write_stats
+from lerobot.datasets.utils import DEFAULT_EPISODES_PATH, flatten_dict
 from ray.runtime_env import RuntimeEnv
 from robomind_uitls.configs import ROBOMIND_CONFIG
 from robomind_uitls.lerobot_uitls import compute_episode_stats, generate_features_from_config
@@ -21,6 +26,38 @@ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(


 class RoboMINDDatasetMetadata(LeRobotDatasetMetadata):
+    def _flush_metadata_buffer(self) -> None:
+        """Write all buffered episode metadata to parquet file."""
+        if not hasattr(self, "_metadata_buffer") or len(self._metadata_buffer) == 0:
+            return
+
+        combined_dict = {}
+        for episode_dict in self._metadata_buffer:
+            for key, value in episode_dict.items():
+                if key not in combined_dict:
+                    combined_dict[key] = []
+                # Extract value and serialize numpy arrays
+                # because PyArrow's from_pydict function doesn't support numpy arrays
+                val = value[0] if isinstance(value, list) else value
+                combined_dict[key].append(val.tolist() if isinstance(val, np.ndarray) else val)
+
+        first_ep = self._metadata_buffer[0]
+        chunk_idx = first_ep["meta/episodes/chunk_index"][0]
+        file_idx = first_ep["meta/episodes/file_index"][0]
+
+        schema = None if not self._pq_writer else self._pq_writer.schema
+        table = pa.Table.from_pydict(combined_dict, schema=schema)
+
+        if not self._pq_writer:
+            path = Path(self.root / DEFAULT_EPISODES_PATH.format(chunk_index=chunk_idx, file_index=file_idx))
+            path.parent.mkdir(parents=True, exist_ok=True)
+            self._pq_writer = pq.ParquetWriter(path, schema=table.schema, compression="snappy", use_dictionary=True)
+
+        self._pq_writer.write_table(table)
+
+        self.latest_episode = self._metadata_buffer[-1]
+        self._metadata_buffer.clear()
+
    def save_episode(
        self,
        split,
@@ -55,78 +92,18 @@ class RoboMINDDatasetMetadata(LeRobotDatasetMetadata):
        write_stats(self.stats, self.root)


-class RoboMINDDataset(LeRobotDataset):
-    @classmethod
-    def create(
-        cls,
-        repo_id: str,
-        fps: int,
-        features: dict,
-        root: str | Path | None = None,
-        robot_type: str | None = None,
-        use_videos: bool = True,
-        tolerance_s: float = 1e-4,
-        image_writer_processes: int = 0,
-        image_writer_threads: int = 0,
-        video_backend: str | None = None,
-        batch_encoding_size: int = 1,
-        vcodec: str = "libsvtav1",
-    ) -> "LeRobotDataset":
-        """Create a LeRobot Dataset from scratch in order to record data."""
-        if vcodec not in VALID_VIDEO_CODECS:
-            raise ValueError(f"Invalid vcodec '{vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
-        obj = cls.__new__(cls)
-        obj.meta = RoboMINDDatasetMetadata.create(
-            repo_id=repo_id,
-            fps=fps,
-            robot_type=robot_type,
-            features=features,
-            root=root,
-            use_videos=use_videos,
-        )
-        obj.repo_id = obj.meta.repo_id
-        obj.root = obj.meta.root
-        obj.revision = None
-        obj.tolerance_s = tolerance_s
-        obj.image_writer = None
-        obj.batch_encoding_size = batch_encoding_size
-        obj.episodes_since_last_encoding = 0
-        obj.vcodec = vcodec
-
-        if image_writer_processes or image_writer_threads:
-            obj.start_image_writer(image_writer_processes, image_writer_threads)
-
-        # TODO(aliberts, rcadene, alexander-soare): Merge this with OnlineBuffer/DataBuffer
-        obj.episode_buffer = obj.create_episode_buffer()
-
-        obj.episodes = None
-        obj.hf_dataset = obj.create_hf_dataset()
-        obj.image_transforms = None
-        obj.delta_timestamps = None
-        obj.delta_indices = None
-        obj._absolute_to_relative_idx = None
-        obj.video_backend = video_backend if video_backend is not None else get_safe_default_codec()
-        obj.writer = None
-        obj.latest_episode = None
-        obj._current_file_start_frame = None
-        # Initialize tracking for incremental recording
-        obj._lazy_loading = False
-        obj._recorded_frames = 0
-        obj._writer_closed_for_reading = False
-        return obj
-
-    def save_episode(self, split, action_config: dict, episode_data: dict | None = None) -> None:
-        """
-        This will save to disk the current episode in self.episode_buffer.
-
-        Args:
-            episode_data (dict | None, optional): Dict containing the episode data to save. If None, this will
-                save the current episode in self.episode_buffer, which is filled with 'add_frame'. Defaults to
-                None.
-        """
+class RoboMINDDatasetWriter(DatasetWriter):
+    def save_episode(
+        self,
+        split,
+        action_config: dict,
+        episode_data: dict | None = None,
+        parallel_encoding: bool = True,
+    ) -> None:
+        """Save the current episode in self.episode_buffer to disk."""
        episode_buffer = episode_data if episode_data is not None else self.episode_buffer

-        validate_episode_buffer(episode_buffer, self.meta.total_episodes, self.features)
+        validate_episode_buffer(episode_buffer, self._meta.total_episodes, self._meta.features)

        # size and task are special cases that won't be added to hf_dataset
        episode_length = episode_buffer.pop("size")
@@ -134,49 +111,131 @@ class RoboMINDDataset(LeRobotDataset):
        episode_tasks = list(set(tasks))
        episode_index = episode_buffer["episode_index"]

-        episode_buffer["index"] = np.arange(self.meta.total_frames, self.meta.total_frames + episode_length)
+        episode_buffer["index"] = np.arange(self._meta.total_frames, self._meta.total_frames + episode_length)
        episode_buffer["episode_index"] = np.full((episode_length,), episode_index)

        # Update tasks and task indices with new tasks if any
-        self.meta.save_episode_tasks(episode_tasks)
+        self._meta.save_episode_tasks(episode_tasks)

        # Given tasks in natural language, find their corresponding task indices
-        episode_buffer["task_index"] = np.array([self.meta.get_task_index(task) for task in tasks])
+        episode_buffer["task_index"] = np.array([self._meta.get_task_index(task) for task in tasks])

-        for key, ft in self.features.items():
-            # index, episode_index, task_index are already processed above, and image and video
-            # are processed separately by storing image path and frame info as meta data
+        for key, ft in self._meta.features.items():
            if key in ["index", "episode_index", "task_index"] or ft["dtype"] in ["video"]:
                continue
            episode_buffer[key] = np.stack(episode_buffer[key]).squeeze()

+        # Wait for image writer to end, so that episode stats over images can be computed
        self._wait_image_writer()
-        ep_stats = compute_episode_stats(episode_buffer, self.features)
+        has_video_keys = len(self._meta.video_keys) > 0
+        use_streaming = self._streaming_encoder is not None and has_video_keys
+        use_batched_encoding = self._batch_encoding_size > 1
+
+        if use_streaming:
+            non_video_buffer = {
+                k: v for k, v in episode_buffer.items() if self._meta.features.get(k, {}).get("dtype") not in ("video",)
+            }
+            non_video_features = {k: v for k, v in self._meta.features.items() if v["dtype"] != "video"}
+            ep_stats = compute_episode_stats(non_video_buffer, non_video_features)
+        else:
+            ep_stats = compute_episode_stats(episode_buffer, self._meta.features)

        ep_metadata = self._save_episode_data(episode_buffer)
-        has_video_keys = len(self.meta.video_keys) > 0
-        use_batched_encoding = self.batch_encoding_size > 1

-        if has_video_keys and not use_batched_encoding:
-            for video_key in self.meta.video_keys:
-                ep_metadata.update(self._save_episode_video(video_key, episode_index))
+        if use_streaming:
+            streaming_results = self._streaming_encoder.finish_episode()
+            for video_key in self._meta.video_keys:
+                temp_path, video_stats = streaming_results[video_key]
+                if video_stats is not None:
+                    ep_stats[video_key] = {
+                        k: v if k == "count" else np.squeeze(v.reshape(1, -1, 1, 1) / 255.0, axis=0)
+                        for k, v in video_stats.items()
+                    }
+                ep_metadata.update(self._save_episode_video(video_key, episode_index, temp_path=temp_path))
+        elif has_video_keys and not use_batched_encoding:
+            num_cameras = len(self._meta.video_keys)
+            if parallel_encoding and num_cameras > 1:
+                with concurrent.futures.ProcessPoolExecutor(max_workers=num_cameras) as executor:
+                    future_to_key = {
+                        executor.submit(
+                            _encode_video_worker,
+                            video_key,
+                            episode_index,
+                            self._root,
+                            self._meta.fps,
+                            self._vcodec,
+                            self._encoder_threads,
+                        ): video_key
+                        for video_key in self._meta.video_keys
+                    }
+
+                    results = {}
+                    for future in concurrent.futures.as_completed(future_to_key):
+                        video_key = future_to_key[future]
+                        try:
+                            temp_path = future.result()
+                            results[video_key] = temp_path
+                        except Exception as exc:
+                            logging.error(f"Video encoding failed for {video_key}: {exc}")
+                            raise exc
+
+                for video_key in self._meta.video_keys:
+                    temp_path = results[video_key]
+                    ep_metadata.update(self._save_episode_video(video_key, episode_index, temp_path=temp_path))
+            else:
+                for video_key in self._meta.video_keys:
+                    ep_metadata.update(self._save_episode_video(video_key, episode_index))

        # `meta.save_episode` be executed after encoding the videos
        ep_metadata.update({"action_config": action_config})
-        self.meta.save_episode(split, episode_index, episode_length, episode_tasks, ep_stats, ep_metadata)
+        self._meta.save_episode(split, episode_index, episode_length, episode_tasks, ep_stats, ep_metadata)

        if has_video_keys and use_batched_encoding:
-            # Check if we should trigger batch encoding
-            self.episodes_since_last_encoding += 1
-            if self.episodes_since_last_encoding == self.batch_encoding_size:
-                start_ep = self.num_episodes - self.batch_encoding_size
-                end_ep = self.num_episodes
+            self._episodes_since_last_encoding += 1
+            if self._episodes_since_last_encoding == self._batch_encoding_size:
+                start_ep = self._meta.total_episodes - self._batch_encoding_size
+                end_ep = self._meta.total_episodes
                self._batch_save_episode_video(start_ep, end_ep)
-                self.episodes_since_last_encoding = 0
+                self._episodes_since_last_encoding = 0

        if not episode_data:
-            # Reset episode buffer and clean up temporary images (if not already deleted during video encoding)
-            self.clear_episode_buffer(delete_images=len(self.meta.image_keys) > 0)
+            self.clear_episode_buffer(delete_images=len(self._meta.image_keys) > 0)
+
+
+class RoboMINDDataset(LeRobotDataset):
+    @classmethod
+    def create(cls, *args, **kwargs) -> "RoboMINDDataset":
+        sig = inspect.signature(super().create)
+        bound = sig.bind_partial(*args, **kwargs)
+        bound.apply_defaults()
+        params = bound.arguments
+
+        obj = super().create(*args, **kwargs)
+
+        shutil.rmtree(params["root"], ignore_errors=True)
+        obj.meta: RoboMINDDatasetMetadata = RoboMINDDatasetMetadata.create(
+            repo_id=params["repo_id"],
+            fps=params["fps"],
+            robot_type=params["robot_type"],
+            features=params["features"],
+            root=params["root"],
+            use_videos=params["use_videos"],
+            metadata_buffer_size=params["metadata_buffer_size"],
+        )
+        obj.writer: RoboMINDDatasetWriter = RoboMINDDatasetWriter(
+            meta=obj.meta,
+            root=obj.root,
+            vcodec=obj._vcodec,
+            encoder_threads=obj._encoder_threads,
+            batch_encoding_size=obj._batch_encoding_size,
+        )
+        return obj
+
+    def save_episode(
+        self, split, action_config: dict, episode_data: dict | None = None, parallel_encoding: bool = True
+    ) -> None:
+        self._require_writer("save_episode")
+        self.writer.save_episode(split, action_config, episode_data, parallel_encoding)


 def get_all_tasks(src_path: Path, output_path: Path, embodiment: str):
@@ -255,11 +314,11 @@ def save_as_lerobot_dataset(task: tuple[dict, Path, str], src_path, benchmark, e
                    return
            else:
                logging.warning(f"Skipped {episode_path}: len of dataset:{len(raw_dataset)} or {str(err)}")
-            gc.collect()
+
+    dataset.finalize()

    if dataset.meta.total_episodes == 0:
        shutil.rmtree(local_dir)
-    del dataset


 def main(
@@ -298,7 +357,6 @@ def main(
                logging.error(f"Exception occurred for {task_path['train']}")
                with open("output.txt", "a") as f:
                    f.write(f"{task_path['train']}, exception details: {str(e)}\n")
-        ray.shutdown()


 if __name__ == "__main__":
@@ -1,9 +1,11 @@
 import numpy as np
-import torchvision
-from lerobot.datasets.compute_stats import auto_downsample_height_width, get_feature_stats, sample_indices
-from lerobot.datasets.utils import load_image_as_numpy
-
-torchvision.set_video_backend("pyav")
+from lerobot.datasets.compute_stats import (
+    DEFAULT_QUANTILES,
+    auto_downsample_height_width,
+    get_feature_stats,
+    sample_indices,
+)
+from lerobot.datasets.io_utils import load_image_as_numpy


 def generate_features_from_config(AgiBotWorld_CONFIG):
@@ -49,21 +51,31 @@ def sample_images(input):
    return images


-def compute_episode_stats(episode_data: dict[str, list[str] | np.ndarray], features: dict) -> dict:
+def compute_episode_stats(
+    episode_data: dict[str, list[str] | np.ndarray],
+    features: dict,
+    quantile_list: list[float] | None = None,
+) -> dict:
+    if quantile_list is None:
+        quantile_list = DEFAULT_QUANTILES
+
    ep_stats = {}
    for key, data in episode_data.items():
        if features[key]["dtype"] == "string":
-            continue  # HACK: we should receive np.arrays of strings
+            continue
+
        elif features[key]["dtype"] in ["image", "video"]:
            ep_ft_array = sample_images(data)
-            axes_to_reduce = (0, 2, 3)  # keep channel dim
+            axes_to_reduce = (0, 2, 3)
            keepdims = True
        else:
-            ep_ft_array = data  # data is already a np.ndarray
-            axes_to_reduce = 0  # compute stats over the first axis
-            keepdims = data.ndim == 1  # keep as np.array
+            ep_ft_array = data
+            axes_to_reduce = 0
+            keepdims = data.ndim == 1

-        ep_stats[key] = get_feature_stats(ep_ft_array, axis=axes_to_reduce, keepdims=keepdims)
+        ep_stats[key] = get_feature_stats(
+            ep_ft_array, axis=axes_to_reduce, keepdims=keepdims, quantile_list=quantile_list
+        )

        if features[key]["dtype"] in ["image", "video"]:
            value_norm = 1.0 if "depth" in key else 255.0
Author	SHA1	Message	Date
Qizhi Chen	2ef2370d66	🐛 fix v30→v21 converter imports (#101 ) Co-authored-by: Codex <codex@openai.com>	2026-05-09 22:31:45 -07:00
FennMai	723bd71cf2	🐛 fix v30_to_v21 ArrowTypeError on pandas extension dtypes `table.slice(...).to_pandas()` produces pandas ExtensionArrays for `array[float32]` columns (e.g. `observation.states.end.orientation`) on newer pandas/pyarrow combos, which then fail in `pa.Table.from_pandas` inside `Dataset.from_pandas(...).to_parquet(...)`. Skip the pandas round-trip and wrap the `pa.Table` slice in a `Dataset` directly with `Dataset(episode_table).to_parquet(...)`. This preserves the HuggingFace dataset metadata that `Dataset.to_parquet` writes, while avoiding the ExtensionArray crash. No version pin on datasets/pyarrow needed. Closes #87	2026-04-30 07:03:03 +00:00
Fennmai	8aa7343137	📝 fix typos and remove duplicate entry in README (#98 ) - "implemenation" → "implementation" (openpi) - "transfomation" → "transformation" (IPEC OpenX/LIBERO) - "reoi" → "repo" (lerobotdepot) - remove duplicate LERO entry in Tutorial / Utils	2026-05-01 15:09:36 +08:00
Qizhi Chen	ad1381915c	⬆️ sync with lerobot v0.5.1 (#96 ) * update agibot2lerobot * update libero2lerobot * update robomind2lerobot * fix robomind2lerobot	2026-04-06 18:25:36 +08:00
Jibby Nguyen	ef184e44be	add support for robocasa2lerobot (#86 ) * Support robocasa2lerobot * Support robocasa2lerobot * NIT: formatting * update to latest lerobot * update readme * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * fix h5py open --------- Co-authored-by: Tavish <tavish9.chen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-21 15:55:33 +08:00
Qizhi Chen	e97ca6531e	🐛 Fix bug in robomind2lerobot with empty action_config (#92 ) * fix robomind2lerobot bug when action_config is [] * update readme	2026-03-20 10:18:13 +08:00
Qizhi Chen	8cc8f342a4	🐛 fix agibot2lerobot bug when action_config is [] (#91 ) Co-authored-by: Brucexkm <24210860082@m.fudan.edu.cn>	2026-03-19 12:02:11 +08:00
Qizhi Chen	8dd339f545	Update authors in reference	2026-03-09 13:33:20 +08:00
Qizhi Chen	984be466e3	🚧 prepare for detailed dataset tutorial (#84 ) * ⚰️ remove dataset merging * 📝 update readme	2026-02-07 16:10:13 +08:00