diff --git a/README.md b/README.md
index 3f6dadd..be7438f 100644
--- a/README.md
+++ b/README.md
@@ -141,52 +141,6 @@ We have upload most of the OpenX datasets in [huggingface](https://huggingface.c
 
 You can visualize the dataset in this [space](https://huggingface.co/spaces/IPEC-COMMUNITY/openx_dataset_lerobot_v2.0).
 
-## The `LeRobotDataset` format
-
-A dataset in `LeRobotDataset` format is very simple to use. It can be loaded from a repository on the Hugging Face hub or a local folder simply with e.g. `dataset = LeRobotDataset("lerobot/aloha_static_coffee")` and can be indexed into like any Hugging Face and PyTorch dataset. For instance `dataset[0]` will retrieve a single temporal frame from the dataset containing observation(s) and an action as PyTorch tensors ready to be fed to a model.
-
-A specificity of `LeRobotDataset` is that, rather than retrieving a single frame by its index, we can retrieve several frames based on their temporal relationship with the indexed frame, by setting `delta_timestamps` to a list of relative times with respect to the indexed frame. For example, with `delta_timestamps = {"observation.image": [-1, -0.5, -0.2, 0]}` one can retrieve, for a given index, 4 frames: 3 "previous" frames 1 second, 0.5 seconds, and 0.2 seconds before the indexed frame, and the indexed frame itself (corresponding to the 0 entry). See example [1_load_lerobot_dataset.py](examples/1_load_lerobot_dataset.py) for more details on `delta_timestamps`.
-
-Under the hood, the `LeRobotDataset` format makes use of several ways to serialize data which can be useful to understand if you plan to work more closely with this format. We tried to make a flexible yet simple dataset format that would cover most type of features and specificities present in reinforcement learning and robotics, in simulation and in real-world, with a focus on cameras and robot states but easily extended to other types of sensory inputs as long as they can be represented by a tensor.
-
-Here are the important details and internal structure organization of a typical `LeRobotDataset` instantiated with `dataset = LeRobotDataset("lerobot/aloha_static_coffee")`. The exact features will change from dataset to dataset but not the main aspects:
-
-```
-dataset attributes:
-  ├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
-  │  ├ observation.images.cam_high (VideoFrame):
-  │  │   VideoFrame = {'path': path to a mp4 video, 'timestamp' (float32): timestamp in the video}
-  │  ├ observation.state (list of float32): position of an arm joints (for instance)
-  │  ... (more observations)
-  │  ├ action (list of float32): goal position of an arm joints (for instance)
-  │  ├ episode_index (int64): index of the episode for this sample
-  │  ├ frame_index (int64): index of the frame for this sample in the episode ; starts at 0 for each episode
-  │  ├ timestamp (float32): timestamp in the episode
-  │  ├ next.done (bool): indicates the end of en episode ; True for the last frame in each episode
-  │  └ index (int64): general index in the whole dataset
-  ├ episode_data_index: contains 2 tensors with the start and end indices of each episode
-  │  ├ from (1D int64 tensor): first frame index for each episode — shape (num episodes,) starts with 0
-  │  └ to: (1D int64 tensor): last frame index for each episode — shape (num episodes,)
-  ├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
-  │  ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
-  │  ...
-  ├ info: a dictionary of metadata on the dataset
-  │  ├ codebase_version (str): this is to keep track of the codebase version the dataset was created with
-  │  ├ fps (float): frame per second the dataset is recorded/synchronized to
-  │  ├ video (bool): indicates if frames are encoded in mp4 video files to save space or stored as png files
-  │  └ encoding (dict): if video, this documents the main options that were used with ffmpeg to encode the videos
-  ├ videos_dir (Path): where the mp4 videos or png images are stored/accessed
-  └ camera_keys (list of string): the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
-```
-
-A `LeRobotDataset` is serialised using several widespread file formats for each of its parts, namely:
-
-- hf_dataset stored using Hugging Face datasets library serialization to parquet
-- videos are stored in mp4 format to save space
-- metadata are stored in plain json/jsonl files
-
-Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can use the `local_files_only` argument and specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.
-
 ## Acknowledgment
 
 Special thanks to the [Lerobot teams](https://github.com/huggingface/lerobot) for making this great framework.