any4lerobot/robocasa2lerobot/robocasa_utils/extract_subset.ipynb
Jibby Nguyen ef184e44be add support for robocasa2lerobot (#86)
* Support robocasa2lerobot

* Support robocasa2lerobot

* NIT: formatting

* update to latest lerobot

* update readme

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix h5py open

---------

Co-authored-by: Tavish <tavish9.chen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-21 15:55:33 +08:00


{
"cells": [
{
"cell_type": "markdown",
"id": "44b6da09",
"metadata": {},
"source": [
"# Extract a subset of demos\n",
"\n",
"The original HDF5 file contains about 3000 episodes. It also contains a \"mask\" group whose keys list the demo IDs of predefined subsets. For example: 30_demos: [demo_123, demo_234, demo_345, ...]\n",
"\n",
"Run the code below to extract only the chosen subset of demos. The resulting file is much smaller and easier to process later."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ac64550",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import h5py\n",
"\n",
"DATA_DIR = \"path/to/your/hdf5/files\"\n",
"# e.g. DATA_DIR = \"/projects/extern/kisski/kisski-spath/dir.project/VLA_3D/binh/robocasa/test\"\n",
"OUT_DIR = \"path/to/your/extracted/subset\"  # directory for the new, smaller file\n",
"\n",
"# file_name = \"PnPCabToCounter.hdf5\"\n",
"# file_name = \"PnPCounterToCab.hdf5\"\n",
"# file_name = \"CoffeeSetupMug.hdf5\"\n",
"# file_name = \"TurnOnMicrowave.hdf5\"\n",
"file_name = \"TurnOffStove.hdf5\"\n",
"\n",
"MASK_KEY = \"100_demos\"  # or \"30_demos\"\n",
"\n",
"file_path = os.path.join(DATA_DIR, file_name)\n",
"out_path = os.path.join(OUT_DIR, file_name)\n",
"os.makedirs(OUT_DIR, exist_ok=True)\n",
"\n",
"# Open both files in one `with` block so they are closed even if an error occurs.\n",
"with h5py.File(file_path, \"r\") as f, h5py.File(out_path, \"w\") as out:\n",
"    # Demo names under the mask are stored as bytes; decode them to str.\n",
"    chosen_demos = {name.decode(\"utf-8\") for name in f[\"mask\"][MASK_KEY][:]}\n",
"\n",
"    out_data = out.create_group(\"data\")\n",
"\n",
"    # IMPORTANT: copy the attributes of \"data\" to the new file\n",
"    # (needed to reset the env and to re-render later).\n",
"    for key, val in f[\"data\"].attrs.items():\n",
"        out_data.attrs[key] = val\n",
"\n",
"    # Copy only the demos (demo_xxx) that belong to the chosen subset.\n",
"    for name in f[\"data\"].keys():\n",
"        if name in chosen_demos:\n",
"            f.copy(f[\"data\"][name], out_data, name=name)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "robocasa",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}