Files
2026-01-05 15:23:22 +08:00

234 lines
6.1 KiB
Markdown

# LIBERO to LeRobot
LIBERO consists of 4 task suites and 130 tasks for studying LLDM. Specifically, the tasks in 3 of the 4 task suites vary only in one type of knowledge, while the last task suite requires transfer of entangled knowledge. (Copied from [docs](https://lifelong-robot-learning.github.io/LIBERO/html/getting_started/overview.html))
## 🚀 What's New in This Script
In this dataset, we have made several key improvements:
- **OpenVLA-based LIBERO Regeneration**: Resolution enhancement, No-op action filtration, 180° RGB frame rotation, Failed trajectory filtering.
- **State Data Preservation**: Maintained native LIBERO state information (accessible via `states.ee_state`, `states.joint_state` and etc.).
- **Robust Conversion Pipeline**: Using DataTrove framework for High-speed dataset transformation and automatic failure recovery during conversion
Dataset Structure of `meta/info.json`:
```json
{
"codebase_version": "v3.0", // lastest lerobot format
"robot_type": "franka", // specific robot type
"fps": 20, // control frequency
"features": {
"observation.images.image": {
"dtype": "video",
"shape": [
256,
256,
3
],
"names": [
"height",
"width",
"rgb"
],
"info": {
"video.height": 256,
"video.width": 256,
"video.codec": "av1",
"video.pix_fmt": "yuv420p",
"video.is_depth_map": false,
"video.fps": 20,
"video.channels": 3,
"has_audio": false
}
},
// for more states key, see configs
"observation.state": {
"dtype": "float32",
"shape": [
8
],
"names": {
"motors": [
"x",
"y",
"z",
"roll",
"pitch",
"yaw",
"gripper",
"gripper"
]
}
},
...
"action": {
"dtype": "float32",
"shape": [
7
],
"names": {
"motors": [
"x",
"y",
"z",
"roll",
"pitch",
"yaw",
"gripper"
]
}
},
...
}
}
```
## Installation
1. Install LeRobot:
Follow instructions in [official repo](https://github.com/huggingface/lerobot?tab=readme-ov-file#installation).
2. Install others:
We use datatrove[ray] for parallel conversion, significantly speeding up data processing tasks by distributing the workload across multiple cores or nodes (if any).
```bash
pip install h5py
pip install -U datatrove
pip install -U "datatrove[ray]" # if you want ray features
```
## Get started
> [!NOTE]
> This script supports converting from original hdf5 to lerobot. If you want to convert from rlds to lerobot, check [openx2lerobot](../openx2lerobot/README.md).
### Download source code:
```bash
git clone https://github.com/Tavish9/any4lerobot.git
```
### Regenerate LIBERO Trajectory:
1. [Install LIBERO dependency](https://github.com/Lifelong-Robot-Learning/LIBERO?tab=readme-ov-file#installtion)
2. Replace `libero_90` with your target libero dataset.
```bash
python libero_utils/regenerate_libero_dataset.py \
--resolution 256 \
--libero_task_suite libero_90 \
--libero_raw_data_dir /path/to/libero/datasets/libero_90 \
--libero_target_dir /path/to/libero/datasets/libero_90_no_noops
```
### Modify in `convert.sh`:
1. If you have installed `datatrove[ray]`, we recommend using `ray` executor for faster conversion.
2. Increase `workers` and `tasks-per-job` if you have sufficient computing resources.
3. To merge many datasets into one, simply specify both paths like: `--src-paths /path/libero_10 /path/libero_90`
4. To resume from a previous conversion, provide the appropriate log directory using `--resume-from-save` and `--resume-from-aggregate`
5. If you want different image resolution, regenerate the trajectory, and change the [config](./libero_utils/config.py). (DO NOT use resize)
```bash
python libero_h5.py \
--src-paths /path/to/libero/ \
--output-path /path/to/local \
--executor local \
--tasks-per-job 3 \
--workers 10
```
### Execute the script:
#### For single node
```bash
bash convert.sh
```
#### For multi nodes (Install ray first)
**Direct Access to Nodes (2 nodes in example)**
On Node 1:
```bash
ray start --head --port=6379
```
On Node 2:
```bash
ray start --address='node_1_ip:6379'
```
On either Node, check the ray cluster status, and start the script
```bash
ray status
bash convert.sh
```
**Slurm-managed System**
```bash
#!/bin/bash
#SBATCH --job-name=ray-cluster
#SBATCH --ntasks=2
#SBATCH --nodes=2
#SBATCH --partition=partition
# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
IFS=' ' read -ra ADDR <<<"$head_node_ip"
if [[ ${#ADDR[0]} -gt 16 ]]; then
head_node_ip=${ADDR[1]}
else
head_node_ip=${ADDR[0]}
fi
echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
port=6379
ip_head=$head_node_ip:$port
export ip_head
echo "IP Head: $ip_head"
echo "Starting HEAD at $head_node"
srun --nodes=1 --ntasks=1 -w "$head_node" \
ray start --head \
--node-ip-address="$head_node_ip" \
--port=$port \
--block &
sleep 10
# number of nodes other than the head node
worker_num=$((SLURM_JOB_NUM_NODES - 1))
for ((i = 1; i <= worker_num; i++)); do
node_i=${nodes_array[$i]}
echo "Starting WORKER $i at $node_i"
srun --nodes=1 --ntasks=1 -w "$node_i" \
ray start \
--address "$ip_head" \
--block &
sleep 5
done
sleep 10
bash convert.sh
```
**Other Community Supported Cluster Managers**
See the [doc](https://docs.ray.io/en/latest/cluster/vms/user-guides/community/index.html) for more details.