any4lerobot/libero2lerobot/README.md

# LIBERO to LeRobot

LIBERO consists of 4 task suites and 130 tasks for studying LLDM. Specifically, the tasks in 3 of the 4 task suites vary only in one type of knowledge, while the last task suite requires transfer of entangled knowledge. (Copied from [docs](https://lifelong-robot-learning.github.io/LIBERO/html/getting_started/overview.html))

## 🚀 What's New in This Script

In this dataset, we have made several key improvements:

- **OpenVLA-based LIBERO Regeneration**: Resolution enhancement, No-op action filtration, 180° RGB frame rotation, Failed trajectory filtering.
- **State Data Preservation**: Maintained native LIBERO state information (accessible via `states.ee_state`, `states.joint_state` and etc.).
- **Robust Conversion Pipeline**: Using DataTrove framework for High-speed dataset transformation and automatic failure recovery during conversion

Dataset Structure of `meta/info.json`:

```json
{
  "codebase_version": "v3.0", // lastest lerobot format
  "robot_type": "franka", // specific robot type
  "fps": 20, // control frequency
  "features": {
    "observation.images.image": {
        "dtype": "video",
        "shape": [
            256,
            256,
            3
        ],
        "names": [
            "height",
            "width",
            "rgb"
        ],
        "info": {
            "video.height": 256,
            "video.width": 256,
            "video.codec": "av1",
            "video.pix_fmt": "yuv420p",
            "video.is_depth_map": false,
            "video.fps": 20,
            "video.channels": 3,
            "has_audio": false
        }
    },
    // for more states key, see configs
    "observation.state": {
        "dtype": "float32",
        "shape": [
            8
        ],
        "names": {
            "motors": [
                "x",
                "y",
                "z",
                "roll",
                "pitch",
                "yaw",
                "gripper",
                "gripper"
            ]
        }
    },
    ...
    "action": {
        "dtype": "float32",
        "shape": [
            7
        ],
        "names": {
            "motors": [
                "x",
                "y",
                "z",
                "roll",
                "pitch",
                "yaw",
                "gripper"
            ]
        }
    },
    ...
  }
}
```

## Installation

1. Install LeRobot:
   Follow instructions in [official repo](https://github.com/huggingface/lerobot?tab=readme-ov-file#installation).

2. Install others:
   We use datatrove[ray] for parallel conversion, significantly speeding up data processing tasks by distributing the workload across multiple cores or nodes (if any).
   ```bash
   pip install h5py
   pip install -U datatrove
   pip install -U "datatrove[ray]" # if you want ray features
   ```

## Get started

> [!NOTE]
> This script supports converting from original hdf5 to lerobot. If you want to convert from rlds to lerobot, check [openx2lerobot](../openx2lerobot/README.md).

### Download source code:

```bash
git clone https://github.com/Tavish9/any4lerobot.git
```

### Regenerate LIBERO Trajectory:

1. [Install LIBERO dependency](https://github.com/Lifelong-Robot-Learning/LIBERO?tab=readme-ov-file#installtion)
2. Replace `libero_90` with your target libero dataset.

```bash
python libero_utils/regenerate_libero_dataset.py \
    --resolution 256 \
    --libero_task_suite libero_90 \
    --libero_raw_data_dir /path/to/libero/datasets/libero_90 \
    --libero_target_dir /path/to/libero/datasets/libero_90_no_noops
```

### Modify in `convert.sh`:

1. If you have installed `datatrove[ray]`, we recommend using `ray` executor for faster conversion.
2. Increase `workers` and `tasks-per-job` if you have sufficient computing resources.
3. To merge many datasets into one, simply specify both paths like: `--src-paths /path/libero_10 /path/libero_90`
4. To resume from a previous conversion, provide the appropriate log directory using `--resume-from-save` and `--resume-from-aggregate`
5. If you want different image resolution, regenerate the trajectory, and change the [config](./libero_utils/config.py). (DO NOT use resize)

```bash
python libero_h5.py \
    --src-paths /path/to/libero/ \
    --output-path /path/to/local \
    --executor local \
    --tasks-per-job 3 \
    --workers 10
```

### Execute the script:

#### For single node

```bash
bash convert.sh
```

#### For multi nodes (Install ray first)

**Direct Access to Nodes (2 nodes in example)**

On Node 1:

```bash
ray start --head --port=6379
```

On Node 2:

```bash
ray start --address='node_1_ip:6379'
```

On either Node, check the ray cluster status, and start the script

```bash
ray status
bash convert.sh
```

**Slurm-managed System**

```bash
#!/bin/bash
#SBATCH --job-name=ray-cluster
#SBATCH --ntasks=2
#SBATCH --nodes=2
#SBATCH --partition=partition

# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)

head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
IFS=' ' read -ra ADDR <<<"$head_node_ip"
if [[ ${#ADDR[0]} -gt 16 ]]; then
  head_node_ip=${ADDR[1]}
else
  head_node_ip=${ADDR[0]}
fi
echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi

port=6379
ip_head=$head_node_ip:$port
export ip_head
echo "IP Head: $ip_head"

echo "Starting HEAD at $head_node"
srun --nodes=1 --ntasks=1 -w "$head_node" \
    ray start --head \
    --node-ip-address="$head_node_ip" \
    --port=$port \
    --block &

sleep 10

# number of nodes other than the head node
worker_num=$((SLURM_JOB_NUM_NODES - 1))

for ((i = 1; i <= worker_num; i++)); do
    node_i=${nodes_array[$i]}
    echo "Starting WORKER $i at $node_i"
    srun --nodes=1 --ntasks=1 -w "$node_i" \
        ray start \
        --address "$ip_head" \
        --block &
    sleep 5
done

sleep 10

bash convert.sh
```

**Other Community Supported Cluster Managers**

See the [doc](https://docs.ray.io/en/latest/cluster/vms/user-guides/community/index.html) for more details.