examples(port_datasets): generalize RoboCasa builder + add smoke script

- Add ATOMIC_TASKS, COMPOSITE_UNSEEN_TASKS and four new --task-set keys
  (atomic, composite_unseen, composite_all, composite_atomic) so the same
  builder produces the 50-task target benchmark or the 300-task Human300
  pretraining slice (via --split=pretrain --task-set=all) without
  duplicating logic.
- Stop hardcoding the composite_seen tag on the HF push; tags are now
  derived from --split / --source / --task-set so atomic, composite_all,
  and pretrain runs land with accurate metadata.
- Refresh module docstring to match the broader scope.
- Add scripts/build_robocasa_smoke.sh: 2-atomic-task smoke dataset
  (~1k episodes, ~131k frames) for fast end-to-end training validation
  before kicking off Human300-scale runs.
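
The tag change in the second bullet could look roughly like the sketch below. It is a hypothetical illustration, not the builder's actual code: the function name `derive_tags`, the base tag `robocasa`, and the `key:value` tag format are all assumptions made for this example.

```python
from typing import List, Optional


def derive_tags(split: str, source: str, task_set: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build Hub tags from the CLI selections.

    The point is only that nothing is hardcoded; every tag reflects the
    values passed for --split, --source and --task-set.
    """
    tags = ["robocasa", f"split:{split}", f"source:{source}"]
    if task_set is not None:
        # Runs driven by an explicit --tasks list may have no task set.
        tags.append(f"task_set:{task_set}")
    return tags


if __name__ == "__main__":
    print(derive_tags("target", "human", "atomic"))
    print(derive_tags("pretrain", "human", "all"))
```

With a helper of this shape, an atomic or pretrain run can no longer inherit a `composite_seen` tag by accident, because the tag list is a pure function of the arguments.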
pepijn
2026-05-25 14:51:09 +00:00
parent 83d0c390da
commit 3fdfcb912a
2 changed files with 109 additions and 10 deletions
@@ -0,0 +1,47 @@
#!/bin/bash
# Build a tiny RoboCasa smoke dataset (2 short atomic tasks, all episodes) for
# fast end-to-end training validation before the real run.
#
# Defaults: target/human, OpenStandMixerHead + NavigateKitchen (~1k episodes,
# ~131k frames, ~109 min @ 20 fps), 2 SLURM workers on hopper-cpu.
#
# Override via env: TASKS, REPO_ID, WORK_DIR, ROBOCASA_ROOT, LOGS_DIR, WORKERS,
# CPUS, PARTITION, LEROBOT_ROOT, LOCAL=1. HF_USER is required unless REPO_ID is set.
set -euo pipefail
cd "${LEROBOT_ROOT:-$HOME/lerobot}"
source ~/miniconda3/etc/profile.d/conda.sh
conda activate lerobot
REPO_ID="${REPO_ID:-${HF_USER:?HF_USER is unset}/robocasa_smoke_2atomic_v3}"
WORK_DIR="${WORK_DIR:-/fsx/${USER}/robocasa/datasets/v1.0}"
ROBOCASA_ROOT="${ROBOCASA_ROOT:-/fsx/${USER}/robocasa}"
LOGS_DIR="${LOGS_DIR:-/fsx/${USER}/logs/robocasa}"
TASKS="${TASKS:-OpenStandMixerHead NavigateKitchen}"
WORKERS="${WORKERS:-2}"
CPUS="${CPUS:-8}"
PARTITION="${PARTITION:-hopper-cpu}"
LOCAL="${LOCAL:-0}"
ARGS=(
    examples/port_datasets/slurm_build_robocasa_composite_seen.py
    --repo-id="$REPO_ID"
    --work-dir="$WORK_DIR"
    --robocasa-root="$ROBOCASA_ROOT"
    --split=target --source=human
    # Intentionally unquoted: TASKS is a space-separated list that must split
    # into one argument per task name.
    # shellcheck disable=SC2086
    --tasks $TASKS
    --workers="$WORKERS"
    --cpus-per-task="$CPUS"
    --partition="$PARTITION"
    --mem-per-cpu=4G
    --time=04:00:00
    --logs-dir="$LOGS_DIR"
    --job-name=port_robocasa_smoke
)
if [[ "$LOCAL" == "1" ]]; then
    ARGS+=(--slurm=0)
fi
echo "Smoke dataset: $REPO_ID"
echo "Tasks: $TASKS"
python "${ARGS[@]}"
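
As a sanity check on the figures quoted in the script's header comment, the small sketch below recomputes the footage duration from the frame count and frame rate. The inputs are the rounded values from that comment (~131k frames, ~1k episodes, 20 fps), so the results are approximate; the function and variable names are made up for this example.

```python
FPS = 20            # frame rate quoted in the header comment
FRAMES = 131_000    # "~131k frames" (rounded)
EPISODES = 1_000    # "~1k episodes" (rounded)


def duration_minutes(frames: int, fps: int) -> float:
    """Convert a frame count at a fixed frame rate to minutes of footage."""
    return frames / fps / 60


minutes = duration_minutes(FRAMES, FPS)
frames_per_episode = FRAMES / EPISODES
seconds_per_episode = frames_per_episode / FPS

if __name__ == "__main__":
    print(f"total footage: {minutes:.1f} min")
    print(f"per episode: {frames_per_episode:.0f} frames, {seconds_per_episode:.2f} s")
```

The total comes out at roughly 109 minutes, which matches the header comment, and implies episodes of about 131 frames each under these rounded inputs.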