feat(pi052): π0.5 v2 — full reproduction of the π0.5 paper recipe

New ``lerobot.policies.pi052`` package (parallel to ``smolvla2``) that adds text prediction and hierarchical inference on top of the existing π0.5 implementation. It mirrors the paper's §IV.D dual-head training objective:

    L = H(text) + α * ‖ω - a - f_θ_action(...)‖²,  α = 10

Components:

* ``configuration_pi052.py``: thin PI05Config subclass; adds recipe_path, text/flow loss weights (default α=10 per paper), prompt dropout knobs, and ``unfreeze_lm_head``.
* ``text_processor_pi052.py``: PI052TextTokenizerStep. Concatenates the rendered messages as ``Role: ...`` plain text (PaliGemma has no chat template), tokenises them with the PaliGemma tokenizer, and builds a label mask covering the supervised target spans. Includes Pi 0.7 §V.E per-component prompt dropout.
* ``processor_pi052.py``: make_pi052_pre_post_processors, i.e. Rename + Batch + Relative + Normalize + RenderMessagesStep + PI052TextTokenizerStep + Device. Falls back to π0.5's plain pipeline when recipe_path is unset.
* ``modeling_pi052.py``: PI052Policy(PI05Policy). Re-enables the PaliGemma ``lm_head``, computes text_loss via cross-entropy on the supervised span, combines it with flow_loss in forward() using the configured loss weights, and adds select_message for autoregressive text generation at inference (same surface as SmolVLA2Policy.select_message, so SmolVLA2Runtime drives it unchanged).

Supporting plumbing:

* recipe ``configs/recipes/pi052_hirobot.yaml``: same Hi-Robot blend as smolvla2_hirobot.yaml, with the same ``${subtask}`` / ``if_present`` supervision fix (supervise the current span at every frame, not ``${next_subtask}``).
* SLURM ``examples/training/pi052_hirobot.slurm``: full training command matching the SmolVLA2 launcher.
* factory registration: ``--policy.type=pi052`` resolves to PI052Policy with the new processor.

The same multi-rate runtime (``lerobot.policies.smolvla2.inference``) drives this policy too: both policies expose ``predict_action_chunk`` for the action expert and ``select_message`` for the LM head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
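The dual-head objective can be sketched in plain Python to show how the two terms and α fit together. This is a minimal sketch, not the code in ``modeling_pi052.py``: it assumes the text loss is a token-level cross-entropy averaged over positions where the label mask is set, and that the flow term averages the squared error per element (the paper writes a squared norm, so averaging only rescales the term). The function names are hypothetical; the weights default to the launcher's text_loss_weight=1.0 and flow_loss_weight=10.0.

```python
import math
from typing import Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    label_mask: Sequence[bool],
) -> float:
    """Mean token-level cross-entropy over positions where label_mask is True."""
    total, count = 0.0, 0
    for row, label, supervised in zip(logits, labels, label_mask):
        if not supervised:
            continue  # prompt tokens carry no text loss
        peak = max(row)
        # numerically stable log-sum-exp, i.e. the log of the softmax normaliser
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[label]  # equals -log softmax(row)[label]
        count += 1
    return total / count if count else 0.0


def flow_matching_loss(
    noise: Sequence[float],
    actions: Sequence[float],
    predicted: Sequence[float],
) -> float:
    """Mean squared error between the target (noise - actions) and the prediction."""
    errors = [(w - a - f) ** 2 for w, a, f in zip(noise, actions, predicted)]
    return sum(errors) / len(errors)


def combined_loss(
    text_loss: float,
    flow_loss: float,
    text_loss_weight: float = 1.0,
    flow_loss_weight: float = 10.0,  # plays the role of the paper's alpha
) -> float:
    return text_loss_weight * text_loss + flow_loss_weight * flow_loss


# Toy batch: 3 text positions over a 2-token vocabulary; only the last two are supervised.
text = masked_cross_entropy(
    logits=[[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    labels=[0, 1, 0],
    label_mask=[False, True, True],
)
flow = flow_matching_loss(noise=[1.0, 0.0], actions=[0.5, 0.0], predicted=[0.0, 0.0])
print(round(text, 4), flow, round(combined_loss(text, flow), 4))  # 0.6931 0.125 1.9431
```

With these defaults the flow term is weighted ten times the text term, the same ratio the launcher sets via ``--policy.flow_loss_weight=10.0`` and ``--policy.text_loss_weight=1.0``.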
2026-07-17 23:11:45 +00:00 · 2026-05-13 10:59:26 +02:00
parent 1d24301b67
commit ef5879a02a
8 changed files with 1127 additions and 0 deletions
@@ -0,0 +1,75 @@
+#!/bin/bash
+#SBATCH --job-name=pi052-hirobot
+#SBATCH --partition=hopper-prod
+#SBATCH --qos=high
+#SBATCH --time=48:00:00
+#SBATCH --ntasks=1
+#SBATCH --gpus-per-task=8
+
+# π0.5 v2 training — reproduces the π0.5 paper's hierarchical recipe.
+#
+# Same Hi-Robot recipe blend as the SmolVLA2 stack (via recipes/pi052_hirobot.yaml),
+# but on the PaliGemma 2B + Gemma-300m action-expert backbone the
+# paper uses. The text head learns subtask prediction via cross-
+# entropy on supervised spans; the action expert learns the flow
+# field. Paper §IV.D mixes the two losses with α=10, which we encode
+# as flow_loss_weight=10 / text_loss_weight=1.
+
+set -euo pipefail
+
+cd "${LEROBOT_ROOT:-$HOME/lerobot}"
+
+export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
+export LD_LIBRARY_PATH="$HOME/miniconda3/lib:${LD_LIBRARY_PATH:-}"
+export NCCL_TIMEOUT="${NCCL_TIMEOUT:-1800}"
+export HF_HUB_DOWNLOAD_TIMEOUT="${HF_HUB_DOWNLOAD_TIMEOUT:-120}"
+export WANDB_INIT_TIMEOUT="${WANDB_INIT_TIMEOUT:-300}"
+
+DATASET="${DATASET:-pepijn223/super_poulain_full_tool3}"
+POLICY_REPO_ID="${POLICY_REPO_ID:-pepijn223/pi052_hirobot_super_poulain}"
+JOB_NAME="${JOB_NAME:-pi052-hirobot-super-poulain}"
+NUM_PROCESSES="${NUM_PROCESSES:-8}"
+BATCH_SIZE="${BATCH_SIZE:-32}"
+STEPS="${STEPS:-15000}"
+RUN_ID="${SLURM_JOB_ID:-$(date +%Y%m%d_%H%M%S)}"
+OUTPUT_DIR="${OUTPUT_DIR:-/fsx/pepijn/outputs/train/pi052_hirobot_${STEPS}_${RUN_ID}}"
+
+echo "Training pi052 on $DATASET"
+echo "  GPUs:         $NUM_PROCESSES"
+echo "  batch:        $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
+echo "  steps:        $STEPS"
+echo "  output:       $OUTPUT_DIR"
+echo "  loss mix:     flow_loss_weight=10 (paper α), text_loss_weight=1"
+echo "  augmentation: image_transforms ON, prompt dropout {plan:0.30 memory:0.30 subtask:0.20}"
+
+accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
+    -m lerobot.scripts.lerobot_train \
+    --policy.type=pi052 \
+    --policy.recipe_path=recipes/pi052_hirobot.yaml \
+    --dataset.repo_id="$DATASET" \
+    --dataset.revision=main \
+    --dataset.video_backend=pyav \
+    --output_dir="$OUTPUT_DIR" \
+    --job_name="$JOB_NAME" \
+    --policy.repo_id="$POLICY_REPO_ID" \
+    --policy.compile_model=false \
+    --policy.device=cuda \
+    --policy.tokenizer_max_length=512 \
+    --policy.text_loss_weight=1.0 \
+    --policy.flow_loss_weight=10.0 \
+    --policy.unfreeze_lm_head=true \
+    --steps="$STEPS" \
+    --policy.scheduler_decay_steps="$STEPS" \
+    --batch_size="$BATCH_SIZE" \
+    --wandb.enable=true \
+    --wandb.disable_artifact=true \
+    --wandb.project=hirobot \
+    --log_freq=100 \
+    --save_freq="$STEPS" \
+    --num_workers=0 \
+    --dataset.image_transforms.enable=true \
+    --dataset.image_transforms.max_num_transforms=3 \
+    --dataset.image_transforms.random_order=true \
+    --policy.plan_dropout_prob=0.30 \
+    --policy.memory_dropout_prob=0.30 \
+    --policy.subtask_dropout_prob=0.20
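The three prompt-dropout flags at the end of the launcher control the per-component prompt dropout in PI052TextTokenizerStep. The following is a minimal sketch of that idea, assuming each optional prompt component arrives as its own message tagged with a component name and is dropped independently with its probability, while supervised target messages are always kept. The ``Message`` class, the component tags, and the role names are hypothetical illustrations, not the lerobot API. The renderer mirrors the ``Role: ...`` plain-text layout described in the commit message and records character spans of the targets; a real tokenizer step would still have to map those spans to token positions to build the label mask.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Probabilities taken from the launcher's --policy.*_dropout_prob flags.
DROPOUT_PROBS = {"plan": 0.30, "memory": 0.30, "subtask": 0.20}


@dataclass
class Message:
    role: str
    content: str
    component: Optional[str] = None  # hypothetical tag: "plan", "memory", "subtask", ...
    is_target: bool = False  # supervised span: never dropped


def apply_prompt_dropout(
    messages: List[Message], probs: Dict[str, float], rng: random.Random
) -> List[Message]:
    """Drop each tagged, non-target message independently with its component's probability."""
    kept = []
    for msg in messages:
        p = 0.0 if msg.is_target else probs.get(msg.component, 0.0)
        if rng.random() < p:
            continue  # this component is hidden from the prompt for this sample
        kept.append(msg)
    return kept


def render(messages: List[Message]) -> Tuple[str, List[Tuple[int, int]]]:
    """Render 'Role: content' lines; return the text and character spans of the targets."""
    text, spans = "", []
    for msg in messages:
        prefix = f"{msg.role}: "
        start = len(text) + len(prefix)
        text += prefix + msg.content + "\n"
        if msg.is_target:
            spans.append((start, start + len(msg.content)))
    return text, spans


rng = random.Random(0)
prompt = [
    Message("User", "put the tools in the box"),
    Message("Plan", "tool1, tool2, tool3", component="plan"),
    Message("Memory", "tool1 done", component="memory"),
    Message("Assistant", "pick up tool2", is_target=True),
]
rendered, spans = render(apply_prompt_dropout(prompt, DROPOUT_PROBS, rng))
print(rendered)
print([rendered[s:e] for s, e in spans])  # ['pick up tool2']
```

Dropping context but never the target keeps a supervised span in every sample while exposing the model to prompts with missing plan, memory, or subtask context.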