commit ef5879a02a
New ``lerobot.policies.pi052`` (parallel to ``smolvla2``) that adds
text prediction + hierarchical inference on top of the existing π0.5
implementation. Mirrors the paper's §IV.D dual-head training::

    L = H(text) + α · ‖(ω − a) − f_θ^action(...)‖²,  α = 10
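In code the mix reduces to a weighted sum of the two head losses. A
minimal sketch with toy tensors; the weight names match the training
flags below, everything else (shapes, stand-in tensors) is
illustrative rather than the actual ``modeling_pi052.py`` contents::

    import torch
    import torch.nn.functional as F

    # Toy stand-ins: in the real policy, the text logits come from the
    # re-enabled PaliGemma lm_head and the flow prediction from the
    # Gemma-300m action expert.
    text_logits = torch.randn(1, 4, 32)    # (batch, seq, vocab)
    labels = torch.randint(0, 32, (1, 4))  # -100 would mark unsupervised spans
    flow_pred = torch.randn(1, 8, 7)       # (batch, horizon, action_dim)
    flow_target = torch.randn(1, 8, 7)     # plays the role of ω − a

    text_loss = F.cross_entropy(text_logits.transpose(1, 2), labels)  # H(text)
    flow_loss = F.mse_loss(flow_pred, flow_target)                    # ‖·‖² term
    loss = 1.0 * text_loss + 10.0 * flow_loss  # text_loss_weight, flow_loss_weight (α)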
Components (sketched in code after the list):
* ``configuration_pi052.py``   thin PI05Config subclass; adds
                               ``recipe_path``, text/flow loss weights
                               (default α=10 per the paper), prompt
                               dropout knobs, ``unfreeze_lm_head``.
* ``text_processor_pi052.py``  PI052TextTokenizerStep — concatenates
                               rendered messages as ``Role: ...``
                               plain text (PaliGemma has no chat
                               template), tokenises with the
                               PaliGemma tokenizer, builds a label
                               mask covering supervised target
                               spans. Includes Pi 0.7 §V.E
                               per-component prompt dropout.
* ``processor_pi052.py``       make_pi052_pre_post_processors —
                               Rename + Batch + Relative +
                               Normalize + RenderMessagesStep +
                               PI052TextTokenizerStep + Device.
                               Falls back to π0.5's plain pipeline
                               when ``recipe_path`` is unset.
* ``modeling_pi052.py``        PI052Policy(PI05Policy) — re-enables
                               PaliGemma ``lm_head``, computes
                               text_loss via CE on the supervised
                               span, sums with flow_loss in
                               forward(), and adds select_message
                               for AR text generation at inference
                               (same surface as
                               SmolVLA2Policy.select_message so
                               SmolVLA2Runtime drives it unchanged).
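A condensed sketch of two load-bearing pieces above, the config
fields and the supervised-span label mask. Field names are taken from
this commit's training flags; the dataclass body and the mask
plumbing are assumptions for illustration (the real config subclasses
PI05Config)::

    from dataclasses import dataclass

    import torch

    @dataclass
    class PI052Config:  # actually a PI05Config subclass; fields per this commit
        recipe_path: str | None = None
        text_loss_weight: float = 1.0
        flow_loss_weight: float = 10.0  # the paper's α
        plan_dropout_prob: float = 0.0
        memory_dropout_prob: float = 0.0
        subtask_dropout_prob: float = 0.0
        unfreeze_lm_head: bool = False

    # Label mask: cross-entropy runs only on the supervised target span;
    # -100 is the index PyTorch's cross_entropy ignores.
    input_ids = torch.tensor([[101, 7, 8, 9, 0]])
    target_span_mask = torch.tensor([[False, True, True, True, False]])
    labels = input_ids.clone()
    labels[~target_span_mask] = -100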
Plus the supporting plumbing:
* recipe ``configs/recipes/pi052_hirobot.yaml`` — same Hi-Robot blend
  as smolvla2_hirobot.yaml, with the same ``${subtask}`` /
  ``if_present`` supervision fix (current span at every frame, not
  ``${next_subtask}``).
* SLURM ``examples/training/pi052_hirobot.slurm`` — full training
  command matching the SmolVLA2 launcher (script reproduced below).
* factory registration: ``--policy.type=pi052`` resolves to
  PI052Policy with the new processor (schematic sketch below).
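Schematically the registration is one more branch in the policy
factory; a hypothetical sketch, not the actual lerobot factory code::

    # Hypothetical factory branch; the real lerobot factory's shape and
    # import paths may differ.
    def get_policy_class(name: str) -> type:
        if name == "pi052":
            from lerobot.policies.pi052.modeling_pi052 import PI052Policy
            return PI052Policy
        raise ValueError(f"unknown policy type: {name}")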
Same multi-rate runtime (``lerobot.policies.smolvla2.inference``)
drives this policy too — both expose ``predict_action_chunk`` for the
action expert and ``select_message`` for the LM head.
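A runnable stub of that dual surface, with the two method names from
this commit but bodies, argument shapes, and return types assumed::

    # Stand-in with the two entry points named above; return values and
    # signatures are assumptions, not the verified runtime API.
    class StubPI052Policy:
        def select_message(self, observation: dict) -> str:
            # slow loop: autoregressive LM head, next subtask as text
            return "pick up the cup"

        def predict_action_chunk(self, observation: dict) -> list[list[float]]:
            # fast loop: flow-matching action expert, one chunk of actions
            return [[0.0] * 7 for _ in range(8)]

    policy = StubPI052Policy()
    print(policy.select_message({}))           # drives the high-level loop
    print(policy.predict_action_chunk({})[0])  # drives the low-level loop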
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
examples/training/pi052_hirobot.slurm (76 lines, 2.8 KiB, Bash):
#!/bin/bash
#SBATCH --job-name=pi052-hirobot
#SBATCH --partition=hopper-prod
#SBATCH --qos=high
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=8

# π0.5 v2 training — reproduces the π0.5 paper's hierarchical recipe.
#
# Same recipe blend as the SmolVLA2 stack (recipes/pi052_hirobot.yaml),
# just on the PaliGemma 2B + Gemma-300m action-expert backbone the
# paper uses. The text head learns subtask prediction via cross-
# entropy on supervised spans; the action expert learns the flow
# field. Paper §IV.D mixes the two losses with α=10, which we encode
# as flow_loss_weight=10 / text_loss_weight=1.

set -euo pipefail

cd "${LEROBOT_ROOT:-$HOME/lerobot}"

export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/miniconda3/lib:${LD_LIBRARY_PATH:-}"
export NCCL_TIMEOUT="${NCCL_TIMEOUT:-1800}"
export HF_HUB_DOWNLOAD_TIMEOUT="${HF_HUB_DOWNLOAD_TIMEOUT:-120}"
export WANDB_INIT_TIMEOUT="${WANDB_INIT_TIMEOUT:-300}"

DATASET="${DATASET:-pepijn223/super_poulain_full_tool3}"
POLICY_REPO_ID="${POLICY_REPO_ID:-pepijn223/pi052_hirobot_super_poulain}"
JOB_NAME="${JOB_NAME:-pi052-hirobot-super-poulain}"
NUM_PROCESSES="${NUM_PROCESSES:-8}"
BATCH_SIZE="${BATCH_SIZE:-32}"
STEPS="${STEPS:-15000}"
RUN_ID="${SLURM_JOB_ID:-$(date +%Y%m%d_%H%M%S)}"
OUTPUT_DIR="${OUTPUT_DIR:-/fsx/pepijn/outputs/train/pi052_hirobot_${STEPS}_${RUN_ID}}"

echo "Training pi052 on $DATASET"
echo " GPUs: $NUM_PROCESSES"
echo " batch: $BATCH_SIZE / GPU (global=$((NUM_PROCESSES * BATCH_SIZE)))"
echo " steps: $STEPS"
echo " output: $OUTPUT_DIR"
echo " loss mix: flow_loss_weight=10 (paper α), text_loss_weight=1"
echo " augmentation: image_transforms ON, prompt dropout {plan:0.30 memory:0.30 subtask:0.20}"

accelerate launch --multi_gpu --num_processes="$NUM_PROCESSES" \
  -m lerobot.scripts.lerobot_train \
  --policy.type=pi052 \
  --policy.recipe_path=recipes/pi052_hirobot.yaml \
  --dataset.repo_id="$DATASET" \
  --dataset.revision=main \
  --dataset.video_backend=pyav \
  --output_dir="$OUTPUT_DIR" \
  --job_name="$JOB_NAME" \
  --policy.repo_id="$POLICY_REPO_ID" \
  --policy.compile_model=false \
  --policy.device=cuda \
  --policy.tokenizer_max_length=512 \
  --policy.text_loss_weight=1.0 \
  --policy.flow_loss_weight=10.0 \
  --policy.unfreeze_lm_head=true \
  --steps="$STEPS" \
  --policy.scheduler_decay_steps="$STEPS" \
  --batch_size="$BATCH_SIZE" \
  --wandb.enable=true \
  --wandb.disable_artifact=true \
  --wandb.project=hirobot \
  --log_freq=100 \
  --save_freq="$STEPS" \
  --num_workers=0 \
  --dataset.image_transforms.enable=true \
  --dataset.image_transforms.max_num_transforms=3 \
  --dataset.image_transforms.random_order=true \
  --policy.plan_dropout_prob=0.30 \
  --policy.memory_dropout_prob=0.30 \
  --policy.subtask_dropout_prob=0.20
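Submit with ``sbatch examples/training/pi052_hirobot.slurm``; every
knob above is an environment-variable default, so overrides ride
along at submission time, e.g. ``STEPS=30000 BATCH_SIZE=16 sbatch
examples/training/pi052_hirobot.slurm`` (sbatch exports the caller's
environment to the job by default).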