Add inline offline validation with train/eval split (#3824)

* refactor(training): rename eval_freq to env_eval_freq
  - Rename eval_freq to env_eval_freq to distinguish sim environment evaluation from offline loss evaluation.
* feat(training): add inline offline validation with train/eval split
  - Add eval_split config for balanced per-task holdout
  - Add eval_steps for periodic inline eval loss computation
  - Add max_eval_samples to cap eval cost
* fix(datasets): remap absolute indices in __getitem__ for filtered datasets
* fix(train): vectorize eval subset selection for max_eval_samples
* fix(datasets): Move the remapping into EpisodeAwareSampler via absolute_to_relative_idx
* fix(validation): add eval_split range check and eval_steps warning
  Validate eval_split is in [0.0, 1.0) to prevent garbage splits from out-of-range values. Raise when eval_steps > 0 but eval_split is 0.0 since no offline eval will run.
* fix(train): prepare eval dataloader with accelerator for multi-GPU
  Prepare eval_dataloader through accelerator.prepare() so eval data is sharded across ranks instead of duplicated. Reduce eval_loss across ranks with mean reduction for consistent logging.
* fix(test): rename eval_freq to env_eval_freq for multi-GPU training
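
For readers who want a concrete picture of the eval_split behavior described above, here is a minimal sketch, not the code from this commit. It assumes that "balanced per-task holdout" means holding out the same fraction of episodes from every task, and that the split operates on an episode-to-task mapping. The function names `validate_eval_config` and `split_episodes_per_task`, the `episode_tasks` argument, and the `seed` parameter are all hypothetical; only `eval_split` and `eval_steps` come from the commit message.

```python
import random
from collections import defaultdict


def validate_eval_config(eval_split: float, eval_steps: int) -> None:
    """Hypothetical check mirroring the rules stated in the commit message."""
    # eval_split must lie in [0.0, 1.0); 1.0 would leave no training data.
    if not 0.0 <= eval_split < 1.0:
        raise ValueError(f"eval_split must be in [0.0, 1.0), got {eval_split}")
    # Periodic offline eval is requested but nothing is held out.
    if eval_steps > 0 and eval_split == 0.0:
        raise ValueError("eval_steps > 0 requires eval_split > 0.0")


def split_episodes_per_task(
    episode_tasks: dict[int, str], eval_split: float, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Hold out roughly eval_split of the episodes of each task (assumed meaning
    of 'balanced per-task holdout'). Returns (train_episodes, eval_episodes)."""
    by_task: dict[str, list[int]] = defaultdict(list)
    for episode, task in sorted(episode_tasks.items()):
        by_task[task].append(episode)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, held_out = [], []
    for task in sorted(by_task):
        episodes = by_task[task]
        rng.shuffle(episodes)
        n_eval = int(len(episodes) * eval_split)  # rounds down per task
        held_out.extend(episodes[:n_eval])
        train.extend(episodes[n_eval:])
    return sorted(train), sorted(held_out)
```

With 10 episodes in each of two tasks and `eval_split=0.2`, this sketch holds out 2 episodes per task. Rounding down per task means very small tasks may contribute no eval episodes; the actual implementation may handle that case differently.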
2026-06-28 05:37:16 +00:00 · 2026-06-25 15:31:24 +02:00
parent c3f180e115
commit 6a788fbdb0
18 changed files with 199 additions and 32 deletions
@@ -167,9 +167,9 @@ jobs:

      # ── LIBERO TRAIN+EVAL SMOKE ──────────────────────────────────────────────
      # Train SmolVLA for 1 step (batch_size=1, dataset episode 0 only) then
-      # immediately runs eval inside the training loop (eval_freq=1, 1 episode).
+      # immediately runs eval inside the training loop (env_eval_freq=1, 1 episode).
      # Tests the full train→eval-within-training pipeline end-to-end.
-      - name: Run Libero train+eval smoke (1 step, eval_freq=1)
+      - name: Run Libero train+eval smoke (1 step, env_eval_freq=1)
        if: env.HF_USER_TOKEN != ''
        run: |
          docker run --name libero-train-smoke --gpus all \
@@ -196,7 +196,7 @@ jobs:
                --output_dir=/tmp/train-smoke \
                --steps=1 \
                --batch_size=1 \
-                --eval_freq=1 \
+                --env_eval_freq=1 \
                --eval.n_episodes=1 \
                --eval.batch_size=1 \
                --eval.use_async_envs=false \