fix(smolvla2): make HighLevelSubtaskFwd actually fire at low hz + quiet startup log

Two runtime fixes that surfaced from on-robot testing.

(1) HighLevelSubtaskFwd was double-gated: HzTrigger fires every period (e.g. every 5s at --high_level_hz=0.2) AND the step requires the action queue to be empty. The queue-empty window is brief (~tens of ms between drain and refill) and almost never coincides with the low-hz timer, so HL effectively never fired and the subtask shown in the runtime panel stayed on the dataset's frame-0 annotation. Add HzTrigger.rearm() and have HighLevelSubtaskFwd call it when skipping due to queue-non-empty: the trigger stays armed and tries again on the next tick instead of waiting another full period. LowLevelForward keeps the original "skip" semantics because chunk_hz is meant as a true upper bound on chunk-generation rate.

(2) The "robot state at startup" warning in _build_robot_observation_provider was meant to fire once but wasn't gated by _resize_logged like the sibling "camera ... live=AxB" warning. Result: it spammed every observation tick (~1-2s). Gate it on first_call (snapshot of _resize_logged["done"]) so both logs fire once at session start.

Co-authored-by: Cursor <cursoragent@cursor.com>
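The double-gating problem in fix (1) can be reproduced with a small simulation. This is a minimal sketch, not the repository's code: the body of `should_fire` is an assumption (the diff only shows its tail), and `simulate`, the 0.125 s tick, and the "queue is empty on one tick out of eight" model are hypothetical choices made so the timer and the queue-empty window never line up on their own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HzTrigger:
    """Simplified stand-in for the trigger in the diff."""

    hz: float
    _last_seconds: Optional[float] = field(default=None, init=False)

    def should_fire(self, now: float) -> bool:
        # Assumed logic: fire on the first call, then once per period.
        if self._last_seconds is None or now - self._last_seconds >= 1.0 / self.hz:
            self._last_seconds = now  # commits the firing
            return True
        return False

    def rearm(self) -> None:
        # Same body as in the diff: forget the committed firing.
        self._last_seconds = None


def simulate(use_rearm: bool, ticks: int = 240, tick_s: float = 0.125) -> int:
    """Count how often a queue-gated step actually runs in 30 s."""
    trigger = HzTrigger(hz=0.2)  # one slot every 5 s
    runs = 0
    for i in range(ticks):
        now = i * tick_s
        # Hypothetical queue model: empty on one tick out of every eight.
        queue_empty = i % 8 == 3
        if not trigger.should_fire(now):
            continue
        if not queue_empty:
            if use_rearm:
                trigger.rearm()  # retry on the next tick
            continue  # without rearm the slot is lost for a full period
        runs += 1
    return runs


print("without rearm:", simulate(use_rearm=False))
print("with rearm:   ", simulate(use_rearm=True))
```

In this model the step never runs without `rearm()`, because the timer only fires on ticks where the queue is busy. With `rearm()` it runs once per period, on the first queue-empty tick after each slot opens, and the rate stays bounded by `hz` because a successful run does not rearm.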
2026-07-09 19:11:44 +00:00 · 2026-05-22 11:04:12 +00:00
parent ca1b951e7b
commit 77a16db529
3 changed files with 40 additions and 9 deletions
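The once-only logging gate from fix (2) can be sketched in isolation. This is an illustrative reduction under stated assumptions, not the provider from the diff: `make_provider`, `provide`, `ListHandler`, the logger name, and the message formats are hypothetical; only the `_resize_logged` flag and the `first_call` snapshot mirror the change.

```python
import logging

logger = logging.getLogger("startup_gate_sketch")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collect log messages in memory so they can be counted."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def make_provider():
    # Mutable dict so the nested function can flip the flag.
    _resize_logged = {"done": False}

    def provide(raw: dict, state: list) -> dict:
        # Snapshot the flag before anything mutates it, so every
        # once-only message in this call sees the same value.
        first_call = not _resize_logged["done"]
        for cam_key in raw:
            if first_call:
                logger.warning("camera %s: checked", cam_key)
        _resize_logged["done"] = True
        # Reading _resize_logged["done"] here would always be True,
        # which is why the state log is gated on the snapshot instead.
        if first_call:
            logger.warning("robot state at startup: %s", state)
        return raw

    return provide


handler = ListHandler()
logger.addHandler(handler)
provide = make_provider()
for _ in range(3):
    provide({"cam_front": "img", "cam_wrist": "img"}, state=[0.0, 0.1])
print(len(handler.messages))  # two camera lines plus one state line
```

Gating on the snapshot rather than the live flag matters because the flag is set between the camera loop and the state log, so the two messages would otherwise disagree about whether this is the first call.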
@@ -463,8 +463,16 @@ class HighLevelSubtaskFwd(InferenceStep):
        # of 30/sec and the robot barely moves. Tying it to the same
        # "queue empty" condition as the chunk refresh produces a
        # clean sense → think → act cycle.
+        #
+        # Rearm the trigger when skipping so a low-hz schedule
+        # (e.g. ``--high_level_hz=0.2`` = once per 5 s) doesn't lose
+        # the slot: the trigger fires once on the timer but the brief
+        # queue-empty window almost never coincides, so without rearm
+        # HL would effectively never run.
        queue = state.get("action_queue") or []
        if len(queue) > 0:
+            if hasattr(self.trigger, "rearm"):
+                self.trigger.rearm()
            return None
        ctx = _msgs_for_subtask(state)
        observation = _maybe_observation(self.observation_provider)
@@ -82,7 +82,15 @@ class Trigger(Protocol):

@dataclass
class HzTrigger:
-    """Fire at most ``hz`` times per second."""
+    """Fire at most ``hz`` times per second.
+
+    A step that gates further (e.g. ``HighLevelSubtaskFwd`` skipping
+    when the action queue is non-empty) and wants the trigger to
+    retry next tick instead of waiting a full period can call
+    :meth:`rearm` from inside ``run``. Without this, a low-hz trigger
+    (e.g. ``hz=0.2`` = once per 5 s) almost never coincides with the
+    brief queue-empty window and the step never fires at all.
+    """

    hz: float
    _last_seconds: float | None = field(default=None, init=False)
@@ -94,6 +102,15 @@ class HzTrigger:
            return True
        return False

+    def rearm(self) -> None:
+        """Mark the trigger as not having fired, so the next tick re-evaluates.
+
+        Used by a step that decided to skip after ``should_fire`` already
+        committed the firing — keeps the cadence honest without losing
+        the slot.
+        """
+        self._last_seconds = None
+

@dataclass
class EventTrigger:
@@ -772,6 +772,11 @@ def _build_robot_observation_provider(
                import cv2 as _cv2  # noqa: PLC0415
                import numpy as _np  # noqa: PLC0415

+                # Snapshot the gate state at the start of the call: the
+                # camera info and startup-state warnings are meant to fire
+                # exactly once (operator sanity check), so gate them on
+                # the *previous* value rather than the post-loop value.
+                first_call = not _resize_logged["done"]
                for cam_key, (target_h, target_w) in target_image_shapes.items():
                    img = raw.get(cam_key)
                    if img is None or not isinstance(img, _np.ndarray):
@@ -779,7 +784,7 @@ def _build_robot_observation_provider(
                    if img.ndim != 3:
                        continue
                    cur_h, cur_w = img.shape[:2]
-                    if not _resize_logged["done"]:
+                    if first_call:
                        logger.warning(
                            "camera %s: live=%dx%d, training=%dx%d (resize=%s)",
                            cam_key,
@@ -793,13 +798,14 @@ def _build_robot_observation_provider(
                        continue
                    raw[cam_key] = _cv2.resize(img, (target_w, target_h), interpolation=_cv2.INTER_AREA)
                _resize_logged["done"] = True
-                # Also print the state vector once so the operator
-                # can eyeball it against the dataset's stats. State
-                # OOD is a real failure mode for VLAs — the prefix
-                # carries state via the projection layer, and a
-                # neutral home pose can easily sit a couple σ off
-                # the supervised support region.
-                if "observation.state" in (ds_features or {}):
+                # Print the state vector once so the operator can eyeball
+                # it against the dataset's stats. State OOD is a real
+                # failure mode for VLAs — the prefix carries state via
+                # the projection layer, and a neutral home pose can
+                # easily sit a couple σ off the supervised support
+                # region. Gated on ``first_call`` so this doesn't spam
+                # every observation tick.
+                if first_call and "observation.state" in (ds_features or {}):
                    state_names = ds_features["observation.state"].get("names") or []
                    state_vals = [raw.get(n) for n in state_names]
                    logger.warning(