fix(smolvla2): only regenerate chunk when queue is fully drained

The previous refresh threshold (queue > chunk_size // 2) made each new chunk *telescope* past the previous one: at queue=25, we kicked off a new chunk forward from the current observation, but by the time the new chunk's first action was actually dispatched, the robot had executed the remaining 25 actions of the previous chunk — so the new chunk was planned from an observation 25+ steps stale. Canonical sense → think → act loop: execute the full chunk, then re-observe and replan. Refresh only when the queue is empty. Every step of every chunk still gets dispatched to the robot (no behaviour change there), but each chunk is now planned from an observation that's at most one chunk's worth of dispatch latency old, not "previous chunk's worth of stale state on top of that". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-07-24 10:16:09 +00:00 · 2026-05-12 17:15:02 +02:00
parent 01e2228b24
commit d866c2c9fd
1 changed files with 9 additions and 10 deletions
@@ -95,17 +95,16 @@ class LowLevelForward(InferenceStep):
            return None

        # SmolVLA produces *action chunks* (typically 50 steps via
-        # flow-matching). The expensive part is the chunk forward;
-        # popping one action per dispatch tick is essentially free.
-        # Only regenerate when the queue is low so we don't burn one
-        # full chunk forward per chunk_hz tick when most of the
-        # previous chunk is still buffered.
+        # flow-matching). Every step gets dispatched to the robot;
+        # popping one per dispatch tick is essentially free. Only
+        # generate a new chunk once the previous one has fully
+        # drained — this is the canonical "sense → think → act"
+        # loop. Refreshing while a chunk is still queued causes the
+        # new chunk to "telescope" past the old one (planned from an
+        # observation that's already 25+ steps stale by the time it
+        # starts dispatching).
        queue = state.setdefault("action_queue", [])
-        chunk_size = getattr(self.policy.config, "chunk_size", None) or getattr(
-            self.policy.config, "n_action_steps", 50
-        )
-        # Refresh threshold: keep at least half a chunk buffered.
-        if len(queue) > max(1, chunk_size // 2):
+        if len(queue) > 0:
            return None

        observation = self.observation_provider()