fix(smolvla2): only regenerate chunk when queue is fully drained

The previous refresh threshold (skip the refresh only while
queue > chunk_size // 2) made each new chunk *telescope* past the
previous one: at queue=25, we kicked off a new chunk forward from
the current observation, but by the time the new chunk's first
action was actually dispatched, the robot had executed the
remaining 25 actions of the previous chunk. The new chunk was
therefore planned from an observation already 25+ steps stale.

Canonical sense → think → act loop: execute the full chunk, then
re-observe and replan. Refresh only when the queue is empty. Every
step of every chunk still gets dispatched to the robot (no
behaviour change there), but the observation behind a chunk is now
at most one chunk's worth of dispatch latency old while that chunk
executes, without the previous chunk's undispatched remainder
adding stale state on top of that.
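
A minimal sketch of the staleness argument above, not the project's code.
It assumes one action is dispatched per tick, that a chunk forward is
instantaneous, and that a freshly generated chunk is appended behind the
actions still queued (inferred from the description above). The function
name and its parameters are hypothetical.

```python
from collections import deque


def first_action_staleness(chunk_size: int, refresh_at_or_below: int, ticks: int) -> list[int]:
    """For each chunk, return how many ticks old its observation was
    when the chunk's first action was dispatched."""
    queue = deque()  # entries are (observation_tick, index_in_chunk)
    staleness = []
    for tick in range(ticks):
        # Generate a new chunk from the "current observation" (this tick)
        # whenever the queue has drained to the refresh threshold.
        if len(queue) <= refresh_at_or_below:
            queue.extend((tick, i) for i in range(chunk_size))
        # Dispatch exactly one action per tick.
        obs_tick, index = queue.popleft()
        if index == 0:
            staleness.append(tick - obs_tick)
    return staleness


# Old behaviour: refresh once the queue is at or below half a chunk.
old = first_action_staleness(chunk_size=50, refresh_at_or_below=25, ticks=200)
# New behaviour: refresh only when the queue is empty.
new = first_action_staleness(chunk_size=50, refresh_at_or_below=0, ticks=200)
```

Under these assumptions, every chunk after the first starts dispatching 25
ticks after its observation with the old threshold, and 0 ticks after it
with the drained-queue rule. Real inference latency adds to both figures.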

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-05-12 17:15:02 +02:00
parent 01e2228b24
commit d866c2c9fd
@@ -95,17 +95,16 @@ class LowLevelForward(InferenceStep):
             return None
         # SmolVLA produces *action chunks* (typically 50 steps via
-        # flow-matching). The expensive part is the chunk forward;
-        # popping one action per dispatch tick is essentially free.
-        # Only regenerate when the queue is low so we don't burn one
-        # full chunk forward per chunk_hz tick when most of the
-        # previous chunk is still buffered.
+        # flow-matching). Every step gets dispatched to the robot;
+        # popping one per dispatch tick is essentially free. Only
+        # generate a new chunk once the previous one has fully
+        # drained; this is the canonical "sense → think → act"
+        # loop. Refreshing while a chunk is still queued causes the
+        # new chunk to "telescope" past the old one (planned from an
+        # observation that's already 25+ steps stale by the time it
+        # starts dispatching).
         queue = state.setdefault("action_queue", [])
         chunk_size = getattr(self.policy.config, "chunk_size", None) or getattr(
             self.policy.config, "n_action_steps", 50
         )
-        # Refresh threshold: keep at least half a chunk buffered.
-        if len(queue) > max(1, chunk_size // 2):
+        if len(queue) > 0:
             return None
         observation = self.observation_provider()
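
The chunk-size fallback and the drained-queue gate in the hunk above can be exercised in isolation. This is a rough sketch: `resolve_chunk_size` and `should_generate_chunk` are hypothetical helper names, and `SimpleNamespace` stands in for the real policy config, whose other fields are not shown in this diff.

```python
from types import SimpleNamespace


def resolve_chunk_size(config) -> int:
    # Same fallback chain as the hunk: chunk_size, then n_action_steps, then 50.
    # Because of `or`, a falsy chunk_size (None or 0) also falls through.
    return getattr(config, "chunk_size", None) or getattr(config, "n_action_steps", 50)


def should_generate_chunk(state: dict) -> bool:
    # Mirrors the new gate: plan a new chunk only once the queue is empty.
    queue = state.setdefault("action_queue", [])
    return len(queue) == 0


with_chunk_size = resolve_chunk_size(SimpleNamespace(chunk_size=50))
with_fallback = resolve_chunk_size(SimpleNamespace(chunk_size=None, n_action_steps=10))
with_default = resolve_chunk_size(SimpleNamespace())

empty_state = {}
busy_state = {"action_queue": ["a0", "a1"]}
ready_when_empty = should_generate_chunk(empty_state)
ready_when_busy = should_generate_chunk(busy_state)
```

Note that `setdefault` creates the `action_queue` entry on first use, so a fresh state dict is treated as an empty queue and triggers the first chunk.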