# TOPReward

TOPReward is a **zero-shot reward model** that extracts token log-probabilities from an off-the-shelf vision-language model (VLM) as a robotic reward signal. Given a video trajectory and a task instruction, it returns the VLM's log-likelihood that the instruction is true, with no fine-tuning required.

**Paper**: [TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics](https://arxiv.org/abs/2602.19313)

**Project**: [topreward.github.io](https://topreward.github.io/webpage/)

**Original code**: [github.com/TOPReward/TOPReward](https://github.com/TOPReward/TOPReward)

**Default backbone**: [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)

## Overview

TOPReward asks a generic VLM how likely a task instruction is, **conditioned on the video** of a robot trying to complete that task. Concretely, given:

- A trajectory video (a sequence of frames).
- A task instruction (e.g. _"open the drawer"_).

it builds a chat prompt of the form

```text