optimize topreward input processing (#3660)

Haoming Song
2026-05-25 22:07:45 +08:00
committed by GitHub
parent 616663cd9f
commit 3b5b94dbd6
10 changed files with 300 additions and 281 deletions
+1 -1
@@ -53,7 +53,7 @@ or, with `uv` from a source checkout:
uv sync --extra topreward
```
-This pulls in `transformers` and `qwen-vl-utils`. The first time you run TOPReward, Hugging Face will also download the VLM weights from the Hub (~16 GB for Qwen3-VL-8B-Instruct). A GPU is strongly recommended.
+This pulls in `transformers`. The first time you run TOPReward, Hugging Face will also download the VLM weights from the Hub (~16 GB for Qwen3-VL-8B-Instruct). A GPU is strongly recommended.
## Model Inputs and Outputs