Compare commits

...

5 Commits

Author SHA1 Message Date
Haoming Song 3b5b94dbd6 optmize topreward input processing (#3660) 2026-05-25 16:07:45 +02:00
Cole 616663cd9f fix(rewards/topreward): fix pyproject extra typo and simplify processor (#3653)
Add lerobot[topreward] extra to all in
pyproject.toml, drop the redundant labels arg in scoring, and
collapse the dead-branch shape check in the encoder processor.
2026-05-23 00:27:09 +02:00
Khalil Meftah 5cfca59ec7 fix(rewards/topreward): add missing input keys mm_token_type_ids 2026-05-21 11:05:02 +02:00
Khalil Meftah f6ecb7b955 refactor(rewards): clean up TOPReward processor/model 2026-05-20 17:39:21 +02:00
Khalil Meftah 70ad322676 feat(rewards): add TOPReward reward model 2026-05-19 18:00:18 +02:00
15 changed files with 1891 additions and 3 deletions
+2
View File
@@ -73,6 +73,8 @@
- sections:
- local: sarm
title: SARM
- local: topreward
title: TOPReward
title: "Reward Models"
- sections:
- local: inference
+177
View File
@@ -0,0 +1,177 @@
# TOPReward
TOPReward is a **zero-shot reward model** that extracts token log-probabilities from an off-the-shelf vision-language model (VLM) as a robotic reward signal. Given a video trajectory and a task instruction, it returns the VLM's log-likelihood that the instruction is true — no fine-tuning required.
**Paper**: [TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics](https://arxiv.org/abs/2602.19313)
**Project**: [topreward.github.io](https://topreward.github.io/webpage/)
**Original code**: [github.com/TOPReward/TOPReward](https://github.com/TOPReward/TOPReward)
**Default backbone**: [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
## Overview
TOPReward asks a generic VLM how likely a task instruction is, **conditioned on the video** of a robot trying to complete that task. Concretely, given:
- A trajectory video (a sequence of frames).
- A task instruction (e.g. _"open the drawer"_).
it builds a chat prompt of the form
```text
<video>
"The above video shows a robot manipulation trajectory that completes the
following task: <instruction> Decide whether the above statement is True
or not. The answer is: True"
```
forwards it through the VLM, label-masks everything except the very last token, and reads back the log-probability of that token — by default the literal `"True"` that closes the suffix template. The resulting `log P("True" | video + prompt + instruction)` is the reward.
Because the method only depends on a frozen VLM, TOPReward is **zero-shot**: there are no fine-tuned weights to host. The "model" in LeRobot is a small wrapper around `transformers`' `Qwen3VLForConditionalGeneration` plus the label-masking logic. The processor owns the tokeniser and builds the full chat prompt (EO-1/Robometer pattern).
## What the LeRobot integration covers
- Standard `reward_model.type=topreward` configuration through LeRobot.
- VLM loading via the `transformers` `Qwen3VLForConditionalGeneration` API.
- Prompt assembly + tokenisation in the processor (matching upstream `QwenClient.compute_instruction_reward`).
- `compute_reward()` returns one scalar log-prob per sample.
- LeRobot reward-model save/load — `save_pretrained` writes only `config.json` (the VLM is identified by `vlm_name`).
- An offline labeling script that writes a `topreward_progress.parquet` (SARM-compatible schema) for RA-BC and overlay.
The current LeRobot port supports the **Qwen3-VL client only**. Other upstream clients (Gemini, OpenAI, Gemma, Molmo) can be added as follow-up extras.
## Installation Requirements
1. Install LeRobot following the [Installation Guide](./installation).
2. Install the TOPReward optional extra:
```bash
pip install -e ".[topreward]"
```
or, with `uv` from a source checkout:
```bash
uv sync --extra topreward
```
This pulls in `transformers`. The first time you run TOPReward, Hugging Face will also download the VLM weights from the Hub (~16 GB for Qwen3-VL-8B-Instruct). A GPU is strongly recommended.
## Model Inputs and Outputs
TOPReward expects:
- A trajectory video or sequence of frames.
- A natural-language task description.
In LeRobot datasets the preprocessor reads:
| Config field | Default | Meaning |
| ------------------------- | --------------------------- | --------------------------------------------- |
| `reward_model.image_key` | `observation.images.top` | Camera observation used by TOPReward |
| `reward_model.task_key` | `task` | Key in complementary data for the task string |
| `reward_model.max_frames` | `16` | Cap on frames per sample |
| `reward_model.fps` | `2.0` | Metadata passed to the Qwen video processor |
| `reward_model.vlm_name` | `Qwen/Qwen3-VL-8B-Instruct` | Hugging Face Hub id of the underlying VLM |
The model returns:
- `compute_reward(batch)`: one log-probability per sample. Higher = better task-video alignment. When `success_threshold` is finite, returns the binary thresholded value instead.
## Usage
### Load the reward model directly
```python
from lerobot.rewards.topreward import TOPRewardConfig, TOPRewardModel
cfg = TOPRewardConfig(
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
device="cuda",
)
reward_model = TOPRewardModel(cfg)
```
### Use the reward factory
```python
from lerobot.rewards import make_reward_model, make_reward_model_config, make_reward_pre_post_processors
cfg = make_reward_model_config(
"topreward",
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
device="cuda",
image_key="observation.images.top",
)
reward_model = make_reward_model(cfg)
preprocessor, postprocessor = make_reward_pre_post_processors(cfg)
```
The preprocessor tokenises the full prompt (video + prefix + instruction suffix), writes Qwen-VL tensors + `prompt_length` under `observation.topreward.*`. The model reads those tensors, label-masks based on `prompt_length`, and extracts the log-prob reward.
### Offline dataset labeling
Write a `topreward_progress.parquet` for RA-BC training and overlay videos:
```bash
# Sparse-dense (15 anchors per episode, matches upstream)
uv run python -m lerobot.rewards.topreward.compute_rabc_weights \
--dataset-repo-id lerobot/libero_10_image \
--num-samples 15 \
--device cuda
```
Then render the progress overlay for any episode:
```bash
uv run examples/dataset/create_progress_videos.py \
--repo-id lerobot/libero_10_image \
--episode 0 \
--progress-file topreward_progress.parquet \
--gif
```
## Configuration Notes
### Prompt knobs
The default prompt mirrors the upstream paper:
```text
prompt_prefix = "The above video shows a robot manipulation trajectory that completes the following task: "
prompt_suffix_template = "{instruction} Decide whether the above statement is True or not. The answer is: True"
```
Both are exposed on `TOPRewardConfig` for ablation. The suffix template **must** contain `{instruction}`.
### Chat template
`add_chat_template=True` wraps the full prompt (including instruction) with the tokenizer's chat template before tokenisation. Default is `False`, matching the upstream paper's main experiments.
## Limitations
- The current LeRobot port is **inference-only and zero-shot**; `forward()` is not overridden and `is_trainable` returns `False`.
- Only the **Qwen3-VL family** is supported; other upstream clients are out of scope.
- TOPReward inherits the underlying VLM's biases.
## References
- [TOPReward project page](https://topreward.github.io/webpage/)
- [TOPReward paper](https://arxiv.org/abs/2602.19313)
- [Original TOPReward code](https://github.com/TOPReward/TOPReward)
- [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
## Citation
```bibtex
@article{chen2026topreward,
title={TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics},
author={Chen, Shirui and Harrison, Cole and Lee, Ying-Chun and Yang, Angela Jin and
Ren, Zhongzheng and Ratliff, Lillian J and Duan, Jiafei and Fox, Dieter and
Krishna, Ranjay},
journal={arXiv preprint arXiv:2602.19313},
year={2026}
}
```
## License
The original TOPReward codebase is MIT-licensed. The LeRobot port follows the LeRobot Apache 2.0 license; the wrapped Qwen3-VL weights are subject to the original Qwen license.
+2
View File
@@ -209,6 +209,7 @@ groot = [
"flash-attn>=2.5.9,<3.0.0 ; sys_platform != 'darwin'"
]
sarm = ["lerobot[transformers-dep]", "pydantic>=2.0.0,<3.0.0", "faker>=33.0.0,<35.0.0", "lerobot[matplotlib-dep]", "lerobot[qwen-vl-utils-dep]"]
topreward = ["lerobot[transformers-dep]"]
xvla = ["lerobot[transformers-dep]"]
eo1 = ["lerobot[transformers-dep]", "lerobot[qwen-vl-utils-dep]"]
hilserl = ["lerobot[transformers-dep]", "lerobot[dataset]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"]
@@ -286,6 +287,7 @@ all = [
"lerobot[libero]; sys_platform == 'linux'",
"lerobot[metaworld]",
"lerobot[sarm]",
"lerobot[topreward]",
"lerobot[peft]",
# "lerobot[unitree_g1]", TODO: Unitree requires specific installation instructions for unitree_sdk2
]
+2
View File
@@ -21,11 +21,13 @@ from .factory import (
)
from .pretrained import PreTrainedRewardModel as PreTrainedRewardModel
from .sarm.configuration_sarm import SARMConfig as SARMConfig
from .topreward.configuration_topreward import TOPRewardConfig as TOPRewardConfig
__all__ = [
# Configuration classes
"RewardClassifierConfig",
"SARMConfig",
"TOPRewardConfig",
# Base class
"PreTrainedRewardModel",
# Factory functions
+17 -2
View File
@@ -26,6 +26,7 @@ from lerobot.processor import PolicyAction, PolicyProcessorPipeline
from .classifier.configuration_classifier import RewardClassifierConfig
from .pretrained import PreTrainedRewardModel
from .sarm.configuration_sarm import SARMConfig
from .topreward.configuration_topreward import TOPRewardConfig
def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
@@ -37,7 +38,7 @@ def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
Args:
name: The name of the reward model. Supported names are "reward_classifier",
"sarm".
"sarm", "topreward".
Returns:
The reward model class corresponding to the given name.
@@ -53,6 +54,10 @@ def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel
return SARMRewardModel
elif name == "topreward":
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
return TOPRewardModel
else:
try:
return _get_reward_model_cls_from_name(name=name)
@@ -69,7 +74,7 @@ def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:
Args:
reward_type: The type of the reward model. Supported types include
"reward_classifier", "sarm".
"reward_classifier", "sarm", "topreward".
**kwargs: Keyword arguments to be passed to the configuration class constructor.
Returns:
@@ -82,6 +87,8 @@ def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:
return RewardClassifierConfig(**kwargs)
elif reward_type == "sarm":
return SARMConfig(**kwargs)
elif reward_type == "topreward":
return TOPRewardConfig(**kwargs)
else:
try:
config_cls = RewardModelConfig.get_choice_class(reward_type)
@@ -162,6 +169,14 @@ def make_reward_pre_post_processors(
dataset_meta=kwargs.get("dataset_meta"),
)
elif isinstance(reward_cfg, TOPRewardConfig):
from lerobot.rewards.topreward.processor_topreward import make_topreward_pre_post_processors
return make_topreward_pre_post_processors(
config=reward_cfg,
dataset_stats=kwargs.get("dataset_stats"),
)
else:
try:
processors = _make_processors_from_reward_model_config(
+19
View File
@@ -0,0 +1,19 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .configuration_topreward import TOPRewardConfig
from .modeling_topreward import TOPRewardModel
from .processor_topreward import make_topreward_pre_post_processors
__all__ = ["TOPRewardConfig", "TOPRewardModel", "make_topreward_pre_post_processors"]
@@ -0,0 +1,353 @@
#!/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Compute per-frame TOPReward progress curves for a LeRobot dataset.
For each episode, scores trajectory prefixes of increasing length using
the TOPReward reward model, min-max normalises the raw log-prob rewards per episode,
and writes a parquet file with one row per frame.
The parquet uses the same schema as SARM's :mod:`lerobot.rewards.sarm.compute_rabc_weights`.
Usage:
# Sparse-dense mode (15 anchors per episode, matches upstream)
python -m lerobot.rewards.topreward.compute_rabc_weights \\
--dataset-repo-id lerobot/libero_10_image \\
--num-samples 15
# Use a different VLM backbone
python -m lerobot.rewards.topreward.compute_rabc_weights \\
--dataset-repo-id lerobot/libero_10_image \\
--vlm-name Qwen/Qwen3-VL-4B-Instruct
"""
from __future__ import annotations
import argparse
import logging
from pathlib import Path
from typing import Any
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
import torch
from tqdm import tqdm
from lerobot.datasets import LeRobotDataset
from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
from lerobot.rewards.topreward.processor_topreward import TOPRewardEncoderProcessorStep
from lerobot.types import TransitionKey
DEFAULT_OUTPUT_FILENAME = "topreward_progress.parquet"
def get_reward_model_path_from_parquet(parquet_path: Path) -> str | None:
"""Read ``reward_model_path`` from parquet metadata if available."""
if not parquet_path.exists():
return None
try:
metadata = pq.read_metadata(parquet_path).schema.to_arrow_schema().metadata
if metadata and b"reward_model_path" in metadata:
return metadata[b"reward_model_path"].decode()
except Exception: # nosec B110
return None
return None
def _resolve_task(sample: dict[str, Any], default: str) -> str:
"""Best-effort task extraction from a dataset sample."""
task = sample.get("task")
if isinstance(task, str) and task:
return task
return default
def normalize_rewards(rewards: list[float] | np.ndarray) -> np.ndarray:
"""Min-max normalise raw log-prob rewards into ``[0, 1]``."""
rewards_arr = np.asarray(rewards, dtype=np.float64)
if rewards_arr.size == 0:
return rewards_arr.astype(np.float32)
if rewards_arr.size == 1:
return np.array([1.0], dtype=np.float32)
r_min, r_max = rewards_arr.min(), rewards_arr.max()
if r_max == r_min:
return np.ones_like(rewards_arr, dtype=np.float32)
return ((rewards_arr - r_min) / (r_max - r_min)).astype(np.float32)
def compute_instruction_rewards_for_prefixes(
model: TOPRewardModel,
encoder: TOPRewardEncoderProcessorStep,
dataset: LeRobotDataset,
ep_start: int,
num_frames: int,
task: str,
image_key: str,
num_samples: int | None,
device: str,
) -> np.ndarray:
"""Score an episode via prefix sweep and return a per-frame normalised curve."""
if num_samples is None or num_samples >= num_frames:
prefix_lengths = np.arange(1, num_frames + 1, dtype=np.int64)
else:
prefix_lengths = np.unique(np.linspace(1, num_frames, num_samples).round().astype(np.int64))
episode_frames = torch.stack([dataset[ep_start + i][image_key] for i in range(num_frames)])
rewards: list[float] = []
for length in prefix_lengths:
frames = episode_frames[: int(length)].unsqueeze(0) # (1, T, C, H, W)
transition = {
TransitionKey.OBSERVATION: {image_key: frames},
TransitionKey.COMPLEMENTARY_DATA: {"task": task},
}
encoded = encoder(transition)
obs = encoded[TransitionKey.OBSERVATION]
batch = {
key: value.to(device) if isinstance(value, torch.Tensor) else value for key, value in obs.items()
}
with torch.no_grad():
reward = model.compute_reward(batch)
rewards.append(float(reward.item()))
normalized_rewards = normalize_rewards(rewards)
if prefix_lengths.shape[0] == num_frames:
return normalized_rewards
return np.interp(
np.arange(1, num_frames + 1, dtype=np.float64),
prefix_lengths.astype(np.float64),
normalized_rewards.astype(np.float64),
).astype(np.float32)
def compute_topreward_progress(
dataset_repo_id: str,
reward_model_path: str | None = None,
vlm_name: str | None = None,
output_path: str | None = None,
device: str = "cuda",
num_samples: int | None = None,
fps: float | None = None,
episodes: list[int] | None = None,
) -> Path:
"""Run TOPReward over a dataset and write per-frame progress."""
if reward_model_path is not None:
logging.info(f"Loading TOPReward config from: {reward_model_path}")
model = TOPRewardModel.from_pretrained(reward_model_path)
config = model.config
config.device = device
if vlm_name is not None and vlm_name != config.vlm_name:
logging.info(f"Overriding vlm_name from config: {config.vlm_name} -> {vlm_name}")
config.vlm_name = vlm_name
model = TOPRewardModel(config)
else:
config_kwargs: dict[str, Any] = {"device": device}
if vlm_name is not None:
config_kwargs["vlm_name"] = vlm_name
if fps is not None:
config_kwargs["fps"] = fps
config = TOPRewardConfig(**config_kwargs)
logging.info(f"Constructing TOPReward with VLM: {config.vlm_name}")
model = TOPRewardModel(config)
model.to(device).eval()
encoder = TOPRewardEncoderProcessorStep(
vlm_name=config.vlm_name,
image_key=config.image_key,
task_key=config.task_key,
default_task=config.default_task,
max_frames=None, # no tail-crop: we control prefix length explicitly
fps=config.fps,
prompt_prefix=config.prompt_prefix,
prompt_suffix_template=config.prompt_suffix_template,
add_chat_template=config.add_chat_template,
max_length=config.max_input_length,
)
image_key = config.image_key
logging.info(f"Loading dataset: {dataset_repo_id}")
dataset = LeRobotDataset(dataset_repo_id, download_videos=True)
logging.info(f"Dataset: {dataset.num_episodes} episodes, {dataset.num_frames} frames")
episode_indices = list(range(dataset.num_episodes)) if episodes is None else episodes
logging.info(f"Processing {len(episode_indices)} episode(s)")
all_index: list[int] = []
all_episode: list[int] = []
all_frame: list[int] = []
all_progress: list[float] = []
for episode_idx in tqdm(episode_indices, desc="Episodes"):
ep = dataset.meta.episodes[episode_idx]
ep_start = int(ep["dataset_from_index"])
ep_end = int(ep["dataset_to_index"])
num_frames = ep_end - ep_start
if num_frames <= 0:
continue
first_sample = dataset[ep_start]
task = _resolve_task(first_sample, default=config.default_task or "perform the task")
per_frame = compute_instruction_rewards_for_prefixes(
model=model,
encoder=encoder,
dataset=dataset,
ep_start=ep_start,
num_frames=num_frames,
task=task,
image_key=image_key,
num_samples=num_samples,
device=device,
)
for local in range(num_frames):
all_index.append(ep_start + local)
all_episode.append(episode_idx)
all_frame.append(local)
all_progress.append(float(per_frame[local]))
if device.startswith("cuda"):
torch.cuda.empty_cache()
table = pa.table(
{
"index": np.asarray(all_index, dtype=np.int64),
"episode_index": np.asarray(all_episode, dtype=np.int64),
"frame_index": np.asarray(all_frame, dtype=np.int64),
"progress_sparse": np.asarray(all_progress, dtype=np.float32),
}
)
schema_metadata: dict[bytes, bytes] = {b"vlm_name": config.vlm_name.encode()}
if reward_model_path is not None:
schema_metadata[b"reward_model_path"] = reward_model_path.encode()
table = table.replace_schema_metadata(schema_metadata)
out = Path(dataset.root) / DEFAULT_OUTPUT_FILENAME if output_path is None else Path(output_path)
out.parent.mkdir(parents=True, exist_ok=True)
pq.write_table(table, out)
logging.info(f"Saved {len(table)} frame values to {out}")
progress_arr = np.asarray(all_progress, dtype=np.float32)
if progress_arr.size:
logging.info(
f"Progress: mean={float(progress_arr.mean()):.4f}, "
f"std={float(progress_arr.std()):.4f}, "
f"min={float(progress_arr.min()):.4f}, "
f"max={float(progress_arr.max()):.4f}"
)
return out
def main():
parser = argparse.ArgumentParser(
description="Compute per-frame TOPReward progress curves for RA-BC weighting.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Sparse-dense mode (matches upstream TOPReward num_samples=15)
python -m lerobot.rewards.topreward.compute_rabc_weights \\
--dataset-repo-id lerobot/libero_10_image \\
--num-samples 15
# Use a smaller VLM
python -m lerobot.rewards.topreward.compute_rabc_weights \\
--dataset-repo-id lerobot/libero_10_image \\
--vlm-name Qwen/Qwen3-VL-4B-Instruct
""",
)
parser.add_argument(
"--dataset-repo-id", type=str, required=True, help="HuggingFace dataset repo id or local path."
)
parser.add_argument(
"--reward-model-path", type=str, default=None, help="Optional TOPReward LeRobot config."
)
parser.add_argument("--vlm-name", type=str, default=None, help="Override the VLM backbone (HF Hub id).")
parser.add_argument("--output-path", type=str, default=None, help="Output parquet path.")
parser.add_argument("--device", type=str, default="cuda", help="Device to use (default: cuda).")
parser.add_argument(
"--num-samples",
type=int,
default=None,
help="Anchor prefix samples per episode. None = dense. 15 matches upstream.",
)
parser.add_argument(
"--episodes",
type=int,
nargs="+",
default=None,
help="Process only these episode indices (e.g. --episodes 0 or --episodes 0 5 10).",
)
parser.add_argument("--fps", type=float, default=None, help="Override TOPRewardConfig.fps.")
parser.add_argument(
"--push-to-hub", action="store_true", help="Upload to the dataset repo on HuggingFace Hub."
)
args = parser.parse_args()
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
output_path = compute_topreward_progress(
dataset_repo_id=args.dataset_repo_id,
reward_model_path=args.reward_model_path,
vlm_name=args.vlm_name,
output_path=args.output_path,
device=args.device,
num_samples=args.num_samples,
fps=args.fps,
episodes=args.episodes,
)
print(f"\nTOPReward progress saved to: {output_path}")
if args.push_to_hub:
from huggingface_hub import HfApi
api = HfApi()
hub_path = DEFAULT_OUTPUT_FILENAME
print(f"\nUploading to Hub: {args.dataset_repo_id}/{hub_path}")
api.upload_file(
path_or_fileobj=str(output_path),
path_in_repo=hub_path,
repo_id=args.dataset_repo_id,
repo_type="dataset",
)
print(
"Successfully uploaded to: "
f"https://huggingface.co/datasets/{args.dataset_repo_id}/blob/main/{hub_path}"
)
print("\nTo use in training, add to your config:")
print(" use_rabc: true")
print(f" rabc_progress_path: hf://datasets/{args.dataset_repo_id}/{hub_path}")
print(" rabc_head_mode: sparse")
else:
print("\nTo use in training, add to your config:")
print(" use_rabc: true")
print(f" rabc_progress_path: {output_path}")
print(" rabc_head_mode: sparse")
if __name__ == "__main__":
main()
@@ -0,0 +1,146 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from dataclasses import dataclass, field
from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature
from lerobot.configs.rewards import RewardModelConfig
from lerobot.utils.constants import OBS_IMAGES
# Default prompt scaffolding from the upstream TOPReward paper / reference
# implementation (``QwenClient.compute_instruction_reward``). The prompt
# scores the terminal ``True`` token in ``f"{instruction} ... True"``
# given the video.
DEFAULT_PROMPT_PREFIX = (
"The above video shows a robot manipulation trajectory that completes the following task: "
)
DEFAULT_PROMPT_SUFFIX_TEMPLATE = (
"{instruction} Decide whether the above statement is True or not. The answer is: True"
)
@RewardModelConfig.register_subclass("topreward")
@dataclass
class TOPRewardConfig(RewardModelConfig):
"""Configuration for the TOPReward zero-shot reward model.
TOPReward is **zero-shot**: it has no learnable parameters of its own.
The "model" is a generic vision-language model (default
``Qwen/Qwen3-VL-8B-Instruct``) used with a fixed prompt to extract
token log-probabilities as a reward signal. There is therefore no
fine-tuned checkpoint to host: ``pretrained_path`` is unused at
runtime the model identity is :attr:`vlm_name` (an HF Hub id).
Args:
vlm_name: Hugging Face Hub id of the underlying VLM. Must be a
Qwen3-VL family model (the only client implemented in this
LeRobot port).
torch_dtype: Torch dtype name passed to the VLM loader
(``"auto"``, ``"bfloat16"``, ``"float16"``, ...).
attn_implementation: ``transformers`` attention implementation
(e.g. ``"flash_attention_2"``, ``"sdpa"``). Defaults to
``None`` so the upstream picks the best available.
image_key: Observation key that holds the trajectory frames.
task_key: Complementary-data key that holds the task instruction.
default_task: Fallback instruction when ``task_key`` is absent.
max_frames: Cap on the number of frames fed to the VLM per
sample. ``None`` = use all frames.
fps: Frames-per-second metadata for the Qwen video processor.
prompt_prefix: Text shown to the VLM right after the video and
before the suffix template.
prompt_suffix_template: Suffix appended after ``prompt_prefix``.
Must contain ``{instruction}``; the VLM scores the
log-likelihood of the tokens that follow the prefix.
add_chat_template: If ``True``, wrap the full prompt with the
tokenizer's chat template before tokenisation (matches
upstream ``add_chat_template=True``).
success_threshold: Optional log-prob threshold. If finite,
:meth:`TOPRewardModel.compute_reward` returns
``(reward > success_threshold).float()`` instead of the raw
log-prob.
max_input_length: Hard limit on the total tokenized input length;
samples that exceed it raise a ``ValueError``.
"""
# Path to a local LeRobot dir or HF repo that holds a ``config.json``
# snapshot of this TOPRewardConfig. The VLM weights themselves are
# always identified by ``vlm_name``.
pretrained_path: str | None = None
vlm_name: str = "Qwen/Qwen3-VL-8B-Instruct"
torch_dtype: str = "auto"
attn_implementation: str | None = None
image_key: str = OBS_IMAGES + ".top"
task_key: str = "task"
default_task: str | None = None
max_frames: int | None = 16
fps: float = 2.0
prompt_prefix: str = DEFAULT_PROMPT_PREFIX
prompt_suffix_template: str = DEFAULT_PROMPT_SUFFIX_TEMPLATE
add_chat_template: bool = False
success_threshold: float = float("-inf")
max_input_length: int = 32768
license: str | None = "mit" # matches upstream TOPReward
tags: list[str] | None = field(
default_factory=lambda: ["reward-model", "vision-language", "qwen3-vl", "zero-shot"]
)
input_features: dict[str, PolicyFeature] = field(default_factory=dict)
output_features: dict[str, PolicyFeature] = field(default_factory=dict)
normalization_mapping: dict[str, NormalizationMode] = field(
default_factory=lambda: {
"VISUAL": NormalizationMode.IDENTITY,
"REWARD": NormalizationMode.IDENTITY,
}
)
def __post_init__(self) -> None:
super().__post_init__()
if self.max_frames is not None and self.max_frames < 1:
raise ValueError(f"max_frames must be >= 1, got {self.max_frames}")
if self.fps <= 0:
raise ValueError(f"fps must be > 0, got {self.fps}")
if "{instruction}" not in self.prompt_suffix_template:
raise ValueError(
"prompt_suffix_template must contain `{instruction}` so the model "
"scores the log-likelihood of the task suffix."
)
if self.max_input_length <= 0:
raise ValueError(f"max_input_length must be > 0, got {self.max_input_length}")
if self.image_key not in self.input_features:
self.input_features[self.image_key] = PolicyFeature(shape=(3, 224, 224), type=FeatureType.VISUAL)
self.output_features.setdefault("reward", PolicyFeature(shape=(1,), type=FeatureType.REWARD))
@property
def observation_delta_indices(self) -> list[int] | None:
return None
@property
def action_delta_indices(self) -> None:
return None
@property
def reward_delta_indices(self) -> None:
return None
def validate_features(self) -> None:
if self.image_key not in self.input_features:
raise ValueError(f"TOPReward requires image input feature {self.image_key!r}")
@@ -0,0 +1,238 @@
# Copyright 2026 Shirui Chen, Cole Harrison, Ying-Chun Lee, Angela Jin Yang,
# Zhongzheng Ren, Lillian J. Ratliff, Jiafei Duan, Dieter Fox, Ranjay Krishna
# and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics.
Paper: https://arxiv.org/abs/2602.19313
Project: https://topreward.github.io/webpage/
Original code: https://github.com/TOPReward/TOPReward
Backbone: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct (default)
TOPReward is a **zero-shot** reward model: it has no fine-tuned weights of
its own. Given a video trajectory and a task instruction, it asks an
off-the-shelf VLM how likely the instruction is, conditioned on the video,
and returns that log-likelihood as the reward signal.
Inference recipe:
1. The processor builds a chat-style prompt, tokenises it, and emits
``input_ids``, ``attention_mask``, vision tensors, and ``labels``.
The processor label-masks everything except the terminal answer token with
``-100``.
2. Forward the full token sequence through the VLM.
3. Read the terminal answer token log-probability from the logits as the
scalar reward.
With the default ``prompt_suffix_template``, the only unmasked token is the
literal ``"True"`` at the end the reward is
``log P("True" | video + prompt + instruction)``.
This LeRobot port is **inference-only and not trainable** :meth:`forward`
is intentionally inherited from :class:`PreTrainedRewardModel` and raises
``NotImplementedError``, making :attr:`PreTrainedRewardModel.is_trainable`
return ``False``.
Because the VLM weights live on the Hugging Face Hub under their canonical
id (``Qwen/Qwen3-VL-8B-Instruct`` etc.) and TOPReward never modifies them,
:meth:`_save_pretrained` and :meth:`from_pretrained` are overridden so a
TOPReward LeRobot "checkpoint" is a single ``config.json`` (the VLM is
re-fetched from the Hub at load time).
"""
from __future__ import annotations
import builtins
import logging
import os
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import TYPE_CHECKING, Any, TypeVar
import numpy as np
import torch
from huggingface_hub import HfApi, hf_hub_download
from huggingface_hub.constants import CONFIG_NAME
from huggingface_hub.errors import HfHubHTTPError
from torch import Tensor
from torch.nn.functional import cross_entropy
from lerobot.configs.rewards import RewardModelConfig
from lerobot.rewards.pretrained import PreTrainedRewardModel
from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig
from lerobot.rewards.topreward.processor_topreward import TOPREWARD_FEATURE_PREFIX, TOPREWARD_INPUT_KEYS
from lerobot.utils.import_utils import _transformers_available, require_package
if TYPE_CHECKING:
from lerobot.configs.train import TrainPipelineConfig
if TYPE_CHECKING or _transformers_available:
from transformers import Qwen3VLForConditionalGeneration
else:
Qwen3VLForConditionalGeneration = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
T = TypeVar("T", bound="TOPRewardModel")
def _torch_dtype(name: str) -> torch.dtype | str:
"""Resolve a torch dtype name; ``"auto"`` is passed through verbatim."""
if name == "auto":
return "auto"
dtype = getattr(torch, name, None)
if isinstance(dtype, torch.dtype):
return dtype
raise ValueError(f"Unknown torch dtype: {name!r}")
class TOPRewardModel(PreTrainedRewardModel):
"""TOPReward zero-shot reward model."""
name = "topreward"
config_class = TOPRewardConfig
def __init__(self, config: TOPRewardConfig) -> None:
require_package("transformers", extra="topreward")
super().__init__(config)
self.config = config
torch_dtype = _torch_dtype(config.torch_dtype)
model_kwargs: dict[str, Any] = {"dtype": torch_dtype, "trust_remote_code": True}
if config.attn_implementation is not None:
model_kwargs["attn_implementation"] = config.attn_implementation
self.model = Qwen3VLForConditionalGeneration.from_pretrained(config.vlm_name, **model_kwargs)
def compute_reward(self, batch: dict[str, Any]) -> Tensor:
"""Return one log-prob reward per sample in the batch."""
inputs: dict[str, Any] = {}
for key in TOPREWARD_INPUT_KEYS:
batch_key = f"{TOPREWARD_FEATURE_PREFIX}{key}"
if batch_key not in batch:
raise KeyError(
f"TOPReward batch missing `{batch_key}`. Make sure the "
"TOPRewardEncoderProcessorStep ran before `compute_reward`."
)
inputs[key] = batch[batch_key]
device = next(self.model.parameters()).device
inputs = {key: value.to(device) if hasattr(value, "to") else value for key, value in inputs.items()}
labels = inputs.pop("labels")
inputs["logits_to_keep"] = 2
self.eval()
with torch.no_grad():
outputs = self.model(**inputs)
logits = outputs.logits
rewards = -cross_entropy(logits[:, -2, :].float(), labels[:, -1], reduction="none")
if np.isfinite(self.config.success_threshold):
rewards = (rewards > self.config.success_threshold).float()
return rewards.to(self.config.device or "cpu")
def _save_pretrained(self, save_directory: Path) -> None:
"""Save ``config.json`` only."""
self.config._save_pretrained(save_directory)
@classmethod
def from_pretrained(
cls: builtins.type[T],
pretrained_name_or_path: str | Path,
*,
config: RewardModelConfig | None = None,
force_download: bool = False,
resume_download: bool | None = None,
proxies: dict | None = None,
token: str | bool | None = None,
cache_dir: str | Path | None = None,
local_files_only: bool = False,
revision: str | None = None,
strict: bool = False, # noqa: ARG003 — accepted for API parity; unused (no safetensors to load)
**kwargs: Any,
) -> T:
"""Load a TOPReward configuration and instantiate the wrapped VLM."""
if config is None:
config = RewardModelConfig.from_pretrained(
pretrained_name_or_path=pretrained_name_or_path,
force_download=force_download,
resume_download=resume_download,
proxies=proxies,
token=token,
cache_dir=cache_dir,
local_files_only=local_files_only,
revision=revision,
**kwargs,
)
if not isinstance(config, TOPRewardConfig):
raise TypeError(
f"Expected a TOPRewardConfig, got {type(config).__name__}. Make sure "
f"`pretrained_name_or_path={pretrained_name_or_path!r}` points at a "
"TOPReward checkpoint."
)
model_id = str(pretrained_name_or_path)
if not os.path.isdir(model_id):
try:
hf_hub_download(
repo_id=model_id,
filename=CONFIG_NAME,
revision=revision,
cache_dir=cache_dir,
force_download=force_download,
proxies=proxies,
resume_download=resume_download,
token=token,
local_files_only=local_files_only,
)
except HfHubHTTPError as e:
raise FileNotFoundError(
f"{CONFIG_NAME} not found on the HuggingFace Hub in {model_id}"
) from e
instance = cls(config, **kwargs)
instance.to(config.device)
instance.eval()
return instance
def push_model_to_hub(self, cfg: TrainPipelineConfig):
"""Push the TOPReward ``config.json`` + model card to the Hub."""
api = HfApi()
repo_id = api.create_repo(
repo_id=self.config.repo_id, private=self.config.private, exist_ok=True
).repo_id
with TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
saved_path = Path(tmp) / repo_id
saved_path.mkdir(parents=True, exist_ok=True)
self.config._save_pretrained(saved_path)
card = self.generate_model_card(
cfg.dataset.repo_id, self.config.type, self.config.license, self.config.tags
)
card.save(str(saved_path / "README.md"))
cfg.save_pretrained(saved_path)
commit_info = api.upload_folder(
repo_id=repo_id,
repo_type="model",
folder_path=saved_path,
commit_message="Upload TOPReward config and readme",
allow_patterns=["*.json", "*.yaml", "*.md"],
ignore_patterns=["*.tmp", "*.log", "*.safetensors"],
)
logger.info(f"Model pushed to {commit_info.repo_url.url}")
@@ -0,0 +1,305 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TOPReward pre/post processing pipeline."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any
import torch
from torch import Tensor
from lerobot.configs import PipelineFeatureType, PolicyFeature
from lerobot.processor import (
AddBatchDimensionProcessorStep,
DeviceProcessorStep,
PolicyAction,
PolicyProcessorPipeline,
ProcessorStep,
ProcessorStepRegistry,
policy_action_to_transition,
)
from lerobot.rewards.topreward.configuration_topreward import (
DEFAULT_PROMPT_PREFIX,
DEFAULT_PROMPT_SUFFIX_TEMPLATE,
TOPRewardConfig,
)
from lerobot.types import EnvTransition, TransitionKey
from lerobot.utils.constants import (
OBS_IMAGES,
OBS_PREFIX,
POLICY_POSTPROCESSOR_DEFAULT_NAME,
POLICY_PREPROCESSOR_DEFAULT_NAME,
)
from lerobot.utils.import_utils import _transformers_available, require_package
if TYPE_CHECKING or _transformers_available:
from transformers import AutoProcessor
else:
AutoProcessor = None
TOPREWARD_FEATURE_PREFIX = f"{OBS_PREFIX}topreward."
_TRUE_ANSWER = "True"
TOPREWARD_VLM_INPUT_KEYS = (
"input_ids",
"attention_mask",
"pixel_values_videos",
"video_grid_thw",
"mm_token_type_ids",
)
TOPREWARD_INPUT_KEYS = TOPREWARD_VLM_INPUT_KEYS + ("labels",)
def _prepare_video_batch(video: Tensor, *, max_frames: int | None) -> Tensor:
"""Return videos as ``(B, T, C, H, W)`` uint8 tensors for Qwen3-VL."""
if video.ndim == 4:
video = video.unsqueeze(1)
elif video.ndim != 5:
raise ValueError(
f"Expected TOPReward frames with shape (B,C,H,W) or (B,T,C,H,W); got {tuple(video.shape)}"
)
if max_frames is not None:
video = video[:, -max_frames:]
if video.shape[-1] in (1, 3):
video = video.permute(0, 1, 4, 2, 3)
elif video.shape[2] not in (1, 3):
raise ValueError(f"Expected channel dim of size 1 or 3, got shape {tuple(video.shape)}")
if video.is_floating_point():
video = video * 255.0
return video.clamp(0, 255).to(torch.uint8).contiguous()
def _expand_tasks(task: Any, *, batch_size: int, default: str | None) -> list[str]:
if task is None:
task = default
if task is None:
raise KeyError("TOPReward expected a task description in complementary data")
if isinstance(task, str):
return [task] * batch_size
if isinstance(task, tuple):
task = list(task)
if not (isinstance(task, list) and all(isinstance(item, str) for item in task)):
raise TypeError(f"TOPReward task must be a string or list of strings, got {type(task)}")
if len(task) == 1 and batch_size > 1:
return task * batch_size
if len(task) != batch_size:
raise ValueError(f"Expected {batch_size} tasks, got {len(task)}")
return task
@dataclass
@ProcessorStepRegistry.register(name="topreward_encoder")
class TOPRewardEncoderProcessorStep(ProcessorStep):
"""Encode raw frames + task into Qwen-VL tensors for the TOPReward model.
Loads a :class:`~transformers.AutoProcessor` matching ``vlm_name`` and
builds the full chat prompt including the instruction suffix. The
resulting ``input_ids``, ``attention_mask``, vision tensors, and
``labels`` are written under the ``observation.topreward.*`` namespace
so the model can score without re-tokenising.
At call time the step reads:
- ``observation[image_key]``: ``(B, T, C, H, W)`` or ``(B, C, H, W)`` frames.
- ``complementary_data[task_key]``: a string or list of strings.
and writes ``observation[f"{TOPREWARD_FEATURE_PREFIX}<name>"]`` for the
Qwen-VL tensors plus ``labels``.
"""
vlm_name: str = "Qwen/Qwen3-VL-8B-Instruct"
image_key: str = OBS_IMAGES + ".top"
task_key: str = "task"
default_task: str | None = None
max_frames: int | None = 16
fps: float = 2.0
prompt_prefix: str = DEFAULT_PROMPT_PREFIX
prompt_suffix_template: str = DEFAULT_PROMPT_SUFFIX_TEMPLATE
add_chat_template: bool = False
max_length: int = 32768
_processor: Any = field(default=None, init=False, repr=False)
def __post_init__(self) -> None:
require_package("transformers", extra="topreward")
self._processor = AutoProcessor.from_pretrained(self.vlm_name, trust_remote_code=True)
def __call__(self, transition: EnvTransition) -> EnvTransition:
observation = transition.get(TransitionKey.OBSERVATION)
complementary = transition.get(TransitionKey.COMPLEMENTARY_DATA) or {}
if self.image_key not in observation:
raise KeyError(f"TOPReward expected image key {self.image_key!r} in observation")
frames = observation[self.image_key]
videos = frames.detach().cpu() if isinstance(frames, Tensor) else torch.as_tensor(frames)
videos = _prepare_video_batch(videos, max_frames=self.max_frames)
batch_size = videos.shape[0]
tasks = _expand_tasks(
complementary.get(self.task_key, self.default_task),
batch_size=batch_size,
default=self.default_task,
)
encoded = self._encode_batch(videos, tasks, batch_size)
new_observation = dict(observation)
for key, value in encoded.items():
new_observation[f"{TOPREWARD_FEATURE_PREFIX}{key}"] = value
new_transition = transition.copy()
new_transition[TransitionKey.OBSERVATION] = new_observation
return new_transition
def _encode_batch(self, videos: Tensor, tasks: list[str], batch_size) -> dict[str, Any]:
"""Tokenise a batch of (frames, task) pairs into Qwen-VL tensors.
The loop only builds per-sample chat strings. Tokenisation, padding,
video preprocessing, and label construction are batched.
"""
texts: list[str] = []
video_metadata = [
{
"total_num_frames": int(videos.shape[1]),
"fps": float(self.fps),
"frames_indices": list(range(int(videos.shape[1]))),
}
for _ in range(batch_size)
]
eos_token = self._processor.tokenizer.eos_token
for i in range(batch_size):
instruction_suffix = self.prompt_suffix_template.format(instruction=tasks[i])
if self.add_chat_template:
suffix_for_template = instruction_suffix.removesuffix(_TRUE_ANSWER).rstrip()
templated_messages = [
{
"role": "user",
"content": [
{"type": "video", "video": videos[i], "fps": self.fps},
{"type": "text", "text": f"{self.prompt_prefix}{suffix_for_template}"},
],
}
]
prompt_chat = self._processor.apply_chat_template(
templated_messages, tokenize=False, add_generation_prompt=True
)
full_text = f"{prompt_chat}{_TRUE_ANSWER}"
else:
user_messages = [
{
"role": "user",
"content": [
{"type": "video", "video": videos[i], "fps": self.fps},
{"type": "text", "text": self.prompt_prefix},
],
}
]
prompt_chat = self._processor.apply_chat_template(
user_messages, tokenize=False, add_generation_prompt=False
)
if eos_token is not None:
prompt_chat = prompt_chat.split(eos_token)[0]
full_text = f"{prompt_chat}{instruction_suffix}"
texts.append(full_text)
result = self._processor(
text=texts,
videos=videos,
video_metadata=video_metadata,
do_sample_frames=False,
padding=True,
padding_side="left",
return_tensors="pt",
)
input_ids = result["input_ids"]
if input_ids.shape[-1] > self.max_length:
raise ValueError(
f"TOPReward input length {input_ids.shape[-1]} exceeds max_length "
f"{self.max_length}; lower `max_frames` or raise `max_length`."
)
labels = torch.full_like(input_ids, -100)
labels[:, -1] = input_ids[:, -1]
result["labels"] = labels
return result
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
return features
def get_config(self) -> dict[str, Any]:
return {
"vlm_name": self.vlm_name,
"image_key": self.image_key,
"task_key": self.task_key,
"default_task": self.default_task,
"max_frames": self.max_frames,
"fps": self.fps,
"prompt_prefix": self.prompt_prefix,
"prompt_suffix_template": self.prompt_suffix_template,
"add_chat_template": self.add_chat_template,
"max_length": self.max_length,
}
def make_topreward_pre_post_processors(
config: TOPRewardConfig,
dataset_stats: dict[str, dict[str, Any]] | None = None,
) -> tuple[
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
PolicyProcessorPipeline[PolicyAction, PolicyAction],
]:
"""Pipeline that pre-encodes frames + task into Qwen-VL tensors.
The preprocessor adds a batch dimension if needed, runs TOPReward's
encoder (which tokenises the full prompt and emits ``labels``), and
moves everything to the configured device. The postprocessor is
the identity since TOPReward outputs a single reward tensor.
"""
preprocessor = PolicyProcessorPipeline[dict[str, Any], dict[str, Any]](
steps=[
AddBatchDimensionProcessorStep(),
TOPRewardEncoderProcessorStep(
vlm_name=config.vlm_name,
image_key=config.image_key,
task_key=config.task_key,
default_task=config.default_task,
max_frames=config.max_frames,
fps=config.fps,
prompt_prefix=config.prompt_prefix,
prompt_suffix_template=config.prompt_suffix_template,
add_chat_template=config.add_chat_template,
max_length=config.max_input_length,
),
DeviceProcessorStep(device=config.device or "cpu"),
],
name=POLICY_PREPROCESSOR_DEFAULT_NAME,
)
postprocessor = PolicyProcessorPipeline(
name=POLICY_POSTPROCESSOR_DEFAULT_NAME,
to_transition=policy_action_to_transition,
)
return preprocessor, postprocessor
@@ -13,6 +13,8 @@
A reward classifier is a lightweight neural network that scores observations or trajectories for task success, providing a learned reward signal or offline evaluation when explicit rewards are unavailable.
{% elif model_name == "sarm" %}
A Success-Aware Reward Model (SARM) predicts a dense reward signal from observations, typically used downstream for reinforcement learning or human-in-the-loop fine-tuning when task success is not directly observable.
{% elif model_name == "topreward" %}
TOPReward is a **zero-shot** reward model that extracts token log-probabilities from an off-the-shelf vision-language model (default Qwen3-VL) as a reward signal. Given a video trajectory and a task instruction, it returns the VLM's log-likelihood of the instruction being true, with no fine-tuning required.
{% else %}
_Reward model type not recognized — please update this template._
{% endif %}
+296
View File
@@ -0,0 +1,296 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for the TOPReward reward model."""
from __future__ import annotations
from types import SimpleNamespace
import pytest
import torch
from lerobot.configs.rewards import RewardModelConfig
from lerobot.rewards.factory import get_reward_model_class, make_reward_model_config
from lerobot.rewards.topreward import TOPRewardConfig
from lerobot.rewards.topreward.processor_topreward import TOPREWARD_FEATURE_PREFIX, TOPREWARD_INPUT_KEYS
from tests.utils import skip_if_package_missing
class _FakeQwenModel(torch.nn.Module):
"""Stand-in for ``Qwen3VLForConditionalGeneration``.
Returns a ``SimpleNamespace`` with ``logits`` of a controlled shape so
the log-prob extraction path in ``compute_reward`` can be exercised
without downloading real VLM weights.
"""
def __init__(self) -> None:
super().__init__()
self._param = torch.nn.Parameter(torch.zeros(1))
self._reward_value: float = -1.5
@classmethod
def from_pretrained(cls, *args, **kwargs): # noqa: ARG003
return cls()
def forward( # noqa: ARG002
self, input_ids, attention_mask=None, labels=None, logits_to_keep=0, **kwargs
):
batch_size, seq_len = input_ids.shape
vocab_size = 1000
logits = torch.zeros(batch_size, seq_len, vocab_size)
# Place a controlled log-prob at the target token position so the
# model returns a predictable reward value.
# The label-masked suffix is the last token.
# After the causal-LM shift (logits[:, :-1], labels[:, 1:]) the scored
# position is logits[:, -2, :] predicting labels[:, -1].
# We set logits so that log_softmax at the target token ≈ _reward_value.
for i in range(batch_size):
target_idx = int(input_ids[i, -1].item())
logits[i, -2, target_idx] = self._reward_value * -10 # high logit -> high log-prob
if logits_to_keep:
logits = logits[:, -logits_to_keep:, :]
return SimpleNamespace(logits=logits)
def _patch_build(monkeypatch) -> None:
"""Stub out HF AutoX so TOPReward construction is cheap and offline."""
from lerobot.rewards.topreward import modeling_topreward
monkeypatch.setattr(modeling_topreward, "Qwen3VLForConditionalGeneration", _FakeQwenModel)
def _make_batch(
input_ids: torch.Tensor,
attention_mask: torch.Tensor | None = None,
labels: torch.Tensor | None = None,
*,
omit: str | None = None,
) -> dict[str, torch.Tensor]:
"""Build a ``compute_reward``-ready batch using TOPReward's namespaced keys."""
batch_size, seq_len = input_ids.shape
if attention_mask is None:
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
batch: dict[str, torch.Tensor] = {}
if labels is not None:
batch[f"{TOPREWARD_FEATURE_PREFIX}labels"] = labels
batch.update(
{
f"{TOPREWARD_FEATURE_PREFIX}input_ids": input_ids,
f"{TOPREWARD_FEATURE_PREFIX}attention_mask": attention_mask,
f"{TOPREWARD_FEATURE_PREFIX}pixel_values_videos": torch.zeros(
batch_size, 1536, dtype=torch.float32
),
f"{TOPREWARD_FEATURE_PREFIX}video_grid_thw": torch.ones(batch_size, 3, dtype=torch.long),
f"{TOPREWARD_FEATURE_PREFIX}mm_token_type_ids": torch.zeros_like(input_ids),
}
)
if omit is not None:
batch.pop(f"{TOPREWARD_FEATURE_PREFIX}{omit}", None)
return batch
def _terminal_labels(input_ids: torch.Tensor) -> torch.Tensor:
labels = torch.full_like(input_ids, -100)
labels[:, -1] = input_ids[:, -1]
return labels
# ---------------------------------------------------------------------------
# Registry + factory
# ---------------------------------------------------------------------------
def test_topreward_config_registered():
assert "topreward" in RewardModelConfig.get_known_choices()
assert RewardModelConfig.get_choice_class("topreward") is TOPRewardConfig
assert isinstance(make_reward_model_config("topreward", device="cpu"), TOPRewardConfig)
def test_topreward_factory_returns_in_tree_class():
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
assert get_reward_model_class("topreward") is TOPRewardModel
# ---------------------------------------------------------------------------
# Config validation
# ---------------------------------------------------------------------------
def test_topreward_config_rejects_zero_max_frames():
with pytest.raises(ValueError, match="max_frames must be >= 1"):
TOPRewardConfig(device="cpu", max_frames=0)
def test_topreward_config_rejects_non_positive_fps():
with pytest.raises(ValueError, match="fps must be > 0"):
TOPRewardConfig(device="cpu", fps=0.0)
def test_topreward_config_rejects_suffix_without_instruction_placeholder():
with pytest.raises(ValueError, match=r"\{instruction\}"):
TOPRewardConfig(device="cpu", prompt_suffix_template="no placeholder here")
# ---------------------------------------------------------------------------
# compute_reward
# ---------------------------------------------------------------------------
@skip_if_package_missing("transformers")
def test_topreward_compute_reward_returns_one_scalar_per_sample(monkeypatch):
"""``compute_reward`` must return a ``(B,)`` float32 tensor with one
log-prob reward per sample, consuming pre-encoded Qwen-VL tensors."""
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu")
model = TOPRewardModel(cfg)
input_ids = torch.randint(0, 100, (2, 10))
attention_mask = torch.ones(2, 10, dtype=torch.long)
labels = _terminal_labels(input_ids)
batch = _make_batch(input_ids, attention_mask, labels)
rewards = model.compute_reward(batch)
assert rewards.shape == (2,)
assert rewards.dtype == torch.float32
@skip_if_package_missing("transformers")
def test_topreward_compute_reward_applies_success_threshold(monkeypatch):
"""When ``success_threshold`` is finite, the model returns binary success."""
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu", success_threshold=0.0)
model = TOPRewardModel(cfg)
input_ids = torch.randint(0, 100, (2, 10))
attention_mask = torch.ones(2, 10, dtype=torch.long)
labels = _terminal_labels(input_ids)
batch = _make_batch(input_ids, attention_mask, labels)
rewards = model.compute_reward(batch)
assert rewards.shape == (2,)
assert set(rewards.tolist()).issubset({0.0, 1.0})
@skip_if_package_missing("transformers")
def test_topreward_compute_reward_errors_when_inputs_missing(monkeypatch):
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu")
model = TOPRewardModel(cfg)
with pytest.raises(KeyError, match=r"observation\.topreward\.input_ids"):
model.compute_reward(_make_batch(torch.randint(0, 100, (1, 10)), omit="input_ids"))
@skip_if_package_missing("transformers")
def test_topreward_compute_reward_errors_when_labels_missing(monkeypatch):
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu")
model = TOPRewardModel(cfg)
input_ids = torch.randint(0, 100, (1, 10))
with pytest.raises(KeyError, match=r"observation\.topreward\.labels"):
model.compute_reward(_make_batch(input_ids, labels=None))
@skip_if_package_missing("transformers")
def test_topreward_compute_reward_requires_all_encoder_keys(monkeypatch):
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu")
model = TOPRewardModel(cfg)
input_ids = torch.randint(0, 100, (1, 10))
labels = _terminal_labels(input_ids)
required_encoder_keys = set(TOPREWARD_INPUT_KEYS) - {"input_ids", "labels"}
for key in required_encoder_keys:
with pytest.raises(KeyError, match=rf"observation\.topreward\.{key}"):
model.compute_reward(_make_batch(input_ids, labels=labels, omit=key))
# ---------------------------------------------------------------------------
# Save / load — config-only checkpoint
# ---------------------------------------------------------------------------
@skip_if_package_missing("transformers")
def test_topreward_save_pretrained_writes_only_config_json(monkeypatch, tmp_path):
from huggingface_hub.constants import CONFIG_NAME, SAFETENSORS_SINGLE_FILE
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(
device="cpu",
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
fps=4.0,
image_key="observation.images.front",
)
model = TOPRewardModel(cfg)
model.save_pretrained(str(tmp_path))
assert (tmp_path / CONFIG_NAME).exists()
assert not (tmp_path / SAFETENSORS_SINGLE_FILE).exists()
@skip_if_package_missing("transformers")
def test_topreward_from_pretrained_local_dir_roundtrips_config(monkeypatch, tmp_path):
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(
device="cpu",
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
fps=4.0,
image_key="observation.images.front",
add_chat_template=True,
success_threshold=-1.5,
)
TOPRewardModel(cfg).save_pretrained(str(tmp_path))
reloaded = TOPRewardModel.from_pretrained(str(tmp_path))
assert isinstance(reloaded.config, TOPRewardConfig)
assert reloaded.config.vlm_name == "Qwen/Qwen3-VL-8B-Instruct"
assert reloaded.config.fps == 4.0
assert reloaded.config.image_key == "observation.images.front"
assert reloaded.config.add_chat_template is True
assert reloaded.config.success_threshold == -1.5
@skip_if_package_missing("transformers")
def test_topreward_is_not_trainable(monkeypatch):
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
_patch_build(monkeypatch)
cfg = TOPRewardConfig(device="cpu")
model = TOPRewardModel(cfg)
assert model.is_trainable is False
with pytest.raises(NotImplementedError, match="not trainable"):
model.forward({"x": torch.zeros(1)})
+80
View File
@@ -0,0 +1,80 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""End-to-end TOPReward smoke test with the real Qwen3-VL model."""
import os
import pytest
import torch
pytest.importorskip("transformers")
from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig # noqa: E402
from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel # noqa: E402
from lerobot.rewards.topreward.processor_topreward import ( # noqa: E402
TOPREWARD_FEATURE_PREFIX,
TOPREWARD_INPUT_KEYS,
make_topreward_pre_post_processors,
)
from tests.utils import require_cuda # noqa: E402
pytestmark = pytest.mark.skipif(
os.environ.get("CI") == "true" or os.environ.get("GITHUB_ACTIONS") == "true",
reason="This test requires downloading and loading Qwen3-VL and is not meant for CI",
)
def _make_dummy_topreward_batch(image_key: str, task_key: str) -> dict[str, object]:
num_frames = 4
image_size = 64
frames = torch.zeros(1, num_frames, 3, image_size, image_size, dtype=torch.uint8)
for frame_idx in range(num_frames):
frames[0, frame_idx, 0].fill_(min(frame_idx * 48, 255))
frames[0, frame_idx, 1].fill_(96)
frames[0, frame_idx, 2].fill_(192)
return {
image_key: frames,
task_key: ["pick up the red cube"],
}
@require_cuda
def test_topreward_full_qwen3vl_preprocessor_to_compute_reward():
cfg = TOPRewardConfig(
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
device="cuda",
max_frames=4,
fps=2.0,
max_input_length=4096,
)
preprocessor, _ = make_topreward_pre_post_processors(cfg)
encoded_batch = preprocessor(_make_dummy_topreward_batch(cfg.image_key, cfg.task_key))
for key in TOPREWARD_INPUT_KEYS:
assert f"{TOPREWARD_FEATURE_PREFIX}{key}" in encoded_batch
model = TOPRewardModel(cfg)
try:
model.to(cfg.device)
model.eval()
rewards = model.compute_reward(encoded_batch)
finally:
del model
torch.cuda.empty_cache()
assert rewards.shape == (1,)
assert rewards.dtype == torch.float32
assert torch.isfinite(rewards).all()
+246
View File
@@ -0,0 +1,246 @@
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for TOPReward's pre-processing helpers and encoder step."""
from __future__ import annotations
import pytest
import torch
from lerobot.configs import FeatureType, PipelineFeatureType, PolicyFeature
from lerobot.rewards.topreward.processor_topreward import (
TOPREWARD_FEATURE_PREFIX,
TOPREWARD_INPUT_KEYS,
_expand_tasks,
_prepare_video_batch,
)
from lerobot.types import TransitionKey
from tests.utils import skip_if_package_missing
# ---------------------------------------------------------------------------
# _prepare_video_batch — raw image/video batch -> (B, T, C, H, W) uint8
# ---------------------------------------------------------------------------
def test_prepare_video_batch_batched_chw_float_is_converted_to_uint8():
video = torch.rand(2, 4, 3, 8, 8)
tensor = _prepare_video_batch(video, max_frames=None)
assert tensor.shape == (2, 4, 3, 8, 8)
assert tensor.dtype == torch.uint8
assert tensor.min() >= 0 and tensor.max() <= 255
def test_prepare_video_batch_batched_thwc_uint8_is_permuted_to_channel_first():
video = torch.randint(0, 256, (2, 3, 8, 8, 3), dtype=torch.uint8)
tensor = _prepare_video_batch(video, max_frames=None)
assert tensor.shape == (2, 3, 3, 8, 8)
assert tensor.dtype == torch.uint8
def test_prepare_video_batch_max_frames_tail_crops_recent_frames():
video = torch.zeros(1, 10, 3, 4, 4)
for t in range(10):
video[:, t] = t / 9.0
tensor = _prepare_video_batch(video, max_frames=3)
assert tensor.shape == (1, 3, 3, 4, 4)
assert int(tensor[0, 0, 0, 0, 0]) == int(7 / 9 * 255)
assert int(tensor[0, -1, 0, 0, 0]) == 255
def test_prepare_video_batch_rejects_3d_input():
with pytest.raises(ValueError, match="Expected TOPReward frames"):
_prepare_video_batch(torch.zeros(4, 8, 8), max_frames=None)
def test_prepare_video_batch_floats_above_one_are_rescaled_and_clipped():
video = torch.full((1, 1, 3, 2, 2), 5.0)
tensor = _prepare_video_batch(video, max_frames=None)
assert tensor.shape == (1, 1, 3, 2, 2)
assert int(tensor.max()) == 255
def test_prepare_video_batch_clips_very_large_floats_to_uint8_max():
video = torch.full((1, 1, 3, 2, 2), 300.0)
tensor = _prepare_video_batch(video, max_frames=None)
assert int(tensor.max()) == 255
# ---------------------------------------------------------------------------
# _expand_tasks — string / list / tuple broadcasting to batch size
# ---------------------------------------------------------------------------
def test_expand_tasks_string_is_broadcast_to_batch_size():
assert _expand_tasks("pick up", batch_size=3, default=None) == ["pick up", "pick up", "pick up"]
def test_expand_tasks_list_of_matching_size_passes_through():
assert _expand_tasks(["a", "b", "c"], batch_size=3, default=None) == ["a", "b", "c"]
def test_expand_tasks_tuple_is_normalised_to_list():
assert _expand_tasks(("a", "b"), batch_size=2, default=None) == ["a", "b"]
def test_expand_tasks_single_element_list_is_broadcast():
assert _expand_tasks(["only one"], batch_size=3, default=None) == ["only one"] * 3
def test_expand_tasks_size_mismatch_raises():
with pytest.raises(ValueError, match="Expected 3 tasks"):
_expand_tasks(["a", "b"], batch_size=3, default=None)
def test_expand_tasks_missing_uses_default():
assert _expand_tasks(None, batch_size=2, default="fallback") == ["fallback", "fallback"]
def test_expand_tasks_missing_without_default_raises():
with pytest.raises(KeyError, match="task description"):
_expand_tasks(None, batch_size=1, default=None)
def test_expand_tasks_wrong_type_raises():
with pytest.raises(TypeError, match="must be a string or list"):
_expand_tasks(42, batch_size=1, default=None)
# ---------------------------------------------------------------------------
# Encoder step — stubbed AutoProcessor
# ---------------------------------------------------------------------------
def _skip_if_topreward_extras_missing(func):
func = skip_if_package_missing("transformers")(func)
return func
class _FakeTokenizer:
eos_token = "<|endoftext|>"
pad_token = "<|endoftext|>"
def __call__(self, *args, **kwargs):
return {"input_ids": torch.zeros(1, 10, dtype=torch.long)}
class _FakeAutoProcessor:
def __init__(self) -> None:
self.tokenizer = _FakeTokenizer()
@classmethod
def from_pretrained(cls, *args, **kwargs): # noqa: ARG003
return cls()
def apply_chat_template(self, messages, **kwargs): # noqa: ARG002
return "fake_prompt_text"
def __call__(self, text=None, images=None, videos=None, **kwargs): # noqa: ARG002
seq_len = 10
batch_size = len(text) if isinstance(text, list) else 1
return {
"input_ids": torch.randint(0, 100, (batch_size, seq_len)),
"attention_mask": torch.ones(batch_size, seq_len, dtype=torch.long),
"pixel_values_videos": torch.zeros(batch_size, 1536, dtype=torch.float32),
"video_grid_thw": torch.ones(batch_size, 3, dtype=torch.long),
"mm_token_type_ids": torch.zeros(batch_size, seq_len, dtype=torch.long),
}
def _build_step(monkeypatch, **overrides):
from lerobot.rewards.topreward import processor_topreward
monkeypatch.setattr(processor_topreward, "AutoProcessor", _FakeAutoProcessor)
return processor_topreward.TOPRewardEncoderProcessorStep(**overrides)
def _make_transition(observation: dict, complementary: dict | None = None) -> dict:
transition: dict = {TransitionKey.OBSERVATION: observation}
if complementary is not None:
transition[TransitionKey.COMPLEMENTARY_DATA] = complementary
return transition
@_skip_if_topreward_extras_missing
def test_encoder_step_emits_input_ids_and_labels(monkeypatch):
"""The processor must emit Qwen-VL tensors including ``input_ids`` and
``labels`` under the ``observation.topreward.*`` namespace."""
step = _build_step(monkeypatch)
frames_batch = torch.zeros(2, 4, 3, 8, 8)
out = step(
_make_transition(
observation={"observation.images.top": frames_batch},
complementary={"task": ["pick", "place"]},
)
)
obs_out = out[TransitionKey.OBSERVATION]
for key in TOPREWARD_INPUT_KEYS:
assert f"{TOPREWARD_FEATURE_PREFIX}{key}" in obs_out
input_ids = obs_out[f"{TOPREWARD_FEATURE_PREFIX}input_ids"]
labels = obs_out[f"{TOPREWARD_FEATURE_PREFIX}labels"]
assert labels.dtype == torch.long
assert labels.shape == (2, 10)
assert labels[:, :-1].eq(-100).all()
assert labels[:, -1].equal(input_ids[:, -1])
@_skip_if_topreward_extras_missing
def test_encoder_step_get_config_roundtrips_user_fields(monkeypatch):
step = _build_step(
monkeypatch,
vlm_name="Qwen/Qwen3-VL-8B-Instruct",
image_key="observation.images.cam_top",
task_key="task",
default_task="do the thing",
max_frames=8,
fps=4.0,
add_chat_template=True,
max_length=2048,
)
cfg = step.get_config()
assert cfg["vlm_name"] == "Qwen/Qwen3-VL-8B-Instruct"
assert cfg["image_key"] == "observation.images.cam_top"
assert cfg["default_task"] == "do the thing"
assert cfg["max_frames"] == 8
assert cfg["fps"] == 4.0
assert cfg["add_chat_template"] is True
assert cfg["max_length"] == 2048
@_skip_if_topreward_extras_missing
def test_encoder_step_transform_features_is_identity(monkeypatch):
step = _build_step(monkeypatch)
features = {
PipelineFeatureType.OBSERVATION: {
"observation.images.top": PolicyFeature(shape=(3, 224, 224), type=FeatureType.VISUAL),
}
}
assert step.transform_features(features) == features
@_skip_if_topreward_extras_missing
def test_encoder_step_rejects_missing_image_key(monkeypatch):
step = _build_step(monkeypatch, image_key="observation.images.top")
with pytest.raises(KeyError, match="image key"):
step(_make_transition(observation={}, complementary={"task": "pick"}))
Generated
+6 -1
View File
@@ -3009,6 +3009,9 @@ test = [
{ name = "pytest-cov" },
{ name = "pytest-timeout" },
]
topreward = [
{ name = "transformers" },
]
training = [
{ name = "accelerate" },
{ name = "av" },
@@ -3167,6 +3170,7 @@ requires-dist = [
{ name = "lerobot", extras = ["scipy-dep"], marker = "extra == 'wallx'" },
{ name = "lerobot", extras = ["smolvla"], marker = "extra == 'all'" },
{ name = "lerobot", extras = ["test"], marker = "extra == 'all'" },
{ name = "lerobot", extras = ["topreward"], marker = "extra == 'all'" },
{ name = "lerobot", extras = ["training"], marker = "extra == 'all'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'eo1'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'groot'" },
@@ -3177,6 +3181,7 @@ requires-dist = [
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'pi'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'sarm'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'smolvla'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'topreward'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'wallx'" },
{ name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'xvla'" },
{ name = "lerobot", extras = ["video-benchmark"], marker = "extra == 'all'" },
@@ -3244,7 +3249,7 @@ requires-dist = [
{ name = "transformers", marker = "extra == 'transformers-dep'", specifier = ">=5.4.0,<5.6.0" },
{ name = "wandb", marker = "extra == 'training'", specifier = ">=0.24.0,<0.25.0" },
]
provides-extras = ["dataset", "training", "hardware", "viz", "core-scripts", "evaluation", "dataset-viz", "av-dep", "pygame-dep", "placo-dep", "transformers-dep", "grpcio-dep", "can-dep", "peft-dep", "scipy-dep", "diffusers-dep", "qwen-vl-utils-dep", "matplotlib-dep", "pyserial-dep", "deepdiff-dep", "pynput-dep", "pyzmq-dep", "motorbridge-dep", "motorbridge-smart-servo-dep", "feetech", "dynamixel", "damiao", "robstride", "openarms", "gamepad", "hopejr", "lekiwi", "unitree-g1", "reachy2", "rebot", "kinematics", "intelrealsense", "phone", "diffusion", "wallx", "pi", "smolvla", "multi-task-dit", "groot", "sarm", "xvla", "eo1", "hilserl", "async", "peft", "dev", "notebook", "test", "video-benchmark", "aloha", "pusht", "libero", "metaworld", "all"]
provides-extras = ["dataset", "training", "hardware", "viz", "core-scripts", "evaluation", "dataset-viz", "av-dep", "pygame-dep", "placo-dep", "transformers-dep", "grpcio-dep", "can-dep", "peft-dep", "scipy-dep", "diffusers-dep", "qwen-vl-utils-dep", "matplotlib-dep", "pyserial-dep", "deepdiff-dep", "pynput-dep", "pyzmq-dep", "motorbridge-dep", "motorbridge-smart-servo-dep", "feetech", "dynamixel", "damiao", "robstride", "openarms", "gamepad", "hopejr", "lekiwi", "unitree-g1", "reachy2", "rebot", "kinematics", "intelrealsense", "phone", "diffusion", "wallx", "pi", "smolvla", "multi-task-dit", "groot", "sarm", "topreward", "xvla", "eo1", "hilserl", "async", "peft", "dev", "notebook", "test", "video-benchmark", "aloha", "pusht", "libero", "metaworld", "all"]
[[package]]
name = "librt"