merge: resolve conflicts from main into refactor/benchmark-dispatch

Keep refactored dispatch pattern (no factory.py edits for new benchmarks).
Incorporate main's "Verifying your integration" section and class naming fix.

Made-with: Cursor
Pepijn
2026-04-03 14:49:36 +02:00
2 changed files with 12 additions and 3 deletions
@@ -186,7 +186,7 @@ Register a config dataclass so users can select your benchmark with `--env.type=
```python
@EnvConfig.register_subclass("<benchmark_name>")
@dataclass
-class MyBenchmarkEnv(EnvConfig):
+class MyBenchmarkEnvConfig(EnvConfig):
task: str = "<default_task>"
fps: int = <fps>
obs_type: str = "pixels_agent_pos"
@@ -229,7 +229,7 @@ Key points:
- `features_map` maps raw observation keys to LeRobot convention keys.
- **No changes to `factory.py` needed** — the factory delegates to `cfg.create_envs()` and `cfg.get_env_processors()` automatically (sketched below).
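For orientation, here is a minimal sketch of a config implementing both hooks. The hook names come from this guide; the signatures, import paths, and the `gym.make_vec()` construction are illustrative assumptions, not LeRobot's actual implementation:

```python
# Sketch only: hook names are from this guide; signatures, import paths,
# and the gym.make_vec() call are assumptions for illustration.
from dataclasses import dataclass

import gymnasium as gym

from lerobot.envs.configs import EnvConfig  # import path assumed


@EnvConfig.register_subclass("mybenchmark")
@dataclass
class MyBenchmarkEnvConfig(EnvConfig):
    task: str = "lift_cube"  # hypothetical default task
    fps: int = 30
    obs_type: str = "pixels_agent_pos"

    def create_envs(self, n_envs: int = 1) -> dict:
        # The factory expects a {suite: {task_id: VectorEnv}} mapping.
        env = gym.make_vec(f"MyBenchmark/{self.task}", num_envs=n_envs)
        return {"mybenchmark": {0: env}}

    def get_env_processors(self):
        # Return processor steps here if the benchmark needs extra
        # observation transforms (see step 3); None assumed to mean "none".
        return None
```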
-### 3. Env processor (optional`src/lerobot/processor/env_processor.py`)
+### 3. Env processor (optional) (`src/lerobot/processor/env_processor.py`)
Only needed if your benchmark requires observation transforms beyond what `preprocess_observation()` handles (e.g. image flipping, coordinate conversion). Define the processor step here and return it from `get_env_processors()` in your config (see step 2):
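As a shape reference only, here is a hypothetical step that flips camera images vertically. It assumes a processor step is a plain callable over the observation dict, which may not match LeRobot's actual processor interface:

```python
# Hypothetical processor step: assumes a step is a callable that maps an
# observation dict to an observation dict (real interface may differ).
import torch


class FlipImagesStep:
    """Vertically flips camera frames for benchmarks that render upside down."""

    def __call__(self, observation: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        for key, value in observation.items():
            if key.startswith("observation.images"):
                # Flip along the height axis: (..., C, H, W) -> flip dim -2.
                observation[key] = torch.flip(value, dims=[-2])
        return observation
```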
@@ -293,6 +293,15 @@ Add your benchmark to the "Benchmarks" section:
title: "Benchmarks"
```
## Verifying your integration
After completing the steps above, confirm that everything works:
1. **Install** — `pip install -e ".[mybenchmark]"` and verify the dependency group installs cleanly.
2. **Smoke test env creation** — call `make_env()` with your config in Python, check that the returned dict has the expected `{suite: {task_id: VectorEnv}}` shape, and that `reset()` returns observations with the right keys (see the sketch after this list).
3. **Run a full eval** — `lerobot-eval --env.type=<name> --env.task=<task> --eval.n_episodes=1 --eval.batch_size=1 --policy.path=<any_compatible_policy>` to exercise the full pipeline end-to-end.
4. **Check success detection** — verify that `info["is_success"]` flips to `True` when the task is actually completed. This is what the eval loop uses to compute success rates.
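A possible smoke test for step 2; `make_env()`'s exact signature and the import path are assumptions:

```python
# Smoke-test sketch: make_env() signature and import path are assumptions.
from lerobot.envs.factory import make_env  # import path assumed

cfg = MyBenchmarkEnvConfig(task="lift_cube")  # the config registered in step 1
envs = make_env(cfg, n_envs=1)

# Expected shape: {suite: {task_id: VectorEnv}}.
for suite, tasks in envs.items():
    for task_id, vec_env in tasks.items():
        obs, info = vec_env.reset(seed=0)
        print(suite, task_id, sorted(obs))  # inspect observation keys
```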
## Writing a benchmark doc page
Each benchmark `.mdx` page should include: