merge: resolve conflicts from main into refactor/benchmark-dispatch

Keep refactored dispatch pattern (no factory.py edits for new benchmarks).
Incorporate main's "Verifying your integration" section and class naming fix.

Made-with: Cursor
Pepijn
2026-04-03 14:49:36 +02:00
2 changed files with 12 additions and 3 deletions
@@ -186,7 +186,7 @@ Register a config dataclass so users can select your benchmark with `--env.type=
```python
@EnvConfig.register_subclass("<benchmark_name>")
@dataclass
-class MyBenchmarkEnv(EnvConfig):
+class MyBenchmarkEnvConfig(EnvConfig):
task: str = "<default_task>"
fps: int = <fps>
obs_type: str = "pixels_agent_pos"
@@ -229,7 +229,7 @@ Key points:
- `features_map` maps raw observation keys to LeRobot convention keys.
- **No changes to `factory.py` needed** — the factory delegates to `cfg.create_envs()` and `cfg.get_env_processors()` automatically (sketched below).
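For orientation, here is a minimal sketch of a config implementing both hooks. The hook names come from this guide; the signatures, import paths, and the `gym.make_vec()` construction are illustrative assumptions, not LeRobot's actual implementation:

```python
# Sketch only: hook names are from this guide; signatures, import paths,
# and the gym.make_vec() call are assumptions for illustration.
from dataclasses import dataclass

import gymnasium as gym

from lerobot.envs.configs import EnvConfig  # import path assumed


@EnvConfig.register_subclass("mybenchmark")
@dataclass
class MyBenchmarkEnvConfig(EnvConfig):
    task: str = "lift_cube"  # hypothetical default task
    fps: int = 30
    obs_type: str = "pixels_agent_pos"

    def create_envs(self, n_envs: int = 1) -> dict:
        # The factory expects a {suite: {task_id: VectorEnv}} mapping.
        env = gym.make_vec(f"MyBenchmark/{self.task}", num_envs=n_envs)
        return {"mybenchmark": {0: env}}

    def get_env_processors(self):
        # Return processor steps here if the benchmark needs extra
        # observation transforms (see step 3); None assumed to mean "none".
        return None
```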
-### 3. Env processor (optional`src/lerobot/processor/env_processor.py`)
+### 3. Env processor (optional) (`src/lerobot/processor/env_processor.py`)
Only needed if your benchmark requires observation transforms beyond what `preprocess_observation()` handles (e.g. image flipping, coordinate conversion). Define the processor step here and return it from `get_env_processors()` in your config (see step 2):
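As a shape reference only, here is a hypothetical step that flips camera images vertically. It assumes a processor step is a plain callable over the observation dict, which may not match LeRobot's actual processor interface:

```python
# Hypothetical processor step: assumes a step is a callable that maps an
# observation dict to an observation dict (real interface may differ).
import torch


class FlipImagesStep:
    """Vertically flips camera frames for benchmarks that render upside down."""

    def __call__(self, observation: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        for key, value in observation.items():
            if key.startswith("observation.images"):
                # Flip along the height axis: (..., C, H, W) -> flip dim -2.
                observation[key] = torch.flip(value, dims=[-2])
        return observation
```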
@@ -293,6 +293,15 @@ Add your benchmark to the "Benchmarks" section:
title: "Benchmarks"
```
## Verifying your integration
After completing the steps above, confirm that everything works:
1. **Install** — `pip install -e ".[mybenchmark]"` and verify the dependency group installs cleanly.
2. **Smoke test env creation** — call `make_env()` with your config in Python, check that the returned dict has the expected `{suite: {task_id: VectorEnv}}` shape, and that `reset()` returns observations with the right keys (see the sketch after this list).
3. **Run a full eval** — `lerobot-eval --env.type=<name> --env.task=<task> --eval.n_episodes=1 --eval.batch_size=1 --policy.path=<any_compatible_policy>` to exercise the full pipeline end-to-end.
4. **Check success detection** — verify that `info["is_success"]` flips to `True` when the task is actually completed. This is what the eval loop uses to compute success rates.
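A possible smoke test for step 2; `make_env()`'s exact signature and the import path are assumptions:

```python
# Smoke-test sketch: make_env() signature and import path are assumptions.
from lerobot.envs.factory import make_env  # import path assumed

cfg = MyBenchmarkEnvConfig(task="lift_cube")  # the config registered in step 1
envs = make_env(cfg, n_envs=1)

# Expected shape: {suite: {task_id: VectorEnv}}.
for suite, tasks in envs.items():
    for task_id, vec_env in tasks.items():
        obs, info = vec_env.reset(seed=0)
        print(suite, task_id, sorted(obs))  # inspect observation keys
```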
## Writing a benchmark doc page
Each benchmark `.mdx` page should include: