From 3da966b463f7085c1738acbd648598d8258d901a Mon Sep 17 00:00:00 2001
From: Steven Palma
Date: Thu, 7 May 2026 21:27:42 +0200
Subject: [PATCH] docs(policy): contributing a policy guide

---
 docs/source/contributing_a_policy.mdx | 159 ++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 docs/source/contributing_a_policy.mdx

diff --git a/docs/source/contributing_a_policy.mdx b/docs/source/contributing_a_policy.mdx
new file mode 100644
index 000000000..d48cf7f7c
--- /dev/null
+++ b/docs/source/contributing_a_policy.mdx
@@ -0,0 +1,159 @@
# Contributing a Policy

This is a practical guide for landing a new policy directly in the LeRobot codebase. It's the in-tree counterpart to [Bring Your Own Policies](./bring_your_own_policies), which packages a policy as an out-of-tree `lerobot_policy_*` plugin. The plugin route is faster (no PR required) and is usually the right starting point — land in `main` once the policy has stabilized and there's clear value in shipping it with the library.

This guide assumes you've already read the general [contribution guide](./contributing) and the [PR template](https://github.com/huggingface/lerobot/blob/main/.github/PULL_REQUEST_TEMPLATE.md) — that's where you'll find the testing/quality expectations every PR has to meet (`pre-commit run -a`, `pytest`, the community-review rule, etc.). What's below is the policy-specific layer on top of that.

A note on conventions: robot learning is an actively evolving field, and "what a policy looks like" can shift with each new architecture. The conventions described here exist because they let `lerobot-train` and `lerobot-eval` work uniformly across very different models. When a new policy genuinely doesn't fit them, raise it in your PR — the conventions are not sacred.

---

## In-tree layout

```
src/lerobot/policies/my_policy/
├── __init__.py                 # re-exports config + processor factory (NOT modeling)
├── configuration_my_policy.py  # MyPolicyConfig + @register_subclass
├── modeling_my_policy.py       # MyPolicy(PreTrainedPolicy)
├── processor_my_policy.py      # make_my_policy_pre_post_processors
└── README.md                   # symlink → ../../../../docs/source/policy_my_policy_README.md
```

Two notes:

- The `README.md` next to the source is a **symlink** into `docs/source/policy_my_policy_README.md` — the actual file lives under `docs/`. Existing policies (act, smolvla, diffusion, …) all do this; copy one of those symlinks. The policy README is conventionally minimal: paper link + BibTeX citation.
- The user-facing tutorial — what to install, how to train, hyperparameters, benchmark numbers — lives separately at `docs/source/my_policy.mdx` and is registered in `_toctree.yml` under "Policies".

The file names are load-bearing: the factory does lazy imports by name, and the processor is discovered by the `make_<policy_name>_pre_post_processors` naming convention.
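A minimal sketch of that `__init__.py`, matching the comment in the tree above; the exact re-export list is illustrative, so mirror a neighboring policy's `__init__.py` when in doubt:

```python
# src/lerobot/policies/my_policy/__init__.py (illustrative sketch)
# Re-export the config and the processor factory, but NOT the modeling class:
# MyPolicy is imported lazily by the factory so `import lerobot` stays fast.
from .configuration_my_policy import MyPolicyConfig
from .processor_my_policy import make_my_policy_pre_post_processors

__all__ = ["MyPolicyConfig", "make_my_policy_pre_post_processors"]
```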
+ +--- + +## Policy class + +Inherit from [`PreTrainedPolicy`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/pretrained.py) and set two class attributes — both are checked by `__init_subclass__`: + +```python +class MyPolicy(PreTrainedPolicy): + config_class = MyPolicyConfig + name = "my_policy" # must match @register_subclass and --policy.type +``` + +The methods called by the train/eval loops: + +| Method | Used by | What it does | +| ----------------------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `reset() -> None` | `lerobot-eval` | Clear per-episode state at the start of each episode. | +| `select_action(batch, **kwargs) -> Tensor` | `lerobot-eval` | Return the next action `(B, action_dim)`. Called every step. | +| `predict_action_chunk(batch, **kwargs) -> Tensor` | the policy itself | Return an action chunk `(B, chunk_size, action_dim)`. Currently abstract on the base class — raise `NotImplementedError` if your policy doesn't chunk. | +| `forward(batch, reduction="mean") -> tuple[Tensor, dict \| None]` | `lerobot-train` | Return `(loss, output_dict)`. Must accept `reduction="none"` for per-sample weighting. | +| `get_optim_params() -> dict` | the optimizer | Return parameter groups; `{"params": self.parameters()}` is fine if you don't need per-group settings. | +| `update() -> None` _(optional)_ | `lerobot-train` | Called after each optimizer step _if defined_. Use for EMA, target nets, replay buffers (TDMPC uses this). | + +Batches are flat dictionaries keyed by the constants in [`lerobot.utils.constants`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/utils/constants.py): `OBS_STATE` (`observation.state.`), `OBS_IMAGES` (`observation.images.`), `OBS_LANGUAGE`, `ACTION`, etc. Reuse the constants — don't invent new prefixes. + +--- + +## Config class + +Inherit from [`PreTrainedConfig`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/configs/policies.py), decorate with `@PreTrainedConfig.register_subclass("my_policy")` (the string must match `MyPolicy.name`), and provide: + +- `validate_features()` — raises `ValueError` if the configured input/output features can't satisfy your policy. Call it explicitly from your policy's `__init__`. +- `get_optimizer_preset()` — return a config from `lerobot.optim` (default to AdamW unless you genuinely need otherwise). +- `get_scheduler_preset()` — return a `LRSchedulerConfig` or `None`. +- `observation_delta_indices` / `action_delta_indices` / `reward_delta_indices` — relative timestep offsets the dataset loader returns per sample (`None` for single-frame, `list(range(self.horizon))` for action-chunking, etc.). + +--- + +## Wiring + +Three places need to know about your policy. All by name. + +1. **`policies/__init__.py`** — re-export `MyPolicyConfig` and add it to `__all__`. **Don't** re-export the modeling class; it loads lazily through the factory (so `import lerobot` stays fast). +2. **`factory.py:get_policy_class`** — add a branch returning `MyPolicy` from a lazy import. +3. **`factory.py:make_policy_config`** and **`factory.py:make_pre_post_processors`** — same idea, two more branches. + +Mirror an existing policy that's structurally similar to yours; the diff is small. + +--- + +## Heavy / optional dependencies + +Most policies need a heavy backbone (transformers, diffusers, a specific VLM SDK). 
---

## Heavy / optional dependencies

Most policies need a heavy backbone (transformers, diffusers, a specific VLM SDK). The convention is **two-step gating**: a `TYPE_CHECKING`-guarded import at module top, and a `require_package` runtime check in the constructor. [`modeling_diffusion.py`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/diffusion/modeling_diffusion.py) is the canonical reference:

```python
from typing import TYPE_CHECKING

from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.utils.import_utils import _diffusers_available, require_package

if TYPE_CHECKING or _diffusers_available:
    from diffusers.schedulers.scheduling_ddim import DDIMScheduler
else:
    DDIMScheduler = None  # keeps the symbol bindable at import time


class DiffusionPolicy(PreTrainedPolicy):
    def __init__(self, config):
        require_package("diffusers", extra="diffusion")
        super().__init__(config)
        ...
```

This way:

- `import lerobot.policies` keeps working without the extra installed (the symbol is just bound to `None`).
- Type checkers see the real symbol.
- Instantiating the policy without the extra raises a clear `ImportError` pointing at `pip install 'lerobot[diffusion]'`.

Add a matching extra to [`pyproject.toml`](https://github.com/huggingface/lerobot/blob/main/pyproject.toml) `[project.optional-dependencies]` and include it in the `all` extra so `pip install 'lerobot[all]'` keeps installing everything.

---

## Benchmarks and a published checkpoint

A new policy is much easier to review — and far more useful — when it ships with a working checkpoint and at least one number you can reproduce.

**Pick at least one in-tree benchmark.** LeRobot ships sim benchmarks with per-benchmark Docker images (LIBERO, LIBERO-plus, Meta-World, RoboTwin 2.0, RoboCasa365, RoboCerebra, RoboMME, VLABench and more). Choose the one that matches your policy's modality — VLAs usually go to LIBERO or VLABench; image-only BC to LIBERO or Meta-World. The full list lives under [Benchmarks](./libero) in the docs sidebar.

**Push the checkpoint** to the Hub under `lerobot/<policy_name>_<benchmark>` (or your namespace if you don't have write access; a maintainer can mirror it). Use `PreTrainedPolicy.push_to_hub` so the repo gets `config.json`, `model.safetensors`, and a model card (see the sketch at the end of this section).

**Report results in your policy's MDX**, with the exact `lerobot-eval` command and hardware so anyone can re-run:

```markdown
## Results

Evaluated on LIBERO with `lerobot/my_policy_libero`:

| Suite          | Success rate | n_episodes |
| -------------- | -----------: | ---------: |
| libero_spatial |        87.5% |         50 |
| libero_object  |        93.0% |         50 |
| libero_goal    |        81.5% |         50 |
| libero_10      |        62.0% |         50 |
| **average**    |    **81.0%** |        200 |

Reproduce: `lerobot-eval --policy.path=lerobot/my_policy_libero --env.type=libero --env.task=libero_spatial --eval.n_episodes=50` (1× A100 40 GB).
```

Use `n_episodes ≥ 50` per suite for stable success-rate estimates.

If your policy is real-robot-only and no sim benchmark applies, swap the sim eval for: a public training dataset on the Hub, the `lerobot-train` command, the checkpoint, and a real-robot success rate over ≥ 10 episodes via `lerobot-record --policy.path=...`.
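To make the checkpoint push concrete, here's a minimal sketch; the local checkpoint path and repo id are illustrative (adjust them to your run), and `from_pretrained`/`push_to_hub` are inherited from `PreTrainedPolicy`:

```python
from lerobot.policies.my_policy.modeling_my_policy import MyPolicy

# Load the final training checkpoint from disk. The path below is a typical
# lerobot-train output layout, shown for illustration only.
policy = MyPolicy.from_pretrained("outputs/train/my_policy/checkpoints/last/pretrained_model")

# Push to the Hub: this writes config.json, model.safetensors, and a model
# card into the target repo.
policy.push_to_hub("lerobot/my_policy_libero")
```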
---

## PR checklist

The general expectations are in [`CONTRIBUTING.md`](https://github.com/huggingface/lerobot/blob/main/CONTRIBUTING.md) and the [PR template](https://github.com/huggingface/lerobot/blob/main/.github/PULL_REQUEST_TEMPLATE.md). On top of those, reviewers will look for:

- [ ] `MyPolicy` and `MyPolicyConfig` cover the surface above; `__init_subclass__` accepts the class.
- [ ] `factory.py` and `policies/__init__.py` are wired (lazy imports for modeling).
- [ ] `make_my_policy_pre_post_processors` follows the naming convention.
- [ ] Optional deps live behind a `[project.optional-dependencies]` extra and the `TYPE_CHECKING` + `require_package` guard.
- [ ] `tests/policies/` updated: backward-compat artifact committed and policy-specific tests added.
- [ ] `src/lerobot/policies/my_policy/README.md` symlinked into `docs/source/policy_my_policy_README.md`; user-facing `docs/source/my_policy.mdx` written and added to `_toctree.yml`.
- [ ] At least one reproducible benchmark eval in the policy MDX with a published checkpoint (sim benchmark, or real-robot dataset + checkpoint).

The fastest way to get a clean PR is to copy the directory of the existing policy closest to yours, rename it, and replace the contents method by method. Don't wait until everything is polished — open a draft PR early and iterate with us; reviewers would much rather give feedback on a half-finished branch than a fully-polished one.

---

## Welcome aboard

Thanks for taking the time to bring a new policy into LeRobot. Every architecture that lands in `main` makes the library a little more useful for the next person — and a little more representative of where robot learning is going. We're genuinely happy to have you contributing, and looking forward to seeing what you ship. 🤗