Add basic PEFT support to train script + record module (#1411)

* Add basic support for PEFT adapter methods

This changes adds support for training policies with much less parameters
by applying adapter methods such as LoRA on specific parts of the policies
and therefore possibly higher learning rates / batch sizes.

To make this as accessible as possible I thought it useful to provide
defaults for `target_modules` and `modules_to_save`. Currently only SmolVLA
has such defaults but when we agree that this change is useful I will set
out to generate more such defaults. While the user can override these
settings, they are expected to only change the peft_method, rank and init_type
parameters.

* Implement loading of PEFT adapters

Loading a PEFT adapter is currently done by initializing a policy with default config
and then applying the adapter on the resulting model. This has the obvious drawback
that any configurations done during training are not applied in the adapted model.

Currently the `use_peft` attribute of `PreTrainedConfig` is only set during loading
to signal the following code that it has to deal with a PEFT adapter. However
we could imagine a scenario where this is already set at training time and stored
alongside the adapter.

* Store policy config alongside PEFT checkpoint

Before this change the PEFT-wrapped policy did not save the policy's config
alongside the adapter config / weights which prevented us from changing the
policy config. Now the policy config is saved both in full training and PEFT
training.

This change makes loading the PEFT policy adapter much easier as well.

* Add default config for ACT

* Support targets like `all-linear`

* Formatting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix failing tests

* Remove PEFT compatibility changes in config

We'll wait for the PEFT release that fixes this for good.

* Remove `use_peft` parameter from training script

Instead we make the PEFT config optional which has the same effect.

* Log adapter config to WandB

* Better documentation for CLI arguments

* Don't unload & merge the PEFT model

This can make things hard when using quantized layers (user expects quantized base layers with
unquantized adapters for example, merging defaults to upcast the layers leading to higher
memory).

* Correct way of identifying when to save config

* Add CLI end-to-end tests

Currently there don't seem to be any way to test the CLI commands.
Since this change mostly happens in those I thought it best to add
a way to test these commands end-to-end.

More integrated commands like `lerobot-record` need patching but
standalone commands like training seem to work fine.

* Update default targets

Removed ACT since it doesn't make sense to fine-tune ACT without having it pretrained beforehand.
SmolVLA and Pi0/0.5 are much more senseful targets.

* Clean up loading code

- Centralized instantiation of the PEFT wrapper in `make_policy` for inference
  (e.g. in `lerobot-record`)
- Training a PEFT policy also sets `cfg.use_peft` so that all inference code loading
  the policy can rely on that attribute to identify if PEFT loading is needed
- Modified RTC example to also include PEFT policies. Mostly because this is an example
  I'm currently exploring.

* Make sure push_to_hub works

Since PEFT only wraps `push_to_hub` and not `push_model_to_hub`, the reference
to `self` in `policy.push_model_to_hub` is the unwrapped policy which, of course,
doesn't know anything about PEFT.

To make the upload process aware of PEFT, we pass the unwrapped policy down to
`push_model_to_hub` as a kwarg. This is not ideal but I think it is the best way
for now.

* formatting

* Warn when encountering from-scratch-training

* Revamp pretrained model loading

There were quite a few factors that convinced me that the status quo
is able to load pretrained models from the PEFT adapter config but
in fact that didn't work.

This commit fixes the following things:
- policies wrapped in PEFT will now have a `name_or_path` attribute
  containing the name or path of the pretrained model we're fine-tuning
- we further assume that SmolVLA without `pretrained_path` and
  `load_vlm_weights==False` must be an user-side error
- we assume that using PEFT on from-scratch-policies must be
  an user-side-error

* Make it possible to unset policy features

This is necessary to train pre-trained policies on new datasets so that the
features are inferred from the new dataset and not from the pretrained
policy.

* Use correct loading for PEFT in RTC example

* Make it possible to use PeftModels in eval

* Add test checking that PEFT actually reduces params

* Adapt state/action projections instead of full-finetuning

There doesn't seem to be a benefit to fully fine-tune these layers
over just adapting them, so we do that instead.

* Disallow PEFT training on non-pretrained policies

At first I thought it would make sense to have this feature
in case you want to fine-tune a pre-trained section but in the
end it makes more trouble than it's worth.

It's still possible to allow this in the future when a concrete
need arises.

* Add basic documentation

* Formatting

* Add peft as extra dependency, mark tests

Fast tests currently fail because of the missing dependency.

* Fix pre-commit issues

* Add walx <> peft conflict for uv

* Exclude peft from pi install for now

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
This commit is contained in:
githubnemo
2026-01-05 08:51:26 +01:00
committed by GitHub
parent 75ab388ecd
commit e670ac5daf
16 changed files with 548 additions and 10 deletions
+28
View File
@@ -67,3 +67,31 @@ class EvalConfig:
f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={self.batch_size}`), "
f"or lower the batch size (e.g. `eval.batch_size={self.n_episodes}`)."
)
@dataclass
class PeftConfig:
# PEFT offers many fine-tuning methods, layer adapters being the most common and currently also the most
# effective methods so we'll focus on those in this high-level config interface.
# Either a string (module name suffix or 'all-linear'), a list of module name suffixes or a regular expression
# describing module names to target with the configured PEFT method. Some policies have a default value for this
# so that you don't *have* to choose which layers to adapt but it might still be worthwhile depending on your case.
target_modules: list[str] | str | None = None
# Names/suffixes of modules to fully fine-tune and store alongside adapter weights. Useful for layers that are
# not part of a pre-trained model (e.g., action state projections). Depending on the policy this defaults to layers
# that are newly created in pre-trained policies. If you're fine-tuning an already trained policy you might want
# to set this to `[]`. Corresponds to PEFT's `modules_to_save`.
full_training_modules: list[str] | None = None
# The PEFT (adapter) method to apply to the policy. Needs to be a valid PEFT type.
method_type: str = "LORA"
# Adapter initialization method. Look at the specific PEFT adapter documentation for defaults.
init_type: str | None = None
# We expect that all PEFT adapters are in some way doing rank-decomposition therefore this parameter specifies
# the rank used for the adapter. In general a higher rank means more trainable parameters and closer to full
# fine-tuning.
r: int = 16
+14 -2
View File
@@ -55,14 +55,18 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
n_obs_steps: int = 1
input_features: dict[str, PolicyFeature] = field(default_factory=dict)
output_features: dict[str, PolicyFeature] = field(default_factory=dict)
# `input_features` can be set to None/null in order to infer those values from the dataset.
input_features: dict[str, PolicyFeature] | None = field(default_factory=dict)
output_features: dict[str, PolicyFeature] | None = field(default_factory=dict)
device: str | None = None # e.g. "cuda", "cuda:0", "cpu", or "mps"
# `use_amp` determines whether to use Automatic Mixed Precision (AMP) for training and evaluation. With AMP,
# automatic gradient scaling is used.
use_amp: bool = False
# Whether the policy employed PEFT for training.
use_peft: bool = False
push_to_hub: bool = True # type: ignore[assignment] # TODO: use a different name to avoid override
repo_id: str | None = None
@@ -125,6 +129,8 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def robot_state_feature(self) -> PolicyFeature | None:
if not self.input_features:
return None
for ft_name, ft in self.input_features.items():
if ft.type is FeatureType.STATE and ft_name == OBS_STATE:
return ft
@@ -132,6 +138,8 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def env_state_feature(self) -> PolicyFeature | None:
if not self.input_features:
return None
for _, ft in self.input_features.items():
if ft.type is FeatureType.ENV:
return ft
@@ -139,10 +147,14 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def image_features(self) -> dict[str, PolicyFeature]:
if not self.input_features:
return {}
return {key: ft for key, ft in self.input_features.items() if ft.type is FeatureType.VISUAL}
@property
def action_feature(self) -> PolicyFeature | None:
if not self.output_features:
return None
for ft_name, ft in self.output_features.items():
if ft.type is FeatureType.ACTION and ft_name == ACTION:
return ft
+2 -1
View File
@@ -24,7 +24,7 @@ from huggingface_hub.errors import HfHubHTTPError
from lerobot import envs
from lerobot.configs import parser
from lerobot.configs.default import DatasetConfig, EvalConfig, WandBConfig
from lerobot.configs.default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
from lerobot.configs.policies import PreTrainedConfig
from lerobot.optim import OptimizerConfig
from lerobot.optim.schedulers import LRSchedulerConfig
@@ -65,6 +65,7 @@ class TrainPipelineConfig(HubMixin):
scheduler: LRSchedulerConfig | None = None
eval: EvalConfig = field(default_factory=EvalConfig)
wandb: WandBConfig = field(default_factory=WandBConfig)
peft: PeftConfig | None = None
# RA-BC (Reward-Aligned Behavior Cloning) parameters
use_rabc: bool = False # Enable reward-weighted training
+30 -1
View File
@@ -471,11 +471,40 @@ def make_policy(
if ds_meta is not None:
kwargs["dataset_meta"] = ds_meta
if cfg.pretrained_path:
if not cfg.pretrained_path and cfg.use_peft:
raise ValueError(
"Instantiating a policy with `use_peft=True` without a checkpoint is not supported since that requires "
"the PEFT config parameters to be set. For training with PEFT, see `lerobot_train.py` on how to do that."
)
if cfg.pretrained_path and not cfg.use_peft:
# Load a pretrained policy and override the config if needed (for example, if there are inference-time
# hyperparameters that we want to vary).
kwargs["pretrained_name_or_path"] = cfg.pretrained_path
policy = policy_cls.from_pretrained(**kwargs)
elif cfg.pretrained_path and cfg.use_peft:
# Load a pretrained PEFT model on top of the policy. The pretrained path points to the folder/repo
# of the adapter and the adapter's config contains the path to the base policy. So we need the
# adapter config first, then load the correct policy and then apply PEFT.
from peft import PeftConfig, PeftModel
logging.info("Loading policy's PEFT adapter.")
peft_pretrained_path = cfg.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
kwargs["pretrained_name_or_path"] = peft_config.base_model_name_or_path
if not kwargs["pretrained_name_or_path"]:
# This means that there's a bug or we trained a policy from scratch using PEFT.
# It is more likely that this is a bug so we'll raise an error.
raise ValueError(
"No pretrained model name found in adapter config. Can't instantiate the pre-trained policy on which "
"the adapter was trained."
)
policy = policy_cls.from_pretrained(**kwargs)
policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
else:
# Make a fresh policy.
policy = policy_cls(**kwargs)
+9 -1
View File
@@ -206,6 +206,7 @@ class PreTrainedPolicy(nn.Module, HubMixin, abc.ABC):
def push_model_to_hub(
self,
cfg: TrainPipelineConfig,
peft_model=None,
):
api = HfApi()
repo_id = api.create_repo(
@@ -216,7 +217,14 @@ class PreTrainedPolicy(nn.Module, HubMixin, abc.ABC):
with TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
saved_path = Path(tmp) / repo_id
self.save_pretrained(saved_path) # Calls _save_pretrained and stores model tensors
if peft_model is not None:
# Since PEFT just forwards calls to `push_model_to_hub`, `self` is not the PeftModel wrapper
# but the actual policy which is why we need the PEFT model passed to us to save the adapter.
# That also means that we need to store the policy config ourselves since PEFT can't.
peft_model.save_pretrained(saved_path)
self.config.save_pretrained(saved_path)
else:
self.save_pretrained(saved_path) # Calls _save_pretrained and stores model tensors
card = self.generate_model_card(
cfg.dataset.repo_id, self.config.type, self.config.license, self.config.tags
+26 -1
View File
@@ -112,7 +112,32 @@ class WandBLogger:
artifact_name = f"{self._group}-{step_id}"
artifact_name = get_safe_wandb_artifact_name(artifact_name)
artifact = self._wandb.Artifact(artifact_name, type="model")
artifact.add_file(checkpoint_dir / PRETRAINED_MODEL_DIR / SAFETENSORS_SINGLE_FILE)
pretrained_model_dir = checkpoint_dir / PRETRAINED_MODEL_DIR
# Check if this is a PEFT model (has adapter files instead of model.safetensors)
adapter_model_file = pretrained_model_dir / "adapter_model.safetensors"
standard_model_file = pretrained_model_dir / SAFETENSORS_SINGLE_FILE
if adapter_model_file.exists():
# PEFT model: add adapter files and configs
artifact.add_file(adapter_model_file)
adapter_config_file = pretrained_model_dir / "adapter_config.json"
if adapter_config_file.exists():
artifact.add_file(adapter_config_file)
# Also add the policy config which is needed for loading
config_file = pretrained_model_dir / "config.json"
if config_file.exists():
artifact.add_file(config_file)
elif standard_model_file.exists():
# Standard model: add the single safetensors file
artifact.add_file(standard_model_file)
else:
logging.warning(
f"No {SAFETENSORS_SINGLE_FILE} or adapter_model.safetensors found in {pretrained_model_dir}. "
"Skipping model artifact upload to WandB."
)
return
self._wandb.log_artifact(artifact)
def log_dict(
+8 -1
View File
@@ -278,9 +278,16 @@ def eval_policy(
raise ValueError("If max_episodes_rendered > 0, videos_dir must be provided.")
if not isinstance(policy, PreTrainedPolicy):
raise ValueError(
exc = ValueError(
f"Policy of type 'PreTrainedPolicy' is expected, but type '{type(policy)}' was provided."
)
try:
from peft import PeftModel
if not isinstance(policy, PeftModel):
raise exc
except ImportError:
raise exc from None
start = time.time()
policy.eval()
+2
View File
@@ -193,8 +193,10 @@ class RecordConfig:
def __post_init__(self):
# HACK: We parse again the cli args here to get the pretrained path if there was one.
policy_path = parser.get_path_arg("policy")
if policy_path:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
+95 -1
View File
@@ -13,6 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import dataclasses
import logging
import time
from contextlib import nullcontext
@@ -147,6 +148,92 @@ def update_policy(
return train_metrics, output_dict
def get_default_peft_configuration(policy_type):
"""Build a basic PEFT configuration for the given policy type assuming that we train a policy from a checkpoint."""
common_projections = "state_proj|action_in_proj|action_out_proj|action_time_mlp_in|action_time_mlp_out"
if policy_type == "smolvla":
return {
"target_modules": rf"(model\.vlm_with_expert\.lm_expert\..*\.(q|v)_proj|model\.({common_projections}))",
"modules_to_save": [],
}
elif policy_type in ("pi0", "pi05"):
return {
"target_modules": rf"(.*\.gemma_expert\..*\.self_attn.(q|v)_proj|model\.({common_projections}))",
"modules_to_save": [],
}
return {"modules_to_save": None}
def wrap_policy_in_peft_model(cfg, policy):
from peft import PEFT_TYPE_TO_CONFIG_MAPPING, PeftType, get_peft_model
# Disable all gradients because we'll only train the parameters selected by the PEFT method.
# Layers that should receive gradients anyway need to be listed in `modules_to_save`.
for p in policy.parameters():
p.requires_grad_(False)
if not cfg.policy.pretrained_path:
raise ValueError(
"Training from scratch using PEFT. This is unlikely to yield good results. "
"Supply a `policy.path` to fine-tune an existing model."
)
if cfg.policy.type == "smolvla" and not cfg.policy.load_vlm_weights:
logging.warning(
"Training SmolVLA from scratch using PEFT. This is unlikely to yield good results. Set "
"`load_vlm_weights=True` to fine-tune the existing policy."
)
peft_config_policy = get_default_peft_configuration(cfg.policy.type)
peft_config_cli = dataclasses.asdict(cfg.peft) if cfg.peft else {}
peft_config_cli["modules_to_save"] = peft_config_cli["full_training_modules"] # compatibility with PEFT
peft_method_type = PeftType[peft_config_cli["method_type"].upper()]
peft_config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[peft_method_type]
# Handle specific CLI overrides
for key in ["target_modules", "modules_to_save", "r"]:
if peft_config_cli[key] is not None:
peft_config_policy[key] = peft_config_cli[key]
if "target_modules" not in peft_config_policy:
raise ValueError(
f"There is no default `target_modules` value for policy {cfg.policy.type}. Please pass it manually."
)
# Init method depends on the used PEFT method, your specific PEFT method
# might not be considered here, in that case an error is raised.
if peft_config_cli["init_type"] is not None:
if peft_method_type == "LORA":
peft_config_policy["init_lora_weights"] = peft_config_cli["init_type"]
elif peft_method_type == "MISS":
peft_config_policy["init_weights"] = peft_config_cli["init_type"]
else:
raise ValueError(
f"Init type {peft_config_cli['init_type']} unknown for PEFT method {peft_method_type}."
)
# PEFT uses this attribute to set adapter_config.base_name_or_path which we use for loading the
# correct base model in `make_policy` since in a PEFT loading setting we only get the path to the
# adapter, not the base model.
if policy.config.pretrained_path:
policy.name_or_path = str(policy.config.pretrained_path)
# Finally wrap the policy in a PEFT model
policy = get_peft_model(
policy,
peft_config_cls(**peft_config_policy),
)
# Make sure that the config is tagged as using PEFT so that the loading code can take the
# appropriate steps to use the adapter weights and the PEFT config instead of the full model weights.
policy.config.use_peft = True
return policy
@parser.wrap()
def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
"""
@@ -230,6 +317,10 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
rename_map=cfg.rename_map,
)
if cfg.peft is not None:
logging.info("Using PEFT! Wrapping model.")
policy = wrap_policy_in_peft_model(cfg, policy)
# Wait for all processes to finish policy creation before continuing
accelerator.wait_for_everyone()
@@ -502,7 +593,10 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
if cfg.policy.push_to_hub:
unwrapped_policy = accelerator.unwrap_model(policy)
unwrapped_policy.push_model_to_hub(cfg)
if cfg.policy.use_peft:
unwrapped_policy.push_model_to_hub(cfg, peft_model=unwrapped_policy)
else:
unwrapped_policy.push_model_to_hub(cfg)
preprocessor.push_to_hub(cfg.policy.repo_id)
postprocessor.push_to_hub(cfg.policy.repo_id)
+4
View File
@@ -99,6 +99,10 @@ def save_checkpoint(
pretrained_dir = checkpoint_dir / PRETRAINED_MODEL_DIR
policy.save_pretrained(pretrained_dir)
cfg.save_pretrained(pretrained_dir)
if cfg.peft is not None:
# When using PEFT, policy.save_pretrained will only write the adapter weights + config, not the
# policy config which we need for loading the model. In this case we'll write it ourselves.
policy.config.save_pretrained(pretrained_dir)
if preprocessor is not None:
preprocessor.save_pretrained(pretrained_dir)
if postprocessor is not None: