lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-06 09:37:06 +00:00

Pepijn da7da741f1 fix(profiling): use SGD for pi0/pi05/pi0_fast and free CUDA cache after deterministic forward

Adam optimizer states (exp_avg + exp_avg_sq) require ~16GB on top of
model params and gradients for 4B-parameter models, which exceeds the
22GB GPU. SGD without momentum keeps no optimizer state, and profiling
only measures forward/backward timing anyway.
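
The arithmetic behind the ~16GB figure can be sketched as follows. This is a rough estimate, not code from the repository: the helper name `optimizer_state_bytes` is hypothetical, and the 2-byte element size is an assumption (the commit does not state the dtype of the optimizer state; two 2-byte tensors per parameter happens to match the quoted figure).

```python
def optimizer_state_bytes(num_params: int, states_per_param: int, bytes_per_element: int) -> int:
    """Estimate the memory held by per-parameter optimizer state tensors."""
    return num_params * states_per_param * bytes_per_element


NUM_PARAMS = 4_000_000_000  # "4B parameter models" from the commit message
BYTES_PER_ELEMENT = 2       # assumption: 2-byte (bfloat16-sized) state tensors

# Adam keeps two tensors per parameter: exp_avg and exp_avg_sq.
adam_bytes = optimizer_state_bytes(NUM_PARAMS, 2, BYTES_PER_ELEMENT)

# SGD without momentum keeps no per-parameter tensors.
sgd_bytes = optimizer_state_bytes(NUM_PARAMS, 0, BYTES_PER_ELEMENT)

print(f"Adam state: {adam_bytes / 1e9:.1f} GB, SGD state: {sgd_bytes / 1e9:.1f} GB")
```

With 4-byte (float32) state the Adam estimate doubles, so the saving from switching to SGD would be at least as large under that assumption.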

Also adds torch.cuda.empty_cache() after the deterministic forward pass
to release transient memory before the training loop starts.
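
A minimal sketch of the pattern described above, assuming a toy `nn.Linear` model in place of the pi0 policies. The function names `forward_then_free` and `state_elements` are hypothetical, and the profiling script's actual deterministic forward is not shown in this commit, so this only illustrates the general idea: run a forward pass, drop the output, release cached CUDA blocks, then train with momentum-free SGD.

```python
import torch
from torch import nn


def forward_then_free(model: nn.Module, batch: torch.Tensor) -> torch.Size:
    """Run one forward pass without autograd, then release cached CUDA blocks."""
    with torch.no_grad():
        out = model(batch)
    shape = out.shape
    del out  # drop the last reference so the allocator can reclaim the block
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    return shape


def state_elements(optimizer: torch.optim.Optimizer) -> int:
    """Count tensor elements held in the optimizer's per-parameter state."""
    return sum(
        value.numel()
        for param_state in optimizer.state.values()
        for value in param_state.values()
        if torch.is_tensor(value)
    )


device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 4).to(device)  # toy stand-in, not a real policy
batch = torch.randn(2, 8, device=device)

out_shape = forward_then_free(model, batch)

# SGD with the default momentum=0 allocates no per-parameter state tensors.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model(batch).sum().backward()
optimizer.step()
sgd_state = state_elements(optimizer)
```

Note that `torch.cuda.empty_cache()` only returns blocks that are no longer referenced, which is why the output tensor is deleted first.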

Made-with: Cursor

2026-04-16 16:09:56 +02:00

model_profiling_specs.json