Unify the profiling subsystem into a single module, per reviewer request.
Before (4 files):
- src/lerobot/utils/profiling_utils.py (399 LOC)
- scripts/ci/run_model_profiling.py (337 LOC)
- profiling/model_profiling_specs.json (181 LOC)
- tests/scripts/test_model_profiling.py (423 LOC)
After (2 files):
- src/lerobot/utils/model_profiling.py (758 LOC): TrainingProfiler + CI orchestrator + POLICY_SPECS (inline)
- tests/test_model_profiling.py (315 LOC)
Net: -267 LOC and 4 files → 2. All functionality preserved: per-step
forward/backward/optimizer timings, torch profiler tables + chrome
traces, deterministic-forward fingerprint, HF Hub result upload, and
the same CLI surface.
Changes:
- Collapse `_StepTimingCollector` into inline attributes on
`TrainingProfiler` (no separate class).
- Drop `ProfilingSpec` dataclass; specs are plain dicts.
- Inline the JSON matrix as a module-level `POLICY_SPECS` dict (shape sketched
  after this list); one less file to keep in sync with the training args.
- CI workflow invokes `python -m lerobot.utils.model_profiling` in
place of the standalone script.
- Tests import `lerobot.utils.model_profiling` directly instead of loading
  the script by path. Removed the JSON schema tests that no longer apply.
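A shape-only sketch of what the consolidated module can look like; the spec values and the `run_profiling_for_policy` helper are illustrative placeholders, not the shipped code:

```python
# model_profiling.py -- shape sketch only; real spec entries and helpers differ.
POLICY_SPECS: dict[str, dict] = {
    "act": {"train_args": {"steps": 20, "batch_size": 8}},  # illustrative values
    "pi0": {"train_args": {"steps": 20, "use_amp": True}},
}


def main() -> None:
    # CI orchestrator: one profiling run per spec, artifacts collected afterwards.
    for policy_name, spec in POLICY_SPECS.items():
        run_profiling_for_policy(policy_name, **spec["train_args"])  # hypothetical helper


if __name__ == "__main__":  # invoked by CI as `python -m lerobot.utils.model_profiling`
    main()
```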
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `with profiler or nullcontext():` wrap around the entire
training loop with explicit `profiler.start()` / `profiler.finalize()`
calls, and tighten `_section(...)` regions in `update_policy` to only
wrap the hot calls (forward / backward / optimizer.step).
This avoids ~120 lines of pure re-indentation noise while keeping the
exact same artifacts on disk and the same public behavior.
lerobot_train.py diff vs main: 267 -> 29 changed lines.
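For illustration, the change amounts to roughly the following; whether the calls are guarded with `if profiler is not None:` or the methods no-op internally is an assumption here, not something this commit specifies:

```python
from contextlib import nullcontext

# Before: the whole loop body was re-indented under the context manager.
with profiler or nullcontext():
    for step in range(num_steps):
        ...  # training step

# After: explicit calls bracket the loop; the body keeps its original indentation.
if profiler is not None:
    profiler.start()
for step in range(num_steps):
    ...  # training step; _section(...) wraps only the hot calls inside update_policy
if profiler is not None:
    profiler.finalize()
```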
Made-with: Cursor
The dashboard expects per-phase timings (forward_s, backward_s,
optimizer_s) in step_timing_summary.json, but only total_update_s
and dataloading_s were collected — leaving every chart except
dataloading empty.
Add a lightweight TrainingProfiler.section(name) context manager
that times a region with torch.cuda.synchronize before and after
(so GPU work is captured, not just the kernel-launch latency) and
accumulates per-section samples into step_timing_summary.json.
Wrap forward, backward (incl. grad clip), and optimizer (incl.
zero_grad and scheduler.step) in update_policy with these sections.
When profiling is off (profiler=None) the wrappers become no-ops,
so training performance is unchanged outside CI.
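A minimal sketch of the mechanism, assuming internal names that may differ from the actual implementation (the real profiler also summarizes the samples into step_timing_summary.json):

```python
import time
from collections import defaultdict
from contextlib import contextmanager, nullcontext

import torch


class TrainingProfiler:
    def __init__(self) -> None:
        # section name -> elapsed seconds per step, later summarized into
        # step_timing_summary.json (forward_s, backward_s, optimizer_s, ...).
        self.section_samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def section(self, name: str):
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # drain already-queued GPU work before timing
        start = time.perf_counter()
        try:
            yield
        finally:
            if torch.cuda.is_available():
                # Wait for the region's kernels to finish so GPU time is captured,
                # not just the kernel-launch latency.
                torch.cuda.synchronize()
            self.section_samples[name].append(time.perf_counter() - start)


def _section(profiler: "TrainingProfiler | None", name: str):
    # No-op when profiling is disabled, so update_policy costs nothing extra.
    return profiler.section(name) if profiler is not None else nullcontext()
```

In update_policy the hot regions then read as `with _section(profiler, "forward"): ...` and likewise for "backward" and "optimizer".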
Made-with: Cursor
Move all profiling orchestration out of lerobot_train.py and
TrainPipelineConfig into a TrainingProfiler class in profiling_utils.py.
- lerobot_train.py: ~74 lines of profiling code reduced to ~7 call sites
- TrainPipelineConfig: 10 profile_* fields reduced to 2 (mode + output_dir; sketched after this list)
- update_policy: reverted to clean main-branch signature (no timing_collector)
- TrainingProfiler encapsulates torch profiler, timing collection,
deterministic forward artifacts, and all output writing
- CI script (run_model_profiling.py) unchanged—it only passes the 2 kept fields
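A sketch of the slimmed-down config surface; the exact field names are assumptions based on the "mode + output_dir" description above:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainPipelineConfig:
    # (non-profiling training fields elided)
    # The ten former profile_* knobs collapse to these two; everything else is
    # derived inside TrainingProfiler.
    profile_mode: str | None = None         # assumed name; None disables profiling
    profile_output_dir: Path | None = None  # assumed name; where artifacts are written
```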
Made-with: Cursor
- Move cudnn_deterministic to per-spec train_args instead of hardcoding it for
  all models (see the sketch after this list). cuBLAS deterministic mode
  triggers internal errors on Gemma-based models (pi0, pi05) during the backward pass.
- Enable use_amp=true for pi0, pi0_fast, and pi05 to reduce the memory
  footprint from fp32 (~16GB for the weights alone) to bf16, fitting within
  the 22GB GPU budget with room for activations and gradients.
- Small models (act, diffusion, multi_task_dit) still use deterministic
mode for reproducible profiling results.
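For context, a hedged sketch of what the per-spec flag typically controls; the helper name and exact wiring in the profiling code are assumptions:

```python
import os

import torch


def apply_determinism(train_args: dict) -> None:
    # Hypothetical helper: only enable deterministic kernels when the spec asks
    # for it, since this mode breaks the Gemma-based policies' backward pass.
    if train_args.get("cudnn_deterministic", False):
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Deterministic cuBLAS GEMMs require a workspace config.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)
```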
Made-with: Cursor
Remove cProfile wrapping from the training loop and profiling utilities.
The torch profiler already captures fine-grained timing and operator
breakdowns; cProfile added redundant overhead without actionable
insight for GPU-bound models.
- Remove render_cprofile_summary, run_with_cprofile from profiling_utils
- Replace cProfile-wrapped calls in lerobot_train with direct calls
- Remove cprofile_summaries from artifact index in run_model_profiling
- Update tests to match
Made-with: Cursor
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots argument for the merge operation
* chore(clean): cleaning up code
* chore(docstrings): updating docstrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(docstrings): updating docstrings + fixing a typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(docstrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in docstrings
* Add new feature to lerobot_edit_dataset.py that shows dataset information.
* Fix draccus error that happens when only --operation.type=info is given.
* Updating tests and documentation regarding the lerobot-edit-dataset info function.
* Updating documentation regarding the lerobot-edit-dataset extract function; the option name in the documentation was a mistake.
* feat(datasets): Update to align formatting with pre-commit (#2917)
---------
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
- Add `tests/scripts/save_policy_to_safetensor.py` to generate test artifacts
- Add `test_backward_compatibility` to test the policies' generated outputs against those artifacts (comparison step sketched below)
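A hedged sketch of the comparison step only; the artifact paths and output keys are assumptions, not the actual test code:

```python
import torch
from safetensors.torch import load_file


def assert_matches_artifact(new_outputs: dict[str, torch.Tensor], artifact_path: str) -> None:
    # Compare freshly generated policy outputs against the saved safetensors artifact.
    saved = load_file(artifact_path)
    for key, saved_tensor in saved.items():
        torch.testing.assert_close(new_outputs[key], saved_tensor)
```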