Unify the profiling subsystem into a single module, per reviewer request.
Before (4 files):
- src/lerobot/utils/profiling_utils.py (399 LOC)
- scripts/ci/run_model_profiling.py (337 LOC)
- profiling/model_profiling_specs.json (181 LOC)
- tests/scripts/test_model_profiling.py (423 LOC)
After (2 files):
- src/lerobot/utils/model_profiling.py (758 LOC): TrainingProfiler + CI orchestrator + POLICY_SPECS (inline)
- tests/test_model_profiling.py (315 LOC)
Net: -267 LOC and 4 files → 2. All functionality preserved: per-step
forward/backward/optimizer timings, torch profiler tables + chrome
traces, deterministic-forward fingerprint, HF Hub result upload, and
the same CLI surface.
Changes:
- Collapse `_StepTimingCollector` into inline attributes on
`TrainingProfiler` (no separate class).
- Drop `ProfilingSpec` dataclass; specs are plain dicts.
- Inline the JSON matrix as a module-level `POLICY_SPECS` dict (shape sketched
  after this list); one less file to keep in sync with the training args.
- CI workflow invokes `python -m lerobot.utils.model_profiling` in
place of the standalone script.
- Tests import `lerobot.utils.model_profiling` directly instead of loading
  the script by path. Removed the JSON schema tests that no longer apply.
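A shape-only sketch of what the consolidated module can look like; the spec values and the `run_profiling_for_policy` helper are illustrative placeholders, not the shipped code:

```python
# model_profiling.py -- shape sketch only; real spec entries and helpers differ.
POLICY_SPECS: dict[str, dict] = {
    "act": {"train_args": {"steps": 20, "batch_size": 8}},  # illustrative values
    "pi0": {"train_args": {"steps": 20, "use_amp": True}},
}


def main() -> None:
    # CI orchestrator: one profiling run per spec, artifacts collected afterwards.
    for policy_name, spec in POLICY_SPECS.items():
        run_profiling_for_policy(policy_name, **spec["train_args"])  # hypothetical helper


if __name__ == "__main__":  # invoked by CI as `python -m lerobot.utils.model_profiling`
    main()
```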
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `with profiler or nullcontext():` wrap around the entire
training loop with explicit `profiler.start()` / `profiler.finalize()`
calls, and tighten `_section(...)` regions in `update_policy` to only
wrap the hot calls (forward / backward / optimizer.step).
This avoids ~120 lines of pure re-indentation noise while keeping the
exact same artifacts on disk and the same public behavior.
lerobot_train.py diff vs main: 267 -> 29 changed lines.
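For illustration, the change amounts to roughly the following; whether the calls are guarded with `if profiler is not None:` or the methods no-op internally is an assumption here, not something this commit specifies:

```python
from contextlib import nullcontext

# Before: the whole loop body was re-indented under the context manager.
with profiler or nullcontext():
    for step in range(num_steps):
        ...  # training step

# After: explicit calls bracket the loop; the body keeps its original indentation.
if profiler is not None:
    profiler.start()
for step in range(num_steps):
    ...  # training step; _section(...) wraps only the hot calls inside update_policy
if profiler is not None:
    profiler.finalize()
```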
Made-with: Cursor
The dashboard expects per-phase timings (forward_s, backward_s,
optimizer_s) in step_timing_summary.json, but only total_update_s
and dataloading_s were collected — leaving every chart except
dataloading empty.
Add a lightweight TrainingProfiler.section(name) context manager
that times a region with torch.cuda.synchronize before and after
(so GPU work is captured, not just the kernel-launch latency) and
accumulates per-section samples into step_timing_summary.json.
Wrap forward, backward (incl. grad clip), and optimizer (incl.
zero_grad and scheduler.step) in update_policy with these sections.
When profiling is off (profiler=None) the wrappers become no-ops,
so training performance is unchanged outside CI.
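A minimal sketch of the mechanism, assuming internal names that may differ from the actual implementation (the real profiler also summarizes the samples into step_timing_summary.json):

```python
import time
from collections import defaultdict
from contextlib import contextmanager, nullcontext

import torch


class TrainingProfiler:
    def __init__(self) -> None:
        # section name -> elapsed seconds per step, later summarized into
        # step_timing_summary.json (forward_s, backward_s, optimizer_s, ...).
        self.section_samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def section(self, name: str):
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # drain already-queued GPU work before timing
        start = time.perf_counter()
        try:
            yield
        finally:
            if torch.cuda.is_available():
                # Wait for the region's kernels to finish so GPU time is captured,
                # not just the kernel-launch latency.
                torch.cuda.synchronize()
            self.section_samples[name].append(time.perf_counter() - start)


def _section(profiler: "TrainingProfiler | None", name: str):
    # No-op when profiling is disabled, so update_policy costs nothing extra.
    return profiler.section(name) if profiler is not None else nullcontext()
```

In update_policy the hot regions then read as `with _section(profiler, "forward"): ...` and likewise for "backward" and "optimizer".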
Made-with: Cursor
Move all profiling orchestration out of lerobot_train.py and
TrainPipelineConfig into a TrainingProfiler class in profiling_utils.py.
- lerobot_train.py: ~74 lines of profiling code reduced to ~7 call sites
- TrainPipelineConfig: 10 profile_* fields reduced to 2 (mode + output_dir; sketched after this list)
- update_policy: reverted to clean main-branch signature (no timing_collector)
- TrainingProfiler encapsulates torch profiler, timing collection,
deterministic forward artifacts, and all output writing
- CI script (run_model_profiling.py) unchanged—it only passes the 2 kept fields
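A sketch of the slimmed-down config surface; the exact field names are assumptions based on the "mode + output_dir" description above:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainPipelineConfig:
    # (non-profiling training fields elided)
    # The ten former profile_* knobs collapse to these two; everything else is
    # derived inside TrainingProfiler.
    profile_mode: str | None = None         # assumed name; None disables profiling
    profile_output_dir: Path | None = None  # assumed name; where artifacts are written
```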
Made-with: Cursor
- Move cudnn_deterministic to per-spec train_args instead of hardcoding it for
  all models (see the sketch after this list). cuBLAS deterministic mode
  triggers internal errors on Gemma-based models (pi0, pi05) during the backward pass.
- Enable use_amp=true for pi0, pi0_fast, and pi05 to reduce the memory
  footprint from fp32 (~16GB for the weights alone) to bf16, fitting within
  the 22GB GPU budget with room for activations and gradients.
- Small models (act, diffusion, multi_task_dit) still use deterministic
mode for reproducible profiling results.
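For context, a hedged sketch of what the per-spec flag typically controls; the helper name and exact wiring in the profiling code are assumptions:

```python
import os

import torch


def apply_determinism(train_args: dict) -> None:
    # Hypothetical helper: only enable deterministic kernels when the spec asks
    # for it, since this mode breaks the Gemma-based policies' backward pass.
    if train_args.get("cudnn_deterministic", False):
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Deterministic cuBLAS GEMMs require a workspace config.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)
```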
Made-with: Cursor
Remove cProfile wrapping from the training loop and profiling utilities.
The torch profiler already captures fine-grained timing and operator
breakdowns; cProfile added redundant overhead without actionable
insight for GPU-bound models.
- Remove render_cprofile_summary, run_with_cprofile from profiling_utils
- Replace cProfile-wrapped calls in lerobot_train with direct calls
- Remove cprofile_summaries from artifact index in run_model_profiling
- Update tests to match
Made-with: Cursor
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots argument for the merge operation
* chore(clean): cleaning up code
* chore(docstrings): updating docstrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(docstrings): updating docstrings + fixing a typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(docstrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in docstrings
* Add new feature to lerobot_edit_dataset.py that shows dataset information.
* Fix draccus error that happens when only --operation.type=info is given.
* Updating tests and documentation regarding the lerobot-edit-dataset info function.
* Updating documentation regarding the lerobot-edit-dataset extract function; the option name in the documentation was a mistake.
* feat(datasets): Update to align formatting with pre-commit (#2917)
---------
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
- Add `tests/scripts/save_policy_to_safetensor.py` to generate test artifacts
- Add `test_backward_compatibility` to test the policies' generated outputs against those artifacts (comparison step sketched below)
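A hedged sketch of the comparison step only; the artifact paths and output keys are assumptions, not the actual test code:

```python
import torch
from safetensors.torch import load_file


def assert_matches_artifact(new_outputs: dict[str, torch.Tensor], artifact_path: str) -> None:
    # Compare freshly generated policy outputs against the saved safetensors artifact.
    saved = load_file(artifact_path)
    for key, saved_tensor in saved.items():
        torch.testing.assert_close(new_outputs[key], saved_tensor)
```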