lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-17 16:27:04 +00:00

Author	SHA1	Message	Date
Steven Palma	5753f8c18b	fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth). GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images on the single CPU main-loop thread, and that cost is timed inside dataloading_s (preprocessor(batch) runs in the main process, not the dataloader workers), so adding workers cannot hide it. - Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify on config.device (GPU) when available. Bit-identical on CPU when no resize is configured; with a resize only the PIL->torchvision bicubic backend differs (<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box without the saved device falls back to CPU. - Default image_target_size/crop to the N1.7 backbone's training geometry (256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new embodiment). Previously image_target_size=None disabled the resize, so full-resolution frames were patchified into ~4.7x more vision tokens than the model was trained on -- inflating dataloading_s (patchify) and update_s (VLM sequence) and skewing the input distribution. Checkpoints that pin their own sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS. Net: preprocessing leaves the CPU critical path and the VLM sees the resolution it was trained on -- faster training/inference and a correct train/serve distribution. Affects inference too (shared preprocessor); existing checkpoints still load (backward compatible) but must be retrained to gain the benefits.	2026-06-15 18:20:49 +02:00
Kartik	97bd373d15	Merge pull request #15 from huggingface/fix/groot_n17_core fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes	2026-06-13 23:05:51 +02:00
Kartik	10a73e3c95	Merge pull request #14 from huggingface/fix/groot_n17_backbone fix(groot): N1.7 backbone loading and DiT parameter-count logging	2026-06-13 21:47:35 +02:00
Kartik	27c9288b24	Merge pull request #13 from huggingface/fix/groot_n17_docs docs(groot): document the N1.5 removal and the N1.7 parity test	2026-06-13 21:47:05 +02:00
Steven Palma	378897800a	fix(groot): skip normalization overrides for training	2026-06-13 19:51:29 +02:00
Steven Palma	fcb371eddd	fix(groot): N1.7 config defaults, N1.5 rejection, and processor/model runtime fixes Covers the GR00T N1.7 source trio (configuration, processor, model wrapper). Config: - GrootConfig defaults are the N1.7 values; explicitly passed legacy N1.5-era values (chunk_size=50, max_state_dim=64, ...) are remapped with a warning instead of silently. - action_decode_transform gains an 'auto' sentinel so an explicit 'none' opt-out wins over the libero_sim default and survives save/load round-trips. - action_delta_indices is cached on the inputs that determine it. - Legacy N1.5 checkpoints/configs (tokenizer_assets_repo, model_type/ architectures/eagle backbone markers) are rejected with a single clear error pointing to lerobot==0.5.1. Processor: - GrootN17ActionDecodeStep handles the 2-D (B, D) actions delivered by sync select_action (relative eef/non-eef decode in eval/record flows). - Postprocessor falls back to dataset stats when a raw checkpoint lacks the configured embodiment tag; raw-state cache is per-instance, not process-global; caller overrides (device, rename_map) are honored on the raw-checkpoint branch. - Camera/modality-key mismatches warn (including the zero-match fallback); deprecated Qwen2VLImageProcessorFast replaced with Qwen2VLImageProcessor; removed N1.5 processor steps are stubbed to raise the removal guidance and the action-unpack step is re-registered as _v2. Model: - Flash-attention probe is diagnostic-only; forward raises on a missing loss; print() replaced with logging; N1.5 base-path mismatch includes the removal guidance.	2026-06-13 18:30:21 +02:00
Steven Palma	895eaf0d7c	fix(groot): N1.7 backbone loading and DiT parameter-count logging - select_layer default tracks the N1.7-3B checkpoint value (16); real checkpoint loads still override it from config.json. - get_backbone_cls recognizes Cosmos-Reason2 / Qwen3-VL backbones by name and warns (instead of silently assuming) when an unrecognized backbone is loaded only on the strength of backbone_model_type='qwen'. - 'revision' pins the GR00T checkpoint repo only and is no longer forwarded into the unrelated backbone repo load; pin the backbone via transformers_loading_kwargs instead. - DiT / SelfAttentionTransformer parameter counts go through logging.debug instead of print().	2026-06-12 23:55:33 +02:00
Steven Palma	edda8552ec	docs(groot): document the N1.5 removal and the N1.7 parity test - groot.mdx: breaking-change warning and migration path (pin lerobot==0.5.1 to keep N1.5, or move to N1.7); the dead `huggingface-cli download` is replaced with `hf download`. - policy_groot_README.md: N1.5 removal note, updated paper / model-card links, and the two-comparison (model parity + preprocessor parity) description of the original-vs-LeRobot test, including the raw-observation artifacts and recorded seed.	2026-06-12 23:40:36 +02:00
Kartik	c8225d749a	Merge pull request #12 from acwrenn53/exp/groot-n17-test-groot-lerobot Adopt test_groot_lerobot for GR00T N1.7, drop N1.5	2026-06-12 11:01:25 +02:00
nv-sachdevkartik	68f869b7a0	test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5 The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5 checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On load, model_version defaults to n1.7 while the base path infers n1.5, so the version-consistency guard in GrootConfig.__post_init__ raised ValueError and both test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no longer a supported model_version. Adopt the test for N1.7: - MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via GrootPolicy.from_pretrained as a base N1.7 model). - Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint embodiment_id.json), via a single EMBODIMENT_TAG constant. - DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size. - Docstrings/labels updated to 'GR00T N1.7'. Both tests run and pass on CUDA; full tests/policies/groot/ suite is 73 passed / 0 failed / 0 skipped.	2026-06-12 08:42:45 +00:00
Kartik	4119ad4d10	Merge pull request #11 from acwrenn53/exp/groot-n17-logit-parity GR00T N1.7 logit parity	2026-06-12 10:14:05 +02:00
nv-sachdevkartik	750358895b	test(groot): move parity producer into utils/ package Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into a tests/policies/groot/utils/ package (with __init__.py) and update all path references in the test docstring/skip-message and the policy README.	2026-06-12 08:10:03 +00:00
nv-sachdevkartik	bc4d0db8f4	docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring	2026-06-12 08:06:33 +00:00
nv-sachdevkartik	45e273b806	test(groot): self-contained parity test + in-repo producer + docs - Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py - Make the test self-contained: producer script (dump_original_n1_7.py) now lives next to the test; default artifact dir is repo-relative (tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The test only reads artifacts and skips if absent -- it never creates external dirs. - Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer; never committed. - Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note). - Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md). - Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity. 9/9 embodiments still pass (max\|diff\| < 3e-6, fp32 eps).	2026-06-12 07:47:11 +00:00
nv-sachdevkartik	8b5f56b63c	test(groot): parametrize N1.7 parity across all checkpoint embodiments Generalize the original-vs-LeRobot N1.7 output-parity test from a single libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid, real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built generically from checkpoint metadata; the test discovers per-tag .npz artifacts and runs one parametrized case each, loading the LeRobot model once via a fixture. All 9 embodiments match the original to fp32 epsilon (max\|diff\| < 3e-6), confirming the integration is correct across the model's full embodiment space and not overfit to libero_sim.	2026-06-11 21:41:30 +00:00
nv-sachdevkartik	9f1ee224cb	test(groot): add N1.7 original-vs-LeRobot output parity test Verifies the LeRobot GR00T N1.7 integration produces equivalent raw action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed, precision (fp32) and attention kernel (SDPA): max\|diff\|=8.9e-7 on the libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10). The two impls pin incompatible transformers majors (orig 4.57.3 vs LeRobot 5.x) and cannot share a process, so the original outputs + exact collated inputs are produced out-of-process and loaded from an .npz. The test skips on CI / when the checkpoint or artifact are absent.	2026-06-11 20:59:14 +00:00
nv-sachdevkartik	885f55ef04	groot: fix stale Eagle VLM docstring in processor (N1.7 uses Qwen3-VL backbone) Addresses checker nit: processor_groot.py docstring still described the N1.5 Eagle VLM path with eagle_content/eagle_* keys that no longer exist in the code.	2026-06-11 18:10:46 +00:00
nv-sachdevkartik	bba996ef8d	groot: reuse lerobot get_device_from_parameters instead of inline lookup modeling_groot.py duplicated next(self.parameters()).device twice. LeRobot ships get_device_from_parameters in policies/utils.py (used by diffusion, vqbet, tdmpc, gaussian_actor). Reuse it for consistency with the framework.	2026-06-11 18:03:28 +00:00
nv-sachdevkartik	162b07512a	groot: remove dead N1.5 code (eagle2_hg_model, flow_matching_action_head, action_encoder) N1.7 backbone is nvidia/Cosmos-Reason2-2B via Qwen3VLForConditionalGeneration, not Eagle2 — eagle2_hg_model/ had zero refs outside its own dir. GR00TN17ActionHead (groot_n1_7.py) re-implements MultiEmbodimentActionEncoder + CategorySpecificLinear + swish + SinusoidalPositionalEncoding locally, so flow_matching_action_head.py (N1.5 FlowmatchingActionHead) and its sole dependency action_encoder.py are dead. Verified: no src/ or tests/ reference. Removed (~2037 LOC): - eagle2_hg_model/ (4 files, ~1575 LOC) - action_head/flow_matching_action_head.py (408 LOC) - action_head/action_encoder.py (54 LOC) cross_attention_dit.py KEPT (DiT/AlternateVLDiT/SelfAttentionTransformer live in N1.7).	2026-06-11 17:49:12 +00:00
acwrenn53	0509ea05df	Merge pull request #10 from acwrenn53/nvidia-gr00t-n17-lerobot-cleanup Remove GR00T N1.5 support and fix LIBERO gripper action transform	2026-06-05 12:15:10 -07:00
Andrew Wrenn	de1a9e5ad9	Reconnect GR00T relative action processors	2026-06-05 09:31:04 -07:00
groot-validation	6803439f22	groot: auto-enable LIBERO gripper action transform for libero_sim GR00T N1.7 emits gripper in [0,1] but LIBERO expects [-1,1]. The decode transform existed but was never auto-enabled for embodiment_tag=libero_sim, so the policy scored 0% on LIBERO eval. Auto-set it in __post_init__ (still overridable). LIBERO Spatial eval: 0% -> 98%.	2026-06-05 00:56:11 +00:00
nv-sachdevkartik	90d1e70da2	removed remaining N1.5 traces	2026-06-05 00:11:37 +00:00
nv-sachdevkartik	a35ac22afd	removed n1.5 dependency	2026-06-04 22:14:07 +00:00
Kartik	fd7fed08e2	Merge branch 'huggingface:main' into nvidia-gr00t-n17-lerobot	2026-06-04 23:41:09 +02:00
Maxime Ellerbach	2e9cd87bbd	feat(policies): add VLA-JEPA (#3568) * first commit * feat(policies): add VLA-JEPA * feat(policies): add VLA-JEPA * support vla_jepa * (feat)policies: add VLA-JEPA * linting * adding deps to pyproject.toml * updating uv lock * adding guards to avoid needing transformers and diffusers for type checking and basic tests * fixing action and state dim * fix warnings with qwen processor kwargs * fixing wm_loss not propagating * adjusting obs steps, tubelet size to match original implementation * some more fixes to be closer to the original implementation * adding more tests to ensure good coverage * align VLA-JEPA architecture with original checkpoint - Remove stale `action_num_heads` / `action_attention_head_dim` config fields; DiT head dimensions are now always derived from the preset (DiT-B/L/test). - Add `num_target_vision_tokens` and `action_max_seq_len` config fields required by the action head's future-token embedding and positional embedding tables. - Fix default `qwen_model_name` to 2B (matches all released checkpoints). - Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match checkpoint key names; replace `nn.Sequential` decoder/state-encoder with `_MLP2` (layer1/layer2 naming). - Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim` (DiT input width) rather than `action_hidden_size` (DiT output width). - Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match checkpoint; add alternating cross/self attention (even blocks cross-attend to Qwen context, odd blocks self-attend). - Add `DiT-test` preset for unit tests. - Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`. 
* propagate action_is_pad masking through VLA-JEPA policy pipeline Pass the `action_is_pad` tensor from the batch through to the action head so padded timesteps are excluded from the flow-matching loss. * update VLA-JEPA tests for arch changes and action_is_pad - Switch conftest to use `action_model_type="DiT-test"` now that `action_num_heads` / `action_attention_head_dim` have been removed. - Add action_head tests covering fully-padded loss (zero) and equivalence of action_is_pad=None vs all-zeros mask. - Remove obsolete `test_native_to_lerobot_wm_only` test. * add VLA-JEPA documentation Covers architecture overview, pretrained checkpoints, config reference, training/eval commands for LIBERO-10, and guidance on fine-tuning for single-camera datasets. * add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated) * make default params more aligned with paper and pretrained models - adding possibility of freezing qwen backbone and world model - added tests for weight loading * trying out to re-init the action head to avoid pretraining dimension mismatch * allow different state dim and action dim * removing misleading future_action_window_size to just use chunk_size * lots of changes to make existing weights work, need to massively refactor the pre and post processing * refactoring into using pre and post processor * pre-commit cleanup * fixing doc defaults args Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * addressing dtype zeros issue * adding guard for diffusers * fixing training and eval examples * trying to close success rate gap * fix qwen norm layer output libero eval is now as expected * adding instructions for different embodiment + fixing some tests * smol fix to avoid having default CPU device when training * fixing misconception about multiview / singleview handling * removing conversion script * adding licences * adding .mdx docs and shortening policy_vla_jepa_README.md * removing useless pre-processor 
* cleanup * removing swish in favor of silu * adding configuration gripper index and threshold * fixing symlink --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: ginwind <ginwind@mail.ustc.edu.cn>	2026-06-04 19:22:51 +02:00
acwrenn53	0c3cc4c9d6	Merge pull request #6 from acwrenn53/nvidia-gr00t-n17-lerobot-rtc-2 Nvidia gr00t n17 lerobot rtc 2	2026-06-03 16:10:49 -07:00
Andrew Wrenn	6caeac9d07	Ignore padded GR00T N1.7 RTC prefix rows	2026-06-03 14:04:31 -07:00
Andrew Wrenn	1d6810b814	Trim GR00T N1.7 RTC chunks to valid horizon	2026-06-03 13:51:35 -07:00
Andrew Wrenn	de9af57475	Fix GR00T N1.7 RTC action decoding	2026-06-03 13:43:13 -07:00
Jaimin	d1b1c5c8cf	docs: fix broken dataset script paths (datasets/v30 -> scripts) (#3695 ) The docs pointed at src/lerobot/datasets/v30/, which does not exist. Both scripts actually live in src/lerobot/scripts/: - convert_dataset_v21_to_v30.py - augment_dataset_quantile_stats.py Updated the four references (one python -m module path and three file-path invocations) to the correct location, matching each script's own usage docstring.	2026-06-03 14:48:19 +02:00
Nikodem Bartnik	741c2d0a39	Docs/add lelab (#3707) * first text draft (no images) * simplified docs * fix formatting * add youtube video * add a tip about compatibility * fix broken link	2026-06-03 14:22:05 +02:00
Haoming Song	19fe315971	fix(train): enable relative action overrides for pretrained processors (#3711 ) * fix(train): enable relative action overrides for pretrained processors Keep pretrained processor pipelines when use_relative_actions is enabled and apply relative/absolute action processor settings through overrides. Rename the relative action processor registry key to relative_actions_processor. * fix(config): reject rename_map without pretrained checkpoint Fail fast when rename_map is set during fresh initialization, since fresh configs derive feature names from the current dataset and no rename is applied. --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-06-03 11:46:35 +02:00
Andrew Wrenn	364750ada2	Allow Groot fake RTC chunk prefetch	2026-06-02 14:20:00 -07:00
Andrew Wrenn	342d223706	Restore GR00T Flash Attention install guidance	2026-06-02 13:26:08 -07:00
Andrew Wrenn	e3b203e5a7	Move Groot processor compatibility into Groot loader	2026-06-02 13:19:12 -07:00
Khalil Meftah	906b585826	fix(datasets): default `private` to `None` in `push_to_hub` to respect Hub org visibility settings (#3713)	2026-06-02 19:25:13 +02:00
Andrew Wrenn	b568c41355	Add GR00T N1.7 support Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests. Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>	2026-06-01 08:57:04 -07:00
Khalil Meftah	b8ad81bf39	feat(rewards): add ROBOMETER reward model (#3627 ) * feat/add ROBOMETER reward model * feat(rewards): add Robometer offline progress labeling script * fix(rewards/robometer): add missing input keys mm_token_type_ids * chore(rewards/robometer): default to lerobot/Robometer-4b model * doc(rewards/robometer): update citation and original github link * feat(rewards/robometer): add image key argument to compute Robometer progress	2026-05-29 21:45:39 +02:00
Haoquan Fang	24017e960c	Add MolmoAct2 policy (#3604 ) * add molmoact2 policy * add apache headers to molmoact2 files * simplify molmoact2 package imports * align molmoact2 feature validation with eo pattern * remove molmoact2 processor override from factory * guard molmoact2 transformers imports * guard molmoact2 processor transformers import * add scipy dependency to molmoact2 extra * use a single molmoact2 action queue * move molmoact2 config logic into config * fix molmoact2 hf image key resolution * load molmoact2 without remote code * lazy import molmoact2 scipy * format molmoact2 files * skip molmoact2 tests without optional deps * fix molmoact2 pre-commit checks * validate molmoact2 gripper range	2026-05-27 18:58:37 +02:00
Khalil Meftah	e86f5af5bf	feat(rewards): add TOPReward reward model (#3629) * feat(rewards): add TOPReward reward model * refactor(rewards): clean up TOPReward processor/model * fix(rewards/topreward): add missing input keys mm_token_type_ids * fix(rewards/topreward): fix pyproject extra typo and simplify processor (#3653) Add lerobot[topreward] extra to all in pyproject.toml, drop the redundant labels arg in scoring, and collapse the dead-branch shape check in the encoder processor. * optimize topreward input processing (#3660) --------- Co-authored-by: Cole <91766445+jcoleharrison@users.noreply.github.com> Co-authored-by: Haoming Song <haomingsong24@gmail.com>	2026-05-27 14:24:31 +02:00
Haoming Song	5c98e80430	fix(gr00t): fix Eagle25VL model and processor crash in transformers>=5.4.0, <5.6.0 (#3652) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:04:22 +02:00
Reece O'Mahoney	f65f3f7a4a	Fix policy.path in YAML configs (PR #3145 followup) (#3597 ) PR #3145 added YAML support for policy.path but left two bugs: 1. extract_path_fields_from_config only deleted config_data[field] when no sibling overrides existed. With siblings, the dict stayed in place and draccus crashed decoding it as PreTrainedConfig (no 'type' key). Sibling overrides go into _config_yaml_overrides and are applied later by from_pretrained(), so the field can always be removed. 2. wrap() updated config_path_cli to the cleaned temp file path but never propagated it to the draccus.parse fallback branch. cli_args still contained --config_path=<original>, so draccus read the original YAML with path: still present. Tests passed because they (a) called extract_path_fields_from_config directly and (b) included type: alongside path: in the YAML, sidestepping both bugs. Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:01:19 +02:00
Pepijn	8194897994	fix(deps): cap placo below 0.9.16 and harden kinematics import (#3647 ) * fix(deps): cap placo below 0.9.16 and harden kinematics import placo 0.9.16 links against liburdfdom_sensor.so.4, which is unavailable on Ubuntu 24.04 (noble ships urdfdom 3.x). Importing placo on that base crashes with: ImportError: liburdfdom_sensor.so.4.0: cannot open shared object file This broke nightly Latest Deps tests (CPU and GPU) when the lockfile upgrade picked placo 0.9.16, since lerobot.model.kinematics unconditionally imports placo when _placo_available is true, and that check (importlib.util.find_spec) cannot detect dlopen failures of transitive shared libraries — so unrelated subsystems (RL actor, gym_manipulator) became unimportable. Two changes: 1. Pin placo to <0.9.16 in pyproject.toml + regenerate uv.lock (0.9.16 → 0.9.15). Short-term unblock for nightly CI until system urdfdom 4.x is broadly available. 2. Harden the import guard in src/lerobot/model/kinematics.py: wrap 'import placo' in try/except ImportError so a missing transitive .so no longer crashes module import. RobotKinematics instantiation now raises an informative ImportError citing the underlying dlopen failure via _raise_if_placo_unusable(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(kinematics): hoist _placo_runtime_error to module scope for mypy Mypy walks the TYPE_CHECKING branch in which the runtime else-block is not executed, so _placo_runtime_error was only defined at runtime and mypy reported 'Name "_placo_runtime_error" is not defined' on the three references inside _raise_if_placo_unusable. Declare the symbol unconditionally at module scope with a default of None; the runtime import-failure branch still assigns to it. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(kinematics): drop verbose comments around placo import guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 12:03:07 +02:00
Haoming Song	9f437d86b6	fix(groot): align GR00TN15Config with transformers config dataclasses (#3606) * fix(gr00t): fix gr00t config dataclass init TypeError * fix(groot): guard strict config decorator without transformers for passing CI --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-22 10:31:04 +02:00
Haoming Song	b74a551d38	fix(pi0, pi05): stabilize torch.compile and expand test coverage (#3610) * chore(gr00t): sync with #3606 for fixing gr00t config crash * fix(pi0&pi05): fix graph break caused by deepcopy of past_key_values in sample_actions * fix(pi0&pi05): fix frequent recompile caused by compute_layer_complete * feat(test): add compile test and benchmark for pi0 and pi05 * feat(test): add comprehensive testing for pi0 and pi05. Including processor, forward, sample action, etc.	2026-05-22 10:29:34 +02:00
Nikodem Bartnik	c0a2e9814d	fix examples (#3623 ) - Fixed broken API examples in Lerobot Imitation Learning Documentation - Teleoperation with cameras improved by adding a fixed frequency in the loop (without it the cameras feed gets very slow) - Wrapped record example script in main() to avoid problems on Mac - Previously teleoperation example was using SO-ARM and teleoperation with cameras was using Koch. I changed it to use SO-ARM in all of the examples. - Added section on how to train with HF Jobs - CLI and Python examples - Replaced lerobot-record with lerobot-rollout in policies examples	2026-05-21 22:14:07 +02:00
Khalil Meftah	bac4f61eae	refactor: support custom progress parquet overlays (#3640)	2026-05-21 14:32:10 +02:00
Virgileboat	f4b834844e	Feat/clean can bus (#3526) * change timeout for handshake * enforce last state read when query * change import order * fix(motors): flush stale robstride RX and harden feedback drain * robstride: remove redundant timeout and max_messages casts * bugfix + %-style * update exception catch	2026-05-21 11:44:04 +02:00
Roham Z. Nobari	dfdc48a7f1	fix(datasets): bound VideoDecoderCache to prevent OOM on large datasets (#3614 ) VideoDecoderCache used an unbounded dict keyed on absolute path, with no eviction in the standard LeRobotDataset path. With shuffled iteration over datasets that have many distinct mp4 files, every DataLoader worker accumulated one cached (VideoDecoder, fsspec file handle) pair per distinct path it had ever touched. Per-entry cost is ~3-5 MB of host RAM plus one open FD; at ~8 k entries this is roughly 30 GB per worker. This was hit in the wild during a SmolVLA training run on a 4,195-episode SO-101 dataset (8,390 mp4s, two cameras per episode). dmesg showed anon-rss climbing to 34.9 GB on a single pt_data_worker before the OOM killer fired ~30 min into training; with --num_workers=8 the per-worker peak halved to 17.9 GB, which is the expected inverse-scaling signature when the leak is per-decode and the workload is split across workers. The working workaround on the affected platform was --dataset.video_backend=pyav, because the pyav path opens/closes per call and never touches this cache. Switch the backing store to an OrderedDict and evict LRU entries when the cap is reached, closing the evicted file handle inside the lock so we do not leak FDs either. Default cap is DEFAULT_DECODER_CACHE_SIZE = 100, overridable via LEROBOT_VIDEO_DECODER_CACHE_SIZE or by passing max_size= to the constructor; max_size=None restores the legacy unbounded behaviour for callers that need it. 
Validation on the original failing workload (decode_video_frames_torchcodec called over real mp4s from the affected SO-101 dataset): unbounded: 300 files -> +1087 MB host RSS, cache=300, still climbing; cap=50: 500 files -> +266 MB host RSS, cache=50, stable; cap=50: 2000 calls -> +312 MB host RSS, cache=50, stable; cap=100: 1000 calls -> +470 MB host RSS, cache=100, stable. Three independent seeded runs at cap=50 agreed to within 1% (263 / 266 / 265 MB delta), and the 2000-call multi-pass run shows RSS plateaus after the cap is reached instead of drifting. Tests in tests/datasets/test_video_decoder_cache.py cover: default-is-bounded, size cap, LRU ordering, FD close on eviction, FD close on clear(), cache-hit invariance, max_size=None fallback, and env-var override. No regressions in test_video_encoding.py, test_streaming.py, or test_dataset_reader.py (73 prior tests still pass alongside the 8 new ones).	2026-05-19 16:54:25 +02:00
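
The bounded-cache approach described in commit dfdc48a7f1 (an OrderedDict that evicts least-recently-used entries and closes the evicted handle inside the lock) can be sketched roughly as follows. This is a minimal illustration, not the actual VideoDecoderCache: the names `BoundedHandleCache`, `FakeHandle`, and the `opener` callback are hypothetical, and only the default cap of 100 and the `max_size=None` unbounded fallback are taken from the commit message.

```python
import threading
from collections import OrderedDict


class FakeHandle:
    """Hypothetical stand-in for a (decoder, file handle) pair."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


class BoundedHandleCache:
    """Hypothetical LRU cache sketch; not the real VideoDecoderCache."""

    def __init__(self, max_size=100):
        # max_size=None means unbounded, mirroring the legacy behaviour
        # the commit message says remains available.
        self._max_size = max_size
        self._entries = OrderedDict()
        self._lock = threading.Lock()

    def get(self, path, opener):
        with self._lock:
            if path in self._entries:
                # Cache hit: mark the entry as most recently used.
                self._entries.move_to_end(path)
                return self._entries[path]
            handle = opener(path)
            self._entries[path] = handle
            if self._max_size is not None:
                # Evict least-recently-used entries and close them while
                # still holding the lock, so no file descriptor leaks.
                while len(self._entries) > self._max_size:
                    _, evicted = self._entries.popitem(last=False)
                    evicted.close()
            return handle

    def clear(self):
        with self._lock:
            for handle in self._entries.values():
                handle.close()
            self._entries.clear()

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

With a cap of 2, opening "a" and "b", touching "a" again, and then opening "c" evicts and closes "b", because "b" is then the least recently used entry.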


1512 Commits