lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-05 09:07:03 +00:00

Author	SHA1	Message	Date
johnnynunez	f53490c15e	feat(groot): train-time random crop for N1.7 (eval keeps center crop) Isaac-GR00T crops a random crop_fraction window during training and the deterministic center window at eval, replaying the sampled window across all camera views of a sample. This contract is unchanged since the N1.5 release (gr00t/data/transform/video.py: "If mode is 'train', return a random crop transform. If mode is 'eval', return a center crop transform.") and mirrors LeRobot's own Diffusion/VQBeT crop_is_random pattern. The LeRobot N1.7 port used the eval center crop for training too, so the fine-tuned projector/DiT never sees frame borders and trains on a single fixed appearance point. Scope: crop geometry ONLY - no color jitter, no new dependencies. The random window is plain numpy slicing inside the existing cv2 eval transform: - _transform_n1_7_image_for_vlm_albumentations gains crop_position=(y, x) fractions; None keeps the center crop byte-identical to before (verified by test) - GrootN17VLMEncodeStep gains a runtime-only 'training' flag (never serialized; reloaded pipelines default to eval); training samples ONE window per sample and reuses it across (timestep, view) frames - Isaac's cross-view consistency - gated on torch.is_grad_enabled() so no_grad validation and frozen-eval paths are unaffected - wired via dataset_meta is not None in make_groot_pre_post_processors and the existing _set_groot_preprocessor_training on serialized reloads Verification: tests/policies/groot/test_groot_train_random_crop.py (8 passed: center-crop bit-exactness with crop_position=None, corner/center windows, cross-view replay, train!=eval, no_grad gating, seed reproducibility, serialization contract) + groot suite 23 passed / 5 skipped on RTX PRO 6000 / CUDA 13.3.	2026-07-02 03:17:47 +02:00
acwrenn53	459d416bbf	Merge pull request #41 from johnnynunez/split/groot-n17-state-dropout feat(groot): activate checkpoint-configured N1.7 raw-state dropout during training	2026-07-01 16:16:48 -07:00
johnnynunez	f42cdcf137	fix(groot): align N1.7 fine-tuning optimizer/scheduler/precision with Isaac-GR00T Evidence from the LeRobot-vs-OSS checkpoint comparison: the LeRobot/HF 8k checkpoint's DiT moved only ~19% as far from base as the OSS-trained one (0.0547 vs 0.285 relative L2) - undertrained because the scheduler decayed over a hardcoded 10k steps regardless of --steps, on top of beta1/clip mismatches. - AdamW betas (0.95, 0.999) -> (0.9, 0.999) and grad_clip_norm 10.0 -> 1.0 (Isaac defaults) - scheduler: hardcoded CosineDecayWithWarmup(10k decay, floor 10% peak) -> DiffuserSchedulerConfig HF cosine with ceil(max_steps * warmup_ratio) warmup, deriving num_training_steps from the outer --steps at runtime - model_params_fp32 (default true): keep master weights in FP32 and compute under BF16 autocast like the native N1.7 recipe (fixes optimizer-update numerics vs pure-BF16 params) - weight-decay grouping via transformers get_parameter_names: biases and norm parameters excluded from decay - restore the TF4 lm_head/embedding weight tie so the unused Qwen LM head stays frozen and deduplicated in checkpoints - action_mask kept in native dtype for the masked flow-matching loss - drop_n_last_frames: exclude episode tails that cannot supply a complete action chunk (Isaac sampler behavior) Verification: tests/policies/groot/test_groot_training_optim_contract.py (7 passed) + remaining groot suite 11 passed/5 skipped on RTX PRO 6000 / CUDA 13.3. Note: tests/policies/groot/test_groot_n1_7.py does not collect on the base branch (pre-existing ImportError, fixed in PR #37).	2026-07-02 01:04:23 +02:00
johnnynunez	20c0f07858	feat(groot): activate checkpoint-configured N1.7 raw-state dropout during training Isaac-GR00T applies dual state regularization during fine-tuning: raw-state zeroing driven by the processor sidecar's state_dropout_prob (0.2 for the inspected N1.7 checkpoint) plus encoded-feature dropout. Baseline LeRobot kept the processor in deterministic mode, so the raw-state dropout never activated (RCA Tier-2 contributor to the LeRobot-trained SO-101 failures). - GrootN17PackInputsStep: runtime-only 'training' flag + state_dropout_prob; whole-sample state zeroing gated on torch.is_grad_enabled() so eval and no_grad validation paths are unaffected - sidecar loader reads state_dropout_prob from processor_config.json - state_dropout_prob serializes with the step; the training flag intentionally does not (reloaded pipelines default to eval, re-enabled only when processors are rebuilt with dataset_meta) - _set_groot_preprocessor_training toggles any dataclass step exposing a 'training' field on serialized-pipeline reloads Verification: tests/policies/groot/test_groot_state_dropout.py (4 passed) on RTX PRO 6000 / CUDA 13.3.	2026-07-02 00:54:20 +02:00
Andy Wrenn	da9ce79678	fix(groot): make N1.7 letterbox opt-in	2026-06-30 15:46:56 -07:00
Steven Palma	c74eb20abd	fix(test): add guard	2026-06-30 15:46:56 -07:00
Steven Palma	22c1d0765a	chore(policies): add explicit dataset dependency to gr00t implementation	2026-06-30 15:46:56 -07:00
Steven Palma	73c3a66d51	fix(ci): guard dependency checks	2026-06-30 15:46:56 -07:00
Steven Palma	b422269de4	fix(style): pre-commit	2026-06-30 15:46:56 -07:00
Steven Palma	44b6950f06	chore(policies): add guards, warnings and comments + recover tests n1.5 check	2026-06-30 15:46:56 -07:00
Andy Wrenn	4a3f46d0ec	Format GR00T OSS parity changes	2026-06-28 12:55:42 -07:00
Andy Wrenn	bdc05c89e3	Apply LIBERO action decode override after loading	2026-06-28 12:55:42 -07:00
Andy Wrenn	1fcc100790	Match GR00T N1.7 OSS preprocessing and relative actions	2026-06-28 12:55:42 -07:00
Andy Wrenn	6126a85d60	Guard GR00T relative action stepwise decode	2026-06-28 12:55:42 -07:00
Andy Wrenn	2ed55d2a77	Move GROOT relative stats out of train script	2026-06-28 12:55:42 -07:00
Andy Wrenn	31f7979498	Revert "Reset rollout state after robot episode end" This reverts commit `1322f45aec`.	2026-06-28 12:55:42 -07:00
Andy Wrenn	b8dcc51f35	Reset rollout state after robot episode end	2026-06-28 12:55:42 -07:00
Andy Wrenn	ab351fa3b0	Fix GROOT relative action padding and RTC leftovers	2026-06-28 12:55:42 -07:00
Andy Wrenn	977e00a4e5	Fix GROOT relative action training stats	2026-06-28 12:55:42 -07:00
Andy Wrenn	f25b97936e	Fix GROOT N1.7 relative action stats	2026-06-28 12:55:42 -07:00
Andy Wrenn	0a588064d4	Address GROOT relative action review feedback	2026-06-28 12:55:42 -07:00
Andy Wrenn	679fe3621e	Fix GROOT relative action training stats	2026-06-28 12:55:42 -07:00
Steven Palma	5fe9fe0050	fix(groot): GPU/tensor N1.7 image preprocessing + resize to trained resolution GR00T training was dataloader-bound (0->100->0 GPU-utilization sawtooth). GrootN17VLMEncodeStep ran the Qwen3-VL image processor per frame on PIL images on the single CPU main-loop thread, and that cost is timed inside dataloading_s (preprocessor(batch) runs in the main process, not the dataloader workers), so adding workers cannot hide it. - Feed the torchvision-backed Qwen3-VL processor (C,H,W) uint8 tensors instead of a per-frame Image.fromarray PIL roundtrip, and run resize/normalize/patchify on config.device (GPU) when available. Bit-identical on CPU when no resize is configured; with a resize only the PIL->torchvision bicubic backend differs (<2/255 per pixel). The use_albumentations path stays PIL/cv2; reload on a box without the saved device falls back to CPU. - Default image_target_size/crop to the N1.7 backbone's training geometry (256x256 / 230x230) when a checkpoint ships no image sizing (checkpoint_assets is None, e.g. finetuning nvidia/GR00T-N1.7-3B via repo-id with a new embodiment). Previously image_target_size=None disabled the resize, so full-resolution frames were patchified into ~4.7x more vision tokens than the model was trained on -- inflating dataloading_s (patchify) and update_s (VLM sequence) and skewing the input distribution. Checkpoints that pin their own sizing are honored; the default constants are shared with GR00T_N1_7_DEFAULTS. Net: preprocessing leaves the CPU critical path and the VLM sees the resolution it was trained on -- faster training/inference and a correct train/serve distribution. Affects inference too (shared preprocessor); existing checkpoints still load (backward compatible) but must be retrained to gain the benefits.	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	6ec33dbaef	test(groot): adopt test_groot_lerobot for GR00T N1.7, drop N1.5 The test loaded MODEL_PATH='aractingi/bimanual-handover-groot-10k', an N1.5 checkpoint (config base_model_path=nvidia/GR00T-N1.5-3B, no model_version). On load, model_version defaults to n1.7 while the base path infers n1.5, so the version-consistency guard in GrootConfig.__post_init__ raised ValueError and both test_lerobot_groot_inference and test_lerobot_groot_forward_pass failed. N1.5 is no longer a supported model_version. Adopt the test for N1.7: - MODEL_PATH -> nvidia/GR00T-N1.7-3B (root-level sharded safetensors; loads via GrootPolicy.from_pretrained as a base N1.7 model). - Embodiment tag 'gr1' (N1.5) -> 'gr1_unified' (valid N1.7 tag from the checkpoint embodiment_id.json), via a single EMBODIMENT_TAG constant. - DUMMY_ACTION_HORIZON 16 -> 40 to match N1.7's native action-chunk size. - Docstrings/labels updated to 'GR00T N1.7'. Both tests run and pass on CUDA; full tests/policies/groot/ suite is 73 passed / 0 failed / 0 skipped.	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	628e8fe3b6	test(groot): move parity producer into utils/ package Mirror the tests/policies/pi0_pi05/utils convention: move dump_original_n1_7.py into a tests/policies/groot/utils/ package (with __init__.py) and update all path references in the test docstring/skip-message and the policy README.	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	9db6a8ae0f	docs(groot): drop WHY TWO ENVIRONMENTS block from parity test docstring	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	a9a78f72fe	test(groot): self-contained parity test + in-repo producer + docs - Rename test_groot_n1_7_vs_original.py -> test_groot_vs_original.py - Make the test self-contained: producer script (dump_original_n1_7.py) now lives next to the test; default artifact dir is repo-relative (tests/policies/groot/artifacts/), overridable via GROOT_N1_7_PARITY_DIR. The test only reads artifacts and skips if absent -- it never creates external dirs. - Heavy .npz artifacts (~6-9MB each) are gitignored and regenerated by the producer; never committed. - Drop the verbose 'MULTIPLE EMBODIMENTS' docstring block (kept a one-line note). - Document the parity procedure in the groot policy README (docs/source/policy_groot_README.md). - Rename test fn test_groot_n1_7_get_action_parity -> test_groot_get_action_parity. 9/9 embodiments still pass (max|diff| < 3e-6, fp32 eps).	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	4317508984	test(groot): parametrize N1.7 parity across all checkpoint embodiments Generalize the original-vs-LeRobot N1.7 output-parity test from a single libero_sim case to every embodiment tag in the checkpoint (libero_sim, oxe_droid, real_g1, the real_r1_pro_sharpa family, and the xdof family). Inputs are built generically from checkpoint metadata; the test discovers per-tag .npz artifacts and runs one parametrized case each, loading the LeRobot model once via a fixture. All 9 embodiments match the original to fp32 epsilon (max|diff| < 3e-6), confirming the integration is correct across the model's full embodiment space and not overfit to libero_sim.	2026-06-28 12:55:18 -07:00
nv-sachdevkartik	883ff3eb21	test(groot): add N1.7 original-vs-LeRobot output parity test Verifies the LeRobot GR00T N1.7 integration produces equivalent raw action_pred to NVIDIA Isaac-GR00T for the same checkpoint, inputs, seed, precision (fp32) and attention kernel (SDPA): max|diff|=8.9e-7 on the libero_sim embodiment (GR00T-N1.7-LIBERO/libero_10). The two impls pin incompatible transformers majors (orig 4.57.3 vs LeRobot 5.x) and cannot share a process, so the original outputs + exact collated inputs are produced out-of-process and loaded from an .npz. The test skips on CI / when the checkpoint or artifact are absent.	2026-06-28 12:55:18 -07:00
Andrew Wrenn	87e4460f60	Reconnect GR00T relative action processors	2026-06-28 12:55:17 -07:00
nv-sachdevkartik	1b24d7bc86	removed remaining N1.5 traces	2026-06-28 12:55:17 -07:00
nv-sachdevkartik	b6c910e936	removed n1.5 dependency	2026-06-28 12:55:17 -07:00
Andrew Wrenn	58247ab9bc	Ignore padded GR00T N1.7 RTC prefix rows	2026-06-28 12:55:02 -07:00
Andrew Wrenn	3159f473df	Trim GR00T N1.7 RTC chunks to valid horizon	2026-06-28 12:55:02 -07:00
Andrew Wrenn	bed3747804	Fix GR00T N1.7 RTC action decoding	2026-06-28 12:55:02 -07:00
Andrew Wrenn	60e1474cf6	Allow Groot fake RTC chunk prefetch	2026-06-28 12:55:02 -07:00
Andrew Wrenn	111dceeb8a	Move Groot processor compatibility into Groot loader	2026-06-28 12:55:01 -07:00
Andrew Wrenn	9c26e111d1	Add GR00T N1.7 support Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests. Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>	2026-06-28 12:55:01 -07:00
Caroline Pascal	3dd19d043e	feat(depth maps): adding support for depth in LeRobot (#3644) * feat(depth): add depth quantization helpers and tests * feat(video): add ffv1 to supported codecs * feat(depth): persist depth metadata * feat(depth): extend quantization tools to better fit the encoding/decoding pipeline * feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter * feat(depth): wire StreamingVideoEncoder + writer to depth encoder * feat(depth): wire DatasetReader to decode_depth_frames * feat(cameras/realsense): expose async depth in metric meters * feat(features): route 2D camera shapes to observation.depth.<key> * feat(robots/so_follower): emit + populate depth keys when use_depth * feat(record): plumb DepthEncoderConfig through lerobot-record * feat(viz): render depth observations as rr.DepthImage in Viridis * feat(depth maps writer): adding support for raw depth maps recording with image writer * chore(format): format code * feat(depth shape): ensuring depth maps shape is always including the channel * feat(is_depth): simplifying is_depth nested name + legacy support * fix(stop_event): fixing stop_event race condition in camera classes * fix(plumbing): fixing missing parts in the depth maps pipeline * chore(typos): fixing typos * test(fix): fixing existing tests to still work with latest features * tests(depth): adding new tests for depth integration validation * feat(pix_fmt channels): use PyAV to get the number of channels of pixel formats * feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation. Move the default pixel format for depth in the config file. * fix(pre-commit): fixing mutable default value * fix(info): fixing info metadata update when is_depth_map was set * tests(typos): fixing typos in tests * fix(realsense): fixing typo in realsense serial number * fix(normalization): restricting 255 normalization to non depth/uint8 images only * fix(typo): fixing typo * fix(TIFF): add missing quantization and cleanup for TIFF files * feat(batched dequantization): optimizing dequantize_depth for torch based batched dequantization * feat(tools): adding depth support in LeRobotDataset edition tools * test(aggregate): extending aggregation tests to depth frames * test(cleaning): cleaning up tests * fix(from_video_info): fixing early validation issue in from_video_info * fix(typo): fixing typo * fix(is_depth): adding missing docstrings and is_depth arguments in video decoding functions Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * fix(depth units): fixing depth units output for the realsense cameras * feat(output unit): adding support for output unit specification at dataset reading/training time Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * test(depth): cleaning up depth tests * test(depth encoding): updating and cleaning video/depth encoding tests * chore(format): formatting code * docs(depth): improving depth maps docs * test(fix): fixing depth tests * test(dataset tools): adding missing tests for new dataset edition tools features * chore(format): formatting code * fix(pyav check): fixing PyAV option validation for integer codec options by normalizing numeric values before calling `is_integer()` Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * docs(mermaid): fixing mermaid diagram * fix(rebase): rebase follow up corrections * feat(dataset tools): adding missing docstrings and features for depth fill support in dataset edition tools * docs(docstring): updating docstrings * docs(dataset tools): updating docs * fix(save images): fixing image saving in dataset tools * fix(update video info): fixing update video info logic to match the recording and editing use cases * test(reencode): fixing reencoding monkeypatch * fix(review): add Claude review * chore(format): format code * fix(update video info): ditching the differentiated approaches for video info update - video info are always updated unless for preserved keys. * chore(rebase): fixing rebase merge conflicts * test(visualization): fixing visualization tests * feat(docstrings): adding explicit docstring for encoding parameters. Docstrings will now show up as description in the CLI --help. * feat(mm as default): adding a global DEFAULT_DEPTH_UNIT variable setting mm as default depth unit * fix(RGB <-> camera): renaming camera_encoder to rgb_encoder for clarity * chore(TODO): removing deprecated TODO * doc(write_u16_plane): improving docstrings for write_u16_plane * feat(units): adding constants for depth frames units (m and mm) * fix(spam): replacing spamming warning with a debug log * feat(legacy metadata): adding automatic metadata update for legacy 'video.is_depth_map' feature * fix(copy&reindex): fixing metadata reshaping for single channel frames * fix(ImageNet): excluding depth frames from ImageNet stats * fix(PyAV container seek): fixing initial PyAV container seek to be robust against codec choice * feat(lerobot-dataset-viz): adding support for depth in lerobot-dataset-viz * fix(compress): removing rerun compression for DepthImages * fix(single channel squeeze): fixing single channel squeezing * chore(format): format code * fix(streaming): adding support for dequantization in streaming_dataset.py * refactor(read depth): factorizing depth reading methods for realsense camera and adding support for depth-only usage * chore(renaming): fixing missed RGBEncoderConfig renamings * docs(renaming): reflecting renamings in a clearer way in the docs * chore(annotation): excluding depth from the annotation pipeline * feat(robots): adding depth support in compatible follower robots * feat(LeSadKiwi): excluding LeKiwi from depth support (for now) * chore(fail): removing misplaced file * chore(fail): removing misplaced file * fix(remove ffv1): removing ffv1 as it does not support MP4 * docs(cheat sheet): adding depth and video encoding to the cheat sheet * fix(lossless): tuning depth encoding parameters for lossless depth storage * test(fix): fixing failing tests * depth(ZMQ): excluding ZMQ from depth support * Revert "depth(ZMQ): excluding ZMQ from depth support" This reverts commit `b95cf4e4c2`. * fix(image transforms): excluding depth frames from images transforms * fix(typo): typo * fix(stats): fixing stats computation for depth frames * fix(TIFF vs. pytorch): adding an extra uint16 to float32 conversion for depth maps stored as raw TIFF images * fix(typos): fixing typos * test(dtype): fixing stats computation typing tests --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Wensi Ai <wsai@stanford.edu>	2026-06-27 14:21:21 +02:00
Khalil Meftah	6a788fbdb0	Add inline offline validation with train/eval split (#3824) * refactor(training): rename eval_freq to env_eval_freq - Rename eval_freq to env_eval_freq to distinguish sim environment evaluation from offline loss evaluation. * feat(training): add inline offline validation with train/eval split - Add eval_split config for balanced per-task holdout - Add eval_steps for periodic inline eval loss computation - Add max_eval_samples to cap eval cost * fix(datasets): remap absolute indices in __getitem__ for filtered datasets * fix(train): vectorize eval subset selection for max_eval_samples * fix(datasets): Move the remapping into EpisodeAwareSampler via absolute_to_relative_idx * fix(validation): add eval_split range check and eval_steps warning Validate eval_split is in [0.0, 1.0) to prevent garbage splits from out-of-range values. Raise when eval_steps > 0 but eval_split is 0.0 since no offline eval will run. * fix(train): prepare eval dataloader with accelerator for multi-GPU Prepare eval_dataloader through accelerator.prepare() so eval data is sharded across ranks instead of duplicated. Reduce eval_loss across ranks with mean reduction for consistent logging. * fix(test): rename eval_freq to env_eval_freq for multi-GPU training	2026-06-25 15:31:24 +02:00
Khalil Meftah	c3f180e115	refactor(policies): clean MolmoAct2 to follow EO1/TOPReward patterns (#3724) Align the MolmoAct2 implementation with lerobot codebase conventions: - Rename hf_model/ to molmoact2_hf_model/ - Slim config: move all I/O and runtime logic to modeling - Remove blanket from 8 vendored files, fix 66 lint issues - Deduplicate _hf_token() and _resolve_checkpoint_location() - Make huggingface_hub imports lazy - Remove custom MolmoAct2CosineDecayWithWarmupSchedulerConfig, use base class - Extract 13 static/classmethods from MolmoAct2Policy to free functions - Replace print() with logger in vendored action_tokenizer - Add module docstrings, class docstring, and key method docstrings - Add module-level loggers to modeling and processor - Fix docs: pip to uv install, deduplicate README symlink - Remove shebangs from all files	2026-06-25 14:19:35 +02:00
Steven Palma	b4e454c0ff	feat(utils): display-independent keyboard controls for recording (Wayland / headless / macOS) (#3875) * feat(utils): headless keyboard control * refactor(utils): consolidate keyboard listener creation * fix(rollout): remove import require guard for pynput --------- Co-authored-by: Leo Toff <leo@toff.dev> Co-authored-by: Stefano Maestri <stefano.maestri@javalinux.it> Co-authored-by: Sahil Chande <85823961+SahilChande@users.noreply.github.com> Co-authored-by: Vinayak Agarwal <63502278+Vinayak-Agarwal-2004@users.noreply.github.com> Co-authored-by: Abdul Rahim Mirani <abdulrahimmirani@gmail.com>	2026-06-25 10:58:39 +02:00
Maxime Ellerbach	73782447f2	feat(train): FSDP checkpoint saving (#3810) * feat(train): FSDP checkpoint saving * adding docs for FSDP * adding a test for the fsdp checkpoint path * cleanup * fixing final upload to hub * refactored initial implementation to use torch fsdp api and adding new tests	2026-06-22 13:51:21 +02:00
Caroline Pascal	287c823f13	fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks (#3826) * fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks * tests(test): adding new test	2026-06-16 17:58:59 +02:00
Pepijn	58ccc01508	fix(datasets): enforce one parquet row group per episode in v3 data writes (#3807) * fix(datasets): enforce one parquet row group per episode in v3 data writes LeRobot v3 data shards must hold exactly one row group per episode so a reader can fetch episode i with pq.ParquetFile(path).read_row_group(i) (a byte-range read) instead of loading the whole shard. The recording writer already does this (one write_table per episode); the aggregate and lerobot-annotate re-write paths instead concatenated many episodes and wrote them in one shot, collapsing the file to a single row group. - io_utils: add write_table_one_row_group_per_episode (one ParquetWriter, one write_table per episode; same pattern as the recording writer); to_parquet_with_hf_images embeds images then writes per-episode row groups; to_parquet_one_row_group_per_episode wraps it for plain frames - aggregate: route non-image data writes through the per-episode writer; leave the episodes-metadata parquet untouched (already one row/episode) - annotate: rewrite shards via the per-episode writer instead of a single bulk pq.write_table - tests: invariant coverage through the aggregate (image + video) and annotate paths No change to on-disk schema, paths, naming, rollover thresholds, or compression. Readers stay backward-compatible (old collapsed files load). * Update src/lerobot/datasets/io_utils.py Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update src/lerobot/datasets/io_utils.py Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(datasets): correct indentation and add strict= in row-group helper The web-edited numpy version of write_table_one_row_group_per_episode had an over-indented line (IndentationError, breaking pre-commit + test collection) and a zip() without strict=. Fix both; behaviour unchanged. --------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>	2026-06-16 12:15:48 +02:00
Caroline Pascal	38327fdc84	fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion (#3783) * fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion when videos are not used * fix(docstrings): improving docstrings Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com> --------- Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com>	2026-06-15 17:55:52 +02:00
Steven Palma	d576c59afb	refactor(robots): homogenize bi-manual setups implementations (#3772) * chore(robots): homogenize bi setups * feat(robots): split openarm mini into single and bi * refactor(robots): mixin for bi classes * docs: update docs	2026-06-15 16:28:54 +02:00
Altman	8515d456be	fix(datasets): avoid uint8 overflow in image stats (#3697) * fix(datasets): avoid uint8 overflow in image stats * fix(datasets): promote stats batches dynamically	2026-06-13 12:09:43 +02:00
Mahbod	30790de178	feat(edit-dataset): add `concatenate_videos` opt-out to merge (#3663) * feat(edit-dataset): add `concatenate_videos` opt-out to merge When merging datasets, source mp4s are concatenated into shards capped at `video_files_size_in_mb` (default 200 MB). This is great for dataloader throughput but destroys per-episode (or per-source) video boundaries, which is undesirable when you want to inspect, ship, or reuse the individual mp4s. Add a `concatenate_videos: bool = True` knob plumbed through `MergeConfig` → `merge_datasets` → `aggregate_datasets` → `aggregate_videos`. When False, each source mp4 is copied 1:1 to its own destination mp4 with no re-muxing, so the merge preserves source video boundaries. Usage: lerobot-edit-dataset \ --new_repo_id user/merged \ --operation.type=merge \ --operation.repo_ids "['user/a', 'user/b']" \ --operation.concatenate_videos=false Defaults are unchanged; the dataloader path is unaffected because the `episodes.parquet` `from_timestamp`/`to_timestamp` index keeps working regardless of whether each mp4 holds one or many episodes. * feat(edit-dataset): extend concatenate opt-out to data files Following review, add a concatenate_data flag mirroring concatenate_videos, threaded through MergeConfig, merge_datasets, aggregate_datasets, aggregate_data and append_or_create_parquet_file. Metadata index files still always concatenate. Also trim the verbose docstrings and comments since the names are self-explanatory, and extend the existing merge test to cover data files.	2026-06-12 20:05:04 +02:00
Pepijn	cec8ee0be6	feat: language annotation pipeline (#3471) Steerable annotation pipeline (lerobot-annotate) that populates the language_persistent and language_events columns introduced in PR 1 (#3467) directly into data/chunk-/file-.parquet. This is PR 2 of the three-PR plan: PR 1 (Add extensive language support #3467): schema + DSL + rendering, base of this PR; PR 2 (this PR): annotation pipeline writing into PR 1's columns; PR 3: model with language prediction and runtime. A VLM (Qwen-VL family, served on vLLM) watches each episode's video and emits grounded language annotations: subtasks, plans, memory, task rephrasings, interjections + speech, and per-camera VQA. The pipeline is built for production annotation at scale (single-camera grounding, embedded-frame inputs, a describe-then-segment grounding flow, and a deterministic full-episode coverage guarantee), informed by Scale's dense-captioning findings (representation > sampling, rules > reasoning, model capacity is the biggest lever, two-pass systems compound errors)	2026-06-12 15:12:33 +02:00
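
Note on f53490c15e (train-time random crop): the commit message describes a crop window selected by `crop_position=(y, x)` fractions, with `None` meaning the center crop, and one window sampled per sample and replayed across all camera views. The following is a minimal standard-library sketch of that idea, not the repository's code. The function names `crop_window` and `crop_views`, the nested-list frame layout, and the example `crop_fraction` of 0.9 are assumptions made for illustration only.

```python
import random


def crop_window(height, width, crop_fraction, crop_position=None):
    """Return (top, left, crop_h, crop_w) for a window covering crop_fraction of each side.

    crop_position is a hypothetical (y, x) pair of fractions in [0, 1];
    None selects the deterministic center window.
    """
    crop_h = max(1, int(round(height * crop_fraction)))
    crop_w = max(1, int(round(width * crop_fraction)))
    if crop_position is None:
        top = (height - crop_h) // 2
        left = (width - crop_w) // 2
    else:
        y_frac, x_frac = crop_position
        top = int(round(y_frac * (height - crop_h)))
        left = int(round(x_frac * (width - crop_w)))
    return top, left, crop_h, crop_w


def crop_views(views, crop_fraction, training, rng=None):
    """Crop every view of one sample with the same window.

    In training mode a single random position is drawn and reused for all
    views; otherwise the center window is used.
    """
    position = None
    if training:
        rng = rng or random.Random()
        position = (rng.random(), rng.random())
    cropped = []
    for frame in views:  # each frame is a list of rows (illustrative layout)
        top, left, crop_h, crop_w = crop_window(
            len(frame), len(frame[0]), crop_fraction, position
        )
        cropped.append([row[left:left + crop_w] for row in frame[top:top + crop_h]])
    return cropped
```

Drawing the position once per sample, outside the per-view loop, is what keeps the views spatially consistent with each other in this sketch.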

397 Commits
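
Note on f42cdcf137 (optimizer/scheduler alignment): the commit message describes a cosine schedule whose warmup is `ceil(max_steps * warmup_ratio)` and whose length is derived from the outer `--steps` value instead of a hardcoded 10k. The sketch below shows the shape of such a learning-rate multiplier in plain Python. The function name `cosine_with_warmup` and the example values are assumptions, and the formula is the common linear-warmup plus half-cosine form, not a copy of the repository's or any library's implementation.

```python
import math


def cosine_with_warmup(step, max_steps, warmup_ratio):
    """Learning-rate multiplier in [0, 1] for a given optimizer step.

    Linear warmup over ceil(max_steps * warmup_ratio) steps, then a
    half-cosine decay that reaches 0 exactly at max_steps.
    """
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Example with hypothetical values: an 8000-step run with 25% warmup.
multipliers = [cosine_with_warmup(s, 8000, 0.25) for s in (0, 1000, 2000, 5000, 8000)]
```

Because the decay length is tied to `max_steps`, the multiplier reaches the end of its cosine exactly when training stops, whatever run length is chosen.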