
1542 Commits

Author SHA1 Message Date
CarolinePascal 76f79f3955 docs(depth): improving depth maps docs 2026-06-16 18:02:58 +02:00
CarolinePascal 9e994baa04 chore(format): formatting code 2026-06-16 18:02:56 +02:00
CarolinePascal 6fd911ebb9 test(depth encoding): updating and cleaning video/depth encoding tests 2026-06-16 18:02:31 +02:00
CarolinePascal f712698272 test(depth): cleaning up depth tests 2026-06-16 18:02:31 +02:00
CarolinePascal c2416ecbcb feat(output unit): adding support for output unit specification at dataset reading/training time
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
2026-06-16 18:02:31 +02:00
CarolinePascal 6aa50cc1e5 fix(depth units): fixing depth units output for the realsense cameras 2026-06-16 18:02:31 +02:00
CarolinePascal e17adce3ba fix(is_depth): adding missing docstrings and is_depth arguments in video decoding functions
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
2026-06-16 18:02:31 +02:00
CarolinePascal f7010ff66c fix(typo): fixing typo 2026-06-16 18:02:31 +02:00
CarolinePascal f7ee453de7 fix(from_video_info): fixing early validation issue in from_video_info 2026-06-16 18:02:31 +02:00
CarolinePascal ca7168f413 test(cleaning): cleaning up tests 2026-06-16 18:02:28 +02:00
CarolinePascal ec6264d768 test(aggregate): extending aggregation tests to depth frames 2026-06-16 18:01:55 +02:00
CarolinePascal d93a58a8b8 feat(tools): adding depth support in LeRobotDataset editing tools 2026-06-16 18:01:50 +02:00
CarolinePascal 92497dfcd8 feat(batched dequantization): optimizing dequantize_depth for torch based batched dequantization 2026-06-16 18:01:18 +02:00
CarolinePascal 263108d6c1 fix(TIFF): add missing quantization and cleanup for TIFF files 2026-06-16 18:01:18 +02:00
CarolinePascal a925d20ce4 fix(typo): fixing typo 2026-06-16 18:01:18 +02:00
CarolinePascal 1f024ea3bf fix(normalization): restricting 255 normalization to non depth/uint8 images only 2026-06-16 18:01:18 +02:00
CarolinePascal d5f67cc7fc fix(realsense): fixing typo in realsense serial number 2026-06-16 18:01:18 +02:00
CarolinePascal 9ab8c98494 tests(typos): fixing typos in tests 2026-06-16 18:01:18 +02:00
CarolinePascal a561183442 fix(info): fixing info metadata update when is_depth_map was set 2026-06-16 18:01:18 +02:00
CarolinePascal 305b8d64b2 fix(pre-commit): fixing mutable default value 2026-06-16 18:01:18 +02:00
CarolinePascal 0a624a5cf5 feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation. Move the default pixel format for depth into the config file. 2026-06-16 18:01:18 +02:00
CarolinePascal d044ead377 feat(pix_fmt channels): use PyAV to get the number of channels of pixel formats 2026-06-16 18:01:18 +02:00
CarolinePascal e425fcb61a tests(depth): adding new tests for depth integration validation 2026-06-16 18:01:17 +02:00
CarolinePascal f08a9aea71 test(fix): fixing existing tests to still work with latest features 2026-06-16 18:01:17 +02:00
CarolinePascal 7d97b55cc4 chore(typos): fixing typos 2026-06-16 18:01:17 +02:00
CarolinePascal edbd8c6f82 fix(plumbing): fixing missing parts in the depth maps pipeline 2026-06-16 18:01:17 +02:00
CarolinePascal 615954b80b fix(stop_event): fixing stop_event race condition in camera classes 2026-06-16 18:01:17 +02:00
CarolinePascal 1c0fdfdb4b feat(is_depth): simplifying is_depth nested name + legacy support 2026-06-16 18:01:17 +02:00
CarolinePascal 1c3ebd475f feat(depth shape): ensuring depth maps shape always includes the channel 2026-06-16 18:01:17 +02:00
CarolinePascal c655814788 chore(format): format code 2026-06-16 18:01:17 +02:00
CarolinePascal a72ab14f89 feat(depth maps writer): adding support for raw depth maps recording with image writer 2026-06-16 18:01:17 +02:00
CarolinePascal 882074d707 feat(viz): render depth observations as rr.DepthImage in Viridis 2026-06-16 18:01:17 +02:00
CarolinePascal 4ae2f9f375 feat(record): plumb DepthEncoderConfig through lerobot-record 2026-06-16 18:01:17 +02:00
CarolinePascal 26099b6e03 feat(robots/so_follower): emit + populate depth keys when use_depth 2026-06-16 18:01:16 +02:00
CarolinePascal 6b395dfb24 feat(features): route 2D camera shapes to observation.depth.<key> 2026-06-16 18:01:16 +02:00
CarolinePascal 1cbabfe9a4 feat(cameras/realsense): expose async depth in metric meters 2026-06-16 18:01:16 +02:00
CarolinePascal 4744f4b913 feat(depth): wire DatasetReader to decode_depth_frames 2026-06-16 18:01:16 +02:00
CarolinePascal 9568e68b28 feat(depth): wire StreamingVideoEncoder + writer to depth encoder 2026-06-16 18:01:16 +02:00
CarolinePascal 10941c31f6 feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter 2026-06-16 18:01:16 +02:00
CarolinePascal a6882a048a feat(depth): extend quantization tools to better fit the encoding/decoding pipeline 2026-06-16 18:01:16 +02:00
CarolinePascal eb2b7d6dc3 feat(depth): persist depth metadata 2026-06-16 18:01:16 +02:00
CarolinePascal f7f7b8c7f8 feat(video): add ffv1 to supported codecs 2026-06-16 18:01:16 +02:00
CarolinePascal d58a324da4 feat(depth): add depth quantization helpers and tests 2026-06-16 18:01:16 +02:00
Caroline Pascal 287c823f13 fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks (#3826)
* fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks

* tests(test): adding new test
2026-06-16 17:58:59 +02:00
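
The shallow-copy leak named in the commit above can be illustrated with a minimal sketch. The `features` dictionary here is a hypothetical stand-in for a dataset feature spec, not the actual LeRobot structure; only the standard-library `copy` module is used.

```python
import copy

# Hypothetical feature spec: a nested dict of per-key metadata (assumed shape).
features = {
    "observation.state": {"dtype": "float32", "shape": [6], "names": ["j1"]},
}

shallow = dict(features)        # copies the outer dict only; inner dicts are shared
deep = copy.deepcopy(features)  # recursively copies every nested container

# Mutating through the shallow copy leaks into the original spec.
shallow["observation.state"]["shape"].append(1)

# Mutating through the deep copy leaves the original untouched.
deep["observation.state"]["names"].append("j2")

print(features["observation.state"]["shape"])  # changed by the shallow copy
print(features["observation.state"]["names"])  # unchanged by the deep copy
```

The same reasoning applies to any nested configuration handed to more than one consumer: a deep copy is the safer default unless sharing is intended.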
Pepijn 58ccc01508 fix(datasets): enforce one parquet row group per episode in v3 data writes (#3807)
* fix(datasets): enforce one parquet row group per episode in v3 data writes

LeRobot v3 data shards must hold exactly one row group per episode so a
reader can fetch episode i with pq.ParquetFile(path).read_row_group(i)
(a byte-range read) instead of loading the whole shard. The recording
writer already does this (one write_table per episode); the aggregate
and lerobot-annotate re-write paths instead concatenated many episodes
and wrote them in one shot, collapsing the file to a single row group.

- io_utils: add write_table_one_row_group_per_episode (one ParquetWriter,
  one write_table per episode — same pattern as the recording writer);
  to_parquet_with_hf_images embeds images then writes per-episode row
  groups; to_parquet_one_row_group_per_episode wraps it for plain frames
- aggregate: route non-image data writes through the per-episode writer;
  leave the episodes-metadata parquet untouched (already one row/episode)
- annotate: rewrite shards via the per-episode writer instead of a single
  bulk pq.write_table
- tests: invariant coverage through the aggregate (image + video) and
  annotate paths

No change to on-disk schema, paths, naming, rollover thresholds, or
compression. Readers stay backward-compatible (old collapsed files load).

* Update src/lerobot/datasets/io_utils.py

Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update src/lerobot/datasets/io_utils.py

Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(datasets): correct indentation and add strict= in row-group helper

The web-edited numpy version of write_table_one_row_group_per_episode had an
over-indented line (IndentationError, breaking pre-commit + test collection)
and a zip() without strict=. Fix both; behaviour unchanged.

---------

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
2026-06-16 12:15:48 +02:00
Caroline Pascal 38327fdc84 fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion (#3783)
* fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion when videos are not used

* fix(docstrings): improving docstrings

Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com>

---------

Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com>
2026-06-15 17:55:52 +02:00
Steven Palma 9555efc02c chore(dependencies): update uv.lock (#3595)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-06-15 16:29:44 +02:00
Steven Palma d576c59afb refactor(robots): homogenize bi-manual setups implementations (#3772)
* chore(robots): homogenize bi setups

* feat(robots): split openarm mini into single and bi

* refactor(robots): mixin for bi classes

* docs: update docs
2026-06-15 16:28:54 +02:00
Altman 8515d456be fix(datasets): avoid uint8 overflow in image stats (#3697)
* fix(datasets): avoid uint8 overflow in image stats

* fix(datasets): promote stats batches dynamically
2026-06-13 12:09:43 +02:00
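
The kind of `uint8` overflow named in the commit above can be reproduced with a small NumPy sketch. The log does not show how the statistics are actually computed or fixed, so the squared-value accumulation and the promotion to `float64` below are assumptions chosen for illustration.

```python
import numpy as np

# A toy batch of uint8 "images" where every pixel equals 200.
batch = np.full((4, 2, 2), 200, dtype=np.uint8)

# Element-wise arithmetic stays in uint8 and wraps modulo 256 without warning.
wrapped_sq = batch * batch

# Promoting to a wider dtype first keeps the true values.
promoted_sq = batch.astype(np.float64) ** 2

print(wrapped_sq.dtype, wrapped_sq.mean())
print(promoted_sq.dtype, promoted_sq.mean())
```

Any second-moment statistic built from `wrapped_sq` would be silently wrong, which is why promoting the batch before accumulating is a common remedy.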
Mahbod 30790de178 feat(edit-dataset): add concatenate_videos opt-out to merge (#3663)
* feat(edit-dataset): add `concatenate_videos` opt-out to merge

When merging datasets, source mp4s are concatenated into shards capped at
`video_files_size_in_mb` (default 200 MB). This is great for dataloader
throughput but destroys per-episode (or per-source) video boundaries,
which is undesirable when you want to inspect, ship, or reuse the
individual mp4s.

Add a `concatenate_videos: bool = True` knob plumbed through
`MergeConfig` → `merge_datasets` → `aggregate_datasets` → `aggregate_videos`.
When False, each source mp4 is copied 1:1 to its own destination mp4 with
no re-muxing, so the merge preserves source video boundaries.

Usage:

    lerobot-edit-dataset \
        --new_repo_id user/merged \
        --operation.type=merge \
        --operation.repo_ids "['user/a', 'user/b']" \
        --operation.concatenate_videos=false

Defaults are unchanged; the dataloader path is unaffected because the
`episodes.parquet` `from_timestamp`/`to_timestamp` index keeps working
regardless of whether each mp4 holds one or many episodes.

* feat(edit-dataset): extend concatenate opt-out to data files

Following review, add a concatenate_data flag mirroring concatenate_videos,
threaded through MergeConfig, merge_datasets, aggregate_datasets, aggregate_data
and append_or_create_parquet_file. Metadata index files still always concatenate.

Also trim the verbose docstrings and comments since the names are
self-explanatory, and extend the existing merge test to cover data files.
2026-06-12 20:05:04 +02:00
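
The difference between the default merge and the `concatenate_videos=false` opt-out can be sketched as a file-grouping plan. The function `plan_video_shards` and its greedy packing rule are hypothetical; the log only states that shards are capped at `video_files_size_in_mb` (default 200 MB) and that the opt-out copies each source file 1:1.

```python
def plan_video_shards(source_sizes_mb, max_shard_mb=200, concatenate=True):
    """Return shards as lists of source indices (hypothetical planning helper)."""
    if not concatenate:
        # Opt-out: every source file maps to its own destination file.
        return [[i] for i in range(len(source_sizes_mb))]

    shards, current, current_size = [], [], 0.0
    for i, size in enumerate(source_sizes_mb):
        # Assumed rule: start a new shard when adding this file would exceed the cap.
        if current and current_size + size > max_shard_mb:
            shards.append(current)
            current, current_size = [], 0.0
        current.append(i)
        current_size += size
    if current:
        shards.append(current)
    return shards


sizes = [120, 60, 50, 90]
merged = plan_video_shards(sizes)
separate = plan_video_shards(sizes, concatenate=False)
print(merged)
print(separate)
```

With concatenation, episode boundaries inside a shard survive only through the timestamp index; with the opt-out, they also survive as file boundaries.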