mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-28 13:47:02 +00:00
3dd19d043e
* feat(depth): add depth quantization helpers and tests
* feat(video): add ffv1 to supported codecs
* feat(depth): persist depth metadata
* feat(depth): extend quantization tools to better fit the encoding/decoding pipeline
* feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter
* feat(depth): wire StreamingVideoEncoder + writer to depth encoder
* feat(depth): wire DatasetReader to decode_depth_frames
* feat(cameras/realsense): expose async depth in metric meters
* feat(features): route 2D camera shapes to observation.depth.<key>
* feat(robots/so_follower): emit + populate depth keys when use_depth
* feat(record): plumb DepthEncoderConfig through lerobot-record
* feat(viz): render depth observations as rr.DepthImage in Viridis
* feat(depth maps writer): adding support for raw depth maps recording with image writer
* chore(format): format code
* feat(depth shape): ensuring depth maps shape is always including the channel
* feat(is_depth): simplifying is_depth nested name + legacy support
* fix(stop_event): fixing stop_event race condition in camera classes
* fix(plumbing): fixing missing parts in the depth maps pipeline
* chore(typos): fixing typos
* test(fix): fixing existing tests to still work with latest features
* tests(depth): adding new tests for depth integration validation
* feat(pix_fmt channels): use PyAV to get the number of channels of pixel formats
* feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation. Move the default pixel format for depth in the config file.
* fix(pre-commit): fixing mutable default value
* fix(info): fixing info metadata update when is_depth_map was set
* tests(typos): fixing typos in tests
* fix(realsense): fixing typo in realsense serial number
* fix(normalization): restricting 255 normalization to non depth/uint8 images only
* fix(typo): fixing typo
* fix(TIFF): add missing quantization and cleanup for TIFF files
* feat(batched dequantization): optimizing dequantize_depth for torch based batched dequantization
* feat(tools): adding depth support in LeRobotDataset edition tools
* test(aggregate): extending aggregation tests to depth frames
* test(cleaning): cleaning up tests
* fix(from_video_info): fixing early validation issue in from_video_info
* fix(typo): fixing typo
* fix(is_depth): adding missing docstrings and is_depth arguments in video decoding functions
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* fix(depth units): fixing depth units output for the realsense cameras
* feat(output unit): adding support for output unit specification at dataset reading/training time
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* test(depth): cleaning up depth tests
* test(depth encoding): updating and cleaning video/depth encoding tests
* chore(format): formatting code
* docs(depth): improving depth maps docs
* test(fix): fixing depth tests
* test(dataset tools): adding missing tests for new dataset edition tools features
* chore(format): formatting code
* fix(pyav check): fixing PyAV option validation for integer codec options by normalizing
numeric values before calling `is_integer()`
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* docs(mermaid): fixing mermaid diagram
* fix(rebase): rebase follow up corrections
* feat(dataset tools): adding missing docstrings and features for depth fill support in dataset edition tools
* docs(docstring): updating docstrings
* docs(dataset tools): updating docs
* fix(save images): fixing image saving in dataset tools
* fix(update video info): fixing update video info logic to match the recording and editing use cases
* test(reencode): fixing reencoding monkeypatch
* fix(review): add Claude review
* chore(format): format code
* fix(update video info): ditching the differentiated approaches for video info update - video info is always updated except for preserved keys.
* chore(rebase): fixing rebase merge conflicts
* test(visualization): fixing visualization tests
* feat(docstrings): adding explicit docstring for encoding parameters. Docstrings will now show up as description in the CLI --help.
* feat(mm as default): adding a global DEFAULT_DEPTH_UNIT variable setting mm as default depth unit
* fix(RGB <-> camera): renaming camera_encoder to rgb_encoder for clarity
* chore(TODO): removing deprecated TODO
* doc(write_u16_plane): improving docstrings for write_u16_plane
* feat(units): adding constants for depth frames units (m and mm)
* fix(spam): replacing spamming warning with a debug log
* feat(legacy metadata): adding automatic metadata update for legacy 'video.is_depth_map' feature
* fix(copy&reindex): fixing metadata reshaping for single channel frames
* fix(ImageNet): excluding depth frames from ImageNet stats
* fix(PyAV container seek): fixing initial PyAV container seek to be robust against codec choice
* feat(lerobot-dataset-viz): adding support for depth in lerobot-dataset-viz
* fix(compress): removing rerun compression for DepthImages
* fix(single channel squeeze): fixing single channel squeezing
* chore(format): format code
* fix(streaming): adding support for dequantization in streaming_dataset.py
* refactor(read depth): factorizing depth reading methods for realsense camera and adding support for depth-only usage
* chore(renaming): fixing missed RGBEncoderConfig renamings
* docs(renaming): reflecting renamings in a clearer way in the docs
* chore(annotation): excluding depth from the annotation pipeline
* feat(robots): adding depth support in compatible follower robots
* feat(LeSadKiwi): excluding LeKiwi from depth support (for now)
* chore(fail): removing misplaced file
* chore(fail): removing misplaced file
* fix(remove ffv1): removing ffv1 as it does not support MP4
* docs(cheat sheet): adding depth and video encoding to the cheat sheet
* fix(lossless): tuning depth encoding parameters for lossless depth storage
* test(fix): fixing failing tests
* depth(ZMQ): excluding ZMQ from depth support
* Revert "depth(ZMQ): excluding ZMQ from depth support"
This reverts commit b95cf4e4c2.
* fix(image transforms): excluding depth frames from images transforms
* fix(typo): typo
* fix(stats): fixing stats computation for depth frames
* fix(TIFF vs. pytorch): adding an extra uint16 to float32 conversion for depth maps stored as raw TIFF images
* fix(typos): fixing typos
* test(dtype): fixing stats computation typing tests
---------
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Wensi Ai <wsai@stanford.edu>
248 lines
10 KiB
Python
"""Tests for the depth-integration feature.

Covers:
- ``depth_utils`` quantize/dequantize round-trips and backend agreement.
- Image-writer support for single-channel depth.
- Hardware-feature → depth flag routing.
- Feature-to-file-format routing through the dataset writer.

Depth metadata detection on ``LeRobotDatasetMetadata.depth_keys`` lives in
``test_dataset_metadata.py``. Depth video encoding/decoding lives in
``test_video_encoding.py``.
"""

from pathlib import Path

import pytest

pytest.importorskip("av", reason="av is required (install lerobot[dataset])")

import av
import numpy as np
import PIL.Image
import torch

from lerobot.configs import DepthEncoderConfig
from lerobot.configs.video import (
    DEFAULT_DEPTH_MAX,
    DEFAULT_DEPTH_MIN,
    DEPTH_METER_UNIT,
    DEPTH_MILLIMETER_UNIT,
    DEPTH_QMAX,
)
from lerobot.datasets.depth_utils import dequantize_depth, quantize_depth
from lerobot.datasets.image_writer import image_array_to_pil_image, write_image
from tests.fixtures.constants import (
    DEFAULT_FPS,
    DUMMY_CAMERA_FEATURES,
    DUMMY_CAMERA_FEATURES_WITH_DEPTH,
    DUMMY_CHW,
    DUMMY_DEPTH_CAMERA_FEATURES,
    DUMMY_REPO_ID,
)
from tests.fixtures.dataset_factories import add_frames

_, H, W = DUMMY_CHW


def _depth_metres_ramp() -> np.ndarray:
    """Linearly-spaced float32 depth in metres covering the default range."""
    return np.linspace(DEFAULT_DEPTH_MIN, DEFAULT_DEPTH_MAX, H * W, dtype=np.float32).reshape(H, W)


# ── 1. Quantize / dequantize round-trips ──────────────────────────────


class TestQuantizeDequantize:
    """Numerical contract of ``quantize_depth`` / ``dequantize_depth``."""

    @pytest.mark.parametrize("use_log", [False, True])
    @pytest.mark.parametrize("output_unit", [DEPTH_METER_UNIT, DEPTH_MILLIMETER_UNIT])
    @pytest.mark.parametrize("output_channel_last", [False, True])
    def test_roundtrip(self, use_log, output_unit, output_channel_last):
        """quantize → dequantize recovers depth; layout and unit are honored."""
        depth = _depth_metres_ramp()
        quantized = quantize_depth(depth, use_log=use_log, video_backend=None)
        recovered = dequantize_depth(
            quantized,
            use_log=use_log,
            output_unit=output_unit,
            output_tensor=False,
            output_channel_last=output_channel_last,
        )

        expected_shape = (H, W, 1) if output_channel_last else (1, H, W)
        assert recovered.shape == expected_shape

        recovered_m = recovered.astype(np.float32)
        if output_unit == DEPTH_MILLIMETER_UNIT:
            recovered_m = recovered_m / 1000.0
        recovered_2d = recovered_m[..., 0] if output_channel_last else recovered_m[0]

        if use_log:
            # Log mode: tighter near-range error than far-range (the whole point).
            near = depth < 1.0
            far = depth > 8.0
            err_near = np.abs(recovered_2d[near] - depth[near])
            err_far = np.abs(recovered_2d[far] - depth[far])
            assert err_near.mean() < err_far.mean()
        else:
            # Linear mode: bounded by quant step + 1 mm of unit-conversion rounding.
            tol = (DEFAULT_DEPTH_MAX - DEFAULT_DEPTH_MIN) / DEPTH_QMAX + 1e-3
            np.testing.assert_allclose(recovered_2d, depth, atol=tol)

    @pytest.mark.parametrize("use_log", [False, True])
    @pytest.mark.parametrize("output_unit", [DEPTH_METER_UNIT, DEPTH_MILLIMETER_UNIT])
    def test_numpy_torch_agree(self, use_log, output_unit):
        """Batched torch path produces the same values as the numpy path."""
        batch_size = 3
        per_frame = np.linspace(0, DEPTH_QMAX, H * W, dtype=np.uint16).reshape(H, W)
        batch_np = np.broadcast_to(per_frame[None, None, ...], (batch_size, 1, H, W)).copy()
        batch_t = torch.from_numpy(batch_np.astype(np.int32))  # torch.uint16 support is patchy.

        ref = dequantize_depth(batch_np, use_log=use_log, output_unit=output_unit, output_tensor=False)
        out = dequantize_depth(batch_t, use_log=use_log, output_unit=output_unit, output_tensor=True)

        assert isinstance(out, torch.Tensor)
        assert out.shape == (batch_size, 1, H, W)
        # ``m``: float32 noise (~10 µm in log mode, after ``exp``), still 200× below the ~2 mm quant step.
        # ``mm`` + tensor stays in float32 (no uint16 round-trip), so allow 1 mm slop.
        atol = 1e-5 if output_unit == DEPTH_METER_UNIT else 1.0
        np.testing.assert_allclose(out.cpu().numpy().astype(np.float64), ref.astype(np.float64), atol=atol)

    @pytest.mark.parametrize(
        "input_shape,output_shape",
        [
            ((H, W), (1, H, W)),
            ((1, H, W), (1, H, W)),
            ((H, W, 1), (1, H, W)),
            ((3, 1, H, W), (3, 1, H, W)),
            ((3, H, W, 1), (3, 1, H, W)),
        ],
    )
    def test_input_layouts_accepted(self, input_shape, output_shape):
        """All documented input layouts decode to the channel-first default."""
        quantized = np.full(input_shape, DEPTH_QMAX // 2, dtype=np.uint16)
        out = dequantize_depth(quantized, output_unit=DEPTH_METER_UNIT, output_tensor=False)
        assert out.shape == output_shape

    def test_pyav_frame_roundtrip(self):
        """quantize → av.VideoFrame → dequantize works."""
        depth = _depth_metres_ramp()
        frame = quantize_depth(depth, use_log=False, video_backend="pyav")
        assert isinstance(frame, av.VideoFrame)

        recovered = dequantize_depth(frame, use_log=False, output_unit=DEPTH_METER_UNIT, output_tensor=False)
        assert recovered.shape == (1, H, W)
        tol = (DEFAULT_DEPTH_MAX - DEFAULT_DEPTH_MIN) / DEPTH_QMAX + 1e-3
        np.testing.assert_allclose(recovered[0], depth, atol=tol)

    def test_invalid_log_params_raises(self):
        with pytest.raises(ValueError, match=r"depth_min \+ shift must be positive"):
            quantize_depth(_depth_metres_ramp(), depth_min=1.0, shift=-2.0, use_log=True, video_backend=None)


# ── 2. Image writer depth support ─────────────────────────────────────


class TestImageWriterDepth:
    """``image_array_to_pil_image`` and ``write_image`` for depth maps."""

    @pytest.mark.parametrize("dtype,expected_mode", [(np.uint16, "I;16"), (np.float32, "F")])
    @pytest.mark.parametrize("shape", [(H, W), (H, W, 1), (1, H, W)])
    def test_pil_depth_modes_and_squeeze(self, dtype, expected_mode, shape):
        """Single-channel depth converts to PIL with the right mode and (W, H) size."""
        arr = np.zeros(shape, dtype=dtype)
        img = image_array_to_pil_image(arr)
        assert img.mode == expected_mode
        assert img.size == (W, H)

    def test_write_image_tiff_roundtrip(self, tmp_path):
        """uint16 depth round-trips through .tiff."""
        arr = np.arange(H * W, dtype=np.uint16).reshape(H, W)
        fpath = tmp_path / "depth.tiff"
        write_image(arr, fpath)
        with PIL.Image.open(fpath) as loaded:
            recovered = np.array(loaded)
        np.testing.assert_array_equal(recovered, arr)


# ── 3. Hardware-feature → depth flag ──────────────────────────────────


class TestHwToDatasetFeaturesDepth:
    """``hw_to_dataset_features`` flags single-channel cameras as depth."""

    @pytest.mark.parametrize("channels,is_depth", [(1, True), (3, False)])
    def test_depth_marker_by_channels(self, channels, is_depth):
        from lerobot.utils.feature_utils import hw_to_dataset_features

        features = hw_to_dataset_features({"cam": (480, 640, channels)}, prefix="observation")
        assert features["observation.images.cam"]["info"]["is_depth_map"] is is_depth

    def test_invalid_channel_count_raises(self):
        from lerobot.utils.feature_utils import hw_to_dataset_features

        with pytest.raises(ValueError, match="Expected a 3-tuple"):
            hw_to_dataset_features({"cam": (480, 640, 2)}, prefix="observation")


# ── 4. Feature-to-file-format routing ────────────────────────────────


# Keys derived from DUMMY_CAMERA_FEATURES_WITH_DEPTH; pick one RGB and the depth camera.
RGB_KEY = next(iter(DUMMY_CAMERA_FEATURES))
DEPTH_KEY = next(iter(DUMMY_DEPTH_CAMERA_FEATURES))


class TestFeatureFileRouting:
    """Depth vs RGB features route to the correct file format."""

    NUM_FRAMES = 5

    def test_image_mode_depth_tiff_rgb_png(self, tmp_path, features_factory):
        """Without video encoding: depth → .tiff, RGB → .png."""
        from lerobot.datasets.lerobot_dataset import LeRobotDataset

        features = features_factory(camera_features=DUMMY_CAMERA_FEATURES_WITH_DEPTH, use_videos=False)
        dataset = LeRobotDataset.create(
            repo_id=DUMMY_REPO_ID,
            fps=DEFAULT_FPS,
            features=features,
            root=tmp_path / "ds",
            use_videos=False,
        )

        add_frames(dataset, num_frames=self.NUM_FRAMES)

        buf = dataset.writer.episode_buffer
        assert all(Path(p).suffix == ".tiff" for p in buf[DEPTH_KEY])
        assert all(Path(p).suffix == ".png" for p in buf[RGB_KEY])

        dataset.save_episode()
        dataset.finalize()

    def test_video_mode_depth_uses_depth_encoder(self, tmp_path, features_factory):
        """With streaming video encoding: depth → DepthEncoderConfig, RGB does not."""
        from lerobot.datasets.lerobot_dataset import LeRobotDataset

        features = features_factory(camera_features=DUMMY_CAMERA_FEATURES_WITH_DEPTH, use_videos=True)
        dataset = LeRobotDataset.create(
            repo_id=DUMMY_REPO_ID,
            fps=DEFAULT_FPS,
            features=features,
            root=tmp_path / "ds",
            use_videos=True,
            streaming_encoding=True,
        )

        add_frames(dataset, num_frames=self.NUM_FRAMES)

        encoder = dataset.writer._streaming_encoder
        assert encoder is not None
        assert isinstance(encoder._threads[DEPTH_KEY].video_encoder, DepthEncoderConfig)
        assert not isinstance(encoder._threads[RGB_KEY].video_encoder, DepthEncoderConfig)

        dataset.save_episode()
        dataset.finalize()
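
The linear-mode tolerance used in the round-trip tests above, `(max - min) / qmax`, can be illustrated with a minimal standalone sketch. This assumes a plain linear mapping onto an integer grid. The helper names (`quantize_linear`, `dequantize_linear`) and the constants below are hypothetical; they are not the `lerobot.datasets.depth_utils` implementation or its defaults.

```python
import numpy as np

# Hypothetical values chosen for illustration only; the real defaults are
# defined in lerobot.configs.video and may differ.
DEPTH_MIN_M = 0.0
DEPTH_MAX_M = 10.0
QMAX = 4095


def quantize_linear(depth_m: np.ndarray) -> np.ndarray:
    """Map metric depth in [DEPTH_MIN_M, DEPTH_MAX_M] to integers in [0, QMAX]."""
    clipped = np.clip(depth_m, DEPTH_MIN_M, DEPTH_MAX_M)
    scaled = (clipped - DEPTH_MIN_M) / (DEPTH_MAX_M - DEPTH_MIN_M)
    return np.round(scaled * QMAX).astype(np.uint16)


def dequantize_linear(quantized: np.ndarray) -> np.ndarray:
    """Inverse mapping back to float32 metres."""
    scaled = quantized.astype(np.float32) / QMAX
    return scaled * (DEPTH_MAX_M - DEPTH_MIN_M) + DEPTH_MIN_M


# A ramp covering the full range, similar in spirit to the test helper.
depth = np.linspace(DEPTH_MIN_M, DEPTH_MAX_M, 64 * 96, dtype=np.float32).reshape(64, 96)
recovered = dequantize_linear(quantize_linear(depth))

# Rounding to the nearest level loses at most half a step, so one full step
# is a safe upper bound on the absolute round-trip error.
step = (DEPTH_MAX_M - DEPTH_MIN_M) / QMAX
max_err = float(np.abs(recovered - depth).max())
```

Under these assumptions `max_err` stays below `step`, which is why a tolerance of one quantization step (plus a small allowance for unit conversion) is a reasonable bound for a linear round trip.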