Compare commits

...

10 Commits

Author SHA1 Message Date
CarolinePascal e36b0368d4 tests(update): updating tests 2026-07-03 13:49:38 +02:00
CarolinePascal 67b18d87b2 fix(debug log): avoiding spamming the warning log by using a debug log 2026-07-03 13:37:02 +02:00
Mahbod 98052e5f6e feat(datasets): warn when skipping stats for zero-width features
Per review, log a warning when compute_episode_stats skips a feature with a
zero-width shape, so users know stats were intentionally not computed for it.
2026-07-03 13:35:22 +02:00
Mahbod f59260f4aa fix(datasets): skip zero-width features in compute_episode_stats
`LeRobotDataset.save_episode()` raised
`ValueError: cannot reshape array of size 0 into shape (0)` whenever a
declared non-string feature had a zero-width dimension (e.g. `shape=(0,)`).
The root cause was `compute_episode_stats` running stats on every
non-string/language feature, then `RunningQuantileStats.update` calling
`batch.reshape(-1, batch.shape[-1])` on the empty array.

Skip features whose declared `shape` contains a zero dim, mirroring the
existing skip for `string` / `language` dtype features.

Fixes #3654
2026-07-03 13:35:22 +02:00
Mahbod fc262fbc06 fix(datasets): allow zero-width features in get_hf_features_from_features
Setting a 1-D feature with shape=(0,) builds datasets.Sequence(length=0, ...),
which pyarrow rejects with ArrowInvalid: list_size needs to be a strict
positive integer when datasets.Dataset.from_dict(...) is called inside
save_episode. Use length=-1 (variable-length) for zero-width 1-D shapes.

Fixes the second half of #3654 (the first half is #3664, in compute_episode_stats).
2026-07-03 13:35:22 +02:00
Nikodem Bartnik 911734ec9c Docs/improve HF jobs documentation (#3909)
* improve hf jobs docs

* Update docs/source/hardware_guide.mdx

Co-authored-by: Nicolas Rabault <rabault.nicolas@gmail.com>
Signed-off-by: Nikodem Bartnik <39432165+NikodemBartnik@users.noreply.github.com>

---------

Signed-off-by: Nikodem Bartnik <39432165+NikodemBartnik@users.noreply.github.com>
Co-authored-by: Nicolas Rabault <rabault.nicolas@gmail.com>
2026-07-03 11:39:16 +02:00
Pepijn 07285677a3 fix(train): drive Accelerate mixed precision from policy.dtype (#3912)
* fix(train): drive Accelerate mixed precision from policy.dtype

`accelerator.autocast()` was always a no-op because `mixed_precision`
was never set, so `--policy.dtype=bfloat16` only cast the model params
(via the policy) while autocast-eligible ops still ran in fp32/tf32.

Map the active policy's `dtype` onto Accelerate's `mixed_precision`
(bfloat16 -> bf16, float16 -> fp16, float32 -> no) so autocast is active
for bf16/fp16 and stays full precision for float32. Policies without a
string `dtype` field fall back to Accelerate's launcher default, so
existing behavior is preserved.

* style(train): condense mixed-precision comment to one line
2026-07-02 19:15:19 +02:00
Caroline Pascal 7ae12124b0 fix(save codec options): making sure codec options are always set via set_if (#3910)
* fix(save codec options): making sure codec options are always safely set through `set_if`

* tests(update): updating tests
2026-07-02 15:29:14 +02:00
Caroline Pascal c746ca2df2 fix(depth unit): adding input depth unit storage in the dataset metadata (#3899)
* fix(depth unit): storing raw depth units in the dataset metadata for correct depth statistics and depth raw frames handling. The unit is stored as a string ("m","mm") under "depth_unit" at the same level as "is_depth_map". Unit is inferred from the depth frame type.

* feat(raw frame unit): adapting dataset reader so that raw depth frames are scaled according to the requested unit

* feat(stats units): rescaling stats when loading a dataset so that the stats are given in the requested unit

* tests(unit): adapting and extending depth tests to units manipulations

* chore(format): formatting code

* feat(warning): adding a warning when depth unit is not specified in the dataset

* chore(infer_depth_unit): moving the depth unit inference utility to a more accessible location

* feat(rerun unit): adding correct depth unit display for rerun (foxglove does not support units yet)

* feat(unit getter): adding a proper output_depth_unit getter to LeRobotDataset for cleaner integration

* fix(streaming dataset): extending support for depth units to streaming datasets

* test(rerun): fixing rerun tests
2026-07-02 11:53:13 +02:00
Caroline Pascal b961d2a8c5 feat(libaom-av1): adding support for libaom-av1 codec (#3898) 2026-07-02 11:03:41 +02:00
21 changed files with 302 additions and 116 deletions
+9 -9
@@ -82,18 +82,18 @@ VRAM is the first filter. Within a tier, pick by budget and availability — the
### Hugging Face Jobs
[Hugging Face Jobs](https://huggingface.co/docs/hub/jobs) lets you run training on managed HF infrastructure, billed by the second. The repo publishes a ready-to-use image: **`huggingface/lerobot-gpu:latest`**, rebuilt **every night at 02:00 UTC from `main`** ([`docker_publish.yml`](https://github.com/huggingface/lerobot/blob/main/.github/workflows/docker_publish.yml)) — so it tracks the current state of the repo, not a tagged release.
[Hugging Face Jobs](https://huggingface.co/docs/hub/jobs) lets you run training on managed HF infrastructure, billed by the second, without owning a GPU. `lerobot-train` submits and streams the job for you — just add `--job.target=<flavor>` to a normal training command:
```bash
hf jobs run --flavor a10g-large huggingface/lerobot-gpu:latest \
bash -c "nvidia-smi && lerobot-train \
--policy.type=act --dataset.repo_id=<USER>/<DATASET> \
--policy.repo_id=<USER>/act_<task> --batch_size=8 --steps=50000"
lerobot-train \
--policy.type=act --dataset.repo_id=<USER>/<DATASET> \
--policy.repo_id=<USER>/act_<task> \
--job.target=a10g-large
```
Notes:
- The leading `nvidia-smi` is a quick sanity check that CUDA is visible inside the container, useful to fail fast if the flavor or driver is mismatched.
- The default Job timeout is 30 minutes; pass `--timeout 4h` (or longer) for real training.
- `--flavor` maps onto the table above: `t4-small`/`t4-medium` (T4, ACT only), `l4x1`/`l4x4` (L4 24 GB), `a10g-small/large/largex2/largex4` (A10G 24 GB scaled out), `a100-large` (A100). For the current full catalogue and pricing, see [https://huggingface.co/docs/hub/jobs](https://huggingface.co/docs/hub/jobs).
- Prefer not to write the `hf jobs run` wrapper yourself? `lerobot-train` can submit the job for you: just add `--job.target=<flavor>` to a normal training command and it handles dataset upload, log streaming, and the final model push. See the [imitation-learning training guide](./il_robots).
- Run `hf auth login` once before submitting; the job runs under your token.
- `--job.target` maps onto the table above: `t4-small`/`t4-medium` (T4, ACT only), `l4x1`/`l4x4` (L4 24 GB), `a10g-small/large/largex2/largex4` (A10G 24 GB scaled out), `a100-large` (A100). List the current catalogue with pricing via `hf jobs hardware`, or see [https://huggingface.co/docs/hub/jobs](https://huggingface.co/docs/hub/jobs).
- The job defaults to a `2d` (48h) timeout. Override it with `--job.timeout=4h` (or any other valid duration string) to shorten or extend the timeout. The job automatically stops when the command completes.
- For the full walkthrough — dataset upload, checkpoint streaming, resuming a run on a job — see the [imitation-learning training guide](./il_robots#train-using-hugging-face-jobs).
+1 -78
@@ -532,84 +532,7 @@ If your local computer doesn't have a powerful GPU you could utilize Google Cola
Hugging Face Jobs lets you select hardware and run training in the cloud. If you don't have a powerful GPU, need more VRAM, or just want to train a model faster, use HF Jobs. It's pay-as-you-go: you pay for each second of use. You can see the pricing and additional information [here](https://huggingface.co/docs/hub/jobs).
> **Tip:** if you just want to launch a standard training run, you can skip building the command below and use the integrated **Train on HF Jobs via `--job.target`** flow described further down — `lerobot-train` then submits the job, uploads a local-only dataset for you, and streams the logs.
To run the training manually use this command:
<hfoptions id="train_with_hf_jobs">
<hfoption id="Command">
```bash
hf jobs run \
--flavor a10g-small \
--timeout 4h \
--secrets HF_TOKEN \
huggingface/lerobot-gpu:latest \
-- \
python -m lerobot.scripts.lerobot_train \
--dataset.repo_id=username/dataset \
--policy.type=act \
--steps=5000 \
--batch_size=16 \
--policy.device=cuda \
--policy.repo_id=username/your_policy \
--log_freq=100
```
</hfoption>
<hfoption id="API example">
<!-- prettier-ignore-start -->
```python
from huggingface_hub import run_job, get_token
run_name = "act_so101_hf_jobs"
dataset_id = "username/dataset"
user_hub_id = "username"
command_args = [
"python", "-m", "lerobot.scripts.lerobot_train",
"--dataset.repo_id", dataset_id,
"--policy.type", "act",
"--steps", "5000",
"--batch_size", "16",
"--num_workers", "4",
"--policy.device", "cuda",
"--log_freq", "100",
"--save_freq", "1000",
"--save_checkpoint", "true",
"--wandb.enable", "false",
"--policy.repo_id", f"{user_hub_id}/{run_name}"
]
print(f"Submitting job '{run_name}' to Hugging Face Infrastructure...")
job_info = run_job(
image="huggingface/lerobot-gpu:latest",
command=command_args,
flavor="a10g-small",
timeout="4h",
secrets={"HF_TOKEN": get_token()}
)
print("\n🚀 Job successfully launched!")
print(f"🔹 Job ID: {job_info.id}")
print(f"🔗 Live UI Dashboard & Logs: {job_info.url}")
```
<!-- prettier-ignore-end -->
</hfoption>
</hfoptions>
You can modify the `--flavor` to use different hardware, for example: `t4-small`, `a100-large`, `h200`. Use `hf jobs hardware` to see the full list with pricing.
Depending on the model you want to train and the hardware you selected, you can also adjust `--batch_size` and `--num_workers`.
For longer training sessions, increase the timeout.
Once training has started, go to [Jobs](https://huggingface.co/settings/jobs) to check whether your job is running and to see all of its output. Scheduling a job can take a few minutes, so be patient.
After training, the model is pushed to the Hub and you can use it like any other LeRobot model.
#### Train on HF Jobs via `--job.target` (integrated CLI)
`lerobot-train` runs locally by default. To run on a Hugging Face GPU without constructing the Docker command yourself, pass `--job.target` with a hardware flavor name:
`lerobot-train` runs locally by default. To run on a Hugging Face GPU, pass `--job.target` with a hardware flavor name:
```bash
lerobot-train \
+6
@@ -34,6 +34,8 @@ from .types import (
)
from .video import (
DEFAULT_DEPTH_UNIT,
DEPTH_METER_UNIT,
DEPTH_MILLIMETER_UNIT,
VALID_VIDEO_CODECS,
VIDEO_ENCODER_INFO_KEYS,
DepthEncoderConfig,
@@ -41,6 +43,7 @@ from .video import (
VideoEncoderConfig,
depth_encoder_defaults,
encoder_config_from_video_info,
infer_depth_unit,
rgb_encoder_defaults,
)
@@ -70,8 +73,11 @@ __all__ = [
"depth_encoder_defaults",
# Factories
"encoder_config_from_video_info",
"infer_depth_unit",
# Constants
"DEFAULT_DEPTH_UNIT",
"DEPTH_METER_UNIT",
"DEPTH_MILLIMETER_UNIT",
"VALID_VIDEO_CODECS",
"VIDEO_ENCODER_INFO_KEYS",
]
+24 -5
@@ -22,6 +22,8 @@ import logging
from dataclasses import dataclass, field
from typing import Any, ClassVar, Self
import numpy as np
from lerobot.utils.import_utils import require_package
logger = logging.getLogger(__name__)
@@ -36,7 +38,9 @@ HW_VIDEO_CODECS = [
"h264_vaapi", # Linux Intel/AMD
"h264_qsv", # Intel Quick Sync
]
VALID_VIDEO_CODECS: frozenset[str] = frozenset({"h264", "hevc", "libsvtav1", "auto", *HW_VIDEO_CODECS})
VALID_VIDEO_CODECS: frozenset[str] = frozenset(
{"h264", "hevc", "libsvtav1", "libaom-av1", "auto", *HW_VIDEO_CODECS}
)
# Aliases for legacy video codec names.
VIDEO_CODECS_ALIASES: dict[str, str] = {"av1": "libsvtav1"}
@@ -65,6 +69,15 @@ DEPTH_METER_UNIT: str = "m"
DEPTH_MILLIMETER_UNIT: str = "mm"
DEFAULT_DEPTH_UNIT: str = DEPTH_MILLIMETER_UNIT
def infer_depth_unit(dtype: np.dtype | type) -> str:
"""Infer the physical unit of raw depth frames from their dtype.
Floating-point frames are assumed to be in metres, integer frames in millimetres.
"""
return DEPTH_METER_UNIT if np.issubdtype(np.dtype(dtype), np.floating) else DEPTH_MILLIMETER_UNIT
# Depth-specific tuning fields persisted under ``features[*]["info"]`` as ``video.<name>``.
DEPTH_ENCODER_INFO_FIELD_NAMES: frozenset[str] = frozenset({"depth_min", "depth_max", "shift", "use_log"})
@@ -213,18 +226,24 @@ class VideoEncoderConfig:
if encoder_threads is not None:
svtav1_parts.append(f"lp={encoder_threads}")
if svtav1_parts:
opts["svtav1-params"] = ":".join(svtav1_parts)
set_if("svtav1-params", ":".join(svtav1_parts))
elif self.vcodec in ("h264", "hevc"):
set_if("crf", self.crf)
set_if("preset", self.preset)
if self.fast_decode:
opts["tune"] = "fastdecode"
set_if("tune", "fastdecode")
set_if("threads", encoder_threads)
elif self.vcodec == "libaom-av1":
set_if("crf", self.crf)
set_if("preset", self.preset)
if encoder_threads is not None:
set_if("threads", encoder_threads)
set_if("row-mt", 1)
elif self.vcodec in ("h264_videotoolbox", "hevc_videotoolbox"):
if self.crf is not None:
opts["q:v"] = max(1, min(100, 100 - self.crf * 2))
set_if("q:v", max(1, min(100, 100 - self.crf * 2)))
elif self.vcodec in ("h264_nvenc", "hevc_nvenc"):
opts["rc"] = 0
set_if("rc", 0)
set_if("qp", self.crf)
set_if("preset", self.preset)
elif self.vcodec == "h264_vaapi":
+8 -1
@@ -509,7 +509,7 @@ def compute_episode_stats(
For 'image'/'video' features, stats are computed per channel and kept with a
leading channel axis (e.g. shape (3, 1, 1) for RGB). RGB stats are divided by
255 to land in [0, 1]; depth maps (features flagged with ``is_depth_map``) skip
this rescaling and remain in their stored units.
this rescaling and remain in their stored units (stored in ``depth_unit``).
"""
if quantile_list is None:
quantile_list = DEFAULT_QUANTILES
@@ -519,6 +519,13 @@ def compute_episode_stats(
if features[key]["dtype"] in {"string", "language"}:
continue
# Features with zero-width shapes are skipped (no data to compute stats on)
if any(d == 0 for d in features[key].get("shape", ())):
logging.debug(
f"Skipping statistics computation for feature '{key}' with a zero-width shape {features[key]['shape']}."
)
continue
if features[key]["dtype"] in ["image", "video"]:
ep_ft_array = sample_images(data)
axes_to_reduce = (0, 2, 3)
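Commit `f59260f4aa` explains why this guard is needed: `RunningQuantileStats.update` flattens each batch with `batch.reshape(-1, batch.shape[-1])`, and NumPy cannot infer the `-1` dimension when the last dimension is 0. The sketch below reproduces that failure in isolation and applies the same shape predicate as the new check; `has_zero_width` and `flatten_batch` are hypothetical helper names, not part of LeRobot.

```python
import numpy as np


def has_zero_width(shape: tuple[int, ...]) -> bool:
    """Same predicate as the new guard in compute_episode_stats."""
    return any(d == 0 for d in shape)


def flatten_batch(batch: np.ndarray) -> np.ndarray:
    """Mirrors the reshape performed in RunningQuantileStats.update."""
    return batch.reshape(-1, batch.shape[-1])


empty = np.zeros((100, 0), dtype=np.float32)  # 100 frames of a shape=(0,) feature
action = np.zeros((100, 5), dtype=np.float32)

try:
    flatten_batch(empty)
except ValueError as err:
    # The known dims multiply to 0, so NumPy cannot infer the -1 dimension.
    print(f"reshape failed: {err}")

features = {"empty": (0,), "action": (5,)}
to_compute = [key for key, shape in features.items() if not has_zero_width(shape)]
print(to_compute)  # ['action']
print(flatten_batch(action).shape)  # (100, 5)
```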
+31 -1
@@ -26,12 +26,13 @@ import pyarrow as pa
import pyarrow.parquet as pq
from huggingface_hub import snapshot_download
from lerobot.configs import VideoEncoderConfig
from lerobot.configs import DEPTH_METER_UNIT, VideoEncoderConfig
from lerobot.utils.constants import DEFAULT_FEATURES, HF_LEROBOT_HOME, HF_LEROBOT_HUB_CACHE
from lerobot.utils.feature_utils import _validate_feature_names
from lerobot.utils.utils import flatten_dict
from .compute_stats import aggregate_stats
from .depth_utils import MM_PER_METRE
from .feature_utils import create_empty_dataset_info
from .io_utils import (
get_file_size_in_mb,
@@ -358,6 +359,35 @@ class LeRobotDatasetMetadata:
return [key for key, ft in self.features.items() if _is_depth(ft)]
def rescale_depth_stats(self, output_unit: str) -> None:
"""Rescale depth feature stats in place from their recorded unit to ``output_unit``.
Depth stats are stored in the unit the frames were recorded in
(``features[key]["info"]["depth_unit"]``), while frames are returned in
``output_unit`` on read. This converts the unit-bearing stat entries so
stats match the frames consumers see.
"""
missing_unit_keys = [
key for key in self.depth_keys if (self.features[key].get("info") or {}).get("depth_unit") is None
]
if missing_unit_keys:
logging.warning(
f"Depth feature(s) {missing_unit_keys} have no recorded 'depth_unit' in their info. "
f"Depth maps and stats for these keys will be returned AS IS, with no unit conversion "
f"to the requested output unit {output_unit!r}. Re-record the dataset or set 'depth_unit' "
f"in the feature info (meta/info.json) to enable conversion."
)
if self.stats is None:
return
for key in self.depth_keys:
stored_unit = (self.features[key].get("info") or {}).get("depth_unit")
if stored_unit is None or stored_unit == output_unit or key not in self.stats:
continue
factor = MM_PER_METRE if stored_unit == DEPTH_METER_UNIT else 1.0 / MM_PER_METRE
self.stats[key] = {
stat: value if stat == "count" else value * factor for stat, value in self.stats[key].items()
}
@property
def camera_keys(self) -> list[str]:
"""Keys to access visual modalities (regardless of their storage method)."""
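`rescale_depth_stats` multiplies every stat except `count` by a single factor: 1000 when the recorded unit is metres (m to mm) and 1/1000 otherwise. A single factor suffices because mean, std, min/max and quantiles all scale linearly with the unit, whereas `count` is a number of samples. A minimal standalone sketch of that rule on a plain dict of NumPy arrays; `rescale_stats` is a hypothetical name, and unlike the real method it does not handle features that lack a recorded `depth_unit`.

```python
import numpy as np

MM_PER_METRE = 1000.0


def rescale_stats(
    stats: dict[str, np.ndarray], stored_unit: str, output_unit: str
) -> dict[str, np.ndarray]:
    """Convert unit-bearing depth stats from stored_unit to output_unit ('m' or 'mm')."""
    if stored_unit == output_unit:
        return stats
    factor = MM_PER_METRE if stored_unit == "m" else 1.0 / MM_PER_METRE
    # 'count' is a sample count, not a depth value, so it is left untouched.
    return {name: value if name == "count" else value * factor for name, value in stats.items()}


metre_stats = {
    "mean": np.array([[[2.0]]]),
    "q99": np.array([[[3.5]]]),
    "count": np.array([4]),
}
mm_stats = rescale_stats(metre_stats, stored_unit="m", output_unit="mm")
print(mm_stats["mean"].item(), mm_stats["q99"].item(), mm_stats["count"].item())  # 2000.0 3500.0 4
```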
+20 -2
@@ -22,10 +22,14 @@ from pathlib import Path
import datasets
import torch
from lerobot.configs import DEFAULT_DEPTH_UNIT, DepthEncoderConfig
from lerobot.configs import (
DEFAULT_DEPTH_UNIT,
DEPTH_METER_UNIT,
DepthEncoderConfig,
)
from .dataset_metadata import LeRobotDatasetMetadata
from .depth_utils import dequantize_depth
from .depth_utils import MM_PER_METRE, dequantize_depth
from .feature_utils import (
check_delta_timestamps,
get_delta_indices,
@@ -102,6 +106,13 @@ class DatasetReader:
for vid_key in self._meta.depth_keys
}
# Get the input unit of each depth feature stored as raw images.
self._image_depth_units: dict[str, str | None] = {
key: (self._meta.features[key].get("info") or {}).get("depth_unit")
for key in self._meta.depth_keys
if key in self._meta.image_keys
}
def set_image_transforms(self, image_transforms: Callable | None) -> None:
"""Replace the transform applied to visual observations."""
if image_transforms is not None and not callable(image_transforms):
@@ -329,6 +340,13 @@ class DatasetReader:
continue
item[cam] = self._image_transforms(item[cam])
# Convert depth features to the output unit.
for key, stored_unit in self._image_depth_units.items():
if key in item and stored_unit is not None and stored_unit != self._depth_output_unit:
item[key] = (
item[key] * MM_PER_METRE if stored_unit == DEPTH_METER_UNIT else item[key] / MM_PER_METRE
)
# Add task as a string
task_idx = item["task_index"].item()
item["task"] = self._meta.tasks.iloc[task_idx].name
+10
@@ -36,6 +36,7 @@ from lerobot.configs import (
RGBEncoderConfig,
VideoEncoderConfig,
depth_encoder_defaults,
infer_depth_unit,
rgb_encoder_defaults,
)
@@ -209,6 +210,15 @@ class DatasetWriter:
self.episode_buffer["timestamp"].append(timestamp)
self.episode_buffer["task"].append(frame.pop("task"))
# Record each depth feature's input unit once, inferred from the first frame's dtype.
if frame_index == 0:
for depth_key in self._meta.depth_keys:
if depth_key not in frame:
continue
info = self._meta.features[depth_key].setdefault("info", {})
if info.get("depth_unit") is None:
info["depth_unit"] = infer_depth_unit(np.asarray(frame[depth_key]).dtype)
# Start streaming encoder on first frame of episode
if frame_index == 0 and self._streaming_encoder is not None:
self._streaming_encoder.start_episode(
+8 -11
@@ -34,12 +34,13 @@ from lerobot.configs.video import (
DEPTH_METER_UNIT,
DEPTH_MILLIMETER_UNIT,
DEPTH_QMAX,
infer_depth_unit,
)
from .image_writer import squeeze_single_channel
from .pyav_utils import write_u16_plane
_MM_PER_METRE = 1000.0
MM_PER_METRE = 1000.0
_UINT16_MAX = 65535
@@ -57,11 +58,7 @@ def _depth_input_to_float32_and_unit(
input_unit: Literal["auto", DEPTH_METER_UNIT, DEPTH_MILLIMETER_UNIT],
) -> tuple[NDArray[np.float32], Literal[DEPTH_METER_UNIT, DEPTH_MILLIMETER_UNIT]]:
"""Convert depth to float32 in the chosen unit, and return the resolved unit."""
resolved_unit = (
(DEPTH_METER_UNIT if np.issubdtype(depth.dtype, np.floating) else DEPTH_MILLIMETER_UNIT)
if input_unit == "auto"
else input_unit
)
resolved_unit = infer_depth_unit(depth.dtype) if input_unit == "auto" else input_unit
return depth.astype(np.float32, order="K"), resolved_unit
@@ -126,12 +123,12 @@ def quantize_depth(
# Convert depth_min, depth_max, and shift to the resolved input unit.
depth_min_u = (
np.float32(depth_min) if resolved_unit == DEPTH_METER_UNIT else np.float32(depth_min * _MM_PER_METRE)
np.float32(depth_min) if resolved_unit == DEPTH_METER_UNIT else np.float32(depth_min * MM_PER_METRE)
)
depth_max_u = (
np.float32(depth_max) if resolved_unit == DEPTH_METER_UNIT else np.float32(depth_max * _MM_PER_METRE)
np.float32(depth_max) if resolved_unit == DEPTH_METER_UNIT else np.float32(depth_max * MM_PER_METRE)
)
shift_u = np.float32(shift) if resolved_unit == DEPTH_METER_UNIT else np.float32(shift * _MM_PER_METRE)
shift_u = np.float32(shift) if resolved_unit == DEPTH_METER_UNIT else np.float32(shift * MM_PER_METRE)
# Normalization and quantization is performed in the resolved input unit.
if use_log:
@@ -236,7 +233,7 @@ def dequantize_depth(
# mm path: round + clamp in float32, skipping the uint16 round-trip
# when returning a tensor (torch.uint16 is poorly supported).
buf.mul_(_MM_PER_METRE).round_().clamp_(0.0, _UINT16_MAX)
buf.mul_(MM_PER_METRE).round_().clamp_(0.0, _UINT16_MAX)
if output_tensor:
return buf
return buf.cpu().numpy().astype(np.uint16, copy=False)
@@ -259,7 +256,7 @@ def dequantize_depth(
if output_unit == DEPTH_METER_UNIT:
return torch.from_numpy(buf) if output_tensor else buf
np.multiply(buf, _MM_PER_METRE, out=buf)
np.multiply(buf, MM_PER_METRE, out=buf)
np.rint(buf, out=buf)
np.clip(buf, 0.0, _UINT16_MAX, out=buf)
if output_tensor:
+3 -3
@@ -67,9 +67,9 @@ def get_hf_features_from_features(features: dict) -> datasets.Features:
elif ft["shape"] == (1,):
hf_features[key] = datasets.Value(dtype=ft["dtype"])
elif len(ft["shape"]) == 1:
hf_features[key] = datasets.Sequence(
length=ft["shape"][0], feature=datasets.Value(dtype=ft["dtype"])
)
# pyarrow rejects fixed-size lists of length 0, so use a variable length list instead
length = ft["shape"][0] if ft["shape"][0] > 0 else -1
hf_features[key] = datasets.Sequence(length=length, feature=datasets.Value(dtype=ft["dtype"]))
elif len(ft["shape"]) == 2:
hf_features[key] = datasets.Array2D(shape=ft["shape"], dtype=ft["dtype"])
elif len(ft["shape"]) == 3:
+6
@@ -224,6 +224,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
)
self.root = self.meta.root
self.revision = self.meta.revision
self.meta.rescale_depth_stats(self._depth_output_unit)
if episodes is not None and any(
episode >= self.meta.total_episodes or episode < 0 for episode in episodes
@@ -350,6 +351,11 @@ class LeRobotDataset(torch.utils.data.Dataset):
"""Frames per second used during data collection."""
return self.meta.fps
@property
def depth_output_unit(self) -> str:
"""Physical unit (``"m"`` or ``"mm"``) depth maps and statistics are returned in on read."""
return self._depth_output_unit
@property
def num_frames(self) -> int:
"""Number of frames in selected episodes."""
+24 -2
@@ -22,11 +22,11 @@ import numpy as np
import torch
from datasets import load_dataset
from lerobot.configs import DEFAULT_DEPTH_UNIT, DepthEncoderConfig
from lerobot.configs import DEFAULT_DEPTH_UNIT, DEPTH_METER_UNIT, DepthEncoderConfig
from lerobot.utils.constants import HF_LEROBOT_HOME, LOOKAHEAD_BACKTRACKTABLE, LOOKBACK_BACKTRACKTABLE
from .dataset_metadata import CODEBASE_VERSION, LeRobotDatasetMetadata
from .depth_utils import dequantize_depth
from .depth_utils import MM_PER_METRE, dequantize_depth
from .feature_utils import get_delta_indices
from .io_utils import item_to_torch
from .utils import (
@@ -310,6 +310,7 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
)
self.root = self.meta.root
self.revision = self.meta.revision
self.meta.rescale_depth_stats(self._depth_output_unit)
# Check version
check_version_compatibility(self.repo_id, self.meta._version, CODEBASE_VERSION)
@@ -318,6 +319,13 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
for vid_key in self.meta.depth_keys
}
# Input unit of each depth feature stored as raw images (dequantized separately from videos).
self._image_depth_units: dict[str, str | None] = {
key: (self.meta.features[key].get("info") or {}).get("depth_unit")
for key in self.meta.depth_keys
if key in self.meta.image_keys
}
self.delta_timestamps = None
self.delta_indices = None
@@ -348,6 +356,11 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
def fps(self):
return self.meta.fps
@property
def depth_output_unit(self) -> str:
"""Physical unit (``"m"`` or ``"mm"``) depth maps are returned in on read."""
return self._depth_output_unit
@staticmethod
def _iter_random_indices(
rng: np.random.Generator, buffer_size: int, random_batch_size=100
@@ -530,6 +543,15 @@ class StreamingLeRobotDataset(torch.utils.data.IterableDataset):
for update in updates:
result.update(update)
# Convert raw-image depth features to the output unit (video depth is already converted).
for key, stored_unit in self._image_depth_units.items():
if key in result and stored_unit is not None and stored_unit != self._depth_output_unit:
result[key] = (
result[key] * MM_PER_METRE
if stored_unit == DEPTH_METER_UNIT
else result[key] / MM_PER_METRE
)
result["task"] = self.meta.tasks.iloc[item["task_index"]].name
yield result
@@ -84,6 +84,7 @@ import torch
import torch.utils.data
import tqdm
from lerobot.configs import DEPTH_MILLIMETER_UNIT
from lerobot.datasets import LeRobotDataset
from lerobot.utils.constants import ACTION, DONE, OBS_STATE, REWARD, SUCCESS
from lerobot.utils.utils import init_logging
@@ -228,6 +229,9 @@ def visualize_dataset(
logging.info("Logging to Rerun")
# Depth frames and stats are dequantized to the dataset's depth_output_unit on load.
depth_meter = 1000.0 if dataset.depth_output_unit == DEPTH_MILLIMETER_UNIT else 1.0
# Use the dataset's q01/q99 depth statistics for robust depth range bounds
depth_ranges = {}
for key in dataset.meta.depth_keys:
@@ -254,6 +258,7 @@ def visualize_dataset(
depth = to_hwc_float32_numpy(batch[key][i])
depth_entity = rr.DepthImage(
depth,
meter=depth_meter,
colormap=rr.components.Colormap.Viridis,
depth_range=depth_ranges.get(key),
)
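The `meter` argument of `rr.DepthImage` tells Rerun how many stored depth units make up one metre, which is why millimetre frames are logged with `meter=1000.0` and metre frames with `meter=1.0` (here and in `log_rerun_data`). A tiny sketch of that mapping with a worked conversion back to metres; `depth_meter_for` is a hypothetical name, and the unit strings are assumed to be the `"m"`/`"mm"` constants from `lerobot.configs.video`.

```python
import numpy as np


def depth_meter_for(unit: str) -> float:
    """Number of stored depth units per metre, as passed to rr.DepthImage(meter=...)."""
    return 1000.0 if unit == "mm" else 1.0


depth_mm = np.array([[1500, 2000]], dtype=np.uint16)
meter = depth_meter_for("mm")
# A viewer divides raw values by `meter` to obtain metres.
print((depth_mm / meter).tolist())  # [[1.5, 2.0]]
```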
+4
@@ -211,8 +211,12 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
# Accelerate auto-detects the device based on the available hardware and ignores the policy.device setting.
# Force the device to be CPU when the active config's device is set to CPU (works for both policy and reward model training).
force_cpu = cfg.trainable_config.device == "cpu"
# Drive Accelerate's autocast from policy.dtype (bf16/fp16 activate it; float32/absent -> launcher default).
policy_dtype = getattr(cfg.trainable_config, "dtype", None)
mixed_precision = {"bfloat16": "bf16", "float16": "fp16", "float32": "no"}.get(policy_dtype)
accelerator = Accelerator(
step_scheduler_with_optimizer=False,
mixed_precision=mixed_precision,
kwargs_handlers=[ddp_kwargs],
cpu=force_cpu,
)
+8 -1
@@ -24,6 +24,7 @@ import os
import numpy as np
from lerobot.configs import DEPTH_MILLIMETER_UNIT, infer_depth_unit
from lerobot.types import RobotAction, RobotObservation
from .constants import ACTION, ACTION_PREFIX, OBS_PREFIX, OBS_STR
@@ -161,7 +162,13 @@ def log_rerun_data(
observation_paths.add(key)
else:
if arr.shape[-1] == 1:
img_entity = rr.DepthImage(arr, colormap=rr.components.Colormap.Viridis)
# At record time, the depth unit is inferred from the frame type.
depth_unit = infer_depth_unit(arr.dtype)
img_entity = rr.DepthImage(
arr,
meter=1000.0 if depth_unit == DEPTH_MILLIMETER_UNIT else 1.0,
colormap=rr.components.Colormap.Viridis,
)
else:
img_entity = rr.Image(arr).compress() if compress_images else rr.Image(arr)
rr.log(key, entity=img_entity, static=True)
+23
@@ -13,6 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from unittest.mock import patch
import numpy as np
@@ -687,6 +688,28 @@ def test_compute_episode_stats_string_features_skipped():
assert "q01" in stats["action"]
def test_compute_episode_stats_zero_width_features_skipped(caplog):
"""Test that features with a zero-width dim (e.g. shape=(0,)) are skipped with a debug log."""
episode_data = {
"empty": np.zeros((100, 0), dtype=np.float32), # Zero-width feature
"action": np.random.normal(0, 1, (100, 5)),
}
features = {
"empty": {"dtype": "float32", "shape": (0,)},
"action": {"dtype": "float32", "shape": (5,)},
}
with caplog.at_level(logging.DEBUG):
stats = compute_episode_stats(episode_data, features)
# Zero-width features should be skipped with a debug log, others computed as usual
assert "empty" not in stats
assert "empty" in caplog.text
assert "action" in stats
assert "q01" in stats["action"]
assert stats["action"]["mean"].shape == (5,)
def test_aggregate_feature_stats_with_quantiles():
"""Test aggregating feature stats that include quantiles."""
stats_ft_list = [
+10 -1
@@ -1531,6 +1531,7 @@ def test_valid_video_codecs_constant():
assert "h264" in VALID_VIDEO_CODECS
assert "hevc" in VALID_VIDEO_CODECS
assert "libsvtav1" in VALID_VIDEO_CODECS
assert "libaom-av1" in VALID_VIDEO_CODECS
assert "auto" in VALID_VIDEO_CODECS
assert "h264_videotoolbox" in VALID_VIDEO_CODECS
assert "h264_nvenc" in VALID_VIDEO_CODECS
@@ -1538,7 +1539,7 @@ def test_valid_video_codecs_constant():
assert "h264_qsv" in VALID_VIDEO_CODECS
assert "hevc_videotoolbox" in VALID_VIDEO_CODECS
assert "hevc_nvenc" in VALID_VIDEO_CODECS
assert len(VALID_VIDEO_CODECS) == 10
assert len(VALID_VIDEO_CODECS) == 11
def test_delta_timestamps_with_episodes_filter(tmp_path, empty_lerobot_dataset_factory):
@@ -1803,3 +1804,11 @@ def test_episode_filter_unknown_key_raises(tmp_path, lerobot_dataset_factory):
root=dataset.root,
episode_filter=lambda ep: ep["not_a_real_field"] > 0,
)
def test_get_hf_features_zero_width_feature_does_not_raise_on_from_dict():
import datasets
features = {"empty": {"dtype": "float32", "shape": (0,), "names": ["empty"]}}
hf_features = get_hf_features_from_features(features)
datasets.Dataset.from_dict({"empty": [[], []]}, features=hf_features)
+89
@@ -32,6 +32,7 @@ from lerobot.configs.video import (
)
from lerobot.datasets.depth_utils import dequantize_depth, quantize_depth
from lerobot.datasets.image_writer import image_array_to_pil_image, write_image
from lerobot.utils.constants import DEFAULT_FEATURES
from tests.fixtures.constants import (
DEFAULT_FPS,
DUMMY_CAMERA_FEATURES,
@@ -245,3 +246,91 @@ class TestFeatureFileRouting:
dataset.save_episode()
dataset.finalize()
class TestDepthUnitMetadata:
"""The depth unit is inferred once from dtype, stored in ``info``, and drives stats + reads."""
NUM_FRAMES = 4
def _record(self, root, features_factory, depth_dtype, value, use_videos):
from lerobot.datasets.lerobot_dataset import LeRobotDataset
features = features_factory(camera_features=DUMMY_CAMERA_FEATURES_WITH_DEPTH, use_videos=use_videos)
dataset = LeRobotDataset.create(
repo_id=DUMMY_REPO_ID,
fps=DEFAULT_FPS,
features=features,
root=root,
use_videos=use_videos,
streaming_encoding=use_videos,
)
for _ in range(self.NUM_FRAMES):
frame: dict = {"task": "test"}
for key, ft in dataset.meta.features.items():
if key in DEFAULT_FEATURES:
continue
if key in dataset.meta.depth_keys:
frame[key] = np.full(ft["shape"], value, dtype=depth_dtype)
elif key in dataset.meta.camera_keys:
frame[key] = np.random.randint(0, 256, ft["shape"], dtype=np.uint8)
else:
frame[key] = np.zeros(ft["shape"], dtype=np.float32)
dataset.add_frame(frame)
return dataset
@pytest.mark.parametrize("use_videos", [False, True])
@pytest.mark.parametrize(
("depth_dtype", "value", "expected_unit"),
[(np.float32, 2.0, DEPTH_METER_UNIT), (np.uint16, 2000, DEPTH_MILLIMETER_UNIT)],
)
def test_recorded_unit_inferred_persisted_and_kept_in_stats(
self, tmp_path, features_factory, use_videos, depth_dtype, value, expected_unit
):
"""Unit is inferred from the first frame's dtype, drives stats (raw, never canonicalized), and survives a reload."""
from lerobot.datasets.lerobot_dataset import LeRobotDataset
dataset = self._record(tmp_path / "ds", features_factory, depth_dtype, value, use_videos)
assert dataset.meta.features[DEPTH_KEY]["info"]["depth_unit"] == expected_unit
dataset.save_episode()
mean = float(np.asarray(dataset.meta.stats[DEPTH_KEY]["mean"]).reshape(-1)[0])
np.testing.assert_allclose(mean, value, rtol=0.05)
dataset.finalize()
reloaded = LeRobotDataset(repo_id=DUMMY_REPO_ID, root=tmp_path / "ds")
assert reloaded.meta.features[DEPTH_KEY]["info"]["depth_unit"] == expected_unit
@pytest.mark.parametrize("use_videos", [False, True])
@pytest.mark.parametrize(
("output_unit", "expected"),
[(DEPTH_MILLIMETER_UNIT, 2000.0), (DEPTH_METER_UNIT, 2.0)],
)
def test_read_honors_output_unit_for_frames_and_stats(
self, tmp_path, features_factory, use_videos, output_unit, expected
):
"""Reloading with a ``depth_output_unit`` converts metre frames (image mode) and rescales stats while preserving count."""
from lerobot.datasets.lerobot_dataset import LeRobotDataset
dataset = self._record(tmp_path / "ds", features_factory, np.float32, 2.0, use_videos=use_videos)
dataset.save_episode()
count = float(np.asarray(dataset.meta.stats[DEPTH_KEY]["count"]).reshape(-1)[0])
dataset.finalize()
read_dataset = LeRobotDataset(
repo_id=DUMMY_REPO_ID, root=tmp_path / "ds", depth_output_unit=output_unit
)
stats = read_dataset.meta.stats[DEPTH_KEY]
np.testing.assert_allclose(float(np.asarray(stats["mean"]).reshape(-1)[0]), expected, rtol=0.05)
np.testing.assert_allclose(float(np.asarray(stats["count"]).reshape(-1)[0]), count)
if not use_videos:
depth = read_dataset[0][DEPTH_KEY]
assert torch.allclose(depth, torch.full_like(depth, expected))
from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset
stream_dataset = StreamingLeRobotDataset(
repo_id=DUMMY_REPO_ID, root=tmp_path / "ds", depth_output_unit=output_unit
)
stream_depth = next(iter(stream_dataset))[DEPTH_KEY]
assert torch.allclose(stream_depth, torch.full_like(stream_depth, expected))
+3 -1
@@ -345,7 +345,9 @@ class TestExtraOptions:
opts = cfg.get_codec_options()
assert opts["qp"] == 20
assert isinstance(opts["qp"], int)
assert cfg.get_codec_options(as_strings=True)["qp"] == "20"
str_opts = cfg.get_codec_options(as_strings=True)
assert str_opts["qp"] == "20"
assert all(isinstance(v, str) for v in str_opts.values())
@require_libsvtav1
def test_structured_fields_win_on_collision(self):
+8
@@ -26,6 +26,7 @@ import pytest
import torch
from datasets import Dataset
from lerobot.configs.video import infer_depth_unit
from lerobot.datasets.dataset_metadata import CODEBASE_VERSION, LeRobotDatasetMetadata
from lerobot.datasets.feature_utils import get_hf_features_from_features
from lerobot.datasets.io_utils import flatten_dict, hf_transform_to_torch
@@ -535,6 +536,13 @@ def lerobot_dataset_factory(
chunks_size=chunks_size,
**info_kwargs,
)
# This synthetic path skips add_frame, so record the depth unit the writer would
# have stored (dummy depth is uint16) to keep ``depth_unit`` present in info.json.
# Reassign a fresh info dict to avoid mutating the shared feature constants.
for ft in info.features.values():
ft_info = ft.get("info")
if ft_info is not None and ft_info.get("is_depth_map") and "depth_unit" not in ft_info:
ft["info"] = {**ft_info, "depth_unit": infer_depth_unit(np.uint16)}
if stats is None:
stats = stats_factory(features=info.features)
if tasks is None:
+2 -1
@@ -50,8 +50,9 @@ def mock_rerun(monkeypatch):
return self
class DummyDepthImage:
def __init__(self, arr, colormap=None):
def __init__(self, arr, meter=None, colormap=None):
self.arr = arr
self.meter = meter
self.colormap = colormap
def dummy_log(key, obj=None, **kwargs):