mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-30 14:47:10 +00:00
3dd19d043e
* feat(depth): add depth quantization helpers and tests
* feat(video): add ffv1 to supported codecs
* feat(depth): persist depth metadata
* feat(depth): extend quantization tools to better fit the encoding/decoding pipeline
* feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter
* feat(depth): wire StreamingVideoEncoder + writer to depth encoder
* feat(depth): wire DatasetReader to decode_depth_frames
* feat(cameras/realsense): expose async depth in metric meters
* feat(features): route 2D camera shapes to observation.depth.<key>
* feat(robots/so_follower): emit + populate depth keys when use_depth
* feat(record): plumb DepthEncoderConfig through lerobot-record
* feat(viz): render depth observations as rr.DepthImage in Viridis
* feat(depth maps writer): adding support for raw depth maps recording with image writer
* chore(format): format code
* feat(depth shape): ensuring depth maps shape is always including the channel
* feat(is_depth): simplifying is_depth nested name + legacy support
* fix(stop_event): fixing stop_event race condition in camera classes
* fix(plumbing): fixing missing parts in the depth maps pipeline
* chore(typos): fixing typos
* test(fix): fixing exisiting tests to still work with latest features
* tests(depth): adding new tests for depth integration validation
* feat(pix_fmt channels): use PyAv to check get pixel formats number of channels
* feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation.Move the default pixel format for depth in the config file.
* fix(pre-commit): fixing mutable defautl value
* fix(info): fixing info metadata update when is_depth_map was set
* tests(typos): fixing typos in tests
* fix(realsense): fixing typo in realsense serial number
* fix(normalization): restricting 255 normalization to non depth/uint8 images only
* fix(typo): fixing typo
* fix(TIFF): add missing quantization and cleanup for TIFF files
* feat(batched dequantization): optimizing dequantize_depth for torch based batched dequantization
* feat(tools): adding depth support in LeRobotDataset edition tools
* test(aggregate): extending aggregation tests to depth frames
* test(cleaning): cleaning up tests
* fix(from_video_info): fixing early validation issue in from_video_info
* fix(typo): fixing typo
* fix(is_depth): adding missing doctrings and is_depth arguments in video decoding functions
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* fix(depth units): fixing depth units output for the realsense cameras
* feat(output unit): adding support for output unit specification at dataset reading/training time
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* test(depth): cleaning up depth tests
* test(depth encoding): updating and cleaning video/depth encoding tests
* chore(format): formatting code
* docs(depth): improving depth maps docs
* test(fix): fixing depth tests
* test(dataset tools): adding missing tests for new dataset edition tools features
* chore(format): formatting code
* fix(pyav check): fixing PyAV option validation for integer codec options by normalizing
numeric values before calling `is_integer()`
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
* docs(mermaid): fixing mermaid diagram
* fix(rebase): rebase follow up corrections
* feat(dataset tools): adding missing docstrings and features for depth fill support in dataset edition tools
* docs(docstring): updating docstrings
* docs(dataset tools): updating docs
* fix(save images): fixing image saving in dataset tools
* fix(update video info): fixing update video info logic to match the recording and editing use cases
* test(reencode): fixing reencoding monkeypatch
* fix(review): add Claude review
* chore(format): format code
* fix(update video info): ditching the differentiated approahces for video info update - video info are always updated unless for preserved keys.
* chore(rebase): fixing rebase merge conflicts
* test(visualization): fixing visualization tests
* feat(docstrings): adding explicit docstring for encoding parameters. Docstrigns will now show up as description in the CLI --help.
* feat(mm as default): adding a global DEFAULT_DEPTH_UNIT variable setting mm as default depth unit
* fix(RGB <-> camera): renaming camera_encoder to rgb_encoder for clarity
* chore(TODO): removing deprecated TODO
* doc(write_u16_plane): improving docstrings for write_u16_plane
* feat(units): adding constants for depth frames units (m and mm)
* fix(spam): replacing spamming warning but a debug log
* feat(leagcy metadata): adding automatic metadata update for legacy 'video.is_depth_map' feature
* fix(copy&reindex): fixing metadat reshaping for single channel frames
* fix(ImageNet): excluding dpeth frames from ImageNet stats
* fix(PyAV container seek): fixing initial PyAV container seek to be robust againsy codec choice
* feat(lerobot-dataset-viz): adding support for depth in lerobot-dataset-viz
* fix(compress): removing rerun compression for DepthImages
* fix(signle channel squeeze): fixing single channel squeezing
* chore(format): format code
* fix(streaming): adding support for dequantization in streaming_dataset.py
* refactor(read depth): factorizing depth reading methods for realsense camera and adding support for depth-only usage
* chore(renaming): fixing missed RGBEncoderConfig renamings
* docs(renaming): reflecting renamings in a clearer way in the docs
* chore(annotation): excluding depth from the annotation pipeline
* feat(robots): adding depth support in compatible follower robots
* feat(LeSadKiwi): excluding LeKiwi from depth support (for now)
* chore(fail): removing misplaced file
* chore(fail): removing misplaced file
* fix(remove ffv1): removing ffv1 as it does not support MP4
* docs(cheat sheet): adding depth and video encoding to the cheat sheet
* fix(lossless): tuning depth encoding parameters for lossless depth storage
* test(fix): fixing failing tests
* depth(ZMQ): excluding ZMQ from depth support
* Revert "depth(ZMQ): excluding ZMQ from depth support"
This reverts commit b95cf4e4c2.
* fix(image transforms): excluding depth frames from images transforms
* fix(typo): typo
* fix(stats): fixing stats computation for depth frames
* fix(TIFF vs. pytorch): adding an extra uint16 to float32 conversion for depth maps stored as raw TIFF images
* fix(typos): fixing typos
* test(dtype): fixing stats computation typing tests
---------
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Wensi Ai <wsai@stanford.edu>
227 lines
10 KiB
Plaintext
# Cameras

LeRobot offers multiple options for video capture:

| Class             | Supported Cameras                   |
| ----------------- | ----------------------------------- |
| `OpenCVCamera`    | Phone, built-in laptop, USB webcams |
| `ZMQCamera`       | Network-connected cameras           |
| `RealSenseCamera` | Intel RealSense (with depth)        |
| `Reachy2Camera`   | Reachy 2 robot cameras              |

> [!TIP]
> For `OpenCVCamera` compatibility details, see the [Video I/O with OpenCV Overview](https://docs.opencv.org/4.x/d0/da7/videoio_overview.html).

### Find your camera

Every camera requires a unique identifier to be instantiated, allowing you to distinguish between multiple connected devices.

`OpenCVCamera` and `RealSenseCamera` support auto-discovery. Run the command below to list available devices and their identifiers. Note that these identifiers may change after rebooting your computer or re-plugging the camera, depending on your operating system.

```bash
lerobot-find-cameras opencv # or realsense for Intel RealSense cameras
```

The output will look something like this if you have two cameras connected:

```bash
--- Detected Cameras ---
Camera #0:
  Name: OpenCV Camera @ 0
  Type: OpenCV
  Id: 0
  Backend api: AVFOUNDATION
  Default stream profile:
    Format: 16.0
    Width: 1920
    Height: 1080
    Fps: 15.0
--------------------
(more cameras ...)
```

> [!WARNING]
> On `macOS`, Intel RealSense cameras may fail with this [error](https://github.com/IntelRealSense/librealsense/issues/12307): `Error finding RealSense cameras: failed to set power state`. This can be solved by running the same command with `sudo` permissions. Note that RealSense support on `macOS` is unstable.

`ZMQCamera` and `Reachy2Camera` do not support auto-discovery. They must be configured manually, by providing a network address and port (`ZMQCamera`) or the robot SDK settings (`Reachy2Camera`).

## Use cameras

### Frame access modes

All camera classes implement three access modes for capturing frames:

| Method                    | Behavior | Blocks?        | Best For                                 |
| ------------------------- | -------- | -------------- | ---------------------------------------- |
| `read()`                  | Waits for the camera hardware to return a frame. May block for a long time depending on the camera and SDK. | Yes            | Simple scripts, sequential capture       |
| `async_read(timeout_ms)`  | Returns the latest unconsumed frame from the background thread. Blocks only if no unconsumed frame is buffered, for up to `timeout_ms`. Raises `TimeoutError` if no frame arrives in time. | With a timeout | Control loops synchronized to camera FPS |
| `read_latest(max_age_ms)` | Peeks at the most recent frame in the buffer (which may be stale). Raises `TimeoutError` if that frame is older than `max_age_ms`. | No             | UI visualization, logging, monitoring    |

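To make the difference between the three modes concrete, here is a minimal, self-contained sketch of a single-slot buffer filled by a background thread. It is a simplified model written for illustration, not LeRobot's implementation: the `LatestFrameBuffer` class, its `put()` method, the `fake_camera` producer, and the choice to raise `TimeoutError` before the first frame are all hypothetical, and frames are plain integers instead of images.

```python
import threading
import time


class LatestFrameBuffer:
    """Hypothetical single-slot frame buffer filled by a background capture thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._timestamp = 0.0  # time.monotonic() when the latest frame arrived
        self._consumed = True  # set to True once async_read() has returned the frame

    def put(self, frame):
        # Capture-thread side: overwrite the slot and wake any waiting reader.
        with self._cond:
            self._frame = frame
            self._timestamp = time.monotonic()
            self._consumed = False
            self._cond.notify_all()

    def async_read(self, timeout_ms):
        # Return the latest unconsumed frame, waiting up to timeout_ms if none is pending.
        with self._cond:
            if not self._cond.wait_for(lambda: not self._consumed, timeout=timeout_ms / 1000):
                raise TimeoutError(f"no new frame within {timeout_ms} ms")
            self._consumed = True
            return self._frame

    def read_latest(self, max_age_ms):
        # Peek at the most recent frame without waiting, even if it was already consumed.
        with self._cond:
            if self._frame is None:
                raise TimeoutError("no frame captured yet")
            age_ms = (time.monotonic() - self._timestamp) * 1000
            if age_ms > max_age_ms:
                raise TimeoutError(f"latest frame is {age_ms:.0f} ms old")
            return self._frame


def fake_camera(buffer, n_frames, period_s):
    # Stand-in for camera hardware: publishes integer "frames" at a fixed rate.
    for i in range(n_frames):
        buffer.put(i)
        time.sleep(period_s)


buffer = LatestFrameBuffer()
producer = threading.Thread(target=fake_camera, args=(buffer, 5, 0.01))
producer.start()
producer.join()  # frames 0..4 were published; only frame 4 is kept

latest = buffer.async_read(timeout_ms=100)    # -> 4: the pending frame is returned and consumed
peeked = buffer.read_latest(max_age_ms=1000)  # -> 4 again: peeking ignores consumption

timed_out = False
try:
    buffer.async_read(timeout_ms=50)  # no new frame arrives, so this times out
except TimeoutError:
    timed_out = True

threading.Timer(0.05, buffer.put, args=(5,)).start()  # a new frame ~50 ms from now
next_frame = buffer.async_read(timeout_ms=1000)       # -> 5: blocks until it arrives

time.sleep(0.2)
stale = False
try:
    buffer.read_latest(max_age_ms=100)  # frame 5 is now at least 200 ms old
except TimeoutError:
    stale = True
```

The single slot means a slow consumer silently skips intermediate frames instead of queueing them, which is what "latest unconsumed frame" implies. The condition variable lets the capture thread wake a waiting `async_read()` as soon as a frame lands.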
### Usage examples

The following examples show how to use the camera API to configure and capture frames from different camera types.

- **Blocking and non-blocking frame capture** using an OpenCV-based camera
- **Color and depth capture** using an Intel RealSense camera

> [!WARNING]
> Failing to cleanly disconnect cameras can cause resource leaks. Use the context manager protocol to ensure automatic cleanup:
>
> ```python
> with OpenCVCamera(config) as camera:
>     ...
> ```
>
> You can also call `connect()` and `disconnect()` manually, but always call `disconnect()` in a `finally` block.

<hfoptions id="shell_restart">
<hfoption id="Open CV Camera">

<!-- prettier-ignore-start -->
```python
from lerobot.cameras.opencv import OpenCVCamera, OpenCVCameraConfig
from lerobot.cameras import ColorMode, Cv2Rotation

# Construct an `OpenCVCameraConfig` with your desired FPS, resolution, color mode, and rotation.
config = OpenCVCameraConfig(
    index_or_path=0,
    fps=15,
    width=1920,
    height=1080,
    color_mode=ColorMode.RGB,
    rotation=Cv2Rotation.NO_ROTATION,
)

# Instantiate and connect an `OpenCVCamera`, performing a warm-up read (default).
with OpenCVCamera(config) as camera:

    # Read a frame synchronously: blocks until the hardware delivers a new frame.
    frame = camera.read()
    print("read() call returned frame with shape:", frame.shape)

    # Read a frame asynchronously with a timeout: returns the latest unconsumed frame,
    # or waits up to timeout_ms for a new one.
    try:
        for i in range(10):
            frame = camera.async_read(timeout_ms=200)
            print(f"async_read call returned frame {i} with shape:", frame.shape)
    except TimeoutError as e:
        print(f"No frame received within timeout: {e}")

    # Return a frame instantly: the most recent frame captured by the camera (possibly stale).
    try:
        initial_frame = camera.read_latest(max_age_ms=1000)
        for i in range(10):
            frame = camera.read_latest(max_age_ms=1000)
            print(f"read_latest call returned frame {i} with shape:", frame.shape)
        # Any differing pixel means the camera delivered a new frame in the meantime.
        print(f"Was a new frame received by the camera? {(initial_frame != frame).any()}")
    except TimeoutError as e:
        print(f"Frame too old: {e}")
```
<!-- prettier-ignore-end -->

</hfoption>

<hfoption id="Intel Realsense Camera">

<!-- prettier-ignore-start -->
```python
from lerobot.cameras.realsense import RealSenseCamera, RealSenseCameraConfig
from lerobot.cameras import ColorMode, Cv2Rotation

# Create a `RealSenseCameraConfig` specifying your camera’s serial number and enabling depth.
config = RealSenseCameraConfig(
    serial_number_or_name="233522074606",
    fps=15,
    width=640,
    height=480,
    color_mode=ColorMode.RGB,
    use_depth=True,
    rotation=Cv2Rotation.NO_ROTATION,
)

# Instantiate and connect a `RealSenseCamera` with warm-up read (default).
camera = RealSenseCamera(config)
camera.connect()

# Capture a color frame via `read()` and a depth map via `read_depth()`.
try:
    color_frame = camera.read()
    depth_map = camera.read_depth()
    print("Color frame shape:", color_frame.shape)
    print("Depth map shape:", depth_map.shape)
finally:
    camera.disconnect()
```
<!-- prettier-ignore-end -->

</hfoption>
</hfoptions>

### Working with depth

The Intel RealSense and Reachy 2 cameras can capture both color and depth in lockstep. Calling `read()` returns the **color** frame as `(H, W, 3)` `uint8`. Calling `read_depth()` returns the **depth map** as `(H, W, 1)` `uint16`, where each pixel value is the distance from the sensor expressed in **millimetres**. A pixel value of `0` typically means "no measurement available" (out-of-range, occluded, or low-confidence).

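Because raw depth values are millimetres and `0` is a "no measurement" sentinel, downstream code usually has to mask invalid pixels before using the values. Below is a minimal NumPy sketch, assuming the `(H, W, 1)` `uint16` layout described above; the helper name `depth_mm_to_meters` is hypothetical, and a small synthetic array stands in for a real depth map.

```python
import numpy as np


def depth_mm_to_meters(depth_mm: np.ndarray) -> np.ndarray:
    """Convert a (H, W, 1) uint16 depth map in millimetres to float32 metres.

    Pixels equal to 0 ("no measurement") become NaN so they cannot be
    mistaken for surfaces touching the sensor.
    """
    if depth_mm.dtype != np.uint16 or depth_mm.ndim != 3 or depth_mm.shape[-1] != 1:
        raise ValueError(f"expected (H, W, 1) uint16, got {depth_mm.shape} {depth_mm.dtype}")
    depth_m = depth_mm.astype(np.float32) / 1000.0
    depth_m[depth_mm == 0] = np.nan
    return depth_m


# Synthetic 2x3 depth map: 0 marks pixels without a valid measurement.
depth_mm = np.array(
    [[[0], [500], [1250]],
     [[2000], [0], [750]]],
    dtype=np.uint16,
)

depth_m = depth_mm_to_meters(depth_mm)
valid = ~np.isnan(depth_m)                # boolean mask of measured pixels
median_m = float(np.nanmedian(depth_m))   # NaN-aware statistics skip missing pixels
```

Using NaN instead of `0` makes NaN-aware reductions such as `np.nanmedian` ignore missing pixels automatically. If a consumer cannot handle NaN, keeping the boolean `valid` mask next to the original `uint16` array is an alternative.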
During recording, the control loop reads the freshest buffered frames without blocking, via `read_latest()` (color) and `read_latest_depth()` (depth), and adds the depth map as a sibling feature (e.g. `front_depth` next to `front`).

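The sibling-key convention can be illustrated with a small, self-contained sketch. The `FakeDepthCamera` class and the `build_observation` helper are hypothetical stand-ins written for this example: they only mimic the `read_latest()` / `read_latest_depth()` calls named above and do not reproduce LeRobot's recording code.

```python
import numpy as np


class FakeDepthCamera:
    """Hypothetical camera returning constant color and depth frames."""

    def __init__(self, height: int, width: int, use_depth: bool):
        self.use_depth = use_depth
        self._color = np.zeros((height, width, 3), dtype=np.uint8)
        # 1000 mm (1 m) everywhere, in the (H, W, 1) uint16 layout described above.
        self._depth = np.full((height, width, 1), 1000, dtype=np.uint16)

    def read_latest(self) -> np.ndarray:
        return self._color

    def read_latest_depth(self) -> np.ndarray:
        return self._depth


def build_observation(cameras: dict) -> dict:
    # One entry per camera, plus a "<name>_depth" sibling when depth is enabled.
    obs = {}
    for name, cam in cameras.items():
        obs[name] = cam.read_latest()
        if cam.use_depth:
            obs[f"{name}_depth"] = cam.read_latest_depth()
    return obs


cameras = {
    "front": FakeDepthCamera(480, 640, use_depth=True),
    "wrist": FakeDepthCamera(480, 640, use_depth=False),
}
obs = build_observation(cameras)  # keys: "front", "front_depth", "wrist"
```

One likely benefit of a separate key, rather than stacking depth as a fourth color channel, is that the two streams keep their own dtypes (`uint8` vs `uint16`) and can be stored and encoded independently.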
For how depth streams are stored and encoded when recording a dataset, see the [Depth streams](./video_encoding_parameters#depth-streams) section of the video encoding guide.

## Use your phone's camera

<hfoptions id="use phone">
<hfoption id="iPhone & macOS">

To use your iPhone as a camera on macOS, enable the Continuity Camera feature:

- Ensure your Mac is running macOS 13 or later, and your iPhone is on iOS 16 or later.
- Sign in to both devices with the same Apple ID.
- Connect your devices with a USB cable or turn on Wi-Fi and Bluetooth for a wireless connection.

For more details, visit [Apple support](https://support.apple.com/en-gb/guide/mac-help/mchl77879b8a/mac).

</hfoption>
<hfoption id="OBS virtual camera">

To use your phone as a camera through OBS, follow these steps to set up a virtual camera.

1. _(Linux only) Install `v4l2loopback-dkms` and `v4l-utils`_. These packages create virtual camera devices and let you verify their settings. Install them with:

   ```bash
   sudo apt install v4l2loopback-dkms v4l-utils
   ```

2. _Install the [DroidCam app](https://droidcam.app) on your phone_. This app is available for both iOS and Android.
3. _Download and install [OBS Studio](https://obsproject.com)_.
4. _Download and install the [DroidCam OBS plugin](https://droidcam.app/obs)_.
5. _Start OBS Studio_.
6. _Add your phone as a source_. Follow the instructions [here](https://droidcam.app/obs/usage). Be sure to set the resolution to `640x480` to avoid the watermarks.
7. _Adjust resolution settings_. In OBS Studio, go to `File > Settings > Video` or `OBS > Preferences... > Video`. Change both the `Base(Canvas) Resolution` and the `Output(Scaled) Resolution` to `640x480` by typing the value in manually.
8. _Start the virtual camera_. In OBS Studio, follow the instructions [here](https://obsproject.com/kb/virtual-camera-guide).
9. _Verify the virtual camera setup and resolution_.
   - **Linux**: Use `v4l2-ctl` to list devices and check the resolution:
     ```bash
     v4l2-ctl --list-devices # find VirtualCam and note its /dev/videoX path
     v4l2-ctl -d /dev/videoX --get-fmt-video # replace with your VirtualCam path
     ```
     You should see `VirtualCam` listed with a resolution of `640x480`.
   - **macOS**: Open Photo Booth or FaceTime and select "OBS Virtual Camera" as the input.
   - **Windows**: The native Camera app doesn't support virtual cameras. Use a video conferencing app (Zoom, Teams) or run `lerobot-find-cameras opencv` directly to verify.

<details>
<summary><strong>Troubleshooting</strong></summary>

> The virtual camera resolution is incorrect.

Delete the virtual camera source and recreate it. The resolution cannot be changed after creation.

> Error reading frame in background thread for OpenCVCamera(X): OpenCVCamera(X) frame width=640 or height=480 do not match configured width=1920 or height=1080.

This error is caused by OBS Virtual Camera advertising a `1920x1080` resolution despite rescaling. The only workaround for now is to comment out the width and height check in `_postprocess_image()`.

</details>

</hfoption>
</hfoptions>

If everything is set up correctly, your phone will appear as a standard OpenCV camera and can be used with `OpenCVCamera`.