Compare commits

..

12 Commits

Author SHA1 Message Date
Steven Palma 58f70b6bd3 fix(scripts): better prints teleop (#2538) 2025-11-27 16:54:17 +01:00
Steven Palma b07160eb1b feat(utils): precise_sleep() less CPU hungry without sacrificing accuracy (#2526) 2025-11-26 17:42:16 +01:00
Caroline Pascal 648ea8f485 fix(benchmark) : fixing video benchmark (#2094)
* fix(time benchmark): removing deprecated TimeBenchmark dependency

* fix(typo): renaming frames in an up-to-date fashion

* feat(duets): rearanging crf and g parameters in a proper unique combination manner

* fix(segfault): fixing segfault by adding a lock in ThreadPoolExecutor

* chore(update) : update datasets, codecs and backends to the latest versions

* chore(unused files): removing unused files

* fix(dataset paths): fix datasets paths to live among lerobot datasets
2025-11-26 17:41:31 +01:00
Caroline Pascal 581dd45eae feat(parallel encoding): making parallel encoding the default choice over all platforms (#2525) 2025-11-26 14:57:34 +01:00
Steven Palma 17581a9449 fix(examples): wrap all of them into a main function (#2524) 2025-11-26 14:28:04 +01:00
Steven Palma 87bee86640 feat(dataset): dynamic compress_level depending on the type of dataset (video or image) (#2517) 2025-11-25 19:11:12 +01:00
Steven Palma 18b32dced9 feat(dataset): speed-up encoding time (#2514)
* feat(dataset): speed-up encoding time

* feat(dataset): add parallel encoding option

* feat(datasets): parallel encoding only if num_cams > 2

* feat(datasets): implement feedback
2025-11-25 16:46:12 +01:00
Jade Choghari 36e8feefe3 docs: Add LeIsaac x LeRobot Envhub tutorial (#2498)
* add leisaac doc

* depreciate il in sim

* fix readme

* more

* fix styling

* update title

* more changes

* more

* fix style

* more

* fix style
2025-11-25 16:23:12 +01:00
Michel Aractingi 0f551df8f4 add absolute_to_reative_idx for remapping indicies when a subset of data is loaded (#2490) 2025-11-20 14:05:31 +01:00
Jade Choghari 6e86a69dcd feat(envs): add envs pre-post processor (#2474)
* more changes

* working changes

* more changes

* more fixes

* fix style

* more

* clean

* put axis-1

* more fixes

* more styling fixes:

* iterate on review:

* more changes

* add env processor

* style

* more changes

* add docs

* fix imports

* fix test, add to train

* Update src/lerobot/envs/factory.py

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Signed-off-by: Jade Choghari <chogharijade@gmail.com>

* iterate on review

---------

Signed-off-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: jade.choghari@huggingface.co <“chogharijade@gmail.com”>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-11-19 18:36:14 +01:00
Eugene Mironov 8a915c6b6f [RTC] Real Time Chunking for Pi0, Smolvla, Pi0.5 (#1698)
* Add Real-Time Chunking (RTC) support for flow matching models

Implement Real-Time Chunking (RTC) for action chunking policies using flow
matching denoising. RTC enables smooth action transitions between consecutive
chunks by using prefix guidance during denoising.

Key features:
- RTCProcessor class with denoise_step method for RTC guidance
- Tracker system for debug tracking using time-based dictionary storage
- RTCDebugVisualizer with comprehensive visualization utilities
- Integration with SmolVLA policy for flow matching models
- Support for multiple prefix attention schedules (ZEROS, ONES, LINEAR, EXP)
- Configurable execution horizon and max guidance weight
- Example scripts for dataset evaluation and real-time control

Technical details:
- Uses autograd-based gradient computation for RTC corrections
- Time-based tracking eliminates duplicate step issues
- Proxy methods in RTCProcessor for cleaner API
- Full integration with LeRobot's policy and dataset systems

Files added/modified:
- src/lerobot/configs/types.py: Add RTCAttentionSchedule enum
- src/lerobot/policies/rtc/: Core RTC implementation
  - configuration_rtc.py: RTC configuration
  - modeling_rtc.py: RTCProcessor with denoise_step
  - debug_handler.py: Tracker for debug information
  - debug_visualizer.py: Visualization utilities
- src/lerobot/policies/smolvla/modeling_smolvla.py: RTC integration
- examples/rtc/: Example scripts and evaluation tools

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix rtc_config attribute access in SmolVLA

Use getattr() to safely check for rtc_config attribute existence
instead of direct attribute access. This fixes AttributeError when
loading policies without rtc_config in their config.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* fixup! Fix rtc_config attribute access in SmolVLA

* Add RTCConfig field to SmolVLAConfig

Add rtc_config as an optional field in SmolVLAConfig to properly
support Real-Time Chunking configuration. This replaces the previous
getattr() workarounds with direct attribute access, making the code
cleaner and more maintainable.

Changes:
- Import RTCConfig in configuration_smolvla.py
- Add rtc_config: RTCConfig | None = None field
- Revert getattr() calls to direct attribute access in modeling_smolvla.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* Refactor RTC enabled checks to use _rtc_enabled helper

Add _rtc_enabled() helper method in VLAFlowMatching class to simplify
and clean up RTC enabled checks throughout the code. This reduces
code duplication and improves readability.

Changes:
- Add _rtc_enabled() method in VLAFlowMatching
- Replace verbose rtc_config checks with _rtc_enabled() calls
- Maintain exact same functionality with cleaner code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* Rename track_debug method to track

Simplify the method name from track_debug to just track for better
readability and consistency. The method already has clear documentation
about its debug tracking purpose.

Changes:
- Rename RTCProcessor.track_debug() to track()
- Update all call sites in modeling_smolvla.py and modeling_rtc.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* Use output_dir for saving all evaluation images

Update eval_dataset.py to save all comparison images to the
configured output_dir instead of the current directory. This provides
better organization and allows users to specify where outputs should be
saved.

Changes:
- Add os import at top level
- Create output_dir at start of run_evaluation()
- Save all comparison images to output_dir
- Remove duplicate os imports
- Update init_rtc_processor() docstring to be more concise

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* fixup! Use output_dir for saving all evaluation images

* Fix logging buffering and enable tracking when RTC config provided

- Add force=True to logging.basicConfig to override existing configuration
- Enable line buffering for stdout/stderr for real-time log output
- Modify init_rtc_processor to create processor when rtc_config exists
  even if RTC is disabled, allowing tracking of denoising data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>

* Refactor SmolVLA plotting to use tracker data instead of local variables

Remove local tracking variables (correction, x1_t, error) from the
denoising loop and instead retrieve plotting data from the RTC tracker
after each denoise step. This makes the code cleaner and uses the
tracker as the single source of truth for debug/visualization data.

Changes:
- Remove initialization of correction, x1_t, error before denoising loop
- After each Euler step, retrieve most recent debug step from tracker
- Extract correction, x1_t, err from debug step for plotting
- Update tracking condition to use is_debug_enabled() method

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>

* Move plotting logic from modeling_smolvla to eval_dataset script

Refactor to improve separation of concerns:

modeling_smolvla.py changes:
- Remove all plotting logic from sample_actions method
- Remove viz_xt_axs, viz_vt_axs, viz_x1t_axs parameters
- Remove matplotlib and RTCDebugVisualizer imports
- Remove viz_fig, viz_axs, denoise_step_counter instance variables
- Simplify denoising loop to only track data in rtc_processor

eval_dataset.py changes:
- Add _plot_denoising_steps_from_tracker helper method
- Retrieve debug steps from tracker after inference
- Plot x_t, v_t, x1_t, correction, and error from tracker data
- Enable debug tracking (cfg.rtc.debug = True) for visualization
- Remove viz axes parameters from predict_action_chunk calls

modeling_rtc.py changes:
- Remove v_t from track() call (handled by user change)

Benefits:
- Cleaner modeling code focused on inference
- Evaluation script owns all visualization logic
- Better separation of concerns
- Tracker is single source of truth for debug data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>

* Refactor plotting loging

* fixup! Refactor plotting loging

* Improve visualization: separate correction plot and fix axis scaling

Changes:
- Create separate figure for correction data instead of overlaying on v_t
- Add _rescale_axes helper method to properly scale all axes
- Add 10% margin to y-axis for better visualization
- Fix v_t chart vertical compression issue

Benefits:
- Clearer v_t plot without correction overlay
- Better axis scaling with proper margins
- Separate correction figure for focused analysis
- Improved readability of all denoising visualizations

Output files:
- denoising_xt_comparison.png (x_t trajectories)
- denoising_vt_comparison.png (v_t velocity - now cleaner)
- denoising_correction_comparison.png (NEW - separate corrections)
- denoising_x1t_comparison.png (x1_t state with error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Alexander Soare <alexander.soare159@gmail.com>

* fixup! Improve visualization: separate correction plot and fix axis scaling

* fixup! fixup! Improve visualization: separate correction plot and fix axis scaling

* fixup! fixup! fixup! Improve visualization: separate correction plot and fix axis scaling

* Fix traacking

* Right kwargs for the policy

* Add tests for tracker

* Fix tests

* Drop not required methods

* Add torch compilation for eval_dataset

* delete policies

* Add matplotliv to dev

* fixup! Add matplotliv to dev

* Experiemnt with late detach

* Debug

* Fix compilation

* Add RTC to PI0

* Pi0

* Pi0 eval dataset

* fixup! Pi0 eval dataset

* Turn off compilation for pi0/pi05

* fixup! Turn off compilation for pi0/pi05

* fixup! fixup! Turn off compilation for pi0/pi05

* fixup! fixup! fixup! Turn off compilation for pi0/pi05

* fixup! fixup! fixup! fixup! Turn off compilation for pi0/pi05

* fixup! fixup! fixup! fixup! fixup! Turn off compilation for pi0/pi05

* Add workable flow

* Small fixes

* Add more tests

* Add validatio at the end

* Update README

* Silent validation

* Fix tests

* Add tests for modeling_rtc

* Add tests for flow matching models with RTC

* fixup! Add tests for flow matching models with RTC

* fixup! fixup! Add tests for flow matching models with RTC

* Add one more test

* fixup! Add one more test

* Fix test to use _rtc_enabled() instead of is_rtc_enabled()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fixup! Fix test to use _rtc_enabled() instead of is_rtc_enabled()

* fixup! fixup! Fix test to use _rtc_enabled() instead of is_rtc_enabled()

* Add RTC initialization tests without config for PI0.5 and SmolVLA

Add test_pi05_rtc_initialization_without_rtc_config and
test_smolvla_rtc_initialization_without_rtc_config to verify that
policies can initialize without RTC config and that _rtc_enabled()
returns False in this case.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix PI0.5 init_rtc_processor to use getattr instead of direct model access

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix SmolVLA init_rtc_processor to use getattr instead of direct model access

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix PI0.5 RTC tests to use quantile stats (q01, q99) for normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fixup! Fix PI0.5 RTC tests to use quantile stats (q01, q99) for normalization

* Fixup eval with real robot

* fixup! Fixup eval with real robot

* fixup! fixup! Fixup eval with real robot

* Extract simulator logic from eval_with real robot and add proper headers to files

* Update images

* Fix tests

* fixup! Fix tests

* add docs for rtc

* enhance doc and add images

* Fix instal instructions

---------
Co-authored-by: Ben Zhang <benzhangniu@gmail.com>
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-11-19 11:19:48 +01:00
Michel Aractingi b464d9f8bc Fix episode filtering bug when requesting a subset of the episodes in a dataset (#2456)
* filter episodes in load_nested_dataset

* nit

* remove test filtering

* move import to module level

* added missing episode indices to the EpisodeAwareSampler in lerobot_train.py;
2025-11-18 17:26:41 +01:00
54 changed files with 3088 additions and 2102 deletions
-94
View File
@@ -1,94 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import threading
import time
from contextlib import ContextDecorator
class TimeBenchmark(ContextDecorator):
"""
Measures execution time using a context manager or decorator.
This class supports both context manager and decorator usage, and is thread-safe for multithreaded
environments.
Args:
print: If True, prints the elapsed time upon exiting the context or completing the function. Defaults
to False.
Examples:
Using as a context manager:
>>> benchmark = TimeBenchmark()
>>> with benchmark:
... time.sleep(1)
>>> print(f"Block took {benchmark.result:.4f} seconds")
Block took approximately 1.0000 seconds
Using with multithreading:
```python
import threading
benchmark = TimeBenchmark()
def context_manager_example():
with benchmark:
time.sleep(0.01)
print(f"Block took {benchmark.result_ms:.2f} milliseconds")
threads = []
for _ in range(3):
t1 = threading.Thread(target=context_manager_example)
threads.append(t1)
for t in threads:
t.start()
for t in threads:
t.join()
```
Expected output:
Block took approximately 10.00 milliseconds
Block took approximately 10.00 milliseconds
Block took approximately 10.00 milliseconds
"""
def __init__(self, print=False):
self.local = threading.local()
self.print_time = print
def __enter__(self):
self.local.start_time = time.perf_counter()
return self
def __exit__(self, *exc):
self.local.end_time = time.perf_counter()
self.local.elapsed_time = self.local.end_time - self.local.start_time
if self.print_time:
print(f"Elapsed time: {self.local.elapsed_time:.4f} seconds")
return False
@property
def result(self):
return getattr(self.local, "elapsed_time", None)
@property
def result_ms(self):
return self.result * 1e3
-102
View File
@@ -1,102 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Capture video feed from a camera as raw images."""
import argparse
import datetime as dt
import os
import time
from pathlib import Path
import cv2
import rerun as rr
# see https://rerun.io/docs/howto/visualization/limit-ram
RERUN_MEMORY_LIMIT = os.getenv("LEROBOT_RERUN_MEMORY_LIMIT", "5%")
def display_and_save_video_stream(output_dir: Path, fps: int, width: int, height: int, duration: int):
rr.init("lerobot_capture_camera_feed")
rr.spawn(memory_limit=RERUN_MEMORY_LIMIT)
now = dt.datetime.now()
capture_dir = output_dir / f"{now:%Y-%m-%d}" / f"{now:%H-%M-%S}"
if not capture_dir.exists():
capture_dir.mkdir(parents=True, exist_ok=True)
# Opens the default webcam
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Could not open video stream.")
return
cap.set(cv2.CAP_PROP_FPS, fps)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
frame_index = 0
start_time = time.time()
while time.time() - start_time < duration:
ret, frame = cap.read()
if not ret:
print("Error: Could not read frame.")
break
rr.log("video/stream", rr.Image(frame), static=True)
cv2.imwrite(str(capture_dir / f"frame_{frame_index:06d}.png"), frame)
frame_index += 1
# Release the capture
cap.release()
# TODO(Steven): Add a graceful shutdown via a close() method for the Viewer context, though not currently supported in the Rerun API.
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--output-dir",
type=Path,
default=Path("outputs/cam_capture/"),
help="Directory where the capture images are written. A subfolder named with the current date & time will be created inside it for each capture.",
)
parser.add_argument(
"--fps",
type=int,
default=30,
help="Frames Per Second of the capture.",
)
parser.add_argument(
"--width",
type=int,
default=1280,
help="Width of the captured images.",
)
parser.add_argument(
"--height",
type=int,
default=720,
help="Height of the captured images.",
)
parser.add_argument(
"--duration",
type=int,
default=20,
help="Duration in seconds for which the video stream should be captured.",
)
args = parser.parse_args()
display_and_save_video_stream(**vars(args))
+43 -48
View File
@@ -21,11 +21,13 @@ See the provided README.md or run `python benchmark/video/run_video_benchmark.py
import argparse import argparse
import datetime as dt import datetime as dt
import itertools
import random import random
import shutil import shutil
from collections import OrderedDict from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor, as_completed from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path from pathlib import Path
from threading import Lock
import einops import einops
import numpy as np import numpy as np
@@ -35,13 +37,13 @@ import torch
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity
from tqdm import tqdm from tqdm import tqdm
from benchmarks.video.benchmark import TimeBenchmark
from lerobot.datasets.lerobot_dataset import LeRobotDataset from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.video_utils import ( from lerobot.datasets.video_utils import (
decode_video_frames_torchvision, decode_video_frames,
encode_video_frames, encode_video_frames,
) )
from lerobot.utils.constants import OBS_IMAGE from lerobot.utils.constants import OBS_IMAGE
from lerobot.utils.utils import TimerManager
BASE_ENCODING = OrderedDict( BASE_ENCODING = OrderedDict(
[ [
@@ -86,7 +88,7 @@ def load_original_frames(imgs_dir: Path, timestamps: list[float], fps: int) -> t
frames = [] frames = []
for ts in timestamps: for ts in timestamps:
idx = int(ts * fps) idx = int(ts * fps)
frame = PIL.Image.open(imgs_dir / f"frame_{idx:06d}.png") frame = PIL.Image.open(imgs_dir / f"frame-{idx:06d}.png")
frame = torch.from_numpy(np.array(frame)) frame = torch.from_numpy(np.array(frame))
frame = frame.type(torch.float32) / 255 frame = frame.type(torch.float32) / 255
frame = einops.rearrange(frame, "h w c -> c h w") frame = einops.rearrange(frame, "h w c -> c h w")
@@ -97,21 +99,21 @@ def load_original_frames(imgs_dir: Path, timestamps: list[float], fps: int) -> t
def save_decoded_frames( def save_decoded_frames(
imgs_dir: Path, save_dir: Path, frames: torch.Tensor, timestamps: list[float], fps: int imgs_dir: Path, save_dir: Path, frames: torch.Tensor, timestamps: list[float], fps: int
) -> None: ) -> None:
if save_dir.exists() and len(list(save_dir.glob("frame_*.png"))) == len(timestamps): if save_dir.exists() and len(list(save_dir.glob("frame-*.png"))) == len(timestamps):
return return
save_dir.mkdir(parents=True, exist_ok=True) save_dir.mkdir(parents=True, exist_ok=True)
for i, ts in enumerate(timestamps): for i, ts in enumerate(timestamps):
idx = int(ts * fps) idx = int(ts * fps)
frame_hwc = (frames[i].permute((1, 2, 0)) * 255).type(torch.uint8).cpu().numpy() frame_hwc = (frames[i].permute((1, 2, 0)) * 255).type(torch.uint8).cpu().numpy()
PIL.Image.fromarray(frame_hwc).save(save_dir / f"frame_{idx:06d}_decoded.png") PIL.Image.fromarray(frame_hwc).save(save_dir / f"frame-{idx:06d}_decoded.png")
shutil.copyfile(imgs_dir / f"frame_{idx:06d}.png", save_dir / f"frame_{idx:06d}_original.png") shutil.copyfile(imgs_dir / f"frame-{idx:06d}.png", save_dir / f"frame-{idx:06d}_original.png")
def save_first_episode(imgs_dir: Path, dataset: LeRobotDataset) -> None: def save_first_episode(imgs_dir: Path, dataset: LeRobotDataset) -> None:
episode_index = 0 episode_index = 0
ep_num_images = dataset.meta.episodes["length"][episode_index] ep_num_images = dataset.meta.episodes["length"][episode_index]
if imgs_dir.exists() and len(list(imgs_dir.glob("frame_*.png"))) == ep_num_images: if imgs_dir.exists() and len(list(imgs_dir.glob("frame-*.png"))) == ep_num_images:
return return
imgs_dir.mkdir(parents=True, exist_ok=True) imgs_dir.mkdir(parents=True, exist_ok=True)
@@ -125,7 +127,7 @@ def save_first_episode(imgs_dir: Path, dataset: LeRobotDataset) -> None:
tqdm(imgs_dataset, desc=f"saving {dataset.repo_id} first episode images", leave=False) tqdm(imgs_dataset, desc=f"saving {dataset.repo_id} first episode images", leave=False)
): ):
img = item[img_keys[0]] img = item[img_keys[0]]
img.save(str(imgs_dir / f"frame_{i:06d}.png"), quality=100) img.save(str(imgs_dir / f"frame-{i:06d}.png"), quality=100)
if i >= ep_num_images - 1: if i >= ep_num_images - 1:
break break
@@ -149,18 +151,6 @@ def sample_timestamps(timestamps_mode: str, ep_num_images: int, fps: int) -> lis
return [idx / fps for idx in frame_indexes] return [idx / fps for idx in frame_indexes]
def decode_video_frames(
video_path: str,
timestamps: list[float],
tolerance_s: float,
backend: str,
) -> torch.Tensor:
if backend in ["pyav", "video_reader"]:
return decode_video_frames_torchvision(video_path, timestamps, tolerance_s, backend)
else:
raise NotImplementedError(backend)
def benchmark_decoding( def benchmark_decoding(
imgs_dir: Path, imgs_dir: Path,
video_path: Path, video_path: Path,
@@ -172,8 +162,8 @@ def benchmark_decoding(
num_workers: int = 4, num_workers: int = 4,
save_frames: bool = False, save_frames: bool = False,
) -> dict: ) -> dict:
def process_sample(sample: int): def process_sample(sample: int, lock: Lock):
time_benchmark = TimeBenchmark() time_benchmark = TimerManager(log=False)
timestamps = sample_timestamps(timestamps_mode, ep_num_images, fps) timestamps = sample_timestamps(timestamps_mode, ep_num_images, fps)
num_frames = len(timestamps) num_frames = len(timestamps)
result = { result = {
@@ -182,13 +172,13 @@ def benchmark_decoding(
"mse_values": [], "mse_values": [],
} }
with time_benchmark: with time_benchmark, lock:
frames = decode_video_frames(video_path, timestamps=timestamps, tolerance_s=5e-1, backend=backend) frames = decode_video_frames(video_path, timestamps=timestamps, tolerance_s=5e-1, backend=backend)
result["load_time_video_ms"] = time_benchmark.result_ms / num_frames result["load_time_video_ms"] = (time_benchmark.last * 1000) / num_frames
with time_benchmark: with time_benchmark:
original_frames = load_original_frames(imgs_dir, timestamps, fps) original_frames = load_original_frames(imgs_dir, timestamps, fps)
result["load_time_images_ms"] = time_benchmark.result_ms / num_frames result["load_time_images_ms"] = (time_benchmark.last * 1000) / num_frames
frames_np, original_frames_np = frames.numpy(), original_frames.numpy() frames_np, original_frames_np = frames.numpy(), original_frames.numpy()
for i in range(num_frames): for i in range(num_frames):
@@ -215,8 +205,10 @@ def benchmark_decoding(
# A sample is a single set of decoded frames specified by timestamps_mode (e.g. a single frame, 2 frames, etc.). # A sample is a single set of decoded frames specified by timestamps_mode (e.g. a single frame, 2 frames, etc.).
# For each sample, we record metrics (loading time and quality metrics) which are then averaged over all samples. # For each sample, we record metrics (loading time and quality metrics) which are then averaged over all samples.
# As these samples are independent, we run them in parallel threads to speed up the benchmark. # As these samples are independent, we run them in parallel threads to speed up the benchmark.
# Use a single shared lock for all worker threads
shared_lock = Lock()
with ThreadPoolExecutor(max_workers=num_workers) as executor: with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = [executor.submit(process_sample, i) for i in range(num_samples)] futures = [executor.submit(process_sample, i, shared_lock) for i in range(num_samples)]
for future in tqdm(as_completed(futures), total=num_samples, desc="samples", leave=False): for future in tqdm(as_completed(futures), total=num_samples, desc="samples", leave=False):
result = future.result() result = future.result()
load_times_video_ms.append(result["load_time_video_ms"]) load_times_video_ms.append(result["load_time_video_ms"])
@@ -358,24 +350,27 @@ def main(
imgs_dir = output_dir / "images" / dataset.repo_id.replace("/", "_") imgs_dir = output_dir / "images" / dataset.repo_id.replace("/", "_")
# We only use the first episode # We only use the first episode
save_first_episode(imgs_dir, dataset) save_first_episode(imgs_dir, dataset)
for key, values in tqdm(encoding_benchmarks.items(), desc="encodings (g, crf)", leave=False): for duet in [
for value in tqdm(values, desc=f"encodings ({key})", leave=False): dict(zip(encoding_benchmarks.keys(), unique_combination, strict=False))
encoding_cfg = BASE_ENCODING.copy() for unique_combination in itertools.product(*encoding_benchmarks.values())
encoding_cfg["vcodec"] = video_codec ]:
encoding_cfg["pix_fmt"] = pixel_format encoding_cfg = BASE_ENCODING.copy()
encoding_cfg["vcodec"] = video_codec
encoding_cfg["pix_fmt"] = pixel_format
for key, value in duet.items():
encoding_cfg[key] = value encoding_cfg[key] = value
args_path = Path("_".join(str(value) for value in encoding_cfg.values())) args_path = Path("_".join(str(value) for value in encoding_cfg.values()))
video_path = output_dir / "videos" / args_path / f"{repo_id.replace('/', '_')}.mp4" video_path = output_dir / "videos" / args_path / f"{repo_id.replace('/', '_')}.mp4"
benchmark_table += benchmark_encoding_decoding( benchmark_table += benchmark_encoding_decoding(
dataset, dataset,
video_path, video_path,
imgs_dir, imgs_dir,
encoding_cfg, encoding_cfg,
decoding_benchmarks, decoding_benchmarks,
num_samples, num_samples,
num_workers, num_workers,
save_frames, save_frames,
) )
# Save intermediate results # Save intermediate results
benchmark_df = pd.DataFrame(benchmark_table, columns=headers) benchmark_df = pd.DataFrame(benchmark_table, columns=headers)
@@ -409,9 +404,9 @@ if __name__ == "__main__":
nargs="*", nargs="*",
default=[ default=[
"lerobot/pusht_image", "lerobot/pusht_image",
"aliberts/aloha_mobile_shrimp_image", "lerobot/aloha_mobile_shrimp_image",
"aliberts/paris_street", "lerobot/paris_street",
"aliberts/kitchen", "lerobot/kitchen",
], ],
help="Datasets repo-ids to test against. First episodes only are used. Must be images.", help="Datasets repo-ids to test against. First episodes only are used. Must be images.",
) )
@@ -419,7 +414,7 @@ if __name__ == "__main__":
"--vcodec", "--vcodec",
type=str, type=str,
nargs="*", nargs="*",
default=["libx264", "hevc", "libsvtav1"], default=["h264", "hevc", "libsvtav1"],
help="Video codecs to be tested", help="Video codecs to be tested",
) )
parser.add_argument( parser.add_argument(
@@ -468,7 +463,7 @@ if __name__ == "__main__":
"--backends", "--backends",
type=str, type=str,
nargs="*", nargs="*",
default=["pyav", "video_reader"], default=["torchcodec", "pyav"],
help="Torchvision decoding backend to be tested.", help="Torchvision decoding backend to be tested.",
) )
parser.add_argument( parser.add_argument(
+4 -2
View File
@@ -47,8 +47,8 @@
- sections: - sections:
- local: envhub - local: envhub
title: Environments from the Hub title: Environments from the Hub
- local: il_sim - local: envhub_leisaac
title: Imitation Learning in Sim title: Control & Train Robots in Sim (LeIsaac)
- local: libero - local: libero
title: Using Libero title: Using Libero
- local: metaworld - local: metaworld
@@ -63,6 +63,8 @@
title: Implement your own processor title: Implement your own processor
- local: processors_robots_teleop - local: processors_robots_teleop
title: Processors for Robots and Teleoperators title: Processors for Robots and Teleoperators
- local: env_processor
title: Environment Processors
title: "Robot Processors" title: "Robot Processors"
- sections: - sections:
- local: so101 - local: so101
+1 -1
View File
@@ -196,7 +196,7 @@ client_cfg = RobotClientConfig(
server_address="localhost:8080", server_address="localhost:8080",
policy_device="mps", policy_device="mps",
policy_type="smolvla", policy_type="smolvla",
pretrained_name_or_path="fracapuano/smolvla_async", pretrained_name_or_path="<user>/smolvla_async",
chunk_size_threshold=0.5, chunk_size_threshold=0.5,
actions_per_chunk=50, # make sure this is less than the max actions of the policy actions_per_chunk=50, # make sure this is less than the max actions of the policy
) )
+418
View File
@@ -0,0 +1,418 @@
# Environment Processors
Environment processors are a critical layer in LeRobot's data processing architecture that handle **environment-specific** transformations, separate from policy-specific processing. This separation of concerns enables cleaner code, better modularity, and easier experimentation with different environments and policies.
## Why Environment Processors?
When working with different robot environments (LIBERO, MetaWorld, Aloha, etc.), each environment often has unique data formats, coordinate systems, and conventions that need standardization **before** policy processing. Without environment processors, these transformations would be:
1. **Hardcoded in environment code** - Making it difficult to experiment with different state representations
2. **Duplicated across policies** - Each policy would need to handle environment-specific quirks
3. **Mixed with policy logic** - Violating separation of concerns and making debugging harder
Environment processors solve this by providing a **dedicated processing layer** between raw environment observations and policy inputs.
## The Processing Pipeline
Here's how data flows through the complete processing pipeline during evaluation:
```python
# In lerobot_eval.py rollout() function:
# 1. Raw environment observation (numpy arrays, various formats)
raw_observation = env.step(action)
# 2. Convert numpy to torch, normalize images [0,1]
observation = preprocess_observation(raw_observation)
# 3. Add task metadata (for multi-task environments)
observation = add_envs_task(env, observation)
# 4. ENVIRONMENT-SPECIFIC preprocessing (NEW!)
# - Flatten robot states
# - Rotate images to match dataset conventions
# - Handle environment-specific coordinate systems
observation = env_preprocessor(observation)
# 5. POLICY-SPECIFIC preprocessing
# - Normalize with dataset statistics
# - Add batch dimensions
# - Move to GPU
# - Tokenize language instructions
observation = preprocessor(observation)
# 6. Policy inference
action = policy.select_action(observation)
# 7. POLICY-SPECIFIC postprocessing
# - Unnormalize actions
# - Remove batch dimensions
action = postprocessor(action)
# 8. ENVIRONMENT-SPECIFIC postprocessing (NEW!)
# - Convert action formats if needed
# - Apply environment-specific constraints
action_transition = {"action": action}
action_transition = env_postprocessor(action_transition)
action = action_transition["action"]
# 9. Execute in environment
env.step(action)
```
## The Benefits
### 1. **Separation of Concerns**
Environment processors handle transformations specific to the **environment's data format**, while policy processors handle transformations specific to the **model's requirements**.
```python
# ❌ Before: Mixed concerns
class LiberoVLAPolicy:
def preprocess(self, obs):
# Environment-specific: Flatten robot state (shouldn't be in policy!)
state = self._flatten_robot_state(obs["robot_state"])
# Policy-specific: Normalize with dataset stats
state = self.normalizer(state)
return state
# ✅ After: Clear separation
# Environment processor: Handles LIBERO's nested robot state
env_preprocessor = LiberoProcessorStep() # Flattens robot_state
# Policy processor: Handles model requirements
policy_preprocessor = NormalizerProcessorStep(stats=dataset_stats)
```
### 2. **Flexibility and Reusability**
The same policy can work with different environment processors, and the same environment processor can work with different policies:
```python
# Use SmolVLA policy with LIBERO environment
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
smolvla_preprocessor, smolvla_postprocessor = make_pre_post_processors(smolvla_cfg)
# Or use ACT policy with the same LIBERO environment
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
act_preprocessor, act_postprocessor = make_pre_post_processors(act_cfg)
```
### 3. **Easier Experimentation**
Want to try different state representations for LIBERO? Just create a new processor:
```python
# Original: 8D state (pos + quat→axisangle + gripper)
@ProcessorStepRegistry.register("libero_processor")
class LiberoProcessorStep(ObservationProcessorStep):
def _process_observation(self, obs):
eef_pos = robot_state["eef"]["pos"] # 3D
eef_axisangle = quat2axisangle(quat) # 3D
gripper = robot_state["gripper"]["qpos"] # 2D
state = torch.cat([eef_pos, eef_axisangle, gripper], dim=-1) # 8D
return state
# Experiment: Add velocity for better control
@ProcessorStepRegistry.register("libero_velocity_processor")
class LiberoVelocityProcessorStep(ObservationProcessorStep):
def _process_observation(self, obs):
# Include velocities for 14D state
eef_pos = robot_state["eef"]["pos"] # 3D
eef_axisangle = quat2axisangle(quat) # 3D
eef_vel = robot_state["eef"]["vel"] # 3D (NEW)
gripper_pos = robot_state["gripper"]["qpos"] # 2D
gripper_vel = robot_state["gripper"]["qvel"] # 3D (NEW)
state = torch.cat([eef_pos, eef_axisangle, eef_vel,
gripper_pos, gripper_vel], dim=-1) # 14D
return state
```
### 4. **Cleaner Environment Code**
Environments expose **all available data** without needing to know what downstream models will use:
```python
# LIBERO environment exposes full robot state
observation = {
"pixels": {"image": img, "image2": img2},
"robot_state": {
"eef": {"pos": ..., "quat": ..., "vel": ..., "mat": ..., "axisangle": ...},
"gripper": {"qpos": ..., "qvel": ...},
"joints": {"pos": ..., "vel": ...}
}
}
# Environment processor decides what to use
# Policy processor handles model-specific transformations
```
## Using Environment Processors
### Factory Function
The `make_env_pre_post_processors` function follows the same pattern as `make_pre_post_processors` for policies:
```python
from lerobot.envs.factory import make_env_pre_post_processors
from lerobot.envs.configs import LiberoEnv, PushtEnv
# For LIBERO: Returns LiberoProcessorStep in preprocessor
libero_cfg = LiberoEnv(task="libero_spatial", camera_name=["agentview"])
env_preprocessor, env_postprocessor = make_env_pre_post_processors(libero_cfg)
# For other environments: Returns identity processors (no-op)
pusht_cfg = PushtEnv()
env_preprocessor, env_postprocessor = make_env_pre_post_processors(pusht_cfg)
```
### Implementation in `envs/factory.py`
```python
def make_env_pre_post_processors(
env_cfg: EnvConfig,
) -> tuple[
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
]:
"""
Create preprocessor and postprocessor pipelines for environment observations.
Args:
env_cfg: The configuration of the environment.
Returns:
A tuple containing:
- preprocessor: Pipeline that processes environment observations
- postprocessor: Pipeline that processes environment outputs
"""
# For LIBERO environments, add the LiberoProcessorStep to preprocessor
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
preprocessor = PolicyProcessorPipeline(steps=[LiberoProcessorStep()])
else:
# For all other environments, return an identity preprocessor
preprocessor = PolicyProcessorPipeline(steps=[])
# Postprocessor is currently identity for all environments
# Future: Could add environment-specific action transformations
postprocessor = PolicyProcessorPipeline(steps=[])
return preprocessor, postprocessor
```
### Integration in Evaluation
In `lerobot_eval.py`, the environment processors are created once and used throughout:
```python
def eval_main(cfg: EvalPipelineConfig):
# Create environment
envs = make_env(cfg.env, n_envs=cfg.eval.batch_size)
# Create policy
policy = make_policy(cfg=cfg.policy, env_cfg=cfg.env)
# Create policy processors
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
pretrained_path=cfg.policy.pretrained_path,
)
# Create environment processors (NEW!)
env_preprocessor, env_postprocessor = make_env_pre_post_processors(env_cfg=cfg.env)
# Run evaluation with both processor types
eval_policy_all(
envs=envs,
policy=policy,
env_preprocessor=env_preprocessor, # Environment-specific
env_postprocessor=env_postprocessor, # Environment-specific
preprocessor=preprocessor, # Policy-specific
postprocessor=postprocessor, # Policy-specific
n_episodes=cfg.eval.n_episodes,
)
```
## Example: LIBERO Environment Processor
The `LiberoProcessorStep` demonstrates a real-world environment processor:
```python
from lerobot.processor.pipeline import ObservationProcessorStep
@dataclass
@ProcessorStepRegistry.register(name="libero_processor")
class LiberoProcessorStep(ObservationProcessorStep):
"""
Processes LIBERO observations into the LeRobot format.
**State Processing:**
- Extracts end-effector position (3D)
- Converts quaternion to axis-angle representation (3D)
- Extracts gripper joint positions (2D)
- Concatenates into 8D state vector
**Image Processing:**
- Rotates images 180° to match HuggingFaceVLA/libero convention
"""
def _process_observation(self, observation):
processed_obs = observation.copy()
# Process images: Flip 180° for camera convention
for key in list(processed_obs.keys()):
if key.startswith("observation.images."):
img = processed_obs[key]
img = torch.flip(img, dims=[2, 3]) # Flip H and W
processed_obs[key] = img
# Process robot_state: Flatten to 8D vector
if "observation.robot_state" in processed_obs:
robot_state = processed_obs.pop("observation.robot_state")
eef_pos = robot_state["eef"]["pos"] # (B, 3)
eef_quat = robot_state["eef"]["quat"] # (B, 4)
gripper_qpos = robot_state["gripper"]["qpos"] # (B, 2)
# Convert quaternion to axis-angle
eef_axisangle = self._quat2axisangle(eef_quat) # (B, 3)
# Concatenate into single state vector
state = torch.cat((eef_pos, eef_axisangle, gripper_qpos), dim=-1)
state = state.float()
processed_obs["observation.state"] = state
return processed_obs
```
### Why These Transformations?
1. **Image Rotation**: The HuggingFaceVLA/libero dataset has images rotated 180° from the raw LIBERO simulator. The processor handles this convention mismatch so policies trained on the dataset work seamlessly.
2. **State Flattening**: The raw LIBERO environment exposes nested dictionaries with all available state information (position, quaternion, velocity, matrix representation, etc.). The processor:
- Selects the relevant components (pos, quat, gripper)
- Converts quaternion to axis-angle (more suitable for learning)
- Flattens to a single 8D vector that policies expect
3. **Flexibility**: The environment still exposes **all** raw data. If you want to try different state representations (e.g., including velocities, using matrix representation instead of axis-angle), you can create a new processor without modifying the environment code.
## Adding Environment Processors for New Environments
To add environment processors for a new environment:
### 1. Create the Processor Step
```python
# In src/lerobot/processor/env_processor.py
@dataclass
@ProcessorStepRegistry.register(name="myenv_processor")
class MyEnvProcessorStep(ObservationProcessorStep):
"""Process observations from MyEnv."""
def _process_observation(self, observation):
processed = observation.copy()
# Your environment-specific transformations
if "myenv.specific.state" in processed:
state = processed.pop("myenv.specific.state")
# Transform to standard format
processed["observation.state"] = self._transform_state(state)
return processed
```
### 2. Update the Factory
```python
# In src/lerobot/envs/factory.py
def make_env_pre_post_processors(env_cfg: EnvConfig):
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
preprocessor = PolicyProcessorPipeline(steps=[LiberoProcessorStep()])
elif isinstance(env_cfg, MyEnvConfig) or "myenv" in env_cfg.type:
preprocessor = PolicyProcessorPipeline(steps=[MyEnvProcessorStep()])
else:
preprocessor = PolicyProcessorPipeline(steps=[])
postprocessor = PolicyProcessorPipeline(steps=[])
return preprocessor, postprocessor
```
### 3. Use in Evaluation
No changes needed! The evaluation script automatically uses the appropriate processor:
```bash
lerobot-eval \
--policy.path=lerobot/my_policy \
--env.type=myenv \ # Automatically uses MyEnvProcessorStep
--eval.n_episodes=10
```
## Future: Environment Postprocessors
Currently, postprocessors are identity (no-op) for all environments. Future use cases include:
### Action Space Transformations
```python
@dataclass
class MyEnvActionPostprocessor(ProcessorStep):
"""Convert policy actions to environment-specific format."""
def __call__(self, transition: EnvTransition) -> EnvTransition:
action = transition["action"]
# Example: Convert from Cartesian to joint space
if self.action_space == "joint":
action = self.ik_solver(action)
# Example: Apply environment-specific safety limits
action = torch.clamp(action, self.min_action, self.max_action)
transition["action"] = action
return transition
```
### Coordinate System Conversions
```python
@dataclass
class CoordinateTransformPostprocessor(ProcessorStep):
"""Transform actions between coordinate systems."""
def __call__(self, transition: EnvTransition) -> EnvTransition:
action = transition["action"]
# Example: Policy outputs in world frame, env expects base frame
action = self.world_to_base_transform(action)
transition["action"] = action
return transition
```
## Best Practices
1. **Keep environment processors simple**: They should only handle environment-specific data format issues, not complex learning-related transformations.
2. **Use policy processors for model requirements**: Normalization, batching, device placement, and tokenization belong in policy processors.
3. **Expose all data from environments**: Let processors decide what to use rather than hardcoding choices in the environment.
4. **Document conventions**: Clearly document any coordinate system conventions, camera orientations, or data formats that your processor handles.
5. **Test independently**: Environment processors should be testable without loading full policies or environments.
## Summary
Environment processors provide a **clean separation** between environment-specific data transformations and policy-specific model requirements. This architecture:
- ✅ Enables easy experimentation with different state representations
- ✅ Allows policies to work seamlessly across different environments
- ✅ Keeps environment code focused on simulation/hardware interface
- ✅ Makes processor pipelines more maintainable and debuggable
- ✅ Follows the single responsibility principle
The key insight: **Environments define data formats, processors standardize them, policies consume standardized data.** Each layer has a clear, focused responsibility.
+301
View File
@@ -0,0 +1,301 @@
# LeIsaac × LeRobot EnvHub
LeRobot EnvHub now supports **imitation learning in simulation** with LeIsaac.
Spin up everyday manipulation tasks, teleoperate the robot, collect demos, push them to the Hub, and train policies in LeRobot — all in one loop.
[LeIsaac](https://github.com/LightwheelAI/leisaac) integrates with IsaacLab and the SO101 Leader/Follower setup to provide:
- 🕹️ **Teleoperation-first workflows** for data collection
- 📦 **Built-in data conversion** ready for LeRobot training
- 🤖 **Everyday skills** like picking oranges, lifting cubes, cleaning tables, and folding cloth
- ☁️ **Ongoing upgrades** from [LightWheel](https://lightwheel.ai/): cloud simulation, EnvHub support, Sim2Real tooling, and more
Below youll find the currently supported LeIsaac tasks exposed through LeRobot EnvHub.
# Available Environments
The following table lists all available tasks and environments in LeIsaac x LeRobot Envhub. You can also get the latest list of environments by running the following command:
```bash
python scripts/environments/list_envs.py
```
| Task | Environment ID | Task Description | Related Robot |
| :-------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------- |
| <video src="https://github.com/user-attachments/assets/466eddff-f720-4f99-94d5-5e123e4c302c" autoplay loop muted playsinline style="max-width: 300px;"></video> | [LeIsaac-SO101-PickOrange-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/pick_orange/pick_orange_env_cfg.py)<br /><br />[LeIsaac-SO101-PickOrange-Direct-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/pick_orange/direct/pick_orange_env.py) | Pick three oranges and put them into the plate, then reset the arm to rest state. | Single-Arm SO101 Follower |
| <video src="https://github.com/user-attachments/assets/1e4eb83a-0b38-40fb-a0b2-ddb0fe201e6d" autoplay loop muted playsinline style="max-width: 300px;"></video> | [LeIsaac-SO101-LiftCube-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/lift_cube/lift_cube_env_cfg.py)<br /><br />[LeIsaac-SO101-LiftCube-Direct-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/lift_cube/direct/lift_cube_env.py) | Lift the red cube up. | Single-Arm SO101 Follower |
| <video src="https://github.com/user-attachments/assets/e49d8f1c-dcc9-412b-a88f-100680d8a45b" autoplay loop muted playsinline style="max-width: 300px;"></video> | [LeIsaac-SO101-CleanToyTable-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/clean_toy_table/clean_toy_table_env_cfg.py)<br /><br />[LeIsaac-SO101-CleanToyTable-BiArm-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/clean_toy_table/clean_toy_table_bi_arm_env_cfg.py)<br /><br />[LeIsaac-SO101-CleanToyTable-BiArm-Direct-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/clean_toy_table/direct/clean_toy_table_bi_arm_env.py) | Pick two letter e objects into the box, and reset the arm to rest state. | Single-Arm SO101 Follower<br /><br />Bi-Arm SO101 Follower |
| <video src="https://github.com/user-attachments/assets/e29a0f8a-9286-4ce6-b45d-342c3d3ba754" autoplay loop muted playsinline style="max-width: 300px;"></video> | [LeIsaac-SO101-FoldCloth-BiArm-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/fold_cloth/fold_cloth_bi_arm_env_cfg.py)<br /><br />[LeIsaac-SO101-FoldCloth-BiArm-Direct-v0](https://github.com/LightwheelAI/leisaac/blob/main/source/leisaac/leisaac/tasks/fold_cloth/direct/fold_cloth_bi_arm_env.py) | Fold the cloth, and reset the arm to rest state.<br /><br />_Note: Only the DirectEnv support check_success in this task._ | Bi-Arm SO101 Follower |
# Load LeIsaac directly in LeRobot with one line of code
> EnvHub: Share LeIsaac environments through HuggingFace
[EnvHub](https://huggingface.co/docs/lerobot/envhub) is our reproducible environment hub, spin up a packaged simulation with one line, experiment immediately, and publish your own tasks for the community.
LeIsaac offers EnvHub support so you can consume or share tasks with only a few commands.
<video
controls
src="https://github.com/user-attachments/assets/687666f5-ebe0-421d-84a0-eb86116ac5f8"
style={{ width: "100%", maxWidth: "960px", borderRadius: "8px" }}
/>
## How to get started, environment Setup
Run the following commands to setup your code environments:
```bash
# Refer to Getting Started/Installation to install leisaac firstly
conda create -n leisaac_envhub python=3.11
conda activate leisaac_envhub
conda install -c "nvidia/label/cuda-12.8.1" cuda-toolkit
pip install -U torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
pip install 'leisaac[isaaclab] @ git+https://github.com/LightwheelAI/leisaac.git#subdirectory=source/leisaac' --extra-index-url https://pypi.nvidia.com
# Install lerobot
pip install lerobot==0.4.1
# Fix numpy version
pip install numpy==1.26.0
```
## Usage Example
EnvHub exposes every LeIsaac-supported task in a uniform interface. The examples below load `so101_pick_orange` and demonstrate a random-action rollout and an interactive teleoperation.
### Random Action
<details>
<summary>Click to expand code example</summary>
```python
# envhub_random_action.py
import torch
from lerobot.envs.factory import make_env
# Load from the hub
envs_dict = make_env("LightwheelAI/leisaac_env:envs/so101_pick_orange.py", n_envs=1, trust_remote_code=True)
# Access the environment
suite_name = next(iter(envs_dict))
sync_vector_env = envs_dict[suite_name][0]
# retrieve the isaac environment from the sync vector env
env = sync_vector_env.envs[0].unwrapped
# Use it like any gym environment
obs, info = env.reset()
while True:
action = torch.tensor(env.action_space.sample())
obs, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
obs, info = env.reset()
env.close()
```
</details>
```bash
python envhub_random_action.py
```
You should see the SO101 arm swinging under purely random commands.
### Teleoperation
LeRobots teleoperation stack can drive the simulated arm.
Connect the SO101 Leader controller, run the calibration command below.
```bash
lerobot-calibrate \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM0 \
--teleop.id=leader
```
And then launch the teleop script.
<details>
<summary>Click to expand code example</summary>
```python
# envhub_teleop_example.py
import logging
import time
import gymnasium as gym
from dataclasses import asdict, dataclass
from pprint import pformat
from lerobot.teleoperators import ( # noqa: F401
Teleoperator,
TeleoperatorConfig,
make_teleoperator_from_config,
so101_leader,
)
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import init_logging
from lerobot.envs.factory import make_env
@dataclass
class TeleoperateConfig:
teleop: TeleoperatorConfig
env_name: str = "so101_pick_orange"
fps: int = 60
@dataclass
class EnvWrap:
env: gym.Env
def make_env_from_leisaac(env_name: str = "so101_pick_orange"):
envs_dict = make_env(
f'LightwheelAI/leisaac_env:envs/{env_name}.py',
n_envs=1,
trust_remote_code=True
)
suite_name = next(iter(envs_dict))
sync_vector_env = envs_dict[suite_name][0]
env = sync_vector_env.envs[0].unwrapped
return env
def teleop_loop(teleop: Teleoperator, env: gym.Env, fps: int):
from leisaac.devices.action_process import preprocess_device_action
from leisaac.assets.robots.lerobot import SO101_FOLLOWER_MOTOR_LIMITS
from leisaac.utils.env_utils import dynamic_reset_gripper_effort_limit_sim
env_wrap = EnvWrap(env=env)
obs, info = env.reset()
while True:
loop_start = time.perf_counter()
if env.cfg.dynamic_reset_gripper_effort_limit:
dynamic_reset_gripper_effort_limit_sim(env, 'so101leader')
raw_action = teleop.get_action()
processed_action = preprocess_device_action(
dict(
so101_leader=True,
joint_state={
k.removesuffix(".pos"): v for k, v in raw_action.items()},
motor_limits=SO101_FOLLOWER_MOTOR_LIMITS),
env_wrap
)
obs, reward, terminated, truncated, info = env.step(processed_action)
if terminated or truncated:
obs, info = env.reset()
dt_s = time.perf_counter() - loop_start
precise_sleep(1 / fps - dt_s)
loop_s = time.perf_counter() - loop_start
print(f"\ntime: {loop_s * 1e3:.2f}ms ({1 / loop_s:.0f} Hz)")
def teleoperate(cfg: TeleoperateConfig):
init_logging()
logging.info(pformat(asdict(cfg)))
teleop = make_teleoperator_from_config(cfg.teleop)
env = make_env_from_leisaac(cfg.env_name)
teleop.connect()
if hasattr(env, 'initialize'):
env.initialize()
try:
teleop_loop(teleop=teleop, env=env, fps=cfg.fps)
except KeyboardInterrupt:
pass
finally:
teleop.disconnect()
env.close()
def main():
teleoperate(TeleoperateConfig(
teleop=so101_leader.SO101LeaderConfig(
port="/dev/ttyACM0",
id='leader',
use_degrees=False,
),
env_name="so101_pick_orange",
fps=60,
))
if __name__ == "__main__":
main()
```
</details>
```bash
python envhub_teleop_example.py
```
Running the script lets you operate the simulated arm using the physical Leader device.
## ☁️ Cloud Simulation (No GPU Required)
Dont have a local GPU or the right drivers? No problem! You can run LeIsaac entirely in the cloud with zero setup.
LeIsaac works out-of-the-box on **NVIDIA Brev**, giving you a fully configured environment directly in your browser.
👉 **Start here:** [https://lightwheelai.github.io/leisaac/docs/cloud_simulation/nvidia_brev](https://lightwheelai.github.io/leisaac/docs/cloud_simulation/nvidia_brev)
Once your instance is deployed, simply open the link for **port 80 (HTTP)** to launch **Visual Studio Code Server** (default password: `password`). From there, you can run simulations, edit code, and visualize IsaacLab environments — all from your web browser.
**No GPU, no drivers, no local installation. Just click and run.**
## Additional Notes
We keep EnvHub coverage aligned with the LeIsaac task. Currently supported:
- `so101_pick_orange`
- `so101_lift_cube`
- `so101_clean_toytable`
- `bi_so101_fold_cloth`
Switch tasks by targeting a different script when calling `make_env`, for example:
```python
envs_dict_pick_orange = make_env("LightwheelAI/leisaac_env:envs/so101_pick_orange.py", n_envs=1, trust_remote_code=True)
envs_dict_lift_cube = make_env("LightwheelAI/leisaac_env:envs/so101_lift_cube.py", n_envs=1, trust_remote_code=True)
envs_dict_clean_toytable = make_env("LightwheelAI/leisaac_env:envs/so101_clean_toytable.py", n_envs=1, trust_remote_code=True)
envs_dict_fold_cloth = make_env("LightwheelAI/leisaac_env:envs/bi_so101_fold_cloth.py", n_envs=1, trust_remote_code=True)
```
Note: when working with `bi_so101_fold_cloth`, call `initialize()` immediately after retrieving the env before performing any other operations:
<details>
<summary>Click to expand code example</summary>
```python
import torch
from lerobot.envs.factory import make_env
# Load from the hub
envs_dict = make_env("LightwheelAI/leisaac_env:envs/bi_so101_fold_cloth.py", n_envs=1, trust_remote_code=True)
# Access the environment
suite_name = next(iter(envs_dict))
sync_vector_env = envs_dict[suite_name][0]
# retrieve the isaac environment from the sync vector env
env = sync_vector_env.envs[0].unwrapped
# NOTE: initialize() first
env.initialize()
# other operation with env...
```
</details>
+2 -2
View File
@@ -393,7 +393,7 @@ import time
from lerobot.datasets.lerobot_dataset import LeRobotDataset from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say from lerobot.utils.utils import log_say
episode_idx = 0 episode_idx = 0
@@ -415,7 +415,7 @@ for idx in range(dataset.num_frames):
} }
robot.send_action(action) robot.send_action(action)
busy_wait(1.0 / dataset.fps - (time.perf_counter() - t0)) precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
robot.disconnect() robot.disconnect()
``` ```
-220
View File
@@ -1,220 +0,0 @@
# Imitation Learning in Sim
This tutorial will explain how to train a neural network to control a robot in simulation with imitation learning.
**You'll learn:**
1. How to record a dataset in simulation with [gym-hil](https://github.com/huggingface/gym-hil) and visualize the dataset.
2. How to train a policy using your data.
3. How to evaluate your policy in simulation and visualize the results.
For the simulation environment we use the same [repo](https://github.com/huggingface/gym-hil) that is also being used by the Human-In-the-Loop (HIL) reinforcement learning algorithm.
This environment is based on [MuJoCo](https://mujoco.org) and allows you to record datasets in LeRobotDataset format.
Teleoperation is easiest with a controller like the Logitech F710, but you can also use your keyboard if you are up for the challenge.
## Installation
First, install the `gym_hil` package within the LeRobot environment, go to your LeRobot folder and run this command:
```bash
pip install -e ".[hilserl]"
```
## Teleoperate and Record a Dataset
To use `gym_hil` with LeRobot, you need to use a configuration file. An example config file can be found [here](https://huggingface.co/datasets/lerobot/config_examples/resolve/main/sim_il/env_config.json).
To teleoperate and collect a dataset, we need to modify this config file. Here's an example configuration for imitation learning data collection:
```json
{
"env": {
"type": "gym_manipulator",
"name": "gym_hil",
"task": "PandaPickCubeGamepad-v0",
"fps": 10
},
"dataset": {
"repo_id": "your_username/il_gym",
"root": null,
"task": "pick_cube",
"num_episodes_to_record": 30,
"replay_episode": null,
"push_to_hub": true
},
"mode": "record",
"device": "cuda"
}
```
Key configuration points:
- Set your `repo_id` in the `dataset` section: `"repo_id": "your_username/il_gym"`
- Set `num_episodes_to_record: 30` to collect 30 demonstration episodes
- Ensure `mode` is set to `"record"`
- If you don't have an NVIDIA GPU, change `"device": "cuda"` to `"mps"` for macOS or `"cpu"`
- To use keyboard instead of gamepad, change `"task"` to `"PandaPickCubeKeyboard-v0"`
Then we can run this command to start:
<hfoptions id="teleop_sim">
<hfoption id="Linux">
```bash
python -m lerobot.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
```
</hfoption>
<hfoption id="MacOS">
```bash
mjpython -m lerobot.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
```
</hfoption>
</hfoptions>
Once rendered you can teleoperate the robot with the gamepad or keyboard, below you can find the gamepad/keyboard controls.
Note that to teleoperate the robot you have to hold the "Human Take Over Pause Policy" Button `RB` to enable control!
**Gamepad Controls**
<p align="center">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/gamepad_guide.jpg?raw=true"
alt="Figure shows the control mappings on a Logitech gamepad."
title="Gamepad Control Mapping"
width="100%"
></img>
</p>
<p align="center">
<i>Gamepad button mapping for robot control and episode management</i>
</p>
**Keyboard controls**
For keyboard controls use the `spacebar` to enable control and the following keys to move the robot:
```bash
Arrow keys: Move in X-Y plane
Shift and Shift_R: Move in Z axis
Right Ctrl and Left Ctrl: Open and close gripper
ESC: Exit
```
## Visualize a dataset
If you uploaded your dataset to the hub you can [visualize your dataset online](https://huggingface.co/spaces/lerobot/visualize_dataset) by copy pasting your repo id.
<p align="center">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/dataset_visualizer_sim.png"
alt="Figure shows the dataset visualizer"
title="Dataset visualization"
width="100%"
></img>
</p>
<p align="center">
<i>Dataset visualizer</i>
</p>
## Train a policy
To train a policy to control your robot, use the [`lerobot-train`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
lerobot-train \
--dataset.repo_id=${HF_USER}/il_gym \
--policy.type=act \
--output_dir=outputs/train/il_sim_test \
--job_name=il_sim_test \
--policy.device=cuda \
--wandb.enable=true
```
Let's explain the command:
1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/il_gym`.
2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor states, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
3. We provided `policy.device=cuda` since we are training on a Nvidia GPU, but you could use `policy.device=mps` to train on Apple silicon.
4. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.
Training should take several hours, 100k steps (which is the default) will take about 1h on Nvidia A100. You will find checkpoints in `outputs/train/il_sim_test/checkpoints`.
#### Train using Collab
If your local computer doesn't have a powerful GPU you could utilize Google Collab to train your model by following the [ACT training notebook](./notebooks#training-act).
#### Upload policy checkpoints
Once training is done, upload the latest checkpoint with:
```bash
huggingface-cli upload ${HF_USER}/il_sim_test \
outputs/train/il_sim_test/checkpoints/last/pretrained_model
```
You can also upload intermediate checkpoints with:
```bash
CKPT=010000
huggingface-cli upload ${HF_USER}/il_sim_test${CKPT} \
outputs/train/il_sim_test/checkpoints/${CKPT}/pretrained_model
```
## Evaluate your policy in Sim
To evaluate your policy we have to use a configuration file. An example can be found [here](https://huggingface.co/datasets/lerobot/config_examples/resolve/main/sim_il/eval_config.json).
Here's an example evaluation configuration:
```json
{
"env": {
"type": "gym_manipulator",
"name": "gym_hil",
"task": "PandaPickCubeGamepad-v0",
"fps": 10
},
"dataset": {
"repo_id": "your_username/il_sim_dataset",
"dataset_root": null,
"task": "pick_cube"
},
"pretrained_policy_name_or_path": "your_username/il_sim_model",
"device": "cuda"
}
```
Make sure to replace:
- `repo_id` with the dataset you trained on (e.g., `your_username/il_sim_dataset`)
- `pretrained_policy_name_or_path` with your model ID (e.g., `your_username/il_sim_model`)
Then you can run this command to visualize your trained policy
<hfoptions id="eval_policy">
<hfoption id="Linux">
```bash
python -m lerobot.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
```
</hfoption>
<hfoption id="MacOS">
```bash
mjpython -m lerobot.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
```
</hfoption>
</hfoptions>
> [!WARNING]
> While the main workflow of training ACT in simulation is straightforward, there is significant room for exploring how to set up the task, define the initial state of the environment, and determine the type of data required during collection to learn the most effective policy. If your trained policy doesn't perform well, investigate the quality of the dataset it was trained on using our visualizers, as well as the action values and various hyperparameters related to ACT and the simulation.
Congrats 🎉, you have finished this tutorial. If you want to continue with using LeRobot in simulation follow this [Tutorial on reinforcement learning in sim with HIL-SERL](https://huggingface.co/docs/lerobot/hilserl_sim)
> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).
+2 -2
View File
@@ -45,7 +45,7 @@ from lerobot.robots import ( # noqa: F401
so101_follower, so101_follower,
) )
from lerobot.utils.constants import ACTION from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import ( from lerobot.utils.utils import (
init_logging, init_logging,
log_say, log_say,
@@ -97,7 +97,7 @@ def replay(cfg: ReplayConfig):
robot.send_action(action) robot.send_action(action)
dt_s = time.perf_counter() - start_episode_t dt_s = time.perf_counter() - start_episode_t
busy_wait(1 / dataset.fps - dt_s) precise_sleep(1 / dataset.fps - dt_s)
robot.disconnect() robot.disconnect()
+86 -81
View File
@@ -34,105 +34,106 @@ from huggingface_hub import HfApi
import lerobot import lerobot
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
# We ported a number of existing datasets ourselves, use this to see the list:
print("List of available datasets:")
pprint(lerobot.available_datasets)
# You can also browse through the datasets created/ported by the community on the hub using the hub api: def main():
hub_api = HfApi() # We ported a number of existing datasets ourselves, use this to see the list:
repo_ids = [info.id for info in hub_api.list_datasets(task_categories="robotics", tags=["LeRobot"])] print("List of available datasets:")
pprint(repo_ids) pprint(lerobot.available_datasets)
# Or simply explore them in your web browser directly at: # You can also browse through the datasets created/ported by the community on the hub using the hub api:
# https://huggingface.co/datasets?other=LeRobot hub_api = HfApi()
repo_ids = [info.id for info in hub_api.list_datasets(task_categories="robotics", tags=["LeRobot"])]
pprint(repo_ids)
# Let's take this one for this example # Or simply explore them in your web browser directly at:
repo_id = "lerobot/aloha_mobile_cabinet" # https://huggingface.co/datasets?other=LeRobot
# We can have a look and fetch its metadata to know more about it:
ds_meta = LeRobotDatasetMetadata(repo_id)
# By instantiating just this class, you can quickly access useful information about the content and the # Let's take this one for this example
# structure of the dataset without downloading the actual data yet (only metadata files — which are repo_id = "lerobot/aloha_mobile_cabinet"
# lightweight). # We can have a look and fetch its metadata to know more about it:
print(f"Total number of episodes: {ds_meta.total_episodes}") ds_meta = LeRobotDatasetMetadata(repo_id)
print(f"Average number of frames per episode: {ds_meta.total_frames / ds_meta.total_episodes:.3f}")
print(f"Frames per second used during data collection: {ds_meta.fps}")
print(f"Robot type: {ds_meta.robot_type}")
print(f"keys to access images from cameras: {ds_meta.camera_keys=}\n")
print("Tasks:") # By instantiating just this class, you can quickly access useful information about the content and the
print(ds_meta.tasks) # structure of the dataset without downloading the actual data yet (only metadata files — which are
print("Features:") # lightweight).
pprint(ds_meta.features) print(f"Total number of episodes: {ds_meta.total_episodes}")
print(f"Average number of frames per episode: {ds_meta.total_frames / ds_meta.total_episodes:.3f}")
print(f"Frames per second used during data collection: {ds_meta.fps}")
print(f"Robot type: {ds_meta.robot_type}")
print(f"keys to access images from cameras: {ds_meta.camera_keys=}\n")
# You can also get a short summary by simply printing the object: print("Tasks:")
print(ds_meta) print(ds_meta.tasks)
print("Features:")
pprint(ds_meta.features)
# You can then load the actual dataset from the hub. # You can also get a short summary by simply printing the object:
# Either load any subset of episodes: print(ds_meta)
dataset = LeRobotDataset(repo_id, episodes=[0, 10, 11, 23])
# And see how many frames you have: # You can then load the actual dataset from the hub.
print(f"Selected episodes: {dataset.episodes}") # Either load any subset of episodes:
print(f"Number of episodes selected: {dataset.num_episodes}") dataset = LeRobotDataset(repo_id, episodes=[0, 10, 11, 23])
print(f"Number of frames selected: {dataset.num_frames}")
# Or simply load the entire dataset: # And see how many frames you have:
dataset = LeRobotDataset(repo_id) print(f"Selected episodes: {dataset.episodes}")
print(f"Number of episodes selected: {dataset.num_episodes}") print(f"Number of episodes selected: {dataset.num_episodes}")
print(f"Number of frames selected: {dataset.num_frames}") print(f"Number of frames selected: {dataset.num_frames}")
# The previous metadata class is contained in the 'meta' attribute of the dataset: # Or simply load the entire dataset:
print(dataset.meta) dataset = LeRobotDataset(repo_id)
print(f"Number of episodes selected: {dataset.num_episodes}")
print(f"Number of frames selected: {dataset.num_frames}")
# LeRobotDataset actually wraps an underlying Hugging Face dataset # The previous metadata class is contained in the 'meta' attribute of the dataset:
# (see https://huggingface.co/docs/datasets for more information). print(dataset.meta)
print(dataset.hf_dataset)
# LeRobot datasets also subclasses PyTorch datasets so you can do everything you know and love from working # LeRobotDataset actually wraps an underlying Hugging Face dataset
# with the latter, like iterating through the dataset. # (see https://huggingface.co/docs/datasets for more information).
# The __getitem__ iterates over the frames of the dataset. Since our datasets are also structured by print(dataset.hf_dataset)
# episodes, you can access the frame indices of any episode using dataset.meta.episodes. Here, we access
# frame indices associated to the first episode:
episode_index = 0
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
# Then we grab all the image frames from the first camera: # LeRobot datasets also subclasses PyTorch datasets so you can do everything you know and love from working
camera_key = dataset.meta.camera_keys[0] # with the latter, like iterating through the dataset.
frames = [dataset[idx][camera_key] for idx in range(from_idx, to_idx)] # The __getitem__ iterates over the frames of the dataset. Since our datasets are also structured by
# episodes, you can access the frame indices of any episode using dataset.meta.episodes. Here, we access
# frame indices associated to the first episode:
episode_index = 0
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
# The objects returned by the dataset are all torch.Tensors # Then we grab all the image frames from the first camera:
print(type(frames[0])) camera_key = dataset.meta.camera_keys[0]
print(frames[0].shape) frames = [dataset[idx][camera_key] for idx in range(from_idx, to_idx)]
# Since we're using pytorch, the shape is in pytorch, channel-first convention (c, h, w). # The objects returned by the dataset are all torch.Tensors
# We can compare this shape with the information available for that feature print(type(frames[0]))
pprint(dataset.features[camera_key]) print(frames[0].shape)
# In particular:
print(dataset.features[camera_key]["shape"])
# The shape is in (h, w, c) which is a more universal format.
# For many machine learning applications we need to load the history of past observations or trajectories of # Since we're using pytorch, the shape is in pytorch, channel-first convention (c, h, w).
# future actions. Our datasets can load previous and future frames for each key/modality, using timestamps # We can compare this shape with the information available for that feature
# differences with the current loaded frame. For instance: pprint(dataset.features[camera_key])
delta_timestamps = { # In particular:
# loads 4 images: 1 second before current frame, 500 ms before, 200 ms before, and current frame print(dataset.features[camera_key]["shape"])
camera_key: [-1, -0.5, -0.20, 0], # The shape is in (h, w, c) which is a more universal format.
# loads 6 state vectors: 1.5 seconds before, 1 second before, ... 200 ms, 100 ms, and current frame
"observation.state": [-1.5, -1, -0.5, -0.20, -0.10, 0],
# loads 64 action vectors: current frame, 1 frame in the future, 2 frames, ... 63 frames in the future
"action": [t / dataset.fps for t in range(64)],
}
# Note that in any case, these delta_timestamps values need to be multiples of (1/fps) so that added to any
# timestamp, you still get a valid timestamp.
dataset = LeRobotDataset(repo_id, delta_timestamps=delta_timestamps) # For many machine learning applications we need to load the history of past observations or trajectories of
print(f"\n{dataset[0][camera_key].shape=}") # (4, c, h, w) # future actions. Our datasets can load previous and future frames for each key/modality, using timestamps
print(f"{dataset[0]['observation.state'].shape=}") # (6, c) # differences with the current loaded frame. For instance:
print(f"{dataset[0]['action'].shape=}\n") # (64, c) delta_timestamps = {
# loads 4 images: 1 second before current frame, 500 ms before, 200 ms before, and current frame
camera_key: [-1, -0.5, -0.20, 0],
# loads 6 state vectors: 1.5 seconds before, 1 second before, ... 200 ms, 100 ms, and current frame
"observation.state": [-1.5, -1, -0.5, -0.20, -0.10, 0],
# loads 64 action vectors: current frame, 1 frame in the future, 2 frames, ... 63 frames in the future
"action": [t / dataset.fps for t in range(64)],
}
# Note that in any case, these delta_timestamps values need to be multiples of (1/fps) so that added to any
# timestamp, you still get a valid timestamp.
dataset = LeRobotDataset(repo_id, delta_timestamps=delta_timestamps)
print(f"\n{dataset[0][camera_key].shape=}") # (4, c, h, w)
print(f"{dataset[0]['observation.state'].shape=}") # (6, c)
print(f"{dataset[0]['action'].shape=}\n") # (64, c)
if __name__ == "__main__":
dataloader = torch.utils.data.DataLoader( dataloader = torch.utils.data.DataLoader(
dataset, dataset,
num_workers=4, num_workers=4,
@@ -144,3 +145,7 @@ if __name__ == "__main__":
print(f"{batch['observation.state'].shape=}") # (32, 6, c) print(f"{batch['observation.state'].shape=}") # (32, 6, c)
print(f"{batch['action'].shape=}") # (32, 64, c) print(f"{batch['action'].shape=}") # (32, 64, c)
break break
if __name__ == "__main__":
main()
+86 -80
View File
@@ -33,83 +33,68 @@ TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>" HF_MODEL_ID = "<hf_username>/<model_repo_id>"
HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>" HF_DATASET_ID = "<hf_username>/<eval_dataset_repo_id>"
# Create the robot configuration & robot
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
robot = LeKiwiClient(robot_config) def main():
# Create the robot configuration & robot
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
# Create policy robot = LeKiwiClient(robot_config)
policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
# Configure the dataset features # Create policy
action_features = hw_to_dataset_features(robot.action_features, ACTION) policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
dataset_features = {**action_features, **obs_features}
# Create the dataset # Configure the dataset features
dataset = LeRobotDataset.create( action_features = hw_to_dataset_features(robot.action_features, ACTION)
repo_id=HF_DATASET_ID, obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
fps=FPS, dataset_features = {**action_features, **obs_features}
features=dataset_features,
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Build Policy Processors # Create the dataset
preprocessor, postprocessor = make_pre_post_processors( dataset = LeRobotDataset.create(
policy_cfg=policy, repo_id=HF_DATASET_ID,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot
# To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi`
robot.connect()
# TODO(Steven): Update this example to use pipelines
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="lekiwi_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Running inference, recording eval episode {recorded_episodes} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS, fps=FPS,
policy=policy, features=dataset_features,
preprocessor=preprocessor, # Pass the pre and post policy processors robot_type=robot.name,
postprocessor=postprocessor, use_videos=True,
dataset=dataset, image_writer_threads=4,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
) )
# Reset the environment if not stopping or re-recording # Build Policy Processors
if not events["stop_recording"] and ( preprocessor, postprocessor = make_pre_post_processors(
(recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"] policy_cfg=policy,
): pretrained_path=HF_MODEL_ID,
log_say("Reset the environment") dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot
# To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi`
robot.connect()
# TODO(Steven): Update this example to use pipelines
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="lekiwi_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Running inference, recording eval episode {recorded_episodes} of {NUM_EPISODES}")
# Main record loop
record_loop( record_loop(
robot=robot, robot=robot,
events=events, events=events,
fps=FPS, fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC, control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
@@ -118,21 +103,42 @@ while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
robot_observation_processor=robot_observation_processor, robot_observation_processor=robot_observation_processor,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-record episode") if not events["stop_recording"] and (
events["rerecord_episode"] = False (recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"]
events["exit_early"] = False ):
dataset.clear_episode_buffer() log_say("Reset the environment")
continue record_loop(
robot=robot,
events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-record episode")
recorded_episodes += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
robot.disconnect() recorded_episodes += 1
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
robot.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+82 -76
View File
@@ -34,78 +34,62 @@ RESET_TIME_SEC = 10
TASK_DESCRIPTION = "My task description" TASK_DESCRIPTION = "My task description"
HF_REPO_ID = "<hf_username>/<dataset_repo_id>" HF_REPO_ID = "<hf_username>/<dataset_repo_id>"
# Create the robot and teleoperator configurations
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
leader_arm_config = SO100LeaderConfig(port="/dev/tty.usbmodem585A0077581", id="my_awesome_leader_arm")
keyboard_config = KeyboardTeleopConfig()
# Initialize the robot and teleoperator def main():
robot = LeKiwiClient(robot_config) # Create the robot and teleoperator configurations
leader_arm = SO100Leader(leader_arm_config) robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
keyboard = KeyboardTeleop(keyboard_config) leader_arm_config = SO100LeaderConfig(port="/dev/tty.usbmodem585A0077581", id="my_awesome_leader_arm")
keyboard_config = KeyboardTeleopConfig()
# TODO(Steven): Update this example to use pipelines # Initialize the robot and teleoperator
teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors() robot = LeKiwiClient(robot_config)
leader_arm = SO100Leader(leader_arm_config)
keyboard = KeyboardTeleop(keyboard_config)
# Configure the dataset features # TODO(Steven): Update this example to use pipelines
action_features = hw_to_dataset_features(robot.action_features, ACTION) teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors()
obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
dataset_features = {**action_features, **obs_features}
# Create the dataset # Configure the dataset features
dataset = LeRobotDataset.create( action_features = hw_to_dataset_features(robot.action_features, ACTION)
repo_id=HF_REPO_ID, obs_features = hw_to_dataset_features(robot.observation_features, OBS_STR)
fps=FPS, dataset_features = {**action_features, **obs_features}
features=dataset_features,
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Connect the robot and teleoperator # Create the dataset
# To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi` dataset = LeRobotDataset.create(
robot.connect() repo_id=HF_REPO_ID,
leader_arm.connect()
keyboard.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="lekiwi_record")
if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {recorded_episodes}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS, fps=FPS,
dataset=dataset, features=dataset_features,
teleop=[leader_arm, keyboard], robot_type=robot.name,
control_time_s=EPISODE_TIME_SEC, use_videos=True,
single_task=TASK_DESCRIPTION, image_writer_threads=4,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
) )
# Reset the environment if not stopping or re-recording # Connect the robot and teleoperator
if not events["stop_recording"] and ( # To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi`
(recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"] robot.connect()
): leader_arm.connect()
log_say("Reset the environment") keyboard.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="lekiwi_record")
if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop...")
recorded_episodes = 0
while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {recorded_episodes}")
# Main record loop
record_loop( record_loop(
robot=robot, robot=robot,
events=events, events=events,
fps=FPS, fps=FPS,
dataset=dataset,
teleop=[leader_arm, keyboard], teleop=[leader_arm, keyboard],
control_time_s=RESET_TIME_SEC, control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
teleop_action_processor=teleop_action_processor, teleop_action_processor=teleop_action_processor,
@@ -113,23 +97,45 @@ while recorded_episodes < NUM_EPISODES and not events["stop_recording"]:
robot_observation_processor=robot_observation_processor, robot_observation_processor=robot_observation_processor,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-record episode") if not events["stop_recording"] and (
events["rerecord_episode"] = False (recorded_episodes < NUM_EPISODES - 1) or events["rerecord_episode"]
events["exit_early"] = False ):
dataset.clear_episode_buffer() log_say("Reset the environment")
continue record_loop(
robot=robot,
events=events,
fps=FPS,
teleop=[leader_arm, keyboard],
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=teleop_action_processor,
robot_action_processor=robot_action_processor,
robot_observation_processor=robot_observation_processor,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-record episode")
recorded_episodes += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
robot.disconnect() recorded_episodes += 1
leader_arm.disconnect()
keyboard.disconnect()
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
robot.disconnect()
leader_arm.disconnect()
keyboard.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+32 -26
View File
@@ -20,42 +20,48 @@ from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.robots.lekiwi.config_lekiwi import LeKiwiClientConfig from lerobot.robots.lekiwi.config_lekiwi import LeKiwiClientConfig
from lerobot.robots.lekiwi.lekiwi_client import LeKiwiClient from lerobot.robots.lekiwi.lekiwi_client import LeKiwiClient
from lerobot.utils.constants import ACTION from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say from lerobot.utils.utils import log_say
EPISODE_IDX = 0 EPISODE_IDX = 0
# Initialize the robot config
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
# Initialize the robot def main():
robot = LeKiwiClient(robot_config) # Initialize the robot config
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="lekiwi")
# Fetch the dataset to replay # Initialize the robot
dataset = LeRobotDataset("<hf_username>/<dataset_repo_id>", episodes=[EPISODE_IDX]) robot = LeKiwiClient(robot_config)
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX)
actions = episode_frames.select_columns(ACTION)
# Connect to the robot # Fetch the dataset to replay
robot.connect() dataset = LeRobotDataset("<hf_username>/<dataset_repo_id>", episodes=[EPISODE_IDX])
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX)
actions = episode_frames.select_columns(ACTION)
if not robot.is_connected: # Connect to the robot
raise ValueError("Robot is not connected!") robot.connect()
print("Starting replay loop...") if not robot.is_connected:
log_say(f"Replaying episode {EPISODE_IDX}") raise ValueError("Robot is not connected!")
for idx in range(len(episode_frames)):
t0 = time.perf_counter()
# Get recorded action from dataset print("Starting replay loop...")
action = { log_say(f"Replaying episode {EPISODE_IDX}")
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"]) for idx in range(len(episode_frames)):
} t0 = time.perf_counter()
# Send action to robot # Get recorded action from dataset
_ = robot.send_action(action) action = {
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"])
}
busy_wait(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0)) # Send action to robot
_ = robot.send_action(action)
robot.disconnect() precise_sleep(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0))
robot.disconnect()
if __name__ == "__main__":
main()
+42 -36
View File
@@ -19,54 +19,60 @@ import time
from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig
from lerobot.teleoperators.keyboard.teleop_keyboard import KeyboardTeleop, KeyboardTeleopConfig from lerobot.teleoperators.keyboard.teleop_keyboard import KeyboardTeleop, KeyboardTeleopConfig
from lerobot.teleoperators.so100_leader import SO100Leader, SO100LeaderConfig from lerobot.teleoperators.so100_leader import SO100Leader, SO100LeaderConfig
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
FPS = 30 FPS = 30
# Create the robot and teleoperator configurations
robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="my_lekiwi")
teleop_arm_config = SO100LeaderConfig(port="/dev/tty.usbmodem585A0077581", id="my_awesome_leader_arm")
keyboard_config = KeyboardTeleopConfig(id="my_laptop_keyboard")
# Initialize the robot and teleoperator def main():
robot = LeKiwiClient(robot_config) # Create the robot and teleoperator configurations
leader_arm = SO100Leader(teleop_arm_config) robot_config = LeKiwiClientConfig(remote_ip="172.18.134.136", id="my_lekiwi")
keyboard = KeyboardTeleop(keyboard_config) teleop_arm_config = SO100LeaderConfig(port="/dev/tty.usbmodem585A0077581", id="my_awesome_leader_arm")
keyboard_config = KeyboardTeleopConfig(id="my_laptop_keyboard")
# Connect to the robot and teleoperator # Initialize the robot and teleoperator
# To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi` robot = LeKiwiClient(robot_config)
robot.connect() leader_arm = SO100Leader(teleop_arm_config)
leader_arm.connect() keyboard = KeyboardTeleop(keyboard_config)
keyboard.connect()
# Init rerun viewer # Connect to the robot and teleoperator
init_rerun(session_name="lekiwi_teleop") # To connect you already should have this script running on LeKiwi: `python -m lerobot.robots.lekiwi.lekiwi_host --robot.id=my_awesome_kiwi`
robot.connect()
leader_arm.connect()
keyboard.connect()
if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected: # Init rerun viewer
raise ValueError("Robot or teleop is not connected!") init_rerun(session_name="lekiwi_teleop")
print("Starting teleop loop...") if not robot.is_connected or not leader_arm.is_connected or not keyboard.is_connected:
while True: raise ValueError("Robot or teleop is not connected!")
t0 = time.perf_counter()
# Get robot observation print("Starting teleop loop...")
observation = robot.get_observation() while True:
t0 = time.perf_counter()
# Get teleop action # Get robot observation
# Arm observation = robot.get_observation()
arm_action = leader_arm.get_action()
arm_action = {f"arm_{k}": v for k, v in arm_action.items()}
# Keyboard
keyboard_keys = keyboard.get_action()
base_action = robot._from_keyboard_to_base_action(keyboard_keys)
action = {**arm_action, **base_action} if len(base_action) > 0 else arm_action # Get teleop action
# Arm
arm_action = leader_arm.get_action()
arm_action = {f"arm_{k}": v for k, v in arm_action.items()}
# Keyboard
keyboard_keys = keyboard.get_action()
base_action = robot._from_keyboard_to_base_action(keyboard_keys)
# Send action to robot action = {**arm_action, **base_action} if len(base_action) > 0 else arm_action
_ = robot.send_action(action)
# Visualize # Send action to robot
log_rerun_data(observation=observation, action=action) _ = robot.send_action(action)
busy_wait(max(1.0 / FPS - (time.perf_counter() - t0), 0.0)) # Visualize
log_rerun_data(observation=observation, action=action)
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
if __name__ == "__main__":
main()
+135 -127
View File
@@ -52,125 +52,114 @@ TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>" HF_MODEL_ID = "<hf_username>/<model_repo_id>"
HF_DATASET_ID = "<hf_username>/<dataset_repo_id>" HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
robot = SO100Follower(robot_config) def main():
# Create the robot configuration & robot
# Create policy camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
policy = ACTPolicy.from_pretrained(HF_MODEL_ID) robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem58760434471",
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf id="my_awesome_follower_arm",
kinematics_solver = RobotKinematics( cameras=camera_config,
urdf_path="./SO101/so101_new_calib.urdf", use_degrees=True,
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joints observation to EE observation
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys()))
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=combine_feature_dicts(
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose_processor,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
# User for now should be explicit on the feature keys that were used for record
# Alternatively, the user can pass the processor step that has the right features
aggregate_pipeline_dataset_features(
pipeline=make_default_teleop_action_processor(),
initial_features=create_initial_features(
action={
f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,))
for k in ["x", "y", "z", "wx", "wy", "wz", "gripper_pos"]
}
),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Build Policy Processors
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot
robot.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="phone_so100_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
) )
# Reset the environment if not stopping or re-recording robot = SO100Follower(robot_config)
if not events["stop_recording"] and ((episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]):
log_say("Reset the environment") # Create policy
policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joints observation to EE observation
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys())
)
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=combine_feature_dicts(
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose_processor,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
# User for now should be explicit on the feature keys that were used for record
# Alternatively, the user can pass the processor step that has the right features
aggregate_pipeline_dataset_features(
pipeline=make_default_teleop_action_processor(),
initial_features=create_initial_features(
action={
f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,))
for k in ["x", "y", "z", "wx", "wy", "wz", "gripper_pos"]
}
),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Build Policy Processors
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot
robot.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="phone_so100_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop( record_loop(
robot=robot, robot=robot,
events=events, events=events,
fps=FPS, fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC, control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
@@ -179,21 +168,40 @@ for episode_idx in range(NUM_EPISODES):
robot_observation_processor=robot_joints_to_ee_pose_processor, robot_observation_processor=robot_joints_to_ee_pose_processor,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-record episode") if not events["stop_recording"] and ((episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]):
events["rerecord_episode"] = False log_say("Reset the environment")
events["exit_early"] = False record_loop(
dataset.clear_episode_buffer() robot=robot,
continue events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-record episode")
episode_idx += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
robot.disconnect() episode_idx += 1
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
robot.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+142 -133
View File
@@ -50,133 +50,122 @@ RESET_TIME_SEC = 30
TASK_DESCRIPTION = "My task description" TASK_DESCRIPTION = "My task description"
HF_REPO_ID = "<hf_username>/<dataset_repo_id>" HF_REPO_ID = "<hf_username>/<dataset_repo_id>"
# Create the robot and teleoperator configurations
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
teleop_config = PhoneConfig(phone_os=PhoneOS.IOS) # or PhoneOS.ANDROID
# Initialize the robot and teleoperator def main():
robot = SO100Follower(robot_config) # Create the robot and teleoperator configurations
phone = Phone(teleop_config) camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
teleop_config = PhoneConfig(phone_os=PhoneOS.IOS) # or PhoneOS.ANDROID
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot and teleoperator
kinematics_solver = RobotKinematics( robot = SO100Follower(robot_config)
urdf_path="./SO101/so101_new_calib.urdf", phone = Phone(teleop_config)
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert phone action to EE action # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
phone_to_robot_ee_pose_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( kinematics_solver = RobotKinematics(
steps=[ urdf_path="./SO101/so101_new_calib.urdf",
MapPhoneActionToRobotAction(platform=teleop_config.phone_os), target_frame_name="gripper_frame_link",
EEReferenceAndDelta( joint_names=list(robot.bus.motors.keys()),
kinematics=kinematics_solver,
end_effector_step_sizes={"x": 0.5, "y": 0.5, "z": 0.5},
motor_names=list(robot.bus.motors.keys()),
use_latched_reference=True,
),
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.20,
),
GripperVelocityToJoint(speed_factor=20.0),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joint observation to EE observation
robot_joints_to_ee_pose = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys()))
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=phone_to_robot_ee_pose_processor,
initial_features=create_initial_features(action=phone.action_features),
use_videos=True,
),
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Connect the robot and teleoperator
robot.connect()
phone.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="phone_so100_record")
if not robot.is_connected or not phone.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop. Move your phone to teleoperate the robot...")
episode_idx = 0
while episode_idx < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
teleop=phone,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
) )
# Reset the environment if not stopping or re-recording # Build pipeline to convert phone action to EE action
if not events["stop_recording"] and (episode_idx < NUM_EPISODES - 1 or events["rerecord_episode"]): phone_to_robot_ee_pose_processor = RobotProcessorPipeline[
log_say("Reset the environment") tuple[RobotAction, RobotObservation], RobotAction
](
steps=[
MapPhoneActionToRobotAction(platform=teleop_config.phone_os),
EEReferenceAndDelta(
kinematics=kinematics_solver,
end_effector_step_sizes={"x": 0.5, "y": 0.5, "z": 0.5},
motor_names=list(robot.bus.motors.keys()),
use_latched_reference=True,
),
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.20,
),
GripperVelocityToJoint(speed_factor=20.0),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joint observation to EE observation
robot_joints_to_ee_pose = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys())
)
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=phone_to_robot_ee_pose_processor,
initial_features=create_initial_features(action=phone.action_features),
use_videos=True,
),
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Connect the robot and teleoperator
robot.connect()
phone.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="phone_so100_record")
if not robot.is_connected or not phone.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop. Move your phone to teleoperate the robot...")
episode_idx = 0
while episode_idx < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop( record_loop(
robot=robot, robot=robot,
events=events, events=events,
fps=FPS, fps=FPS,
teleop=phone, teleop=phone,
control_time_s=RESET_TIME_SEC, dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor, teleop_action_processor=phone_to_robot_ee_pose_processor,
@@ -184,22 +173,42 @@ while episode_idx < NUM_EPISODES and not events["stop_recording"]:
robot_observation_processor=robot_joints_to_ee_pose, robot_observation_processor=robot_joints_to_ee_pose,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-recording episode") if not events["stop_recording"] and (episode_idx < NUM_EPISODES - 1 or events["rerecord_episode"]):
events["rerecord_episode"] = False log_say("Reset the environment")
events["exit_early"] = False record_loop(
dataset.clear_episode_buffer() robot=robot,
continue events=events,
fps=FPS,
teleop=phone,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=phone_to_robot_ee_pose_processor,
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-recording episode")
episode_idx += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
robot.disconnect() episode_idx += 1
phone.disconnect()
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
robot.disconnect()
phone.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+57 -51
View File
@@ -29,72 +29,78 @@ from lerobot.robots.so100_follower.robot_kinematic_processor import (
) )
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.utils.constants import ACTION from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say from lerobot.utils.utils import log_say
EPISODE_IDX = 0 EPISODE_IDX = 0
HF_REPO_ID = "<hf_username>/<dataset_repo_id>" HF_REPO_ID = "<hf_username>/<dataset_repo_id>"
# Initialize the robot config
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
# Initialize the robot def main():
robot = SO100Follower(robot_config) # Initialize the robot config
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot
kinematics_solver = RobotKinematics( robot = SO100Follower(robot_config)
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( kinematics_solver = RobotKinematics(
steps=[ urdf_path="./SO101/so101_new_calib.urdf",
InverseKinematicsEEToJoints( target_frame_name="gripper_frame_link",
kinematics=kinematics_solver, joint_names=list(robot.bus.motors.keys()),
motor_names=list(robot.bus.motors.keys()), )
initial_guess_current_joints=False, # Because replay is open loop
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Fetch the dataset to replay # Build pipeline to convert EE action to joints action
dataset = LeRobotDataset(HF_REPO_ID, episodes=[EPISODE_IDX]) robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0 steps=[
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX) InverseKinematicsEEToJoints(
actions = episode_frames.select_columns(ACTION) kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=False, # Because replay is open loop
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Connect to the robot # Fetch the dataset to replay
robot.connect() dataset = LeRobotDataset(HF_REPO_ID, episodes=[EPISODE_IDX])
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX)
actions = episode_frames.select_columns(ACTION)
if not robot.is_connected: # Connect to the robot
raise ValueError("Robot is not connected!") robot.connect()
print("Starting replay loop...") if not robot.is_connected:
log_say(f"Replaying episode {EPISODE_IDX}") raise ValueError("Robot is not connected!")
for idx in range(len(episode_frames)):
t0 = time.perf_counter()
# Get recorded action from dataset print("Starting replay loop...")
ee_action = { log_say(f"Replaying episode {EPISODE_IDX}")
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"]) for idx in range(len(episode_frames)):
} t0 = time.perf_counter()
# Get robot observation # Get recorded action from dataset
robot_obs = robot.get_observation() ee_action = {
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"])
}
# Dataset EE -> robot joints # Get robot observation
joint_action = robot_ee_to_joints_processor((ee_action, robot_obs)) robot_obs = robot.get_observation()
# Send action to robot # Dataset EE -> robot joints
_ = robot.send_action(joint_action) joint_action = robot_ee_to_joints_processor((ee_action, robot_obs))
busy_wait(1.0 / dataset.fps - (time.perf_counter() - t0)) # Send action to robot
_ = robot.send_action(joint_action)
# Clean up precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
robot.disconnect()
# Clean up
robot.disconnect()
if __name__ == "__main__":
main()
+70 -62
View File
@@ -32,82 +32,90 @@ from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.teleoperators.phone.config_phone import PhoneConfig, PhoneOS from lerobot.teleoperators.phone.config_phone import PhoneConfig, PhoneOS
from lerobot.teleoperators.phone.phone_processor import MapPhoneActionToRobotAction from lerobot.teleoperators.phone.phone_processor import MapPhoneActionToRobotAction
from lerobot.teleoperators.phone.teleop_phone import Phone from lerobot.teleoperators.phone.teleop_phone import Phone
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
FPS = 30 FPS = 30
# Initialize the robot and teleoperator
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
teleop_config = PhoneConfig(phone_os=PhoneOS.IOS) # or PhoneOS.ANDROID
# Initialize the robot and teleoperator def main():
robot = SO100Follower(robot_config) # Initialize the robot and teleoperator
teleop_device = Phone(teleop_config) robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
teleop_config = PhoneConfig(phone_os=PhoneOS.IOS) # or PhoneOS.ANDROID
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot and teleoperator
kinematics_solver = RobotKinematics( robot = SO100Follower(robot_config)
urdf_path="./SO101/so101_new_calib.urdf", teleop_device = Phone(teleop_config)
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert phone action to ee pose action to joint action # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
phone_to_robot_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( kinematics_solver = RobotKinematics(
steps=[ urdf_path="./SO101/so101_new_calib.urdf",
MapPhoneActionToRobotAction(platform=teleop_config.phone_os), target_frame_name="gripper_frame_link",
EEReferenceAndDelta( joint_names=list(robot.bus.motors.keys()),
kinematics=kinematics_solver, )
end_effector_step_sizes={"x": 0.5, "y": 0.5, "z": 0.5},
motor_names=list(robot.bus.motors.keys()),
use_latched_reference=True,
),
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
),
GripperVelocityToJoint(
speed_factor=20.0,
),
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Connect to the robot and teleoperator # Build pipeline to convert phone action to ee pose action to joint action
robot.connect() phone_to_robot_joints_processor = RobotProcessorPipeline[
teleop_device.connect() tuple[RobotAction, RobotObservation], RobotAction
](
steps=[
MapPhoneActionToRobotAction(platform=teleop_config.phone_os),
EEReferenceAndDelta(
kinematics=kinematics_solver,
end_effector_step_sizes={"x": 0.5, "y": 0.5, "z": 0.5},
motor_names=list(robot.bus.motors.keys()),
use_latched_reference=True,
),
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
),
GripperVelocityToJoint(
speed_factor=20.0,
),
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Init rerun viewer # Connect to the robot and teleoperator
init_rerun(session_name="phone_so100_teleop") robot.connect()
teleop_device.connect()
if not robot.is_connected or not teleop_device.is_connected: # Init rerun viewer
raise ValueError("Robot or teleop is not connected!") init_rerun(session_name="phone_so100_teleop")
print("Starting teleop loop. Move your phone to teleoperate the robot...") if not robot.is_connected or not teleop_device.is_connected:
while True: raise ValueError("Robot or teleop is not connected!")
t0 = time.perf_counter()
# Get robot observation print("Starting teleop loop. Move your phone to teleoperate the robot...")
robot_obs = robot.get_observation() while True:
t0 = time.perf_counter()
# Get teleop action # Get robot observation
phone_obs = teleop_device.get_action() robot_obs = robot.get_observation()
# Phone -> EE pose -> Joints transition # Get teleop action
joint_action = phone_to_robot_joints_processor((phone_obs, robot_obs)) phone_obs = teleop_device.get_action()
# Send action to robot # Phone -> EE pose -> Joints transition
_ = robot.send_action(joint_action) joint_action = phone_to_robot_joints_processor((phone_obs, robot_obs))
# Visualize # Send action to robot
log_rerun_data(observation=phone_obs, action=joint_action) _ = robot.send_action(joint_action)
busy_wait(max(1.0 / FPS - (time.perf_counter() - t0), 0.0)) # Visualize
log_rerun_data(observation=phone_obs, action=joint_action)
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
if __name__ == "__main__":
main()
+1 -1
View File
@@ -142,7 +142,7 @@ def _check_matplotlib_available():
raise ImportError( raise ImportError(
"matplotlib is required for RTC debug visualizations. " "matplotlib is required for RTC debug visualizations. "
"Please install it by running:\n" "Please install it by running:\n"
" uv pip install -e '.[matplotlib-dep]'" " uv pip install matplotlib"
) )
+135 -128
View File
@@ -52,126 +52,114 @@ TASK_DESCRIPTION = "My task description"
HF_MODEL_ID = "<hf_username>/<model_repo_id>" HF_MODEL_ID = "<hf_username>/<model_repo_id>"
HF_DATASET_ID = "<hf_username>/<dataset_repo_id>" HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
# Create the robot configuration & robot
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
robot = SO100Follower(robot_config) def main():
# Create the robot configuration & robot
# Create policy camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
policy = ACTPolicy.from_pretrained(HF_MODEL_ID) robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf id="my_awesome_follower_arm",
kinematics_solver = RobotKinematics( cameras=camera_config,
urdf_path="./SO101/so101_new_calib.urdf", use_degrees=True,
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joints observation to EE observation
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys()))
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=combine_feature_dicts(
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose_processor,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
# User for now should be explicit on the feature keys that were used for record
# Alternatively, the user can pass the processor step that has the right features
aggregate_pipeline_dataset_features(
pipeline=make_default_teleop_action_processor(),
initial_features=create_initial_features(
action={
f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,))
for k in ["x", "y", "z", "wx", "wy", "wz", "gripper_pos"]
}
),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Build Policy Processors
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot and teleoperator
robot.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="so100_so100_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=robot,
events=events,
fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
) )
# Reset the environment if not stopping or re-recording robot = SO100Follower(robot_config)
if not events["stop_recording"] and ((episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]):
log_say("Reset the environment") # Create policy
policy = ACTPolicy.from_pretrained(HF_MODEL_ID)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert joints observation to EE observation
robot_joints_to_ee_pose_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
kinematics=kinematics_solver, motor_names=list(robot.bus.motors.keys())
)
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_DATASET_ID,
fps=FPS,
features=combine_feature_dicts(
aggregate_pipeline_dataset_features(
pipeline=robot_joints_to_ee_pose_processor,
initial_features=create_initial_features(observation=robot.observation_features),
use_videos=True,
),
# User for now should be explicit on the feature keys that were used for record
# Alternatively, the user can pass the processor step that has the right features
aggregate_pipeline_dataset_features(
pipeline=make_default_teleop_action_processor(),
initial_features=create_initial_features(
action={
f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,))
for k in ["x", "y", "z", "wx", "wy", "wz", "gripper_pos"]
}
),
use_videos=True,
),
),
robot_type=robot.name,
use_videos=True,
image_writer_threads=4,
)
# Build Policy Processors
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
# The inference device is automatically set to match the detected hardware, overriding any previous device settings from training to ensure compatibility.
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
)
# Connect the robot and teleoperator
robot.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="so100_so100_evaluate")
if not robot.is_connected:
raise ValueError("Robot is not connected!")
print("Starting evaluate loop...")
episode_idx = 0
for episode_idx in range(NUM_EPISODES):
log_say(f"Running inference, recording eval episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop( record_loop(
robot=robot, robot=robot,
events=events, events=events,
fps=FPS, fps=FPS,
policy=policy,
preprocessor=preprocessor, # Pass the pre and post policy processors
postprocessor=postprocessor,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC, control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
@@ -180,21 +168,40 @@ for episode_idx in range(NUM_EPISODES):
robot_observation_processor=robot_joints_to_ee_pose_processor, robot_observation_processor=robot_joints_to_ee_pose_processor,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-record episode") if not events["stop_recording"] and ((episode_idx < NUM_EPISODES - 1) or events["rerecord_episode"]):
events["rerecord_episode"] = False log_say("Reset the environment")
events["exit_early"] = False record_loop(
dataset.clear_episode_buffer() robot=robot,
continue events=events,
fps=FPS,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=make_default_teleop_action_processor(),
robot_action_processor=robot_ee_to_joints_processor,
robot_observation_processor=robot_joints_to_ee_pose_processor,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-record episode")
episode_idx += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
robot.disconnect() episode_idx += 1
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
robot.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+142 -134
View File
@@ -48,134 +48,122 @@ RESET_TIME_SEC = 30
TASK_DESCRIPTION = "My task description" TASK_DESCRIPTION = "My task description"
HF_REPO_ID = "<hf_username>/<dataset_repo_id>" HF_REPO_ID = "<hf_username>/<dataset_repo_id>"
# Create the robot and teleoperator configurations
camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
follower_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", cameras=camera_config, use_degrees=True
)
leader_config = SO100LeaderConfig(port="/dev/tty.usbmodem5A460819811", id="my_awesome_leader_arm")
# Initialize the robot and teleoperator def main():
follower = SO100Follower(follower_config) # Create the robot and teleoperator configurations
leader = SO100Leader(leader_config) camera_config = {"front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS)}
follower_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411",
id="my_awesome_follower_arm",
cameras=camera_config,
use_degrees=True,
)
leader_config = SO100LeaderConfig(port="/dev/tty.usbmodem5A460819811", id="my_awesome_leader_arm")
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot and teleoperator
follower_kinematics_solver = RobotKinematics( follower = SO100Follower(follower_config)
urdf_path="./SO101/so101_new_calib.urdf", leader = SO100Leader(leader_config)
target_frame_name="gripper_frame_link",
joint_names=list(follower.bus.motors.keys()),
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_kinematics_solver = RobotKinematics( follower_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf", urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link", target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()), joint_names=list(follower.bus.motors.keys()),
)
# Build pipeline to convert follower joints to EE observation
follower_joints_to_ee = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
kinematics=follower_kinematics_solver, motor_names=list(follower.bus.motors.keys())
),
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Build pipeline to convert leader joints to EE action
leader_joints_to_ee = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
ForwardKinematicsJointsToEE(
kinematics=leader_kinematics_solver, motor_names=list(leader.bus.motors.keys())
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to follower joints
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
[
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
),
InverseKinematicsEEToJoints(
kinematics=follower_kinematics_solver,
motor_names=list(follower.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=leader_joints_to_ee,
initial_features=create_initial_features(action=leader.action_features),
use_videos=True,
),
aggregate_pipeline_dataset_features(
pipeline=follower_joints_to_ee,
initial_features=create_initial_features(observation=follower.observation_features),
use_videos=True,
),
),
robot_type=follower.name,
use_videos=True,
image_writer_threads=4,
)
# Connect the robot and teleoperator
leader.connect()
follower.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="recording_phone")
if not leader.is_connected or not follower.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop...")
episode_idx = 0
while episode_idx < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop(
robot=follower,
events=events,
fps=FPS,
teleop=leader,
dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
) )
# Reset the environment if not stopping or re-recording # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
if not events["stop_recording"] and (episode_idx < NUM_EPISODES - 1 or events["rerecord_episode"]): leader_kinematics_solver = RobotKinematics(
log_say("Reset the environment") urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()),
)
# Build pipeline to convert follower joints to EE observation
follower_joints_to_ee = RobotProcessorPipeline[RobotObservation, RobotObservation](
steps=[
ForwardKinematicsJointsToEE(
kinematics=follower_kinematics_solver, motor_names=list(follower.bus.motors.keys())
),
],
to_transition=observation_to_transition,
to_output=transition_to_observation,
)
# Build pipeline to convert leader joints to EE action
leader_joints_to_ee = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
steps=[
ForwardKinematicsJointsToEE(
kinematics=leader_kinematics_solver, motor_names=list(leader.bus.motors.keys())
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Build pipeline to convert EE action to follower joints
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
[
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
),
InverseKinematicsEEToJoints(
kinematics=follower_kinematics_solver,
motor_names=list(follower.bus.motors.keys()),
initial_guess_current_joints=True,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Create the dataset
dataset = LeRobotDataset.create(
repo_id=HF_REPO_ID,
fps=FPS,
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
aggregate_pipeline_dataset_features(
pipeline=leader_joints_to_ee,
initial_features=create_initial_features(action=leader.action_features),
use_videos=True,
),
aggregate_pipeline_dataset_features(
pipeline=follower_joints_to_ee,
initial_features=create_initial_features(observation=follower.observation_features),
use_videos=True,
),
),
robot_type=follower.name,
use_videos=True,
image_writer_threads=4,
)
# Connect the robot and teleoperator
leader.connect()
follower.connect()
# Initialize the keyboard listener and rerun visualization
listener, events = init_keyboard_listener()
init_rerun(session_name="recording_phone")
if not leader.is_connected or not follower.is_connected:
raise ValueError("Robot or teleop is not connected!")
print("Starting record loop...")
episode_idx = 0
while episode_idx < NUM_EPISODES and not events["stop_recording"]:
log_say(f"Recording episode {episode_idx + 1} of {NUM_EPISODES}")
# Main record loop
record_loop( record_loop(
robot=follower, robot=follower,
events=events, events=events,
fps=FPS, fps=FPS,
teleop=leader, teleop=leader,
control_time_s=RESET_TIME_SEC, dataset=dataset,
control_time_s=EPISODE_TIME_SEC,
single_task=TASK_DESCRIPTION, single_task=TASK_DESCRIPTION,
display_data=True, display_data=True,
teleop_action_processor=leader_joints_to_ee, teleop_action_processor=leader_joints_to_ee,
@@ -183,22 +171,42 @@ while episode_idx < NUM_EPISODES and not events["stop_recording"]:
robot_observation_processor=follower_joints_to_ee, robot_observation_processor=follower_joints_to_ee,
) )
if events["rerecord_episode"]: # Reset the environment if not stopping or re-recording
log_say("Re-recording episode") if not events["stop_recording"] and (episode_idx < NUM_EPISODES - 1 or events["rerecord_episode"]):
events["rerecord_episode"] = False log_say("Reset the environment")
events["exit_early"] = False record_loop(
dataset.clear_episode_buffer() robot=follower,
continue events=events,
fps=FPS,
teleop=leader,
control_time_s=RESET_TIME_SEC,
single_task=TASK_DESCRIPTION,
display_data=True,
teleop_action_processor=leader_joints_to_ee,
robot_action_processor=ee_to_follower_joints,
robot_observation_processor=follower_joints_to_ee,
)
# Save episode if events["rerecord_episode"]:
dataset.save_episode() log_say("Re-recording episode")
episode_idx += 1 events["rerecord_episode"] = False
events["exit_early"] = False
dataset.clear_episode_buffer()
continue
# Clean up # Save episode
log_say("Stop recording") dataset.save_episode()
leader.disconnect() episode_idx += 1
follower.disconnect()
listener.stop()
dataset.finalize() # Clean up
dataset.push_to_hub() log_say("Stop recording")
leader.disconnect()
follower.disconnect()
listener.stop()
dataset.finalize()
dataset.push_to_hub()
if __name__ == "__main__":
main()
+57 -51
View File
@@ -30,72 +30,78 @@ from lerobot.robots.so100_follower.robot_kinematic_processor import (
) )
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.utils.constants import ACTION from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say from lerobot.utils.utils import log_say
EPISODE_IDX = 0 EPISODE_IDX = 0
HF_REPO_ID = "<hf_username>/<dataset_repo_id>" HF_REPO_ID = "<hf_username>/<dataset_repo_id>"
# Initialize the robot config
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
# Initialize the robot def main():
robot = SO100Follower(robot_config) # Initialize the robot config
robot_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot
kinematics_solver = RobotKinematics( robot = SO100Follower(robot_config)
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
joint_names=list(robot.bus.motors.keys()),
)
# Build pipeline to convert EE action to joints action # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( kinematics_solver = RobotKinematics(
steps=[ urdf_path="./SO101/so101_new_calib.urdf",
InverseKinematicsEEToJoints( target_frame_name="gripper_frame_link",
kinematics=kinematics_solver, joint_names=list(robot.bus.motors.keys()),
motor_names=list(robot.bus.motors.keys()), )
initial_guess_current_joints=False, # Because replay is open loop
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Fetch the dataset to replay # Build pipeline to convert EE action to joints action
dataset = LeRobotDataset(HF_REPO_ID, episodes=[EPISODE_IDX]) robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0 steps=[
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX) InverseKinematicsEEToJoints(
actions = episode_frames.select_columns(ACTION) kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
initial_guess_current_joints=False, # Because replay is open loop
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Connect to the robot # Fetch the dataset to replay
robot.connect() dataset = LeRobotDataset(HF_REPO_ID, episodes=[EPISODE_IDX])
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == EPISODE_IDX)
actions = episode_frames.select_columns(ACTION)
if not robot.is_connected: # Connect to the robot
raise ValueError("Robot is not connected!") robot.connect()
print("Starting replay loop...") if not robot.is_connected:
log_say(f"Replaying episode {EPISODE_IDX}") raise ValueError("Robot is not connected!")
for idx in range(len(episode_frames)):
t0 = time.perf_counter()
# Get recorded action from dataset print("Starting replay loop...")
ee_action = { log_say(f"Replaying episode {EPISODE_IDX}")
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"]) for idx in range(len(episode_frames)):
} t0 = time.perf_counter()
# Get robot observation # Get recorded action from dataset
robot_obs = robot.get_observation() ee_action = {
name: float(actions[idx][ACTION][i]) for i, name in enumerate(dataset.features[ACTION]["names"])
}
# Dataset EE -> robot joints # Get robot observation
joint_action = robot_ee_to_joints_processor((ee_action, robot_obs)) robot_obs = robot.get_observation()
# Send action to robot # Dataset EE -> robot joints
_ = robot.send_action(joint_action) joint_action = robot_ee_to_joints_processor((ee_action, robot_obs))
busy_wait(1.0 / dataset.fps - (time.perf_counter() - t0)) # Send action to robot
_ = robot.send_action(joint_action)
# Clean up precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
robot.disconnect()
# Clean up
robot.disconnect()
if __name__ == "__main__":
main()
+74 -68
View File
@@ -32,90 +32,96 @@ from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.teleoperators.so100_leader.config_so100_leader import SO100LeaderConfig from lerobot.teleoperators.so100_leader.config_so100_leader import SO100LeaderConfig
from lerobot.teleoperators.so100_leader.so100_leader import SO100Leader from lerobot.teleoperators.so100_leader.so100_leader import SO100Leader
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
FPS = 30 FPS = 30
# Initialize the robot and teleoperator config
follower_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
leader_config = SO100LeaderConfig(port="/dev/tty.usbmodem5A460819811", id="my_awesome_leader_arm")
# Initialize the robot and teleoperator def main():
follower = SO100Follower(follower_config) # Initialize the robot and teleoperator config
leader = SO100Leader(leader_config) follower_config = SO100FollowerConfig(
port="/dev/tty.usbmodem5A460814411", id="my_awesome_follower_arm", use_degrees=True
)
leader_config = SO100LeaderConfig(port="/dev/tty.usbmodem5A460819811", id="my_awesome_leader_arm")
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # Initialize the robot and teleoperator
follower_kinematics_solver = RobotKinematics( follower = SO100Follower(follower_config)
urdf_path="./SO101/so101_new_calib.urdf", leader = SO100Leader(leader_config)
target_frame_name="gripper_frame_link",
joint_names=list(follower.bus.motors.keys()),
)
# NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_kinematics_solver = RobotKinematics( follower_kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf", urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link", target_frame_name="gripper_frame_link",
joint_names=list(leader.bus.motors.keys()), joint_names=list(follower.bus.motors.keys()),
) )
# Build pipeline to convert teleop joints to EE action # NOTE: It is highly recommended to use the urdf in the SO-ARM100 repo: https://github.com/TheRobotStudio/SO-ARM100/blob/main/Simulation/SO101/so101_new_calib.urdf
leader_to_ee = RobotProcessorPipeline[RobotAction, RobotAction]( leader_kinematics_solver = RobotKinematics(
steps=[ urdf_path="./SO101/so101_new_calib.urdf",
ForwardKinematicsJointsToEE( target_frame_name="gripper_frame_link",
kinematics=leader_kinematics_solver, motor_names=list(leader.bus.motors.keys()) joint_names=list(leader.bus.motors.keys()),
), )
],
to_transition=robot_action_to_transition,
to_output=transition_to_robot_action,
)
# build pipeline to convert EE action to robot joints # Build pipeline to convert teleop joints to EE action
ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( leader_to_ee = RobotProcessorPipeline[RobotAction, RobotAction](
[ steps=[
EEBoundsAndSafety( ForwardKinematicsJointsToEE(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]}, kinematics=leader_kinematics_solver, motor_names=list(leader.bus.motors.keys())
max_ee_step_m=0.10, ),
), ],
InverseKinematicsEEToJoints( to_transition=robot_action_to_transition,
kinematics=follower_kinematics_solver, to_output=transition_to_robot_action,
motor_names=list(follower.bus.motors.keys()), )
initial_guess_current_joints=False,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Connect to the robot and teleoperator # build pipeline to convert EE action to robot joints
follower.connect() ee_to_follower_joints = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
leader.connect() [
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
),
InverseKinematicsEEToJoints(
kinematics=follower_kinematics_solver,
motor_names=list(follower.bus.motors.keys()),
initial_guess_current_joints=False,
),
],
to_transition=robot_action_observation_to_transition,
to_output=transition_to_robot_action,
)
# Init rerun viewer # Connect to the robot and teleoperator
init_rerun(session_name="so100_so100_EE_teleop") follower.connect()
leader.connect()
print("Starting teleop loop...") # Init rerun viewer
while True: init_rerun(session_name="so100_so100_EE_teleop")
t0 = time.perf_counter()
# Get robot observation print("Starting teleop loop...")
robot_obs = follower.get_observation() while True:
t0 = time.perf_counter()
# Get teleop observation # Get robot observation
leader_joints_obs = leader.get_action() robot_obs = follower.get_observation()
# teleop joints -> teleop EE action # Get teleop observation
leader_ee_act = leader_to_ee(leader_joints_obs) leader_joints_obs = leader.get_action()
# teleop EE -> robot joints # teleop joints -> teleop EE action
follower_joints_act = ee_to_follower_joints((leader_ee_act, robot_obs)) leader_ee_act = leader_to_ee(leader_joints_obs)
# Send action to robot # teleop EE -> robot joints
_ = follower.send_action(follower_joints_act) follower_joints_act = ee_to_follower_joints((leader_ee_act, robot_obs))
# Visualize # Send action to robot
log_rerun_data(observation=leader_ee_act, action=follower_joints_act) _ = follower.send_action(follower_joints_act)
busy_wait(max(1.0 / FPS - (time.perf_counter() - t0), 0.0)) # Visualize
log_rerun_data(observation=leader_ee_act, action=follower_joints_act)
precise_sleep(max(1.0 / FPS - (time.perf_counter() - t0), 0.0))
if __name__ == "__main__":
main()
+68 -62
View File
@@ -19,80 +19,86 @@ def make_delta_timestamps(delta_indices: list[int] | None, fps: int) -> list[flo
return [i / fps for i in delta_indices] return [i / fps for i in delta_indices]
output_directory = Path("outputs/robot_learning_tutorial/act") def main():
output_directory.mkdir(parents=True, exist_ok=True) output_directory = Path("outputs/robot_learning_tutorial/act")
output_directory.mkdir(parents=True, exist_ok=True)
# Select your device # Select your device
device = torch.device("mps") # or "cuda" or "cpu" device = torch.device("mps") # or "cuda" or "cpu"
dataset_id = "lerobot/svla_so101_pickplace" dataset_id = "lerobot/svla_so101_pickplace"
# This specifies the inputs the model will be expecting and the outputs it will produce # This specifies the inputs the model will be expecting and the outputs it will produce
dataset_metadata = LeRobotDatasetMetadata(dataset_id) dataset_metadata = LeRobotDatasetMetadata(dataset_id)
features = dataset_to_policy_features(dataset_metadata.features) features = dataset_to_policy_features(dataset_metadata.features)
output_features = {key: ft for key, ft in features.items() if ft.type is FeatureType.ACTION} output_features = {key: ft for key, ft in features.items() if ft.type is FeatureType.ACTION}
input_features = {key: ft for key, ft in features.items() if key not in output_features} input_features = {key: ft for key, ft in features.items() if key not in output_features}
cfg = ACTConfig(input_features=input_features, output_features=output_features) cfg = ACTConfig(input_features=input_features, output_features=output_features)
policy = ACTPolicy(cfg) policy = ACTPolicy(cfg)
preprocessor, postprocessor = make_pre_post_processors(cfg, dataset_stats=dataset_metadata.stats) preprocessor, postprocessor = make_pre_post_processors(cfg, dataset_stats=dataset_metadata.stats)
policy.train() policy.train()
policy.to(device) policy.to(device)
# To perform action chunking, ACT expects a given number of actions as targets # To perform action chunking, ACT expects a given number of actions as targets
delta_timestamps = { delta_timestamps = {
"action": make_delta_timestamps(cfg.action_delta_indices, dataset_metadata.fps), "action": make_delta_timestamps(cfg.action_delta_indices, dataset_metadata.fps),
} }
# add image features if they are present # add image features if they are present
delta_timestamps |= { delta_timestamps |= {
k: make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps) for k in cfg.image_features k: make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps)
} for k in cfg.image_features
}
# Instantiate the dataset # Instantiate the dataset
dataset = LeRobotDataset(dataset_id, delta_timestamps=delta_timestamps) dataset = LeRobotDataset(dataset_id, delta_timestamps=delta_timestamps)
# Create the optimizer and dataloader for offline training # Create the optimizer and dataloader for offline training
optimizer = cfg.get_optimizer_preset().build(policy.parameters()) optimizer = cfg.get_optimizer_preset().build(policy.parameters())
batch_size = 32 batch_size = 32
dataloader = torch.utils.data.DataLoader( dataloader = torch.utils.data.DataLoader(
dataset, dataset,
batch_size=batch_size, batch_size=batch_size,
shuffle=True, shuffle=True,
pin_memory=device.type != "cpu", pin_memory=device.type != "cpu",
drop_last=True, drop_last=True,
) )
# Number of training steps and logging frequency # Number of training steps and logging frequency
training_steps = 1 training_steps = 1
log_freq = 1 log_freq = 1
# Run training loop # Run training loop
step = 0 step = 0
done = False done = False
while not done: while not done:
for batch in dataloader: for batch in dataloader:
batch = preprocessor(batch) batch = preprocessor(batch)
loss, _ = policy.forward(batch) loss, _ = policy.forward(batch)
loss.backward() loss.backward()
optimizer.step() optimizer.step()
optimizer.zero_grad() optimizer.zero_grad()
if step % log_freq == 0: if step % log_freq == 0:
print(f"step: {step} loss: {loss.item():.3f}") print(f"step: {step} loss: {loss.item():.3f}")
step += 1 step += 1
if step >= training_steps: if step >= training_steps:
done = True done = True
break break
# Save the policy checkpoint, alongside the pre/post processors # Save the policy checkpoint, alongside the pre/post processors
policy.save_pretrained(output_directory) policy.save_pretrained(output_directory)
preprocessor.save_pretrained(output_directory) preprocessor.save_pretrained(output_directory)
postprocessor.save_pretrained(output_directory) postprocessor.save_pretrained(output_directory)
# Save all assets to the Hub # Save all assets to the Hub
policy.push_to_hub("fracapuano/robot_learning_tutorial_act") policy.push_to_hub("<user>/robot_learning_tutorial_act")
preprocessor.push_to_hub("fracapuano/robot_learning_tutorial_act") preprocessor.push_to_hub("<user>/robot_learning_tutorial_act")
postprocessor.push_to_hub("fracapuano/robot_learning_tutorial_act") postprocessor.push_to_hub("<user>/robot_learning_tutorial_act")
if __name__ == "__main__":
main()
+43 -37
View File
@@ -8,50 +8,56 @@ from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "fracapuano/robot_learning_tutorial_act"
model = ACTPolicy.from_pretrained(model_id)
dataset_id = "lerobot/svla_so101_pickplace"
# This only downloads the metadata for the dataset, ~10s of MB even for large-scale datasets
dataset_metadata = LeRobotDatasetMetadata(dataset_id)
preprocess, postprocess = make_pre_post_processors(model.config, dataset_stats=dataset_metadata.stats)
# # find ports using lerobot-find-port
follower_port = ... # something like "/dev/tty.usbmodem58760431631"
# # the robot ids are used the load the right calibration files
follower_id = ... # something like "follower_so100"
MAX_EPISODES = 5 MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20 MAX_STEPS_PER_EPISODE = 20
# Robot and environment configuration
# Camera keys must match the name and resolutions of the ones used for training!
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"side": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"up": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config) def main():
robot = SO100Follower(robot_cfg) device = torch.device("mps") # or "cuda" or "cpu"
robot.connect() model_id = "<user>/robot_learning_tutorial_act"
model = ACTPolicy.from_pretrained(model_id)
for _ in range(MAX_EPISODES): dataset_id = "lerobot/svla_so101_pickplace"
for _ in range(MAX_STEPS_PER_EPISODE): # This only downloads the metadata for the dataset, ~10s of MB even for large-scale datasets
obs = robot.get_observation() dataset_metadata = LeRobotDatasetMetadata(dataset_id)
obs_frame = build_inference_frame( preprocess, postprocess = make_pre_post_processors(model.config, dataset_stats=dataset_metadata.stats)
observation=obs, ds_features=dataset_metadata.features, device=device
)
obs = preprocess(obs_frame) # # find ports using lerobot-find-port
follower_port = ... # something like "/dev/tty.usbmodem58760431631"
action = model.select_action(obs) # # the robot ids are used the load the right calibration files
action = postprocess(action) follower_id = ... # something like "follower_so100"
action = make_robot_action(action, dataset_metadata.features) # Robot and environment configuration
# Camera keys must match the name and resolutions of the ones used for training!
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"side": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"up": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
robot.send_action(action) robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config)
robot = SO100Follower(robot_cfg)
robot.connect()
print("Episode finished! Starting new episode...") for _ in range(MAX_EPISODES):
for _ in range(MAX_STEPS_PER_EPISODE):
obs = robot.get_observation()
obs_frame = build_inference_frame(
observation=obs, ds_features=dataset_metadata.features, device=device
)
obs = preprocess(obs_frame)
action = model.select_action(obs)
action = postprocess(action)
action = make_robot_action(action, dataset_metadata.features)
robot.send_action(action)
print("Episode finished! Starting new episode...")
if __name__ == "__main__":
main()
+13 -7
View File
@@ -1,11 +1,17 @@
from lerobot.async_inference.configs import PolicyServerConfig from lerobot.async_inference.configs import PolicyServerConfig
from lerobot.async_inference.policy_server import serve from lerobot.async_inference.policy_server import serve
host = ... # something like "127.0.0.1" if you're exposing to localhost
port = ... # something like 8080
config = PolicyServerConfig( def main():
host=host, host = ... # something like "127.0.0.1" if you're exposing to localhost
port=port, port = ... # something like 8080
)
serve(config) config = PolicyServerConfig(
host=host,
port=port,
)
serve(config)
if __name__ == "__main__":
main()
+44 -38
View File
@@ -6,50 +6,56 @@ from lerobot.async_inference.robot_client import RobotClient
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.robots.so100_follower import SO100FollowerConfig from lerobot.robots.so100_follower import SO100FollowerConfig
# these cameras must match the ones expected by the policy - find your cameras with lerobot-find-cameras
# check the config.json on the Hub for the policy you are using to see the expected camera specs
camera_cfg = {
"up": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"side": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
# # find ports using lerobot-find-port def main():
follower_port = ... # something like "/dev/tty.usbmodem58760431631" # these cameras must match the ones expected by the policy - find your cameras with lerobot-find-cameras
# check the config.json on the Hub for the policy you are using to see the expected camera specs
camera_cfg = {
"up": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"side": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
# # the robot ids are used the load the right calibration files # # find ports using lerobot-find-port
follower_id = ... # something like "follower_so100" follower_port = ... # something like "/dev/tty.usbmodem58760431631"
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_cfg) # # the robot ids are used the load the right calibration files
follower_id = ... # something like "follower_so100"
server_address = ... # something like "127.0.0.1:8080" if using localhost robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_cfg)
# 3. Create client configuration server_address = ... # something like "127.0.0.1:8080" if using localhost
client_cfg = RobotClientConfig(
robot=robot_cfg,
server_address=server_address,
policy_device="mps",
policy_type="act",
pretrained_name_or_path="fracapuano/robot_learning_tutorial_act",
chunk_size_threshold=0.5, # g
actions_per_chunk=50, # make sure this is less than the max actions of the policy
)
# 4. Create and start client # 3. Create client configuration
client = RobotClient(client_cfg) client_cfg = RobotClientConfig(
robot=robot_cfg,
server_address=server_address,
policy_device="mps",
policy_type="act",
pretrained_name_or_path="<user>/robot_learning_tutorial_act",
chunk_size_threshold=0.5, # g
actions_per_chunk=50, # make sure this is less than the max actions of the policy
)
# 5. Provide a textual description of the task # 4. Create and start client
task = ... client = RobotClient(client_cfg)
if client.start(): # 5. Provide a textual description of the task
# Start action receiver thread task = ...
action_receiver_thread = threading.Thread(target=client.receive_actions, daemon=True)
action_receiver_thread.start()
try: if client.start():
# Run the control loop # Start action receiver thread
client.control_loop(task) action_receiver_thread = threading.Thread(target=client.receive_actions, daemon=True)
except KeyboardInterrupt: action_receiver_thread.start()
client.stop()
action_receiver_thread.join() try:
# (Optionally) plot the action queue size # Run the control loop
visualize_action_queue_size(client.action_queue_size) client.control_loop(task)
except KeyboardInterrupt:
client.stop()
action_receiver_thread.join()
# (Optionally) plot the action queue size
visualize_action_queue_size(client.action_queue_size)
if __name__ == "__main__":
main()
@@ -19,81 +19,87 @@ def make_delta_timestamps(delta_indices: list[int] | None, fps: int) -> list[flo
return [i / fps for i in delta_indices] return [i / fps for i in delta_indices]
output_directory = Path("outputs/robot_learning_tutorial/diffusion") def main():
output_directory.mkdir(parents=True, exist_ok=True) output_directory = Path("outputs/robot_learning_tutorial/diffusion")
output_directory.mkdir(parents=True, exist_ok=True)
# Select your device # Select your device
device = torch.device("mps") # or "cuda" or "cpu" device = torch.device("mps") # or "cuda" or "cpu"
dataset_id = "lerobot/svla_so101_pickplace" dataset_id = "lerobot/svla_so101_pickplace"
# This specifies the inputs the model will be expecting and the outputs it will produce # This specifies the inputs the model will be expecting and the outputs it will produce
dataset_metadata = LeRobotDatasetMetadata(dataset_id) dataset_metadata = LeRobotDatasetMetadata(dataset_id)
features = dataset_to_policy_features(dataset_metadata.features) features = dataset_to_policy_features(dataset_metadata.features)
output_features = {key: ft for key, ft in features.items() if ft.type is FeatureType.ACTION} output_features = {key: ft for key, ft in features.items() if ft.type is FeatureType.ACTION}
input_features = {key: ft for key, ft in features.items() if key not in output_features} input_features = {key: ft for key, ft in features.items() if key not in output_features}
cfg = DiffusionConfig(input_features=input_features, output_features=output_features) cfg = DiffusionConfig(input_features=input_features, output_features=output_features)
policy = DiffusionPolicy(cfg) policy = DiffusionPolicy(cfg)
preprocessor, postprocessor = make_pre_post_processors(cfg, dataset_stats=dataset_metadata.stats) preprocessor, postprocessor = make_pre_post_processors(cfg, dataset_stats=dataset_metadata.stats)
policy.train() policy.train()
policy.to(device) policy.to(device)
# To perform action chunking, ACT expects a given number of actions as targets # To perform action chunking, ACT expects a given number of actions as targets
delta_timestamps = { delta_timestamps = {
"observation.state": make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps), "observation.state": make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps),
"action": make_delta_timestamps(cfg.action_delta_indices, dataset_metadata.fps), "action": make_delta_timestamps(cfg.action_delta_indices, dataset_metadata.fps),
} }
# add image features if they are present # add image features if they are present
delta_timestamps |= { delta_timestamps |= {
k: make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps) for k in cfg.image_features k: make_delta_timestamps(cfg.observation_delta_indices, dataset_metadata.fps)
} for k in cfg.image_features
}
# Instantiate the dataset # Instantiate the dataset
dataset = LeRobotDataset(dataset_id, delta_timestamps=delta_timestamps) dataset = LeRobotDataset(dataset_id, delta_timestamps=delta_timestamps)
# Create the optimizer and dataloader for offline training # Create the optimizer and dataloader for offline training
optimizer = cfg.get_optimizer_preset().build(policy.parameters()) optimizer = cfg.get_optimizer_preset().build(policy.parameters())
batch_size = 32 batch_size = 32
dataloader = torch.utils.data.DataLoader( dataloader = torch.utils.data.DataLoader(
dataset, dataset,
batch_size=batch_size, batch_size=batch_size,
shuffle=True, shuffle=True,
pin_memory=device.type != "cpu", pin_memory=device.type != "cpu",
drop_last=True, drop_last=True,
) )
# Number of training steps and logging frequency # Number of training steps and logging frequency
training_steps = 1 training_steps = 1
log_freq = 1 log_freq = 1
# Run training loop # Run training loop
step = 0 step = 0
done = False done = False
while not done: while not done:
for batch in dataloader: for batch in dataloader:
batch = preprocessor(batch) batch = preprocessor(batch)
loss, _ = policy.forward(batch) loss, _ = policy.forward(batch)
loss.backward() loss.backward()
optimizer.step() optimizer.step()
optimizer.zero_grad() optimizer.zero_grad()
if step % log_freq == 0: if step % log_freq == 0:
print(f"step: {step} loss: {loss.item():.3f}") print(f"step: {step} loss: {loss.item():.3f}")
step += 1 step += 1
if step >= training_steps: if step >= training_steps:
done = True done = True
break break
# Save the policy checkpoint, alongside the pre/post processors # Save the policy checkpoint, alongside the pre/post processors
policy.save_pretrained(output_directory) policy.save_pretrained(output_directory)
preprocessor.save_pretrained(output_directory) preprocessor.save_pretrained(output_directory)
postprocessor.save_pretrained(output_directory) postprocessor.save_pretrained(output_directory)
# Save all assets to the Hub # Save all assets to the Hub
policy.push_to_hub("fracapuano/robot_learning_tutorial_diffusion") policy.push_to_hub("<user>/robot_learning_tutorial_diffusion")
preprocessor.push_to_hub("fracapuano/robot_learning_tutorial_diffusion") preprocessor.push_to_hub("<user>/robot_learning_tutorial_diffusion")
postprocessor.push_to_hub("fracapuano/robot_learning_tutorial_diffusion") postprocessor.push_to_hub("<user>/robot_learning_tutorial_diffusion")
if __name__ == "__main__":
main()
@@ -8,53 +8,57 @@ from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower from lerobot.robots.so100_follower.so100_follower import SO100Follower
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "fracapuano/robot_learning_tutorial_diffusion"
model = DiffusionPolicy.from_pretrained(model_id)
dataset_id = "lerobot/svla_so101_pickplace"
# This only downloads the metadata for the dataset, ~10s of MB even for large-scale datasets
dataset_metadata = LeRobotDatasetMetadata(dataset_id)
preprocess, postprocess = make_pre_post_processors(
model.config, model_id, dataset_stats=dataset_metadata.stats
)
MAX_EPISODES = 5 MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20 MAX_STEPS_PER_EPISODE = 20
# # find ports using lerobot-find-port def main():
follower_port = ... # something like "/dev/tty.usbmodem58760431631" device = torch.device("mps") # or "cuda" or "cpu"
model_id = "<user>/robot_learning_tutorial_diffusion"
# # the robot ids are used the load the right calibration files model = DiffusionPolicy.from_pretrained(model_id)
follower_id = ... # something like "follower_so100"
# Robot and environment configuration dataset_id = "lerobot/svla_so101_pickplace"
# Camera keys must match the name and resolutions of the ones used for training! # This only downloads the metadata for the dataset, ~10s of MB even for large-scale datasets
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub dataset_metadata = LeRobotDatasetMetadata(dataset_id)
camera_config = { preprocess, postprocess = make_pre_post_processors(
"side": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30), model.config, model_id, dataset_stats=dataset_metadata.stats
"up": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30), )
}
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config) # # find ports using lerobot-find-port
robot = SO100Follower(robot_cfg) follower_port = ... # something like "/dev/tty.usbmodem58760431631"
robot.connect()
# # the robot ids are used the load the right calibration files
follower_id = ... # something like "follower_so100"
# Robot and environment configuration
# Camera keys must match the name and resolutions of the ones used for training!
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"side": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"up": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config)
robot = SO100Follower(robot_cfg)
robot.connect()
for _ in range(MAX_EPISODES):
for _ in range(MAX_STEPS_PER_EPISODE):
obs = robot.get_observation()
obs_frame = build_inference_frame(
observation=obs, ds_features=dataset_metadata.features, device=device
)
obs = preprocess(obs_frame)
action = model.select_action(obs)
action = postprocess(action)
action = make_robot_action(action, dataset_metadata.features)
robot.send_action(action)
print("Episode finished! Starting new episode...")
for _ in range(MAX_EPISODES): if __name__ == "__main__":
for _ in range(MAX_STEPS_PER_EPISODE): main()
obs = robot.get_observation()
obs_frame = build_inference_frame(
observation=obs, ds_features=dataset_metadata.features, device=device
)
obs = preprocess(obs_frame)
action = model.select_action(obs)
action = postprocess(action)
action = make_robot_action(action, dataset_metadata.features)
robot.send_action(action)
print("Episode finished! Starting new episode...")
+48 -42
View File
@@ -11,57 +11,63 @@ from lerobot.robots.so100_follower.so100_follower import SO100Follower
MAX_EPISODES = 5 MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20 MAX_STEPS_PER_EPISODE = 20
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "lerobot/pi0_base"
model = PI0Policy.from_pretrained(model_id) def main():
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "lerobot/pi0_base"
preprocess, postprocess = make_pre_post_processors( model = PI0Policy.from_pretrained(model_id)
model.config,
model_id,
# This overrides allows to run on MPS, otherwise defaults to CUDA (if available)
preprocessor_overrides={"device_processor": {"device": str(device)}},
)
# find ports using lerobot-find-port preprocess, postprocess = make_pre_post_processors(
follower_port = ... # something like "/dev/tty.usbmodem58760431631" model.config,
model_id,
# This overrides allows to run on MPS, otherwise defaults to CUDA (if available)
preprocessor_overrides={"device_processor": {"device": str(device)}},
)
# the robot ids are used the load the right calibration files # find ports using lerobot-find-port
follower_id = ... # something like "follower_so100" follower_port = ... # something like "/dev/tty.usbmodem58760431631"
# Robot and environment configuration # the robot ids are used the load the right calibration files
# Camera keys must match the name and resolutions of the ones used for training! follower_id = ... # something like "follower_so100"
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"base_0_rgb": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"left_wrist_0_rgb": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
"right_wrist_0_rgb": OpenCVCameraConfig(index_or_path=2, width=640, height=480, fps=30),
}
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config) # Robot and environment configuration
robot = SO100Follower(robot_cfg) # Camera keys must match the name and resolutions of the ones used for training!
robot.connect() # You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"base_0_rgb": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"left_wrist_0_rgb": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
"right_wrist_0_rgb": OpenCVCameraConfig(index_or_path=2, width=640, height=480, fps=30),
}
task = "" # something like "pick the red block" robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config)
robot_type = "" # something like "so100_follower" for multi-embodiment datasets robot = SO100Follower(robot_cfg)
robot.connect()
# This is used to match the raw observation keys to the keys expected by the policy task = "" # something like "pick the red block"
action_features = hw_to_dataset_features(robot.action_features, "action") robot_type = "" # something like "so100_follower" for multi-embodiment datasets
obs_features = hw_to_dataset_features(robot.observation_features, "observation")
dataset_features = {**action_features, **obs_features}
for _ in range(MAX_EPISODES): # This is used to match the raw observation keys to the keys expected by the policy
for _ in range(MAX_STEPS_PER_EPISODE): action_features = hw_to_dataset_features(robot.action_features, "action")
obs = robot.get_observation() obs_features = hw_to_dataset_features(robot.observation_features, "observation")
obs_frame = build_inference_frame( dataset_features = {**action_features, **obs_features}
observation=obs, ds_features=dataset_features, device=device, task=task, robot_type=robot_type
)
obs = preprocess(obs_frame) for _ in range(MAX_EPISODES):
for _ in range(MAX_STEPS_PER_EPISODE):
obs = robot.get_observation()
obs_frame = build_inference_frame(
observation=obs, ds_features=dataset_features, device=device, task=task, robot_type=robot_type
)
action = model.select_action(obs) obs = preprocess(obs_frame)
action = postprocess(action)
action = make_robot_action(action, dataset_features)
robot.send_action(action)
print("Episode finished! Starting new episode...") action = model.select_action(obs)
action = postprocess(action)
action = make_robot_action(action, dataset_features)
robot.send_action(action)
print("Episode finished! Starting new episode...")
if __name__ == "__main__":
main()
+105 -103
View File
@@ -20,6 +20,8 @@ from lerobot.teleoperators.utils import TeleopEvents
LOG_EVERY = 10 LOG_EVERY = 10
SEND_EVERY = 10 SEND_EVERY = 10
MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20
def run_learner( def run_learner(
@@ -223,123 +225,123 @@ def make_policy_obs(obs, device: torch.device = "cpu"):
} }
"""Main function - coordinates actor and learner processes.""" def main():
"""Main function - coordinates actor and learner processes."""
device = "mps" # or "cuda" or "cpu" device = "mps" # or "cuda" or "cpu"
output_directory = Path("outputs/robot_learning_tutorial/hil_serl") output_directory = Path("outputs/robot_learning_tutorial/hil_serl")
output_directory.mkdir(parents=True, exist_ok=True) output_directory.mkdir(parents=True, exist_ok=True)
# find ports using lerobot-find-port # find ports using lerobot-find-port
follower_port = ... follower_port = ...
leader_port = ... leader_port = ...
# the robot ids are used the load the right calibration files # the robot ids are used the load the right calibration files
follower_id = ... follower_id = ...
leader_id = ... leader_id = ...
# A pretrained model (to be used in-distribution!) # A pretrained model (to be used in-distribution!)
reward_classifier_id = "fracapuano/reward_classifier_hil_serl_example" reward_classifier_id = "<user>/reward_classifier_hil_serl_example"
reward_classifier = Classifier.from_pretrained(reward_classifier_id) reward_classifier = Classifier.from_pretrained(reward_classifier_id)
reward_classifier.to(device) reward_classifier.to(device)
reward_classifier.eval() reward_classifier.eval()
MAX_EPISODES = 5 # Robot and environment configuration
MAX_STEPS_PER_EPISODE = 20 robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id)
teleop_cfg = SO100LeaderConfig(port=leader_port, id=leader_id)
processor_cfg = HILSerlProcessorConfig(control_mode="leader")
# Robot and environment configuration env_cfg = HILSerlRobotEnvConfig(robot=robot_cfg, teleop=teleop_cfg, processor=processor_cfg)
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id)
teleop_cfg = SO100LeaderConfig(port=leader_port, id=leader_id)
processor_cfg = HILSerlProcessorConfig(control_mode="leader")
env_cfg = HILSerlRobotEnvConfig(robot=robot_cfg, teleop=teleop_cfg, processor=processor_cfg) # Create robot environment
env, teleop_device = make_robot_env(env_cfg)
# Create robot environment obs_features = hw_to_dataset_features(env.robot.observation_features, "observation")
env, teleop_device = make_robot_env(env_cfg) action_features = hw_to_dataset_features(env.robot.action_features, "action")
obs_features = hw_to_dataset_features(env.robot.observation_features, "observation") # Create SAC policy for action selection
action_features = hw_to_dataset_features(env.robot.action_features, "action") policy_cfg = SACConfig(
device=device,
input_features=obs_features,
output_features=action_features,
)
# Create SAC policy for action selection policy_actor = SACPolicy(policy_cfg)
policy_cfg = SACConfig( policy_learner = SACPolicy(policy_cfg)
device=device,
input_features=obs_features,
output_features=action_features,
)
policy_actor = SACPolicy(policy_cfg) demonstrations_repo_id = "lerobot/example_hil_serl_dataset"
policy_learner = SACPolicy(policy_cfg) offline_dataset = LeRobotDataset(repo_id=demonstrations_repo_id)
demonstrations_repo_id = "lerobot/example_hil_serl_dataset" # Online buffer: initialized from scratch
offline_dataset = LeRobotDataset(repo_id=demonstrations_repo_id) online_replay_buffer = ReplayBuffer(device=device, state_keys=list(obs_features.keys()))
# Offline buffer: Created from dataset (pre-populated it with demonstrations)
offline_replay_buffer = ReplayBuffer.from_lerobot_dataset(
lerobot_dataset=offline_dataset, device=device, state_keys=list(obs_features.keys())
)
# Online buffer: initialized from scratch # Create communication channels between learner and actor processes
online_replay_buffer = ReplayBuffer(device=device, state_keys=list(obs_features.keys())) transitions_queue = mp.Queue(maxsize=10)
# Offline buffer: Created from dataset (pre-populated it with demonstrations) parameters_queue = mp.Queue(maxsize=2)
offline_replay_buffer = ReplayBuffer.from_lerobot_dataset( shutdown_event = mp.Event()
lerobot_dataset=offline_dataset, device=device, state_keys=list(obs_features.keys())
)
# Create communication channels between learner and actor processes # Signal handler for graceful shutdown
transitions_queue = mp.Queue(maxsize=10) def signal_handler(sig):
parameters_queue = mp.Queue(maxsize=2) print(f"\nSignal {sig} received, shutting down...")
shutdown_event = mp.Event() shutdown_event.set()
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
# Create processes
learner_process = mp.Process(
target=run_learner,
args=(
transitions_queue,
parameters_queue,
shutdown_event,
policy_learner,
online_replay_buffer,
offline_replay_buffer,
),
kwargs={"device": device}, # can run on accelerated hardware for training
)
actor_process = mp.Process(
target=run_actor,
args=(
transitions_queue,
parameters_queue,
shutdown_event,
policy_actor,
reward_classifier,
env_cfg,
output_directory,
),
kwargs={"device": "cpu"}, # actor is frozen, can run on CPU or accelerate for inference
)
learner_process.start()
actor_process.start()
try:
# Wait for actor to finish (it controls the episode loop)
actor_process.join()
shutdown_event.set()
learner_process.join(timeout=10)
except KeyboardInterrupt:
print("Main process interrupted")
shutdown_event.set()
actor_process.join(timeout=5)
learner_process.join(timeout=10)
finally:
if learner_process.is_alive():
learner_process.terminate()
if actor_process.is_alive():
actor_process.terminate()
# Signal handler for graceful shutdown if __name__ == "__main__":
def signal_handler(sig): main()
print(f"\nSignal {sig} received, shutting down...")
shutdown_event.set()
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
# Create processes
learner_process = mp.Process(
target=run_learner,
args=(
transitions_queue,
parameters_queue,
shutdown_event,
policy_learner,
online_replay_buffer,
offline_replay_buffer,
),
kwargs={"device": device}, # can run on accelerated hardware for training
)
actor_process = mp.Process(
target=run_actor,
args=(
transitions_queue,
parameters_queue,
shutdown_event,
policy_actor,
reward_classifier,
env_cfg,
output_directory,
),
kwargs={"device": "cpu"}, # actor is frozen, can run on CPU or accelerate for inference
)
learner_process.start()
actor_process.start()
try:
# Wait for actor to finish (it controls the episode loop)
actor_process.join()
shutdown_event.set()
learner_process.join(timeout=10)
except KeyboardInterrupt:
print("Main process interrupted")
shutdown_event.set()
actor_process.join(timeout=5)
learner_process.join(timeout=10)
finally:
if learner_process.is_alive():
learner_process.terminate()
if actor_process.is_alive():
actor_process.terminate()
@@ -4,59 +4,64 @@ from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.policies.factory import make_policy, make_pre_post_processors from lerobot.policies.factory import make_policy, make_pre_post_processors
from lerobot.policies.sac.reward_model.configuration_classifier import RewardClassifierConfig from lerobot.policies.sac.reward_model.configuration_classifier import RewardClassifierConfig
# Device to use for training
device = "mps" # or "cuda", or "cpu"
# Load the dataset used for training def main():
repo_id = "lerobot/example_hil_serl_dataset" # Device to use for training
dataset = LeRobotDataset(repo_id) device = "mps" # or "cuda", or "cpu"
# Configure the policy to extract features from the image frames # Load the dataset used for training
camera_keys = dataset.meta.camera_keys repo_id = "lerobot/example_hil_serl_dataset"
dataset = LeRobotDataset(repo_id)
config = RewardClassifierConfig( # Configure the policy to extract features from the image frames
num_cameras=len(camera_keys), camera_keys = dataset.meta.camera_keys
device=device,
# backbone model to extract features from the image frames
model_name="microsoft/resnet-18",
)
# Make policy, preprocessor, and optimizer config = RewardClassifierConfig(
policy = make_policy(config, ds_meta=dataset.meta) num_cameras=len(camera_keys),
optimizer = config.get_optimizer_preset().build(policy.parameters()) device=device,
preprocessor, _ = make_pre_post_processors(policy_cfg=config, dataset_stats=dataset.meta.stats) # backbone model to extract features from the image frames
model_name="microsoft/resnet-18",
)
# Make policy, preprocessor, and optimizer
policy = make_policy(config, ds_meta=dataset.meta)
optimizer = config.get_optimizer_preset().build(policy.parameters())
preprocessor, _ = make_pre_post_processors(policy_cfg=config, dataset_stats=dataset.meta.stats)
classifier_id = "<user>/reward_classifier_hil_serl_example"
# Instantiate a dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
# Training loop
num_epochs = 5
for epoch in range(num_epochs):
total_loss = 0
total_accuracy = 0
for batch in dataloader:
# Preprocess the batch and move it to the correct device.
batch = preprocessor(batch)
# Forward pass
loss, output_dict = policy.forward(batch)
# Backward pass and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()
total_loss += loss.item()
total_accuracy += output_dict["accuracy"]
avg_loss = total_loss / len(dataloader)
avg_accuracy = total_accuracy / len(dataloader)
print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}, Accuracy: {avg_accuracy:.2f}%")
print("Training finished!")
# You can now save the trained policy.
policy.push_to_hub(classifier_id)
classifier_id = "fracapuano/reward_classifier_hil_serl_example" if __name__ == "__main__":
main()
# Instantiate a dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
# Training loop
num_epochs = 5
for epoch in range(num_epochs):
total_loss = 0
total_accuracy = 0
for batch in dataloader:
# Preprocess the batch and move it to the correct device.
batch = preprocessor(batch)
# Forward pass
loss, output_dict = policy.forward(batch)
# Backward pass and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()
total_loss += loss.item()
total_accuracy += output_dict["accuracy"]
avg_loss = total_loss / len(dataloader)
avg_accuracy = total_accuracy / len(dataloader)
print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}, Accuracy: {avg_accuracy:.2f}%")
print("Training finished!")
# You can now save the trained policy.
policy.push_to_hub(classifier_id)
@@ -11,56 +11,62 @@ from lerobot.robots.so100_follower.so100_follower import SO100Follower
MAX_EPISODES = 5 MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20 MAX_STEPS_PER_EPISODE = 20
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "lerobot/smolvla_base"
model = SmolVLAPolicy.from_pretrained(model_id) def main():
device = torch.device("mps") # or "cuda" or "cpu"
model_id = "lerobot/smolvla_base"
preprocess, postprocess = make_pre_post_processors( model = SmolVLAPolicy.from_pretrained(model_id)
model.config,
model_id,
# This overrides allows to run on MPS, otherwise defaults to CUDA (if available)
preprocessor_overrides={"device_processor": {"device": str(device)}},
)
# find ports using lerobot-find-port preprocess, postprocess = make_pre_post_processors(
follower_port = ... # something like "/dev/tty.usbmodem58760431631" model.config,
model_id,
# This overrides allows to run on MPS, otherwise defaults to CUDA (if available)
preprocessor_overrides={"device_processor": {"device": str(device)}},
)
# the robot ids are used the load the right calibration files # find ports using lerobot-find-port
follower_id = ... # something like "follower_so100" follower_port = ... # something like "/dev/tty.usbmodem58760431631"
# Robot and environment configuration # the robot ids are used the load the right calibration files
# Camera keys must match the name and resolutions of the ones used for training! follower_id = ... # something like "follower_so100"
# You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"camera1": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"camera2": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config) # Robot and environment configuration
robot = SO100Follower(robot_cfg) # Camera keys must match the name and resolutions of the ones used for training!
robot.connect() # You can check the camera keys expected by a model in the info.json card on the model card on the Hub
camera_config = {
"camera1": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=30),
"camera2": OpenCVCameraConfig(index_or_path=1, width=640, height=480, fps=30),
}
task = "" # something like "pick the red block" robot_cfg = SO100FollowerConfig(port=follower_port, id=follower_id, cameras=camera_config)
robot_type = "" # something like "so100_follower" for multi-embodiment datasets robot = SO100Follower(robot_cfg)
robot.connect()
# This is used to match the raw observation keys to the keys expected by the policy task = "" # something like "pick the red block"
action_features = hw_to_dataset_features(robot.action_features, "action") robot_type = "" # something like "so100_follower" for multi-embodiment datasets
obs_features = hw_to_dataset_features(robot.observation_features, "observation")
dataset_features = {**action_features, **obs_features}
for _ in range(MAX_EPISODES): # This is used to match the raw observation keys to the keys expected by the policy
for _ in range(MAX_STEPS_PER_EPISODE): action_features = hw_to_dataset_features(robot.action_features, "action")
obs = robot.get_observation() obs_features = hw_to_dataset_features(robot.observation_features, "observation")
obs_frame = build_inference_frame( dataset_features = {**action_features, **obs_features}
observation=obs, ds_features=dataset_features, device=device, task=task, robot_type=robot_type
)
obs = preprocess(obs_frame) for _ in range(MAX_EPISODES):
for _ in range(MAX_STEPS_PER_EPISODE):
obs = robot.get_observation()
obs_frame = build_inference_frame(
observation=obs, ds_features=dataset_features, device=device, task=task, robot_type=robot_type
)
action = model.select_action(obs) obs = preprocess(obs_frame)
action = postprocess(action)
action = make_robot_action(action, dataset_features)
robot.send_action(action)
print("Episode finished! Starting new episode...") action = model.select_action(obs)
action = postprocess(action)
action = make_robot_action(action, dataset_features)
robot.send_action(action)
print("Episode finished! Starting new episode...")
if __name__ == "__main__":
main()
+6 -4
View File
@@ -110,8 +110,8 @@ def worker_thread_loop(queue: queue.Queue):
if item is None: if item is None:
queue.task_done() queue.task_done()
break break
image_array, fpath = item image_array, fpath, compress_level = item
write_image(image_array, fpath) write_image(image_array, fpath, compress_level)
queue.task_done() queue.task_done()
@@ -169,11 +169,13 @@ class AsyncImageWriter:
p.start() p.start()
self.processes.append(p) self.processes.append(p)
def save_image(self, image: torch.Tensor | np.ndarray | PIL.Image.Image, fpath: Path): def save_image(
self, image: torch.Tensor | np.ndarray | PIL.Image.Image, fpath: Path, compress_level: int = 1
):
if isinstance(image, torch.Tensor): if isinstance(image, torch.Tensor):
# Convert tensor to numpy array to minimize main process time # Convert tensor to numpy array to minimize main process time
image = image.cpu().numpy() image = image.cpu().numpy()
self.queue.put((image, fpath)) self.queue.put((image, fpath, compress_level))
def wait_until_done(self): def wait_until_done(self):
self.queue.join() self.queue.join()
+95 -20
View File
@@ -13,6 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import concurrent.futures
import contextlib import contextlib
import logging import logging
import shutil import shutil
@@ -539,6 +540,15 @@ class LeRobotDatasetMetadata:
return obj return obj
def _encode_video_worker(video_key: str, episode_index: int, root: Path, fps: int) -> Path:
temp_path = Path(tempfile.mkdtemp(dir=root)) / f"{video_key}_{episode_index:03d}.mp4"
fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=episode_index, frame_index=0)
img_dir = (root / fpath).parent
encode_video_frames(img_dir, temp_path, fps, overwrite=True)
shutil.rmtree(img_dir)
return temp_path
class LeRobotDataset(torch.utils.data.Dataset): class LeRobotDataset(torch.utils.data.Dataset):
def __init__( def __init__(
self, self,
@@ -712,6 +722,15 @@ class LeRobotDataset(torch.utils.data.Dataset):
self.download(download_videos) self.download(download_videos)
self.hf_dataset = self.load_hf_dataset() self.hf_dataset = self.load_hf_dataset()
# Create mapping from absolute indices to relative indices when only a subset of the episodes are loaded
# Build a mapping: absolute_index -> relative_index_in_filtered_dataset
self._absolute_to_relative_idx = None
if self.episodes is not None:
self._absolute_to_relative_idx = {
abs_idx.item() if isinstance(abs_idx, torch.Tensor) else abs_idx: rel_idx
for rel_idx, abs_idx in enumerate(self.hf_dataset["index"])
}
# Setup delta_indices # Setup delta_indices
if self.delta_timestamps is not None: if self.delta_timestamps is not None:
check_delta_timestamps(self.delta_timestamps, self.fps, self.tolerance_s) check_delta_timestamps(self.delta_timestamps, self.fps, self.tolerance_s)
@@ -830,7 +849,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
def load_hf_dataset(self) -> datasets.Dataset: def load_hf_dataset(self) -> datasets.Dataset:
"""hf_dataset contains all the observations, states, actions, rewards, etc.""" """hf_dataset contains all the observations, states, actions, rewards, etc."""
features = get_hf_features_from_features(self.features) features = get_hf_features_from_features(self.features)
hf_dataset = load_nested_dataset(self.root / "data", features=features) hf_dataset = load_nested_dataset(self.root / "data", features=features, episodes=self.episodes)
hf_dataset.set_transform(hf_transform_to_torch) hf_dataset.set_transform(hf_transform_to_torch)
return hf_dataset return hf_dataset
@@ -847,10 +866,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
# Determine requested episodes # Determine requested episodes
if self.episodes is None: if self.episodes is None:
# Requesting all episodes - check if we have all episodes from metadata
requested_episodes = set(range(self.meta.total_episodes)) requested_episodes = set(range(self.meta.total_episodes))
else: else:
# Requesting specific episodes
requested_episodes = set(self.episodes) requested_episodes = set(self.episodes)
# Check if all requested episodes are available in cached data # Check if all requested episodes are available in cached data
@@ -932,7 +949,11 @@ class LeRobotDataset(torch.utils.data.Dataset):
query_timestamps = {} query_timestamps = {}
for key in self.meta.video_keys: for key in self.meta.video_keys:
if query_indices is not None and key in query_indices: if query_indices is not None and key in query_indices:
timestamps = self.hf_dataset[query_indices[key]]["timestamp"] if self._absolute_to_relative_idx is not None:
relative_indices = [self._absolute_to_relative_idx[idx] for idx in query_indices[key]]
timestamps = self.hf_dataset[relative_indices]["timestamp"]
else:
timestamps = self.hf_dataset[query_indices[key]]["timestamp"]
query_timestamps[key] = torch.stack(timestamps).tolist() query_timestamps[key] = torch.stack(timestamps).tolist()
else: else:
query_timestamps[key] = [current_ts] query_timestamps[key] = [current_ts]
@@ -955,10 +976,16 @@ class LeRobotDataset(torch.utils.data.Dataset):
for key, q_idx in query_indices.items(): for key, q_idx in query_indices.items():
if key in self.meta.video_keys: if key in self.meta.video_keys:
continue continue
# Map absolute indices to relative indices if needed
relative_indices = (
q_idx
if self._absolute_to_relative_idx is None
else [self._absolute_to_relative_idx[idx] for idx in q_idx]
)
try: try:
result[key] = torch.stack(self.hf_dataset[key][q_idx]) result[key] = torch.stack(self.hf_dataset[key][relative_indices])
except (KeyError, TypeError, IndexError): except (KeyError, TypeError, IndexError):
result[key] = torch.stack(self.hf_dataset[q_idx][key]) result[key] = torch.stack(self.hf_dataset[relative_indices][key])
return result return result
def _query_videos(self, query_timestamps: dict[str, list[float]], ep_idx: int) -> dict[str, torch.Tensor]: def _query_videos(self, query_timestamps: dict[str, list[float]], ep_idx: int) -> dict[str, torch.Tensor]:
@@ -1054,6 +1081,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
ep_buffer[key] = current_ep_idx if key == "episode_index" else [] ep_buffer[key] = current_ep_idx if key == "episode_index" else []
return ep_buffer return ep_buffer
# TODO(Steven): consider move this to utils
def _get_image_file_path(self, episode_index: int, image_key: str, frame_index: int) -> Path: def _get_image_file_path(self, episode_index: int, image_key: str, frame_index: int) -> Path:
fpath = DEFAULT_IMAGE_PATH.format( fpath = DEFAULT_IMAGE_PATH.format(
image_key=image_key, episode_index=episode_index, frame_index=frame_index image_key=image_key, episode_index=episode_index, frame_index=frame_index
@@ -1063,13 +1091,15 @@ class LeRobotDataset(torch.utils.data.Dataset):
def _get_image_file_dir(self, episode_index: int, image_key: str) -> Path: def _get_image_file_dir(self, episode_index: int, image_key: str) -> Path:
return self._get_image_file_path(episode_index, image_key, frame_index=0).parent return self._get_image_file_path(episode_index, image_key, frame_index=0).parent
def _save_image(self, image: torch.Tensor | np.ndarray | PIL.Image.Image, fpath: Path) -> None: def _save_image(
self, image: torch.Tensor | np.ndarray | PIL.Image.Image, fpath: Path, compress_level: int = 1
) -> None:
if self.image_writer is None: if self.image_writer is None:
if isinstance(image, torch.Tensor): if isinstance(image, torch.Tensor):
image = image.cpu().numpy() image = image.cpu().numpy()
write_image(image, fpath) write_image(image, fpath, compress_level=compress_level)
else: else:
self.image_writer.save_image(image=image, fpath=fpath) self.image_writer.save_image(image=image, fpath=fpath, compress_level=compress_level)
def add_frame(self, frame: dict) -> None: def add_frame(self, frame: dict) -> None:
""" """
@@ -1107,14 +1137,19 @@ class LeRobotDataset(torch.utils.data.Dataset):
) )
if frame_index == 0: if frame_index == 0:
img_path.parent.mkdir(parents=True, exist_ok=True) img_path.parent.mkdir(parents=True, exist_ok=True)
self._save_image(frame[key], img_path) compress_level = 1 if self.features[key]["dtype"] == "video" else 6
self._save_image(frame[key], img_path, compress_level)
self.episode_buffer[key].append(str(img_path)) self.episode_buffer[key].append(str(img_path))
else: else:
self.episode_buffer[key].append(frame[key]) self.episode_buffer[key].append(frame[key])
self.episode_buffer["size"] += 1 self.episode_buffer["size"] += 1
def save_episode(self, episode_data: dict | None = None) -> None: def save_episode(
self,
episode_data: dict | None = None,
parallel_encoding: bool = True,
) -> None:
""" """
This will save to disk the current episode in self.episode_buffer. This will save to disk the current episode in self.episode_buffer.
@@ -1126,6 +1161,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
episode_data (dict | None, optional): Dict containing the episode data to save. If None, this will episode_data (dict | None, optional): Dict containing the episode data to save. If None, this will
save the current episode in self.episode_buffer, which is filled with 'add_frame'. Defaults to save the current episode in self.episode_buffer, which is filled with 'add_frame'. Defaults to
None. None.
parallel_encoding (bool, optional): If True, encode videos in parallel using ProcessPoolExecutor.
Defaults to True on Linux, False on macOS as it tends to use all the CPU available already.
""" """
episode_buffer = episode_data if episode_data is not None else self.episode_buffer episode_buffer = episode_data if episode_data is not None else self.episode_buffer
@@ -1162,8 +1199,40 @@ class LeRobotDataset(torch.utils.data.Dataset):
use_batched_encoding = self.batch_encoding_size > 1 use_batched_encoding = self.batch_encoding_size > 1
if has_video_keys and not use_batched_encoding: if has_video_keys and not use_batched_encoding:
for video_key in self.meta.video_keys: num_cameras = len(self.meta.video_keys)
ep_metadata.update(self._save_episode_video(video_key, episode_index)) if parallel_encoding and num_cameras > 1:
# TODO(Steven): Ideally we would like to control the number of threads per encoding such that:
# num_cameras * num_threads = (total_cpu -1)
with concurrent.futures.ProcessPoolExecutor(max_workers=num_cameras) as executor:
future_to_key = {
executor.submit(
_encode_video_worker,
video_key,
episode_index,
self.root,
self.fps,
): video_key
for video_key in self.meta.video_keys
}
results = {}
for future in concurrent.futures.as_completed(future_to_key):
video_key = future_to_key[future]
try:
temp_path = future.result()
results[video_key] = temp_path
except Exception as exc:
logging.error(f"Video encoding failed for {video_key}: {exc}")
raise exc
for video_key in self.meta.video_keys:
temp_path = results[video_key]
ep_metadata.update(
self._save_episode_video(video_key, episode_index, temp_path=temp_path)
)
else:
for video_key in self.meta.video_keys:
ep_metadata.update(self._save_episode_video(video_key, episode_index))
# `meta.save_episode` need to be executed after encoding the videos # `meta.save_episode` need to be executed after encoding the videos
self.meta.save_episode(episode_index, episode_length, episode_tasks, ep_stats, ep_metadata) self.meta.save_episode(episode_index, episode_length, episode_tasks, ep_stats, ep_metadata)
@@ -1328,9 +1397,18 @@ class LeRobotDataset(torch.utils.data.Dataset):
return metadata return metadata
def _save_episode_video(self, video_key: str, episode_index: int) -> dict: def _save_episode_video(
self,
video_key: str,
episode_index: int,
temp_path: Path | None = None,
) -> dict:
# Encode episode frames into a temporary video # Encode episode frames into a temporary video
ep_path = self._encode_temporary_episode_video(video_key, episode_index) if temp_path is None:
ep_path = self._encode_temporary_episode_video(video_key, episode_index)
else:
ep_path = temp_path
ep_size_in_mb = get_file_size_in_mb(ep_path) ep_size_in_mb = get_file_size_in_mb(ep_path)
ep_duration_in_s = get_video_duration_in_s(ep_path) ep_duration_in_s = get_video_duration_in_s(ep_path)
@@ -1448,11 +1526,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
Note: `encode_video_frames` is a blocking call. Making it asynchronous shouldn't speedup encoding, Note: `encode_video_frames` is a blocking call. Making it asynchronous shouldn't speedup encoding,
since video encoding with ffmpeg is already using multithreading. since video encoding with ffmpeg is already using multithreading.
""" """
temp_path = Path(tempfile.mkdtemp(dir=self.root)) / f"{video_key}_{episode_index:03d}.mp4" return _encode_video_worker(video_key, episode_index, self.root, self.fps)
img_dir = self._get_image_file_dir(episode_index, video_key)
encode_video_frames(img_dir, temp_path, self.fps, overwrite=True)
shutil.rmtree(img_dir)
return temp_path
@classmethod @classmethod
def create( def create(
@@ -1498,6 +1572,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj.image_transforms = None obj.image_transforms = None
obj.delta_timestamps = None obj.delta_timestamps = None
obj.delta_indices = None obj.delta_indices = None
obj._absolute_to_relative_idx = None
obj.video_backend = video_backend if video_backend is not None else get_safe_default_codec() obj.video_backend = video_backend if video_backend is not None else get_safe_default_codec()
obj.writer = None obj.writer = None
obj.latest_episode = None obj.latest_episode = None
+19 -5
View File
@@ -28,6 +28,7 @@ import numpy as np
import packaging.version import packaging.version
import pandas import pandas
import pandas as pd import pandas as pd
import pyarrow.dataset as pa_ds
import pyarrow.parquet as pq import pyarrow.parquet as pq
import torch import torch
from datasets import Dataset from datasets import Dataset
@@ -48,7 +49,7 @@ from lerobot.utils.utils import SuppressProgressBars, is_valid_numpy_dtype_strin
DEFAULT_CHUNK_SIZE = 1000 # Max number of files per chunk DEFAULT_CHUNK_SIZE = 1000 # Max number of files per chunk
DEFAULT_DATA_FILE_SIZE_IN_MB = 100 # Max size per file DEFAULT_DATA_FILE_SIZE_IN_MB = 100 # Max size per file
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 500 # Max size per file DEFAULT_VIDEO_FILE_SIZE_IN_MB = 200 # Max size per file
INFO_PATH = "meta/info.json" INFO_PATH = "meta/info.json"
STATS_PATH = "meta/stats.json" STATS_PATH = "meta/stats.json"
@@ -103,7 +104,9 @@ def update_chunk_file_indices(chunk_idx: int, file_idx: int, chunks_size: int) -
return chunk_idx, file_idx return chunk_idx, file_idx
def load_nested_dataset(pq_dir: Path, features: datasets.Features | None = None) -> Dataset: def load_nested_dataset(
pq_dir: Path, features: datasets.Features | None = None, episodes: list[int] | None = None
) -> Dataset:
"""Find parquet files in provided directory {pq_dir}/chunk-xxx/file-xxx.parquet """Find parquet files in provided directory {pq_dir}/chunk-xxx/file-xxx.parquet
Convert parquet files to pyarrow memory mapped in a cache folder for efficient RAM usage Convert parquet files to pyarrow memory mapped in a cache folder for efficient RAM usage
Concatenate all pyarrow references to return HF Dataset format Concatenate all pyarrow references to return HF Dataset format
@@ -111,15 +114,26 @@ def load_nested_dataset(pq_dir: Path, features: datasets.Features | None = None)
Args: Args:
pq_dir: Directory containing parquet files pq_dir: Directory containing parquet files
features: Optional features schema to ensure consistent loading of complex types like images features: Optional features schema to ensure consistent loading of complex types like images
episodes: Optional list of episode indices to filter. Uses PyArrow predicate pushdown for efficiency.
""" """
paths = sorted(pq_dir.glob("*/*.parquet")) paths = sorted(pq_dir.glob("*/*.parquet"))
if len(paths) == 0: if len(paths) == 0:
raise FileNotFoundError(f"Provided directory does not contain any parquet file: {pq_dir}") raise FileNotFoundError(f"Provided directory does not contain any parquet file: {pq_dir}")
# TODO(rcadene): set num_proc to accelerate conversion to pyarrow
with SuppressProgressBars(): with SuppressProgressBars():
datasets = Dataset.from_parquet([str(path) for path in paths], features=features) # When no filtering needed, Dataset uses memory-mapped loading for efficiency
return datasets # PyArrow loads the entire dataset into memory
if episodes is None:
return Dataset.from_parquet([str(path) for path in paths], features=features)
arrow_dataset = pa_ds.dataset(paths, format="parquet")
filter_expr = pa_ds.field("episode_index").isin(episodes)
table = arrow_dataset.to_table(filter=filter_expr)
if features is not None:
table = table.cast(features.arrow_schema)
return Dataset(table)
def get_parquet_num_frames(parquet_path: str | Path) -> int: def get_parquet_num_frames(parquet_path: str | Path) -> int:
+4
View File
@@ -311,6 +311,7 @@ def encode_video_frames(
fast_decode: int = 0, fast_decode: int = 0,
log_level: int | None = av.logging.ERROR, log_level: int | None = av.logging.ERROR,
overwrite: bool = False, overwrite: bool = False,
preset: int | None = None,
) -> None: ) -> None:
"""More info on ffmpeg arguments tuning on `benchmark/video/README.md`""" """More info on ffmpeg arguments tuning on `benchmark/video/README.md`"""
# Check encoder availability # Check encoder availability
@@ -359,6 +360,9 @@ def encode_video_frames(
value = f"fast-decode={fast_decode}" if vcodec == "libsvtav1" else "fastdecode" value = f"fast-decode={fast_decode}" if vcodec == "libsvtav1" else "fastdecode"
video_options[key] = value video_options[key] = value
if vcodec == "libsvtav1":
video_options["preset"] = str(preset) if preset is not None else "12"
# Set logging level # Set logging level
if log_level is not None: if log_level is not None:
# "While less efficient, it is generally preferable to modify logging with Python's logging" # "While less efficient, it is generally preferable to modify logging with Python's logging"
+57 -9
View File
@@ -21,7 +21,22 @@ import draccus
from lerobot.configs.types import FeatureType, PolicyFeature from lerobot.configs.types import FeatureType, PolicyFeature
from lerobot.robots import RobotConfig from lerobot.robots import RobotConfig
from lerobot.teleoperators.config import TeleoperatorConfig from lerobot.teleoperators.config import TeleoperatorConfig
from lerobot.utils.constants import ACTION, OBS_ENV_STATE, OBS_IMAGE, OBS_IMAGES, OBS_STATE from lerobot.utils.constants import (
ACTION,
LIBERO_KEY_EEF_MAT,
LIBERO_KEY_EEF_POS,
LIBERO_KEY_EEF_QUAT,
LIBERO_KEY_GRIPPER_QPOS,
LIBERO_KEY_GRIPPER_QVEL,
LIBERO_KEY_JOINTS_POS,
LIBERO_KEY_JOINTS_VEL,
LIBERO_KEY_PIXELS_AGENTVIEW,
LIBERO_KEY_PIXELS_EYE_IN_HAND,
OBS_ENV_STATE,
OBS_IMAGE,
OBS_IMAGES,
OBS_STATE,
)
@dataclass @dataclass
@@ -246,28 +261,61 @@ class LiberoEnv(EnvConfig):
features_map: dict[str, str] = field( features_map: dict[str, str] = field(
default_factory=lambda: { default_factory=lambda: {
ACTION: ACTION, ACTION: ACTION,
"agent_pos": OBS_STATE, LIBERO_KEY_EEF_POS: f"{OBS_STATE}.eef_pos",
"pixels/agentview_image": f"{OBS_IMAGES}.image", LIBERO_KEY_EEF_QUAT: f"{OBS_STATE}.eef_quat",
"pixels/robot0_eye_in_hand_image": f"{OBS_IMAGES}.image2", LIBERO_KEY_EEF_MAT: f"{OBS_STATE}.eef_mat",
LIBERO_KEY_GRIPPER_QPOS: f"{OBS_STATE}.gripper_qpos",
LIBERO_KEY_GRIPPER_QVEL: f"{OBS_STATE}.gripper_qvel",
LIBERO_KEY_JOINTS_POS: f"{OBS_STATE}.joint_pos",
LIBERO_KEY_JOINTS_VEL: f"{OBS_STATE}.joint_vel",
LIBERO_KEY_PIXELS_AGENTVIEW: f"{OBS_IMAGES}.image",
LIBERO_KEY_PIXELS_EYE_IN_HAND: f"{OBS_IMAGES}.image2",
} }
) )
def __post_init__(self): def __post_init__(self):
if self.obs_type == "pixels": if self.obs_type == "pixels":
self.features["pixels/agentview_image"] = PolicyFeature( self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3) type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
) )
self.features["pixels/robot0_eye_in_hand_image"] = PolicyFeature( self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3) type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
) )
elif self.obs_type == "pixels_agent_pos": elif self.obs_type == "pixels_agent_pos":
self.features["agent_pos"] = PolicyFeature(type=FeatureType.STATE, shape=(8,)) self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
self.features["pixels/agentview_image"] = PolicyFeature(
type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3) type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
) )
self.features["pixels/robot0_eye_in_hand_image"] = PolicyFeature( self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3) type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
) )
self.features[LIBERO_KEY_EEF_POS] = PolicyFeature(
type=FeatureType.STATE,
shape=(3,),
)
self.features[LIBERO_KEY_EEF_QUAT] = PolicyFeature(
type=FeatureType.STATE,
shape=(4,),
)
self.features[LIBERO_KEY_EEF_MAT] = PolicyFeature(
type=FeatureType.STATE,
shape=(3, 3),
)
self.features[LIBERO_KEY_GRIPPER_QPOS] = PolicyFeature(
type=FeatureType.STATE,
shape=(2,),
)
self.features[LIBERO_KEY_GRIPPER_QVEL] = PolicyFeature(
type=FeatureType.STATE,
shape=(2,),
)
self.features[LIBERO_KEY_JOINTS_POS] = PolicyFeature(
type=FeatureType.STATE,
shape=(7,),
)
self.features[LIBERO_KEY_JOINTS_VEL] = PolicyFeature(
type=FeatureType.STATE,
shape=(7,),
)
else: else:
raise ValueError(f"Unsupported obs_type: {self.obs_type}") raise ValueError(f"Unsupported obs_type: {self.obs_type}")
+39
View File
@@ -14,12 +14,16 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import importlib import importlib
from typing import Any
import gymnasium as gym import gymnasium as gym
from gymnasium.envs.registration import registry as gym_registry from gymnasium.envs.registration import registry as gym_registry
from lerobot.envs.configs import AlohaEnv, EnvConfig, LiberoEnv, PushtEnv from lerobot.envs.configs import AlohaEnv, EnvConfig, LiberoEnv, PushtEnv
from lerobot.envs.utils import _call_make_env, _download_hub_file, _import_hub_module, _normalize_hub_result from lerobot.envs.utils import _call_make_env, _download_hub_file, _import_hub_module, _normalize_hub_result
from lerobot.processor import ProcessorStep
from lerobot.processor.env_processor import LiberoProcessorStep
from lerobot.processor.pipeline import PolicyProcessorPipeline
def make_env_config(env_type: str, **kwargs) -> EnvConfig: def make_env_config(env_type: str, **kwargs) -> EnvConfig:
@@ -33,6 +37,41 @@ def make_env_config(env_type: str, **kwargs) -> EnvConfig:
raise ValueError(f"Policy type '{env_type}' is not available.") raise ValueError(f"Policy type '{env_type}' is not available.")
def make_env_pre_post_processors(
env_cfg: EnvConfig,
) -> tuple[
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
]:
"""
Create preprocessor and postprocessor pipelines for environment observations.
This function creates processor pipelines that transform raw environment
observations and actions. By default, it returns identity processors that do nothing.
For specific environments like LIBERO, it adds environment-specific processing steps.
Args:
env_cfg: The configuration of the environment.
Returns:
A tuple containing:
- preprocessor: Pipeline that processes environment observations
- postprocessor: Pipeline that processes environment outputs (currently identity)
"""
# Preprocessor and Postprocessor steps are Identity for most environments
preprocessor_steps: list[ProcessorStep] = []
postprocessor_steps: list[ProcessorStep] = []
# For LIBERO environments, add the LiberoProcessorStep to preprocessor
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
preprocessor_steps.append(LiberoProcessorStep())
preprocessor = PolicyProcessorPipeline(steps=preprocessor_steps)
postprocessor = PolicyProcessorPipeline(steps=postprocessor_steps)
return preprocessor, postprocessor
def make_env( def make_env(
cfg: EnvConfig | str, cfg: EnvConfig | str,
n_envs: int = 1, n_envs: int = 1,
+69 -21
View File
@@ -28,7 +28,6 @@ import torch
from gymnasium import spaces from gymnasium import spaces
from libero.libero import benchmark, get_libero_path from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv from libero.libero.envs import OffScreenRenderEnv
from robosuite.utils.transform_utils import quat2axisangle
def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]: def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
@@ -175,11 +174,36 @@ class LiberoEnv(gym.Env):
self.observation_space = spaces.Dict( self.observation_space = spaces.Dict(
{ {
"pixels": spaces.Dict(images), "pixels": spaces.Dict(images),
"agent_pos": spaces.Box( "robot_state": spaces.Dict(
low=AGENT_POS_LOW, {
high=AGENT_POS_HIGH, "eef": spaces.Dict(
shape=(OBS_STATE_DIM,), {
dtype=np.float64, "pos": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float64),
"quat": spaces.Box(
low=-np.inf, high=np.inf, shape=(4,), dtype=np.float64
),
"mat": spaces.Box(
low=-np.inf, high=np.inf, shape=(3, 3), dtype=np.float64
),
}
),
"gripper": spaces.Dict(
{
"qpos": spaces.Box(
low=-np.inf, high=np.inf, shape=(2,), dtype=np.float64
),
"qvel": spaces.Box(
low=-np.inf, high=np.inf, shape=(2,), dtype=np.float64
),
}
),
"joints": spaces.Dict(
{
"pos": spaces.Box(low=-np.inf, high=np.inf, shape=(7,), dtype=np.float64),
"vel": spaces.Box(low=-np.inf, high=np.inf, shape=(7,), dtype=np.float64),
}
),
}
), ),
} }
) )
@@ -191,6 +215,7 @@ class LiberoEnv(gym.Env):
def render(self): def render(self):
raw_obs = self._env.env._get_observations() raw_obs = self._env.env._get_observations()
image = self._format_raw_obs(raw_obs)["pixels"]["image"] image = self._format_raw_obs(raw_obs)["pixels"]["image"]
image = image[::-1, ::-1] # flip both H and W for visualization
return image return image
def _make_envs_task(self, task_suite: Any, task_id: int = 0): def _make_envs_task(self, task_suite: Any, task_id: int = 0):
@@ -212,23 +237,48 @@ class LiberoEnv(gym.Env):
images = {} images = {}
for camera_name in self.camera_name: for camera_name in self.camera_name:
image = raw_obs[camera_name] image = raw_obs[camera_name]
image = image[::-1, ::-1] # rotate 180 degrees
images[self.camera_name_mapping[camera_name]] = image images[self.camera_name_mapping[camera_name]] = image
state = np.concatenate(
( eef_pos = raw_obs.get("robot0_eef_pos")
raw_obs["robot0_eef_pos"], eef_quat = raw_obs.get("robot0_eef_quat")
quat2axisangle(raw_obs["robot0_eef_quat"]),
raw_obs["robot0_gripper_qpos"], # rotation matrix from controller
) eef_mat = self._env.robots[0].controller.ee_ori_mat if eef_pos is not None else None
) gripper_qpos = raw_obs.get("robot0_gripper_qpos")
agent_pos = state gripper_qvel = raw_obs.get("robot0_gripper_qvel")
joint_pos = raw_obs.get("robot0_joint_pos")
joint_vel = raw_obs.get("robot0_joint_vel")
obs = {
"pixels": images,
"robot_state": {
"eef": {
"pos": eef_pos, # (3,)
"quat": eef_quat, # (4,)
"mat": eef_mat, # (3, 3)
},
"gripper": {
"qpos": gripper_qpos, # (2,)
"qvel": gripper_qvel, # (2,)
},
"joints": {
"pos": joint_pos, # (7,)
"vel": joint_vel, # (7,)
},
},
}
if self.obs_type == "pixels": if self.obs_type == "pixels":
return {"pixels": images.copy()} return {"pixels": images.copy()}
if self.obs_type == "pixels_agent_pos": if self.obs_type == "pixels_agent_pos":
return { # Validate required fields are present
"pixels": images.copy(), if eef_pos is None or eef_quat is None or gripper_qpos is None:
"agent_pos": agent_pos, raise ValueError(
} f"Missing required robot state fields in raw observation. "
f"Got eef_pos={eef_pos is not None}, eef_quat={eef_quat is not None}, "
f"gripper_qpos={gripper_qpos is not None}"
)
return obs
raise NotImplementedError( raise NotImplementedError(
f"The observation type '{self.obs_type}' is not supported in LiberoEnv. " f"The observation type '{self.obs_type}' is not supported in LiberoEnv. "
"Please switch to an image-based obs_type (e.g. 'pixels', 'pixels_agent_pos')." "Please switch to an image-based obs_type (e.g. 'pixels', 'pixels_agent_pos')."
@@ -355,12 +405,10 @@ def create_libero_envs(
print(f"Restricting to task_ids={task_ids_filter}") print(f"Restricting to task_ids={task_ids_filter}")
out: dict[str, dict[int, Any]] = defaultdict(dict) out: dict[str, dict[int, Any]] = defaultdict(dict)
for suite_name in suite_names: for suite_name in suite_names:
suite = _get_suite(suite_name) suite = _get_suite(suite_name)
total = len(suite.tasks) total = len(suite.tasks)
selected = _select_task_ids(total, task_ids_filter) selected = _select_task_ids(total, task_ids_filter)
if not selected: if not selected:
raise ValueError(f"No tasks selected for suite '{suite_name}' (available: {total}).") raise ValueError(f"No tasks selected for suite '{suite_name}' (available: {total}).")
+20 -6
View File
@@ -29,10 +29,22 @@ from torch import Tensor
from lerobot.configs.types import FeatureType, PolicyFeature from lerobot.configs.types import FeatureType, PolicyFeature
from lerobot.envs.configs import EnvConfig from lerobot.envs.configs import EnvConfig
from lerobot.utils.constants import OBS_ENV_STATE, OBS_IMAGE, OBS_IMAGES, OBS_STATE from lerobot.utils.constants import OBS_ENV_STATE, OBS_IMAGE, OBS_IMAGES, OBS_STATE, OBS_STR
from lerobot.utils.utils import get_channel_first_image_shape from lerobot.utils.utils import get_channel_first_image_shape
def _convert_nested_dict(d):
result = {}
for k, v in d.items():
if isinstance(v, dict):
result[k] = _convert_nested_dict(v)
elif isinstance(v, np.ndarray):
result[k] = torch.from_numpy(v)
else:
result[k] = v
return result
def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Tensor]: def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Tensor]:
# TODO(aliberts, rcadene): refactor this to use features from the environment (no hardcoding) # TODO(aliberts, rcadene): refactor this to use features from the environment (no hardcoding)
"""Convert environment observation to LeRobot format observation. """Convert environment observation to LeRobot format observation.
@@ -78,12 +90,14 @@ def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Ten
return_observations[OBS_ENV_STATE] = env_state return_observations[OBS_ENV_STATE] = env_state
# TODO(rcadene): enable pixels only baseline with `obs_type="pixels"` in environment by removing if "agent_pos" in observations:
agent_pos = torch.from_numpy(observations["agent_pos"]).float() agent_pos = torch.from_numpy(observations["agent_pos"]).float()
if agent_pos.dim() == 1: if agent_pos.dim() == 1:
agent_pos = agent_pos.unsqueeze(0) agent_pos = agent_pos.unsqueeze(0)
return_observations[OBS_STATE] = agent_pos return_observations[OBS_STATE] = agent_pos
if "robot_state" in observations:
return_observations[f"{OBS_STR}.robot_state"] = _convert_nested_dict(observations["robot_state"])
return return_observations return return_observations
+154
View File
@@ -0,0 +1,154 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
import torch
from lerobot.configs.types import PipelineFeatureType, PolicyFeature
from lerobot.utils.constants import OBS_IMAGES, OBS_STATE
from .pipeline import ObservationProcessorStep, ProcessorStepRegistry
@dataclass
@ProcessorStepRegistry.register(name="libero_processor")
class LiberoProcessorStep(ObservationProcessorStep):
"""
Processes LIBERO observations into the LeRobot format.
This step handles the specific observation structure from LIBERO environments,
which includes nested robot_state dictionaries and image observations.
**State Processing:**
- Processes the `robot_state` dictionary which contains nested end-effector,
gripper, and joint information.
- Extracts and concatenates:
- End-effector position (3D)
- End-effector quaternion converted to axis-angle (3D)
- Gripper joint positions (2D)
- Maps the concatenated state to `"observation.state"`.
**Image Processing:**
- Rotates images by 180 degrees by flipping both height and width dimensions.
- This accounts for the HuggingFaceVLA/libero camera orientation convention.
"""
def _process_observation(self, observation):
"""
Processes both image and robot_state observations from LIBERO.
"""
processed_obs = observation.copy()
for key in list(processed_obs.keys()):
if key.startswith(f"{OBS_IMAGES}."):
img = processed_obs[key]
# Flip both H and W
img = torch.flip(img, dims=[2, 3])
processed_obs[key] = img
# Process robot_state into a flat state vector
if "observation.robot_state" in processed_obs:
robot_state = processed_obs.pop("observation.robot_state")
# Extract components
eef_pos = robot_state["eef"]["pos"] # (B, 3,)
eef_quat = robot_state["eef"]["quat"] # (B, 4,)
gripper_qpos = robot_state["gripper"]["qpos"] # (B, 2,)
# Convert quaternion to axis-angle
eef_axisangle = self._quat2axisangle(eef_quat) # (B, 3)
# Concatenate into a single state vector
state = torch.cat((eef_pos, eef_axisangle, gripper_qpos), dim=-1)
# ensure float32
state = state.float()
if state.dim() == 1:
state = state.unsqueeze(0)
processed_obs[OBS_STATE] = state
return processed_obs
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
"""
Transforms feature keys from the LIBERO format to the LeRobot standard.
"""
new_features: dict[PipelineFeatureType, dict[str, PolicyFeature]] = {}
# copy over non-STATE features
for ft, feats in features.items():
if ft != PipelineFeatureType.STATE:
new_features[ft] = feats.copy()
# rebuild STATE features
state_feats = {}
# add our new flattened state
state_feats["observation.state"] = PolicyFeature(
key="observation.state",
shape=(8,), # [eef_pos(3), axis_angle(3), gripper(2)]
dtype="float32",
description=("Concatenated end-effector position (3), axis-angle (3), and gripper qpos (2)."),
)
new_features[PipelineFeatureType.STATE] = state_feats
return new_features
def observation(self, observation):
return self._process_observation(observation)
def _quat2axisangle(self, quat: torch.Tensor) -> torch.Tensor:
"""
Convert batched quaternions to axis-angle format.
Only accepts torch tensors of shape (B, 4).
Args:
quat (Tensor): (B, 4) tensor of quaternions in (x, y, z, w) format
Returns:
Tensor: (B, 3) axis-angle vectors
Raises:
TypeError: if input is not a torch tensor
ValueError: if shape is not (B, 4)
"""
if not isinstance(quat, torch.Tensor):
raise TypeError(f"_quat2axisangle expected a torch.Tensor, got {type(quat)}")
if quat.ndim != 2 or quat.shape[1] != 4:
raise ValueError(f"_quat2axisangle expected shape (B, 4), got {tuple(quat.shape)}")
quat = quat.to(dtype=torch.float32)
device = quat.device
batch_size = quat.shape[0]
w = quat[:, 3].clamp(-1.0, 1.0)
den = torch.sqrt(torch.clamp(1.0 - w * w, min=0.0))
result = torch.zeros((batch_size, 3), device=device)
mask = den > 1e-10
if mask.any():
angle = 2.0 * torch.acos(w[mask]) # (M,)
axis = quat[mask, :3] / den[mask].unsqueeze(1)
result[mask] = axis * angle.unsqueeze(1)
return result
+2 -2
View File
@@ -78,7 +78,7 @@ from lerobot.transport.utils import (
transitions_to_bytes, transitions_to_bytes,
) )
from lerobot.utils.random_utils import set_seed from lerobot.utils.random_utils import set_seed
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.transition import ( from lerobot.utils.transition import (
Transition, Transition,
move_state_dict_to_device, move_state_dict_to_device,
@@ -398,7 +398,7 @@ def act_with_policy(
if cfg.env.fps is not None: if cfg.env.fps is not None:
dt_time = time.perf_counter() - start_time dt_time = time.perf_counter() - start_time
busy_wait(1 / cfg.env.fps - dt_time) precise_sleep(1 / cfg.env.fps - dt_time)
# Communication Functions - Group all gRPC/messaging functions # Communication Functions - Group all gRPC/messaging functions
+5 -5
View File
@@ -74,7 +74,7 @@ from lerobot.teleoperators import (
from lerobot.teleoperators.teleoperator import Teleoperator from lerobot.teleoperators.teleoperator import Teleoperator
from lerobot.teleoperators.utils import TeleopEvents from lerobot.teleoperators.utils import TeleopEvents
from lerobot.utils.constants import ACTION, DONE, OBS_IMAGES, OBS_STATE, REWARD from lerobot.utils.constants import ACTION, DONE, OBS_IMAGES, OBS_STATE, REWARD
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say from lerobot.utils.utils import log_say
logging.basicConfig(level=logging.INFO) logging.basicConfig(level=logging.INFO)
@@ -114,7 +114,7 @@ def reset_follower_position(robot_arm: Robot, target_position: np.ndarray) -> No
for pose in trajectory: for pose in trajectory:
action_dict = dict(zip(current_position_dict, pose, strict=False)) action_dict = dict(zip(current_position_dict, pose, strict=False))
robot_arm.bus.sync_write("Goal_Position", action_dict) robot_arm.bus.sync_write("Goal_Position", action_dict)
busy_wait(0.015) precise_sleep(0.015)
class RobotEnv(gym.Env): class RobotEnv(gym.Env):
@@ -238,7 +238,7 @@ class RobotEnv(gym.Env):
reset_follower_position(self.robot, np.array(self.reset_pose)) reset_follower_position(self.robot, np.array(self.reset_pose))
log_say("Reset the environment done.", play_sounds=True) log_say("Reset the environment done.", play_sounds=True)
busy_wait(self.reset_time_s - (time.perf_counter() - start_time)) precise_sleep(self.reset_time_s - (time.perf_counter() - start_time))
super().reset(seed=seed, options=options) super().reset(seed=seed, options=options)
@@ -713,7 +713,7 @@ def control_loop(
transition = env_processor(transition) transition = env_processor(transition)
# Maintain fps timing # Maintain fps timing
busy_wait(dt - (time.perf_counter() - step_start_time)) precise_sleep(dt - (time.perf_counter() - step_start_time))
if dataset is not None and cfg.dataset.push_to_hub: if dataset is not None and cfg.dataset.push_to_hub:
logging.info("Pushing dataset to hub") logging.info("Pushing dataset to hub")
@@ -745,7 +745,7 @@ def replay_trajectory(
) )
transition = action_processor(transition) transition = action_processor(transition)
env.step(transition[TransitionKey.ACTION]) env.step(transition[TransitionKey.ACTION])
busy_wait(1 / cfg.env.fps - (time.perf_counter() - start_time)) precise_sleep(1 / cfg.env.fps - (time.perf_counter() - start_time))
@parser.wrap() @parser.wrap()
+33 -1
View File
@@ -71,7 +71,7 @@ from tqdm import trange
from lerobot.configs import parser from lerobot.configs import parser
from lerobot.configs.eval import EvalPipelineConfig from lerobot.configs.eval import EvalPipelineConfig
from lerobot.envs.factory import make_env from lerobot.envs.factory import make_env, make_env_pre_post_processors
from lerobot.envs.utils import ( from lerobot.envs.utils import (
add_envs_task, add_envs_task,
check_env_attributes_and_types, check_env_attributes_and_types,
@@ -94,6 +94,8 @@ from lerobot.utils.utils import (
def rollout( def rollout(
env: gym.vector.VectorEnv, env: gym.vector.VectorEnv,
policy: PreTrainedPolicy, policy: PreTrainedPolicy,
env_preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
env_postprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]], preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction], postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction],
seeds: list[int] | None = None, seeds: list[int] | None = None,
@@ -165,11 +167,19 @@ def rollout(
# Infer "task" from attributes of environments. # Infer "task" from attributes of environments.
# TODO: works with SyncVectorEnv but not AsyncVectorEnv # TODO: works with SyncVectorEnv but not AsyncVectorEnv
observation = add_envs_task(env, observation) observation = add_envs_task(env, observation)
# Apply environment-specific preprocessing (e.g., LiberoProcessorStep for LIBERO)
observation = env_preprocessor(observation)
observation = preprocessor(observation) observation = preprocessor(observation)
with torch.inference_mode(): with torch.inference_mode():
action = policy.select_action(observation) action = policy.select_action(observation)
action = postprocessor(action) action = postprocessor(action)
action_transition = {"action": action}
action_transition = env_postprocessor(action_transition)
action = action_transition["action"]
# Convert to CPU / numpy. # Convert to CPU / numpy.
action_numpy: np.ndarray = action.to("cpu").numpy() action_numpy: np.ndarray = action.to("cpu").numpy()
assert action_numpy.ndim == 2, "Action dimensions should be (batch, action_dim)" assert action_numpy.ndim == 2, "Action dimensions should be (batch, action_dim)"
@@ -239,6 +249,8 @@ def rollout(
def eval_policy( def eval_policy(
env: gym.vector.VectorEnv, env: gym.vector.VectorEnv,
policy: PreTrainedPolicy, policy: PreTrainedPolicy,
env_preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
env_postprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]], preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction], postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction],
n_episodes: int, n_episodes: int,
@@ -319,6 +331,8 @@ def eval_policy(
rollout_data = rollout( rollout_data = rollout(
env=env, env=env,
policy=policy, policy=policy,
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
seeds=list(seeds) if seeds else None, seeds=list(seeds) if seeds else None,
@@ -517,10 +531,16 @@ def eval_main(cfg: EvalPipelineConfig):
pretrained_path=cfg.policy.pretrained_path, pretrained_path=cfg.policy.pretrained_path,
preprocessor_overrides=preprocessor_overrides, preprocessor_overrides=preprocessor_overrides,
) )
# Create environment-specific preprocessor and postprocessor (e.g., for LIBERO environments)
env_preprocessor, env_postprocessor = make_env_pre_post_processors(env_cfg=cfg.env)
with torch.no_grad(), torch.autocast(device_type=device.type) if cfg.policy.use_amp else nullcontext(): with torch.no_grad(), torch.autocast(device_type=device.type) if cfg.policy.use_amp else nullcontext():
info = eval_policy_all( info = eval_policy_all(
envs=envs, envs=envs,
policy=policy, policy=policy,
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
n_episodes=cfg.eval.n_episodes, n_episodes=cfg.eval.n_episodes,
@@ -561,6 +581,8 @@ def eval_one(
env: gym.vector.VectorEnv, env: gym.vector.VectorEnv,
*, *,
policy: PreTrainedPolicy, policy: PreTrainedPolicy,
env_preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
env_postprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]], preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction], postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction],
n_episodes: int, n_episodes: int,
@@ -576,6 +598,8 @@ def eval_one(
task_result = eval_policy( task_result = eval_policy(
env=env, env=env,
policy=policy, policy=policy,
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
n_episodes=n_episodes, n_episodes=n_episodes,
@@ -600,6 +624,8 @@ def run_one(
env, env,
*, *,
policy, policy,
env_preprocessor,
env_postprocessor,
preprocessor, preprocessor,
postprocessor, postprocessor,
n_episodes: int, n_episodes: int,
@@ -622,6 +648,8 @@ def run_one(
metrics = eval_one( metrics = eval_one(
env, env,
policy=policy, policy=policy,
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
n_episodes=n_episodes, n_episodes=n_episodes,
@@ -639,6 +667,8 @@ def run_one(
def eval_policy_all( def eval_policy_all(
envs: dict[str, dict[int, gym.vector.VectorEnv]], envs: dict[str, dict[int, gym.vector.VectorEnv]],
policy, policy,
env_preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
env_postprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]], preprocessor: PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction], postprocessor: PolicyProcessorPipeline[PolicyAction, PolicyAction],
n_episodes: int, n_episodes: int,
@@ -694,6 +724,8 @@ def eval_policy_all(
task_runner = partial( task_runner = partial(
run_one, run_one,
policy=policy, policy=policy,
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
n_episodes=n_episodes, n_episodes=n_episodes,
@@ -50,7 +50,7 @@ from lerobot.teleoperators import ( # noqa: F401
make_teleoperator_from_config, make_teleoperator_from_config,
so100_leader, so100_leader,
) )
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
@dataclass @dataclass
@@ -114,7 +114,7 @@ def find_joint_and_ee_bounds(cfg: FindJointLimitsConfig):
print(f"Min joint pos position {np.round(min_pos, 4).tolist()}") print(f"Min joint pos position {np.round(min_pos, 4).tolist()}")
break break
busy_wait(0.01) precise_sleep(0.01)
def main(): def main():
+2 -2
View File
@@ -119,7 +119,7 @@ from lerobot.utils.control_utils import (
sanity_check_dataset_robot_compatibility, sanity_check_dataset_robot_compatibility,
) )
from lerobot.utils.import_utils import register_third_party_devices from lerobot.utils.import_utils import register_third_party_devices
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import ( from lerobot.utils.utils import (
get_safe_torch_device, get_safe_torch_device,
init_logging, init_logging,
@@ -364,7 +364,7 @@ def record_loop(
log_rerun_data(observation=obs_processed, action=action_values) log_rerun_data(observation=obs_processed, action=action_values)
dt_s = time.perf_counter() - start_loop_t dt_s = time.perf_counter() - start_loop_t
busy_wait(1 / fps - dt_s) precise_sleep(1 / fps - dt_s)
timestamp = time.perf_counter() - start_episode_t timestamp = time.perf_counter() - start_episode_t
+2 -2
View File
@@ -62,7 +62,7 @@ from lerobot.robots import ( # noqa: F401
) )
from lerobot.utils.constants import ACTION from lerobot.utils.constants import ACTION
from lerobot.utils.import_utils import register_third_party_devices from lerobot.utils.import_utils import register_third_party_devices
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import ( from lerobot.utils.utils import (
init_logging, init_logging,
log_say, log_say,
@@ -121,7 +121,7 @@ def replay(cfg: ReplayConfig):
_ = robot.send_action(processed_action) _ = robot.send_action(processed_action)
dt_s = time.perf_counter() - start_episode_t dt_s = time.perf_counter() - start_episode_t
busy_wait(1 / dataset.fps - dt_s) precise_sleep(1 / dataset.fps - dt_s)
robot.disconnect() robot.disconnect()
+5 -4
View File
@@ -89,7 +89,7 @@ from lerobot.teleoperators import ( # noqa: F401
so101_leader, so101_leader,
) )
from lerobot.utils.import_utils import register_third_party_devices from lerobot.utils.import_utils import register_third_party_devices
from lerobot.utils.robot_utils import busy_wait from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import init_logging, move_cursor_up from lerobot.utils.utils import init_logging, move_cursor_up
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
@@ -170,12 +170,13 @@ def teleop_loop(
# Display the final robot action that was sent # Display the final robot action that was sent
for motor, value in robot_action_to_send.items(): for motor, value in robot_action_to_send.items():
print(f"{motor:<{display_len}} | {value:>7.2f}") print(f"{motor:<{display_len}} | {value:>7.2f}")
move_cursor_up(len(robot_action_to_send) + 5) move_cursor_up(len(robot_action_to_send) + 3)
dt_s = time.perf_counter() - loop_start dt_s = time.perf_counter() - loop_start
busy_wait(1 / fps - dt_s) precise_sleep(1 / fps - dt_s)
loop_s = time.perf_counter() - loop_start loop_s = time.perf_counter() - loop_start
print(f"\ntime: {loop_s * 1e3:.2f}ms ({1 / loop_s:.0f} Hz)") print(f"Teleop loop time: {loop_s * 1e3:.2f}ms ({1 / loop_s:.0f} Hz)")
move_cursor_up(1)
if duration is not None and time.perf_counter() - start >= duration: if duration is not None and time.perf_counter() - start >= duration:
return return
+6 -1
View File
@@ -29,7 +29,7 @@ from lerobot.configs.train import TrainPipelineConfig
from lerobot.datasets.factory import make_dataset from lerobot.datasets.factory import make_dataset
from lerobot.datasets.sampler import EpisodeAwareSampler from lerobot.datasets.sampler import EpisodeAwareSampler
from lerobot.datasets.utils import cycle from lerobot.datasets.utils import cycle
from lerobot.envs.factory import make_env from lerobot.envs.factory import make_env, make_env_pre_post_processors
from lerobot.envs.utils import close_envs from lerobot.envs.utils import close_envs
from lerobot.optim.factory import make_optimizer_and_scheduler from lerobot.optim.factory import make_optimizer_and_scheduler
from lerobot.policies.factory import make_policy, make_pre_post_processors from lerobot.policies.factory import make_policy, make_pre_post_processors
@@ -259,6 +259,8 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
logging.info(colored("Output dir:", "yellow", attrs=["bold"]) + f" {cfg.output_dir}") logging.info(colored("Output dir:", "yellow", attrs=["bold"]) + f" {cfg.output_dir}")
if cfg.env is not None: if cfg.env is not None:
logging.info(f"{cfg.env.task=}") logging.info(f"{cfg.env.task=}")
logging.info("Creating environment processors")
env_preprocessor, env_postprocessor = make_env_pre_post_processors(env_cfg=cfg.env)
logging.info(f"{cfg.steps=} ({format_big_number(cfg.steps)})") logging.info(f"{cfg.steps=} ({format_big_number(cfg.steps)})")
logging.info(f"{dataset.num_frames=} ({format_big_number(dataset.num_frames)})") logging.info(f"{dataset.num_frames=} ({format_big_number(dataset.num_frames)})")
logging.info(f"{dataset.num_episodes=}") logging.info(f"{dataset.num_episodes=}")
@@ -274,6 +276,7 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
sampler = EpisodeAwareSampler( sampler = EpisodeAwareSampler(
dataset.meta.episodes["dataset_from_index"], dataset.meta.episodes["dataset_from_index"],
dataset.meta.episodes["dataset_to_index"], dataset.meta.episodes["dataset_to_index"],
episode_indices_to_use=dataset.episodes,
drop_n_last_frames=cfg.policy.drop_n_last_frames, drop_n_last_frames=cfg.policy.drop_n_last_frames,
shuffle=True, shuffle=True,
) )
@@ -384,6 +387,8 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
eval_info = eval_policy_all( eval_info = eval_policy_all(
envs=eval_env, # dict[suite][task_id] -> vec_env envs=eval_env, # dict[suite][task_id] -> vec_env
policy=accelerator.unwrap_model(policy), policy=accelerator.unwrap_model(policy),
env_preprocessor=env_preprocessor,
env_postprocessor=env_postprocessor,
preprocessor=preprocessor, preprocessor=preprocessor,
postprocessor=postprocessor, postprocessor=postprocessor,
n_episodes=cfg.eval.n_episodes, n_episodes=cfg.eval.n_episodes,
+12
View File
@@ -70,3 +70,15 @@ LOOKAHEAD_BACKTRACKTABLE = 100
# openpi # openpi
OPENPI_ATTENTION_MASK_VALUE = -2.3819763e38 # TODO(pepijn): Modify this when extending support to fp8 models OPENPI_ATTENTION_MASK_VALUE = -2.3819763e38 # TODO(pepijn): Modify this when extending support to fp8 models
# Constants for LIBERO observation keys
LIBERO_KEY_EEF_POS = "robot_state/eef/pos"
LIBERO_KEY_EEF_QUAT = "robot_state/eef/quat"
LIBERO_KEY_EEF_MAT = "robot_state/eef/mat"
LIBERO_KEY_EEF_AXISANGLE = "robot_state/eef/axisangle"
LIBERO_KEY_GRIPPER_QPOS = "robot_state/gripper/qpos"
LIBERO_KEY_GRIPPER_QVEL = "robot_state/gripper/qvel"
LIBERO_KEY_JOINTS_POS = "robot_state/joints/pos"
LIBERO_KEY_JOINTS_VEL = "robot_state/joints/vel"
LIBERO_KEY_PIXELS_AGENTVIEW = "pixels/agentview_image"
LIBERO_KEY_PIXELS_EYE_IN_HAND = "pixels/robot0_eye_in_hand_image"
+35 -9
View File
@@ -16,14 +16,40 @@ import platform
import time import time
def busy_wait(seconds): def precise_sleep(seconds: float, spin_threshold: float = 0.010, sleep_margin: float = 0.003):
if platform.system() == "Darwin" or platform.system() == "Windows": """
# On Mac and Windows, `time.sleep` is not accurate and we need to use this while loop trick, Wait for `seconds` with better precision than time.sleep alone at the expense of more CPU usage.
# but it consumes CPU cycles.
Parameters:
- seconds: duration to wait
- spin_threshold: if remaining <= spin_threshold -> spin; otherwise sleep (seconds). Default 10ms
- sleep_margin: when sleeping leave this much time before deadline to avoid oversleep. Default 3ms
Note:
The default parameters are chosen to prioritize timing accuracy over CPU usage for the common 30 FPS use case.
"""
if seconds <= 0:
return
system = platform.system()
# On macOS and Windows the scheduler / sleep granularity can make
# short sleeps inaccurate. Instead of burning CPU for the whole
# duration, sleep for most of the time and spin for the final few
# milliseconds to achieve good accuracy with much lower CPU usage.
if system in ("Darwin", "Windows"):
end_time = time.perf_counter() + seconds end_time = time.perf_counter() + seconds
while time.perf_counter() < end_time: while True:
pass remaining = end_time - time.perf_counter()
if remaining <= 0:
break
# If there's more than a couple milliseconds left, sleep most
# of the remaining time and leave a small margin for the final spin.
if remaining > spin_threshold:
# Sleep but avoid sleeping past the end by leaving a small margin.
time.sleep(max(remaining - sleep_margin, 0))
else:
# Final short spin to hit precise timing without long sleeps.
pass
else: else:
# On Linux time.sleep is accurate # On Linux time.sleep is accurate enough for most uses
if seconds > 0: time.sleep(seconds)
time.sleep(seconds)
+72
View File
@@ -0,0 +1,72 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import torch
from lerobot.envs.utils import preprocess_observation
from lerobot.processor.env_processor import LiberoProcessorStep
from lerobot.processor.pipeline import PolicyProcessorPipeline
seed = 42
np.random.seed(seed)
B = 5
obs1 = {
"pixels": {
"image": (np.random.rand(B, 256, 256, 3) * 255).astype(np.uint8),
"image2": (np.random.rand(B, 256, 256, 3) * 255).astype(np.uint8),
},
"robot_state": {
"eef": {
"pos": np.random.randn(B, 3),
"quat": np.random.randn(B, 4),
"mat": np.random.randn(B, 3, 3),
},
"gripper": {
"qpos": np.random.randn(B, 2),
"qvel": np.random.randn(B, 2),
},
"joints": {
"pos": np.random.randn(B, 7),
"vel": np.random.randn(B, 7),
},
},
}
observation = preprocess_observation(obs1)
libero_preprocessor = PolicyProcessorPipeline(
steps=[
LiberoProcessorStep(),
]
)
processed_obs = libero_preprocessor(observation)
assert "observation.state" in processed_obs
state = processed_obs["observation.state"]
assert isinstance(state, torch.Tensor)
assert state.dtype == torch.float32
assert state.shape[0] == B
assert state.shape[1] == 8
assert "observation.images.image" in processed_obs
assert "observation.images.image2" in processed_obs
assert isinstance(processed_obs["observation.images.image"], torch.Tensor)
assert isinstance(processed_obs["observation.images.image2"], torch.Tensor)
assert processed_obs["observation.images.image"].shape == (B, 3, 256, 256)
assert processed_obs["observation.images.image2"].shape == (B, 3, 256, 256)