diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 87fcacf42..412386e2d 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -3,6 +3,8 @@ title: LeRobot - local: installation title: Installation + - local: cheat-sheet + title: Cheat sheet title: Get started - sections: - local: il_robots @@ -37,8 +39,12 @@ title: Porting Large Datasets - local: using_dataset_tools title: Using the Dataset Tools - - local: dataset_subtask - title: Using Subtasks in the Dataset + - local: language_and_recipes + title: Language Columns and Recipes + - local: tools + title: Tools + - local: video_encoding_parameters + title: Video encoding parameters - local: streaming_video_encoding title: Streaming Video Encoding title: "Datasets" @@ -139,6 +145,8 @@ title: OMX - local: openarm title: OpenArm + - local: rebot_b601 + title: reBot B601-DM title: "Robots" - sections: - local: phone_teleop diff --git a/docs/source/cheat-sheet.mdx b/docs/source/cheat-sheet.mdx new file mode 100644 index 000000000..a6afa14c2 --- /dev/null +++ b/docs/source/cheat-sheet.mdx @@ -0,0 +1,139 @@ +# Cheat sheet + +All of the LeRobot commands in one place. If you forgot how to use a specific command or want to learn about a new one you can do it here. + +> [!WARNING] +> For all of the commands listed below remember to change the ports/names/ids to your own values! + +> [!TIP] +> Another great way to look at all the commands and get them configured for your specific setup is to use this [Jupyter Notebook](https://github.com/huggingface/lerobot/blob/main/examples/notebooks/quickstart.ipynb). + +### Setup and installation + +For installation please look at [LeRobot Installation](https://huggingface.co/docs/lerobot/main/en/installation). + +### Useful tools + +###### Find port + +Use this to identify which serial ports your robots are connected to. Follow the instructions in your terminal: you will be asked to unplug the USB cable and press Enter. 
The script will then detect and print the correct serial port for that robot. + +```bash +lerobot-find-port +``` + +###### Find cameras + +Quickly find camera indices and verify their output. This command prints camera information to the terminal and saves test frames from each detected camera to `lerobot/outputs/captured_images`. + +```bash +lerobot-find-cameras +``` + +### Calibration + +In most cases, you will need to perform calibration just once for each robot and teleoperation device. Before performing the calibration, make sure that all the joints are roughly in the middle position. + +```bash +lerobot-calibrate \ + --robot.type=so101_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.id=my_follower_arm +``` + +Use the same IDs in the other scripts as you used during calibration. That's how LeRobot finds the calibration files. + +### Teleoperation + +Teleoperating with two cameras and displaying the data with Rerun. + +```bash +lerobot-teleoperate \ + --robot.type=so101_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.id=my_follower_arm \ + --robot.cameras="{ top: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, wrist: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \ + --teleop.type=so101_leader \ + --teleop.port=/dev/ttyACM1 \ + --teleop.id=my_leader_arm \ + --display_data=true +``` + +### Recording a dataset + +The dataset is automatically uploaded to the Hugging Face Hub and saved under `repo_id`. Make sure you are logged in to your HF account with the CLI: +`hf auth login` + +You can get the token from: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) + +```bash +lerobot-record \ + --robot.type=so101_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.id=my_follower_arm \ + --robot.cameras="{ top: {type: opencv, index_or_path: 1, width: 640, height: 480, fps: 30}, wrist: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \ + --teleop.type=so101_leader \ + 
--teleop.port=/dev/ttyACM1 \ + --teleop.id=my_leader_arm \ + --dataset.repo_id=${HF_USER}/so101_dataset_test \ + --dataset.num_episodes=30 \ + --dataset.single_task="put the red brick in a bowl" \ + --dataset.streaming_encoding=true \ + --display_data=true +``` + +While collecting the dataset, you can control the recording flow +with the following keyboard shortcuts: + +- Press **Right Arrow (`→`)**: Save episode and move to the next. +- Press **Left Arrow (`←`)**: Delete current episode and retry. +- Press **Escape (`ESC`)**: Stop, encode videos, and upload. + +### Training + +Depending on your hardware, training the policy might take a few hours. This is how you train a simple `ACT` policy: + +```bash +lerobot-train \ + --dataset.repo_id=${HF_USER}/so101_dataset_test \ + --policy.type=act \ + --output_dir=outputs/train/act_so101_test \ + --job_name=act_so101_test \ + --policy.device=cuda \ + --wandb.enable=true \ + --policy.repo_id=${HF_USER}/policy_test \ + --steps=20000 +``` + +- Policy Types: `act`, `diffusion`, `smolvla`, `pi05` +- Devices: `cuda` (NVIDIA), `mps` (Apple Silicon), `cpu` + +If you want to fine-tune a specific model, provide the path to the model. In this case `--policy.path` is enough and `--policy.type` can be skipped. + +```bash +lerobot-train \ + --dataset.repo_id=${HF_USER}/so101_dataset_test \ + --policy.path=username/the_policy_to_finetune \ + --policy.device=cuda \ + --policy.repo_id=${HF_USER}/policy_test \ + --output_dir=outputs/train/act_so101_test \ + --steps=20000 +``` + +### Inference + +Inference means running the trained policy/model on a robot. For that we use `lerobot-rollout`. You will need to provide a path to your policy. It can be a local path or a Hugging Face Hub repo ID, for example "lerobot/folding_latest". Your camera configuration needs to match what was used when collecting the dataset. Duration is in seconds; if unspecified, the rollout runs forever. 
+ +> [!TIP] +> If you are using the previous release (v0.5.1), use `lerobot-record` instead of `lerobot-rollout`. More information [here](https://huggingface.co/docs/lerobot/v0.5.1/en/il_robots#run-inference-and-evaluate-your-policy). + +```bash +lerobot-rollout \ + --strategy.type=base \ + --policy.path=${HF_USER}/my_policy \ + --robot.type=so101_follower \ + --robot.port=/dev/ttyACM1 \ + --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video1, width: 640, height: 480, fps: 30}, side: {type: opencv, index_or_path: /dev/video5, width: 640, height: 480, fps: 30}}" \ + --task="Put lego brick into the transparent box" \ + --duration=60 +``` diff --git a/docs/source/dataset_subtask.mdx b/docs/source/dataset_subtask.mdx deleted file mode 100644 index 6264aca22..000000000 --- a/docs/source/dataset_subtask.mdx +++ /dev/null @@ -1,277 +0,0 @@ -# Using Subtasks in LeRobot Datasets - -Subtask support in robotics datasets has proven effective in improving robot reasoning and understanding. Subtasks are particularly useful for: - -- **Hierarchical policies**: Building policies that include subtask predictions to visualize robot reasoning in real time -- **Reward modeling**: Helping reward models understand task progression (e.g., SARM-style stage-aware reward models) -- **Task decomposition**: Breaking down complex manipulation tasks into atomic, interpretable steps - -LeRobotDataset now supports subtasks as part of its dataset structure, alongside tasks. - -## What are Subtasks? - -While a **task** describes the overall goal (e.g., "Pick up the apple and place it in the basket"), **subtasks** break down the execution into finer-grained steps: - -1. "Approach the apple" -2. "Grasp the apple" -3. "Lift the apple" -4. "Move to basket" -5. "Release the apple" - -Each frame in the dataset can be annotated with its corresponding subtask, enabling models to learn and predict these intermediate stages. 
- -An overview of subtask annotation showing how frames are labeled with intermediate subtask stages - -

- Figure: Overview of subtask annotation. -

- -**Reference:** _Subtask-learning based for robot self-assembly in flexible collaborative assembly in manufacturing_, Original Article, Published: 19 April 2022. - -## Dataset Structure - -Subtask information is stored in the dataset metadata: - -``` -my-dataset/ -├── data/ -│ └── ... -├── meta/ -│ ├── info.json -│ ├── stats.json -│ ├── tasks.parquet -│ ├── subtasks.parquet # Subtask index → subtask string mapping -│ └── episodes/ -│ └── ... -└── videos/ - └── ... -``` - -### Subtasks Parquet File - -The `meta/subtasks.parquet` file maps subtask indices to their natural language descriptions: - -| subtask_index | subtask (index column) | -| ------------- | ---------------------- | -| 0 | "Approach the apple" | -| 1 | "Grasp the apple" | -| 2 | "Lift the apple" | -| ... | ... | - -### Frame-Level Annotations - -Each frame in the dataset can include a `subtask_index` field that references the subtasks parquet file: - -```python -# Example frame data in the parquet file -{ - "index": 42, - "timestamp": 1.4, - "episode_index": 0, - "task_index": 0, - "subtask_index": 2, # References "Lift the apple" - "observation.state": [...], - "action": [...], -} -``` - -## Annotating Datasets with Subtasks - -We provide a HuggingFace Space for easily annotating any LeRobotDataset with subtasks: - -**[https://huggingface.co/spaces/lerobot/annotate](https://huggingface.co/spaces/lerobot/annotate)** - -After completing your annotation: - -1. Click "Push to Hub" to upload your annotated dataset -2. 
You can also run the annotation space locally by following the instructions at [github.com/huggingface/lerobot-annotate](https://github.com/huggingface/lerobot-annotate) - -## Loading Datasets with Subtasks - -When you load a dataset with subtask annotations, the subtask information is automatically available: - -```python -from lerobot.datasets import LeRobotDataset - -# Load a dataset with subtask annotations -dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated") - -# Access a sample -sample = dataset[100] - -# The sample includes both task and subtask information -print(sample["task"]) # "Collect the fruit" -print(sample["subtask"]) # "Grasp the apple" -print(sample["task_index"]) # tensor(0) -print(sample["subtask_index"]) # tensor(2) -``` - -### Checking for Subtask Support - -You can check if a dataset has subtask annotations: - -```python -# Check if subtasks are available -has_subtasks = ( - "subtask_index" in dataset.features - and dataset.meta.subtasks is not None -) - -if has_subtasks: - print(f"Dataset has {len(dataset.meta.subtasks)} unique subtasks") - print("Subtasks:", list(dataset.meta.subtasks.index)) -``` - -## Using Subtasks for Training - -### With the Tokenizer Processor - -The `TokenizerProcessor` automatically handles subtask tokenization for Vision-Language Action (VLA) models: - -```python -from lerobot.processor import TokenizerProcessorStep - -# Create a tokenizer processor step -tokenizer_processor = TokenizerProcessorStep( - tokenizer_name_or_path="google/paligemma-3b-pt-224", - padding="max_length", - max_length=64, -) - -# The processor will automatically tokenize subtasks if present in the batch -# and add them to the observation under: -# - "observation.subtask.tokens" -# - "observation.subtask.attention_mask" -``` - -When subtasks are available in the batch, the tokenizer processor adds: - -- `observation.subtask.tokens`: Tokenized subtask text -- `observation.subtask.attention_mask`: Attention mask for the subtask 
tokens - -### DataLoader with Subtasks - -```python -import torch -from lerobot.datasets import LeRobotDataset - -dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated") - -dataloader = torch.utils.data.DataLoader( - dataset, - batch_size=16, - shuffle=True, -) - -for batch in dataloader: - # Access subtask information in the batch - subtasks = batch["subtask"] # List of subtask strings - subtask_indices = batch["subtask_index"] # Tensor of subtask indices - - # Use for training hierarchical policies or reward models - print(f"Batch subtasks: {set(subtasks)}") -``` - -## Example Datasets with Subtask Annotations - -Try loading a dataset with subtask annotations: - -```python -from lerobot.datasets import LeRobotDataset - -# Example dataset with subtask annotations -dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated") - -# Explore the subtasks -print("Available subtasks:") -for subtask_name in dataset.meta.subtasks.index: - print(f" - {subtask_name}") - -# Get subtask distribution -subtask_counts = {} -for i in range(len(dataset)): - sample = dataset[i] - subtask = sample["subtask"] - subtask_counts[subtask] = subtask_counts.get(subtask, 0) + 1 - -print("\nSubtask distribution:") -for subtask, count in sorted(subtask_counts.items(), key=lambda x: -x[1]): - print(f" {subtask}: {count} frames") -``` - -## Use Cases - -### 1. Hierarchical Policy Training - -Train policies that predict both actions and current subtask: - -```python -class HierarchicalPolicy(nn.Module): - def __init__(self, num_subtasks): - super().__init__() - self.action_head = nn.Linear(hidden_dim, action_dim) - self.subtask_head = nn.Linear(hidden_dim, num_subtasks) - - def forward(self, observations): - features = self.encoder(observations) - actions = self.action_head(features) - subtask_logits = self.subtask_head(features) - return actions, subtask_logits -``` - -### 2. 
Stage-Aware Reward Modeling (SARM) - -Build reward models that understand task progression: - -```python -# SARM predicts: -# - Stage: Which subtask is being executed (discrete) -# - Progress: How far along the subtask (continuous 0-1) - -class SARMRewardModel(nn.Module): - def forward(self, observations): - features = self.encoder(observations) - stage_logits = self.stage_classifier(features) - progress = self.progress_regressor(features) - return stage_logits, progress -``` - -### 3. Progress Visualization - -Monitor robot execution by tracking subtask progression: - -```python -def visualize_execution(model, observations): - for t, obs in enumerate(observations): - action, subtask_logits = model(obs) - predicted_subtask = subtask_names[subtask_logits.argmax()] - print(f"t={t}: Executing '{predicted_subtask}'") -``` - -## API Reference - -### LeRobotDataset Properties - -| Property | Type | Description | -| --------------------------- | ---------------------- | ------------------------------------------ | -| `meta.subtasks` | `pd.DataFrame \| None` | DataFrame mapping subtask names to indices | -| `features["subtask_index"]` | `dict` | Feature spec for subtask_index if present | - -### Sample Keys - -When subtasks are available, each sample includes: - -| Key | Type | Description | -| --------------- | -------------- | ------------------------------------ | -| `subtask_index` | `torch.Tensor` | Integer index of the current subtask | -| `subtask` | `str` | Natural language subtask description | - -## Related Resources - -- [SARM Paper](https://arxiv.org/pdf/2509.25358) - Stage-Aware Reward Modeling for Long Horizon Robot Manipulation -- [LeRobot Annotate Space](https://huggingface.co/spaces/lerobot/annotate) - Interactive annotation tool -- [LeRobotDataset v3.0](./lerobot-dataset-v3) - Dataset format documentation diff --git a/docs/source/earthrover_mini_plus.mdx b/docs/source/earthrover_mini_plus.mdx index a87bd325b..508c0e3a9 100644 --- 
a/docs/source/earthrover_mini_plus.mdx +++ b/docs/source/earthrover_mini_plus.mdx @@ -194,7 +194,7 @@ lerobot-record \ --dataset.single_task="Navigate around obstacles" \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --display_data=true ``` diff --git a/docs/source/groot.mdx b/docs/source/groot.mdx index 69f114ca6..a10b5e369 100644 --- a/docs/source/groot.mdx +++ b/docs/source/groot.mdx @@ -124,7 +124,7 @@ lerobot-rollout\ --dataset.single_task="Grab and handover the red cube to the other arm" \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --policy.path=/groot-bimanual \ # your trained model --duration=600 ``` diff --git a/docs/source/hope_jr.mdx b/docs/source/hope_jr.mdx index 8826d9758..1f3b08fd7 100644 --- a/docs/source/hope_jr.mdx +++ b/docs/source/hope_jr.mdx @@ -232,7 +232,7 @@ lerobot-record \ --dataset.private=true \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --display_data=true ``` @@ -278,6 +278,6 @@ lerobot-record \ --dataset.num_episodes=10 \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --policy.path=outputs/train/hopejr_hand/checkpoints/last/pretrained_model ``` diff --git a/docs/source/il_robots.mdx b/docs/source/il_robots.mdx index c5ed5be5b..dc2e02737 100644 --- a/docs/source/il_robots.mdx +++ b/docs/source/il_robots.mdx @@ -207,7 +207,7 @@ lerobot-record \ --dataset.num_episodes=5 \ --dataset.single_task="Grab the black cube" \ --dataset.streaming_encoding=true \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --dataset.encoder_threads=2 ``` diff --git a/docs/source/language_and_recipes.mdx b/docs/source/language_and_recipes.mdx new file mode 100644 index 
000000000..4181dbe34 --- /dev/null +++ b/docs/source/language_and_recipes.mdx @@ -0,0 +1,147 @@ +# Language columns and recipes + +Most LeRobot datasets ship with a single `task` string per episode — fine for +short, single-instruction skills, but not enough for the longer-horizon, +multi-modal robot policies the field is moving toward (high-level planning, +memory, interjections, VQA, tool use). To support those policies without +forking the dataset format, LeRobot extends `LeRobotDataset` with two optional +language columns and a small recipe layer that turns those rows into +chat-style training samples on the fly. + +The design splits cleanly into three layers: + +1. **Data in the dataset** — language annotations stored next to frames in + `data/chunk-*/file-*.parquet` as two optional columns (`language_persistent` + and `language_events`). Datasets without these columns keep their existing + behavior. +2. **Recipe** — a YAML file that declares which annotation rows to bind and + how to lay them out as chat turns (`role`, `content`, optional images, + optional tool calls). Recipes are pure config; no Python required to add a + new one. +3. **Training format** — at sample time, `RenderMessagesStep` resolves the + recipe against the per-frame annotations and emits HF-style `messages` plus + LeRobot-specific sidecars (`message_streams`, `target_message_indices`) + that policy processors consume. + +This page describes each layer in turn. + +## Layer 1 — language columns in the dataset + +The two optional columns live next to frame data in +`data/chunk-*/file-*.parquet`: + +- `language_persistent`: a list of rows broadcast across every frame in an episode for state that remains active, such as `subtask`, `plan`, and `memory`. +- `language_events`: a list of rows only on the exact frame where an event was emitted, such as `interjection`, `vqa`, and speech tool calls. 
+ +Both columns share the same row shape (event rows omit `timestamp` because the +frame the row sits on already provides it): + +```text +role: string +content: string | null +style: string | null +timestamp: float32 # persistent rows only +camera: string | null # observation.images.* feature key, view-dependent rows only +tool_calls: list[Json] | null +``` + +The `camera` field tags rows whose `content` is grounded in a specific camera +view. Rows of view-dependent styles (`vqa` and `trace`) MUST set `camera` to +the matching `observation.images.*` feature key. Rows of every other style — +including `motion`, which describes robot-frame primitives in joint / Cartesian +terms — MUST leave `camera` as `null`. Pipeline writers and the validator +enforce this via `validate_camera_field(style, camera)`. + +`meta/tasks.parquet` remains the canonical source for the task. The special `${task}` recipe binding always reads that task string and does not depend on language annotations. + +### Architecture + +The language stack itself has three internal modules backing layer 1: + +1. `lerobot.datasets.language` defines the schema, style registry, and `column_for_style`. +2. `lerobot.datasets.language_render` resolves rows and renders messages. +3. `RenderMessagesStep` turns dataset samples into `messages`, `message_streams`, and `target_message_indices`. + +`LeRobotDataset` stays recipe-agnostic. It passes `language_persistent` and `language_events` through when present, and unannotated datasets keep their existing behavior. + +## Layer 2 — recipe anatomy + +Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`. They +declare which annotation rows to pull (via `bindings`) and how to compose them +into chat turns (`messages`). + +```yaml +messages: + - { role: user, content: "${task}", stream: high_level } + - { role: assistant, content: "${subtask}", stream: low_level, target: true } +``` + +A recipe can also branch into a weighted **blend** of sub-recipes. 
At sample +time, exactly one branch is selected deterministically from the sample index, +so different frames train different objectives (e.g. memory updates vs. +low-level execution vs. VQA) without any Python wiring. + +### Temporal semantics + +Persistent styles are active after emission until replaced: + +- `active_at(t, style=subtask)` +- `nth_prev(style=memory, offset=1)` +- `nth_next(style=subtask, offset=1)` + +Event styles only exist on their exact timestamp: + +- `emitted_at(t, style=interjection)` +- `emitted_at(t, style=vqa, role=user, camera=observation.images.top)` +- `emitted_at(t, role=assistant, tool_name=say)` + +Exact event matching has no tolerance window, so writers must stamp event rows with frame timestamps from the parquet data. + +### View-dependent resolution + +For view-dependent styles (`vqa` and `trace`), the resolver gains a +`camera=` filter parallel to `role=` and `tool_name=`. Datasets with multiple +cameras typically emit one (`vqa`, `user`) + (`vqa`, `assistant`) pair per +camera at the same timestamp; without `camera=`, those resolvers see two +matches and raise an ambiguity error. Recipes consume each camera through its +own binding plus a matching image block, e.g. + +```yaml +ask_vqa_top: + bindings: + vqa_query: "emitted_at(t, style=vqa, role=user, camera=observation.images.top)" + vqa: "emitted_at(t, style=vqa, role=assistant, camera=observation.images.top)" + messages: + - role: user + stream: high_level + if_present: vqa_query + content: + - { type: image, feature: observation.images.top } + - { type: text, text: "${vqa_query}" } + - { + role: assistant, + content: "${vqa}", + stream: high_level, + target: true, + if_present: vqa, + } +``` + +Add one such sub-recipe per camera the dataset records. 
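
The difference between persistent lookups, exact event lookups, and the `camera=` filter described above can be sketched in plain Python. This is only an illustration, not the `lerobot.datasets.language_render` implementation: the `Row` dataclass and the `active_at(rows, t, style)` / `emitted_at(rows, t, style, ...)` signatures are hypothetical simplifications, event rows carry an explicit timestamp here for convenience, and treating a persistent row as active from its own timestamp onward is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    # Hypothetical, simplified stand-in for one language annotation row.
    role: str
    content: str
    style: str
    timestamp: float
    camera: Optional[str] = None


def active_at(rows, t, style):
    """Persistent lookup: the latest row of `style` emitted at or before t."""
    candidates = [r for r in rows if r.style == style and r.timestamp <= t]
    return max(candidates, key=lambda r: r.timestamp) if candidates else None


def emitted_at(rows, t, style, role=None, camera=None):
    """Event lookup: exact timestamp match, with no tolerance window."""
    matches = [
        r for r in rows
        if r.style == style
        and r.timestamp == t
        and (role is None or r.role == role)
        and (camera is None or r.camera == camera)
    ]
    if len(matches) > 1:
        # Mirrors the ambiguity error described for multi-camera VQA rows.
        raise ValueError(f"ambiguous: {len(matches)} rows match style={style!r}")
    return matches[0] if matches else None


persistent = [
    Row("assistant", "approach the cube", "subtask", 0.0),
    Row("assistant", "grasp the cube", "subtask", 2.0),
]
events = [
    Row("user", "is the gripper open?", "vqa", 1.0, camera="observation.images.top"),
    Row("user", "is the gripper open?", "vqa", 1.0, camera="observation.images.wrist"),
]

# The first subtask stays active until the second one replaces it at t=2.0.
print(active_at(persistent, 1.5, "subtask").content)
# Without camera=, the two VQA rows at t=1.0 would be ambiguous.
print(emitted_at(events, 1.0, "vqa", role="user", camera="observation.images.top").camera)
```

Raising on more than one match, rather than picking the first row, keeps a recipe from silently pairing a question with the wrong camera image.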
+ +## Layer 3 — training format + +Rendered samples use HF-style chat messages plus LeRobot sidecars: + +```python +sample["messages"] +sample["message_streams"] +sample["target_message_indices"] +``` + +The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone, which keeps the same dataset usable across SmolVLA, Pi0.5, and any future VLM that expects OpenAI-style chat messages. + +## Graceful absence + +If both language columns are missing, `None`, or empty, `RenderMessagesStep` is a no-op. +If an event-scoped branch is selected on a frame without the required event row, rendering returns `None`, allowing a loader to retry another sample. diff --git a/docs/source/lerobot-dataset-v3.mdx b/docs/source/lerobot-dataset-v3.mdx index 8ab4a5d40..c23677d8c 100644 --- a/docs/source/lerobot-dataset-v3.mdx +++ b/docs/source/lerobot-dataset-v3.mdx @@ -10,6 +10,7 @@ This docs will guide you to: - Stream datasets without downloading using `StreamingLeRobotDataset` - Apply image transforms for data augmentation during training - Migrate existing `v2.1` datasets to `v3.0` +- Experiment with other `LeRobotDataset` formats and implementations like Lance ## What’s new in `v3` @@ -43,7 +44,7 @@ lerobot-record \ --dataset.num_episodes=5 \ --dataset.single_task="Grab the black cube" \ --dataset.streaming_encoding=true \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --dataset.encoder_threads=2 ``` @@ -315,3 +316,39 @@ Dataset v3.0 uses incremental parquet writing with buffered metadata for efficie - Ensures the dataset is valid for loading Without calling `finalize()`, your parquet files will be incomplete and the dataset won't load properly. + +## Other formats and implementations + +### Lance + +Lance is a useful format for multimodal AI datasets, especially for large-scale training requiring high performance IO and random access. 
+ +The `lerobot-lancedb` package implements `LeRobotLanceDataset` (for JPEG images) and `LeRobotLanceVideoDataset` (for mp4 videos). +Both classes subclass `LeRobotDataset` and can provide data-loading speedups. + +`LeRobotLanceDataset` is a drop-in replacement for `LeRobotDataset`: + +```python +from lerobot.datasets import LeRobotDatasetMetadata +from lerobot.policies.diffusion.configuration_diffusion import DiffusionConfig +from lerobot_lancedb import LeRobotLanceDataset, LeRobotLanceVideoDataset + +cfg = DiffusionConfig(...) +meta = LeRobotDatasetMetadata(root=local_dataset_path) # or use repo_id=... to load metadata from the Hub +delta_timestamps = {...} + +# Use LeRobotLanceDataset for image datasets +dataset = LeRobotLanceDataset( + root=local_dataset_path, # or use repo_id=... to stream from the Hub + delta_timestamps=delta_timestamps, + return_uint8=True, +) +# Or use LeRobotLanceVideoDataset for video datasets: +dataset = LeRobotLanceVideoDataset( + root=local_dataset_path, # or use repo_id=... to stream from the Hub + delta_timestamps=delta_timestamps, + return_uint8=True, +) +``` + +Join the discussion on [GitHub](https://github.com/huggingface/lerobot/issues/3608) and explore the `lerobot-lancedb` documentation [here](https://lancedb.github.io/lerobot-lancedb/). 
diff --git a/docs/source/reachy2.mdx b/docs/source/reachy2.mdx index 1b868711a..4b08569db 100644 --- a/docs/source/reachy2.mdx +++ b/docs/source/reachy2.mdx @@ -161,7 +161,7 @@ lerobot-record \ --dataset.private=true \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --display_data=true ``` @@ -203,7 +203,7 @@ lerobot-record \ --dataset.private=true \ --dataset.streaming_encoding=true \ --dataset.encoder_threads=2 \ - # --dataset.vcodec=auto \ + # --dataset.camera_encoder.vcodec=auto \ --display_data=true ``` diff --git a/docs/source/rebot_b601.mdx b/docs/source/rebot_b601.mdx new file mode 100644 index 000000000..adb751560 --- /dev/null +++ b/docs/source/rebot_b601.mdx @@ -0,0 +1,186 @@ +# reBot B601-DM + +[reBot B601-DM](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/) is an open-source, low-cost robot arm from Seeed Studio for embodied-AI and imitation learning. It comes as a **follower** arm (the `B601-DM`, a 6-DOF arm plus gripper driven by Damiao CAN motors) and a **leader** arm (the `StarArm102` / `reBot Arm 102`, driven by FashionStar UART smart servos) used to teleoperate it. + +This page covers **calibration** and **teleoperation** for both single-arm and bimanual (dual-arm) setups. + +
+ reBot B601-DM follower arm at its zero position + reBot Arm 102 leader arm at its zero position +
+ +_Left: the B601-DM follower at its zero position. Right: the reBot Arm 102 leader at its zero position. Images courtesy of [Seeed Studio](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/)._ + +## Install LeRobot 🤗 + +Follow our [Installation Guide](./installation), then install the reBot support: + +```bash +pip install -e ".[rebot]" +``` + +This pulls in `motorbridge` (CAN motor control for the B601-DM follower) and `motorbridge-smart-servo` (FashionStar UART servos for the reBot Arm 102 leader). + +## Registered device types + +| Type | Kind | +| ------------------------ | -------------------------------------------- | +| `rebot_b601_follower` | single-arm B601-DM follower robot | +| `bi_rebot_b601_follower` | bimanual (dual-arm) follower robot | +| `rebot_102_leader` | single-arm reBot Arm 102 leader teleoperator | +| `bi_rebot_102_leader` | bimanual (dual-arm) leader teleoperator | + +The bimanual types compose two single-arm instances and namespace each arm's +observation/action keys with a `left_` / `right_` prefix. Per-arm settings are +passed through nested `left_arm_config.*` / `right_arm_config.*` arguments. + +## Find the USB ports + +For each device, find the USB port associated with its motor bus using: + +```bash +lerobot-find-port +``` + + + On Linux, remove `brltty` (`sudo apt remove brltty`) so it does not hold the + leader's USB serial port. You may also need to grant access to the serial + devices: `sudo chmod 666 /dev/ttyACM* /dev/ttyUSB*`. + + +## Calibration + +Neither arm stores a persistent hardware calibration: every time it connects, the motors are re-zeroed against the pose the arm is physically holding. Calibration simply records that zero pose. When prompted, **manually move the arm to its zero position** (the default sit-down pose shown above, gripper fully closed) and press ENTER. 
+ +### Follower (B601-DM) + + + + +```bash +lerobot-calibrate \ + --robot.type=rebot_b601_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.id=follower \ + --robot.can_adapter=damiao +``` + + + + +Connect the bimanual follower; calibration runs for the left arm, then the right arm. + +```bash +lerobot-calibrate \ + --robot.type=bi_rebot_b601_follower \ + --robot.id=bi_follower \ + --robot.left_arm_config.port=/dev/ttyACM0 \ + --robot.left_arm_config.can_adapter=damiao \ + --robot.right_arm_config.port=/dev/ttyACM1 \ + --robot.right_arm_config.can_adapter=damiao +``` + +Per-arm calibration files are saved with `_left` / `_right` suffixes on the id. + + + + +### Leader (reBot Arm 102) + + + + +```bash +lerobot-calibrate \ + --teleop.type=rebot_102_leader \ + --teleop.port=/dev/ttyUSB0 \ + --teleop.id=leader +``` + + + + +```bash +lerobot-calibrate \ + --teleop.type=bi_rebot_102_leader \ + --teleop.id=bi_leader \ + --teleop.left_arm_config.port=/dev/ttyUSB0 \ + --teleop.right_arm_config.port=/dev/ttyUSB1 +``` + + + + +## Teleoperation + +Once both arms are calibrated, drive the follower with the leader. The follower talks to its CAN bus through a Damiao serial bridge (`can_adapter=damiao`, the default) or a SocketCAN adapter (`can_adapter=socketcan`). See the [OpenArm page](./openarm) for more details on the SocketCAN adapter configuration. + + + + +```bash +lerobot-teleoperate \ + --robot.type=rebot_b601_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.id=follower \ + --robot.can_adapter=damiao \ + --teleop.type=rebot_102_leader \ + --teleop.port=/dev/ttyUSB0 \ + --teleop.id=leader +``` + + + + +The bimanual leader and follower reuse the single-arm classes; each arm is +configured through nested `left_arm_config.*` / `right_arm_config.*` arguments, +so a bimanual reBot Arm 102 leader drives a bimanual B601-DM follower. 
+ +```bash +lerobot-teleoperate \ + --robot.type=bi_rebot_b601_follower \ + --robot.id=bi_follower \ + --robot.left_arm_config.port=/dev/ttyACM0 \ + --robot.left_arm_config.can_adapter=damiao \ + --robot.right_arm_config.port=/dev/ttyACM1 \ + --robot.right_arm_config.can_adapter=damiao \ + --teleop.type=bi_rebot_102_leader \ + --teleop.id=bi_leader \ + --teleop.left_arm_config.port=/dev/ttyUSB0 \ + --teleop.right_arm_config.port=/dev/ttyUSB1 +``` + + + + + + The leader and follower share the same joint names (`shoulder_pan, + shoulder_lift, elbow_flex, wrist_flex, wrist_yaw, wrist_roll, gripper`), so + leader actions map directly onto the follower. + + +If the motion of a joint is reversed, flip its sign in the leader's `joint_directions` (the gripper also carries a scale to widen its range to the follower): + +```bash +lerobot-teleoperate \ + --robot.type=rebot_b601_follower \ + --robot.port=/dev/ttyACM0 \ + --robot.can_adapter=damiao \ + --teleop.type=rebot_102_leader \ + --teleop.port=/dev/ttyUSB0 \ + --teleop.joint_directions='{"shoulder_pan":-1,"shoulder_lift":-1,"elbow_flex":1,"wrist_flex":1,"wrist_yaw":1,"wrist_roll":-1,"gripper":-6}' +``` + +## Recording datasets + +Swap `lerobot-teleoperate` for `lerobot-record` (with the same `--robot.*` / `--teleop.*` arguments, plus `--dataset.*`) to record demonstrations for training. See [Imitation Learning for Robots](./il_robots) for the full workflow. + +For hardware assembly and wiring, see the [Seeed Studio reBot wiki](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/). 
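
As a rough mental model for the `joint_directions` mapping in the teleoperation section above, each entry can be read as a per-joint multiplier: `-1` flips the direction and a larger magnitude (the gripper's `-6`) also scales the range. The sketch below only illustrates that reading. `apply_joint_directions` is a hypothetical helper, not part of LeRobot, and plain multiplication of the leader's joint values is an assumption; the values are copied from the example command.

```python
# Per-joint multipliers, copied from the --teleop.joint_directions example.
JOINT_DIRECTIONS = {
    "shoulder_pan": -1,
    "shoulder_lift": -1,
    "elbow_flex": 1,
    "wrist_flex": 1,
    "wrist_yaw": 1,
    "wrist_roll": -1,
    "gripper": -6,
}


def apply_joint_directions(leader_positions, joint_directions):
    """Hypothetical helper: scale each leader joint value by its multiplier.

    Joints missing from `joint_directions` pass through unchanged.
    """
    return {
        joint: value * joint_directions.get(joint, 1)
        for joint, value in leader_positions.items()
    }


leader = {"shoulder_pan": 10.0, "elbow_flex": 5.0, "gripper": 2.0}
print(apply_joint_directions(leader, JOINT_DIRECTIONS))
```

Under this reading, reversing a single joint only requires changing the sign of its entry; the other joints keep their existing multipliers.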
diff --git a/docs/source/streaming_video_encoding.mdx b/docs/source/streaming_video_encoding.mdx index 40004200e..96e049eb3 100644 --- a/docs/source/streaming_video_encoding.mdx +++ b/docs/source/streaming_video_encoding.mdx @@ -17,9 +17,9 @@ This makes `save_episode()` near-instant (the video is already encoded by the ti | Parameter | CLI Flag | Type | Default | Description | | ----------------------- | --------------------------------- | ------------- | ------------- | ----------------------------------------------------------------- | | `streaming_encoding` | `--dataset.streaming_encoding` | `bool` | `True` | Enable real-time encoding during capture | -| `vcodec` | `--dataset.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder | +| `vcodec` | `--dataset.camera_encoder.vcodec` | `str` | `"libsvtav1"` | Video codec. `"auto"` detects best HW encoder | | `encoder_threads` | `--dataset.encoder_threads` | `int \| None` | `None` (auto) | Threads per encoder instance. `None` will let the vcodec decide | -| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `60` | Max buffered frames per camera (~2s at 30fps). Consumes RAM | +| `encoder_queue_maxsize` | `--dataset.encoder_queue_maxsize` | `int` | `30` | Max buffered frames per camera (~1s at 30fps). Consumes RAM | ## 3. Performance Considerations @@ -48,7 +48,7 @@ This parameter controls how many threads each encoder instance uses internally: ### Backpressure and Frame Dropping -Each camera has a bounded queue (`encoder_queue_maxsize`, default 60 frames). When the encoder can't keep up: +Each camera has a bounded queue (`encoder_queue_maxsize`, default 30 frames). When the encoder can't keep up: 1. The queue fills up (consuming RAM) 2. 
New frames are **dropped** (not blocked) — the capture loop continues uninterrupted @@ -82,15 +82,15 @@ Use HW encoding when: ### Available HW Encoders -| Encoder | Platform | Hardware | CLI Value | -| ------------------- | ------------- | ------------------------------------------------------------------------------------------------ | ------------------------------------ | -| `h264_videotoolbox` | macOS | Apple Silicon / Intel | `--dataset.vcodec=h264_videotoolbox` | -| `hevc_videotoolbox` | macOS | Apple Silicon / Intel | `--dataset.vcodec=hevc_videotoolbox` | -| `h264_nvenc` | Linux/Windows | NVIDIA GPU | `--dataset.vcodec=h264_nvenc` | -| `hevc_nvenc` | Linux/Windows | NVIDIA GPU | `--dataset.vcodec=hevc_nvenc` | -| `h264_vaapi` | Linux | Intel/AMD GPU | `--dataset.vcodec=h264_vaapi` | -| `h264_qsv` | Linux/Windows | Intel Quick Sync | `--dataset.vcodec=h264_qsv` | -| `auto` | Any | Probes the system for available HW encoders. Falls back to `libsvtav1` if no HW encoder is found | `--dataset.vcodec=auto` | +| Encoder | Platform | Hardware | CLI Value | +| ------------------- | ------------- | ------------------------------------------------------------------------------------------------ | --------------------------------------------------- | +| `h264_videotoolbox` | macOS | Apple Silicon / Intel | `--dataset.camera_encoder.vcodec=h264_videotoolbox` | +| `hevc_videotoolbox` | macOS | Apple Silicon / Intel | `--dataset.camera_encoder.vcodec=hevc_videotoolbox` | +| `h264_nvenc` | Linux/Windows | NVIDIA GPU | `--dataset.camera_encoder.vcodec=h264_nvenc` | +| `hevc_nvenc` | Linux/Windows | NVIDIA GPU | `--dataset.camera_encoder.vcodec=hevc_nvenc` | +| `h264_vaapi` | Linux | Intel/AMD GPU | `--dataset.camera_encoder.vcodec=h264_vaapi` | +| `h264_qsv` | Linux/Windows | Intel Quick Sync | `--dataset.camera_encoder.vcodec=h264_qsv` | +| `auto` | Any | Probes the system for available HW encoders. 
Falls back to `libsvtav1` if no HW encoder is found | `--dataset.camera_encoder.vcodec=auto` | > [!NOTE] > In order to use the HW accelerated encoders you might need to upgrade your GPU drivers. @@ -100,15 +100,15 @@ Use HW encoding when: ## 5. Troubleshooting -| Symptom | Likely Cause | Fix | -| ------------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| System freezes or choppy robot movement or Rerun visualization lag | CPU starved (100% load usage) | Close other apps, reduce encoding throughput, lower `encoder_threads`, use `h264`, use `display_data=False`. If the CPU continues to be at 100% then it might be insufficient for your setup, consider `--dataset.streaming_encoding=false` or HW encoding (`--dataset.vcodec=auto`) | -| "Encoder queue full" warnings or dropped frames in dataset | Encoder can't keep up (Queue overflow) | If CPU is not at 100%: Increase `encoder_threads`, increase `encoder_queue_maxsize` or use HW encoding (`--dataset.vcodec=auto`). | -| High RAM usage | Queue filling faster than encoding | `encoder_threads` too low or CPU insufficient. Reduce `encoder_queue_maxsize` or use HW encoding | -| Large video files | Using HW encoder or H.264 | Expected trade-off. Switch to `libsvtav1` if CPU allows | -| `save_episode()` still slow | `streaming_encoding` is `False` | Set `--dataset.streaming_encoding=true` | -| Encoder thread crash | Codec not available or invalid settings | Check `vcodec` is installed, try `--dataset.vcodec=auto` | -| Recorded dataset is missing frames | CPU/GPU starvation or occasional load spikes | If ~5% of frames are missing, your system is likely overloaded — follow the recommendations above. 
If fewer frames are missing (~2%), they are probably due to occasional transient load spikes (often at startup) and can be considered expected. | +| Symptom | Likely Cause | Fix | +| ------------------------------------------------------------------ | -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| System freezes or choppy robot movement or Rerun visualization lag | CPU starved (100% load usage) | Close other apps, reduce encoding throughput, lower `encoder_threads`, use `h264`, use `display_data=False`. If the CPU continues to be at 100% then it might be insufficient for your setup, consider `--dataset.streaming_encoding=false` or HW encoding (`--dataset.camera_encoder.vcodec=auto`) | +| "Encoder queue full" warnings or dropped frames in dataset | Encoder can't keep up (Queue overflow) | If CPU is not at 100%: Increase `encoder_threads`, increase `encoder_queue_maxsize` or use HW encoding (`--dataset.camera_encoder.vcodec=auto`). | +| High RAM usage | Queue filling faster than encoding | `encoder_threads` too low or CPU insufficient. Reduce `encoder_queue_maxsize` or use HW encoding | +| Large video files | Using HW encoder or H.264 | Expected trade-off. Switch to `libsvtav1` if CPU allows | +| `save_episode()` still slow | `streaming_encoding` is `False` | Set `--dataset.streaming_encoding=true` | +| Encoder thread crash | Codec not available or invalid settings | Check `vcodec` is installed, try `--dataset.camera_encoder.vcodec=auto` | +| Recorded dataset is missing frames | CPU/GPU starvation or occasional load spikes | If ~5% of frames are missing, your system is likely overloaded — follow the recommendations above. 
If fewer frames are missing (~2%), they are probably due to occasional transient load spikes (often at startup) and can be considered expected. | ## 6. Recommended Configurations @@ -146,7 +146,7 @@ On very constrained systems, streaming encoding may compete too heavily with the # 2camsx 640x480x3 @30fps: Requires some tuning. # Use H.264, disable streaming, consider batching encoding -lerobot-record --dataset.vcodec=h264 --dataset.streaming_encoding=false ... +lerobot-record --dataset.camera_encoder.vcodec=h264 --dataset.streaming_encoding=false ... ``` ## 7. Closing note diff --git a/docs/source/tools.mdx b/docs/source/tools.mdx new file mode 100644 index 000000000..d88881184 --- /dev/null +++ b/docs/source/tools.mdx @@ -0,0 +1,210 @@ +# Tools + +LeRobot v3.1 supports **tool calls** in policies — assistant messages can +emit structured invocations like `say(text="OK, starting now")` that the +runtime dispatches to a real implementation (TTS, controller, logger, …). + +This page covers: + +1. Where the tool catalog lives. +2. How the annotation pipeline produces tool-call atoms. +3. How to add your own tool. + +## Where tools are declared + +Two layers. + +**The catalog** — a list of OpenAI-style function schemas — lives at +`meta/info.json["tools"]` on each dataset. Example: + +```json +{ + "features": { "...": "..." }, + "tools": [ + { + "type": "function", + "function": { + "name": "say", + "description": "Speak a short utterance to the user via the TTS executor.", + "parameters": { + "type": "object", + "properties": { + "text": { + "type": "string", + "description": "The verbatim text to speak." 
+ } + }, + "required": ["text"] + } + } + } + ] +} +``` + +Read it via the dataset metadata accessor: + +```python +from lerobot.datasets.dataset_metadata import LeRobotDatasetMetadata + +meta = LeRobotDatasetMetadata(repo_id="pepijn/super_poulain_final_annotations") +tools = meta.tools # list[dict] — OpenAI tool schemas +``` + +If the dataset's `info.json` doesn't declare any tools, `meta.tools` +returns `DEFAULT_TOOLS` from `lerobot.datasets.language` — currently a +single-entry list with the canonical `say` schema. So unannotated +datasets and chat-template consumers keep working without any +configuration: + +```python +prompt_str = tokenizer.apply_chat_template( + sample["messages"], + tools=meta.tools, # works either way + add_generation_prompt=False, + tokenize=False, +) +``` + +**The implementations** — runnable Python — will live under +`src/lerobot/tools/`, one file per tool. The runtime dispatcher and +the canonical `say` implementation (wrapping Kyutai's pocket-tts) are +not part of the catalog layer described here; today this layer ships +only the schema storage and the `DEFAULT_TOOLS` fallback constant. + +## Per-row tool _invocations_ + +The catalog above describes _what can be called_. The actual _call_ — the +function name plus the argument values — is stored per-row, on the +assistant atoms in `language_events`: + +```python +{ + "role": "assistant", + "content": null, + "style": null, + "timestamp": 12.4, + "camera": null, + "tool_calls": [ + { "type": "function", + "function": { "name": "say", "arguments": { "text": "On it." 
} } } + ] +} +``` + +Recipes splice these into rendered messages via `tool_calls_from`: + +```yaml +user_interjection_response: + bindings: + speech: "emitted_at(t, role=assistant, tool_name=say)" + messages: + - { role: user, content: "${task}", stream: high_level } + - { + role: assistant, + content: "${current_plan}", + stream: high_level, + target: true, + tool_calls_from: speech, + } +``` + +The model's training target is one assistant turn that carries both the +plan text _and_ the `say` tool call. At inference, the runtime parses +the generated text back into structured `tool_calls` and dispatches to +the matching implementation. + +## How to add your own tool + +> **Note:** Steps 2 and 3 below describe the runtime layer +> (`src/lerobot/tools/`, the `Tool` protocol, `TOOL_REGISTRY`, +> `get_tools(meta)`) which is not part of the catalog layer shipped +> today — those modules don't yet exist in the tree. Step 1 alone is +> enough to make the tool visible to the chat template via +> `meta.tools` so the model can learn to _generate_ the call; +> executing the call at inference requires the runtime layer. + +Three steps. Concrete example: a `record_observation` tool the policy +can call to capture an extra observation outside the regular control +loop. + +### Step 1 — declare the schema + +Add an entry under `meta/info.json["tools"]`. Either edit the file +directly on disk _before_ running the annotation pipeline (it'll be +preserved) or hand it to `lerobot-annotate` via a config flag. + +```json +{ + "tools": [ + { "type": "function", "function": { "name": "say", "...": "..." } }, + { + "type": "function", + "function": { + "name": "record_observation", + "description": "Capture a high-resolution still image for the user.", + "parameters": { + "type": "object", + "properties": { + "label": { + "type": "string", + "description": "Short label for the saved image." 
+ } + }, + "required": ["label"] + } + } + } + ] +} +``` + +The schema follows OpenAI's function-calling convention exactly, so the +chat template can render it natively. + +### Step 2 — implement the call + +Create `src/lerobot/tools/record_observation.py`: + +```python +from .base import Tool +from typing import Any + +RECORD_OBSERVATION_SCHEMA: dict[str, Any] = { "...": "..." } # mirrors the JSON above + + +class RecordObservationTool: + name = "record_observation" + schema = RECORD_OBSERVATION_SCHEMA + + def __init__(self, schema: dict | None = None, output_dir: str = "."): + self.output_dir = output_dir + + def call(self, arguments: dict) -> str: + label = arguments["label"] + # ... save the latest camera frame to /