refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch

The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.

* `executor.py`: drop `select_executor_class`, the "kind" log line, and
  the references to LocalPipelineExecutor / SlurmPipelineExecutor.
  Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
  `slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
  `episode_parallelism`. While here, prune the longer "why" docstrings
  on every field down to the load-bearing bits — full story moves to
  `docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
  `[annotations]` extra; the dep was only there for the (never used)
  cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
  flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
  - remove the flavor-namespace / sidecar paragraph (out of scope —
    "multiple revisions = multiple copies" is dataset-level policy);
  - remove the "writer drops the legacy `subtask_index` column" note
    (already covered by PR 1's intentional-break call-out);
  - remove the chat-template + `apply_chat_template(messages, tools=...)`
    line (covered by Tools doc);
  - replace the "executor picks Local vs Slurm" paragraph with
    `--executor.episode_parallelism` and a pointer to HF Jobs;
  - rewrite the style→recipe section to talk about "recipes" generically
    instead of pinning a specific YAML;
  - add a "Running on Hugging Face Jobs" section pointing at
    `examples/annotation/run_hf_job.py`;
  - add a "Running locally" example matching the CLI's docstring
    (`uv run lerobot-annotate --root=... --vlm.model_id=...`);
  - extend the paper-inspirations list with Pi0.7 and Steerable VLA
    Policies (Zhao 2025) for Module 3.

Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pepijn
2026-05-08 11:09:22 +02:00
parent 8fa8323c91
commit dad2cf1178
7 changed files with 1551 additions and 369 deletions
+54 -39
@@ -3,8 +3,7 @@
 `lerobot-annotate` populates the two language columns introduced by the
 [Language Columns and Recipes](./language_and_recipes) page —
 `language_persistent` and `language_events` — directly into
-`data/chunk-*/file-*.parquet`. There is no flavor namespace and no sidecar
-file tree: multiple revisions of a dataset mean multiple dataset copies.
+`data/chunk-*/file-*.parquet`.
 
 ## What the pipeline produces
@@ -16,18 +15,16 @@ rewrites the data shards in place:
 | `subtask` (Pi0.7-style "how, not what") | `language_persistent` | Module 1 |
 | `plan` (initial + refresh on interjection) | `language_persistent` | Module 1 |
 | `memory` (MEM-style compression) | `language_persistent` | Module 1 |
+| `task_aug` (rephrasings of canonical task) | `language_persistent` | Module 1 |
 | `interjection` | `language_events` | Module 2 |
 | speech tool-call atom (`style=null`, `say`) | `language_events` | Module 2 |
 | `vqa` (user / assistant pair) | `language_events` | Module 3 |
 
-The writer drops the legacy `subtask_index` column. It does **not** add a
-`tools` column to the parquet — the tool catalog lives at
-`meta/info.json["tools"]` instead (see [Tools](./tools)). After every
-annotation run the pipeline ensures the canonical `say` schema is
-present in that list, preserving any tools the user pre-declared. Chat-
-template consumers read the catalog through
-`LeRobotDatasetMetadata.tools` and pass it to
-`apply_chat_template(messages, tools=meta.tools, ...)`.
+The writer does **not** add a `tools` column to the parquet — the tool
+catalog lives at `meta/info.json["tools"]` instead (see
+[Tools](./tools)). After every annotation run the pipeline ensures the
+canonical `say` schema is present in that list, preserving any tools the
+user pre-declared.
 
 If you want to declare additional tools for a dataset before annotation
 runs, edit `meta/info.json["tools"]` directly — the pipeline preserves
@@ -35,17 +32,17 @@ anything already there. Implementations of those tools live under
 `src/lerobot/tools/`; one file per tool, registered via
 `TOOL_REGISTRY`. See the [Tools](./tools) doc for the authoring guide.
 
-## How to run it locally or on SLURM
+## Running locally
 
-Install the extra and invoke the console script:
+Install the extra and invoke the console script. Episode-level
+concurrency comes from `--executor.episode_parallelism` (default 16);
+that is the only knob the in-process executor exposes.
 
 ```bash
 uv sync --extra annotations
 uv run lerobot-annotate \
-  --repo_id=imstevenpmwork/super_poulain_draft \
-  --vlm.backend=vllm \
-  --vlm.model_id=Qwen/Qwen3.6-27B-FP8 \
-  --vlm.tensor_parallel_size=2
+  --root=/path/to/dataset \
+  --vlm.model_id=Qwen/Qwen2.5-VL-7B-Instruct
 ```
 
 The pipeline attaches actual camera footage to every Module 1/2/3 prompt
@@ -58,40 +55,56 @@ text-only prompts automatically.
 decomposition gets a `{"type":"video", "video":[<frames>]}` block
 covering the entire demonstration; Qwen-VL pools temporally on its own
 and decides where to cut. There is no keyframe stride or count knob —
-`--module_1.max_video_frames` (default 32) only caps the frames packed
+`--module_1.max_video_frames` (default 128) only caps the frames packed
 into the video block as a model-capacity bound. Module 2 attaches a
-single still frame at the interjection timestamp; Module 3 attaches the
-exact emission frame to each VQA pair.
+short window of frames around the interjection timestamp; Module 3
+attaches the exact emission frame to each VQA pair.
 
-The executor picks `LocalPipelineExecutor` for small datasets and
-`SlurmPipelineExecutor` for large ones based on
-`--executor.auto_threshold` (default 32 episodes). Force local with
-`--executor.force_local=true`. SLURM jobs honour `--executor.slurm_partition`,
-`--executor.slurm_gpus`, and `--executor.slurm_time`.
+## Running on Hugging Face Jobs
+
+Distributed annotation is delegated to
+[Hugging Face Jobs](https://huggingface.co/docs/hub/en/jobs). The repo
+ships a launcher script you copy and edit for your dataset:
+
+```bash
+HF_TOKEN=hf_... uv run python examples/annotation/run_hf_job.py
+```
+
+[`examples/annotation/run_hf_job.py`](https://github.com/huggingface/lerobot/blob/main/examples/annotation/run_hf_job.py)
+spawns one `h200x2` job that:
+
+1. installs the branch under test plus the annotation extras,
+2. boots two vllm servers (one per GPU) for the chosen model,
+3. runs Modules 1 / 2 / 3 across the dataset via `lerobot-annotate`,
+4. uploads the annotated dataset to `--push_to_hub`.
+
+To target a different dataset, model, or hub repo, edit the `CMD` block
+inside the script — every flag in there maps directly onto a CLI flag of
+`lerobot-annotate` (see `lerobot-annotate --help` for the full list).
 
 ## Style-to-recipe consumer mapping
 
-The pipeline produces exactly the styles consumed by
-`src/lerobot/configs/recipes/pi05_hirobot.yaml`:
+The pipeline's outputs are designed to be consumed by recipes (see
+[Language Columns and Recipes](./language_and_recipes)) — typically:
 
-- `low_level_execution`, `high_level_subtask`, `memory_update` consume
+- low-level / high-level / memory-update branches consume
   `subtask`/`plan`/`memory` from `language_persistent`.
-- `user_interjection_response` consumes `interjection` events plus the
-  paired speech atom (merged into one assistant target turn via
+- An interjection-response branch consumes `interjection` events plus
+  the paired speech atom (merged into one assistant target turn via
   `tool_calls_from`) and the same-timestamp `plan` refresh.
-- `ask_vqa` consumes the `(vqa, user)` and `(vqa, assistant)` pairs from
-  `language_events`.
+- A VQA branch consumes the `(vqa, user)` and `(vqa, assistant)` pairs
+  from `language_events`.
 
-## Why the design is scoped to the canonical recipe
+## Why the design splits state from events
 
 Two things drive the scope:
 
-1. **Persistent state vs exact-event split.** Persistent rows (`subtask`,
-   `plan`, `memory`) broadcast per episode and answer "what state is in
-   force at this frame?". Event rows (`interjection`, `vqa`, speech) only
-   appear on the exact frame whose timestamp matches the emission. The
-   pipeline writes timestamps taken straight from the source parquet — no
-   floating-point recomputation.
+1. **Persistent state vs exact-event split.** Persistent rows
+   (`subtask`, `plan`, `memory`) broadcast per episode and answer "what
+   state is in force at this frame?". Event rows (`interjection`, `vqa`,
+   speech) only appear on the exact frame whose timestamp matches the
+   emission. The pipeline writes timestamps taken straight from the
+   source parquet — no floating-point recomputation.
 2. **One Qwen-VL pass.** All three modules share a single VLM client
    (vLLM if available, transformers fallback) so the cost is one model
    load per dataset, not three.
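
The persistent-state versus exact-event split in item 1 can be illustrated with a small lookup. This is a minimal sketch, not the pipeline's reader: the row dicts and the `content` key are hypothetical, and it assumes a persistent row stays in force from its timestamp until a later row of the same style replaces it.

```python
from __future__ import annotations


def state_in_force(persistent_rows: list[dict], t: float) -> dict[str, str]:
    """Latest persistent row per style with timestamp <= t (assumed semantics)."""
    state: dict[str, str] = {}
    for row in sorted(persistent_rows, key=lambda r: r["timestamp"]):
        if row["timestamp"] <= t:
            state[row["style"]] = row["content"]
    return state


def events_at(event_rows: list[dict], t: float) -> list[dict]:
    """Event rows fire only on the frame whose timestamp matches exactly."""
    return [row for row in event_rows if row["timestamp"] == t]


# Hypothetical rows: a plan at t=0 that is refreshed by an interjection at t=4.
persistent = [
    {"style": "plan", "timestamp": 0.0, "content": "pick, then place"},
    {"style": "plan", "timestamp": 4.0, "content": "place on the left shelf"},
]
events = [{"style": "interjection", "timestamp": 4.0, "content": "use the left shelf"}]
```

Exact equality in `events_at` is only safe because, as item 1 says, timestamps are copied from the source parquet rather than recomputed.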
@@ -134,7 +147,9 @@ Errors abort the writer (`--skip_validation=true` overrides for debugging).
   arguments:{text:...}}}]`).
 - **Module 3 — VQA.** ECoT ([Zawalski 2024](https://arxiv.org/abs/2407.08693))
   grounded features (bounding boxes in pixel `[x_min, y_min, x_max, y_max]`,
-  keypoints) and Steerable Policies' multi-abstraction grounding.
+  keypoints) and Steerable VLA Policies ([Zhao 2025](https://arxiv.org/abs/2509.07626))
+  multi-abstraction grounding. Pi0.7 also grounds answers across
+  multiple abstraction levels.
 
 Future maintainers should adjust the prompt templates in
 `src/lerobot/annotations/steerable_pipeline/prompts/` against these
+4 -3
@@ -200,12 +200,13 @@ hilserl = ["lerobot[transformers-dep]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpci
 async = ["lerobot[grpcio-dep]", "lerobot[matplotlib-dep]"]
 peft = ["lerobot[transformers-dep]", "lerobot[peft-dep]"]
 
-# Annotation pipeline (lerobot-annotate). datatrove is mandatory; vllm is
-# the preferred backend on Linux, with a transformers fallback elsewhere.
+# Annotation pipeline (lerobot-annotate). vllm is the preferred backend on
+# Linux, with a transformers fallback elsewhere. Distributed execution is
+# delegated to Hugging Face Jobs (see examples/annotation/run_hf_job.py),
+# so this pipeline pulls no cluster-scheduler dependency.
 annotations = [
     "lerobot[dataset]",
     "lerobot[transformers-dep]",
-    "datatrove>=0.4.0,<2.0.0",
     "vllm>=0.6.0,<1.0.0; sys_platform == 'linux'",
 ]
@@ -23,94 +23,62 @@ from typing import Any
 @dataclass
 class Module1Config:
-    """Module 1 hyperparameters: plan + subtasks + memory + task augmentation.
+    """Module 1: plan + subtasks + memory + task augmentation.
 
-    Subtask decomposition sees the **whole episode** as one Qwen-VL video
-    block — no keyframe stride or count: the model handles temporal pooling
-    itself and decides where to cut. ``max_video_frames`` only caps the
-    number of frames packed into the video block (a model-capacity bound,
-    not an annotation-logic knob).
+    Module 1 attaches the whole episode as one Qwen-VL video block;
+    ``max_video_frames`` only caps the frames packed in (a model-capacity
+    bound, not an annotation-logic knob).
     """
 
     enabled: bool = True
+
+    # Number of ``task_aug`` rephrasings emitted at ``t=0``. The renderer's
+    # ``${task}`` binding rotates among them per ``sample_idx``. ``0`` disables.
     n_task_rephrasings: int = 10
-    """Number of task rephrasings to generate at ``t=0`` as ``task_aug``
-    persistent rows (PR 1 ``CORE_STYLES``). The renderer's ``${task}``
-    binding rotates among them deterministically per ``sample_idx``,
-    realizing Xiao 2022 / CAST-style task-prompt diversity without
-    touching ``meta/tasks.parquet``. Set to 0 to disable."""
+
+    # When to derive the task from the video instead of using
+    # ``record.episode_task``: ``off``, ``if_short`` (short / placeholder /
+    # missing canonical task), or ``always``. The derived task replaces the
+    # canonical one for every Module-1 prompt; ``meta/tasks.parquet`` is
+    # never modified.
     derive_task_from_video: str = "if_short"
-    """When to bypass the user-provided ``record.episode_task`` and
-    derive a fresh task description from the episode video alone:
-
-    - ``off`` never; always use the canonical task as the basis.
-    - ``if_short`` derive when the canonical task is empty, has fewer
-      than ``derive_task_min_words`` words, or matches a
-      placeholder string (``debug``, ``unnamed``, ``tbd``,
-      ...). Default — fixes noisy / placeholder tasks
-      without forcing derivation everywhere.
-    - ``always`` ignore the canonical task entirely; always derive
-      from the video. Useful when the dataset's task
-      labels are uniformly bad.
-
-    The video-derived task replaces the canonical task as the basis for
-    subtask decomposition, plan, memory, AND the ``task_aug`` rephrasings,
-    so every downstream annotation is grounded in what's actually visible.
-    ``meta/tasks.parquet`` is NOT modified — the Module-1-derived task
-    only lives in ``language_persistent`` rows."""
     derive_task_min_words: int = 3
-    """Word-count threshold for ``derive_task_from_video=if_short``."""
+
+    # Frame sampling for the subtask-decomposition prompt.
     frames_per_second: float = 1.0
-    """Sample one image-frame per ``1/fps`` seconds across the episode for
-    Module 1's subtask-decomposition prompt. ``1.0`` = 1 fps. Capped by
-    ``max_video_frames`` to avoid blowing up the request payload."""
     max_video_frames: int = 128
-    """Hard cap on the number of frames Module 1 sends. With ``fps=1`` and
-    a 30 s episode this yields 30 frames. Bumped from 32 since each frame
-    is small (~30-100 KB PNG when base64'd)."""
     min_subtask_seconds: float = 1.5
     plan_max_steps: int = 8
+
+    # When True (and backend supports it, e.g. ``openai``), Module 1 sends a
+    # ``video_url`` block pointing at a per-episode mp4 subclip and lets the
+    # server sample frames at ``use_video_url_fps``.
     use_video_url: bool = False
-    """When True (and backend supports it, e.g. ``openai``), Module 1
-    sends a ``video_url`` content block pointing at the episode's mp4
-    file instead of pre-decoded frames. Lets the server sample frames at
-    its own ``fps`` — no in-process conv3d cost. The video file is
-    extracted as a per-episode subclip to ``staging/.video_clips/`` so
-    the model sees only this episode's frames."""
     use_video_url_fps: float = 1.0
-    """Frame-rate hint to send to the server (mm_processor_kwargs.fps).
-    Only used when ``use_video_url=True``. ``1.0`` = sample 1 frame per
-    second, which is plenty for subtask-boundary detection on most
-    manipulation episodes."""
 
 
 @dataclass
 class Module2Config:
-    """Module 2 hyperparameters: interjections + paired speech."""
+    """Module 2: interjections + paired speech."""
 
     enabled: bool = True
+
+    # Each interjection emits a paired ``(interjection, speech)`` event row
+    # and triggers a ``plan`` refresh at the same timestamp via Module 1.
     max_interjections_per_episode: int = 3
-    """Number of mid-episode interjections to generate per episode. Each
-    creates a paired ``(interjection, speech)`` event row plus triggers a
-    ``plan`` refresh at the same timestamp via Module 1. Bumped from the
-    original ``1`` after qwen36moe-10 showed plan/interjection coverage
-    was too sparse for Hi Robot-style training."""
     interjection_min_t: float = 2.0
+
+    # Visual context attached to the interjection prompt: a short window
+    # of frames centered on the chosen timestamp so the VLM sees the
+    # ongoing motion rather than a single frozen frame.
     interjection_window_seconds: float = 2.0
-    """How many seconds of video to attach to the interjection prompt as
-    visual context. Without this the VLM only sees a single frozen frame
-    and writes generic interjections that aren't grounded in the actual
-    motion happening at the chosen timestamp."""
     interjection_window_frames: int = 4
-    """How many frames to sample over ``interjection_window_seconds``.
-    Default 4 ⇒ ~0.5 fps over the leading 2 seconds — enough for the
-    model to read the ongoing motion, cheap enough to keep prompt size
-    bounded for the 32k context."""
 
 
 @dataclass
 class Module3Config:
-    """Module 3 hyperparameters: general VQA."""
+    """Module 3: general VQA."""
 
     enabled: bool = True
     vqa_emission_hz: float = 1.0
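
The three `derive_task_from_video` modes described in the removed docstring can be read as a small decision function. The sketch below is an illustration under assumptions, not the module's code: `should_derive_task` and `PLACEHOLDER_TASKS` are hypothetical names, and the placeholder set contains only the three strings the docstring spells out before it elides the rest.

```python
from __future__ import annotations

# Hypothetical placeholder list: the original docstring names only
# "debug", "unnamed", and "tbd" and then elides the remainder.
PLACEHOLDER_TASKS = frozenset({"debug", "unnamed", "tbd"})


def should_derive_task(task: str | None, mode: str = "if_short", min_words: int = 3) -> bool:
    """Return True when the task should be derived from the episode video."""
    if mode == "off":
        return False  # always keep the canonical task
    if mode == "always":
        return True  # ignore the canonical task entirely
    if mode != "if_short":
        raise ValueError(f"unknown derive_task_from_video mode: {mode!r}")
    text = (task or "").strip()
    if not text:
        return True  # missing or empty canonical task
    if text.lower() in PLACEHOLDER_TASKS:
        return True  # placeholder string
    return len(text.split()) < min_words  # fewer than min_words words
```

With the defaults, `should_derive_task("pick cup")` is true (two words, below the threshold of three) while a five-word task is kept as is.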
@@ -122,118 +90,82 @@ class Module3Config:
 class VlmConfig:
     """Shared Qwen-VL client configuration."""
 
+    # One of ``vllm``, ``transformers``, ``openai``, or ``stub`` (tests).
+    # ``openai`` talks to a local OpenAI-compatible server; the CLI
+    # auto-spawns one when ``auto_serve=True``.
     backend: str = "openai"
-    """One of ``vllm``, ``transformers``, ``openai``, or ``stub`` (tests only).
-    Default ``openai`` talks to a local OpenAI-compatible server (vllm /
-    transformers) which the CLI auto-spawns when ``auto_serve=True``."""
     model_id: str = "Qwen/Qwen2.5-VL-7B-Instruct"
-    api_base: str = "http://localhost:8000/v1"
-    """Base URL for the ``openai`` backend."""
-    api_key: str = "EMPTY"
-    """API key for the ``openai`` backend; ``EMPTY`` works for local servers."""
-    auto_serve: bool = True
-    """When True with ``backend=openai``, the CLI probes ``api_base``
-    first; if no server answers, it spawns one (default:
-    ``transformers serve``), waits for it to be ready, runs the
-    pipeline, and tears it down on exit. Default ``True`` so a single
-    ``lerobot-annotate`` call can drive the whole flow. Set to ``False``
-    if you want to fail fast when no server is reachable (e.g. you're
-    pointing at a remote endpoint that should already be up)."""
-    serve_port: int = 8000
-    """Port the auto-spawned server binds to. Sets ``api_base`` automatically."""
-    serve_command: str | None = None
-    """Override the auto-serve command (full shell command). When ``None``,
-    we run ``transformers serve <model_id> --port <serve_port> --continuous-batching``.
-    When ``parallel_servers > 1``, the literal ``{port}`` placeholder in
-    this command (if present) is substituted per-replica."""
+
+    # OpenAI-compatible server endpoint; ``EMPTY`` works for local servers.
+    api_base: str = "http://localhost:8000/v1"
+    api_key: str = "EMPTY"
+
+    # When True with ``backend=openai``, the CLI probes ``api_base`` and
+    # spawns a server if none answers (default: ``transformers serve``).
+    # Set to False to fail fast when pointing at a remote endpoint.
+    auto_serve: bool = True
+    serve_port: int = 8000
+
+    # Override the auto-serve command. ``{port}`` is substituted per replica
+    # when ``parallel_servers > 1``.
+    serve_command: str | None = None
+
+    # Run multiple independent inference servers for round-robin client
+    # routing (each pinned to a GPU via ``CUDA_VISIBLE_DEVICES`` and bound
+    # to ``serve_port + i``). ``num_gpus=0`` means one GPU per replica.
     parallel_servers: int = 1
-    """When >1, spawn this many independent inference servers (each pinned
-    to a GPU via ``CUDA_VISIBLE_DEVICES`` and listening on
-    ``serve_port + i``) and round-robin client requests across them.
-    Useful when DP/TP NCCL setup is broken on the node — single-GPU
-    replicas don't need cross-GPU communication. When
-    ``parallel_servers > num_gpus``, replicas are round-robin-assigned
-    to GPUs (e.g. 4 replicas on 2 GPUs → 0,1,0,1)."""
     num_gpus: int = 0
-    """How many physical GPUs are available for round-robin replica
-    placement. ``0`` means ``parallel_servers`` (one GPU per replica,
-    backward-compatible default). Set this to ``2`` with
-    ``parallel_servers=4`` to pack 2 replicas per GPU."""
     client_concurrency: int = 16
-    """Maximum number of in-flight chat requests the client issues in
-    parallel. vllm batches them internally for free, so bumping this
-    typically gives big throughput wins on a single TP=1 server. Set to
-    ``1`` for strict serial calls."""
     serve_ready_timeout_s: float = 600.0
-    """Max seconds to wait for the server to start serving requests."""
     max_new_tokens: int = 512
     temperature: float = 0.2
     json_mode: bool = True
     batch_size: int = 4
     tensor_parallel_size: int = 1
+    # Fraction of GPU memory vllm allocates for weights + KV cache.
     gpu_memory_utilization: float = 0.9
-    """Fraction of GPU memory vllm allocates for weights + KV cache.
-    Lower (e.g. 0.7) when the vision encoder needs cuDNN workspace, or to
-    avoid CUDNN_STATUS_NOT_INITIALIZED on tight VRAM (30B BF16 on 80 GB)."""
+    # Cap context length (None = model default). On 80 GB H100 a 30B BF16
+    # model often needs <= 8192 to leave KV-cache headroom.
     max_model_len: int | None = None
-    """Cap context length. ``None`` keeps the model's default; on H100 80 GB
-    a 30B BF16 model often needs ``max_model_len=8192`` or smaller to leave
-    room for KV cache."""
     trust_remote_code: bool = False
-    """Pass ``trust_remote_code`` to HF auto-classes. Default ``False`` —
-    only enable for models that actually ship custom code in their repo
-    (rare for first-class VL releases). On Qwen3-VL it triggers an
-    std::bad_alloc post-load even though the official transformers class
-    is sufficient, so leaving this off is safest."""
+
+    # Override the camera stream used for keyframe attachment. None picks
+    # the first ``observation.images.*`` key the dataset declares.
     camera_key: str | None = None
-    """Override the camera stream used for keyframe attachment. ``None`` picks
-    the first ``observation.images.*`` key the dataset declares."""
+    # Forwarded as ``extra_body.chat_template_kwargs`` on every chat call;
+    # use to pass model-specific flags such as ``{"enable_thinking": false}``.
     chat_template_kwargs: dict[str, Any] | None = None
-    """Forwarded as ``extra_body.chat_template_kwargs`` on every chat call.
-    Use this to pass model-specific template flags such as
-    ``{"enable_thinking": false}`` for Qwen3.5/Qwen3.6 to suppress the
-    reasoning preamble that otherwise eats the entire ``max_new_tokens``
-    budget before any JSON is emitted."""
 
 
 @dataclass
 class ExecutorConfig:
-    """Executor selection and SLURM hyperparameters."""
+    """Executor settings.
 
-    auto_threshold: int = 32
-    force_local: bool = False
-    slurm_partition: str | None = None
-    slurm_gpus: int = 1
-    slurm_time: str = "06:00:00"
-    workers: int = 1
+    Distributed execution is provided by Hugging Face Jobs (see
+    ``examples/annotation/run_hf_job.py``); this config only controls
+    intra-process episode concurrency.
+    """
+
+    # Episodes processed concurrently within each module phase. Each
+    # in-flight episode dispatches 3-5 dependent VLM calls, so this is the
+    # main knob for saturating ``parallel_servers`` and ``client_concurrency``.
     episode_parallelism: int = 16
-    """Number of episodes processed concurrently within each module phase.
-    Each in-flight episode sends 3-5 dependent VLM calls; bumping this is
-    how you actually saturate ``parallel_servers`` and ``client_concurrency``
-    — without it, the executor loops one episode at a time and the
-    inference servers sit ~90% idle. Set to ``1`` for strict serial
-    execution."""
 
 
 @dataclass
 class AnnotationPipelineConfig:
     """Top-level config for ``lerobot-annotate``.
 
-    Mirrors the structure of :class:`lerobot.configs.train.TrainPipelineConfig`:
-    a draccus-parsed dataclass that contains nested per-module sub-configs and
-    leaves the dataset, executor, and VLM choices independently knobbable.
-
-    Output is always in-place: the writer rewrites ``data/chunk-*/file-*.parquet``
-    in place. Multiple revisions of the same dataset live in separate copies.
+    The writer rewrites ``data/chunk-*/file-*.parquet`` in place. Multiple
+    revisions of the same dataset live in separate copies.
     """
 
     repo_id: str | None = None
     root: Path | None = None
+    # Defaults to ``<root>/.annotate_staging/`` when unset.
     staging_dir: Path | None = None
-    """If unset, defaults to ``<root>/.annotate_staging/``."""
     seed: int = 1729
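
The replica placement that the `parallel_servers` and `num_gpus` docstrings describe (replica `i` bound to `serve_port + i`, GPUs assigned round-robin, `num_gpus=0` meaning one GPU per replica) reduces to a modulo. A minimal sketch, where `plan_replicas` is a hypothetical helper and not a function from the package:

```python
from __future__ import annotations


def plan_replicas(parallel_servers: int, num_gpus: int = 0, serve_port: int = 8000) -> list[tuple[int, int]]:
    """Return one (gpu_index, port) pair per replica."""
    # num_gpus=0 falls back to one GPU per replica, as the docstring states.
    gpus = num_gpus if num_gpus > 0 else parallel_servers
    return [(i % gpus, serve_port + i) for i in range(parallel_servers)]
```

For the docstring's own example, `plan_replicas(4, num_gpus=2)` places the four replicas on GPUs 0, 1, 0, 1 with ports 8000 through 8003; each GPU index would then be exported as `CUDA_VISIBLE_DEVICES` for that server process.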
@@ -247,14 +179,10 @@ class AnnotationPipelineConfig:
     skip_validation: bool = False
     only_episodes: tuple[int, ...] | None = None
 
+    # Upload the annotated dataset to the Hugging Face Hub when set.
     push_to_hub: str | None = None
-    """If set, after the pipeline completes, upload the annotated dataset
-    root to the Hugging Face Hub as a dataset repo with this id (e.g.
-    ``pepijn/super_poulain_steerable``). Creates the repo if missing."""
     push_private: bool = False
-    """When ``push_to_hub`` is set, create the repo as private."""
     push_commit_message: str | None = None
-    """Override the commit message used for the hub upload."""
 
     def resolved_staging_dir(self, root: Path) -> Path:
         return self.staging_dir if self.staging_dir is not None else root / ".annotate_staging"
@@ -13,7 +13,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""Executor selection: local vs SLURM via datatrove.
+"""In-process executor that runs the four annotation phases.
 
 The executor plans **four phases** with the dependency order from the plan:
@@ -25,8 +25,14 @@ The executor plans **four phases** with the dependency order from the plan:
 phase 5: validator
 phase 6: writer
 
-Phase 3 is why ``executor.py`` documents the dependency: Module 1 must be
-re-entered after Module 2 to refresh ``plan`` rows at interjection times.
+Phase 3 is why Module 1 must be re-entered after Module 2 to refresh
+``plan`` rows at interjection timestamps.
+
+Distributed execution is provided by Hugging Face Jobs (see
+``examples/annotation/run_hf_job.py``); the runner inside the job
+invokes ``lerobot-annotate`` which uses this in-process executor.
+Episode-level concurrency is controlled by
+``ExecutorConfig.episode_parallelism``.
 """
 
 from __future__ import annotations
@@ -36,7 +42,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
 
-from .config import AnnotationPipelineConfig, ExecutorConfig
+from .config import AnnotationPipelineConfig
 from .reader import EpisodeRecord, iter_episodes
 from .staging import EpisodeStaging
 from .validator import StagingValidator
@@ -63,28 +69,14 @@ class PipelineRunSummary:
     validation_report: Any  # ValidationReport, kept Any to avoid import cycle
 
 
-def select_executor_class(num_episodes: int, config: ExecutorConfig) -> str:
-    """Return ``"local"`` or ``"slurm"`` based on the threshold.
-
-    The plan's "executor selection threshold" lives in
-    :class:`ExecutorConfig.auto_threshold`. ``force_local`` always wins.
-    """
-    if config.force_local:
-        return "local"
-    return "local" if num_episodes <= config.auto_threshold else "slurm"
-
-
 @dataclass
 class Executor:
-    """Run all four phases over a dataset root.
+    """Run all four phases over a dataset root in-process.
 
-    The executor is intentionally framework-agnostic: by default it runs the
-    phases inline (suitable for tests, small datasets, and the CLI's
-    ``--force-local`` mode). It will optionally hand off to datatrove's
-    :class:`LocalPipelineExecutor` or :class:`SlurmPipelineExecutor` when those
-    are installed and the dataset is large enough to benefit from them.
-    Tests construct the executor directly with stub modules.
+    Episode-level concurrency comes from ``ExecutorConfig.episode_parallelism``
+    (a thread pool); cluster-level concurrency comes from running this
+    executor inside a Hugging Face Job. Tests construct the executor
+    directly with stub modules.
     """
 
     config: AnnotationPipelineConfig
@@ -100,8 +92,7 @@ class Executor:
         if n == 0:
             raise ValueError(f"No episodes found under {root}/data/")
 
-        executor_kind = select_executor_class(n, self.config.executor)
-        print(f"[annotate] {n} episodes total; executor={executor_kind}", flush=True)
+        print(f"[annotate] {n} episodes total", flush=True)
 
         staging_dir = self.config.resolved_staging_dir(root)
         staging_dir.mkdir(parents=True, exist_ok=True)
@@ -170,11 +161,7 @@ class Executor:
         existing = info.get("tools")
         if not isinstance(existing, list):
             existing = []
-        names = {
-            (t.get("function") or {}).get("name")
-            for t in existing
-            if isinstance(t, dict)
-        }
+        names = {(t.get("function") or {}).get("name") for t in existing if isinstance(t, dict)}
         merged = list(existing)
         if SAY_TOOL_SCHEMA["function"]["name"] not in names:
             merged.append(SAY_TOOL_SCHEMA)
@@ -207,8 +194,7 @@ class Executor:
         n = len(records)
         parallelism = max(1, min(self.config.executor.episode_parallelism, n))
         print(
-            f"[annotate] phase={name} starting on {n} episode(s) "
-            f"(parallelism={parallelism})",
+            f"[annotate] phase={name} starting on {n} episode(s) (parallelism={parallelism})",
             flush=True,
         )
         t0 = _time.time()
@@ -226,8 +212,7 @@ class Executor:
                 _, ep_idx, elapsed = _do((i, record))
                 processed += 1
                 print(
-                    f"[annotate] {name} episode {i}/{n} "
-                    f"(idx={ep_idx}) done in {elapsed:.1f}s",
+                    f"[annotate] {name} episode {i}/{n} (idx={ep_idx}) done in {elapsed:.1f}s",
                     flush=True,
                 )
         else:
@@ -262,15 +247,11 @@ class Executor:
         for record in records:
             staging = EpisodeStaging(staging_dir, record.episode_index)
             interjection_rows = [
-                row
-                for row in staging.read("module_2")
-                if row.get("style") == "interjection"
+                row for row in staging.read("module_2") if row.get("style") == "interjection"
             ]
             interjection_times = [float(row["timestamp"]) for row in interjection_rows]
             interjection_texts = [str(row.get("content") or "") for row in interjection_rows]
             if interjection_times:
-                self.module_1.run_plan_updates(
-                    record, staging, interjection_times, interjection_texts
-                )
+                self.module_1.run_plan_updates(record, staging, interjection_times, interjection_texts)
             processed += 1
         return PhaseResult(name="module_1_plan_update", episodes_processed=processed, episodes_skipped=0)
@@ -26,9 +26,7 @@ episode containing:
 - ``frames_df``: pandas.DataFrame slice for the episode (only loaded on demand)

 This shape lets each module operate per-episode without loading all parquet
-rows into memory at once. It deliberately does not depend on datatrove
-datatrove integration wraps this generator inside a ``PipelineStep`` in
-:mod:`.executor`.
+rows into memory at once.
 """

 from __future__ import annotations
+3 -4
@@ -16,16 +16,15 @@
 """``lerobot-annotate`` — populate ``language_persistent`` and
 ``language_events`` columns on a LeRobot dataset.

-Annotations live directly in ``data/chunk-*/file-*.parquet``: there is no
-flavor namespace and no sidecar tree. Multiple revisions of the same dataset
-mean multiple dataset copies.
+Annotations live directly in ``data/chunk-*/file-*.parquet``.

 Example:

     uv run lerobot-annotate \\
         --root=/path/to/dataset \\
-        --vlm.backend=transformers \\
         --vlm.model_id=Qwen/Qwen2.5-VL-7B-Instruct
+
+For distributed runs, see ``examples/annotation/run_hf_job.py``.
 """

 import logging
Generated file: +1397 -137 (diff suppressed because it is too large)