lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-16 00:59:46 +00:00

Author	SHA1	Message	Date
Pepijn	1927077fea	fix(annotate): transcode subclips to H.264 instead of stream-copy Modern LeRobot datasets store videos in AV1, which vllm's libav build cannot decode (the video processor returns 0 frames and downstream chokes with ZeroDivisionError). Re-encode each per-episode subclip with libx264 (preset ultrafast, crf 23) so the resulting mp4 is universally decodable. Strip audio with -an for a smaller payload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 17:00:39 +02:00
Pepijn	327ce89423	feat(annotate): pack multiple vllm replicas per GPU via num_gpus Adds VlmConfig.num_gpus so parallel_servers can exceed the physical GPU count. Replicas are round-robin-assigned to GPUs (e.g. parallel_servers=4 + num_gpus=2 → replicas pinned to GPUs 0,1,0,1). Backward-compatible: num_gpus=0 keeps the existing 1-replica-per-GPU behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:11:51 +02:00
Pepijn	b56cfe7eb9	feat(annotate): forward chat_template_kwargs to OpenAI extra_body Lets callers pass per-request template flags such as {"enable_thinking": false} for Qwen3.5/Qwen3.6 models, where the default thinking preamble otherwise consumes the entire max_new_tokens budget before any JSON is emitted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:00:23 +02:00
Pepijn	e06a7c79f7	fix(annotate): include prompt .txt files in wheel The setuptools package-data declaration only listed envs/*.json, so pip-installed wheels (including HF Jobs runs) were missing the module_1_subtasks/plan/memory and module_2/3 prompt templates, causing FileNotFoundError at runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 11:47:03 +02:00
Pepijn	72d0fc0dce	refactor(annotate): drop HF Inference Providers code path Default backend is now a local OpenAI-compatible server (vllm / transformers) which auto_serve spawns. Removes the use_hf_inference_providers config flag and the router.huggingface.co routing branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:53:08 +02:00
Pepijn	3c6a6b39a2	feat(annotate): --vlm.push_to_hub uploads the annotated dataset After the pipeline completes, optionally create/locate a dataset repo and upload the dataset root (excluding .annotate_staging/). Add push_private and push_commit_message knobs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:28:38 +02:00
Pepijn	39f6167fa3	feat(annotate): parallelize episodes within each module phase Saturates parallel_servers + client_concurrency. Previously the executor processed one episode at a time, so each Module 1 episode's 3-5 dependent VLM calls hit a single server with the others idle. Now defaults to 16 episodes in flight; configurable via ExecutorConfig.episode_parallelism. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:59:02 +02:00
Pepijn	caef184c82	fix(annotate): probe /v1/models for spawn-helper readiness vllm with --uvicorn-log-level warning suppresses the "Uvicorn running" banner that the readiness watcher waited for, so the spawn helper hung forever even after the API was live. Add an HTTP probe in parallel with the log watcher and broaden the log markers to include vllm's own "Starting vLLM API server" / "Available routes are" lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:47:52 +02:00
Pepijn	7bbf5777a2	fix(annotate): lock-protect per-line writes for parallel server streams 8 server-streaming threads writing chars unsynchronized cause UTF-8 sequences from different servers to interleave mid-byte, garbling the terminal output. Switch to line-buffered reads with a single shared print lock — output stays readable, ready-marker detection still works on the line containing 'Uvicorn running' / 'Application startup complete'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:19:37 +02:00
Pepijn	545d7eb713	feat(annotate): client_concurrency for parallel in-flight requests Adds vlm.client_concurrency (default 16) which uses a ThreadPoolExecutor to fan out batched chat.completions calls. vllm batches them internally on the server side, giving big throughput wins on a single TP=1 server without needing DP/TP and the NCCL setup it requires. Module 3 now batches all per-episode VQA calls into a single generate_json invocation so they fire in parallel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:07:48 +02:00
Pepijn	47f2ea17bb	feat(annotate): parallel_servers spawns N independent vllm replicas Adds --vlm.parallel_servers=N. Spawns N independent vllm processes (each pinned to GPU i via CUDA_VISIBLE_DEVICES, listening on serve_port+i) and round-robins requests across them. Sidesteps DP/TP NCCL setup failures on nodes with restricted P2P/SHM. Default serve_command for parallel mode: vllm serve <model_id> --tensor-parallel-size 1 --max-model-len 32768 --uvicorn-log-level warning. Override via --vlm.serve_command (use {port} placeholder). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:06:20 +02:00
Pepijn	5119d22f1f	feat(annotate): per-episode progress logs in executor	2026-04-28 22:56:03 +02:00
Pepijn	916b419af3	fix(annotate): don't crash pipeline on persistent JSON parse failure Some prompts/models occasionally return pure prose with no JSON object even on retry. Returning None (and logging a preview) lets the pipeline skip that one VLM call cleanly instead of aborting the whole episode. The modules already check for None / non-dict results and degrade gracefully (no row emitted from that call). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:33:54 +02:00
Pepijn	7c10c4fcdd	fix(annotate): robust JSON extraction (think tags + first balanced object) Models often wrap JSON in prose or <think>...</think> blocks. Strip the think tags first, then try direct json.loads, then fall back to scanning for the first balanced {...} substring (ignoring braces inside strings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:15:25 +02:00
Pepijn	421e84497b	fix(annotate): stream child stdout char-by-char so tqdm \r progress flushes	2026-04-28 21:58:12 +02:00
Pepijn	9d38477728	test(annotate): adjust video-block test for fps-based frame sampling	2026-04-28 19:49:08 +02:00
Pepijn	b895e3b057	feat(annotate): Module 1 samples image frames at fps rate Replace the fixed max_video_frames count with a rate (default 1 fps). A 30 s episode now sends 30 frames; a 5 s episode sends 5; capped at max_video_frames (default 128) to avoid blowing up the payload on long episodes. Override with --module_1.frames_per_second=2.0 for denser sampling, or --module_1.frames_per_second=0.5 for sparser. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:48:25 +02:00
Pepijn	a8aa6b08ba	feat(annotate): use cached HF token from huggingface-cli login Fall back to huggingface_hub.get_token() when HF_TOKEN/HUGGINGFACE_API_KEY env vars aren't set. That picks up the token cached by 'huggingface-cli login' so users don't need to export it on every shell. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:36:24 +02:00
Pepijn	4ac6c58ab1	feat(annotate): default to HF Inference Providers, no local GPU needed Flip the default backend to 'openai' with use_hf_inference_providers=True and a Qwen3-VL-30B-A3B-Instruct:novita default model_id. The CLI now runs end-to-end without a local model load — annotations are produced by sending video_url + prompt to https://router.huggingface.co/v1. Switch back to local inference with --vlm.backend=vllm or --vlm.use_hf_inference_providers=false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:33:34 +02:00
Pepijn	d5559a9445	feat(annotate): one-flag HF Inference Providers backend Setting --vlm.use_hf_inference_providers=true routes requests through https://router.huggingface.co/v1 using HF_TOKEN as the API key, and disables auto_serve so no local server is spawned. Combine with a provider-pinned model id like 'Qwen/Qwen3-VL-30B-A3B-Instruct:novita' or any plain model id to let HF route. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:28:13 +02:00
Pepijn	7a7b8ac111	fix(annotate): omit mm_processor_kwargs by default; transformers serve rejects it transformers serve returns HTTP 422 'Unexpected fields' when mm_processor_kwargs is in extra_body — that field is vllm-specific. Drop it by default; opt in via LEROBOT_OPENAI_SEND_MM_KWARGS=1 when talking to vllm serve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:11:58 +02:00
Pepijn	504bad6342	fix(annotate): mm_processor_kwargs in extra_body; inline file URLs as data URLs Two fixes for video_url with transformers serve: - fps must be in extra_body.mm_processor_kwargs, not in the content block; otherwise the server discards it as unknown kwargs. - file:// URLs aren't fetched by transformers serve. Read the local mp4 and inline it as a base64 data:video/mp4 URL so the server sees the bytes directly. Both surface as std::bad_alloc on the server side when wrong, which is unhelpful but explains what we hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 18:53:43 +02:00
Pepijn	aded6214ab	fix(annotate): detect server ready via stdout banner, not /v1/models polls transformers serve rescans the HF cache on every /v1/models request which exceeds the 2s urllib timeout, leaving the probe loop spinning even after Uvicorn is fully up. Watch the streamed server output for 'Uvicorn running' / 'Application startup complete' instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:43:47 +02:00
Pepijn	e70277ba3e	fix(annotate): visible auto_serve via stdout prints + live server log stream The previous logger-based output never appeared, leaving users in the dark when auto_serve silently no-op'd. Switch to print(flush=True) so the spawn decision is unmistakable, and stream the server's stdout to the parent terminal in real-time on a background thread so model-load progress and errors surface immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:34:36 +02:00
Pepijn	4930338c52	fix(annotate): auto_serve defaults to True; probe before spawning Default auto_serve to True so lerobot-annotate can drive the entire flow with one command. Probe api_base/models first — if a server is already reachable (user started one manually, or it's a remote endpoint), skip the spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:14:48 +02:00
Pepijn	55879e4fb4	feat(annotate): auto_serve mode spawns and tears down inference server Setting --vlm.auto_serve=true with --vlm.backend=openai makes the CLI launch 'transformers serve <model_id> --port <serve_port> --continuous-batching' as a child process, poll /v1/models until ready (up to serve_ready_timeout_s), run the pipeline, then SIGINT the server on process exit. Override the spawn command with --vlm.serve_command='vllm serve ...' or any OpenAI-compatible launcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:39:50 +02:00
Pepijn	0b2f0d1d6a	feat(annotate): video_url block for openai backend Module 1 can now send the episode's actual mp4 file as a video_url content block instead of pre-decoded frames. The server (transformers serve / vllm serve / ktransformers serve) handles frame sampling at the configured fps. Default fps=1 (one frame per second is enough for subtask-boundary detection on manipulation episodes). A per-episode subclip is extracted to <root>/.annotate_staging/.video_clips/ via ffmpeg stream-copy (no re-encode) so the model sees only this episode's frames, not the whole shard. Enable with --module_1.use_video_url=true (and --vlm.backend=openai). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:26:30 +02:00
Pepijn	a27972125b	feat(annotate): openai-compatible backend for transformers/ktransformers serve Adds a third backend that talks to any OpenAI-compatible server. This unblocks Qwen3.6 (and other models) that work in transformers serve / ktransformers but not in vllm 0.10.2's fallback path: - launch the server out-of-process (transformers serve, vllm serve, ktransformers serve) - point lerobot-annotate at it via --vlm.backend=openai --vlm.api_base=http://localhost:8000/v1 --vlm.model_id=... Image and video blocks are converted to OpenAI image_url/video_url data URLs automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:24:32 +02:00
Pepijn	70bdec72ef	fix(annotate): use vllm.chat() API for multimodal prompts vllm.generate() expects a string/TextPrompt; passing message dicts fails. vllm.chat() applies the chat template and extracts image/video blocks automatically, which is what we need for VL models. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:20:50 +02:00
Pepijn	de50eabd3f	fix(annotate): drop guided_decoding=dict (api differs across vllm) vllm 0.10.2 expects guided_decoding to be a GuidedDecodingParams object, not a dict. Different vllm versions differ here. The parser already has a one-retry JSON-recovery path, so drop guided decoding entirely for portability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:14:31 +02:00
Pepijn	23845218b6	fix(annotate): tolerate decoder returning fewer frames than requested pyav (and sometimes torchcodec) decode can return fewer frames than requested timestamps when some timestamps fall outside the video file's content range. Drop the strict=True on the zip and rely on the None-filter to discard missing frames. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:21:38 +02:00
Pepijn	01fc975eb5	fix(annotate): default video decode backend to pyav torchcodec's __init__ bad-allocs on the cu128/torch-2.8 stack in some environments (Lustre/conda combos). The annotation pipeline calls decode_video_frames many times per episode, so this is a hard blocker. Default to pyav (always available via the av package) and let users opt back into torchcodec via LEROBOT_VIDEO_BACKEND=torchcodec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:10:57 +02:00
Pepijn	fc4f6d2502	fix(annotate): default trust_remote_code=False for HF loaders Setting trust_remote_code=True unconditionally pulled custom loader code that triggers std::bad_alloc post-load on Qwen3-VL — the official transformers class is sufficient. Flip the default to False; keep the config field so users can opt in for models that actually need it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:39:55 +02:00
Pepijn	e21996f23b	fix(annotate): default transformers backend to manual GPU placement Loading Qwen3-VL via transformers + accelerate's device_map='auto' fails with std::bad_alloc on hosts with abundant RAM. The bug is in accelerate's post-load dispatch path. Bypassing accelerate by loading to CPU first and then calling .to('cuda') manually avoids that path. LEROBOT_TRANSFORMERS_DEVICE_MAP=auto switches back to the old behavior for cases where it works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:27:19 +02:00
Pepijn	10fa65a996	fix(annotate): LEROBOT_DISABLE_CUDNN escape hatch for conv3d crash cuDNN 9.x + torch 2.8 has a regression where the conv3d kernel used in Qwen-VL vision tower patch embedders fails with CUDNN_STATUS_NOT_INITIALIZED. The crash is independent of model size and reproduces on both Qwen2.5-VL and Qwen3-VL because both use 3D conv for video patch embedding. Setting LEROBOT_DISABLE_CUDNN=1 falls back to native PyTorch conv3d kernels (slower but functional) so the pipeline can run while the torch/cuDNN stack is still on the broken combo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 12:05:00 +02:00
Pepijn	8f125a5ec1	fix(annotate): expose gpu_memory_utilization and max_model_len for vllm Large VL models (Qwen3-VL-30B-A3B BF16) take ~58 GB of an 80 GB H100, leaving only ~22 GB for KV cache + cuDNN workspace. The vision tower's 3D conv then fails with CUDNN_STATUS_NOT_INITIALIZED because cuDNN can't grab a workspace large enough. - vlm.gpu_memory_utilization (default 0.9) — drop to 0.7 when the vision encoder needs more cuDNN workspace. - vlm.max_model_len — cap context to free KV cache memory; the 262k default for Qwen3 is wildly more than annotation prompts need. - vlm.trust_remote_code — already plumbed; now also passed to LLM(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 11:29:22 +02:00
Pepijn	a81e23b0e9	fix(annotate): pass trust_remote_code=True to HF auto-classes Required for many newer VL checkpoints (Qwen3.x FP8 in particular) that ship custom loader code in their repo. Without it, the FP8 weight_scale_inv parameters never bind to FP8Linear modules and the post-load dispatch path bad-allocs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 09:44:15 +02:00
Pepijn	a2bacb2f76	fix(annotate): low_cpu_mem_usage=True on transformers load path The std::bad_alloc we hit on Qwen3-line VL models is not a real OOM — it triggers in the post-load tensor-placement path even on hosts with 2 TB RAM. low_cpu_mem_usage=True bypasses the offending intermediate staging buffer and is the standard accelerate workaround. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 23:29:20 +02:00
Pepijn	f8c00d9ca5	fix(annotate): use device_map='auto' for transformers backend Without device_map, transformers stages the full FP8 checkpoint in CPU RAM before any GPU placement, OOMing the host on 27B+ models even when the GPU has enough VRAM. device_map='auto' streams shards directly to GPU memory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 23:05:14 +02:00
Pepijn	de104936bf	fix(annotate): try AutoModelForImageTextToText first, fall back to AutoModelForVision2Seq Newer transformers versions renamed/removed AutoModelForVision2Seq in favour of AutoModelForImageTextToText for VL models. Try the new name first and fall back gracefully so the transformers backend works on both transformers 4.45-4.5x and 5.x. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 22:32:56 +02:00
Pepijn	d170402075	fix(annotate): replace Literal types with str for older draccus Older draccus versions (e.g. 0.10.x bundled in some envs) lack a decoder for typing.Literal and raise: No decoding function for type typing.Literal['vllm', 'transformers', 'stub'] Switching VlmConfig.backend from Literal to str works under every draccus version. The runtime branch in vlm_client.make_vlm_client already validates the value and raises ValueError on unknown backends, so the constraint stays enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 18:26:57 +02:00
Pepijn	79ca79cba2	feat(annotate): Module 1 sees the whole episode as one video block Replaces keyframe sampling with a single Qwen-VL video block covering the whole demonstration. The model pools temporally itself and chooses where to cut subtasks — no stride, no count, no keyframe count knob to tune. - frames.py: ``FrameProvider`` gains ``video_for_episode(record, max_frames)``; ``VideoFrameProvider`` samples up to ``max_frames`` uniformly across the episode duration; ``_NullProvider`` returns [] for the no-video fallback. New ``to_video_block`` helper. - Module 1: drops keyframe sampling. The subtask prompt now goes out as ``[{"type":"video", "video":[<frames>]}, {"type":"text", ...}]`` and the prompt template asks the model to "watch the whole clip, then segment it" with cut points decided from gripper/contact/regrasp events the model sees. - Module1Config: ``keyframes_per_episode`` removed; replaced with ``max_video_frames: int = 32`` (model-capacity bound, not annotation logic). - Test: ``test_module1_attaches_video_block_to_subtask_prompt`` locks in the single-video-block invariant. - Stub-VLM markers updated: tests now key on "atomic subtasks" instead of the old "Decompose the demonstration" phrase that no longer appears in the prompt. - Docs: updated to describe the whole-episode video-block behavior and the no-video fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:08:36 +02:00
Pepijn	80b7708a61	feat(annotate): attach camera keyframes to module prompts; default to Qwen3.6-27B-FP8 Closes the visual-grounding gap flagged after the initial PR review: modules now decode actual camera frames at the relevant timestamps and attach them as `{"type":"image", "image":<PIL>}` content blocks to the VLM prompts. - New `frames.py`: - `FrameProvider` Protocol; `VideoFrameProvider` decodes from the dataset's first `observation.images.*` stream via `LeRobotDatasetMetadata.get_video_file_path` and `decode_video_frames`, with the same `from_timestamp` shift the main dataset uses. - Per-process LRU cache so co-timestamped Module 1 plan-update + Module 2 calls share decode work. - `make_frame_provider` falls back to a null provider when the dataset has no video tracks → text-only prompts (graceful absence). - Modules 1/2/3 take an optional `frame_provider` (default null) and prepend image blocks before the text block. - Module 1 attaches `keyframes_per_episode` keyframes to the subtask decomposition prompt. - Module 2 attaches the frame at the interjection timestamp. - Module 3 attaches the exact emission frame to each VQA pair. - VlmConfig: backend now defaults to `vllm`; default model is `Qwen/Qwen3.6-27B-FP8`. New knobs: `--vlm.tensor_parallel_size`, `--vlm.camera_key` (override the keyframe stream). - `_make_vllm_client` honours `tensor_parallel_size` so 27B-FP8 sharded on 2× GPUs works out of the box. - `test_module3_attaches_frame_image_block_to_prompt` asserts modules emit one image block per VQA prompt at the exact emission timestamp. - Docs: example switched to `imstevenpmwork/super_poulain_draft` + Qwen3.6-27B-FP8 + tensor_parallel_size=2; documents the keyframe attachment behaviour and the no-video fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:58:45 +02:00
Pepijn	a635a32290	feat: language annotation pipeline (PR 2/3) Adds the steerable annotation pipeline (`lerobot-annotate`) that populates the `language_persistent` and `language_events` columns introduced in PR 1 directly into `data/chunk-/file-.parquet`. No flavor namespace, no sidecar tree. Modules produced: - Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init + refresh on interjection), MEM-style memory at subtask boundaries. - Module 2 (interjections_and_speech): t=0 speech-only acknowledgement, mid-episode paired interjection + speech tool-call atom. - Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at configurable cadence with one-retry JSON validation. Writer enforces: per-episode persistent identity, exact-frame event timestamps, column routing per `column_for_style`, dataset-level `tools` column with the `say` schema, drops legacy `subtask_index`. Validator runs against staged JSONL artifacts before the writer rewrites parquet. Adds `lerobot-annotate` console script, `annotations` extra (datatrove + optional vllm), `make annotation-e2e` opt-in smoke target, and `docs/source/annotation_pipeline.mdx`. Branched from PR 1 (`feat/language-columns`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:22:51 +02:00
Pepijn	0b06790da0	feat(language): add motion (persistent) and trace (event-only) styles Promote the previously-reserved motion/trace styles to first-class core styles. motion routes to language_persistent (it tracks robot state over time); trace routes to language_events (single-moment annotations). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 14:21:49 +02:00
Pepijn	b43dc39ba4	Add docstrings to all new helpers; revert uv.lock Covers private helpers in recipe.py, language.py, language_render.py, and render_messages_processor.py. Also reverts uv.lock to main (it was re-generated by `uv run` during local checks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 14:15:03 +02:00
Pepijn	2b71221194	Address review: split persistent/event schemas, drop event timestamps - recipe.py: derive _VALID_ROLES/_VALID_STREAMS from MessageRole/MessageStream Literals - dataset_metadata.py: keep CODEBASE_VERSION at v3.0 - language.py: remove RESERVED_STYLES; split arrow/feature schemas into persistent (with timestamp) and event (without timestamp); add docstrings - language_render.py: events use frame-row timestamp implicitly; no per-event timestamp filtering or sorting - converters.py: drop unused subtask_key passthrough - add docstrings to new public APIs (recipe, render_messages_processor, collate) - update tests for split schemas; revert uv.lock Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:38:23 +02:00
Pepijn	8833d735a1	Add extensive language support	2026-04-27 10:56:32 +02:00
Pepijn	ba27aab79c	fix(robotwin): pin compatible curobo in benchmark image (#3427) * fix(robotwin): pin compatible curobo in benchmark image * fix(robotwin): make curobo smoke check gpu-free	2026-04-21 19:51:44 +02:00
Pepijn	5adad11128	feat(sim): VLABench benchmark integration (#3396) feat(sim): add VLABench benchmark integration Add VLABench as a new simulation benchmark in LeRobot, following the existing LIBERO and MetaWorld patterns. This PR wires VLABench end-to-end across environment integration, Docker setup, CI smoke evaluation, and documentation. It also fixes a number of upstream packaging and runtime issues required to make VLABench usable and reproducible in CI. What’s included. Benchmark integration: Add VLABench as a new simulation benchmark. Expose supported VLABench tasks through the LeRobot env interface. Follow the established LIBERO / MetaWorld factory patterns. Preserve lazy async-env metadata so env.unwrapped.metadata["render_fps"] continues to work. CI smoke evaluation: Add a VLABench smoke-eval job using lerobot/smolvla_vlabench. Use the correct rename_map for the 3-camera dataset layout. Expand smoke coverage from 1 to 10 primitive tasks. Extract task descriptions after eval so metrics artifacts include per-task labels. Skip Docker Hub login when secrets are unavailable (e.g. fork PRs). Docker / install fixes: Install VLABench from GitHub rather than PyPI. Use uv pip, not pip, in the base image. Fail loudly on install errors instead of masking them. Clone VLABench into the non-root user’s home directory. Use shallow editable installs for VLABench and rrt-algorithms to work around missing __init__.py issues. Pin upstream clones to exact commit SHAs for reproducibility. Add undeclared runtime dependencies required by VLABench (open3d, colorlog, scikit-learn, openai). Unpin open3d so Python 3.12 wheels resolve. Assets: Support downloading VLABench assets from a Hugging Face Hub mirror via VLABENCH_ASSETS_REPO. Keep Google Drive download support as fallback. Install huggingface_hub[hf_xet] so Xet-backed assets download correctly. Validate required mesh/XML asset subtrees at build time. Patch VLABench constants to tolerate missing asset directories at import time. Runtime / env correctness: Import VLABench robots and tasks explicitly so decorator-based registry population happens. Resize and normalize camera observations so they always match the declared (H, W, 3) uint8 observation space. Reinstall LeRobot editably inside the image so the new env code is actually used. Coerce agent_pos / ee_state to the expected shape. Pad actions when needed to match data.ctrl. Replace zero-padding fallback with proper dm_control IK for 7D end-effector actions. Refetch dm_control physics on each step instead of caching weakrefs. Retry unstable resets with reseeding and handle PhysicsError gracefully at step time. Dataset / policy alignment: Align VLABench observations and actions with Hugging Face dataset conventions used by lerobot/vlabench_unified: convert EE position between world frame and robot-base frame at the env boundary, expose / consume Euler XYZ instead of raw quaternion layout, align gripper semantics with dataset convention (1 = open, 0 = closed). This fixes policy/env mismatches that previously caused incorrect IK targets and unstable behavior at evaluation time. Docs: Add a full docs/source/vlabench.mdx page aligned with the standard benchmark template. Document task selection forms (single task, comma list, suite shortcut). Document installation, evaluation, training, and result reproduction. Point examples at lerobot/smolvla_vlabench. Add a benchmark banner image. Remove outdated / misleading references to upstream evaluation tracks. Document manual install flow instead of a broken vlabench extra. Packaging cleanup: Remove the unresolvable vlabench extra from pyproject.toml. Remove the no-op VLABench processor step. Remove the obsolete env unit test that only covered the dropped gripper remap helper. Apply formatting / logging / style cleanup from review feedback. Why this is needed: VLABench is not currently consumable as a normal Python dependency and requires several upstream workarounds: no PyPI release, missing package declarations, undeclared runtime deps, SSH-only submodule references, asset downloads outside normal package install flow, registry population that depends on import side effects, env outputs that do not always match declared observation shapes, task resets that can diverge under some random layouts. This PR makes the benchmark usable in LeRobot despite those constraints, and ensures CI runs are reproducible and informative. If you want a much shorter squash commit message, I’d use this: feat(sim): integrate VLABench benchmark with CI, Docker, and docs Add VLABench as a new LeRobot simulation benchmark, following the existing LIBERO / MetaWorld patterns. This includes: LeRobot env integration and task exposure, CI smoke eval with lerobot/smolvla_vlabench, Docker install and asset-download fixes, runtime fixes for registry loading, assets, camera obs, action handling, dm_control IK, and PhysicsError recovery, alignment of obs/action semantics with HF VLABench datasets, docs and packaging cleanup. The PR also incorporates review feedback, improves reproducibility by pinning upstream commits, and makes VLABench usable in CI despite upstream packaging and asset-management issues.	2026-04-21 17:54:11 +02:00
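
Commit 7c10c4fcdd describes a three-step JSON recovery (strip <think> blocks, try json.loads directly, then scan for the first balanced {...} substring while ignoring braces inside strings), and commit 916b419af3 adds that a persistent parse failure returns None instead of raising. The sketch below illustrates that sequence only. It is not the repository's code, and the names `extract_json` and `first_balanced_object` are hypothetical.

```python
import json
import re

# Matches <think>...</think> blocks, including multi-line ones.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def first_balanced_object(text):
    """Return the first balanced {...} substring, ignoring braces inside JSON strings."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False          # character after a backslash is literal
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # This opening brace never closed; try the next one.
        start = text.find("{", start + 1)
    return None


def extract_json(raw):
    """Hypothetical helper: parse model output into JSON, or return None on failure."""
    text = _THINK_RE.sub("", raw).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    candidate = first_balanced_object(text)
    if candidate is None:
        return None
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


reply = '<think>maybe {"x": 0}</think>Here you go: {"subtasks": ["pick {cup}", "place"]} done.'
print(extract_json(reply))
```

Returning None rather than raising lets a caller skip one failed VLM call without aborting the episode, which is the behavior the later commit describes.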

1455 Commits
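
Commits 47f2ea17bb and 327ce89423 describe how replicas are laid out: each replica listens on serve_port+i and is pinned through CUDA_VISIBLE_DEVICES, with num_gpus=0 keeping one replica per GPU and a positive num_gpus assigning replicas round-robin. A minimal sketch of that assignment rule follows. The function name `plan_replicas`, its return shape, and the default port of 8000 are assumptions made for illustration, not the project's API.

```python
def plan_replicas(parallel_servers, num_gpus=0, serve_port=8000):
    """Hypothetical helper: compute the port and GPU pin for each replica.

    num_gpus == 0 mirrors the described legacy behavior (replica i on GPU i);
    num_gpus > 0 assigns replicas to GPUs round-robin.
    """
    plans = []
    for i in range(parallel_servers):
        gpu = i % num_gpus if num_gpus > 0 else i
        plans.append({
            "port": serve_port + i,
            "env": {"CUDA_VISIBLE_DEVICES": str(gpu)},
        })
    return plans


for plan in plan_replicas(parallel_servers=4, num_gpus=2):
    print(plan["port"], plan["env"]["CUDA_VISIBLE_DEVICES"])
```

With parallel_servers=4 and num_gpus=2 the replicas land on GPUs 0, 1, 0, 1, matching the example given in the commit message.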