Remove Qwen2VL, Qwen3VL, Qwen35VL in favor of one QwenVL class that
uses AutoModelForImageTextToText and works with the whole Qwen VL
family. Moves shared _parse_skills_response to BaseVLM and extracts
_build_messages/_prepare_inputs/_decode helpers to reduce duplication.
Made-with: Cursor
The Qwen3VLProcessor distributes kwargs to sub-processors via
_merge_kwargs. Flat kwargs like video_metadata and do_sample_frames
were not reaching the video processor, causing fps to default to 24
and producing shape mismatches.
Pass these kwargs explicitly under videos_kwargs so they reach
Qwen3VLVideoProcessor directly. Revert Qwen2VL to its simpler
original approach since its processor doesn't use videos_kwargs.
Made-with: Cursor
The Qwen3.5 processor requires video_metadata as a separate parameter,
not embedded in the video tensors. Use return_video_metadata=True from
process_vision_info, then unpack the (tensor, metadata) tuples into
separate videos and video_metadata lists for the processor call.
Made-with: Cursor
The return_video_metadata=True approach causes 'list index out of range'
due to (tensor, metadata) tuple format issues. Since all extracted
videos are at 1fps (ffmpeg -r 1), directly pass fps=1.0 as a scalar
alongside do_sample_frames=False — this gives the processor the exact
fps for position embedding computation without format compatibility
issues across Qwen processor versions.
Made-with: Cursor
The Qwen3.5 processor needs video_metadata (fps, frame indices) to
compute temporal position embeddings. Use return_video_metadata=True
which embeds metadata inside the video tensors as (tensor, metadata)
tuples, and return_video_kwargs=True which returns {'do_sample_frames':
False} without the problematic fps list.
Made-with: Cursor
The Qwen3.5 processor expects fps as a scalar, not a list, so passing
video_kwargs with fps=[...] fails validation. Since process_vision_info
already handles frame sampling, we only need do_sample_frames=False to
tell the processor to use the pre-sampled frames as-is.
Made-with: Cursor
The Qwen processor needs fps metadata (via return_video_kwargs=True)
to compute correct temporal position embeddings. Without it, the
processor defaults to fps=24 regardless of the actual video fps,
causing shape mismatches between expected and actual video tokens.
Made-with: Cursor
Replace manual cv2 frame reading with FORCE_QWENVL_VIDEO_READER=torchvision
env var. The torchvision backend (PyAV) properly reads video metadata and
respects the fps parameter, avoiding the torchcodec fps=24 default issue.
Made-with: Cursor
When torchcodec is installed, qwen-vl-utils ignores the fps parameter
and defaults to 24fps if video metadata is missing, causing shape
mismatches. Fix by reading video frames directly as PIL images and
passing them to the processor, bypassing torchcodec entirely.
Made-with: Cursor
The single-episode `segment_skills` method was missing
`enable_thinking=False` in `apply_chat_template`, causing the model to
output reasoning traces instead of JSON when the batch path fails and
falls back to per-episode processing.
Made-with: Cursor
Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
* chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label
* chore(task): renamming the default index label in the tasks DataFrame to task
* Revert "chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label"
This reverts commit f55de3255278f23f18b5d955565f6768d094951d.
* chore(docstrings): updating docstrings to match dataset v3.0 architecture
* chore(format): formatting code
* Fixing metadata indexing when writing new Parquet file
Summary:
- addressing this issue: https://github.com/huggingface/lerobot/issues/2401
- vibe-coded bugfix by Claude Sonnet 4.5
* Backing out changes to convert_videos_of_camera
* Addressing Ruff pre-commit complaint
Summary:
- addressing "SIM113 Use `enumerate()` for index variable `ep_idx` in `for` loop"
---------
Co-authored-by: Paul <238953601+pac-robotics@users.noreply.github.com>
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots agrument for the merge operation
* chore(clean): cleaning up code
* chore(doctrings): updating doctrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(doctrings): updating docstrings + fix typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(dosctrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in doctrings
* fix(frame_index): making rerun's "frame_index" timeline compatible with behaviour1k datasets
* fix(segfault risk): removing segfault risk by calling batch["index"] in the dataloader loop
* feat(async-inference) Try using async inference server with plugins
* Fix import
* Fix import error in Robot Client
---------
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* Fix SmolVLA meta tensor error by removing device_map
- Remove device_map parameter from VLM model loading
- Change torch_dtype from string to torch.bfloat16
- Add explicit .to(device) calls after initialization
This resolves NotImplementedError when training SmolVLA policy.
Fixes meta tensor copy issue in factory.py:418.
* fix: remove manual device movement logic and fix dtype handling
---------
Co-authored-by: Highsky7 <albert31115@gmail.com>
* add OpenArm Mini config and module init
* add OpenArm Mini teleoperator implementation
* add OpenArm Mini into factory and setup motors
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Replaced assert statements with FrameTimestampError exceptions in
decode_video_frames_torchvision and decode_video_frames_torchcodec.
Assertions are unsuitable for runtime validation because they can be
silently disabled with python -O, and they produce unhelpful
AssertionError tracebacks. The codebase already defines
FrameTimestampError for this exact purpose but it was only used
in one of the three validation sites.
Also removed AssertionError from the except clause in
LeRobotDataset.__init__, which was masking video timestamp errors
by silently triggering a dataset re-download instead of surfacing
the actual problem.
1. Include metaworld_config.json in package distributions by adding it to
both MANIFEST.in (for sdist) and pyproject.toml package-data (for wheels).
Without this, pip-installed lerobot raises FileNotFoundError when
importing the metaworld environment.
2. Fix crash in sanity_check_dataset_name where the error message accesses
policy_cfg.type when policy_cfg is None, raising AttributeError instead
of the intended ValueError.
Fixes#2958
* fix(dataset): Reindex videos based on frame and not on time
Sometimes during split operations the frame timestamp floating
precision leads to frame ending up in the wrong split.
This changes fixes the issues by directly working with frame indices
instead.
* Fix formatting
* feat(motors): add initial implementation of robstride
Co-authored-by: Virgile <virgilebatto@gmail.com>
* chore(motors): solve some linter
* remove kp/kd attribute
* code uniformisation between damiao and robstride
* remove normalization warning
* remove non valid baudrates and small docstring update
* remove all useless files. Only keeping robstride.py and table.py
* typing for mypy
* reduce NameOrId usage
* align signature with damiao
* put the same helper than in the damiao implementation
* bug correction : expect a response after each bus.send
---------
Co-authored-by: Virgile <virgilebatto@gmail.com>
* Add GymHILAdapterProcessorStep for gym-hil environment integration
* Fix action features in control loop for None teleop device with gym-hil
* Finalize dataset before pushing to hub for visualization on the hub
* Fix neutral action for gripper
* fix pre-commit