chore(deps): allow torch 2.11/2.12 and fix autocast deprecation (#3435 )

* chore(deps): allow torch 2.11/2.12 and fix autocast deprecation - Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26), and torchcodec to <0.13 (was <0.11) to allow installs against the latest stable torch 2.11 and the upcoming 2.12 line. - Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda") in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+). - Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130, torchcodec 0.11.1, full CUDA 13 stack). Verified locally with `uv sync --locked` from a clean .venv and the lerobot test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is identical to the pre-bump baseline: 18 pre-existing failures (test_sac_policy*, test_pi0_rtc*, test_pi05_rtc*, test_replay_buffer*), 0 new, 0 fixed. AI assistance: this change was authored with Claude Code per AI_POLICY.md. * fix(policies): use device-agnostic autocast dtype lookup Pass query_states.device.type to torch.get_autocast_dtype() instead of hardcoding 'cuda', so the cast matches the active autocast context when running under CPU/MPS/XPU autocast. --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
chore(dependencies): update uv.lock (#3475 )
2026-05-23 12:40:08 +00:00 · 2026-05-10 13:05:35 +02:00 · 2026-05-10 12:24:22 +02:00 · 2026-05-10 11:49:31 +02:00 · 2026-05-08 20:27:01 +02:00 · 2026-05-07 17:52:34 +02:00
14 changed files with 667 additions and 559 deletions
@@ -382,6 +382,7 @@ jobs:
                --policy.path=\"\$ROBOTWIN_POLICY\" \
                --env.type=robotwin \
                --env.task=\"\$ROBOTWIN_TASKS\" \
+                --env.max_parallel_tasks=5 \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
@@ -482,6 +483,7 @@ jobs:
                --policy.path=lerobot/smolvla_robocasa \
                --env.type=robocasa \
                --env.task=CloseFridge,OpenCabinet,OpenDrawer,TurnOnMicrowave,TurnOffStove,CloseToasterOvenDoor,SlideDishwasherRack,TurnOnSinkFaucet,NavigateKitchen,TurnOnElectricKettle \
+                --env.max_parallel_tasks=5 \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
@@ -693,6 +695,7 @@ jobs:
                --env.task=\"\$ROBOMME_TASKS\" \
                --env.dataset_split=test \
                --env.task_ids=[0] \
+                --env.max_parallel_tasks=5 \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
@@ -800,6 +803,7 @@ jobs:
                --env.type=libero_plus \
                --env.task=\"\$LIBERO_PLUS_SUITE\" \
                --env.task_ids=\"\$LIBERO_PLUS_TASK_IDS\" \
+                --env.max_parallel_tasks=5 \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
@@ -900,6 +904,8 @@ jobs:
                --policy.path=lerobot/smolvla_vlabench \
                --env.type=vlabench \
                --env.task=select_fruit,select_toy,select_book,select_painting,select_drink,select_ingredient,select_billiards,select_poker,add_condiment,insert_flower \
+                --env.episode_length=50 \
+                --env.max_parallel_tasks=5 \
                --eval.batch_size=1 \
                --eval.n_episodes=1 \
                --eval.use_async_envs=false \
@@ -19,8 +19,8 @@ on:
  workflow_dispatch:

  # Runs at 02:00
-  schedule:
-    - cron: "0 2 * * *"
+  # schedule:
+  #   - cron: "0 2 * * *"

 env:
  CLOSE_ISSUE_MESSAGE: >
@@ -59,8 +59,8 @@ keywords = ["lerobot", "huggingface", "robotics",  "machine learning", "artifici

 dependencies = [
    # Core ML
-    "torch>=2.7,<2.11.0",
-    "torchvision>=0.22.0,<0.26.0",
+    "torch>=2.7,<2.13.0",
+    "torchvision>=0.22.0,<0.28.0",
    "numpy>=2.0.0,<2.3.0", # NOTE: Explicitly listing numpy helps the resolver converge faster. Upper bound imposed by opencv-python-headless.
    "opencv-python-headless>=4.9.0,<4.14.0",
    "Pillow>=10.0.0,<13.0.0",
@@ -99,7 +99,7 @@ dataset = [
    "pandas>=2.0.0,<3.0.0", # NOTE: Transitive dependency of datasets
    "pyarrow>=21.0.0,<30.0.0", # NOTE: Transitive dependency of datasets
    "lerobot[av-dep]",
-    "torchcodec>=0.3.0,<0.11.0; sys_platform != 'win32' and (sys_platform != 'linux' or (platform_machine != 'aarch64' and platform_machine != 'arm64' and platform_machine != 'armv7l')) and (sys_platform != 'darwin' or platform_machine != 'x86_64')", # NOTE: Windows support starts at version 0.7 (needs torch==2.8), ffmpeg>=8 support starts at version 0.8.1 (needs torch==2.9), system-wide ffmpeg support starts at version 0.10 (needs torch==2.10).
+    "torchcodec>=0.3.0,<0.13.0; sys_platform != 'win32' and (sys_platform != 'linux' or (platform_machine != 'aarch64' and platform_machine != 'arm64' and platform_machine != 'armv7l')) and (sys_platform != 'darwin' or platform_machine != 'x86_64')", # NOTE: Windows support starts at version 0.7 (needs torch==2.8), ffmpeg>=8 support starts at version 0.8.1 (needs torch==2.9), system-wide ffmpeg support starts at version 0.10 (needs torch==2.10), 0.11 needs torch==2.11, 0.12 needs torch==2.12.
    "jsonlines>=4.0.0,<5.0.0",
 ]
 training = [
@@ -256,7 +256,9 @@ class TrainPipelineConfig(HubMixin):
                ) from e

        cli_args = kwargs.pop("cli_args", [])
-        if config_file is not None:
+        # Legacy RA-BC migration only applies to framework-saved checkpoints (always JSON).
+        # Hand-written YAML/TOML configs are expected to use the current sample_weighting schema.
+        if config_file is not None and config_file.endswith(".json"):
            with open(config_file) as f:
                config = json.load(f)
            migrated_config = _migrate_legacy_rabc_fields(config)
@@ -100,8 +100,8 @@ class DiffusionConfig(PreTrainedConfig):

    # Inputs / output structure.
    n_obs_steps: int = 2
-    horizon: int = 16
-    n_action_steps: int = 8
+    horizon: int = 64
+    n_action_steps: int = 32

    normalization_mapping: dict[str, NormalizationMode] = field(
        default_factory=lambda: {
@@ -122,10 +122,10 @@ class DiffusionConfig(PreTrainedConfig):
    crop_ratio: float = 1.0
    crop_shape: tuple[int, int] | None = None
    crop_is_random: bool = True
-    pretrained_backbone_weights: str | None = None
-    use_group_norm: bool = True
+    pretrained_backbone_weights: str | None = "ResNet18_Weights.IMAGENET1K_V1"
+    use_group_norm: bool = False
    spatial_softmax_num_keypoints: int = 32
-    use_separate_rgb_encoder_per_camera: bool = False
+    use_separate_rgb_encoder_per_camera: bool = True
    # Unet.
    down_dims: tuple[int, ...] = (512, 1024, 2048)
    kernel_size: int = 5
@@ -97,8 +97,8 @@ class VQBeTConfig(PreTrainedConfig):
    vision_backbone: str = "resnet18"
    crop_shape: tuple[int, int] | None = (84, 84)
    crop_is_random: bool = True
-    pretrained_backbone_weights: str | None = None
-    use_group_norm: bool = True
+    pretrained_backbone_weights: str | None = "ResNet18_Weights.IMAGENET1K_V1"
+    use_group_norm: bool = False
    spatial_softmax_num_keypoints: int = 32
    # VQ-VAE
    n_vqvae_training_steps: int = 20000
@@ -939,7 +939,7 @@ class Qwen2_5_VLFlashAttention2(Qwen2_5_VLAttention):
        input_dtype = query_states.dtype
        if input_dtype == torch.float32:
            if torch.is_autocast_enabled():
-                target_dtype = torch.get_autocast_gpu_dtype()
+                target_dtype = torch.get_autocast_dtype(query_states.device.type)
            # Handle the case where the model is quantized
            elif hasattr(self.config, "_pre_quantization_dtype"):
                target_dtype = self.config._pre_quantization_dtype
@@ -985,7 +985,7 @@ class Florence2FlashAttention2(Florence2Attention):
        input_dtype = query_states.dtype
        if input_dtype == torch.float32:
            if torch.is_autocast_enabled():
-                target_dtype = torch.get_autocast_gpu_dtype()
+                target_dtype = torch.get_autocast_dtype(query_states.device.type)
            # Handle the case where the model is quantized
            elif hasattr(self.config, "_pre_quantization_dtype"):
                target_dtype = self.config._pre_quantization_dtype
@@ -46,7 +46,7 @@ class LeKiwiConfig(RobotConfig):
    cameras: dict[str, CameraConfig] = field(default_factory=lekiwi_cameras_config)

    # Set to `True` for backward compatibility with previous policies/dataset
-    use_degrees: bool = False
+    use_degrees: bool = True


@dataclass
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:54aecbc1af72a4cd5e9261492f5e7601890517516257aacdf2a0ffb3ce281f1b
+oid sha256:51effd76b73e972f10d31f5084ab906386134b600c87b2668767d30232a902bd
 size 992
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:88a9c3775a2aa1e90a08850521970070a4fcf0f6b82aab43cd8ccc5cf77e0013
-size 47424
+oid sha256:d4d7a16ca67f9adefac0e0620a7b2e9c822f2db42faaaced7a89fbad60e5ead4
+size 47680
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:91a2635e05a75fe187a5081504c5f35ce3417378813fa2deaf9ca4e8200e1819
+oid sha256:796c439ee8a64bf9901ff8325e7419bda8bd316360ee95e6304e8e1ae0f4c36c
 size 68
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:645bff922ac7bea63ad018ebf77c303c0e4cd2c1c0dc5ef3192865281bef3dc6
-size 47424
+oid sha256:ad33a8b47c39c2e1374567ff9da43cdb95e2dbe904c1b02a35051346d3043095
+size 47680
Author	SHA1	Message	Date
Anthony Shoumikhin	1f7b03f5f2	chore(deps): allow torch 2.11/2.12 and fix autocast deprecation (#3435 ) * chore(deps): allow torch 2.11/2.12 and fix autocast deprecation - Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26), and torchcodec to <0.13 (was <0.11) to allow installs against the latest stable torch 2.11 and the upcoming 2.12 line. - Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda") in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+). - Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130, torchcodec 0.11.1, full CUDA 13 stack). Verified locally with `uv sync --locked` from a clean .venv and the lerobot test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is identical to the pre-bump baseline: 18 pre-existing failures (test_sac_policy, test_pi0_rtc, test_pi05_rtc, test_replay_buffer), 0 new, 0 fixed. AI assistance: this change was authored with Claude Code per AI_POLICY.md. * fix(policies): use device-agnostic autocast dtype lookup Pass query_states.device.type to torch.get_autocast_dtype() instead of hardcoding 'cuda', so the cast matches the active autocast context when running under CPU/MPS/XPU autocast. --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-10 13:05:35 +02:00
Steven Palma	cb8edf17e6	chore(dependencies): update uv.lock (#3475 )	2026-05-10 12:24:22 +02:00
Steven Palma	5699f6cbf4	chore(ci): disable auto-stale (#3550 )	2026-05-10 11:49:31 +02:00
masato-ka	0e6114ac36	fix(train): restrict legacy RA-BC migration to JSON checkpoints only (#3490 ) * fix(train): restrict legacy RA-BC migration to JSON checkpoints only _migrate_legacy_rabc_fields was called for all config files, causing json.load to raise DecodeError when a YAML/TOML config was passed to lerobot-train for a new training run. Guard the block with an .endswith(".json") check so migration only runs when resuming from a JSON checkpoint.	2026-05-08 20:27:01 +02:00
Steven Palma	c8ce413d73	fix(robots): allign lekiwi default with so100 use_degrees (#3531 )	2026-05-07 17:52:34 +02:00
Pepijn	82dffde7fa	fix(ci): speed up multi-task benchmark evals (parallelize + cap VLABench steps) (#3529 ) * fix(ci): run multi-task benchmark evals 5-at-a-time in parallel The eval script supports running tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus — 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra) are unchanged. * fix(ci): cap VLABench smoke eval at 50 steps per task VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s the smoke eval took ~80 minutes of rollouts on top of the image build. The eval is a pipeline smoke test (running_success_rate stays at 0% on this short rollout anyway), so we don't need full episodes — cap each task at 50 steps to bring total rollout time down ~10x. * fix(ci): run VLABench tasks 5-at-a-time in parallel The eval script already supports running multiple tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10 VLABench tasks finish in ~2 waves instead of running sequentially.	2026-05-07 13:37:16 +02:00
Ville Kuosmanen	eaf0218bc8	feat(policy): use pretrained vision encoder weights by default for diffusion and vqbet (#3202 ) * feat: add pretrained vision encoder weights for diffusion and vqbet * fix test by re-generating artifacts --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-07 12:10:38 +02:00