lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-01 15:17:05 +00:00

Author	SHA1	Message	Date
Khalil Meftah	f581adb3be	Merge branch 'main' into feat/add-recap	2026-06-30 16:22:18 +02:00
Khalil Meftah	2f2b567951	Enable MolmoAct2 rollout on SO-100/101 with calibration correction (#3879 ) * fix(rollout): improve visual feature mismatch error with --rename_map hint * feat(policies): add joint frame transform and hardware deployment docs for MolmoAct2 Add MolmoAct2StateFrameTransformStep and MolmoAct2ActionFrameTransformStep processor steps for cross-calibration compatibility on SO-100/101. Add joint_signs and joint_offsets config fields. Add hardware deployment section to molmoact2.mdx with camera naming convention, joint frame correction, and safety guidance. * chore(docs): address PR comment * fix: address reviewer comments	2026-06-29 18:52:59 +02:00
Maxime Ellerbach	18eee1b477	refactor(vla-jepa): removing gpu roundtrip (#3750) * refactor(vla-jepa): removing gpu roundtrip for the preprocessing part * major refactor of the forward pass and model input conversion * linting * addressing suggestions from reviews * removing redundant state dtype conversion * avoiding recreating the same tensor each forward pass * api simplification of `_encode_qwen` * avoiding useless video assembly during inference * guard against video=None for the wm loss	2026-06-29 18:50:04 +02:00
Nicolas Rabault	5ac3b49a5f	feat(train): run training remotely on HF Jobs via --job.target (#3856 ) * feat(train): add JobConfig group, save_checkpoint_to_hub flag, Hub checkpoint helper Introduce a JobConfig draccus group on TrainPipelineConfig (--job.target/image/ timeout/detach/tags) whose is_remote property gates remote dispatch, plus a save_checkpoint_to_hub flag and validation. Add push_checkpoint_to_hub(), which uploads a saved checkpoint directory to the model repo under checkpoints/<step>/ and creates the repo idempotently (private propagates from policy.private). * feat(train): run training remotely on HF Jobs via --job.target When --job.target names a GPU flavor, train() dispatches to lerobot.jobs.submit_to_hf instead of training locally: it authenticates, ensures the dataset is on the Hub (pushing a local-only one privately), serializes a pod-compatible train_config.json (strips client-only fields, points at the model repo), submits via HfApi.run_job with HF_TOKEN/WANDB_API_KEY secrets, then streams logs and finishes when the model is pushed. Wires push_checkpoint_to_hub into the training loop behind save_checkpoint_to_hub, and tags jobs/datasets/model with 'lerobot' + --job.tags. * docs(train): document remote training on HF Jobs * test(train): skip remote-dispatch tests without the dataset extra The module imports lerobot.scripts.lerobot_train, which eagerly pulls in lerobot.datasets (dataset extra). The base fast-test CI tier runs without that extra, so collection failed there. Guard with pytest.importorskip, matching the existing tests/scripts dataset-extra tests. * refactor(jobs): hoist huggingface_hub imports to module level in hf.py huggingface_hub is a core dependency, so the per-function dynamic imports had no lazy-loading rationale. Move them to a single module-level import and update test monkeypatch targets to lerobot.jobs.hf.* accordingly. 
* refactor(jobs): build remote config dict via cfg.to_dict() TrainPipelineConfig.to_dict() already returns the canonical draccus encoding, so the StringIO + draccus.dump + json.loads round-trip was redundant. Use it directly and drop the now-unused io/draccus imports. * refactor(train): use module-level HfApi import in push_checkpoint_to_hub huggingface_hub is a core dependency; the in-function import was unnecessary. Move HfApi to a module-level import and point the test monkeypatches at lerobot.common.train_utils.HfApi. * refactor(configs): export JobConfig from the configs package Re-export JobConfig in lerobot/configs/__init__.py so external callers import it as `from lerobot.configs import JobConfig`, matching the other config classes. Adapt the train script and test imports. * refactor(jobs): check dataset presence with api.repo_exists Replace the dataset_info try/except RepositoryNotFoundError dance with a direct api.repo_exists(repo_id, repo_type="dataset") call, dropping the httpx/RepositoryNotFoundError test scaffolding. * chore(jobs): annotate ensure_dataset_available api param as HfApi Add the missing HfApi type hint via a TYPE_CHECKING import. * refactor(jobs): use HF_LEROBOT_HOME constant for the local cache root Resolve the local dataset cache via lerobot.utils.constants.HF_LEROBOT_HOME instead of re-reading the env var by hand, dropping the os/Path imports. Tests now patch the imported constant and assert on a stable message substring (the previous "neither" match only passed by accident, matching the test name embedded in the pytest tmp_path). * chore(jobs): guard LeRobotDataset import with require_package Surface a clear "install lerobot[dataset]" error if the datasets extra is missing, instead of a raw ImportError, before pushing a local dataset. 
* docs(configs): clarify the is_remote_target/is_remote split Add a comment explaining why JobConfig keeps both the staticmethod (tests a raw target string from argv before a config exists) and the property (accessor for an existing config instance). * docs(train): note how to pin a pushed model version for inference Document --policy.pretrained_revision alongside --policy.path so a specific Hub-pushed checkpoint (once --save_checkpoint_to_hub has committed several) can be selected for inference. * test(jobs): skip dataset import guard in base-deps test The fast test env installs base deps only, so require_package('datasets') raised ImportError before the mocked lerobot.datasets import was reached. Monkeypatch the guard to a no-op so the unit test exercises the upload logic. * fix(jobs): address claude review findings on remote training Resolve the claude[bot] review on #3856: - Reject reward-model training under --job.target with a clear error instead of crashing on a None policy inside build_remote_config_file. - Support --policy.path remote runs: validate() no longer requires repo_id for remote runs (it is auto-generated in submit_to_hf), and repo_id/push_to_hub are now set after validate() resolves the policy. - Narrow the bare `except Exception` in _tail_logs/_poll_until_done to (OSError, httpx.HTTPError) so programming errors surface instead of being silently retried or counted as job failures. - Install the SIGINT detach handler only on the main thread. - Generate model repo timestamps in UTC. * docs(jobs): document the model-pushed marker contract and orphaned repos Follow-up to the claude[bot] review on #3856 (non-blocking observations): - Cross-reference the "Model pushed to <url>" log line between its producer (PreTrainedPolicy.push_model_to_hub) and the remote-run consumer in submit_to_hf, noting the contract is an early-finish optimization that falls back to status polling if it drifts. 
- Note in the HF Jobs guide that a failed remote run leaves its model repo on the Hub (it is not auto-deleted) and how to remove it. * feat(train): tag each pushed checkpoint with its step Address review feedback on #3856: pushing a checkpoint to the Hub now also creates a tag named after the checkpoint step, so a checkpoint can be recovered with --policy.pretrained_revision=<step> instead of having to look up its commit sha. * fix(jobs): hoist ensure_dataset_available to a module-level import Addresses Caroline's review comment on PR #3856: the local import of ensure_dataset_available inside submit_to_hf was vestigial. dataset.py does not import hf.py, so there is no circular-import risk and no extra load cost (its heavy deps stay lazy), so make it a top-level import. * refactor(configs): untangle config_path/resume resolution in validate() Split the re-parse HACK block in TrainPipelineConfig.validate() into focused helpers (_resolve_pretrained_from_cli, _resolve_resume_checkpoint) that handle the policy path, reward-model path, and resume config_path as separate, readable units. Behavior-preserving. * feat(train): resume training from a Hub checkpoint Allow --config_path to be a Hub repo id when resuming, not only a local path. The latest checkpoint under checkpoints/<step>/ is downloaded into a fresh local run dir and resumed from there (optimizer, scheduler, RNG and data order restored as for a local resume). TrainPipelineConfig.from_pretrained falls back to the latest checkpoint's train_config.json when a repo has no root config (an interrupted run that only pushed checkpoints). The download is skipped when dispatching remotely so the executor (local machine or HF Jobs pod) performs it. 
- add find_latest_hub_checkpoint (utils/hub) and resolve_resume_checkpoint (common/train_utils), the symmetric download counterpart to push_checkpoint_to_hub - unit tests for both helpers and the from_pretrained fallback * feat(jobs): resume a run on HF Jobs from a checkpoint When --resume is set with a remote --job.target, submit_to_hf resumes from the checkpoint repo instead of staging a fresh config. A Hub config_path is resumed in place (its checkpoint config already targets that repo); a local config_path has its checkpoint uploaded to a new private repo first and the run is forced to push back to it. The pod command carries --job.target=local so the checkpoint's saved job.target can't make the pod re-dispatch itself, and the user's CLI overrides are forwarded so a remote resume matches the same local command. ensure_dataset_available is hoisted before the resume/fresh branch since it applies to both. * docs(train): document resuming from a Hub checkpoint, locally and on jobs Show that --config_path accepts a Hub repo id for --resume, and that adding --job.target resumes on HF Jobs (uploading a local checkpoint/dataset first). * fix(jobs): default remote job timeout to 2d instead of the platform default HF Jobs applies its own short 30-minute timeout when none is sent, which silently kills long training runs. Pass an explicit, generous 2d cap by default; users can still override --job.timeout to fail fast or extend it. * fix(jobs): drop --dataset.root on resume + restore keyboard-control docs Address the latest Claude review on #3856: - _build_resume_job no longer forwards --dataset.root to the pod (a host-local path it can't read); the fresh-run path already nulls it in build_remote_config_file, so this makes resume consistent. Add a unit test for _pod_forwarded_args covering the drop in both flag forms. 
- Restore the display-independent keyboard-control docs (n/r/q letter equivalents + X11/Wayland/headless Tip) in il_robots.mdx that this branch was stale on relative to main (#3875). * fix(jobs): handle str-typed job stage from huggingface_hub inspect_job's status.stage is an enum (with .value) in some huggingface_hub versions and a plain str in others. The poller assumed the enum shape, raising "'str' object has no attribute 'value'" on resume for users on the str-returning version. Read it via getattr(..., "value", ...) so both shapes work, and parametrize the poll test over enum and str stages so the str case is actually exercised (the old mock only ever simulated the enum). * refactor(jobs): use relative import for ensure_dataset_available * refactor(train): hoist submit_to_hf import to module top The `from lerobot.jobs import submit_to_hf` was a function-local import in train(); it pulls no heavy/optional deps and has no circular-import risk, so move it to the top-level import block. * refactor(train): hoist _remote_target_in_argv imports to module top Move `import sys` and `from lerobot.configs import JobConfig` out of the function body and into the top-level import block. * refactor(utils): use relative import for sibling constants in hub.py `from lerobot.utils.constants import CHECKPOINTS_DIR` was the odd one out in utils/ — sibling modules there are imported relatively (.constants, .errors, .utils, ...). Match that convention. * refactor(jobs): hoist LeRobotDataset import, guard dataset extra at package init Move the `from lerobot.datasets import LeRobotDataset` import to the top of dataset.py and relocate the `require_package("datasets", extra="dataset")` guard to the jobs package __init__, per review feedback. * test(jobs): skip test_hf if datasets extra is missing lerobot.configs.train pulls in datasets at import time, so the module fails to collect without lerobot[dataset]. 
Guard with importorskip, matching the convention in tests/training/test_multi_gpu.py. * test(jobs): skip test_dataset if datasets extra is missing tests/jobs/test_dataset.py imports lerobot.jobs.dataset, which triggers the require_package("datasets") guard in lerobot/jobs/__init__.py at import time. Without lerobot[dataset] the module fails to collect in the base CI tier. Guard with importorskip, same as test_hf.py.	2026-06-29 17:59:33 +02:00
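The str-versus-enum job stage handling described in the #3856 entry above (reading the stage via `getattr(..., "value", ...)`) can be sketched as follows. This is a minimal illustration, not the lerobot implementation: the `Stage` enum and the `stage_name` helper are hypothetical names introduced here.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stand-in for an enum-typed job stage."""

    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"


def stage_name(stage):
    """Return the stage as a plain string for both enum and str inputs (hypothetical helper)."""
    # Enum members expose `.value`; a plain str has no such attribute,
    # so getattr falls back to the object itself.
    return getattr(stage, "value", stage)
```

Both `stage_name(Stage.RUNNING)` and `stage_name("RUNNING")` give the same string, which is why a test parametrized over both shapes exercises the fallback.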
Caroline Pascal	a5821a01a2	feat(dependencies): bump rerun-sdk to `<0.34.0` (#3763) * Update upper bound to latest rerun-sdk * chore(update): update rerun logging to use the latest features * chore(format): formatting code * feat(features names and color): improving features names and display colors when replaying an episode * feat(blueprints): switching to blueprints for backwards (and forward) compatibility * feat(blueprints): switching to blueprints for backwards (and forward) compatibility * feat(grid): Leveraging rerun's automatic grid arrangement for improved layout * test(update): update tests * chore(colors): removing unreliable colors * chore(simplification): removing no longer needed reshape * chore(imports): cleaning up imports * fix(claude): claude reviews * chore(dependencies): update rerun ceil version * chore(scripts): recover comments * chore(utils): add guard for blueprint * fix(test): style check * fix(deps): typo bound --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: ntjohnson1 <24689722+ntjohnson1@users.noreply.github.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Steven Palma <steven.palma@huggingface.co>	2026-06-29 17:28:06 +02:00
Caroline Pascal	3dd19d043e	feat(depth maps): adding support for depth in LeRobot (#3644 ) * feat(depth): add depth quantization helpers and tests * feat(video): add ffv1 to supported codecs * feat(depth): persist depth metadata * feat(depth): extend quantization tools to better fit the encoding/decoding pipeline * feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter * feat(depth): wire StreamingVideoEncoder + writer to depth encoder * feat(depth): wire DatasetReader to decode_depth_frames * feat(cameras/realsense): expose async depth in metric meters * feat(features): route 2D camera shapes to observation.depth.<key> * feat(robots/so_follower): emit + populate depth keys when use_depth * feat(record): plumb DepthEncoderConfig through lerobot-record * feat(viz): render depth observations as rr.DepthImage in Viridis * feat(depth maps writer): adding support for raw depth maps recording with image writer * chore(format): format code * feat(depth shape): ensuring depth maps shape is always including the channel * feat(is_depth): simplifying is_depth nested name + legacy support * fix(stop_event): fixing stop_event race condition in camera classes * fix(plumbing): fixing missing parts in the depth maps pipeline * chore(typos): fixing typos * test(fix): fixing exisiting tests to still work with latest features * tests(depth): adding new tests for depth integration validation * feat(pix_fmt channels): use PyAv to check get pixel formats number of channels * feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation.Move the default pixel format for depth in the config file. 
* fix(pre-commit): fixing mutable defautl value * fix(info): fixing info metadata update when is_depth_map was set * tests(typos): fixing typos in tests * fix(realsense): fixing typo in realsense serial number * fix(normalization): restricting 255 normalization to non depth/uint8 images only * fix(typo): fixing typo * fix(TIFF): add missing quantization and cleanup for TIFF files * feat(batched dequantization): optimizing dequantize_depth for torch based batched dequantization * feat(tools): adding depth support in LeRobotDataset edition tools * test(aggregate): extending aggregation tests to depth frames * test(cleaning): cleaning up tests * fix(from_video_info): fixing early validation issue in from_video_info * fix(typo): fixing typo * fix(is_depth): adding missing doctrings and is_depth arguments in video decoding functions Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * fix(depth units): fixing depth units output for the realsense cameras * feat(output unit): adding support for output unit specification at dataset reading/training time Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * test(depth): cleaning up depth tests * test(depth encoding): updating and cleaning video/depth encoding tests * chore(format): formatting code * docs(depth): improving depth maps docs * test(fix): fixing depth tests * test(dataset tools): adding missing tests for new dataset edition tools features * chore(format): formatting code * fix(pyav check): fixing PyAV option validation for integer codec options by normalizing numeric values before calling `is_integer()` Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> * docs(mermaid): fixing mermaid diagram * fix(rebase): rebase follow up corrections * feat(dataset tools): adding missing docstrings and features for depth fill support in dataset edition tools * docs(docstring): updating docstrings * docs(dataset tools): updating docs * fix(save 
images): fixing image saving in dataset tools * fix(update video info): fixing update video info logic to match the recording and editing use cases * test(reencode): fixing reencoding monkeypatch * fix(review): add Claude review * chore(format): format code * fix(update video info): ditching the differentiated approahces for video info update - video info are always updated unless for preserved keys. * chore(rebase): fixing rebase merge conflicts * test(visualization): fixing visualization tests * feat(docstrings): adding explicit docstring for encoding parameters. Docstrigns will now show up as description in the CLI --help. * feat(mm as default): adding a global DEFAULT_DEPTH_UNIT variable setting mm as default depth unit * fix(RGB <-> camera): renaming camera_encoder to rgb_encoder for clarity * chore(TODO): removing deprecated TODO * doc(write_u16_plane): improving docstrings for write_u16_plane * feat(units): adding constants for depth frames units (m and mm) * fix(spam): replacing spamming warning but a debug log * feat(leagcy metadata): adding automatic metadata update for legacy 'video.is_depth_map' feature * fix(copy&reindex): fixing metadat reshaping for single channel frames * fix(ImageNet): excluding dpeth frames from ImageNet stats * fix(PyAV container seek): fixing initial PyAV container seek to be robust againsy codec choice * feat(lerobot-dataset-viz): adding support for depth in lerobot-dataset-viz * fix(compress): removing rerun compression for DepthImages * fix(signle channel squeeze): fixing single channel squeezing * chore(format): format code * fix(streaming): adding support for dequantization in streaming_dataset.py * refactor(read depth): factorizing depth reading methods for realsense camera and adding support for depth-only usage * chore(renaming): fixing missed RGBEncoderConfig renamings * docs(renaming): reflecting renamings in a clearer way in the docs * chore(annotation): excluding depth from the annotation pipeline * feat(robots): 
adding depth support in compatible follower robots * feat(LeSadKiwi): excluding LeKiwi from depth support (for now) * chore(fail): removing misplaced file * chore(fail): removing misplaced file * fix(remove ffv1): removing ffv1 as it does not support MP4 * docs(cheat sheet): adding depth and video encoding to the cheat sheet * fix(lossless): tuning depth encoding parameters for lossless depth storage * test(fix): fixing failing tests * depth(ZMQ): excluding ZMQ from depth support * Revert "depth(ZMQ): excluding ZMQ from depth support" This reverts commit `b95cf4e4c2`. * fix(image transforms): excluding depth frames from images transforms * fix(typo): typo * fix(stats): fixing stats computation for depth frames * fix(TIFF vs. pytorch): adding an extra uint16 to float32 conversion for depth maps stored as raw TIFF images * fix(typos): fixing typos * test(dtype): fixing stats computation typing tests --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Wensi (Vince) Ai <59036629+wensi-ai@users.noreply.github.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Wensi Ai <wsai@stanford.edu>	2026-06-27 14:21:21 +02:00
Khalil Meftah	6a788fbdb0	Add inline offline validation with train/eval split (#3824 ) * refactor(training): rename eval_freq to env_eval_freq - Rename eval_freq to env_eval_freq to distinguish sim environment evaluation from offline loss evaluation. * feat(training): add inline offline validation with train/eval split - Add eval_split config for balanced per-task holdout - Add eval_steps for periodic inline eval loss computation - Add max_eval_samples to cap eval cost * fix(datasets): remap absolute indices in __getitem__ for filtered datasets * fix(train): vectorize eval subset selection for max_eval_samples * fix(datasets): Move the remapping into EpisodeAwareSampler via absolute_to_relative_idx * fix(validation): add eval_split range check and eval_steps warning Validate eval_split is in [0.0, 1.0) to prevent garbage splits from out-of-range values. Raise when eval_steps > 0 but eval_split is 0.0 since no offline eval will run. * fix(train): prepare eval dataloader with accelerator for multi-GPU Prepare eval_dataloader through accelerator.prepare() so eval data is sharded across ranks instead of duplicated. Reduce eval_loss across ranks with mean reduction for consistent logging. * fix(test): rename eval_freq to env_eval_freq for multi-GPU training	2026-06-25 15:31:24 +02:00
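The validation rules stated in the #3824 entry above (`eval_split` must lie in [0.0, 1.0), and `eval_steps > 0` with `eval_split == 0.0` is rejected) can be sketched as below. The function name `validate_eval_settings` and the error messages are assumptions made for illustration; only the two field names come from the commit message.

```python
def validate_eval_settings(eval_split: float, eval_steps: int) -> None:
    """Hypothetical check mirroring the rules described in the commit message."""
    # Half-open range: 0.0 disables the holdout, 1.0 would leave no training data.
    if not 0.0 <= eval_split < 1.0:
        raise ValueError(f"eval_split must be in [0.0, 1.0), got {eval_split}")
    # Periodic offline eval needs a non-empty eval split to run on.
    if eval_steps > 0 and eval_split == 0.0:
        raise ValueError("eval_steps > 0 requires eval_split > 0.0")
```

Failing at configuration time keeps an out-of-range split from silently producing a meaningless train/eval partition later in the run.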
Khalil Meftah	c3f180e115	refactor(policies): clean MolmoAct2 to follow EO1/TOPReward patterns (#3724 ) Align the MolmoAct2 implementation with lerobot codebase conventions: - Rename hf_model/ to molmoact2_hf_model/ - Slim config: move all I/O and runtime logic to modeling - Remove blanket from 8 vendored files, fix 66 lint issues - Deduplicate _hf_token() and _resolve_checkpoint_location() - Make huggingface_hub imports lazy - Remove custom MolmoAct2CosineDecayWithWarmupSchedulerConfig, use base class - Extract 13 static/classmethods from MolmoAct2Policy to free functions - Replace print() with logger in vendored action_tokenizer - Add module docstrings, class docstring, and key method docstrings - Add module-level loggers to modeling and processor - Fix docs: pip to uv install, deduplicate README symlink - Remove shebangs from all files	2026-06-25 14:19:35 +02:00
Eric Chan	324086abc3	Update follower arm description in documentation (#3780) Signed-off-by: Eric Chan <hazzelnut@pm.me>	2026-06-25 13:58:08 +02:00
Steven Palma	b4e454c0ff	feat(utils): display-independent keyboard controls for recording (Wayland / headless / macOS) (#3875 ) * feat(utils): headless keyboard control * refactor(utils): consolidate keyboard listener creation * fix(rollout): remove import require guard for pynput --------- Co-authored-by: Leo Toff <leo@toff.dev> Co-authored-by: Stefano Maestri <stefano.maestri@javalinux.it> Co-authored-by: Sahil Chande <85823961+SahilChande@users.noreply.github.com> Co-authored-by: Vinayak Agarwal <63502278+Vinayak-Agarwal-2004@users.noreply.github.com> Co-authored-by: Abdul Rahim Mirani <abdulrahimmirani@gmail.com>	2026-06-25 10:58:39 +02:00
someone114514	508d18f8a1	Fix ACT policy type examples in docs (#3792)	2026-06-25 08:59:07 +02:00
Alexandre Edmond	536b9621b2	Fix pi0fast model id in docs (#3855)	2026-06-24 11:44:03 +02:00
Jiwen Cai	79d4976ae2	fix(deps): pin cmeel-urdfdom <5 and cmeel-tinyxml2 <11 in placo-dep (#3873 ) placo pulls in pin (Pinocchio), whose binary wheels dlopen specific cmeel sonames (liburdfdom_sensor.so.4.0, libtinyxml2.so.10) but declare only `>=` floors on their cmeel packages. The 2026-05-21 major bumps (cmeel-urdfdom 6.0.0 -> .so.6, cmeel-tinyxml2 11.0.0 -> .so.11) ship newer sonames, so left unpinned the resolver grabs them and `import placo` fails at load with "liburdfdom_sensor.so.4.0: cannot open shared object file". #3647 capped placo and hardened the kinematics import, but the guard only defers the failure: constructing RobotKinematics still raises. Pin the cmeel packages to the 4.x / 10.x ABI the placo/pin wheels are built against (there is no cmeel-urdfdom 5.x; <5 selects 4.x). Regenerated uv.lock with uv 0.8.0 to match CI; the only resolution change is the two cmeel versions (plus a deterministic decord platform-marker cascade from 4.0.1's wider wheel set). Fixes #3755	2026-06-24 11:23:25 +02:00
Khalil Meftah	6f0ba4be38	Record eval rollouts as LeRobot datasets (#3825 ) * feat(eval): record eval rollouts as raw LeRobot datasets - Record raw env observations inline during rollout(), before preprocess_observation() transforms them. Uses LeRobotDataset.create() with add_frame()/save_episode(). - Supports vectorized envs: each env in the batch records independently, with save_episode() called per env on termination. Each task gets its own dataset under output_dir/recordings/{task_group}_{task_id}/. Enabled via --eval.recording=true; disabled by default. * fix(eval): use FeatureType enum comparison instead of string value * refactor(eval): per-env datasets recording, no double reset - Extract _infer_shape_from_obs() to reduce nesting in feature conversion - Move dataset creation into rollout() using its own env.reset() observation, eliminating the extra reset in run_one() - Replace deepcopy with _shallow_copy_obs() for raw observation stashing - Support batch_size > 1: each parallel env records to its own dataset (single env skips the env_0/ nesting for simplicity) - One-time warning for env_features keys missing from observations - Pass recording_dir + env_features through the call chain instead of a pre-built recording_dataset object * refactor(eval): remove shape inference and shallow copy helpers * feat(eval): optionally push recorded eval datasets to the Hub * fix(eval): address review comments - Wrap rollout loop in try/finally so finalize() runs on crash/interrupt - Guard push_to_hub with num_episodes > 0 to avoid pushing empty datasets - Hoist loop-invariant multi_env and base_repo_id out of creation loop	2026-06-23 14:03:57 +02:00
Khalil Meftah	2d4be80425	feat(pi05): implement Classifier-Free Guidance (CFG) inference Add dual-path denoising with configurable cfg_beta scale for language- conditioned action generation. When cfg_beta > 1.0, VLM prefills both conditioned and unconditional prompts, and action expert velocities are interpolated via v = v_uncond + β*(v_cond - v_uncond).	2026-06-22 17:37:33 +02:00
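The interpolation formula quoted in commit 2d4be80425 above, v = v_uncond + β*(v_cond - v_uncond), can be written out as a small sketch. The function name `cfg_velocity` and the use of plain Python lists are assumptions for illustration; the actual policy code presumably operates on tensors.

```python
def cfg_velocity(v_uncond, v_cond, cfg_beta):
    """Blend unconditional and conditioned velocities (hypothetical helper).

    Implements v = v_uncond + cfg_beta * (v_cond - v_uncond) element-wise.
    """
    if len(v_uncond) != len(v_cond):
        raise ValueError("velocity vectors must have the same length")
    return [u + cfg_beta * (c - u) for u, c in zip(v_uncond, v_cond)]
```

With `cfg_beta = 1.0` the expression reduces to the conditioned velocity, and values above 1.0 extrapolate past it along the direction from the unconditional to the conditioned prediction.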
Khalil Meftah	7d1e1b0357	feat(pi05): integrate RenderMessagesStep for advantage conditioning Add RenderedMessagesToTaskStep adapter that bridges recipe-rendered chat messages back into PI05's task-string prompt format. When recipe_path is set on PI05Config, the preprocessor inserts RenderMessagesStep + adapter before prompt construction, enabling RECAP advantage text to flow end-to-end through the recipe YAML system.	2026-06-22 15:55:39 +02:00
Khalil Meftah	0d2ba54385	feat(rollout): add episode success labeling to DAgger strategy	2026-06-22 15:08:05 +02:00
Khalil Meftah	4b779b1e99	feat(recap): add advantage conditioning recipe YAMLs	2026-06-22 14:39:45 +02:00
Khalil Meftah	ea908c0672	feat(recap): add advantage scoring annotation module Implement the RECAP advantage scoring module as a new phase in lerobot-annotate. Uses a frozen distributional VF to compute per-frame advantages, binarizes into positive/negative indicators with per-task threshold, and writes style=advantage persistent rows for policy conditioning. Skips VF inference on intervention frames as an optimization.	2026-06-22 14:01:58 +02:00
Maxime Ellerbach	73782447f2	feat(train): FSDP checkpoint saving (#3810 ) * feat(train): FSDP checkpoint saving * adding docs for FSDP * adding a test for the fsdp checkpoint path * cleanup * fixing final upload to hub * refactored initial implementation to use torch fsdp api and adding new tests	2026-06-22 13:51:21 +02:00
Khalil Meftah	e5c94c732f	feat(recap): add lerobot-compute-returns script to compute MC returns	2026-06-22 12:17:37 +02:00
Khalil Meftah	2d7a42011a	fix(policies): support offline batch inference for ACT and Diffusion (#3822 ) - Guard ACT's KL divergence computation against None latent params to prevent crashes during eval when use_vae is set but the forward path returns no VAE outputs. - Add offline batch fallback to Diffusion's predict_action_chunk() so it works with dataloader batches (empty queues) in addition to the existing online rollout path (populated queues). This enables batched action prediction for offline evaluation.	2026-06-21 11:48:45 +02:00
Khalil Meftah	b06ad40888	feat(hub): add pretrained_revision to pin Hub model versions (#3820 ) - Add pretrained_revision field to PreTrainedConfig (policies) and RewardModelConfig (reward models), and thread it through make_policy(), make_pre_post_processors(), and make_reward_model() so that weights and processor configs can be loaded from a specific Hub commit, branch, or tag. Defaults to None (latest version, preserving current behavior). Dataset and env hub loading already supported revision pinning. Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-06-19 18:32:47 +02:00
Khalil Meftah	b3d74f80f0	Fix batch wandb logging metrics and handle scalar stats (#3821 ) * fix(logging): batch wandb metrics - Batch all metrics into a single wandb.log() call instead of one per key, reducing API overhead. - Add support for list-valued metrics by expanding them to indexed keys (e.g. metric_0, metric_1). * fix(stats): handle scalar stats robustly - Wrap cast_stats_to_numpy with np.atleast_1d to prevent 0-d arrays from scalar stats causing shape mismatches downstream. * fix(logging): remove unused list-valued metric expansion --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-06-19 18:31:12 +02:00
Khalil Meftah	552b4c3563	Add third-party env plugin discovery (#3823 ) * feat(envs): add env plugin discovery - Add 'lerobot_env_' to third-party plugin discovery prefixes, completing the plugin system for all component types (robots, cameras, teleoperators, policies, and now environments). External packages named lerobot_env_* can self-register EnvConfig subclasses on import, enabling --env.type= resolution without lerobot code changes. * feat(envs): add generic observation passthrough - Add generic observation passthrough in preprocess_observation() for unhandled ndarray/tensor keys, replacing the pattern of adding per-env hardcoded key handlers. Extra keys are forwarded as observation.<key> and can be shaped by env-specific ProcessorSteps via get_env_processors(). --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-06-19 18:30:00 +02:00
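The prefix-based discovery described in the #3823 entry above can be approximated with the standard library. This is a sketch under stated assumptions, not lerobot's discovery code: `PLUGIN_PREFIXES`, `find_plugin_names`, and `import_plugins` are hypothetical names, and only the `lerobot_env_` prefix comes from the commit message.

```python
import importlib
import pkgutil

# Prefix taken from the commit message; the tuple name is hypothetical.
PLUGIN_PREFIXES = ("lerobot_env_",)


def find_plugin_names(prefixes=PLUGIN_PREFIXES):
    """List importable top-level modules whose name starts with a plugin prefix."""
    # pkgutil.iter_modules() with no path walks the top-level modules on sys.path.
    return sorted(m.name for m in pkgutil.iter_modules() if m.name.startswith(prefixes))


def import_plugins(prefixes=PLUGIN_PREFIXES):
    """Import each discovered plugin so it can self-register on import."""
    return {name: importlib.import_module(name) for name in find_plugin_names(prefixes)}
```

Importing is what triggers registration in this scheme, so a package that defines its config subclasses at module level needs no further wiring.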
Khalil Meftah	c18b8277f1	Merge branch 'main' into feat/add-recap # Conflicts: # uv.lock	2026-06-18 17:14:59 +02:00
Nicolas Rabault	8bf6056d14	docs: add LeLab web interface to README (#3831)	2026-06-17 18:22:21 +02:00
Caroline Pascal	da92db8fc0	fix(image transforms): cleaning up image_transforms implementation in LeRobotDataset (#3829)	2026-06-17 11:50:09 +02:00
Caroline Pascal	2b0834bcb8	fix(cameras): snapshot stop_event in read loops to avoid None deref (#3812 ) * Do not set stop_event to None when stopping thread * fix(cameras): snapshot stop_event in read loops to avoid None deref The background read loops accessed self.stop_event repeatedly while _stop_read_thread() can reassign it to None after join(). Reading the attribute across the loop condition (and a mid-loop re-check) was a time-of-check/time-of-use race: stop_event could flip to None between the `is None` test and the `.is_set()` call, raising AttributeError on the worker thread. Snapshot self.stop_event into a local once, guard it, and loop on the local Event. The Event object is thread-safe and lives for the thread's lifetime; _stop_read_thread() always calls .set() before nulling the attribute, so the local observes the stop and exits cleanly. This also lets us drop the redundant pre-lock stop check. Applies to OpenCVCamera, RealSenseCamera, and ZMQ camera. --------- Co-authored-by: Anes Benmerzoug <anes.benmerzoug@gmail.com>	2026-06-17 11:40:17 +02:00
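The snapshot pattern described in the #3812 entry above (copy `self.stop_event` into a local once, guard it, then loop on the local Event) can be sketched as follows. The `Reader` class and its attribute names other than `stop_event` are hypothetical; it stands in for a camera class and performs no real I/O.

```python
import threading
import time


class Reader:
    """Hypothetical camera-like reader with a background read loop."""

    def __init__(self):
        self.stop_event = None
        self.thread = None
        self.frames_read = 0

    def _read_loop(self):
        # Snapshot the attribute once. The local Event stays valid even if
        # stop() later reassigns self.stop_event to None.
        stop_event = self.stop_event
        if stop_event is None:
            return
        while not stop_event.is_set():
            self.frames_read += 1  # placeholder for reading one frame
            time.sleep(0.001)

    def start(self):
        self.stop_event = threading.Event()
        self.thread = threading.Thread(target=self._read_loop, daemon=True)
        self.thread.start()

    def stop(self):
        if self.stop_event is not None:
            self.stop_event.set()  # always signal before clearing the attribute
        if self.thread is not None:
            self.thread.join(timeout=1.0)
        self.thread = None
        self.stop_event = None
```

Because the loop never re-reads `self.stop_event`, there is no window between an `is None` test and an `.is_set()` call in which the attribute can change underneath the worker thread.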
Caroline Pascal	287c823f13	fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks (#3826 ) * fix(features copy): adding deepcopy on LeRobot dataset features to avoid shallow copy leaks * tests(test): adding new test	2026-06-16 17:58:59 +02:00
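The shallow-copy leak that 287c823f13 refers to can be illustrated with a plain nested dict. This is a sketch under assumptions: the `features` layout below is made up for the example and is not taken from LeRobotDataset.

```python
import copy

# Hypothetical features dict with a nested, mutable "names" list.
features = {
    "observation.state": {"dtype": "float32", "shape": (6,), "names": ["j1", "j2"]},
}

# A shallow copy duplicates only the outer dict; the inner dicts and lists are shared.
shallow = dict(features)
shallow["observation.state"]["names"].append("leak")

# A deep copy duplicates every nested container, so later edits stay local.
deep = copy.deepcopy(features)
deep["observation.state"]["names"].append("isolated")

print(features["observation.state"]["names"])  # the shallow edit leaked into the original
print(deep["observation.state"]["names"])
```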
Pepijn	58ccc01508	fix(datasets): enforce one parquet row group per episode in v3 data writes (#3807 ) * fix(datasets): enforce one parquet row group per episode in v3 data writes LeRobot v3 data shards must hold exactly one row group per episode so a reader can fetch episode i with pq.ParquetFile(path).read_row_group(i) (a byte-range read) instead of loading the whole shard. The recording writer already does this (one write_table per episode); the aggregate and lerobot-annotate re-write paths instead concatenated many episodes and wrote them in one shot, collapsing the file to a single row group. - io_utils: add write_table_one_row_group_per_episode (one ParquetWriter, one write_table per episode — same pattern as the recording writer); to_parquet_with_hf_images embeds images then writes per-episode row groups; to_parquet_one_row_group_per_episode wraps it for plain frames - aggregate: route non-image data writes through the per-episode writer; leave the episodes-metadata parquet untouched (already one row/episode) - annotate: rewrite shards via the per-episode writer instead of a single bulk pq.write_table - tests: invariant coverage through the aggregate (image + video) and annotate paths No change to on-disk schema, paths, naming, rollover thresholds, or compression. Readers stay backward-compatible (old collapsed files load). * Update src/lerobot/datasets/io_utils.py Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update src/lerobot/datasets/io_utils.py Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(datasets): correct indentation and add strict= in row-group helper The web-edited numpy version of write_table_one_row_group_per_episode had an over-indented line (IndentationError, breaking pre-commit + test collection) and a zip() without strict=. Fix both; behaviour unchanged. 
--------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>	2026-06-16 12:15:48 +02:00
Caroline Pascal	38327fdc84	fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion (#3783 ) * fix(images/videos): fixing aggregate_pipeline_dataset_features to avoid unwanted images features deletion when videos are not used * fix(docstrings): improving docstrings Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com> --------- Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com>	2026-06-15 17:55:52 +02:00
Steven Palma	9555efc02c	chore(dependencies): update uv.lock (#3595 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-06-15 16:29:44 +02:00
Steven Palma	d576c59afb	refactor(robots): homogenize bi-manual setups implementations (#3772 ) * chore(robots): homogenize bi setups * feat(robots): split openarm mini into single and bi * refactor(robots): mixin for bi classes * docs: update docs	2026-06-15 16:28:54 +02:00
Altman	8515d456be	fix(datasets): avoid uint8 overflow in image stats (#3697 ) * fix(datasets): avoid uint8 overflow in image stats * fix(datasets): promote stats batches dynamically	2026-06-13 12:09:43 +02:00
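The kind of overflow addressed by 8515d456be can be reproduced with NumPy in a few lines. This is only an illustration of the failure mode and of promotion to a wider dtype, assuming NumPy is available; it is not the statistics code from the commit, and the array shape and values are arbitrary.

```python
import numpy as np

# Hypothetical batch of bright uint8 pixels.
pixels = np.full((2, 4, 4), 200, dtype=np.uint8)

# Squaring in uint8 wraps modulo 256: 200 * 200 = 40000, and 40000 % 256 == 64.
wrapped = pixels * pixels

# Promoting to a wider dtype before accumulating keeps the true values.
promoted = pixels.astype(np.float64)
squares = promoted * promoted

print(wrapped.dtype, int(wrapped[0, 0, 0]))
print(squares.dtype, float(squares.mean()))
```

Any statistic built from sums or squared values (mean, variance, standard deviation) is silently wrong if the intermediate arrays stay in `uint8`.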
Mahbod	30790de178	feat(edit-dataset): add `concatenate_videos` opt-out to merge (#3663 ) * feat(edit-dataset): add `concatenate_videos` opt-out to merge When merging datasets, source mp4s are concatenated into shards capped at `video_files_size_in_mb` (default 200 MB). This is great for dataloader throughput but destroys per-episode (or per-source) video boundaries, which is undesirable when you want to inspect, ship, or reuse the individual mp4s. Add a `concatenate_videos: bool = True` knob plumbed through `MergeConfig` → `merge_datasets` → `aggregate_datasets` → `aggregate_videos`. When False, each source mp4 is copied 1:1 to its own destination mp4 with no re-muxing, so the merge preserves source video boundaries. Usage: lerobot-edit-dataset \ --new_repo_id user/merged \ --operation.type=merge \ --operation.repo_ids "['user/a', 'user/b']" \ --operation.concatenate_videos=false Defaults are unchanged; the dataloader path is unaffected because the `episodes.parquet` `from_timestamp`/`to_timestamp` index keeps working regardless of whether each mp4 holds one or many episodes. * feat(edit-dataset): extend concatenate opt-out to data files Following review, add a concatenate_data flag mirroring concatenate_videos, threaded through MergeConfig, merge_datasets, aggregate_datasets, aggregate_data and append_or_create_parquet_file. Metadata index files still always concatenate. Also trim the verbose docstrings and comments since the names are self-explanatory, and extend the existing merge test to cover data files.	2026-06-12 20:05:04 +02:00
Pepijn	cec8ee0be6	feat: language annotation pipeline (#3471 ) Steerable annotation pipeline (lerobot-annotate) that populates the language_persistent and language_events columns introduced in PR 1 (#3467) directly into data/chunk-/file-.parquet. This is PR 2 of the three-PR plan: PR 1 (Add extensive language support #3467): schema + DSL + rendering, base of this PR PR 2 (this PR): annotation pipeline writing into PR 1's columns PR 3: model with language prediction and runtime A VLM (Qwen-VL family, served on vLLM) watches each episode's video and emits grounded language annotations: subtasks, plans, memory, task rephrasings, interjections + speech, and per-camera VQA. The pipeline is built for production annotation at scale — single-camera grounding, embedded-frame inputs, a describe-then-segment grounding flow, and a deterministic full-episode coverage guarantee — informed by Scale's dense-captioning findings (representation > sampling, rules > reasoning, model capacity is the biggest lever, two-pass systems compound errors)	2026-06-12 15:12:33 +02:00
Nikodem Bartnik	02b315ab6a	Docs/model card improvements (#3634 ) * update policy deployment instruction with rollout * add port and fix formatting * add more base models to generate model card * updated and extended model descriptions * fix bug * improved and extended structure * exclude the templates from config * add images and visualize dataset button * add all policies we have docs for * remove policies without the docs * new fields, improved examples	2026-06-12 13:26:52 +02:00
Pepijn	234c768dfb	feat(datasets): deterministic, resumable shuffling for EpisodeAwareSampler (#3769 ) * fix(datasets): expose a generator on EpisodeAwareSampler for distributed shuffle sync In distributed training, accelerate can only synchronize the shuffle permutation across ranks when the sampler exposes a generator attribute. EpisodeAwareSampler shuffled via the global torch RNG, so disjoint batch shards relied on every rank's global CPU RNG staying in lockstep forever; any rank-asymmetric RNG consumption (e.g. eval rollouts on the main process only) silently desynced the permutations and ranks trained on overlapping/missing samples. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(train): seed sampler generator and gate dataset download per node - Pass a generator seeded with cfg.seed to EpisodeAwareSampler so accelerator.prepare registers it as the synchronized RNG and the shuffle order is reproducible. - Gate the initial make_dataset call on is_local_main_process instead of is_main_process: the global main process only exists on node 0, so on every other node all local ranks were downloading the dataset and building the Arrow cache concurrently. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(datasets): add DeterministicEpisodeAwareSampler with O(1) memory and sample-exact resume Add a sampler that never materializes frame indices: it stores only per-episode boundaries (numpy, a few bytes per episode) and maps logical positions to frame indices on the fly with searchsorted. Shuffling uses a seeded Feistel permutation over [0, num_frames) (cycle-walking to the exact domain), so the data order is a pure function of (seed, epoch): - no RNG state to synchronize across distributed ranks, - constant memory and zero epoch-boundary cost at any dataset size, - O(1) seek to any position, enabling sample-exact resume. Opt in with --deterministic_sampler=true. 
On resume, lerobot-train maps the checkpointed step back to (epoch, start_index) via compute_sampler_state and continues at the exact sample where the run left off (up to accelerate's even_batches padding at epoch boundaries). The shuffle is pseudo-random rather than a true uniform permutation, the standard trade-off in large-scale training loaders. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * refactor(datasets): fold deterministic mode into EpisodeAwareSampler Instead of a parallel DeterministicEpisodeAwareSampler class, extend the existing EpisodeAwareSampler with a deterministic=True mode (seeded Feistel permutation, epoch auto-advance, state_dict/load_state_dict). The default mode is behavior-identical: same torch.randperm consumption and the same generator contract accelerate synchronizes; the O(N) Python index list is replaced by O(num_episodes) boundary arrays in both modes, with `indices` kept as a back-compat property. Passing a generator together with deterministic=True is rejected, and the state/seek methods raise outside deterministic mode. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(train): enable deterministic_sampler by default Deterministic data order (sample-exact resume, no cross-rank RNG sync, O(1) sampler memory) is now the default for map-style training; set deterministic_sampler=false to restore the legacy RNG-based shuffle. Streaming datasets ignore the flag (the sampler path only applies to map-style datasets), replacing the previous hard validation error so streaming configs keep working with the new default. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(datasets): default EpisodeAwareSampler to deterministic mode and trim comments deterministic=True is now the class default as well as the training default; the legacy RNG path requires an explicit deterministic=False (the train script's non-deterministic branch passes it). Docstrings and inline comments slimmed down across the changed files. 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(sampler): drain resumed trillion-frame sampler via iter() to avoid list() prealloc list(sampler) calls PyObject_LengthHint -> __len__ (the full 10*12 epoch length) and preallocates that many slots before iterating, OOMing even though the resumed epoch only yields 3 frames. Collect through the iterator (no length hint) so the test exercises the real O(1) seek/drain instead of CPython's list growth heuristic. fix(datasets): guard Feistel cycle-walking loop against non-convergence Replace the unbounded while True in EpisodeAwareSampler._permute with a bounded for loop capped at _MAX_CYCLE_WALK_STEPS (100) and raise RuntimeError if the cycle-walk fails to land in [0, num_frames). The loop is expected to converge in <4 steps on the chosen power-of-two domain, so the bound is a safety net that should never trip in practice but prevents a pathological infinite loop. https://claude.ai/code/session_01HQ15tFrBsHYScjGWosEv22 * fix(datasets): make deterministic-sampler resume robust to world-size changes compute_sampler_state mapped a checkpointed step back to (epoch, start_index) using the current num_processes, but the number of sampler positions a step consumes scales with the world size that produced it. Resuming on a different GPU count therefore landed on the wrong epoch/offset, silently re-seeing or skipping data. Record num_processes in training_step.json at checkpoint time and feed the checkpoint's value into compute_sampler_state on resume, so the data order resumes at the right position regardless of the new world size. Warn when the world size changed (the global offset is correct, but per-rank sample-exactness needs the same topology). Old checkpoints without the field fall back to the current world size. Also document compute_sampler_state's assumptions explicitly: num_processes / batch_size must match the checkpointing run, and accelerate's even_batches=True padding is mirrored by the ceil(... 
/ num_processes) term. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> * style: apply ruff-format to lerobot_train.py Collapse the compute_sampler_state(...) call onto one line so the ruff-format pre-commit hook passes (fixes the failing CI check). Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(datasets): use seeded torch.randperm instead of Feistel in EpisodeAwareSampler Drop the Feistel permutation (and its SplitMix64 hash / cycle-walking) in favor of a torch.randperm seeded from (seed, epoch). The deterministic mode keeps its key properties - data order is a pure function of (seed, epoch), so it reproduces on every rank with no global-RNG synchronization, and - state_dict / load_state_dict still resume sample-exactly, now by regenerating the epoch's permutation and slicing from the saved offset. Construction stays O(num_episodes) (only episode boundaries are stored, never a per-frame index list). The trade-off vs Feistel: the per-epoch shuffle is again O(num_frames) memory (the randperm tensor) and no longer O(1)-seekable, in exchange for ~30 fewer LOC and a truly uniform shuffle. Tests updated: the trillion-frame O(1) test is replaced with a boundary-storage check and a scale resume-exactness test. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(datasets): make EpisodeAwareSampler always deterministic With Feistel gone, deterministic and legacy modes were both just torch.randperm and the deterministic path strictly dominated (reproducible across ranks via the (seed, epoch) seed, no accelerate generator sync, resumable). 
Collapse to a single path and drop the redundant flag: - remove the `deterministic` and `generator` constructor args, `_iter_default`, and `_require_deterministic`; `set_epoch` / `state_dict` / `load_state_dict` are now unconditional - remove the `deterministic_sampler` train config field and the legacy generator branch in lerobot_train.py (non-streaming map datasets always use the sampler) - drop the now-obsolete generator/legacy tests Note: removes the `generator` kwarg from EpisodeAwareSampler (back-compat break vs main); the order is now a pure function of (seed, epoch), so no cross-rank RNG sync is needed. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(datasets): address sampler review (batch_size resume guard + docs) - Record batch_size in training_step.json alongside num_processes and feed the checkpoint's value into compute_sampler_state on resume; warn when it differs (per-rank sample-exactness needs the same batch size). - Document the set_epoch vs __iter__ auto-advance coupling on EpisodeAwareSampler (callers should rely on exactly one mechanism per run). - Note the broadened (reproducibility-breaking) sampler guard and the no-generator distributed sharding correctness in lerobot_train.py. - Add load_training_batch_size + parallel tests. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(train): download dataset once on the global main process Gate the training dataset download on the global is_main_process (download once to the shared dataset root, barrier, then every other rank reads the already-populated copy) instead of per-node is_local_main_process. LeRobotDataset skips its snapshot_download when try_load() succeeds, so no rank re-downloads. Assumes the dataset root / HF cache is on storage shared across nodes. 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore(datasets): trim sampler comment and drop duplicate tests Remove the verbose dataloader-guard comment and the two EpisodeAwareSampler tests that duplicated existing validation/warning coverage (no coverage loss). Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-12 11:47:16 +02:00
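The final design in 234c768dfb (data order as a pure function of (seed, epoch), with resume by regenerating the epoch's permutation and slicing from the saved offset) can be sketched as below. This is a simplified illustration, not the EpisodeAwareSampler implementation: it swaps in the standard-library `random.Random` for `torch.randperm`, ignores episode boundaries, and the function names and the seed-mixing constant are hypothetical.

```python
import random


def epoch_order(num_frames, seed, epoch):
    """Return the frame order for one epoch, derived only from (seed, epoch)."""
    # Hypothetical way of mixing seed and epoch into a single integer seed.
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_frames))
    rng.shuffle(order)
    return order


def resume_order(num_frames, seed, epoch, start_index):
    """Rebuild the epoch's permutation and skip what was already consumed."""
    return epoch_order(num_frames, seed, epoch)[start_index:]


full = epoch_order(10, seed=42, epoch=3)
resumed = resume_order(10, seed=42, epoch=3, start_index=4)
```

Since no RNG state is carried between calls, every rank that knows `seed` and `epoch` reconstructs the same order independently. A checkpoint only needs to store the epoch and the offset to resume at the same sample.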
Caroline Pascal	0e9bd9e6fb	feat(trim): adding optional trimming option in reencode_video (#3779 ) * feat(trim): adding optional trimming option in reencode_video * tests(trim): add triming test --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-06-12 11:29:26 +02:00
Steven Palma	87242cfced	chore(dependecies): relax grpc-related bounds (#3777 ) Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>	2026-06-11 19:13:14 +02:00
Steven Palma	1edc83a0ef	feat(training): bump accelerate + use reduction types for tracked metrics in a multi rank setup (#3773 ) * feat(training): bump accelerate + use reduction types for tracked metrics in a multi rank setup * chore: address feedback	2026-06-11 19:07:28 +02:00
Steven Palma	6fbcf67249	chore: update readme (#3774 ) * chore: update readme * chore: update authors in project readme	2026-06-11 18:17:26 +02:00
Pepijn	41166b39fb	fix(train): synchronize EpisodeAwareSampler shuffling across ranks and gate dataset download per node (#3768 ) * fix(datasets): expose a generator on EpisodeAwareSampler for distributed shuffle sync In distributed training, accelerate can only synchronize the shuffle permutation across ranks when the sampler exposes a generator attribute. EpisodeAwareSampler shuffled via the global torch RNG, so disjoint batch shards relied on every rank's global CPU RNG staying in lockstep forever; any rank-asymmetric RNG consumption (e.g. eval rollouts on the main process only) silently desynced the permutations and ranks trained on overlapping/missing samples. * fix(train): seed sampler generator and gate dataset download per node - Pass a generator seeded with cfg.seed to EpisodeAwareSampler so accelerator.prepare registers it as the synchronized RNG and the shuffle order is reproducible. - Gate the initial make_dataset call on is_local_main_process instead of is_main_process: the global main process only exists on node 0, so on every other node all local ranks were downloading the dataset and building the Arrow cache concurrently.	2026-06-11 11:07:42 +02:00
Khalil Meftah	fa3eb9fce3	test(rewards): add unit tests for distributional value function model	2026-06-10 16:07:43 +02:00
Khalil Meftah	500c91ba92	feat(rewards): introduce distributional value function model - Added a new distributional value function (DistributionalVF) model for RECAP, including its configuration, modeling, and processor components. - Updated the rewards factory to support the new model type. - Updated to include the new model in the dependencies.	2026-06-10 15:24:50 +02:00
Steven Palma	79c6821407	chore(dependecies): update mujoco transitives (#3756 )	2026-06-10 12:58:55 +02:00
Steven Palma	507083249f	Revert "fix(pyproject): adding ceiling bound on mujoco (<3.9.0) (#3751 )" (#3754 ) This reverts commit `bd22407d93`.	2026-06-10 10:38:42 +02:00
Caroline Pascal	bd22407d93	fix(pyproject): adding ceiling bound on mujoco (<3.9.0) (#3751 ) * fix(pyproject): adding ceiling bound on mujoco (<3.9.0) * chore(uv.lock): updating uv.lock * fix(linux): adding missing linux dependencies * chore(uv.lock): updating uv.lock	2026-06-09 23:31:43 +02:00
Adil Zouitine	49755a3d9e	feat(processor): Add in-memory processor pipeline serialization (#3732 ) * feat(processor): add in-memory pipeline serialization Expose processor pipeline config and tensor state without requiring temporary files, so processors can be transported, compared, or hashed directly in memory. * feat(processor): enhance DataProcessorPipeline with registry support - Added a new RegisteredLazyTensorStateStep for registry-based serialization tests. - Improved state filename handling in _get_state_filename method. - Refactored validation logic in _validate_loaded_config to simplify parameter types. - Updated tests to verify registry step functionality and ensure correct state loading. * refactor(processor): update state handling in DataProcessorPipeline - Introduced a new static method _get_state_key to derive in-memory state keys from serialized filenames. - Updated state_dict and load_state_dict methods to use suffixless state keys instead of filenames. - Adjusted related tests to reflect changes in state key handling, ensuring consistency in state management * fix(processor): update loaded_config argument description in DataProcessorPipeline - Clarified the documentation for the loaded_config parameter to indicate that it may be a non-dictionary value, enhancing understanding for future developers.	2026-06-08 11:27:24 +02:00


1530 Commits