feat(annotate): add dest_repo_id for separate push target

Adds an optional `dest_repo_id` to `AnnotationPipelineConfig`. When set, `push_to_hub` uploads the annotated dataset there instead of overwriting the source `repo_id`, restoring separate source/destination repos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:40:04 +00:00 · 2026-05-18 15:05:23 +02:00
parent 9dfc9084e1
commit c5676ef1b3
3 changed files with 29 additions and 11 deletions
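
The destination selection described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this commit: `PushConfig` and `resolve_push_target` are hypothetical names, and only the three fields `repo_id`, `push_to_hub`, and `dest_repo_id` come from the change itself. The sketch also assumes an empty `dest_repo_id` should be treated the same as an unset one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushConfig:
    """Hypothetical stand-in for the relevant AnnotationPipelineConfig fields."""

    repo_id: str                        # dataset to download and annotate
    push_to_hub: bool = False           # upload the result when True
    dest_repo_id: Optional[str] = None  # optional separate upload target


def resolve_push_target(cfg: PushConfig) -> Optional[str]:
    """Return the repo to upload to, or None when pushing is disabled."""
    if not cfg.push_to_hub:
        return None
    # Assumption: an unset or empty dest_repo_id falls back to the source
    # repo, which corresponds to annotating the dataset in place.
    return cfg.dest_repo_id or cfg.repo_id
```

Under these assumptions, `resolve_push_target(PushConfig("org/ds", push_to_hub=True))` returns the source repo, while also passing `dest_repo_id="org/ds-annotated"` returns the separate destination and leaves the source untouched.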
@@ -20,10 +20,13 @@ Spawns one ``h200x2`` job that:
  1. installs this branch of ``lerobot`` plus the annotation extras,
  2. boots two vllm servers (one per GPU) with Qwen3.6-35B-A3B-FP8,
  3. runs the plan / interjections / vqa modules across the dataset,
-  4. uploads the annotated dataset back to ``--repo_id``.
+  4. uploads the annotated dataset back to ``--repo_id`` (or to
+     ``--dest_repo_id`` when set).

-``--repo_id`` is both the download source and, with ``--push_to_hub=true``,
-the upload destination — the job annotates the dataset in place.
+``--repo_id`` is the download source and, with ``--push_to_hub=true``, also
+the default upload destination — the job annotates the dataset in place.
+Pass ``--dest_repo_id`` to push the result to a separate repo instead and
+leave the source untouched.

 Usage:

@@ -53,9 +56,11 @@ CMD = (
    "export VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0 && "
    "export VLLM_VIDEO_BACKEND=pyav && "
    "lerobot-annotate "
-    # The dataset to annotate; also the push destination (annotate in place).
+    # The dataset to annotate. By default it is also the push destination
+    # (annotate in place); pass --dest_repo_id to push to a separate repo.
    "--repo_id=<your-org>/<your-dataset> "
    "--push_to_hub=true "
+    # "--dest_repo_id=<your-org>/<your-annotated-dataset> "
    "--vlm.backend=openai "
    "--vlm.model_id=Qwen/Qwen3.6-35B-A3B-FP8 "
    "--vlm.parallel_servers=2 "