You are labeling a teleoperated robot demonstration.

The user originally asked: "{episode_task}"

You are shown the entire demonstration as a single video. Watch the
whole clip, then segment it into a list of consecutive atomic subtasks
the robot performs.

{vocabulary_block}Authoring rules — Hi Robot atom granularity, pi0.7-style short prompts:

- Each subtask = one COMPOSITE atomic skill the low-level policy can
  execute end-to-end. A "skill" bundles its own approach motion with
  its terminal action — do NOT split the approach off as its own
  subtask. The whole-arm policy already learns to reach as part of
  every manipulation primitive.
- Write each subtask as an IMPERATIVE COMMAND that starts with one of
  these verbs (use another verb only when none of them fits):
    pick up <obj>              approach + grasp + lift in one subtask
    put <obj> on/in <loc>      transport + release in one subtask
    place <obj> on/in <loc>    synonym of "put"; pick one and stay consistent
    push <obj>                 contact + linear shove
    pull <obj>                 contact + linear retract
    turn <knob/dial/handle>    rotary actuation
    press <button>             single-press contact
    open <drawer/door/lid>     full opening motion
    close <drawer/door/lid>    full closing motion
    pour <src> into <dst>      tilt + flow
    insert <obj> into <slot>   alignment + push-fit
    go to <loc>                ONLY when no grasp or actuation follows
                               (e.g. a pure relocation between phases).
                               If the next subtask grasps or actuates
                               something at that location, drop "go to ..."
                               and write that subtask ("pick up ...",
                               "open ...") instead.
- Forbidden ultra-fine splits: you must NOT emit any of these as
  standalone subtasks; fold each one into its parent composite:
    "move to X"   → fold into "pick up X" (or whatever follows)
    "reach for X" → fold into "pick up X"
    "grasp X"     → fold into "pick up X"
    "lift X"      → fold into "pick up X" (or "put X on Y" if it's
                    the transport phase of a place)
    "release X"   → fold into "put X on Y" (or "place X in Y")
- Keep it SHORT: a verb phrase, not a sentence. Drop articles
  ("the", "a") and adverbs ("carefully", "slowly"). Add a "how"
  detail (which hand, which grasp point) ONLY when it is needed to
  disambiguate. Every subtask must begin with a verb, normally one
  from the list above (no leading nouns, no "then", no "first").
- NEVER use third person. Never write "the robot", "the arm", "the
  gripper moves", "it picks up" — the robot is implied. Command it,
  do not describe it.
- Use the exact object nouns from the task above. If the task says
  "cube", every subtask says "cube" — never switch to "block". If it
  says "box", never switch to "bin"/"container". Keep vocabulary
  consistent across the whole episode.
- Good: "pick up blue cube", "put blue cube in box", "open drawer",
  "turn red knob", "press start button", "go to sink".
- Bad: "move to blue cube" (approach as its own subtask; forbidden,
  must be folded into "pick up blue cube"); "the robot arm moves
  towards the blue cube" (third person, too long); "carefully pick
  up the cube" (adverb, article); "release the yellow block"
  (article; "block" when the task says "cube"; and "release" must be
  folded into a "put"/"place" subtask).
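- Subtasks are non-overlapping and cover the full episode in order,
  with no gaps: the first starts at 0.0, each one starts where the
  previous one ends, and the last ends at {episode_duration}.
  Choose the cut points yourself based on what you see in the video
  (gripper open/close events, contact, regrasps, transitions).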
- Each subtask spans at least {min_subtask_seconds} seconds. If a
  candidate span would be shorter, merge it into its neighbour
  rather than emitting it.
- Do not exceed {max_steps} subtasks total. Fewer, larger composites
  are preferred over many micro-steps.
- Every subtask's "start" and "end" must lie within
  [0.0, {episode_duration}] seconds, with "start" < "end".

Output strictly valid JSON of this shape:

  {{
    "subtasks": [
      {{"text": "<short imperative verb phrase>", "start": <float>, "end": <float>}},
      ...
    ]
  }}
