lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-28 15:09:51 +00:00

pepijn 8615f3f613 annotate(vqa): tighten bbox + keypoint quality bar

Low-confidence VLM detections were producing many overlapping, loose
boxes per frame (oven + toaster oven + counter + drawer + ...) and
coarse keypoints, hurting downstream policy grounding. Two surgical
fixes:

- module_3_vqa prompt: cap bbox output at 3 high-confidence detections
  (prefer 1 tight box), require specific labels and ≤10% padding,
  allow empty detections list when nothing meets the bar; keypoint
  must be a single pixel-precise feature (handle / button / gripper
  tip) rather than a coarse "somewhere on object" point.
- run_hf_job: lower vlm.temperature 0.7 → 0.2. Bbox + keypoint are
  coordinate-regression tasks where sampling noise directly degrades
  localization; question phrasing still varies enough at 0.2.
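
The effect of the temperature change can be illustrated with a small, self-contained sketch. It is not taken from run_hf_job; the logits are made-up values and `softmax_with_temperature` is a hypothetical helper. It only shows the general mechanism: dividing logits by a lower temperature concentrates probability on the top token, so sampled coordinate tokens vary less.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate coordinate tokens (illustrative only).
logits = [2.0, 1.5, 1.0]

p_old = softmax_with_temperature(logits, 0.7)  # previous setting
p_new = softmax_with_temperature(logits, 0.2)  # setting after this commit

print("T=0.7:", [round(p, 3) for p in p_old])
print("T=0.2:", [round(p, 3) for p in p_new])
```

At the lower temperature the most likely token takes a much larger share of the probability mass, which matches the commit's reasoning that sampling noise matters most for coordinate outputs.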

No new config knobs — the count cap lives in the prompt since "top-N
by confidence" is best picked by the VLM itself. Validator already
accepts empty detections.
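
As a rough illustration of the bar described above, the sketch below checks a detections list against the count cap and basic box sanity. It is an assumption-laden example, not the repository's validator: the `label` / `bbox` keys, the pixel `(x1, y1, x2, y2)` box layout, and the `check_detections` name are all hypothetical. The commit itself enforces the cap in the prompt rather than in code, and the ≤10% padding rule is not checked here because that would need a reference box.

```python
from typing import Any

MAX_DETECTIONS = 3  # cap stated in the commit message


def check_detections(
    detections: list[dict[str, Any]], width: int, height: int
) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Assumed (hypothetical) schema: each detection is a dict with a
    non-empty "label" and a "bbox" given as (x1, y1, x2, y2) in pixels.
    """
    problems: list[str] = []
    # An empty detections list is valid: nothing met the quality bar.
    if len(detections) > MAX_DETECTIONS:
        problems.append(
            f"{len(detections)} detections exceed the cap of {MAX_DETECTIONS}"
        )
    for i, det in enumerate(detections):
        label = str(det.get("label", "")).strip()
        if not label:
            problems.append(f"detection {i}: missing label")
        x1, y1, x2, y2 = det.get("bbox", (0, 0, 0, 0))
        # Box must have positive area and lie inside the image.
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            problems.append(f"detection {i}: bbox out of bounds or degenerate")
    return problems


good = [{"label": "oven door handle", "bbox": (120, 80, 180, 110)}]
print(check_detections(good, 640, 480))
print(check_detections([], 640, 480))
```

A check like this could run after generation to flag frames where the VLM ignored the prompt, without adding a config knob for the cap itself.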

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-26 08:31:37 +00:00

annotations

annotate(vqa): tighten bbox + keypoint quality bar

2026-05-26 08:31:37 +00:00

backward_compatibility

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

benchmark

pi052: SDPA attention port + selective AC + bench harness

2026-05-25 21:59:20 +00:00

dataset

refactor: support custom progress parquet overlays (#3640)