lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-08 02:22:02 +00:00

Files

Pepijn fbcb9225f5 feat: oversample sparse VQA annotations (recipe consumption + weighted sampler)

VQA annotations are sparse, so VQA was badly underrepresented in training:
its effective share was weight x density, and blend draws that picked an
ask_vqa* sub-recipe for a non-VQA frame were wasted entirely.
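
To make the "weight x density" arithmetic concrete, here is a minimal sketch. It assumes the blend draw is independent of whether the frame carries a VQA annotation. The function names and the example numbers are hypothetical and are not taken from the lerobot code.

```python
def vqa_share_before(blend_weight: float, density: float) -> float:
    """Share of training samples that end up as VQA when the blend draw
    ignores annotations: the draw must pick an ask_vqa* sub-recipe AND
    the frame must actually carry a VQA annotation."""
    return blend_weight * density


def wasted_draw_share(blend_weight: float, density: float) -> float:
    """Share of draws that pick an ask_vqa* sub-recipe for a frame with
    no VQA annotation, so nothing can be rendered."""
    return blend_weight * (1.0 - density)


# Hypothetical numbers: 20% blend weight on VQA, 5% of frames annotated.
before = vqa_share_before(0.20, 0.05)
wasted = wasted_draw_share(0.20, 0.05)
```

With these illustrative numbers, only about 1% of samples are VQA while about 19% of draws are wasted, which is the imbalance the two pieces below address.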

Two pieces:

1. Recipe-side consumption (language_render.py): render_sample now routes
   any frame that carries a VQA annotation to a matching ask_vqa* sub-recipe,
   regardless of the weighted blend draw. No VQA annotation is wasted and no
   draw lands on a non-renderable VQA recipe — VQA's recipe-side share now
   equals the VQA-annotation density.

2. Dataset-side oversampling (WeightedEpisodeAwareSampler + vqa_target_fraction):
   a new weighted, episode-aware sampler draws frames with replacement by
   per-frame weight. When TrainPipelineConfig.vqa_target_fraction is set, the
   train script scans language_events, weights VQA frames so they make up
   ~that fraction of the training stream, and uses the weighted sampler. This
   is what actually lets VQA exceed its natural density. Default None keeps
   uniform episode-aware sampling unchanged.
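
The weighting in piece 2 can be sketched as follows. This is an illustration under stated assumptions, not the WeightedEpisodeAwareSampler implementation: `vqa_frame_weights` and `sample_indices` are hypothetical helpers, non-VQA frames are assumed to keep weight 1, and episode-boundary handling is left out entirely.

```python
import random


def vqa_frame_weights(is_vqa: list[bool], target_fraction: float) -> list[float]:
    """Per-frame weights such that VQA frames receive ~target_fraction of
    the total sampling mass. Non-VQA frames keep weight 1.0 (assumption)."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    n_vqa = sum(is_vqa)
    n_other = len(is_vqa) - n_vqa
    if n_vqa == 0 or n_other == 0:
        # Nothing to rebalance: fall back to uniform weights.
        return [1.0] * len(is_vqa)
    # Solve w * n_vqa / (w * n_vqa + n_other) = target_fraction for w.
    w = target_fraction * n_other / ((1.0 - target_fraction) * n_vqa)
    return [w if flag else 1.0 for flag in is_vqa]


def sample_indices(weights: list[float], num_samples: int, seed: int = 0) -> list[int]:
    """Draw frame indices with replacement, proportionally to weight."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Hypothetical dataset: 5 VQA frames out of 100, target share of 30%.
flags = [True] * 5 + [False] * 95
weights = vqa_frame_weights(flags, target_fraction=0.3)
indices = sample_indices(weights, num_samples=20000)
```

Because frames are drawn with replacement, each VQA frame is repeated many times per pass when the natural density is low, which is how the training stream can exceed that density.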

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 15:30:00 +02:00

annotations

refactor(annotate): drop dataset-level `tools` parquet column

2026-04-30 18:48:36 +02:00

artifacts

feat(dataset): 2x faster dataloader via parallel decode, uint8 transport, and persistent workers (#3406)

2026-04-19 00:08:22 +02:00

async_inference

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

cameras

test(cameras): skip flaky async_read test (#3106)