mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-17 09:39:47 +00:00
21 lines
832 B
Plaintext
Goal:
Create a high and a low policy. The high policy is a VLM: it takes obs + text and returns text.
The low policy is a VLA: it takes the text returned by the high policy and outputs actions.

The high policy runs every second, or whenever the user sends a prompt.
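
A minimal sketch of the scheduling described above, assuming both policies are plain callables. The class name `HierarchicalController`, its fields, and the `step` signature are hypothetical and not part of lerobot; the only details taken from the notes are the one-second period and the re-run on a new user prompt.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class HierarchicalController:
    """Hypothetical wrapper: the high policy (VLM) feeds text to the low policy (VLA)."""

    high_policy: Callable[[Any, str], str]          # (obs, text) -> text
    low_policy: Callable[[Any, str], List[float]]   # (obs, text) -> actions
    prompt: str                                     # latest text given to the high policy
    period_s: float = 1.0                           # from the notes: run every second
    instruction: Optional[str] = None               # last text returned by the high policy
    last_high_run_s: Optional[float] = None

    def step(self, obs: Any, now_s: float, user_prompt: Optional[str] = None) -> List[float]:
        """One control tick: refresh the instruction if due, then query the low policy."""
        if user_prompt is not None:
            self.prompt = user_prompt
        due = (
            self.last_high_run_s is None                     # nothing produced yet
            or user_prompt is not None                       # user just sent a prompt
            or now_s - self.last_high_run_s >= self.period_s  # period elapsed
        )
        if due:
            self.instruction = self.high_policy(obs, self.prompt)
            self.last_high_run_s = now_s
        # The low policy runs on every tick with the most recent instruction.
        return self.low_policy(obs, self.instruction)
```

In this sketch the low policy is called on every tick and simply reuses the last instruction between high-policy runs; how often the low policy actually runs is not stated in the notes.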

Synthetic data generation:
D demo -> teleop data with a global task annotation
D label -> D demo segmented into short skills (one to three seconds each), each with a skill label
D syn -> p-gen generates the high-level prompts a user might give to p-hi
Given D label, prompt p-gen to imagine an appropriate user request. Its inputs are the images and ALL PRIOR skill labels in the episode: ℓ̂₀, …, ℓ̂ₜ₋₁

For example, with t = 5:
“Given the scene + all previous steps + current needed skill ℓ̂₅,
generate a user request that logically leads to ℓ̂₅.”
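
The text half of a p-gen query could be assembled roughly as follows. This is a sketch under assumptions: `SkillSegment` and `build_pgen_prompt` are hypothetical names, the images are assumed to be passed to p-gen separately, and the wording follows the example prompt above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SkillSegment:
    """One entry of D label (hypothetical layout): a short skill with its label."""

    label: str
    start_s: float
    end_s: float


def build_pgen_prompt(segments: List[SkillSegment], t: int) -> str:
    """Build the text prompt for segment t from all prior skill labels in the episode."""
    if not 0 <= t < len(segments):
        raise IndexError(f"segment index {t} out of range")
    prior = [seg.label for seg in segments[:t]]   # labels 0 .. t-1
    history = "\n".join(f"{i}. {label}" for i, label in enumerate(prior)) or "(none)"
    return (
        f"Previous steps:\n{history}\n"
        f"Current needed skill: {segments[t].label}\n"
        "Given the scene, all previous steps, and the current needed skill, "
        "generate a user request that logically leads to the current skill."
    )


episode = [
    SkillSegment("pick up the sponge", 0.0, 2.0),
    SkillSegment("wipe the table", 2.0, 5.0),
    SkillSegment("put the sponge in the sink", 5.0, 7.5),
]
prompt_text = build_pgen_prompt(episode, t=2)
```

The generated request, paired with the segment's images and skill label, would then form one D syn example.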

Train:
p-hi(ℓ̂ₜ | images, global label): cross-entropy loss on next-token prediction
p-low(Aₜ | images, qₜ, ℓ̂ₜ)
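
For reference, the next-token cross-entropy named for the high policy can be written out in a few lines. This is a generic, dependency-free sketch rather than the training code: in practice the logits would come from the VLM conditioned on the images and the global label, whereas here they are passed in directly, and the function name is hypothetical.

```python
import math
from typing import Sequence


def next_token_cross_entropy(
    logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Mean negative log-likelihood of the target tokens under softmax(logits).

    logits[i] holds the unnormalized scores over the vocabulary at position i,
    and targets[i] is the index of the token that actually comes next.
    """
    if len(logits) != len(targets) or len(targets) == 0:
        raise ValueError("logits and targets must be non-empty and the same length")
    total = 0.0
    for row, target in zip(logits, targets):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[target]  # equals -log softmax(row)[target]
    return total / len(targets)


# Uniform scores over 4 tokens give a loss of log(4).
uniform_loss = next_token_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
```

The notes do not state a loss for the low policy, so none is sketched here.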