Files
lerobot/examples/dataset/prompt.txt
T
Jade Choghari 6216932fb0 more changres
2025-12-09 08:57:49 +00:00

21 lines
832 B
Plaintext

Goal:
Create a high and low policy, the high policy take obs + text and return text, its a VLM
The low policy is a VLA that take text returned by the high and ouptut actions
High policy is ran every one second or when user send prompt
Synthetic data generation:
D demo -> teleop data with a global task annotation
D label -> segment data into short skills (one to three seconds)
D syn: p-gen will create the high level prompt user might gave to p-hi
Given D label prompt p-gen to imagine appropriate action, take images, ALL PRIOR skill labels in the episode: ℓ̂₀, …, ℓ̂ₜ₋₁
“Given the scene + all previous steps + current needed skill ℓ̂₅,
generate a user request that logically leads to ℓ̂₅.”
Train:
Phi(lt| images, global label) cross entropy - next token predictions
Plow(At| images, qt, lt)