mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-21 19:49:49 +00:00
fix formatting
@@ -97,48 +97,47 @@ def create_sarm_prompt(subtask_list: list[str]) -> str:
     subtask_str = "\n".join([f" - {name}" for name in subtask_list])
 
     return f"""# Role
 You are an expert Robotics Vision System specializing in temporal action localization. Your task is to segment a video of a robot manipulation demonstration into a sequence of distinct, non-overlapping atomic actions.
 
 # Input Data
 ## Allowed Subtask Vocabulary
 You must strictly identify the video segments using ONLY the following labels. Do not create new labels or modify existing ones:
 [
 {subtask_str}
 ]
 
 # Constraints & Logic
 1. **Continuous Coverage:** The entire video duration (from 00:00 to the final second) must be accounted for. There can be no gaps between tasks.
 2. **Boundary Logic:** The `end` timestamp of one task must be the exact `start` timestamp of the next task.
 3. **Linear Progression:** The video represents a single successful demonstration. Each subtask from the vocabulary appears exactly once, in logical chronological order.
 4. **Format:** Timestamps must be in "MM:SS" format.
 
 # Step-by-Step Analysis Process
 1. **Visual grounding:** Look for the specific visual state changes that define the transition between tasks (e.g., gripper touching object, object lifting off table).
 2. **Define Boundaries:** Determine the specific frame where the motion profile changes to fit the next subtask label.
 3. **Fill Gaps:** If there is a pause between meaningful actions, append that time to the *preceding* task to ensure continuous coverage.
 
 # Output Format
 Provide the output in valid JSON format.
 Structure:
-{
+{{
 "subtasks": [
-{
+{{
 "name": "EXACT_NAME_FROM_LIST",
-"timestamps": {
+"timestamps": {{
 "start": "MM:SS",
 "end": "MM:SS"
-}
-},
-{
+}}
+}},
+{{
 "name": "EXACT_NAME_FROM_LIST",
-"timestamps": {
+"timestamps": {{
 "start": "MM:SS",
 "end": "MM:SS"
-}
-}
+}}
+}}
 ]
-}
-"""
+}}"""
 
 
 class VideoAnnotator:
     """Annotates robot manipulation videos using local Qwen3-VL model on GPU"""
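The change in this hunk relies on how Python f-strings treat braces: `{subtask_str}` is interpolated, while `{{` and `}}` are emitted as literal `{` and `}`. The sketch below illustrates this with a shortened prompt. `build_prompt` is a hypothetical stand-in for `create_sarm_prompt`, not the function from the repository, and the two-space JSON indentation is an assumption.

```python
import json


def build_prompt(subtask_list: list[str]) -> str:
    """Hypothetical, shortened stand-in for create_sarm_prompt."""
    subtask_str = "\n".join(f"  - {name}" for name in subtask_list)
    # Single braces interpolate a value; doubled braces produce literal braces,
    # so the JSON skeleton survives f-string formatting unchanged.
    return f"""Allowed labels:
[
{subtask_str}
]
Structure:
{{
  "subtasks": [
    {{
      "name": "EXACT_NAME_FROM_LIST",
      "timestamps": {{
        "start": "MM:SS",
        "end": "MM:SS"
      }}
    }}
  ]
}}"""


prompt = build_prompt(["reach", "grasp"])

# Everything after "Structure:" should now be plain JSON with single braces.
json_part = prompt.split("Structure:\n", 1)[1]
template = json.loads(json_part)
print(template["subtasks"][0]["timestamps"])
```

Ending the string with `}}"""` on one line, as the new side of the diff does, also means the rendered prompt no longer ends with a trailing newline.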
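The constraints stated in the prompt (MM:SS timestamps, no gaps, each label used exactly once) can also be checked on the model's parsed JSON reply. The following is a minimal sketch under that assumption. `to_seconds` and `check_segmentation` are hypothetical helpers that do not appear in the diff, and the check against the video's final second is left out because it needs the video duration.

```python
import re

# MM:SS with two-digit minutes and seconds in the range 00-59.
MMSS = re.compile(r"(\d{2}):([0-5]\d)")


def to_seconds(stamp: str) -> int:
    """Convert an "MM:SS" string to seconds, rejecting other formats."""
    match = MMSS.fullmatch(stamp)
    if match is None:
        raise ValueError(f"timestamp {stamp!r} is not in MM:SS format")
    return int(match.group(1)) * 60 + int(match.group(2))


def check_segmentation(response: dict, subtask_list: list[str]) -> list[str]:
    """Return a list of constraint violations (empty when none are found)."""
    problems = []
    segments = response.get("subtasks", [])

    # Linear progression: every vocabulary label appears exactly once.
    names = [segment["name"] for segment in segments]
    if sorted(names) != sorted(subtask_list):
        problems.append("labels must match the vocabulary, each used exactly once")

    # Continuous coverage and boundary logic: the first segment starts at
    # 00:00 and each start equals the previous end.
    previous_end = 0
    for segment in segments:
        start = to_seconds(segment["timestamps"]["start"])
        end = to_seconds(segment["timestamps"]["end"])
        if start != previous_end:
            problems.append(
                f"{segment['name']}: starts at {start}s, expected {previous_end}s"
            )
        if end < start:
            problems.append(f"{segment['name']}: ends before it starts")
        previous_end = end
    return problems


good = {
    "subtasks": [
        {"name": "reach", "timestamps": {"start": "00:00", "end": "00:04"}},
        {"name": "grasp", "timestamps": {"start": "00:04", "end": "00:09"}},
    ]
}
print(check_segmentation(good, ["reach", "grasp"]))
```

Returning a list of messages instead of raising on the first violation makes it easier to log every problem in a reply before deciding whether to re-prompt.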