feat(dagger): Add HIL/Dagger/HG-Dagger/RaC style data collection (#2833)

* feat: HIL data collection, RTC interpolator, and action queue improvements

- Add Human-in-the-Loop (HIL) data collection examples (sync + RTC)
- Add HIL data collection documentation
- Add ActionInterpolator for smoother policy control at higher rates
- Integrate interpolator into lerobot-record and eval_with_real_robot
- Add action queue clear() and get_processed_left_over() methods
- Add rtc/__init__.py for cleaner imports

* docs: expand Related Work section with paper summaries

* fix: only record dataset frames at original fps, not at interpolated rate

The interpolator speeds up robot control (e.g. 2x) but dataset frames
should still be recorded at the original fps. Interpolated-only
iterations now only send actions to the robot without writing to the
dataset.

* refactor: merge HIL sync and RTC scripts into single file with --rtc.enabled toggle

Combines hil_data_collection.py and hil_data_collection_rtc.py into one
script. RTC is toggled via --rtc.enabled=true (defaults to off for sync
inference). Deletes the separate hil_data_collection_rtc.py and updates
docs to reflect the single-script usage.

* test: add ActionInterpolator test suite (29 tests)

Covers constructor validation, passthrough (multiplier=1), 2x and 3x
interpolation with exact value checks, reset/episode boundaries,
control interval calculation, multi-dim actions, and simulated
control loop integration.

* test: add ActionQueue + ActionInterpolator integration tests

Verifies the interpolator doesn't interfere with RTC's leftover chunk
tracking: queue consumption rate matches base fps regardless of
multiplier, get_left_over/get_processed_left_over only change on
queue.get(), merge preserves smooth interpolation across chunks,
and interpolator reset is independent of queue state.

* feat: register SO follower/leader configs in HIL script

Adds SOFollowerRobotConfig and SOLeaderTeleopConfig imports so
SO100/SO101 robots can be used via --robot.type=so_follower
and --teleop.type=so_leader. Updates docs accordingly.

Made-with: Cursor

* docs: remove em dashes from HIL documentation

Made-with: Cursor

* refactor: rename examples/rac to examples/hil

Updates directory name and all references in docs and script docstrings.

Made-with: Cursor

* fix: incorporate PR feedback comments

* refactor(tests): enhance ActionInterpolator test structure and add detailed docstrings

* feedback pr and test fix

* fix(test): pass correct real_delay in interpolator delay test

The test was passing real_delay=0 and relying on _check_delays to
silently override it with the index-based diff. Now passes real_delay=3
to match the 3 actions consumed during the simulated inference period.


* fix pr feedback

* ordering

* update hil script

* fix

* default name

* fix(bi_openarm): use kw_only=True to fix dataclass field ordering

BiOpenArmFollowerConfig overrides `id` with a default, making it
positional in the child — non-default `left_arm_config` then follows a
default field, which Python dataclasses forbid. Adding kw_only=True
(matching the parent RobotConfig) removes positional constraints.

Made-with: Cursor

* style: format long line in hil_data_collection.py

Made-with: Cursor

* pr feedback

---------

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Pepijn
2026-04-02 19:53:59 +02:00
committed by GitHub
parent 66fef25ded
commit 818892a38b
13 changed files with 2605 additions and 61 deletions
+2
@@ -17,6 +17,8 @@
title: Train RL in Simulation
- local: multi_gpu_training
title: Multi GPU training
- local: hil_data_collection
title: Human In the Loop Data Collection
- local: peft_training
title: Training with PEFT (e.g., LoRA)
- local: rename_map
+269
@@ -0,0 +1,269 @@
# Human-In-the-Loop Data Collection
Human-In-the-Loop (HIL) data collection lets you improve a trained policy by deploying it on a real robot while a human operator monitors and intervenes when needed. The intervention data (recovery movements and corrections) is recorded alongside autonomous segments, producing a richer training dataset that teaches the policy how to handle failures.

---
## Why Human-In-the-Loop?
Standard behavioral cloning trains policies on successful demonstrations only. During deployment, small errors can compound and push the robot into states never seen during training (distribution shift). HIL data collection addresses this by:
- Running the trained policy on the real robot
- Having a human intervene when the robot is about to fail
- Recording the human's recovery and correction as training data
- Fine-tuning the policy on the combined dataset
This produces a policy that not only knows how to perform the task, but also how to recover when things go wrong.

---
## How It Works
During a HIL session, the human operator follows this loop within each episode:
1. **Watch** the policy run autonomously
2. **Pause** when failure is imminent; the robot holds its position
3. **Take control** and teleoperate the robot back to a good state (recovery), then correct the behavior
4. **Return control to the policy**; the policy resumes autonomous execution
5. Repeat steps 2-4 as many times as needed during the episode
6. **End the episode** when the task is complete, save and move on to the next rollout
Both autonomous and human-controlled segments are recorded. The policy and human can alternate control multiple times within a single episode, and the episode continues from the current state after each handoff (no reset required just because intervention happened). This captures autonomous execution, recovery, and correction in one continuous trajectory. After collection, the combined dataset (original demonstrations + HIL data) is used to fine-tune the policy.
This process can be repeated iteratively: deploy, collect, fine-tune, repeat. Each round targets the current policy's failure modes.
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Policy v0 (trained on demos) │
│ ↓ │
│ HIL Collection (target current failure modes) → Fine-tune → Policy v1 │
│ ↓ │
│ HIL Collection (target new failure modes) → Fine-tune → Policy v2 │
│ ↓ │
│ ... (repeat until satisfactory performance) │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Hardware Requirements
### Teleoperator Requirements
The `examples/hil` HIL scripts require **teleoperators with active motors** that can:
- Enable/disable torque programmatically
- Move to target positions (to mirror the robot state when pausing)
**Compatible teleoperators in the current `examples/hil` scripts:**
- `openarm_mini` - OpenArm Mini
- `so_leader` - SO100 / SO101 leader arm
> [!IMPORTANT]
> The provided `examples/hil` commands default to `bi_openarm_follower` + `openarm_mini`.
> `so_follower` + `so_leader` configs are also registered and can be used via CLI flags.
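The teleoperator requirements above amount to a capability check. The sketch below shows one way to express it with duck typing. The method names `enable_torque`, `disable_torque`, and `write_goal_positions` are the ones the shared HIL utilities look for; the two leader classes are hypothetical stand-ins for illustration, not real LeRobot teleoperators.

```python
# Method names a teleoperator must expose to mirror the robot on pause.
REQUIRED_METHODS = ("enable_torque", "disable_torque", "write_goal_positions")


class MotorizedLeader:
    """Hypothetical leader arm with active motors (illustration only)."""

    def enable_torque(self) -> None:
        pass

    def disable_torque(self) -> None:
        pass

    def write_goal_positions(self, positions: dict) -> None:
        pass


class PassiveLeader:
    """Hypothetical leader arm with encoders only (illustration only)."""

    def get_action(self) -> dict:
        return {}


def has_motor_control(teleop: object) -> bool:
    """Return True only if every required method is present."""
    return all(hasattr(teleop, name) for name in REQUIRED_METHODS)


if __name__ == "__main__":
    for teleop in (MotorizedLeader(), PassiveLeader()):
        print(type(teleop).__name__, has_motor_control(teleop))
```

A teleoperator that fails this check can still be read for actions, but it cannot be driven to match the robot pose when the policy is paused.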
---
## Script
A single script handles both synchronous and RTC-based inference. Toggle RTC with `--rtc.enabled=true`:
| Mode | Flag | Models |
| ------------------------ | -------------------- | --------------------- |
| Standard (default) | _(no flag needed)_ | ACT, Diffusion Policy |
| Real-Time Chunking (RTC) | `--rtc.enabled=true` | Pi0, Pi0.5, SmolVLA |
---
## Step-by-Step Guide
### Step 1: Pre-train a Base Policy
First, train a policy on your demonstration dataset:
```bash
python src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your-username/demo-dataset \
--policy.type=pi0 \
--output_dir=outputs/pretrain \
--batch_size=32 \
--steps=50000
```
### Step 2: Collect HIL Data
**Standard inference (ACT, Diffusion Policy):**
```bash
python examples/hil/hil_data_collection.py \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
--robot.right_arm_config.port=can0 \
--robot.right_arm_config.side=right \
--robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}}' \
--teleop.type=openarm_mini \
--teleop.port_left=/dev/ttyACM0 \
--teleop.port_right=/dev/ttyACM1 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--dataset.repo_id=your-username/hil-dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--interpolation_multiplier=2
```
**With RTC for large models (Pi0, Pi0.5, SmolVLA):**
For models with high inference latency, enable RTC for smooth execution:
```bash
python examples/hil/hil_data_collection.py \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
--rtc.prefix_attention_schedule=LINEAR \
--robot.type=bi_openarm_follower \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.side=left \
--robot.right_arm_config.port=can0 \
--robot.right_arm_config.side=right \
--robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}}' \
--teleop.type=openarm_mini \
--teleop.port_left=/dev/ttyACM0 \
--teleop.port_right=/dev/ttyACM1 \
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
--dataset.repo_id=your-username/hil-rtc-dataset \
--dataset.single_task="Fold the T-shirt properly" \
--dataset.fps=30 \
--dataset.episode_time_s=1000 \
--dataset.num_episodes=50 \
--interpolation_multiplier=3
```
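To make the effect of `--interpolation_multiplier` concrete, here is a minimal sketch of linear action interpolation in plain Python lists (the real `ActionInterpolator` works on tensors). The function names `interpolate`, `control_interval`, and `simulate` are illustrative and not part of the scripts, and the rule "one dataset frame per policy action" is an assumption that models recording at the base `--dataset.fps`.

```python
from typing import Optional, Sequence


def interpolate(
    prev: Optional[Sequence[float]], curr: Sequence[float], multiplier: int
) -> list[list[float]]:
    """Return the actions to send to the robot for one policy action.

    With no previous action (first step) or multiplier == 1, the policy
    action passes through unchanged. Otherwise `multiplier` evenly spaced
    points on the line from `prev` to `curr` are returned, ending on `curr`.
    """
    if multiplier < 1:
        raise ValueError(f"multiplier must be >= 1, got {multiplier}")
    if prev is None or multiplier == 1:
        return [list(curr)]
    steps = []
    for i in range(1, multiplier + 1):
        t = i / multiplier
        steps.append([p + t * (c - p) for p, c in zip(prev, curr)])
    return steps


def control_interval(fps: float, multiplier: int) -> float:
    """Seconds between robot commands: the base interval divided by the multiplier."""
    return 1.0 / (fps * multiplier)


def simulate(policy_actions: list[list[float]], multiplier: int) -> tuple[list[list[float]], int]:
    """Count robot commands versus dataset frames for a short rollout."""
    sent: list[list[float]] = []
    recorded = 0
    prev = None
    for action in policy_actions:
        sent.extend(interpolate(prev, action, multiplier))  # every step goes to the robot
        recorded += 1  # assumption: one dataset frame per base-rate policy action
        prev = action
    return sent, recorded


if __name__ == "__main__":
    sent, recorded = simulate([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]], multiplier=2)
    print("robot commands:", len(sent), "dataset frames:", recorded)
    print("control interval at 30 fps, 2x:", control_interval(30, 2))
```

With `--dataset.fps=30` and a multiplier of 3, the robot is commanded roughly every 1/90 s while the dataset still grows at 30 frames per second.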
**Controls (Conceptual):**
The interaction model is:
- **Pause input**: pause autonomous policy execution
- **Takeover input**: transfer control to the human operator and record intervention data
- **Return-to-policy input**: hand control back to the policy and continue the same episode
- **Episode control inputs**: save/re-record/stop/reset as needed
Exact key/pedal bindings can differ across scripts and hardware integrations. Use each script's printed controls as the source of truth for the concrete mapping on your setup.
**The HIL Protocol:**
1. Watch the policy run autonomously (teleop is idle/free)
2. When you see imminent failure, trigger the **pause input**
   - Policy stops
   - Teleoperator moves to match robot position (torque enabled)
   - No frames recorded during pause
3. Trigger the **takeover input** to take control
   - Teleoperator torque disabled, free to move
   - **Recovery**: Teleoperate the robot back to a good state
   - **Correction**: Correct the behavior
   - All movements are recorded
4. Trigger the **return-to-policy input**
   - Policy resumes autonomous execution from the current state
   - You can intervene again at any time (repeat steps 2-4)
5. End and save the episode when the task is complete (or episode time limit is reached)
6. **Reset**: The teleoperator moves to match the robot position; you can then teleoperate the robot to the starting position
7. Start the next episode
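The protocol above can be read as a small state machine. The sketch below is one way to model it, not the script's actual implementation: the `Mode` enum, the event names, and the `RECORDS_FRAMES` table are illustrative names chosen here, while the transitions and the recording rule (no frames while paused) follow the steps listed above.

```python
from enum import Enum, auto


class Mode(Enum):
    AUTONOMOUS = auto()  # policy drives the robot
    PAUSED = auto()      # robot holds position, teleop mirrors the robot
    CORRECTION = auto()  # human drives the robot (recovery + correction)


# Allowed handoffs; any other (mode, event) pair leaves the mode unchanged.
TRANSITIONS = {
    (Mode.AUTONOMOUS, "pause"): Mode.PAUSED,
    (Mode.PAUSED, "takeover"): Mode.CORRECTION,
    (Mode.PAUSED, "return_to_policy"): Mode.AUTONOMOUS,
    (Mode.CORRECTION, "return_to_policy"): Mode.AUTONOMOUS,
}

# Autonomous and human-controlled segments are recorded; pauses are not.
RECORDS_FRAMES = {
    Mode.AUTONOMOUS: True,
    Mode.PAUSED: False,
    Mode.CORRECTION: True,
}


def step(mode: Mode, event: str) -> Mode:
    """Apply one operator input and return the resulting mode."""
    return TRANSITIONS.get((mode, event), mode)


if __name__ == "__main__":
    mode = Mode.AUTONOMOUS
    for event in ["pause", "takeover", "return_to_policy", "takeover"]:
        mode = step(mode, event)
        print(f"{event:>16} -> {mode.name:<10} recording={RECORDS_FRAMES[mode]}")
```

Note that a takeover input is ignored unless the policy is already paused, which matches the pause, then takeover, then return-to-policy ordering of the protocol.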
**Foot Pedal Setup (Linux):**
If using a USB foot pedal (PCsensor FootSwitch), ensure access:
```bash
sudo setfacl -m u:$USER:rw /dev/input/by-id/usb-PCsensor_FootSwitch-event-kbd
```
### Step 3: Fine-tune the Policy
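Fine-tune on the **combined** dataset (`demo-dataset` + `hil-dataset` merged together). The `--dataset.repo_id` in the command below is a placeholder; point it at the repo that holds the merged data: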
```bash
python src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your-username/hil-dataset \
--policy.type=pi0 \
--policy.pretrained_path=outputs/pretrain/checkpoints/last/pretrained_model \
--output_dir=outputs/hil_finetune \
--steps=20000
```
Then deploy the fine-tuned policy and repeat from Step 2 to target its remaining failure modes.

---
## Tips for Effective HIL Collection
### When to Intervene
Intervene when you see:
- Robot about to make an irreversible mistake
- Robot hesitating or showing uncertain behavior
- Robot deviating from the expected trajectory
### Recovery: Teleoperating Back to a Good State
During recovery, teleoperate the robot back to a state where:
- The robot is in a familiar, in-distribution configuration
- The current subtask can still be completed
- The recovery trajectory itself is informative training data
### Quality of Corrections
During correction:
- Provide **confident, clean** trajectories
- Complete the current subtask fully
- Don't overcorrect or add unnecessary movements
---
## Related Work
This HIL data collection approach builds on ideas from interactive imitation learning:
- **DAgger** (Ross et al., 2011) introduced the core idea: instead of only training on expert demonstrations, query the expert for corrections on states the _learner_ visits. This breaks the compounding-error cycle of standard behavioral cloning by iteratively collecting on-policy data.
- **HG-DAgger** (Kelly et al., 2019) made this practical for robotics: a human expert monitors the robot and only intervenes when needed, rather than labeling every state. The gating between autonomous and human control is exactly the pause → takeover → return-to-policy loop used in the scripts here.
- **RaC** (Hu et al., 2025) scales this loop to long-horizon tasks by explicitly decomposing interventions into **recovery** (teleoperating back to a good state) and **correction** (demonstrating the right behavior from there). This decomposition is the protocol followed by the HIL scripts in `examples/hil`.
- **π0.6/RECAP** (Physical Intelligence, 2025) applies the same iterative collect-and-finetune loop at scale with VLA models, showing that even large pretrained policies benefit substantially from targeted human corrections on their own failure modes. π0.6 is trained using RECAP.
```bibtex
@article{ross2011dagger,
title={A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning},
author={Ross, Stéphane and Gordon, Geoffrey and Bagnell, Drew},
journal={Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
year={2011}
}
@article{kelly2019hgdagger,
title={HG-DAgger: Interactive Imitation Learning with Human Experts},
author={Kelly, Michael and Sidrane, Chelsea and Driggs-Campbell, Katherine and Kochenderfer, Mykel J},
journal={arXiv preprint arXiv:1810.02890},
year={2019}
}
@article{hu2025rac,
title={RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction},
author={Hu, Zheyuan and Wu, Robyn and Enock, Naveen and Li, Jasmine and Kadakia, Riya and Erickson, Zackory and Kumar, Aviral},
journal={arXiv preprint arXiv:2509.07953},
year={2025}
}
@article{pi2025recap,
title={π0.6: a VLA That Learns From Experience},
author={Physical Intelligence},
year={2025}
}
```
File diff suppressed because it is too large
+228
@@ -0,0 +1,228 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Shared utilities for Human-in-the-Loop data collection scripts."""
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path

from lerobot.processor import (
    IdentityProcessorStep,
    RobotAction,
    RobotObservation,
    RobotProcessorPipeline,
)
from lerobot.processor.converters import (
    observation_to_transition,
    robot_action_observation_to_transition,
    transition_to_observation,
    transition_to_robot_action,
)
from lerobot.robots import Robot
from lerobot.teleoperators import Teleoperator
from lerobot.utils.control_utils import is_headless
from lerobot.utils.robot_utils import precise_sleep

logger = logging.getLogger(__name__)
@dataclass
class HILDatasetConfig:
    repo_id: str
    single_task: str
    root: str | Path | None = None
    fps: int = 30
    episode_time_s: float = 120
    num_episodes: int = 50
    video: bool = True
    push_to_hub: bool = True
    private: bool = False
    tags: list[str] | None = None
    num_image_writer_processes: int = 0
    num_image_writer_threads_per_camera: int = 4
    video_encoding_batch_size: int = 1
    vcodec: str = "auto"
    streaming_encoding: bool = True
    encoder_queue_maxsize: int = 30
    encoder_threads: int | None = None
    rename_map: dict[str, str] = field(default_factory=dict)
def teleop_has_motor_control(teleop: Teleoperator) -> bool:
    """Check if teleoperator has motor control capabilities."""
    return all(hasattr(teleop, attr) for attr in ("enable_torque", "disable_torque", "write_goal_positions"))


def teleop_disable_torque(teleop: Teleoperator) -> None:
    """Disable teleop torque if supported."""
    if hasattr(teleop, "disable_torque"):
        teleop.disable_torque()


def teleop_enable_torque(teleop: Teleoperator) -> None:
    """Enable teleop torque if supported."""
    if hasattr(teleop, "enable_torque"):
        teleop.enable_torque()


def teleop_smooth_move_to(teleop: Teleoperator, target_pos: dict, duration_s: float = 2.0, fps: int = 50):
    """Smoothly move teleop to target position if motor control is available."""
    if not teleop_has_motor_control(teleop):
        logger.warning("Teleop does not support motor control - cannot mirror robot position")
        return

    teleop_enable_torque(teleop)
    current = teleop.get_action()
    steps = max(int(duration_s * fps), 1)

    for step in range(steps + 1):
        t = step / steps
        interp = {}
        for k in current:
            if k in target_pos:
                interp[k] = current[k] * (1 - t) + target_pos[k] * t
            else:
                interp[k] = current[k]
        teleop.write_goal_positions(interp)
        time.sleep(1 / fps)
def init_keyboard_listener():
    """Initialize keyboard listener with HIL controls."""
    events = {
        "exit_early": False,
        "rerecord_episode": False,
        "stop_recording": False,
        "policy_paused": False,
        "correction_active": False,
        "resume_policy": False,
        "in_reset": False,
        "start_next_episode": False,
    }

    if is_headless():
        logger.warning("Headless environment - keyboard controls unavailable")
        return None, events

    from pynput import keyboard

    def on_press(key):
        try:
            if events["in_reset"]:
                if key in [keyboard.Key.space, keyboard.Key.right]:
                    logger.info("[HIL] Starting next episode...")
                    events["start_next_episode"] = True
                elif hasattr(key, "char") and key.char == "c":
                    events["start_next_episode"] = True
                elif key == keyboard.Key.esc:
                    logger.info("[HIL] ESC - Stop recording, pushing to hub...")
                    events["stop_recording"] = True
                    events["start_next_episode"] = True
            else:
                if key == keyboard.Key.space:
                    if not events["policy_paused"] and not events["correction_active"]:
                        logger.info("[HIL] PAUSED - Press 'c' to take control or 'p' to resume policy")
                        events["policy_paused"] = True
                elif hasattr(key, "char") and key.char == "c":
                    if events["policy_paused"] and not events["correction_active"]:
                        logger.info("[HIL] Taking control...")
                        events["start_next_episode"] = True
                elif hasattr(key, "char") and key.char == "p":
                    if events["policy_paused"] or events["correction_active"]:
                        logger.info("[HIL] Resuming policy...")
                        events["resume_policy"] = True
                elif key == keyboard.Key.right:
                    logger.info("[HIL] End episode")
                    events["exit_early"] = True
                elif key == keyboard.Key.left:
                    logger.info("[HIL] Re-record episode")
                    events["rerecord_episode"] = True
                    events["exit_early"] = True
                elif key == keyboard.Key.esc:
                    logger.info("[HIL] ESC - Stop recording...")
                    events["stop_recording"] = True
                    events["exit_early"] = True
        except Exception as e:
            logger.info(f"Key error: {e}")

    listener = keyboard.Listener(on_press=on_press)
    listener.start()
    return listener, events
def make_identity_processors():
    """Create identity processors for recording."""
    teleop_proc = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
        steps=[IdentityProcessorStep()],
        to_transition=robot_action_observation_to_transition,
        to_output=transition_to_robot_action,
    )
    obs_proc = RobotProcessorPipeline[RobotObservation, RobotObservation](
        steps=[IdentityProcessorStep()],
        to_transition=observation_to_transition,
        to_output=transition_to_observation,
    )
    return teleop_proc, obs_proc
def reset_loop(robot: Robot, teleop: Teleoperator, events: dict, fps: int):
    """Reset period where human repositions environment."""
    logger.info("[HIL] RESET")
    events["in_reset"] = True
    events["start_next_episode"] = False

    obs = robot.get_observation()
    robot_pos = {k: v for k, v in obs.items() if k.endswith(".pos") and k in robot.observation_features}
    teleop_smooth_move_to(teleop, robot_pos, duration_s=2.0, fps=50)

    logger.info("Press any key to enable teleoperation")
    while not events["start_next_episode"] and not events["stop_recording"]:
        precise_sleep(0.05)

    if events["stop_recording"]:
        return

    events["start_next_episode"] = False
    teleop_disable_torque(teleop)
    logger.info("Teleop enabled - press any key to start episode")

    while not events["start_next_episode"] and not events["stop_recording"]:
        loop_start = time.perf_counter()
        action = teleop.get_action()
        robot.send_action(action)
        precise_sleep(1 / fps - (time.perf_counter() - loop_start))

    events["in_reset"] = False
    events["start_next_episode"] = False
    events["exit_early"] = False
    events["policy_paused"] = False
    events["correction_active"] = False
    events["resume_policy"] = False
def print_controls(rtc: bool = False):
    """Print control instructions."""
    mode = "Human-in-the-Loop Data Collection" + (" (RTC)" if rtc else "")
    logger.info(
        "%s\n  Controls:\n"
        "    SPACE  - Pause policy\n"
        "    c      - Take control\n"
        "    p      - Resume policy after pause/correction\n"
        "    →      - End episode\n"
        "    ESC    - Stop and push to hub",
        mode,
    )
+16 -9
@@ -69,15 +69,20 @@ Usage:
--policy.path=lerobot-data-collection/folding_final \
--robot.type=bi_openarm_follower \
--robot.cameras='{left_wrist: {type: opencv, index_or_path: "/dev/video4", width: 1280, height: 720, fps: 30}, base: {type: opencv, index_or_path: "/dev/video2", width: 640, height: 480, fps: 30}, right_wrist: {type: opencv, index_or_path: "/dev/video0", width: 1280, height: 720, fps: 30}}' \
--robot.left_arm_config.port=can1 \
--robot.left_arm_config.port=can0 \
--robot.left_arm_config.side=left \
--robot.left_arm_config.can_interface=socketcan \
--robot.right_arm_config.port=can0 \
--robot.left_arm_config.disable_torque_on_disconnect=true \
--robot.left_arm_config.max_relative_target=8.0 \
--robot.right_arm_config.port=can1 \
--robot.right_arm_config.side=right \
--robot.right_arm_config.can_interface=socketcan \
--robot.right_arm_config.disable_torque_on_disconnect=true \
--robot.right_arm_config.max_relative_target=8.0 \
--task="Fold the T-shirt properly" \
--fps=30 \
--duration=2000 \
--interpolation_multiplier=3 \
--rtc.enabled=true \
--rtc.execution_horizon=20 \
--rtc.max_guidance_weight=5.0 \
@@ -104,9 +109,7 @@ from lerobot.configs.policies import PreTrainedConfig
from lerobot.configs.types import RTCAttentionSchedule
from lerobot.datasets.feature_utils import build_dataset_frame, hw_to_dataset_features
from lerobot.policies.factory import get_policy_class, make_pre_post_processors
from lerobot.policies.rtc.action_queue import ActionQueue
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.rtc.latency_tracker import LatencyTracker
from lerobot.policies.rtc import ActionInterpolator, ActionQueue, LatencyTracker, RTCConfig
from lerobot.processor import (
NormalizerProcessorStep,
RelativeActionsProcessorStep,
@@ -181,6 +184,7 @@ class RTCDemoConfig(HubMixin):
# Demo parameters
duration: float = 30.0 # Duration to run the demo (seconds)
fps: float = 10.0 # Action execution frequency (Hz)
interpolation_multiplier: int = 1 # Control rate multiplier (1=off, 2=2x, 3=3x)
# Compute device
device: str | None = None # Device to run on (cuda, cpu, auto)
@@ -461,20 +465,23 @@ def actor_control(
action_keys = [k for k in robot.action_features() if k.endswith(".pos")]
action_count = 0
action_interval = 1.0 / cfg.fps
interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
action_interval = interpolator.get_control_interval(cfg.fps)
while not shutdown_event.is_set():
start_time = time.perf_counter()
# Try to get an action from the queue with timeout
action = action_queue.get()
if interpolator.needs_new_action():
new_action = action_queue.get()
if new_action is not None:
interpolator.add(new_action.cpu())
action = interpolator.get()
if action is not None:
action = action.cpu()
action_dict = {key: action[i].item() for i, key in enumerate(action_keys)}
action_processed = robot_action_processor((action_dict, None))
robot.send_action(action_processed)
action_count += 1
dt_s = time.perf_counter() - start_time
+29
@@ -0,0 +1,29 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Real-Time Chunking (RTC) utilities for action-chunking policies."""
from lerobot.policies.rtc.action_interpolator import ActionInterpolator
from lerobot.policies.rtc.action_queue import ActionQueue
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.policies.rtc.latency_tracker import LatencyTracker
from lerobot.policies.rtc.modeling_rtc import RTCProcessor
__all__ = [
    "ActionInterpolator",
    "ActionQueue",
    "LatencyTracker",
    "RTCConfig",
    "RTCProcessor",
]
@@ -0,0 +1,116 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Action interpolation for smoother robot control.
Provides configurable Nx control rate by interpolating between consecutive actions.
Useful with RTC and action-chunking policies to reduce jerkiness.
"""
from torch import Tensor
class ActionInterpolator:
    """Interpolates between consecutive actions for smoother control.

    When enabled with multiplier N, produces N actions per policy action
    by linearly interpolating between the previous and current action.

    Example with multiplier=3:
        prev_action -> [1/3 interpolated, 2/3 interpolated, current_action]

    This effectively multiplies the control rate for smoother motion.

    Usage:
        interpolator = ActionInterpolator(multiplier=2)  # 2x control rate

        # In control loop:
        if interpolator.needs_new_action():
            new_action = queue.get()
            # Compare against None: truth-testing a multi-element tensor raises.
            if new_action is not None:
                interpolator.add(new_action.cpu())
        action = interpolator.get()
        if action is not None:
            robot.send_action(action)
    """

    def __init__(self, multiplier: int = 1):
        """Initialize the interpolator.

        Args:
            multiplier: Control rate multiplier (1 = no interpolation, 2 = 2x, 3 = 3x, etc.)
        """
        if multiplier < 1:
            raise ValueError(f"multiplier must be >= 1, got {multiplier}")
        self.multiplier = multiplier
        self._prev: Tensor | None = None
        self._buffer: list[Tensor] = []
        self._idx = 0

    @property
    def enabled(self) -> bool:
        """Whether interpolation is active (multiplier > 1)."""
        return self.multiplier > 1

    def reset(self):
        """Reset interpolation state (call between episodes)."""
        self._prev = None
        self._buffer = []
        self._idx = 0

    def needs_new_action(self) -> bool:
        """Check if a new action is needed from the queue."""
        return self._idx >= len(self._buffer)

    def add(self, action: Tensor) -> None:
        """Add a new action and compute interpolated sequence.

        Args:
            action: New action tensor from policy/queue (already on CPU).
        """
        if self.multiplier > 1 and self._prev is not None:
            self._buffer = []
            for i in range(1, self.multiplier + 1):
                t = i / self.multiplier
                interp = self._prev + t * (action - self._prev)
                self._buffer.append(interp)
        else:
            # First step: no previous action yet, so run at base FPS without interpolation.
            self._buffer = [action.clone()]
        self._prev = action.clone()
        self._idx = 0

    def get(self) -> Tensor | None:
        """Get the next interpolated action.

        Returns:
            Next action tensor, or None if buffer is exhausted.
        """
        if self._idx >= len(self._buffer):
            return None
        action = self._buffer[self._idx]
        self._idx += 1
        return action

    def get_control_interval(self, fps: float) -> float:
        """Get the control interval based on interpolation multiplier.

        Args:
            fps: Base frames per second.

        Returns:
            Control interval in seconds (divided by multiplier).
        """
        return 1.0 / (fps * self.multiplier)
+46 -19
@@ -79,6 +79,13 @@ class ActionQueue:
self.last_index += 1
return action.clone()
def clear(self) -> None:
"""Clear queued actions and reset consumption index."""
with self.lock:
self.queue = None
self.original_queue = None
self.last_index = 0
def qsize(self) -> int:
"""Get the number of remaining actions in the queue.
@@ -123,14 +130,26 @@ class ActionQueue:
with self.lock:
if self.original_queue is None:
return None
return self.original_queue[self.last_index :]
return self.original_queue[self.last_index :].clone()
def get_processed_left_over(self) -> Tensor | None:
"""Get leftover processed actions (the actions currently executed by the robot).
Returns:
Tensor | None: Remaining processed actions (remaining_steps, action_dim),
or None if no processed queue exists.
"""
with self.lock:
if self.queue is None:
return None
return self.queue[self.last_index :].clone()
def merge(
self,
original_actions: Tensor,
processed_actions: Tensor,
real_delay: int,
action_index_before_inference: int | None = 0,
action_index_before_inference: int | None = None,
):
"""Merge new actions into the queue.
@@ -145,10 +164,10 @@ class ActionQueue:
action_index_before_inference: Index before inference started, for validation.
"""
with self.lock:
self._check_delays(real_delay, action_index_before_inference)
delay = self._check_and_resolve_delays(real_delay, action_index_before_inference)
if self.cfg.enabled:
self._replace_actions_queue(original_actions, processed_actions, real_delay)
self._replace_actions_queue(original_actions, processed_actions, delay)
return
self._append_actions_queue(original_actions, processed_actions)
@@ -164,12 +183,13 @@ class ActionQueue:
processed_actions: Post-processed actions for robot.
real_delay: Number of time steps to skip due to inference delay.
"""
self.original_queue = original_actions[real_delay:].clone()
self.queue = processed_actions[real_delay:].clone()
clamped_delay = max(0, min(real_delay, len(original_actions), len(processed_actions)))
self.original_queue = original_actions[clamped_delay:].clone()
self.queue = processed_actions[clamped_delay:].clone()
logger.debug(f"original_actions shape: {self.original_queue.shape}")
logger.debug(f"processed_actions shape: {self.queue.shape}")
logger.debug(f"real_delay: {real_delay}")
logger.debug(f"real_delay: {real_delay}, clamped_delay: {clamped_delay}")
self.last_index = 0
@@ -196,7 +216,9 @@ class ActionQueue:
self.last_index = 0
def _check_delays(self, real_delay: int, action_index_before_inference: int | None = None):
def _check_and_resolve_delays(
self, real_delay: int, action_index_before_inference: int | None = None
) -> int:
"""Validate that computed delays match expectations.
Compares the delay computed from inference latency with the actual
@@ -205,15 +227,20 @@ class ActionQueue:
Args:
real_delay: Delay computed from inference latency.
action_index_before_inference: Action index when inference started.
"""
if action_index_before_inference is None:
return
indexes_diff = self.last_index - action_index_before_inference
if indexes_diff != real_delay:
# Let's check that action index difference (real delay calculated based on action queue)
# is the same as delay calculated based on inference latency
logger.warning(
f"[ACTION_QUEUE] Indexes diff is not equal to real delay. "
f"Indexes diff: {indexes_diff}, real delay: {real_delay}"
)
Returns:
int: Delay to use.
"""
effective_delay = max(0, real_delay)
if action_index_before_inference is not None:
indexes_diff = max(0, self.last_index - action_index_before_inference)
if indexes_diff != real_delay:
logger.warning(
"Indexes diff is not equal to real delay. indexes_diff=%d, real_delay=%d",
indexes_diff,
real_delay,
)
return real_delay
return effective_delay
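The clamping added to `_replace_actions_queue` can be illustrated with plain Python lists. This is a minimal sketch, not the library code: `clamp_delay` and `replace_queue` are hypothetical names, and lists stand in for the tensors used in the diff. It shows why an unclamped negative delay would be a silent problem: a slice such as `chunk[-1:]` keeps only the last step instead of the whole chunk.

```python
def clamp_delay(real_delay: int, n_original: int, n_processed: int) -> int:
    """Clamp a delay into [0, min(n_original, n_processed)], as in the diff's expression."""
    return max(0, min(real_delay, n_original, n_processed))


def replace_queue(original: list[float], processed: list[float], real_delay: int):
    """Drop the first `clamped` steps of both chunks (hypothetical stand-in for tensor slicing)."""
    clamped = clamp_delay(real_delay, len(original), len(processed))
    return original[clamped:], processed[clamped:]


chunk = [0.0, 1.0, 2.0, 3.0]

# A normal delay skips the stale leading steps.
skipped, _ = replace_queue(chunk, chunk, 2)

# A negative delay is clamped to 0, so the full chunk is kept.
kept_all, _ = replace_queue(chunk, chunk, -1)

# A delay longer than the chunk is clamped to its length, leaving an empty queue.
empty, _ = replace_queue(chunk, chunk, 10)

# Without clamping, a negative index would silently keep only the tail.
unclamped_tail = chunk[-1:]
```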
@@ -96,9 +96,11 @@ class BiOpenArmFollower(Robot):
left_arm_motors_ft = self.left_arm._motors_ft
right_arm_motors_ft = self.right_arm._motors_ft
# Right first, then left — matches the teleoperator (OpenArmMini) ordering
# and the dataset feature names recorded during data collection.
return {
**{f"left_{k}": v for k, v in left_arm_motors_ft.items()},
**{f"right_{k}": v for k, v in right_arm_motors_ft.items()},
**{f"left_{k}": v for k, v in left_arm_motors_ft.items()},
}
@property
@@ -150,14 +152,16 @@ class BiOpenArmFollower(Robot):
left_cam_keys = set(self.left_arm.cameras.keys())
right_cam_keys = set(self.right_arm.cameras.keys())
left_obs = self.left_arm.get_observation()
for key, value in left_obs.items():
obs_dict[key if key in left_cam_keys else f"left_{key}"] = value
# Right first, then left — matches the teleoperator (OpenArmMini) ordering
# and the dataset feature names recorded during data collection.
right_obs = self.right_arm.get_observation()
for key, value in right_obs.items():
obs_dict[key if key in right_cam_keys else f"right_{key}"] = value
left_obs = self.left_arm.get_observation()
for key, value in left_obs.items():
obs_dict[key if key in left_cam_keys else f"left_{key}"] = value
return obs_dict
@check_if_not_connected
@@ -183,7 +187,7 @@ class BiOpenArmFollower(Robot):
prefixed_sent_action_left = {f"left_{key}": value for key, value in sent_action_left.items()}
prefixed_sent_action_right = {f"right_{key}": value for key, value in sent_action_right.items()}
return {**prefixed_sent_action_left, **prefixed_sent_action_right}
return {**prefixed_sent_action_right, **prefixed_sent_action_left}
@check_if_not_connected
def disconnect(self):
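The reordering in this hunk relies on Python dicts preserving insertion order, so the order of the `**` unpackings decides the order of the resulting feature names. A minimal sketch with made-up motor values (the key names are illustrative, not read from the robot):

```python
# Illustrative per-arm values; the real dicts come from the two arm objects.
left = {"joint_1.pos": 1.0, "gripper.pos": 0.5}
right = {"joint_1.pos": 2.0, "gripper.pos": 0.7}

# Old ordering: left keys first.
left_first = {
    **{f"left_{k}": v for k, v in left.items()},
    **{f"right_{k}": v for k, v in right.items()},
}

# New ordering: right keys first, as in the updated hunk.
right_first = {
    **{f"right_{k}": v for k, v in right.items()},
    **{f"left_{k}": v for k, v in left.items()},
}

# Same contents, different key order.
same_content = left_first == right_first
order = list(right_first)
```

Dict equality ignores order, so only code that iterates over the keys (for example, when building an ordered feature list) observes the change.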
@@ -23,10 +23,12 @@ from ..config import RobotConfig
@RobotConfig.register_subclass("bi_openarm_follower")
@dataclass
@dataclass(kw_only=True)
class BiOpenArmFollowerConfig(RobotConfig):
"""Configuration class for Bi OpenArm Follower robots."""
id: str | None = "bi_openarm_follower"
left_arm_config: OpenArmFollowerConfigBase
right_arm_config: OpenArmFollowerConfigBase
@@ -74,6 +74,8 @@ from pathlib import Path
from pprint import pformat
from typing import Any
import torch
from lerobot.cameras import ( # noqa: F401
CameraConfig, # noqa: F401
)
@@ -90,6 +92,7 @@ from lerobot.datasets.pipeline_features import aggregate_pipeline_dataset_featur
from lerobot.datasets.video_utils import VideoEncodingManager
from lerobot.policies.factory import make_policy, make_pre_post_processors
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.policies.rtc import ActionInterpolator
from lerobot.policies.utils import make_robot_action
from lerobot.processor import (
PolicyAction,
@@ -226,6 +229,9 @@ class RecordConfig:
play_sounds: bool = True
# Resume recording on an existing dataset.
resume: bool = False
# Action interpolation multiplier for smoother policy control (1=off, 2=2x, 3=3x)
# Only applies when using a policy (not teleop)
interpolation_multiplier: int = 1
def __post_init__(self):
# HACK: We parse again the cli args here to get the pretrained path if there was one.
@@ -298,6 +304,7 @@ def record_loop(
control_time_s: int | None = None,
single_task: str | None = None,
display_data: bool = False,
interpolator: ActionInterpolator | None = None,
display_compressed_images: bool = False,
):
if dataset is not None and dataset.fps != fps:
@@ -334,6 +341,16 @@ def record_loop(
preprocessor.reset()
postprocessor.reset()
# Reset interpolator if provided
if interpolator is not None:
interpolator.reset()
# Calculate control interval based on interpolation
use_interpolation = interpolator is not None and interpolator.enabled and policy is not None
control_interval = interpolator.get_control_interval(fps) if interpolator else 1 / fps
# Pre-compute action key order outside the hot loop — it won't change mid-episode.
action_keys = sorted(robot.action_features) if use_interpolation else []
no_action_count = 0
timestamp = 0
start_episode_t = time.perf_counter()
@@ -353,28 +370,67 @@ def record_loop(
if policy is not None or dataset is not None:
observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
# Track whether this iteration should be recorded to the dataset.
# Interpolated-only iterations send actions to the robot but don't record frames,
# keeping the dataset at the original fps while the robot moves at the higher rate.
is_record_frame = True
# Get action from either policy or teleop
if policy is not None and preprocessor is not None and postprocessor is not None:
action_values = predict_action(
observation=observation_frame,
policy=policy,
device=get_safe_torch_device(policy.config.device),
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.use_amp,
task=single_task,
robot_type=robot.robot_type,
)
# With interpolation: only call policy when interpolator needs new action
if use_interpolation:
ran_inference = False
act_processed_policy: RobotAction = make_robot_action(action_values, dataset.features)
if interpolator.needs_new_action():
action_values = predict_action(
observation=observation_frame,
policy=policy,
device=get_safe_torch_device(policy.config.device),
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.use_amp,
task=single_task,
robot_type=robot.robot_type,
)
act_processed_policy = make_robot_action(action_values, dataset.features)
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
action_tensor = torch.tensor([robot_action_to_send[k] for k in action_keys])
interpolator.add(action_tensor)
ran_inference = True
interp_action = interpolator.get()
if interp_action is not None:
robot_action_to_send = {k: interp_action[i].item() for i, k in enumerate(action_keys)}
action_values = robot_action_to_send
else:
continue
is_record_frame = ran_inference
else:
action_values = predict_action(
observation=observation_frame,
policy=policy,
device=get_safe_torch_device(policy.config.device),
preprocessor=preprocessor,
postprocessor=postprocessor,
use_amp=policy.config.use_amp,
task=single_task,
robot_type=robot.robot_type,
)
act_processed_policy: RobotAction = make_robot_action(action_values, dataset.features)
# Applies a pipeline to the action, default is IdentityProcessor
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
elif policy is None and isinstance(teleop, Teleoperator):
act = teleop.get_action()
if robot.name == "unitree_g1":
teleop.send_feedback(obs)
act = teleop.get_action()
# Applies a pipeline to the raw teleop action, default is IdentityProcessor
act_processed_teleop = teleop_action_processor((act, obs))
action_values = act_processed_teleop
robot_action_to_send = robot_action_processor((act_processed_teleop, obs))
elif policy is None and isinstance(teleop, list):
arm_action = teleop_arm.get_action()
@@ -383,6 +439,8 @@ def record_loop(
base_action = robot._from_keyboard_to_base_action(keyboard_action)
act = {**arm_action, **base_action} if len(base_action) > 0 else arm_action
act_processed_teleop = teleop_action_processor((act, obs))
action_values = act_processed_teleop
robot_action_to_send = robot_action_processor((act_processed_teleop, obs))
else:
no_action_count += 1
if no_action_count == 1 or no_action_count % 10 == 0:
@@ -393,22 +451,14 @@ def record_loop(
)
continue
# Applies a pipeline to the action, default is IdentityProcessor
if policy is not None and act_processed_policy is not None:
action_values = act_processed_policy
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
else:
action_values = act_processed_teleop
robot_action_to_send = robot_action_processor((act_processed_teleop, obs))
# Send action to robot
# Action can eventually be clipped using `max_relative_target`,
# so action actually sent is saved in the dataset. action = postprocessor.process(action)
# TODO(steven, pepijn, adil): we should use a pipeline step to clip the action, so the sent action is the action that we input to the robot.
_sent_action = robot.send_action(robot_action_to_send)
# Write to dataset
if dataset is not None:
# Write to dataset (only on real policy frames, not interpolated-only iterations)
if dataset is not None and is_record_frame:
action_frame = build_dataset_frame(dataset.features, action_values, prefix=ACTION)
frame = {**observation_frame, **action_frame, "task": single_task}
dataset.add_frame(frame)
@@ -420,7 +470,7 @@ def record_loop(
dt_s = time.perf_counter() - start_loop_t
sleep_time_s: float = 1 / fps - dt_s
sleep_time_s: float = control_interval - dt_s
if sleep_time_s < 0:
logging.warning(
f"Record loop is running slower ({1 / dt_s:.1f} Hz) than the target FPS ({fps} Hz). Dataset frames might be dropped and robot control might be unstable. Common causes are: 1) Camera FPS not keeping up 2) Policy inference taking too long 3) CPU starvation"
@@ -506,6 +556,7 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
policy = None if cfg.policy is None else make_policy(cfg.policy, ds_meta=dataset.meta)
preprocessor = None
postprocessor = None
interpolator = None
if cfg.policy is not None:
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=cfg.policy,
@@ -516,6 +567,10 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
"rename_observations_processor": {"rename_map": cfg.dataset.rename_map},
},
)
# Create interpolator for smoother policy control
if cfg.interpolation_multiplier > 1:
interpolator = ActionInterpolator(multiplier=cfg.interpolation_multiplier)
logging.info(f"Action interpolation enabled: {cfg.interpolation_multiplier}x control rate")
robot.connect()
if teleop is not None:
@@ -547,6 +602,7 @@ def record(cfg: RecordConfig) -> LeRobotDataset:
control_time_s=cfg.dataset.episode_time_s,
single_task=cfg.dataset.single_task,
display_data=cfg.display_data,
interpolator=interpolator,
display_compressed_images=display_compressed_images,
)
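The interaction between interpolation and dataset recording in `record_loop` can be sketched without a robot or a policy. This is a simplified model under stated assumptions: `LinearInterpolator` is a hypothetical stand-in for `ActionInterpolator` that uses floats instead of tensors, and its linear stepping is inferred from the values asserted in this PR's tests, not copied from the implementation.

```python
from typing import Optional


class LinearInterpolator:
    """Hypothetical float-based stand-in for ActionInterpolator."""

    def __init__(self, multiplier: int = 1):
        if multiplier < 1:
            raise ValueError("multiplier must be >= 1")
        self.multiplier = multiplier
        self.prev: Optional[float] = None
        self.buffer: list = []

    def needs_new_action(self) -> bool:
        return not self.buffer

    def add(self, action: float) -> None:
        if self.prev is None:
            # First action of an episode: nothing to interpolate from.
            self.buffer = [action]
        else:
            m = self.multiplier
            self.buffer = [self.prev + (action - self.prev) * i / m for i in range(1, m + 1)]
        self.prev = action

    def get(self) -> Optional[float]:
        return self.buffer.pop(0) if self.buffer else None


def run_loop(policy_actions, multiplier):
    """Return (commands sent to the robot, frames recorded to the dataset)."""
    interp = LinearInterpolator(multiplier)
    pending = list(policy_actions)
    sent, recorded = [], []
    while True:
        ran_inference = False
        if interp.needs_new_action():
            if not pending:
                break
            interp.add(pending.pop(0))  # stands in for predict_action(...)
            ran_inference = True
        action = interp.get()
        sent.append(action)
        if ran_inference:  # mirrors `is_record_frame = ran_inference`
            recorded.append(action)
    return sent, recorded


sent, recorded = run_loop([0.0, 2.0, 4.0], multiplier=2)
```

In this model, three policy actions with a multiplier of 2 produce five robot commands but only three recorded frames. Each recorded frame holds the first interpolated sub-step sent on that iteration, which appears to match the diff's assignment of `action_values = robot_action_to_send`.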
@@ -32,9 +32,15 @@ from .config_openarm_mini import OpenArmMiniConfig
logger = logging.getLogger(__name__)
# Motors whose direction is inverted during readout
RIGHT_MOTORS_TO_FLIP = ["joint_1", "joint_2", "joint_3", "joint_4", "joint_5"]
RIGHT_MOTORS_TO_FLIP = ["joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_7"]
LEFT_MOTORS_TO_FLIP = ["joint_1", "joint_3", "joint_4", "joint_5", "joint_6", "joint_7"]
# Leader joint 6 maps to follower joint 7 and vice versa
JOINT_REMAP = {"joint_6": "joint_7", "joint_7": "joint_6"}
JOINT_REMAP_REVERSE = {"joint_7": "joint_6", "joint_6": "joint_7"}
GRIPPER_TELEOP_TO_DEGREES = -0.65
class OpenArmMini(Teleoperator):
"""
@@ -95,6 +101,8 @@ class OpenArmMini(Teleoperator):
@property
def action_features(self) -> dict[str, type]:
# Right first, then left — matches the robot (BiOpenArmFollower) ordering
# and the dataset feature names recorded during data collection.
features: dict[str, type] = {}
for motor in self.bus_right.motors:
features[f"right_{motor}.pos"] = float
@@ -276,16 +284,70 @@ class OpenArmMini(Teleoperator):
right_positions = self.bus_right.sync_read("Present_Position")
left_positions = self.bus_left.sync_read("Present_Position")
# Right first, then left — matches the robot (BiOpenArmFollower) ordering
# and the dataset feature names recorded during data collection.
# Joint 6↔7 remap: leader joint_6 → follower joint_7 and vice versa.
action: dict[str, Any] = {}
for motor, val in right_positions.items():
action[f"right_{motor}.pos"] = -val if motor in RIGHT_MOTORS_TO_FLIP else val
target = JOINT_REMAP.get(motor, motor)
if motor == "gripper":
# Convert gripper from teleop 0-100 to openarms degrees: 0→0°, 100→-65°
action[f"right_{target}.pos"] = val * GRIPPER_TELEOP_TO_DEGREES
else:
action[f"right_{target}.pos"] = -val if motor in RIGHT_MOTORS_TO_FLIP else val
for motor, val in left_positions.items():
action[f"left_{motor}.pos"] = -val if motor in LEFT_MOTORS_TO_FLIP else val
target = JOINT_REMAP.get(motor, motor)
if motor == "gripper":
action[f"left_{target}.pos"] = val * GRIPPER_TELEOP_TO_DEGREES
else:
action[f"left_{target}.pos"] = -val if motor in LEFT_MOTORS_TO_FLIP else val
dt_ms = (time.perf_counter() - start) * 1e3
logger.debug(f"{self} read action: {dt_ms:.1f}ms")
return action
def enable_torque(self) -> None:
"""Enable torque on both arms for position control."""
self.bus_right.enable_torque()
self.bus_left.enable_torque()
def disable_torque(self) -> None:
"""Disable torque on both arms for free movement."""
self.bus_right.disable_torque()
self.bus_left.disable_torque()
def write_goal_positions(self, positions: dict[str, float]) -> None:
"""Write goal positions to motors (inverse of get_action flip/gripper/remap logic)."""
right_goals: dict[str, float] = {}
left_goals: dict[str, float] = {}
for key, val in positions.items():
if not key.endswith(".pos"):
continue
motor_name = key.removesuffix(".pos")
if motor_name.startswith("right_"):
base = motor_name.removeprefix("right_")
# Reverse remap: follower joint_7 → leader joint_6 and vice versa
target = JOINT_REMAP_REVERSE.get(base, base)
if base == "gripper":
# Convert robot degrees to teleop 0-100: 0°→0, -65°→100
right_goals[target] = val / GRIPPER_TELEOP_TO_DEGREES
else:
# Un-flip using the ORIGINAL motor name (target = leader motor)
right_goals[target] = -val if target in RIGHT_MOTORS_TO_FLIP else val
elif motor_name.startswith("left_"):
base = motor_name.removeprefix("left_")
target = JOINT_REMAP_REVERSE.get(base, base)
if base == "gripper":
left_goals[target] = val / GRIPPER_TELEOP_TO_DEGREES
else:
left_goals[target] = -val if target in LEFT_MOTORS_TO_FLIP else val
if right_goals:
self.bus_right.sync_write("Goal_Position", right_goals)
if left_goals:
self.bus_left.sync_write("Goal_Position", left_goals)
def send_feedback(self, feedback: dict[str, float]) -> None:
raise NotImplementedError("Feedback is not yet implemented for OpenArm Mini.")
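A round trip through the right-arm mapping shows that `write_goal_positions` inverts `get_action`. The sketch below reuses the constants from this hunk but works on plain dicts without the `right_` prefix or `.pos` suffix; `leader_to_follower` and `follower_to_leader` are hypothetical helper names, not part of the class.

```python
import math

RIGHT_MOTORS_TO_FLIP = ["joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_7"]
JOINT_REMAP = {"joint_6": "joint_7", "joint_7": "joint_6"}
GRIPPER_TELEOP_TO_DEGREES = -0.65


def leader_to_follower(positions):
    """Forward mapping: flip by leader motor name, then remap joint 6 <-> 7."""
    out = {}
    for motor, val in positions.items():
        target = JOINT_REMAP.get(motor, motor)
        if motor == "gripper":
            out[target] = val * GRIPPER_TELEOP_TO_DEGREES
        else:
            out[target] = -val if motor in RIGHT_MOTORS_TO_FLIP else val
    return out


def follower_to_leader(positions):
    """Inverse mapping: remap back, then un-flip by the leader motor name."""
    out = {}
    for base, val in positions.items():
        target = JOINT_REMAP.get(base, base)  # the 6 <-> 7 swap is its own inverse
        if base == "gripper":
            out[target] = val / GRIPPER_TELEOP_TO_DEGREES
        else:
            out[target] = -val if target in RIGHT_MOTORS_TO_FLIP else val
    return out


leader = {"joint_1": 10.0, "joint_6": 20.0, "joint_7": 30.0, "gripper": 100.0}
follower = leader_to_follower(leader)
restored = follower_to_leader(follower)

# Compare with a tolerance because the gripper goes through a multiply and a divide.
round_trip_ok = all(math.isclose(restored[k], leader[k]) for k in leader)
```

The flip test uses the leader motor name in both directions, which is why the inverse looks up `target` rather than `base`.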
@@ -0,0 +1,559 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for ActionInterpolator and its interaction with ActionQueue (RTC)."""
import pytest
import torch
from lerobot.policies.rtc.action_interpolator import ActionInterpolator
from lerobot.policies.rtc.action_queue import ActionQueue
from lerobot.policies.rtc.configuration_rtc import RTCConfig
# ====================== Fixtures ======================
@pytest.fixture
def interp2():
"""Create an ActionInterpolator with multiplier=2."""
return ActionInterpolator(multiplier=2)
@pytest.fixture
def interp3():
"""Create an ActionInterpolator with multiplier=3."""
return ActionInterpolator(multiplier=3)
# ====================== Initialization Tests ======================
def test_interpolator_multiplier_1_no_interpolation():
"""Test multiplier=1 creates a disabled interpolator."""
interp = ActionInterpolator(multiplier=1)
assert interp.multiplier == 1
assert not interp.enabled
def test_interpolator_multiplier_2_enabled():
"""Test multiplier=2 creates an enabled interpolator."""
interp = ActionInterpolator(multiplier=2)
assert interp.multiplier == 2
assert interp.enabled
def test_interpolator_multiplier_0_raises():
"""Test multiplier=0 raises ValueError."""
with pytest.raises(ValueError, match="multiplier must be >= 1"):
ActionInterpolator(multiplier=0)
def test_interpolator_negative_multiplier_raises():
"""Test negative multiplier raises ValueError."""
with pytest.raises(ValueError, match="multiplier must be >= 1"):
ActionInterpolator(multiplier=-1)
def test_interpolator_default_multiplier_is_1():
"""Test default multiplier is 1 (disabled)."""
interp = ActionInterpolator()
assert interp.multiplier == 1
assert not interp.enabled
# ====================== needs_new_action Tests ======================
def test_needs_new_action_true_initially(interp2):
"""Test needs_new_action() returns True before any action is added."""
assert interp2.needs_new_action()
def test_needs_new_action_false_after_add(interp2):
"""Test needs_new_action() returns False right after add()."""
interp2.add(torch.tensor([1.0, 2.0]))
assert not interp2.needs_new_action()
def test_needs_new_action_true_after_buffer_exhausted(interp2):
"""Test needs_new_action() returns True after consuming all buffered actions."""
interp2.add(torch.tensor([1.0, 2.0]))
interp2.get()
assert interp2.needs_new_action()
def test_needs_new_action_true_after_all_interpolated_consumed(interp2):
"""Test needs_new_action() tracks interpolated sub-steps correctly."""
interp2.add(torch.tensor([0.0, 0.0]))
interp2.get()
assert interp2.needs_new_action()
interp2.add(torch.tensor([2.0, 4.0]))
interp2.get()
assert not interp2.needs_new_action()
interp2.get()
assert interp2.needs_new_action()
# ====================== Passthrough Tests (multiplier=1) ======================
def test_passthrough_single_action_returned_as_is():
"""Test multiplier=1 returns the action unchanged."""
interp = ActionInterpolator(multiplier=1)
action = torch.tensor([3.0, 5.0])
interp.add(action)
result = interp.get()
assert result is not None
torch.testing.assert_close(result, action)
def test_passthrough_none_after_single_get():
"""Test multiplier=1 returns None after consuming the single action."""
interp = ActionInterpolator(multiplier=1)
interp.add(torch.tensor([1.0]))
interp.get()
assert interp.get() is None
def test_passthrough_sequential_actions():
"""Test multiplier=1 passes through consecutive actions one at a time."""
interp = ActionInterpolator(multiplier=1)
for val in [1.0, 2.0, 3.0]:
action = torch.tensor([val])
interp.add(action)
result = interp.get()
torch.testing.assert_close(result, action)
assert interp.get() is None
# ====================== Interpolation Tests (multiplier=2) ======================
def test_interpolation_2x_first_action_no_interpolation(interp2):
"""Test first action has no previous, so buffer is just [action]."""
interp2.add(torch.tensor([0.0, 0.0]))
result = interp2.get()
torch.testing.assert_close(result, torch.tensor([0.0, 0.0]))
assert interp2.get() is None
def test_interpolation_2x_second_action_produces_two_steps(interp2):
"""Test second action produces 2 interpolated sub-steps."""
interp2.add(torch.tensor([0.0, 0.0]))
interp2.get()
interp2.add(torch.tensor([2.0, 4.0]))
step1 = interp2.get()
step2 = interp2.get()
torch.testing.assert_close(step1, torch.tensor([1.0, 2.0]))
torch.testing.assert_close(step2, torch.tensor([2.0, 4.0]))
assert interp2.get() is None
def test_interpolation_2x_three_consecutive_actions(interp2):
"""Test interpolation across three consecutive actions."""
a0 = torch.tensor([0.0])
a1 = torch.tensor([4.0])
a2 = torch.tensor([10.0])
interp2.add(a0)
torch.testing.assert_close(interp2.get(), a0)
interp2.add(a1)
torch.testing.assert_close(interp2.get(), torch.tensor([2.0]))
torch.testing.assert_close(interp2.get(), torch.tensor([4.0]))
interp2.add(a2)
torch.testing.assert_close(interp2.get(), torch.tensor([7.0]))
torch.testing.assert_close(interp2.get(), torch.tensor([10.0]))
# ====================== Interpolation Tests (multiplier=3) ======================
def test_interpolation_3x_produces_three_steps(interp3):
"""Test multiplier=3 produces 3 interpolated sub-steps."""
interp3.add(torch.tensor([0.0, 0.0]))
interp3.get()
interp3.add(torch.tensor([3.0, 6.0]))
s1 = interp3.get()
s2 = interp3.get()
s3 = interp3.get()
torch.testing.assert_close(s1, torch.tensor([1.0, 2.0]))
torch.testing.assert_close(s2, torch.tensor([2.0, 4.0]))
torch.testing.assert_close(s3, torch.tensor([3.0, 6.0]))
assert interp3.get() is None
def test_interpolation_3x_last_step_equals_target(interp3):
"""Test last interpolated step equals the target action exactly."""
interp3.add(torch.tensor([10.0]))
interp3.get()
target = torch.tensor([100.0])
interp3.add(target)
interp3.get()
interp3.get()
last = interp3.get()
torch.testing.assert_close(last, target)
# ====================== Reset Tests ======================
def test_reset_clears_buffer(interp2):
"""Test reset() clears the action buffer."""
interp2.add(torch.tensor([1.0]))
interp2.reset()
assert interp2.needs_new_action()
assert interp2.get() is None
def test_reset_clears_prev(interp2):
"""Test after reset, next add produces single-element buffer (no prev)."""
interp2.add(torch.tensor([0.0]))
interp2.get()
interp2.add(torch.tensor([10.0]))
interp2.get()
interp2.get()
interp2.reset()
interp2.add(torch.tensor([5.0]))
result = interp2.get()
torch.testing.assert_close(result, torch.tensor([5.0]))
assert interp2.get() is None
def test_reset_episode_boundary(interp2):
"""Test reset between two simulated episodes."""
interp2.add(torch.tensor([0.0]))
interp2.get()
interp2.add(torch.tensor([10.0]))
interp2.get()
interp2.get()
interp2.reset()
interp2.add(torch.tensor([100.0]))
result = interp2.get()
torch.testing.assert_close(result, torch.tensor([100.0]))
assert interp2.get() is None
# ====================== get_control_interval Tests ======================
def test_control_interval_30fps_multiplier_1():
"""Test control interval at 30fps with no interpolation."""
interp = ActionInterpolator(multiplier=1)
assert interp.get_control_interval(30.0) == pytest.approx(1.0 / 30.0)
def test_control_interval_30fps_multiplier_2(interp2):
"""Test control interval at 30fps with 2x interpolation."""
assert interp2.get_control_interval(30.0) == pytest.approx(1.0 / 60.0)
def test_control_interval_30fps_multiplier_3(interp3):
"""Test control interval at 30fps with 3x interpolation."""
assert interp3.get_control_interval(30.0) == pytest.approx(1.0 / 90.0)
def test_control_interval_60fps_multiplier_2(interp2):
"""Test control interval at 60fps with 2x interpolation."""
assert interp2.get_control_interval(60.0) == pytest.approx(1.0 / 120.0)
# ====================== get() on Empty Tests ======================
def test_get_returns_none_before_any_add():
"""Test get() returns None when no action has been added."""
interp = ActionInterpolator(multiplier=2)
assert interp.get() is None
def test_get_returns_none_after_reset(interp2):
"""Test get() returns None after reset."""
interp2.add(torch.tensor([1.0]))
interp2.reset()
assert interp2.get() is None
# ====================== Multi-Dimensional Action Tests ======================
def test_6dof_interpolation(interp2):
"""Test interpolation works correctly with 6-dimensional actions."""
prev = torch.zeros(6)
target = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
interp2.add(prev)
interp2.get()
interp2.add(target)
mid = interp2.get()
end = interp2.get()
torch.testing.assert_close(mid, target / 2)
torch.testing.assert_close(end, target)
# ====================== Simulated Control Loop Tests ======================
def test_control_loop_produces_correct_action_count():
"""Test N policy actions with multiplier M yields 1 + (N-1)*M robot commands."""
multiplier = 3
n_policy_actions = 5
interp = ActionInterpolator(multiplier=multiplier)
robot_commands = 0
for i in range(n_policy_actions):
action = torch.tensor([float(i)])
if interp.needs_new_action():
interp.add(action)
while True:
a = interp.get()
if a is None:
break
robot_commands += 1
expected = 1 + (n_policy_actions - 1) * multiplier
assert robot_commands == expected
def test_control_loop_monotonic_increase():
"""Test actions [0, 1, 2, 3] with multiplier=2 produce monotonically increasing values."""
interp = ActionInterpolator(multiplier=2)
all_values = []
for i in range(4):
interp.add(torch.tensor([float(i)]))
while True:
a = interp.get()
if a is None:
break
all_values.append(a.item())
for i in range(1, len(all_values)):
assert all_values[i] >= all_values[i - 1]
# ====================== ActionQueue + ActionInterpolator Integration Tests ======================
def _make_chunk(n_steps: int, action_dim: int = 2, offset: float = 0.0) -> torch.Tensor:
"""Create a simple action chunk: each row is [offset + step_idx, offset + step_idx]."""
return torch.arange(n_steps, dtype=torch.float32).unsqueeze(1).expand(-1, action_dim) + offset
def test_queue_interpolator_consumption_rate_matches_base_fps():
"""Test queue.get() is called at base fps rate, not multiplied fps."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=3)
chunk = _make_chunk(10)
queue.merge(chunk, chunk.clone(), real_delay=0)
queue_gets = 0
control_ticks = 0
while True:
if interp.needs_new_action():
if queue.empty():
break
action = queue.get()
if action is None:
break
interp.add(action)
queue_gets += 1
result = interp.get()
if result is not None:
control_ticks += 1
assert queue_gets == 10
assert control_ticks == 1 + 9 * 3
def test_queue_interpolator_leftover_decreases_only_on_queue_get():
"""Test get_left_over() shrinks only on queue.get(), not on interpolator sub-steps."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=3)
chunk = _make_chunk(6)
queue.merge(chunk, chunk.clone(), real_delay=0)
assert interp.needs_new_action()
interp.add(queue.get())
leftover_after_first_get = queue.get_left_over()
assert leftover_after_first_get is not None
assert len(leftover_after_first_get) == 5
interp.get()
assert len(queue.get_left_over()) == 5
interp.add(queue.get())
assert len(queue.get_left_over()) == 4
for _ in range(3):
assert interp.get() is not None
assert len(queue.get_left_over()) == 4
def test_queue_interpolator_processed_leftover_tracks_queue_index():
"""Test get_processed_left_over() reflects queue's last_index, not interpolator state."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=2)
original = _make_chunk(8, offset=0.0)
processed = _make_chunk(8, offset=100.0)
queue.merge(original, processed, real_delay=0)
left = queue.get_processed_left_over()
assert len(left) == 8
for _ in range(3):
if interp.needs_new_action():
action = queue.get()
if action is not None:
interp.add(action)
interp.get()
proc_left = queue.get_processed_left_over()
orig_left = queue.get_left_over()
assert proc_left is not None and orig_left is not None
assert len(proc_left) == len(orig_left)
assert proc_left[0, 0].item() >= 100.0
assert orig_left[0, 0].item() < 100.0
def test_queue_interpolator_merge_resets_queue_but_interpolator_keeps_prev():
"""Test queue merge doesn't affect interpolator's prev, enabling smooth transitions."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=2)
chunk1 = torch.tensor([[0.0], [2.0], [4.0], [6.0], [8.0]])
queue.merge(chunk1, chunk1.clone(), real_delay=0)
consumed = []
for _ in range(5):
if interp.needs_new_action():
a = queue.get()
if a is not None:
interp.add(a)
r = interp.get()
if r is not None:
consumed.append(r.item())
assert interp.needs_new_action()
assert consumed[-1] == pytest.approx(4.0)
idx_before = queue.get_action_index()
chunk2 = torch.tensor([[10.0], [12.0], [14.0]])
queue.merge(chunk2, chunk2.clone(), real_delay=0, action_index_before_inference=idx_before)
first_action = queue.get()
assert first_action is not None
interp.add(first_action)
first_from_new = interp.get()
assert first_from_new is not None
assert first_from_new.item() == pytest.approx(7.0)
def test_queue_interpolator_reset_does_not_affect_queue():
"""Test interpolator reset leaves queue state untouched."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=2)
chunk = _make_chunk(5)
queue.merge(chunk, chunk.clone(), real_delay=0)
interp.add(queue.get())
interp.get()
interp.add(queue.get())
interp.get()
interp.get()
assert queue.qsize() == 3
interp.reset()
assert queue.qsize() == 3
assert len(queue.get_left_over()) == 3
interp.add(queue.get())
result = interp.get()
assert result is not None
assert queue.qsize() == 2
def test_queue_interpolator_no_interpolation_1_to_1():
"""Test multiplier=1 produces exactly 1 robot command per queue.get()."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=1)
chunk = _make_chunk(5)
queue.merge(chunk, chunk.clone(), real_delay=0)
robot_commands = 0
while not queue.empty():
if interp.needs_new_action():
action = queue.get()
if action is not None:
interp.add(action)
result = interp.get()
if result is not None:
robot_commands += 1
assert robot_commands == 5
def test_queue_interpolator_delay_skips_stale_actions():
"""Test merge with delay correctly skips stale actions for the interpolator."""
cfg = RTCConfig(enabled=True, execution_horizon=10)
queue = ActionQueue(cfg)
interp = ActionInterpolator(multiplier=2)
chunk1 = _make_chunk(10)
queue.merge(chunk1, chunk1.clone(), real_delay=0)
for _ in range(5):
if interp.needs_new_action():
a = queue.get()
if a is not None:
interp.add(a)
interp.get()
assert queue.get_action_index() == 3
chunk2 = _make_chunk(10, offset=100.0)
queue.merge(chunk2, chunk2.clone(), real_delay=3, action_index_before_inference=0)
first_action = queue.get()
assert first_action is not None
torch.testing.assert_close(first_action, torch.tensor([103.0, 103.0]))