Refactor gym_manipulator.py using the universal pipeline (#1650)

* Migrate gym_manipulator to use the pipeline
  Added get_teleop_events function to capture relevant events from teleop devices unrelated to actions
* Added the capability to record a dataset
* Added the replay functionality with the pipeline
* Refactored `actor.py` to use the pipeline
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* RL works at this commit - fixed actor.py and bugs in gym_manipulator
* change folder structure to reduce the size of gym_manip
* Refactored hilserl config
* Moved dataset and mode from HilSerlEnvConfig to GymManipulatorConfig to reduce the verbosity of configs during training
* format docs
* removed get_teleop_events from abc
* Refactor environment configuration and processing pipeline for GymHIL support. Removed device attribute from HILSerlRobotEnvConfig, added DummyTeleopDevice for simulation, and updated processor creation to accommodate GymHIL environments.
* Improved typing for HILRobotEnv config and GymManipulator config
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Migrated `gym_manipulator` to use a more modular structure similar to phone teleop
* Refactor gripper handling and transition processing in HIL and robot kinematic processors
  - Updated gripper position handling to use a consistent key format across processors.
  - Improved the EEReferenceAndDelta class to handle reference joint positions.
  - Added support for discrete gripper actions in the GripperVelocityToJoint processor.
  - Refactored the gym manipulator to improve modularity and clarity in processing steps.
* Added delta_action_processor mapping wrapper
* Added missing file delta_action_processor and improved imports in `gym_manipulator`
* nit
* Added missing file joint_observation_processor
* Enhance processing architecture with new teleoperation processors
  - Introduced `AddTeleopActionAsComplimentaryData` and `AddTeleopEventsAsInfo` for integrating teleoperator actions and events into transitions.
  - Added `Torch2NumpyActionProcessor` and `Numpy2TorchActionProcessor` for converting actions between PyTorch tensors and NumPy arrays.
  - Updated `__init__.py` to include the new processors in module exports, improving modularity and clarity in the processing pipeline.
  - GymHIL is now fully supported with HIL using the pipeline.
* Refactor configuration structure for gym_hil integration
  - Renamed sections for better readability, such as changing "Gym Wrappers Configuration" to "Processor Configuration".
  - Enhanced documentation with clear examples for dataset collection and policy evaluation configurations.
* Enhance reset configuration and teleoperation event handling
  - Added `terminate_on_success` parameter to `ResetConfig` and `InterventionActionProcessor` for controlling episode termination behavior upon success detection.
  - Updated documentation to clarify the impact of `terminate_on_success` on data collection for reward classifier training.
  - Refactored teleoperation event handling to use `TeleopEvents` constants for improved readability and maintainability across various modules.
* fix(keyboard teleop), delta action keys
* Added transform features and feature contract
* Added transform features for image crop
* Enum for TeleopEvents
* Update transform_features delta action proc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-05-27 22:49:48 +00:00 · 2025-08-11 11:07:55 +02:00
parent fd5d8b3d5f
commit 0053defa2e
17 changed files with 1975 additions and 2251 deletions
@@ -53,6 +53,9 @@ class EEReferenceAndDelta:
    kinematics: RobotKinematics
    end_effector_step_sizes: dict
    motor_names: list[str]
+    use_latched_reference: bool = (
+        True  # If True, latch reference on enable; if False, always use current pose
+    )

    reference_ee_pose: np.ndarray | None = field(default=None, init=False, repr=False)
    _prev_enabled: bool = field(default=False, init=False, repr=False)
@@ -69,7 +72,10 @@ class EEReferenceAndDelta:
                "raw_joint_positions is not in complementary data and is required for EEReferenceAndDelta"
            )

-        q = np.array([float(raw[n]) for n in self.motor_names], dtype=float)
+        if "reference_joint_positions" in comp:
+            q = comp["reference_joint_positions"]
+        else:
+            q = np.array([float(raw[n]) for n in self.motor_names], dtype=float)

        # Current pose from FK on measured joints
        t_curr = self.kinematics.forward_kinematics(q)
@@ -85,11 +91,12 @@ class EEReferenceAndDelta:
        desired = None

        if enabled:
-            # Latch a reference at the rising edge; also be defensive if None
-            if not self._prev_enabled or self.reference_ee_pose is None:
-                self.reference_ee_pose = t_curr.copy()
-
-            ref = self.reference_ee_pose if self.reference_ee_pose is not None else t_curr
+            ref = t_curr
+            if self.use_latched_reference:
+                # Latched reference mode: latch reference at the rising edge
+                if not self._prev_enabled or self.reference_ee_pose is None:
+                    self.reference_ee_pose = t_curr.copy()
+                ref = self.reference_ee_pose if self.reference_ee_pose is not None else t_curr

            delta_p = np.array(
                [
@@ -100,7 +107,6 @@ class EEReferenceAndDelta:
                dtype=float,
            )
            r_abs = Rotation.from_rotvec([wx, wy, wz]).as_matrix()
-
            desired = np.eye(4, dtype=float)
            desired[:3, :3] = ref[:3, :3] @ r_abs
            desired[:3, 3] = ref[:3, 3] + delta_p
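
The reference-pose logic in the `EEReferenceAndDelta` hunks above, toggled by `use_latched_reference`, can be illustrated in isolation. This is a simplified sketch, not the processor itself: `LatchedReference`, `desired_pose`, `rotvec_to_matrix`, and `translation` are hypothetical names; the caller passes `delta_p` directly instead of deriving it from step sizes; rotations use a hand-written Rodrigues formula in place of `scipy.spatial.transform.Rotation`; and it assumes `_prev_enabled` is set to the current `enabled` value at the end of each call, which this diff does not show.

```python
import numpy as np


def rotvec_to_matrix(rv: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation vector -> 3x3 rotation matrix."""
    theta = float(np.linalg.norm(rv))
    if theta < 1e-12:
        return np.eye(3)
    k = rv / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


class LatchedReference:
    """Hypothetical stand-in for the reference handling in EEReferenceAndDelta."""

    def __init__(self, use_latched_reference: bool = True):
        self.use_latched_reference = use_latched_reference
        self.reference_ee_pose = None
        self._prev_enabled = False  # assumed to track the previous `enabled`

    def desired_pose(self, t_curr, enabled, delta_p, rotvec):
        desired = None
        if enabled:
            ref = t_curr  # non-latched mode: always build on the current pose
            if self.use_latched_reference:
                # Latch on the rising edge of `enabled` (or if nothing is latched yet)
                if not self._prev_enabled or self.reference_ee_pose is None:
                    self.reference_ee_pose = t_curr.copy()
                ref = self.reference_ee_pose
            desired = np.eye(4)
            desired[:3, :3] = ref[:3, :3] @ rotvec_to_matrix(np.asarray(rotvec, dtype=float))
            desired[:3, 3] = ref[:3, 3] + np.asarray(delta_p, dtype=float)
        self._prev_enabled = enabled
        return desired


def translation(x: float, y: float, z: float) -> np.ndarray:
    """4x4 homogeneous transform with identity rotation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


step = [0.1, 0.0, 0.0]
no_rot = [0.0, 0.0, 0.0]
latched = LatchedReference(use_latched_reference=True)
live = LatchedReference(use_latched_reference=False)
for proc in (latched, live):
    proc.desired_pose(translation(0.0, 0.0, 0.0), True, step, no_rot)  # rising edge
# The arm has moved to x=0.05 while teleop stays enabled
latched_x = latched.desired_pose(translation(0.05, 0.0, 0.0), True, step, no_rot)[0, 3]
live_x = live.desired_pose(translation(0.05, 0.0, 0.0), True, step, no_rot)[0, 3]
print(latched_x, live_x)  # latched stays at about 0.10, live follows the arm to about 0.15
```

With latching, the same delta keeps targeting the pose captured when teleop was enabled; without it, each delta is applied on top of wherever the arm currently is.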
@@ -292,6 +298,8 @@ class InverseKinematicsEEToJoints:
            else:
                new_act[f"action.{name}.pos"] = float(q_target[i])
        transition[TransitionKey.ACTION] = new_act
+        if not self.initial_guess_current_joints:
+            transition[TransitionKey.COMPLEMENTARY_DATA]["reference_joint_positions"] = q_target
        return transition

    def transform_features(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
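
Together with the `reference_joint_positions` branch in `EEReferenceAndDelta`, this hunk forms a small feedback loop: when `initial_guess_current_joints` is False, `InverseKinematicsEEToJoints` stores its solution `q_target` in complementary data, and `EEReferenceAndDelta` prefers that key over `raw_joint_positions` when running forward kinematics. A minimal sketch of that hand-off, using plain dicts with string keys instead of `TransitionKey`, hypothetical helpers `select_fk_joints` / `store_ik_target`, and hypothetical motor names. It also assumes the same complementary-data dict is visible on the next step; how the pipeline carries it between transitions is not shown in this diff.

```python
import numpy as np

# Hypothetical motor names, for illustration only
MOTOR_NAMES = ["shoulder_pan", "shoulder_lift", "elbow_flex"]


def select_fk_joints(comp: dict, motor_names: list[str]) -> np.ndarray:
    """Joint vector fed to forward kinematics (EEReferenceAndDelta side)."""
    if "reference_joint_positions" in comp:
        # Prefer the IK target written on a previous step
        return np.asarray(comp["reference_joint_positions"], dtype=float)
    # Otherwise fall back to the measured joints
    raw = comp["raw_joint_positions"]
    return np.array([float(raw[n]) for n in motor_names], dtype=float)


def store_ik_target(comp: dict, q_target: np.ndarray, initial_guess_current_joints: bool) -> None:
    """IK side: publish the solved target unless IK is seeded from measured joints."""
    if not initial_guess_current_joints:
        comp["reference_joint_positions"] = q_target


comp = {"raw_joint_positions": {"shoulder_pan": 0.0, "shoulder_lift": 10.0, "elbow_flex": 20.0}}
q_first = select_fk_joints(comp, MOTOR_NAMES)  # measured joints
store_ik_target(comp, np.array([1.0, 11.0, 21.0]), initial_guess_current_joints=False)
q_next = select_fk_joints(comp, MOTOR_NAMES)  # previous IK target
```

One reading of the design, not stated in the diff, is that successive deltas then build on the last commanded configuration rather than on the measured joints, which can lag behind the command.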
@@ -332,6 +340,7 @@ class GripperVelocityToJoint:
    speed_factor: float = 20.0
    clip_min: float = 0.0
    clip_max: float = 100.0
+    discrete_gripper: bool = False

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        obs = transition.get(TransitionKey.OBSERVATION) or {}
@@ -347,6 +356,15 @@ class GripperVelocityToJoint:
            transition[TransitionKey.ACTION] = new_act
            return transition

+        if self.discrete_gripper:
+            # Discrete gripper actions are in [0, 1, 2]
+            # 0: open, 1: close, 2: stay
+            # We need to shift them to [-1, 0, 1] and then scale them to clip_max
+            gripper_action = act.get("action.gripper", 1.0)
+            gripper_action = gripper_action - 1.0
+            gripper_action *= self.clip_max
+            act["action.gripper"] = gripper_action
+
        # Get current gripper position from complementary data
        raw = comp.get("raw_joint_positions") or {}
        curr_pos = float(raw.get("gripper"))
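
Worked through, the shift in the `discrete_gripper` branch sends index 0 to `-clip_max`, index 1 to 0, and index 2 to `+clip_max`; a missing `action.gripper` key defaults to 1.0 and therefore also yields 0. The helper below (hypothetical name `apply_discrete_gripper`) reproduces only that arithmetic and returns a copy instead of mutating the action dict; how the scaled value is then combined with `curr_pos`, `speed_factor`, and the clip bounds is not shown in this diff.

```python
def apply_discrete_gripper(act: dict, clip_max: float = 100.0) -> dict:
    """Map a discrete gripper index in {0, 1, 2} to a signed command.

    Same arithmetic as the `discrete_gripper` branch: shift to {-1, 0, 1},
    then scale by clip_max. Returns a new dict rather than editing `act`.
    """
    out = dict(act)
    gripper_action = out.get("action.gripper", 1.0)  # missing key behaves like index 1
    out["action.gripper"] = (gripper_action - 1.0) * clip_max
    return out


commands = [apply_discrete_gripper({"action.gripper": float(i)})["action.gripper"] for i in (0, 1, 2)]
missing = apply_discrete_gripper({})["action.gripper"]
```

Because index 1 is the one that produces a zero command, it is worth confirming that the teleop device emits 1 for "stay" and checking the index labels in the in-code comment against that.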