fix(eval): use task_description instead of task for language conditioning

env.call("task") returns the LIBERO task name with underscores (e.g. "pick_up_the_black_bowl_...") instead of the natural language description ("pick up the black bowl ..."). The VLM tokenizes the two forms very differently, which caused 0.0 reward across all episodes. Read task_description first, fall back to task, and fall back to empty strings if neither attribute is available.

Made-with: Cursor
2026-05-11 14:49:43 +00:00 · 2026-04-07 13:12:42 +02:00
parent 1f7e7b4a90
commit 6aeb7c54f9
1 changed file with 8 additions and 2 deletions
@@ -165,9 +165,15 @@ def rollout(
        if return_observations:
            all_observations.append(deepcopy(observation))

-        # Infer "task" from sub-environments.
+        # Infer "task" from sub-environments (prefer natural language description).
        # env.call() works with both SyncVectorEnv and AsyncVectorEnv.
-        observation["task"] = list(env.call("task"))
+        try:
+            observation["task"] = list(env.call("task_description"))
+        except Exception:
+            try:
+                observation["task"] = list(env.call("task"))
+            except Exception:
+                observation["task"] = [""] * env.num_envs

        # Apply environment-specific preprocessing (e.g., LiberoProcessorStep for LIBERO)
        observation = env_preprocessor(observation)
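
The fallback order introduced in this hunk can be exercised in isolation. The following is a minimal sketch, not the code from this commit: `StubVectorEnv` and `infer_task` are hypothetical names, the stub only mimics the `call()` and `num_envs` members that the hunk relies on, and the task strings are made up for illustration.

```python
class StubVectorEnv:
    """Hypothetical stand-in for a vector env; mimics only call() and num_envs."""

    def __init__(self, sub_env_attrs):
        # One dict of attributes per sub-environment.
        self._sub_env_attrs = sub_env_attrs
        self.num_envs = len(sub_env_attrs)

    def call(self, name):
        # Assumption: a missing attribute surfaces as an exception,
        # modeled here as AttributeError.
        try:
            return tuple(attrs[name] for attrs in self._sub_env_attrs)
        except KeyError as exc:
            raise AttributeError(name) from exc


def infer_task(env):
    """Prefer task_description, fall back to task, then to empty strings."""
    for attr in ("task_description", "task"):
        try:
            return list(env.call(attr))
        except Exception:
            # Same broad catch as the hunk above: try the next attribute.
            continue
    return [""] * env.num_envs


# Made-up task strings, for illustration only.
with_description = StubVectorEnv(
    [{"task": "open_the_drawer", "task_description": "open the drawer"}]
)
name_only = StubVectorEnv([{"task": "open_the_drawer"}])
neither = StubVectorEnv([{}, {}])

print(infer_task(with_description))
print(infer_task(name_only))
print(infer_task(neither))
```

Note that when only `task` exists, the underscore form is still passed through unchanged, so the language-conditioning problem described in the commit message would persist for such environments.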