🐛 fix v30_to_v21 ArrowTypeError on pandas extension dtypes

`table.slice(...).to_pandas()` produces pandas ExtensionArrays for
`array[float32]` columns (e.g. `observation.states.end.orientation`)
on newer pandas/pyarrow combos, which then fail in
`pa.Table.from_pandas` inside `Dataset.from_pandas(...).to_parquet(...)`.

Skip the pandas round-trip and wrap the `pa.Table` slice in a
`Dataset` directly with `Dataset(episode_table).to_parquet(...)`.
This preserves the HuggingFace dataset metadata that `Dataset.to_parquet`
writes, while avoiding the ExtensionArray crash. No version pin on
datasets/pyarrow needed.

Closes #87
Author: FennMai
Date: 2026-04-30 07:03:03 +00:00
Parent: 8aa7343137
Commit: 723bd71cf2
@@ -181,7 +181,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
             f"episode_index={episode_index}, length={length}"
         )
-        episode_table = table.slice(start, length).to_pandas()
+        episode_table = table.slice(start, length)
         dest_chunk = episode_index // DEFAULT_CHUNK_SIZE
         dest_path = new_root / LEGACY_DATA_PATH_TEMPLATE.format(
@@ -189,7 +189,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
             episode_index=episode_index,
         )
         dest_path.parent.mkdir(parents=True, exist_ok=True)
-        Dataset.from_pandas(episode_table).to_parquet(dest_path)
+        Dataset(episode_table).to_parquet(dest_path)
 def _group_episodes_by_video_file(