mirror of
https://github.com/Tavish9/any4lerobot.git
synced 2026-05-11 12:09:41 +00:00
🐛 fix v30_to_v21 ArrowTypeError on pandas extension dtypes
`table.slice(...).to_pandas()` produces pandas ExtensionArrays for `array[float32]` columns (e.g. `observation.states.end.orientation`) on newer pandas/pyarrow combos, which then fail in `pa.Table.from_pandas` inside `Dataset.from_pandas(...).to_parquet(...)`. Skip the pandas round-trip and wrap the `pa.Table` slice in a `Dataset` directly with `Dataset(episode_table).to_parquet(...)`. This preserves the HuggingFace dataset metadata that `Dataset.to_parquet` writes, while avoiding the ExtensionArray crash. No version pin on datasets/pyarrow needed. Closes #87
@@ -181,7 +181,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
             f"episode_index={episode_index}, length={length}"
         )
 
-        episode_table = table.slice(start, length).to_pandas()
+        episode_table = table.slice(start, length)
 
         dest_chunk = episode_index // DEFAULT_CHUNK_SIZE
         dest_path = new_root / LEGACY_DATA_PATH_TEMPLATE.format(
@@ -189,7 +189,7 @@ def convert_data(root: Path, new_root: Path, episode_records: list[dict[str, Any
             episode_index=episode_index,
         )
         dest_path.parent.mkdir(parents=True, exist_ok=True)
-        Dataset.from_pandas(episode_table).to_parquet(dest_path)
+        Dataset(episode_table).to_parquet(dest_path)
 
 
 def _group_episodes_by_video_file(