lhoestq commented Sep 15, 2021

Arrow only supports 1-dim arrays. Previously, we converted all numpy arrays to Python lists before instantiating Arrow arrays to work around this limitation.
However, in #2361 we started keeping numpy arrays in order to preserve their dtypes.
This works when we pass a multi-dim numpy array directly (the conversion to Arrow has been added on our side), but not when we pass a list of multi-dim numpy arrays.

In this PR I added two strategies:

  • one that takes a list of multi-dim numpy arrays and returns an Arrow array in an optimized way (the more common case)
  • one that takes a list of possibly deeply nested data (lists, dicts, tuples) containing multi-dim arrays. This one is less optimized, since it converts all the multi-dim numpy arrays into lists of 1-d arrays for compatibility with Arrow. This strategy is simpler than trying to create the Arrow array directly from a possibly deeply nested data structure, and we can improve it in the future if needed.
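
To illustrate the second strategy, here is a minimal sketch of how multi-dim numpy arrays inside nested data could be turned into lists of 1-d arrays. The helper name `multidim_to_nested_1d` and the sample data are hypothetical, chosen only for illustration; this is not the code from this PR.

```python
import numpy as np


def multidim_to_nested_1d(obj):
    """Hypothetical helper: recursively replace every multi-dim numpy
    array with nested lists whose leaves are 1-d numpy arrays."""
    if isinstance(obj, np.ndarray):
        if obj.ndim <= 1:
            # 0-d and 1-d arrays are returned unchanged, so the dtype is kept
            return obj
        # Iterating over an n-d array yields (n-1)-d sub-arrays
        return [multidim_to_nested_1d(sub) for sub in obj]
    if isinstance(obj, dict):
        return {key: multidim_to_nested_1d(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples are returned as lists in this sketch
        return [multidim_to_nested_1d(value) for value in obj]
    return obj


# Sample input (an assumption): a list of dicts holding 2-d float32 arrays
data = [
    {"x": np.zeros((2, 3), dtype=np.float32)},
    {"x": np.ones((2, 3), dtype=np.float32)},
]
converted = multidim_to_nested_1d(data)
```

After the conversion, each `"x"` value is a list of two 1-d `float32` arrays, which is the shape of input this strategy aims to hand to Arrow. The dtype survives because the leaves are still numpy arrays rather than Python lists.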

Fix #2921

lhoestq merged commit c974f3c into master Sep 15, 2021
lhoestq deleted the fix-multidim-arrays-in-list-to-arrow branch September 15, 2021 17:21
lhoestq changed the title from "Fix multidim arrays in list to arrow" to "Fix conversion of multidim arrays in list to arrow" Sep 15, 2021
Development

Successfully merging this pull request may close these issues.

Using a list of multi-dim numpy arrays raises an error "can only convert 1-dimensional array values"