Skip to content

Conversation

@albertvillanova
Copy link
Member

@albertvillanova albertvillanova commented Jun 7, 2021

When loading a Dataset from a JSON file whose column names are not sorted alphabetically, we should get the same column name order, whether we pass features (in the same order as in the file) or not.

I found this issue while working on #2366.

@albertvillanova albertvillanova marked this pull request as draft June 7, 2021 16:38
@albertvillanova albertvillanova changed the title Fix features type Fix features order Jun 7, 2021
@albertvillanova albertvillanova marked this pull request as ready for review June 7, 2021 18:44
@albertvillanova albertvillanova changed the title Fix features order Keep original features order Jun 8, 2021
@lhoestq
Copy link
Member

lhoestq commented Jun 8, 2021

The arrow writer was supposing that the columns were always in the sorted order. I just pushed a fix to reorder the arrays accordingly to the schema. It was failing for many datasets like squad

@lhoestq
Copy link
Member

lhoestq commented Jun 8, 2021

and obviously it broke everything

@lhoestq
Copy link
Member

lhoestq commented Jun 8, 2021

Feel free to revert my commit. I can investigate this in the coming days

@albertvillanova
Copy link
Member Author

albertvillanova commented Jun 8, 2021

@lhoestq I do not understand when you say:

It was failing for many datasets like squad

All the tests were green after my last commit.

@lhoestq
Copy link
Member

lhoestq commented Jun 8, 2021

All the tests were green after my last commit.

Yes but loading the actual squad dataset was failing :/

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring out why some columns were missing !

@albertvillanova albertvillanova merged commit cc1d805 into huggingface:master Jun 15, 2021
@albertvillanova albertvillanova added this to the 1.9 milestone Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants