@lhoestq lhoestq commented Jun 1, 2021

As mentioned in #2401, there is an issue when loading the features of natural_questions: the order of the nested fields in the features doesn't match the order in the arrow schema. The order matters because field order is significant in the underlying arrow schema.

To fix that I re-order the features based on the arrow schema:

inferred_features = Features.from_arrow_schema(arrow_table.schema)
self.info.features = self.info.features.reorder_fields_as(inferred_features)
assert self.info.features.type == inferred_features.type

The re-ordering is done by a recursive function. It takes into account that a Sequence of a dictionary is represented as a struct of lists and not as a list of structs.
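To illustrate the idea, here is a minimal sketch of such a recursive re-ordering on plain Python objects. It is not the code from this PR: `Seq` and `reorder_fields` are hypothetical names, type names are plain strings, and the "struct of lists" layout is modeled as a dict whose values are all `Seq`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence feature wrapping an inner feature."""
    feature: Any


def reorder_fields(source: Any, target: Any) -> Any:
    """Return `source` with its nested keys ordered as in `target`."""
    if isinstance(source, Seq) and isinstance(source.feature, dict):
        # A sequence of a dict is laid out as a struct of lists, so the target
        # may describe it as a dict whose values are all Seq. Unwrap that form
        # so both sides can be compared key by key.
        if isinstance(target, Seq):
            target_inner = target.feature
        elif isinstance(target, dict) and all(isinstance(v, Seq) for v in target.values()):
            target_inner = {key: value.feature for key, value in target.items()}
        else:
            raise ValueError(f"Type mismatch between {source!r} and {target!r}")
        return Seq(reorder_fields(source.feature, target_inner))
    if isinstance(source, Seq):
        if not isinstance(target, Seq):
            raise ValueError(f"Type mismatch between {source!r} and {target!r}")
        return Seq(reorder_fields(source.feature, target.feature))
    if isinstance(source, dict):
        if not isinstance(target, dict) or set(source) != set(target):
            raise ValueError(f"Keys mismatch between {source!r} and {target!r}")
        # Iterate over the target so the result follows the target's key order.
        return {key: reorder_fields(source[key], target[key]) for key in target}
    # Leaf types have no nested fields to re-order.
    return source


# Features as declared by a (made-up) dataset script.
declared = {
    "id": "string",
    "annotations": Seq({
        "short_answers": Seq({"text": "string", "start": "int64"}),
        "yes_no": "int64",
    }),
}

# Features as they would be inferred from a struct-of-lists schema.
inferred = {
    "annotations": {
        "yes_no": Seq("int64"),
        "short_answers": Seq({"start": Seq("int64"), "text": Seq("string")}),
    },
    "id": "string",
}

reordered = reorder_fields(declared, inferred)
```

The result keeps the declared nesting (a `Seq` wrapping a dict) while taking its key order from the inferred side at every level.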

Now it's possible to load natural_questions again :)

@lhoestq lhoestq requested a review from albertvillanova June 1, 2021 16:09
@lhoestq lhoestq merged commit 92aacfe into master Jun 4, 2021
@lhoestq lhoestq deleted the fix-natural_questions-nested-features-order branch June 4, 2021 09:02