-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
If you do
dataset = load_dataset("json", data_files=data_files, features=features)Then depending on the order of the features in the json data field it fails:
[...]
~/Desktop/hf/datasets/src/datasets/packaged_modules/json/json.py in _generate_tables(self, files)
94 if self.config.schema:
95 # Cast allows str <-> int/float, while parse_option explicit_schema does NOT
---> 96 pa_table = pa_table.cast(self.config.schema)
97 yield i, pa_table
[...]
ValueError: Target schema's field names are not matching the table's field names: ['tokens', 'ner_tags'], ['ner_tags', 'tokens']This is because one must first re-order the columns of the table to match the self.config.schema before calling cast.
One way to fix the cast would be to replace it with:
# reorder the arrays if necessary + cast to schema
# we can't simply use .cast here because we may need to change the order of the columns
pa_table = pa.Table.from_arrays([pa_table[name] for name in schema.names], schema=schema)Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working