[Arrow writer, Trivia_qa] Could not convert TagMe with type str: converting to null type #211

@patrickvonplaten

Running the following code

import nlp
ds = nlp.load_dataset("trivia_qa", "rc", split="validation[:1%]")  # this might take 2.3 min to download but it's cached afterwards...
ds.map(lambda x: x, load_from_cache_file=False)

raises the following error: ArrowInvalid: Could not convert TagMe with type str: converting to null type.

On the other hand, if we remove the entity_pages column of trivia_qa, which seems to be responsible for the bug, it works:

import nlp
ds = nlp.load_dataset("trivia_qa", "rc", split="validation[:1%]")  # this might take 2.3 min to download but it's cached afterwards...
ds.map(lambda x: x, remove_columns=["entity_pages"], load_from_cache_file=False)

It seems quite hard to debug what's going on here. @lhoestq @thomwolf - do you have a good first guess at what the problem could be?

Note, BTW: I think this could be a good test to check that the datasets work correctly: take a tiny portion of each dataset and check that it can be written without errors.
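
A minimal sketch of what such a test could look like, reusing only the two calls from the snippets above (load_dataset with a tiny split, then an identity map with load_from_cache_file=False). The helper name check_tiny_slice_is_writable and its parameters are hypothetical, not part of nlp; the loader is passed in as an argument so the helper can be tried with any loader.

```python
def check_tiny_slice_is_writable(load_dataset, name, config, split="validation[:1%]"):
    """Load a tiny slice of a dataset and force it to be rewritten.

    `load_dataset` is expected to behave like nlp.load_dataset as used in
    this issue (an assumption). Returns None if the identity map succeeds,
    otherwise returns the exception that was raised while writing.
    """
    ds = load_dataset(name, config, split=split)
    try:
        # The identity map changes nothing, but with load_from_cache_file=False
        # every example is processed and written again instead of reusing the cache.
        ds.map(lambda x: x, load_from_cache_file=False)
    except Exception as err:  # e.g. the ArrowInvalid error reported above
        return err
    return None
```

With the library from this issue, the call would be check_tiny_slice_is_writable(nlp.load_dataset, "trivia_qa", "rc"), and the test would assert that the result is None.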

Labels: enhancement (New feature or request)
