Better cast error when generating dataset #6509
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
albertvillanova
left a comment
If a user wants to catch one of the errors, they will have to find out where the errors are defined: datasets.builder.CastErrorDuringDatasetGeneration and datasets.table.CastError, respectively.
I would suggest defining the errors in a single place: datasets.exceptions.
Additionally, I would suggest making them inherit from DatasetsError, so that a user can catch any error specifically generated by the datasets library.
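To illustrate the suggestion, here is a minimal sketch of what a single exceptions module with a shared base class could look like. The classes below are local stand-ins that only borrow the names mentioned in this comment; they are not the library's actual definitions, and `generate_dataset` and `load` are hypothetical helpers used purely for the demonstration.

```python
# Hypothetical stand-ins for classes that would live in one exceptions module.
# These are NOT the library's real definitions.
class DatasetsError(Exception):
    """Base class for every error raised by the library."""


class CastError(DatasetsError):
    """Raised when a table cannot be cast to the expected schema."""


class CastErrorDuringDatasetGeneration(DatasetsError):
    """Raised when a cast fails while a dataset is being generated."""


def generate_dataset():
    # Hypothetical helper: simulate a cast failure surfacing during generation.
    try:
        raise CastError("column names don't match")
    except CastError as e:
        # Chain the low-level error so the original cause stays visible.
        raise CastErrorDuringDatasetGeneration(
            "An error occurred while generating the dataset"
        ) from e


def load():
    # Hypothetical caller: one `except` clause on the base class is enough
    # to catch any library-specific error, whichever subclass was raised.
    try:
        generate_dataset()
    except DatasetsError as err:
        return type(err).__name__, type(err.__cause__).__name__
    return None
```

With this layout a user needs to know only one import location and can choose between catching a specific subclass or the base class.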
I created I also added a help message at the end of the error:
albertvillanova
left a comment
Thanks.
Benchmarks (PyArrow==8.0.0):
Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json

I want to improve the error message for datasets like https://huggingface.co/datasets/m-a-p/COIG-CQIA
Cc @albertvillanova @severo: is this new error OK, or should I use a dedicated error class?
New:
Previously: