Add fsspec support for to_json, to_csv, and to_parquet
#6096
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
Included for `to_json`, `to_parquet`, and `to_csv` only
|
Hi here @lhoestq @mariosasko I just realised this PR is still open, should we close it in case this is something not to include within |
lhoestq
left a comment
Thanks for the ping ! It looks good to me, I just added a few suggestions before merging:
Co-authored-by: Quentin Lhoest <[email protected]>
lhoestq
left a comment
LGTM thanks !
Benchmarks (PyArrow==8.0.0):
Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
|
Thanks @alvarobartt. I am linking this PR to the corresponding issue (on the right column, under "Development") and closing the issue. For future contributions, please add to the PR description the word "fix" followed by the issue number, e.g.:
|
Hi @albertvillanova, fair, I missed that, thanks for the edit and the heads up! |

Hi to whoever is reading this! 🤗 (Most likely @mariosasko)
What's in this PR?
This PR replaces the `open` from Python with `fsspec.open` and adds the argument `storage_options` for the methods `to_json`, `to_csv`, and `to_parquet`, to allow users to export any 🤗 `Dataset` into a file in a file-system as requested at #6086.
What's missing in this PR?
As per the `to_json`, `to_csv`, and `to_parquet` docstrings for the recently included `storage_options` arg, I've scoped it to 2.15.0, so we should check that before merging in case we want to scope that for 2.14.2 instead.
Additionally, should we also add `fsspec` support for the `from_csv`, `from_json`, and `from_parquet` methods? If you want me to do so @mariosasko just let me know and I'll create another PR to support that too!
Fix #6086.
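For readers unfamiliar with the pattern, here is a minimal sketch of what swapping the built-in `open` for `fsspec.open` with a `storage_options` dict looks like. It is not the code from this PR: the helper `export_rows_to_csv` is a hypothetical name made up for illustration, it writes plain dict rows rather than a 🤗 `Dataset`, and it targets fsspec's in-memory filesystem (`memory://`) so that it runs without any credentials.

```python
import csv

import fsspec


def export_rows_to_csv(rows, path_or_url, storage_options=None):
    """Hypothetical helper: write a list of dict rows as CSV to any fsspec URL.

    `storage_options` is forwarded to the filesystem backend (for a remote
    backend this is typically where credentials would go); it is empty here.
    """
    storage_options = storage_options or {}
    fieldnames = list(rows[0].keys())
    # fsspec.open replaces the built-in open; the with-block yields a file-like object.
    with fsspec.open(path_or_url, "w", newline="", **storage_options) as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
written = export_rows_to_csv(rows, "memory://demo_export.csv")

# Read the file back through the same filesystem abstraction.
with fsspec.open("memory://demo_export.csv", "r") as f:
    content = f.read()
```

The same call shape should carry over to other fsspec URLs, with only the URL and the contents of `storage_options` changing per backend.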