Support cloud storage in load_dataset #5281

@lhoestq

It would be nice to be able to do:

data_files=["s3://..."]  # or gs:// or any cloud storage path
storage_options = {...}
load_dataset(..., data_files=data_files, storage_options=storage_options)

The idea would be to use fsspec as in download_and_prepare and save_to_disk.
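
For illustration, here is a rough sketch of what reading a remote file through fsspec with `storage_options` could look like. The helper names `protocol_of` and `read_head` are hypothetical and not part of `datasets`. The sketch assumes `fsspec` is installed, plus a backend such as `s3fs` or `gcsfs` for the protocol in use. Dropping the options for local paths is a choice made for this sketch only.

```python
from typing import Optional


def protocol_of(path: str) -> str:
    """Return the URL protocol of `path`, or "file" for a plain local path."""
    return path.split("://", 1)[0] if "://" in path else "file"


def read_head(path: str, storage_options: Optional[dict] = None, n: int = 64) -> bytes:
    """Read the first `n` bytes of a local or remote file through fsspec."""
    # Third-party import kept inside the function so the module loads without it.
    # s3:// paths additionally need s3fs, gs:// paths need gcsfs.
    import fsspec

    # Assumption of this sketch: credentials are only forwarded for remote paths.
    options = storage_options if protocol_of(path) != "file" else None

    # fsspec.open forwards extra keyword arguments to the target filesystem.
    with fsspec.open(path, mode="rb", **(options or {})) as f:
        return f.read(n)
```

The same `storage_options` dict would then be passed unchanged for every entry in `data_files`, so that private buckets and local files can be mixed in one call.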

This has been requested several times already. Some users want to train models on data stored in private cloud storage.

Related:

#3490
#5244
forum

Labels: enhancement, good second issue