
Latest fsspec==2023.10.0 issue with streaming datasets #6330

@ZachNagengast

Description

Describe the bug

Loading a streaming dataset with fsspec 2023.10.0 fails with the following error:

NotImplementedError: Loading a streaming dataset cached in a LocalFileSystem is not supported yet.

I suspect the issue was introduced by this PR:

fsspec/filesystem_spec#1381
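The report does not say what fsspec/filesystem_spec#1381 changed. As a hypothetical illustration only, assume the PR made a local filesystem's `protocol` attribute a tuple of aliases instead of the single string "file". Under that assumption, the sketch below shows how an exact-equality check can misclassify a local filesystem as remote. That misclassification would be consistent with the "not supported yet" error above. The classes and functions are stand-ins invented for this example. They are not fsspec or datasets code.

```python
from typing import Tuple, Union


class LegacyLocalFS:
    """Stand-in for a local filesystem whose protocol is a single string."""

    protocol: Union[str, Tuple[str, ...]] = "file"


class HypotheticalLocalFS:
    """Stand-in for a local filesystem whose protocol is a tuple of aliases.

    Hypothetical: models the assumed post-change shape, not fsspec itself.
    """

    protocol: Union[str, Tuple[str, ...]] = ("file", "local")


def is_remote_by_equality(fs) -> bool:
    # Naive check: anything whose protocol is not exactly "file" counts as remote.
    return fs.protocol != "file"


def is_remote_by_membership(fs) -> bool:
    # Tolerant check: normalize to a tuple, then look for "file" among the aliases.
    protocols = (fs.protocol,) if isinstance(fs.protocol, str) else tuple(fs.protocol)
    return "file" not in protocols


if __name__ == "__main__":
    for fs in (LegacyLocalFS(), HypotheticalLocalFS()):
        print(type(fs).__name__, is_remote_by_equality(fs), is_remote_by_membership(fs))
```

With the tuple-valued protocol, the equality check reports the local filesystem as remote, while the membership check does not. Both checks agree when the protocol is the plain string "file".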

Steps to reproduce the bug

  1. Upgrade fsspec to version 2023.10.0
  2. Attempt to load a streaming dataset, for example load_dataset("laion/gpt4v-emotion-dataset", split="train", streaming=True)
  3. Observe the following exception:
  File "/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/datasets/load.py", line 2146, in load_dataset
    return builder_instance.as_streaming_dataset(split=split)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/datasets/builder.py", line 1318, in as_streaming_dataset
    raise NotImplementedError(
NotImplementedError: Loading a streaming dataset cached in a LocalFileSystem is not supported yet.

Expected behavior

The dataset should stream normally.

Environment info

datasets@main
fsspec==2023.10.0
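To check whether an environment matches the one reported here, a small standard-library script can read the installed fsspec version and compare it with 2023.10.0. This is a sketch, not part of datasets or fsspec. The helper names are hypothetical. The report only confirms 2023.10.0, so flagging newer releases as "potentially affected" is a conservative assumption.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Version reported in this issue as failing.
REPORTED_VERSION: Tuple[int, ...] = (2023, 10, 0)


def parse_calver(version: str) -> Tuple[int, ...]:
    """Parse a CalVer string such as '2023.10.0' into a tuple of ints.

    Only leading numeric components are kept, so '2023.10.0.post1'
    and '2023.10.0rc1' both parse as (2023, 10, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop after a suffix such as '0rc1'
            break
    return tuple(parts)


def is_potentially_affected(version: str) -> bool:
    # Assumption: the reported release and anything newer are treated as suspect.
    return parse_calver(version) >= REPORTED_VERSION


def installed_fsspec_version() -> Optional[str]:
    # Returns None when fsspec is not installed in the current environment.
    try:
        return metadata.version("fsspec")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_fsspec_version()
    if found is None:
        print("fsspec is not installed")
    elif is_potentially_affected(found):
        print(f"fsspec {found}: at or above the version reported in this issue")
    else:
        print(f"fsspec {found}: older than the version reported in this issue")
```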
