Streaming dataset hangs consistently #8169

@michaelpginn

Describe the bug

When loading a dataset with streaming=True, some background process keeps the script from ever exiting (or at least delays the exit for a very long time).

This did not happen with huggingface-hub < 1.

Steps to reproduce the bug

from datasets import load_dataset

ds = load_dataset("IRIIS-RESEARCH/Nepali-Text-Corpus", split="train", streaming=True)
print(next(iter(ds)))

Expected behavior

The script should clean up and exit as soon as the iterable dataset goes out of scope.
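
One way to narrow down what is still running is to list the threads that are alive once the reproduction has finished. The following is only a diagnostic sketch, under the unconfirmed assumption that the hang comes from a lingering non-daemon thread; `lingering_threads` is a hypothetical helper name and is not part of datasets or huggingface_hub.

```python
import threading


def lingering_threads():
    """Return alive non-daemon threads other than the main thread.

    The interpreter waits for non-daemon threads at shutdown, so any
    entry here is a candidate for what keeps the script from exiting.
    (Assumption: the hang is thread-related; this is not confirmed.)
    """
    main = threading.main_thread()
    return [
        t for t in threading.enumerate()
        if t is not main and not t.daemon
    ]


if __name__ == "__main__":
    # Run the reproduction snippet before this loop, then inspect
    # which threads are still alive.
    for t in lingering_threads():
        print(t.name, t.ident)
```

If this prints nothing while the script still hangs, the cause is probably elsewhere (for example a child process or an exit handler), and the standard library's faulthandler module can dump stack traces to show where the interpreter is blocked.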

Environment info

  • datasets version: 4.8.5
  • Platform: macOS-26.3.1-arm64-arm-64bit
  • Python version: 3.11.13
  • huggingface_hub version: 1.13.0
  • PyArrow version: 24.0.0
  • Pandas version: 3.0.2
  • fsspec version: 2026.2.0
