Skip to content

load_dataset requires a network connection for local download? #6750

@MiroFurtado

Description

@MiroFurtado

Describe the bug

Hi all - I see that in the past a network dependency has been mistakenly introduced into load_dataset even for local loads. Is it possible this has happened again?

Steps to reproduce the bug

>>> import datasets
>>> datasets.load_dataset("hh-rlhf")
Repo card metadata block was not found. Setting CardData to empty.
*hangs bc i'm firewalled*

stack trace from ctrl-c:

^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jobuser/.local/lib/python3.10/site-packages/datasets/load.py", line 2582, in load_dataset
    builder_instance.download_and_prepare(
    output_path = get_from_cache(                                                                                                                              [0/122]
  File "/home/jobuser/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 532, in get_from_cache
    response = http_head(
  File "/home/jobuser/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 419, in http_head
    response = _request_with_retry(
  File "/home/jobuser/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 304, in _request_with_retry
    response = requests.request(method=method.upper(), url=url, timeout=timeout, **params)
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/jobuser/build/lipy-flytekit-image/environments/satellites/python/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
KeyboardInterrupt

Expected behavior

loads the dataset

Environment info

> pip show datasets
Name: datasets
Version: 2.18.0

Python 3.10.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions