AttributeError: type object 'tqdm' has no attribute '_lock' #1603

@NielsRogge

Describe the bug

I'm getting a tqdm-related AttributeError when writing a Dask dataframe to the Hub.

This looks similar to huggingface/datasets#6066. Upgrading to the latest Datasets version doesn't seem to resolve it.
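
One thing that might be worth checking is whether the error goes away when the Hub progress bars are disabled before the write; since the AttributeError comes from tqdm, that would at least confirm the progress-bar path is involved. A minimal sketch, assuming huggingface_hub.utils.disable_progress_bars is available in the installed huggingface_hub version:

from huggingface_hub.utils import disable_progress_bars

# Possible diagnostic/workaround sketch (not a confirmed fix): turn off the
# Hub-side tqdm bars so the concurrent partition uploads never touch tqdm's
# class-level _lock.
disable_progress_bars()

(Setting the HF_HUB_DISABLE_PROGRESS_BARS=1 environment variable should have the same effect.)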

Steps to reproduce the bug

This is a minimal reproducer:

import dask.dataframe as dd
import pandas as pd
import random

import huggingface_hub

# Build a small pandas dataframe and wrap it in a Dask dataframe
data = {"number": [random.randint(0, 10) for _ in range(1000)]}
df = pd.DataFrame.from_dict(data)
dataframe = dd.from_pandas(df, npartitions=1)
# Repartition into 2 partitions; the error only appears with more than one partition
dataframe = dataframe.repartition(npartitions=2)

# Create the dataset repo on the Hub and write the partitions as Parquet files
repo_id = "nielsr/test-dask"
repo_path = f"hf://datasets/{repo_id}"
huggingface_hub.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
dd.to_parquet(dataframe, path=f"{repo_path}/data")

Note: I'm intentionally repartitioning the Dask dataframe into 2 partitions, as the write does work with a single partition (shown below for comparison).
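
For comparison, the same write with a single partition (i.e. skipping the repartition step above, everything else unchanged) completes without the error:

dataframe = dd.from_pandas(df, npartitions=1)  # keep a single partition
dd.to_parquet(dataframe, path=f"{repo_path}/data")  # this write succeeds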

Expected behavior

I'd expect the dataframe to be written to the Hub without any error.

Environment info

Datasets version 2.14.4
