Closed
Description
Describe the bug
Getting a tqdm error when writing a Dask DataFrame to the Hub.
This looks similar to huggingface/datasets#6066; using the latest datasets version doesn't seem to resolve it.
Steps to reproduce the bug
This is a minimal reproducer:
import dask.dataframe as dd
import pandas as pd
import random
import huggingface_hub
data = {"number": [random.randint(0,10) for _ in range(1000)]}
df = pd.DataFrame.from_dict(data)
dataframe = dd.from_pandas(df, npartitions=1)
dataframe = dataframe.repartition(npartitions=2)
repo_id = "nielsr/test-dask"
repo_path = f"hf://datasets/{repo_id}"
huggingface_hub.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
dd.to_parquet(dataframe, path=f"{repo_path}/data")
Note: I'm intentionally repartitioning the Dask DataFrame into 2 partitions, as the write works fine with a single partition.
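Since the error only appears with multiple partitions, one hedged workaround (an assumption on my side, not a confirmed fix) is to disable huggingface_hub's tqdm progress bars before the write, so the worker processes never touch a shared tqdm lock. `disable_progress_bars` and `are_progress_bars_disabled` are real helpers in `huggingface_hub.utils`:

```python
# Possible workaround sketch (assumption: the failure comes from tqdm
# progress bars created during the multi-partition upload).
from huggingface_hub.utils import (
    disable_progress_bars,
    are_progress_bars_disabled,
)

# Globally disable huggingface_hub's progress bars before calling
# dd.to_parquet(...); re-enable later with enable_progress_bars() if needed.
disable_progress_bars()
assert are_progress_bars_disabled()
```

With progress bars off, `dd.to_parquet(dataframe, path=f"{repo_path}/data")` would run without tqdm involvement; whether that actually avoids this traceback is untested here.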
Expected behavior
I would expect the DataFrame to be written to the Hub without any errors.
Environment info
Datasets version 2.14.4