-
Notifications
You must be signed in to change notification settings - Fork 3k
Closed
Description
Describe the bug
For the beans dataset (did not try on other), the order of samples is not the same on different machines. Tested on my local laptop, github actions machine, and ec2 instance. The three yield a different order.
Steps to reproduce the bug
In a clean docker container or conda environment with datasets==2.6.1, run
from datasets import load_dataset
from pprint import pprint
data = load_dataset("beans", split="validation")
pprint(data["image_file_path"])Expected behavior
The order of the images is the same on all machines.
Environment info
On the EC2 instance:
- `datasets` version: 2.6.1
- Platform: Linux-4.14.291-218.527.amzn2.x86_64-x86_64-with-glibc2.2.5
- Python version: 3.7.10
- PyArrow version: 9.0.0
- Pandas version: 1.3.5
- Numpy version: not checked
On my local laptop:
- `datasets` version: 2.6.1
- Platform: Linux-5.15.0-50-generic-x86_64-with-glibc2.35
- Python version: 3.9.12
- PyArrow version: 7.0.0
- Pandas version: 1.3.5
- Numpy version: 1.23.1
On github actions:
- `datasets` version: 2.6.1
- Platform: Linux-5.15.0-1022-azure-x86_64-with-glibc2.2.5
- Python version: 3.8.14
- PyArrow version: 9.0.0
- Pandas version: 1.5.1
- Numpy version: 1.23.4
Metadata
Metadata
Assignees
Labels
No labels