torchvision.datasets.mnist has a broken url in mirrors to download the dataset #8568

@cwestergren

🐛 Describe the bug

While attempting to download the MNIST dataset using torchvision.datasets.MNIST, I encountered an error that prevents the dataset from downloading successfully. The error indicates an issue with accessing one of the download URLs.

```python
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

val_ds = datasets.MNIST(root='.', train=False, download=True)
val_dl = DataLoader(val_ds, batch_size=128, shuffle=True)
```

Expected Behavior

The MNIST dataset should be downloaded successfully without encountering any HTTP errors.

Actual Behavior

The download fails with a 403 Forbidden error when attempting to access http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz.

Observations

  1. Unencrypted HTTP Resource: The download is attempting to access a resource over HTTP instead of HTTPS, which may not be secure.
  2. 403 Forbidden Error: The server is returning a 403 Forbidden error, indicating that access to the resource is not allowed.
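Both observations can be checked without torchvision. The following is a minimal sketch using only the standard library; the helper names `probe` and `report` are hypothetical, and it issues a `HEAD` request, which a server may answer differently from the `GET` that a real download performs.

```python
import urllib.error
import urllib.request

# The two base URLs listed for MNIST in torchvision/datasets/mnist.py.
MIRRORS = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://ossci-datasets.s3.amazonaws.com/mnist/",
]
FILENAME = "train-images-idx3-ubyte.gz"


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses; the status code is in err.code.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def report(mirrors=MIRRORS, filename=FILENAME, opener=urllib.request.urlopen):
    """Map each full URL to a tuple (served_over_https, status_code)."""
    result = {}
    for mirror in mirrors:
        url = mirror + filename
        result[url] = (mirror.startswith("https://"), probe(url, opener))
    return result
```

Calling `report()` with no arguments performs real network requests. The `opener` parameter exists only so the logic can be exercised offline with a stand-in.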

This has been the case for some time, so I suggest updating the list of mirrors in https://github.com/pytorch/vision/blob/main/torchvision/datasets/mnist.py so that it no longer points to an insecure, broken endpoint.

mirrors = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://ossci-datasets.s3.amazonaws.com/mnist/",
]

Notably, downloading the same files directly from @ylecun's page, https://yann.lecun.com/exdb/mnist/index.html, fails with the same error.
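Until the list is changed upstream, one possible workaround is to drop the plain-HTTP entry before constructing the dataset. The sketch below is not a torchvision API: `https_only` and `patch_mirrors` are hypothetical helpers, and it assumes `mirrors` is a class attribute that is consulted at download time.

```python
def https_only(mirrors):
    """Keep only mirrors served over HTTPS, preserving their order."""
    return [m for m in mirrors if m.lower().startswith("https://")]


def patch_mirrors(dataset_cls):
    """Replace dataset_cls.mirrors with its HTTPS-only subset and return it."""
    secure = https_only(dataset_cls.mirrors)
    if not secure:
        # Refuse to leave the class with an empty mirror list.
        raise ValueError("no HTTPS mirror available")
    dataset_cls.mirrors = secure
    return secure


# Intended use (assumes torchvision is installed):
#   import torchvision.datasets as datasets
#   patch_mirrors(datasets.MNIST)
#   val_ds = datasets.MNIST(root='.', train=False, download=True)
```

Filtering the existing list rather than hard-coding a URL means any HTTPS mirrors added in a later release are kept.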

Versions

PyTorch version: 2.3.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3696
DeviceID=CPU0
Family=207
L2CacheSize=2560
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3696
Name=Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.1+cu118
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.18.1+cu118
[pip3] torchviz==0.0.2
[conda] Could not collect
