ocaelen/ml-dataset-loader

ml-dataset-loader

A lightweight Python utility for quickly loading, scaling, and subsampling standard machine learning datasets for experiments and teaching.


🚀 Installation

Install directly from GitHub:

pip install git+https://github.com/your-username/ml-dataset-loader.git

✅ Recommended: install into a virtual environment to keep dependencies isolated.


📦 Usage

from ml_dataset_loader import DatasetLoader

loader = DatasetLoader(max_samples=100, random_state=42)
X, y = loader.load("iris")

print(X.shape)  # (100, 4)
print(y.shape)  # (100,)

✅ Each dataset is automatically:

  • Loaded
  • Scaled (zero mean, unit variance)
  • Subsampled to at most max_samples rows if it is larger
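The three steps above can be sketched in plain Python as follows. This is a hypothetical, standard-library-only illustration on lists of rows, not the package's code: `standardize`, `subsample` and `scale_and_subsample` are made-up names, and the sketch assumes scaling is applied to the full dataset before subsampling, in the order listed above. The real loader works on NumPy arrays and may order or implement these steps differently.

```python
import random
import statistics


def standardize(X):
    """Scale each column of a list-of-rows matrix to zero mean and unit variance."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column has std 0.0,
    # so fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def subsample(X, y, max_samples, random_state=None):
    """Keep at most max_samples rows, drawn without replacement."""
    if len(X) <= max_samples:
        return X, y
    rng = random.Random(random_state)  # seeded for reproducible draws
    idx = sorted(rng.sample(range(len(X)), max_samples))
    return [X[i] for i in idx], [y[i] for i in idx]


def scale_and_subsample(X, y, max_samples, random_state=None):
    """Hypothetical pipeline: scale the full dataset, then subsample it."""
    return subsample(standardize(X), y, max_samples, random_state)
```

Passing the same `random_state` twice yields the same rows, which mirrors what `random_state=42` is for in the usage example above.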

📜 Supported datasets

Run:

from ml_dataset_loader import DatasetLoader
print(DatasetLoader.supported_datasets())

to list all available datasets for your experiments.


➕ Adding a new dataset

1️⃣ Open ml_dataset_loader/loader.py.

2️⃣ Add a loader function at the end of the file, decorated with @register_dataset:

@register_dataset("your_dataset_name")
def _load_your_dataset_name():
    # Load your dataset here
    return X, y

The function must return:

  • X: a NumPy array of shape (n_samples, n_features)
  • y: a NumPy array of shape (n_samples,) with integer labels
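Before calling `loader.load`, it can help to check that a new loader honours this contract. A minimal sketch, using a hypothetical helper `check_dataset` that is not part of the package. It relies only on the standard library, so it accepts nested lists as well as 2-D/1-D NumPy arrays (NumPy integer scalars are registered as `numbers.Integral`).

```python
import numbers


def check_dataset(X, y):
    """Validate the (X, y) contract and return (n_samples, n_features).

    Raises ValueError if X is not a non-empty table of equal-length rows,
    or if y is not exactly one integer label per row of X.
    """
    n_samples = len(X)
    if n_samples == 0:
        raise ValueError("X must contain at least one sample")
    n_features = len(X[0])
    if any(len(row) != n_features for row in X):
        raise ValueError("all rows of X must have the same number of features")
    if len(y) != n_samples:
        raise ValueError(f"y has {len(y)} labels but X has {n_samples} samples")
    # numbers.Integral accepts Python ints and NumPy integer scalars alike.
    if not all(isinstance(label, numbers.Integral) for label in y):
        raise ValueError("y must contain integer class labels")
    return n_samples, n_features
```

For example, `check_dataset(*_load_your_dataset_name())` would raise on float labels or a length mismatch between X and y.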

3️⃣ Done. You can now use:

X, y = loader.load("your_dataset_name")
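The `@register_dataset` decorator suggests a name-to-function registry that `load` and `supported_datasets()` consult. The package's own implementation is not shown here, so the following is only a minimal sketch under that assumption: `_DATASETS`, the module-level `load` and `supported_datasets` functions, and the `"toy"` dataset are hypothetical simplifications of `DatasetLoader.load` and `DatasetLoader.supported_datasets()`, and rejecting duplicate names is a design choice of this sketch.

```python
_DATASETS = {}  # hypothetical registry: dataset name -> zero-argument loader


def register_dataset(name):
    """Decorator that records a loader function under `name`."""
    def decorator(func):
        if name in _DATASETS:
            raise ValueError(f"dataset {name!r} is already registered")
        _DATASETS[name] = func
        return func  # the function itself is returned unchanged
    return decorator


def supported_datasets():
    """Return the registered dataset names in sorted order."""
    return sorted(_DATASETS)


def load(name):
    """Look up `name` in the registry and call its loader."""
    try:
        loader_fn = _DATASETS[name]
    except KeyError:
        raise ValueError(
            f"unknown dataset {name!r}; choose from {supported_datasets()}"
        ) from None
    return loader_fn()


@register_dataset("toy")
def _load_toy():
    X = [[0.0, 1.0], [1.0, 0.0]]
    y = [0, 1]
    return X, y
```

In this sketch, registration happens when the module is imported, which is why adding a decorated function is enough to make it loadable by name.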

⚖️ License

MIT License


Built for clean, reusable ML experiments.
