A lightweight Python utility for quickly loading, scaling, and subsampling standard machine learning datasets for experiments and teaching.
Install directly from GitHub:

```bash
pip install git+https://github.com/your-username/ml-dataset-loader.git
```

✅ Recommended: use a virtual environment for clean, reproducible workflows.
```python
from ml_dataset_loader import DatasetLoader

loader = DatasetLoader(max_samples=100, random_state=42)
X, y = loader.load("iris")

print(X.shape)  # (100, 4)
print(y.shape)  # (100,)
```

✅ The dataset is automatically:
- Loaded
- Scaled (zero mean, unit variance)
- Subsampled if needed
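A minimal sketch of what that means in practice, assuming the `DatasetLoader` API shown above (the mean/std checks use plain NumPy and are not part of the package):

```python
import numpy as np
from ml_dataset_loader import DatasetLoader

loader = DatasetLoader(max_samples=100, random_state=42)
X, y = loader.load("iris")

# Subsampling: at most max_samples rows come back
assert X.shape[0] <= 100

# Scaling: each feature should be roughly zero mean, unit variance
print(np.round(X.mean(axis=0), 2))  # approximately all zeros
print(np.round(X.std(axis=0), 2))   # approximately all ones
```

Fixing `random_state` keeps the subsample identical across runs, which is what makes quick experiments comparable.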
Run:

```python
from ml_dataset_loader import DatasetLoader

print(DatasetLoader.supported_datasets())
```

to list all available datasets for your experiments.
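For example, assuming `supported_datasets()` returns the list of registered names, you can sanity-check every dataset in one loop (a sketch, not part of the package):

```python
from ml_dataset_loader import DatasetLoader

loader = DatasetLoader(max_samples=50, random_state=0)

# Load each registered dataset once and report its subsampled shape
for name in DatasetLoader.supported_datasets():
    X, y = loader.load(name)
    print(f"{name}: X={X.shape}, y={y.shape}")
```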
1️⃣ Open ml_dataset_loader/loader.py.
2️⃣ Add at the end (a complete worked sketch follows these steps):

```python
@register_dataset("your_dataset_name")
def _load_your_dataset_name():
    # Load your dataset here
    return X, y
```

✅ X: NumPy array of shape (n_samples, n_features)
✅ y: NumPy array of shape (n_samples,) with integer labels
3️⃣ Done. You can now use:

```python
X, y = loader.load("your_dataset_name")
```
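As a complete sketch of steps 1–3, here is a hypothetical loader added to ml_dataset_loader/loader.py that registers a small synthetic dataset. The name "toy_blobs" and the use of scikit-learn's `make_blobs` are illustrative assumptions, not part of the package:

```python
# In ml_dataset_loader/loader.py, after the built-in loaders
from sklearn.datasets import make_blobs


@register_dataset("toy_blobs")  # hypothetical dataset name
def _load_toy_blobs():
    # X: (n_samples, n_features) float array, y: (n_samples,) integer labels
    X, y = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
    return X, y
```

It then behaves like any built-in dataset:

```python
from ml_dataset_loader import DatasetLoader

loader = DatasetLoader(max_samples=100, random_state=42)
X, y = loader.load("toy_blobs")
print(X.shape)  # (100, 4) if subsampling works as described above
```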
MIT License

Built for clean, reusable ML experiments.