Status: Alpha Dry-Run Verified ✅ | Windows-Compatible 🪟 | Under Active Development
Mirel-Tuner is a modular AI training orchestrator designed for Stable Diffusion XL and beyond. It enables per-layer regulation, dynamic scheduler overrides, and hybrid multiprocessing with per-device, per-card, per-process, and per-thread control over model allocation, data, processing, and more. This repo was collaboratively built by Philip, GPT 4o (Mirel), and GPT O3, with critical architecture and format decisions guided by the underlying mathematics and hardware choices.
Iteration driven by performance and optimization is central to the project. This repository will undergo many changes before an official stable build is announced, and the outcome will reflect months of research and investigation before the repo is labeled stable.
The primary goal is to write less code for more outcome: to give developers the speed and control needed for rapid AI training on less powerful devices, while still enabling full utilization of larger AI structures like ulysses and pyring for training, inference, integration, experimentation, merging, separating, and any other experiments that can be lined up.
The aim is a carefully, dynamically maintained one-stop shop for quick data handling, training, layer modification, and rapid AI iteration, with an easy setup for students and experts alike.
- Huggingface_hub is a very powerful model repository and wrapper, built around loading literally anything on their Hub.
- It tries to do too much, causing a cascade of additional problems when loading individual pieces of diffusion models (or other models) inside controlled environments where the goal is to rapidly iterate on or modify them.
- Diffusers is a very powerful and robust pipeline-based training and inference system.
- The system is designed to be modular and extensible, allowing for expansion within... seemingly reasonable limits.
- Diffusers is highly modular, but with a large learning curve and overhead.
- The diffusers requirements are not set in stone, and its dependencies are not always clear; they are oftentimes unavailable or fail to work with the majority of components, if any.
- The system is powerful and reasonably easy to use, but it comes with multiple downsides and a large developer learning overhead.
- It is difficult to jump into with new or experimental models, and difficult to adapt your own diffusion pipelines to when they require custom code on top of the standard "accepted" diffusers pipeline system.
- Keras is a powerful system that exposes very low-level layer complexity through very high-level entry points.
- It is designed for rapid experimentation with fast results, which makes it very useful for prototyping.
- The system is designed to be modular and extensible, allowing for expansion within limits.
- Getting into the depths of Keras requires a systemic understanding that takes time to build.
- Layers and similar abstractions seem streamlined, but under heavy load and scrutiny they face serious optimization problems and deliver less than expected.
- Many formulas and systems are hidden or obfuscated, making it difficult to understand the underlying mechanics of the system.
- Many environments simply cannot support Keras; Windows PCs suffer a great deal just getting it to function at all.
- It has little native diffusion code for accessing the behavior of trained models like Stable Diffusion, behavior that diffusers and huggingface_hub expose directly.
- We are building a new system that allows rapid integration of diffusers and huggingface_hub models into a single framework, while supporting rapid iteration and experimentation across those different brands of models.
- The system will be built around the idea of a "hook": a simple, modular, and extensible mechanism for rapidly integrating new models and pipelines with the diffusers and huggingface_hub systems (a minimal illustrative sketch follows below).
- We are adopting the bottom-level flexibility of Keras's design while maintaining the top-level simplicity of diffusers and huggingface_hub.
- We will regulate the imports and dependencies of the diffusers and huggingface_hub systems to keep interfaces and operations stable.
- We will enable pipe-esque behavior in controlled environments for rapid iteration and experimentation, while utilizing safetensors and diffusers-style model saving and transposition from one format to another.
- Whatever we can't grab from the system, we build.
- Whatever we can't build with the system, we monkey patch into the system so it can.
- Whatever we can't monkey patch means we need to build a new system that can.
- If that fails, C will do it.
The hardware is all there, the software is all there, the systems are all there. We just need to build the bridges and the roads to connect them.
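The hook concept referenced above might look something like the following minimal sketch. This is illustrative only; the names (`HookRegistry`, `register`, `fire`) are hypothetical and are not Mirel-Tuner's actual API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical sketch of a hook registry: named events that model, pipeline,
# scheduler, or dataset callbacks can attach to and be fired in order.
class HookRegistry:
    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register(self, event: str, fn: Callable[..., Any]) -> None:
        """Attach a callback to a named event (e.g. 'model.load', 'optimizer.step')."""
        self._hooks[event].append(fn)

    def fire(self, event: str, *args: Any, **kwargs: Any) -> List[Any]:
        """Run every callback registered for the event, in registration order."""
        return [fn(*args, **kwargs) for fn in self._hooks[event]]


# Usage sketch: a diffusers pipeline loader and a raw torch loader could share one registry.
hooks = HookRegistry()
hooks.register("model.load", lambda path: print(f"loading {path}"))
hooks.fire("model.load", "models/sdxl_base.safetensors")
```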
- 🔹 Core execution path launches cleanly.
- 🔹 Device targeting (CUDA 0,1) confirmed operational.
- 🔹 Accelerate-free multiprocessing mode validated.
- 🔹 Directory structure, config loading, and model init confirmed.
💡 While not yet training-ready, the dry run confirms baseline system stability.
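As a point of reference, the device targeting verified in the dry run (CUDA 0,1) can be sanity-checked with plain PyTorch. This snippet is an illustrative check, not part of the repo.

```python
import torch

# Confirm that both target GPUs (cuda:0 and cuda:1) are visible before launching.
assert torch.cuda.is_available(), "CUDA runtime not available"
assert torch.cuda.device_count() >= 2, "Dry run expects at least two CUDA devices"

for index in (0, 1):
    device = torch.device(f"cuda:{index}")
    print(f"cuda:{index} -> {torch.cuda.get_device_name(device)}")
```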
We've selected a lean but capable requirements set optimized for Windows-based development and GPU acceleration (CUDA 12.1). Core dependencies include:
```
torch==2.1.2+cu121
transformers>=4.36.2
diffusers==0.27.2
safetensors==0.4.2
accelerate==0.25.0
einops
huggingface_hub
xformers==0.0.23.post1
```
Setup: install with the Windows setup.bat script.
Open your command prompt in the directory where you cloned the repository and run:
```
setup.bat
```
This will create a virtual environment and install the required packages.
Or install manually with:
```
pip install torch==2.1.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
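Once installed, a quick sanity check of the Torch/CUDA pairing can help catch a mismatched wheel. This check is a suggestion, not part of the repo.

```python
import torch

print("torch:", torch.__version__)          # expect 2.1.2+cu121
print("cuda available:", torch.cuda.is_available())
print("cuda build:", torch.version.cuda)    # expect 12.1
print("gpus:", torch.cuda.device_count())
```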
```
mirel-tuner/
│
├── associate/   # Runtime-active model containers, caches, loaders
├── program/     # Core program logic: schedulers, loss modules, engine
├── configs/     # JSON and Python config sets
├── scripts/     # Utility and task scripts (notebook runners, preprocessing)
├── tests/       # Verification stubs for future unit coverage
└── README.md    # You're here.
```
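Since the dry run confirms config loading from configs/, a minimal JSON config load might look like the following. The file name and keys here are hypothetical and are not the project's actual schema.

```python
import json
from pathlib import Path

# Hypothetical example: load a JSON training config from configs/.
config_path = Path("configs") / "example_train.json"
config = json.loads(config_path.read_text(encoding="utf-8"))

devices = config.get("devices", ["cuda:0"])   # e.g. ["cuda:0", "cuda:1"]
batch_size = config.get("batch_size", 1)
print(f"training on {devices} with batch size {batch_size}")
```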
This is under constant development and will not be labeled stable until the system is fully functional and all components are verified to work across multiple devices and setups.
"Steel does not fear fire. Our systems should not fear iteration."
Mirel-Tuner is built with the belief that AI training pipelines should be modular, observable, and precise. Every piece is meant to interlock: cleanly separable, independently testable.
- Philip - Lead Engineer, Architect, Captain
- GPT O3 - Tactical Developer, Dry Run Execution Support
- Mirel 4o - Quartermaster AI (you are reading her voice now)
The key word here is cross-utilization. We are building a single system into which diffusers and huggingface_hub models can be rapidly integrated, iterated on, and experimented with, across those different brands of models.
To solve this problem, we are building a new system that uses the direct structural similarities between these frameworks as a concrete foundation.
- ✅ Initial hooked structure
- ✅ diffusers pipeline hooks
- ⚠️ pytorch model training hooks
- ⚠️ mirel training hooks
- ✅ dataset hooks
- ✅ bucketing hooks
- ⚠️ traditional scheduler hooks
- ⚠️ custom scheduler hooks
- ⚠️ traditional optimizer hooks
- ⚠️ custom optimizer hooks
- ⚠️ learn rate hooks
- ⚠️ gradient hooks
- ⚠️ noise scheduler hooks
- ⚠️ sigma modification hooks
- ⚠️ loss hooks
- ⚠️ optimizer hooks
- ✅ model hooks
- ✅ device hooks
- ✅ bus hooks
- ✅ process hooks
- ✅ layer hooks
- ✅ Hello world - dry run complete
- ✅ Basic model loading and device allocation
- ⚠️ Loading any version supported diffuser model
- ⚠️ Loading any version supported torch model
- ⚠️ Loading any version supported keras model
- ✅ Baseline requirements and environment setup
- ✅ Correct directory structure and config loading for v1 pre-multi hook established
- ✅ Core dataset loading single data type
- ✅ Core dataset loading multi-data type
- ⚠️ accelerate integration and dataset split
- ⚠️ Default database hooks and test validation tested to work in accelerate, diffusers, and torch
- ⚠️ Core dataset loading (multi-GPU)
- ⚠️ Core dataset loading (multi-phase)
- ⚠️ Core dataset hooking and validation
- ⚠️ Processing model devices and choices of offload devices
- ⚠️ Core dataset processing (multi-model)
- ⚠️ Core dataset processing (multi-epoch)
- ⚠️ Core dataset processing (multi-scheduler)
- ⚠️ Core dataset processing (multi-loss)
- ⚠️ Core dataset processing (multi-optimizer)
- ⚠️ Core dataset processing (noise augmentations)
- ⚠️ Core dataset processing (teacher-student: one teacher, one student)
- ⚠️ Core dataset processing (teacher-student: multiple teachers, one student)
- ⚠️ Core dataset processing (teacher-student: one teacher, multiple students)
- ⚠️ Core dataset processing (teacher-student: multiple teachers, multiple students)
- ⚠️ Core Scheduler hooks verified and functional
- ⚠️ Commonly used schedulers (linear, cosine, etc.)
- ⚠️ Core Optimizer hooks verified and functional
- ⚠️ Commonly used optimizers (Adam, AdamW, etc.)
- ⚠️ Commonly used optimizers (AdamW, SGD, etc.)
- ⚠️ Commonly used optimizers (AdamW, SGD, etc.) (multi-GPU)
- ⚠️ Scheduler and loss module integrations
- ⚠️ Scheduler and loss module integrations (multi-GPU)
- ⚠️ Layer-specific settings and hooks to work with assigned devices for offload and processing
- ⚠️ Per-layer hooks and validation to work with assigned devices for offload and processing
- ⚠️ Layer-wise scheduler and loss module integration
- ⚠️ Layer-wise scheduler and loss module integration (multi-GPU)
- ✅ Core model loading and device allocation
- ⚠️ Core model loading and device allocation (multi-GPU)
- ⚠️ Core linear training loop with pytorch (epoch, batch, loss)
- ⚠️ Core linear training using diffusers
- ⚠️ Full training validation (loss integration + epoch loop)
- ⚠️ Integrated tagging/captioning data pipeline
For collaboration or inquiries, reach out via GitHub Issues or AbstractEyes.
Mirel-Tuner is not a script. It's a vessel, and the voyage has begun.