A personalized AI image generator for creating custom images of Penguinoh the Stuffed Penguin using DreamBooth fine-tuning and the FLUX.1-dev diffusion model.
This project uses DreamBooth to fine-tune the FLUX.1-dev text-to-image model on custom images of a stuffed penguin, enabling the generation of new, creative images featuring the character in various scenarios, styles, and settings.
The project supports both local training and cloud deployment via Modal, providing a scalable infrastructure for GPU-accelerated training and inference.
Features:

- DreamBooth Fine-tuning: Train a LoRA adapter on custom images using Hugging Face's DreamBooth implementation
- FLUX.1-dev Model: Uses Black Forest Labs' state-of-the-art diffusion model as the base
- Local Inference: Generate images locally using pre-trained models from Hugging Face Hub
- Cloud Deployment: Deploy training and inference on Modal with A100 GPUs
- Web Interface: Interactive Gradio-based UI for generating images
- Training Monitoring: Integration with Weights & Biases for experiment tracking
- FastAPI Backend: REST API endpoints for programmatic access
Project structure:

```
.
├── main.py           # Local training script (download & train)
├── inference.py      # Local inference script using the pre-trained model
├── modal_app.py      # Modal deployment (cloud training & inference)
├── pyproject.toml    # Project dependencies
├── data/
│   └── images/       # Training images of the subject
├── dreambooth/       # Hugging Face DreamBooth training scripts
├── model/            # Output directory for trained models
└── output/           # Generated images from inference
```
Requirements:

- Python 3.12+
- CUDA-compatible GPU (recommended: A100) for training
- Hugging Face account and API token
- Modal account (for cloud deployment)
- Weights & Biases account (optional, for training monitoring)
- Clone the repository:

```bash
git clone <repository-url>
cd penguinoh_generator
```

- Install dependencies using uv (or pip):

```bash
uv sync
```

- Set up your API keys. Create a `.env` file or export environment variables:

```bash
export HF_TOKEN=<your-huggingface-token>
export WANDB_API_KEY=<your-wandb-key>  # Optional
```
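At runtime the scripts expect these values in the process environment, whether exported or loaded from `.env`. A minimal sketch of the lookup (the variable names in Python are illustrative):

```python
import os

# HF_TOKEN is required to download FLUX.1-dev from Hugging Face;
# WANDB_API_KEY is only needed if Weights & Biases tracking is enabled.
hf_token = os.environ["HF_TOKEN"]
wandb_key = os.environ.get("WANDB_API_KEY")  # None when monitoring is disabled
```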
- Ensure uv dependencies are installed:

```bash
uv sync
```

- Download the base model:

```bash
uv run main.py download
```

- Train the model:

```bash
uv run main.py train
```

- Download and train in one step:

```bash
uv run main.py
```

Run inference using the pre-trained model from Hugging Face Hub:
- Ensure the output directory exists:

```bash
mkdir -p output
```

- Run the inference script:

```bash
uv run inference.py
```

This will:

- Load the FLUX.1-dev base model with bfloat16 precision
- Load the fine-tuned LoRA weights from `eigenben/FLUX.1-dev-penguinoh-generator`
- Generate 10 example images with various prompts (Paris, San Francisco, pastel drawing, etc.)
- Save timestamped images to the `output/` directory

You can modify `inference.py` to customize the prompts and generate your own images.
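For reference, the core of such a script looks roughly like the following with the diffusers `FluxPipeline` API. This is a minimal sketch: the prompts and timestamp format are illustrative, and the actual `inference.py` may differ.

```python
from datetime import datetime

import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model in bfloat16 and attach the fine-tuned LoRA.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("eigenben/FLUX.1-dev-penguinoh-generator")
pipe.to("cuda")

# Generate one image per prompt and save it under output/ with a timestamped name.
prompts = [
    "pngnh the Stuffed Penguin in Paris",
    "pngnh the Stuffed Penguin as a pastel drawing",
]
for prompt in prompts:
    image = pipe(
        prompt,
        height=512,
        width=512,
        num_inference_steps=50,
        guidance_scale=6,
    ).images[0]
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    image.save(f"output/{stamp}.png")
```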
To run training and inference in the cloud via Modal instead:

- Set up Modal secrets:

```bash
modal secret create huggingface-secret HF_TOKEN=<your-token>
modal secret create wandb-secret WANDB_API_KEY=<your-key>  # Optional
```

- Upload your training images to the Modal volume at `/workspace/data/images`
- Run training on Modal:

```bash
uv run modal run modal_app.py --max-train-steps 250
```

- Deploy the web interface:

```bash
uv run modal deploy modal_app.py
```

This will create a public URL for the Gradio interface where you can generate images.
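Under the hood, `modal_app.py` wraps GPU work in Modal functions. The following is a minimal sketch of that pattern, not the actual app: the package list, function name, and inference body are illustrative, and the real app also handles training, volumes, and the Gradio/FastAPI endpoints.

```python
import io

import modal

app = modal.App("penguinoh-generator")

# Container image with inference dependencies (illustrative package list).
image = modal.Image.debian_slim().pip_install(
    "torch", "diffusers", "transformers", "accelerate", "peft", "sentencepiece"
)


@app.function(
    gpu="A100",
    image=image,
    secrets=[modal.Secret.from_name("huggingface-secret")],  # provides HF_TOKEN
)
def generate(prompt: str) -> bytes:
    import torch
    from diffusers import FluxPipeline

    # Load the base model plus LoRA weights inside the GPU container.
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights("eigenben/FLUX.1-dev-penguinoh-generator")
    pipe.to("cuda")

    result = pipe(prompt, num_inference_steps=50, guidance_scale=6).images[0]
    buf = io.BytesIO()
    result.save(buf, format="PNG")
    return buf.getvalue()
```

With a sketch like this, `modal run` (or `generate.remote(...)` from other Modal code) executes the function on an A100 in the cloud.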
To train on a remote GPU host over SSH:

- Ensure rsync is installed:

```bash
apt update && apt install rsync -y
```

- Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Sync the project to the remote host (`/workspace` is implied as the default destination):
```bash
bin/rsync 1.1.1.1:22 /workspace
```

- Ensure HF_TOKEN and other environment variables are set in `.env` on the remote host
- SSH into the remote host and run:

```bash
cd /workspace && uv sync && uv run --env-file .env python main.py
```
To run with Docker:

- Build:

```bash
docker build -t penguinoh-generator:latest .
```

- Ensure HF_TOKEN and other environment variables are set in `.env` on the remote host
- Run (NVIDIA):

```bash
docker run --rm --gpus all --env-file .env -v "$PWD/model:/app/model" penguinoh-generator:latest
```
Training parameters can be configured in `main.py`:

```python
@dataclass
class TrainConfig(SharedConfig):
    instance_name: str = "pngnh"          # Instance identifier
    class_name: str = "Stuffed Penguin"   # Class description
    model_name: str = "black-forest-labs/FLUX.1-dev"
    resolution: int = 512                 # Training resolution
    train_batch_size: int = 3             # Batch size
    rank: int = 16                        # LoRA rank
    learning_rate: float = 4e-4
    max_train_steps: int = 500            # Training iterations
    seed: int = 117                       # Random seed
```

Inference parameters are in `modal_app.py`:
```python
@dataclass
class AppConfig(SharedConfig):
    num_inference_steps: int = 50   # Quality vs. speed tradeoff
    guidance_scale: float = 6       # Prompt adherence strength
```

The overall workflow:

- Download: Fetches the FLUX.1-dev base model from Hugging Face
- Training: Uses DreamBooth with LoRA to fine-tune the model on custom images
  - Creates a unique instance phrase: "pngnh the Stuffed Penguin"
  - Trains the model to recognize and generate this specific character
  - Saves LoRA weights for efficient storage and loading
- Inference: Loads the base model + LoRA weights to generate new images
- Web Interface: Provides an easy-to-use Gradio UI for image generation
Training details:

- Method: DreamBooth with LoRA (Low-Rank Adaptation)
- Base Model: FLUX.1-dev (from Black Forest Labs)
- Precision: BFloat16 (when CUDA is available)
- Optimizer: AdamW with configurable learning rate
- Scheduler: Constant learning rate (configurable)
- GPU: A100-80GB recommended for training
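These settings map onto flags of the Hugging Face DreamBooth training script. As a rough, hypothetical sketch of what the wrapper in `main.py` might execute (flag names follow the diffusers `train_dreambooth_lora_flux.py` example; the script path and exact invocation in this repo may differ):

```python
import subprocess

# Launch the diffusers DreamBooth LoRA script for FLUX via accelerate.
# Values mirror the TrainConfig defaults shown above.
subprocess.run(
    [
        "accelerate", "launch", "dreambooth/train_dreambooth_lora_flux.py",
        "--pretrained_model_name_or_path", "black-forest-labs/FLUX.1-dev",
        "--instance_data_dir", "data/images",
        "--instance_prompt", "pngnh the Stuffed Penguin",
        "--output_dir", "model",
        "--mixed_precision", "bf16",
        "--resolution", "512",
        "--train_batch_size", "3",
        "--rank", "16",
        "--learning_rate", "4e-4",
        "--lr_scheduler", "constant",
        "--max_train_steps", "500",
        "--seed", "117",
    ],
    check=True,
)
```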
The deployed Gradio interface allows users to:
- Enter text prompts describing the desired image
- Generate 512x512 images featuring Penguinoh
- Experiment with different artistic styles and scenarios
Example prompts:
- "pngnh the Stuffed Penguin in space"
- "pngnh the Stuffed Penguin painted by Van Gogh"
- "pngnh the Stuffed Penguin as a superhero"
Credits:

- Built with Hugging Face Diffusers
- Deployed on Modal
- UI powered by Gradio
- Base model: FLUX.1-dev