A personalized AI image generator for creating custom images of Penguinoh the Stuffed Penguin using DreamBooth fine-tuning and the FLUX.1-dev diffusion model.
This project uses DreamBooth to fine-tune the FLUX.1-dev text-to-image model on custom images of a stuffed penguin, enabling the generation of new, creative images featuring the character in various scenarios, styles, and settings.
The project supports both local training and cloud deployment via Modal, providing a scalable infrastructure for GPU-accelerated training and inference.
Features:

- DreamBooth Fine-tuning: Train a LoRA adapter on custom images using Hugging Face's DreamBooth implementation
- FLUX.1-dev Model: Uses Black Forest Labs' state-of-the-art diffusion model as the base
- Local Inference: Generate images locally using pre-trained models from Hugging Face Hub
- Cloud Deployment: Deploy training and inference on Modal with A100 GPUs
- Web Interface: Interactive Gradio-based UI for generating images
- Training Monitoring: Integration with Weights & Biases for experiment tracking
- FastAPI Backend: REST API endpoints for programmatic access
Project structure:

```
.
├── main.py           # Local training script (download & train)
├── inference.py      # Local inference script using the pre-trained model
├── modal_app.py      # Modal deployment (cloud training & inference)
├── pyproject.toml    # Project dependencies
├── data/
│   └── images/       # Training images of the subject
├── dreambooth/       # Hugging Face DreamBooth training scripts
├── model/            # Output directory for trained models
└── output/           # Generated images from inference
```
Requirements:

- Python 3.12+
- CUDA-compatible GPU (recommended: A100) for training
- Hugging Face account and API token
- Modal account (for cloud deployment)
- Weights & Biases account (optional, for training monitoring)
- Clone the repository:

```bash
git clone <repository-url>
cd penguinoh_generator
```

- Install dependencies using uv (or pip):

```bash
uv sync
```

- Set up your API keys. Create a `.env` file or export environment variables:

```bash
export HF_TOKEN=<your-huggingface-token>
export WANDB_API_KEY=<your-wandb-key>  # Optional
```
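At runtime the scripts expect these values in the process environment, whether exported or loaded from `.env`. A minimal sketch of the lookup (the variable names in Python are illustrative):

```python
import os

# HF_TOKEN is required to download FLUX.1-dev from Hugging Face;
# WANDB_API_KEY is only needed if Weights & Biases tracking is enabled.
hf_token = os.environ["HF_TOKEN"]
wandb_key = os.environ.get("WANDB_API_KEY")  # None when monitoring is disabled
```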
- Ensure uv dependencies are installed:

```bash
uv sync
```

- Download the base model:

```bash
uv run main.py download
```

- Train the model:

```bash
uv run main.py train
```

- Download and train in one step:

```bash
uv run main.py
```

Run inference using the pre-trained model from Hugging Face Hub:
- Ensure the output directory exists:

```bash
mkdir -p output
```

- Run the inference script:

```bash
uv run inference.py
```

This will:

- Load the FLUX.1-dev base model with bfloat16 precision
- Load the fine-tuned LoRA weights from `eigenben/FLUX.1-dev-penguinoh-generator`
- Generate 10 example images with various prompts (Paris, San Francisco, pastel drawing, etc.)
- Save timestamped images to the `output/` directory

You can modify `inference.py` to customize the prompts and generate your own images.
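For reference, the core of such a script looks roughly like the following with the diffusers `FluxPipeline` API. This is a minimal sketch: the prompts and timestamp format are illustrative, and the actual `inference.py` may differ.

```python
from datetime import datetime

import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model in bfloat16 and attach the fine-tuned LoRA.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("eigenben/FLUX.1-dev-penguinoh-generator")
pipe.to("cuda")

# Generate one image per prompt and save it under output/ with a timestamped name.
prompts = [
    "pngnh the Stuffed Penguin in Paris",
    "pngnh the Stuffed Penguin as a pastel drawing",
]
for prompt in prompts:
    image = pipe(
        prompt,
        height=512,
        width=512,
        num_inference_steps=50,
        guidance_scale=6,
    ).images[0]
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    image.save(f"output/{stamp}.png")
```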
To run training and inference in the cloud via Modal instead:

- Set up Modal secrets:

```bash
modal secret create huggingface-secret HF_TOKEN=<your-token>
modal secret create wandb-secret WANDB_API_KEY=<your-key>  # Optional
```

- Upload your training images to the Modal volume at `/workspace/data/images`
- Run training on Modal:

```bash
uv run modal run modal_app.py --max-train-steps 250
```

- Deploy the web interface:

```bash
uv run modal deploy modal_app.py
```

This will create a public URL for the Gradio interface where you can generate images.
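Under the hood, `modal_app.py` wraps GPU work in Modal functions. The following is a minimal sketch of that pattern, not the actual app: the package list, function name, and inference body are illustrative, and the real app also handles training, volumes, and the Gradio/FastAPI endpoints.

```python
import io

import modal

app = modal.App("penguinoh-generator")

# Container image with inference dependencies (illustrative package list).
image = modal.Image.debian_slim().pip_install(
    "torch", "diffusers", "transformers", "accelerate", "peft", "sentencepiece"
)


@app.function(
    gpu="A100",
    image=image,
    secrets=[modal.Secret.from_name("huggingface-secret")],  # provides HF_TOKEN
)
def generate(prompt: str) -> bytes:
    import torch
    from diffusers import FluxPipeline

    # Load the base model plus LoRA weights inside the GPU container.
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights("eigenben/FLUX.1-dev-penguinoh-generator")
    pipe.to("cuda")

    result = pipe(prompt, num_inference_steps=50, guidance_scale=6).images[0]
    buf = io.BytesIO()
    result.save(buf, format="PNG")
    return buf.getvalue()
```

With a sketch like this, `modal run` (or `generate.remote(...)` from other Modal code) executes the function on an A100 in the cloud.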
To train on a remote GPU host over SSH:

- Ensure rsync is installed:

```bash
apt update && apt install rsync -y
```

- Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Sync the project to the remote host (`/workspace` is implied as the default destination):
```bash
bin/rsync 1.1.1.1:22 /workspace
```

- Ensure HF_TOKEN and other environment variables are set in `.env` on the remote host
- SSH into the remote host and run:

```bash
cd /workspace && uv sync && uv run --env-file .env python main.py
```
To run with Docker:

- Build:

```bash
docker build -t penguinoh-generator:latest .
```

- Ensure HF_TOKEN and other environment variables are set in `.env` on the remote host
- Run (NVIDIA):

```bash
docker run --rm --gpus all --env-file .env -v "$PWD/model:/app/model" penguinoh-generator:latest
```
Training parameters can be configured in `main.py`:

```python
@dataclass
class TrainConfig(SharedConfig):
    instance_name: str = "pngnh"          # Instance identifier
    class_name: str = "Stuffed Penguin"   # Class description
    model_name: str = "black-forest-labs/FLUX.1-dev"
    resolution: int = 512                 # Training resolution
    train_batch_size: int = 3             # Batch size
    rank: int = 16                        # LoRA rank
    learning_rate: float = 4e-4
    max_train_steps: int = 500            # Training iterations
    seed: int = 117                       # Random seed
```

Inference parameters are in `modal_app.py`:
```python
@dataclass
class AppConfig(SharedConfig):
    num_inference_steps: int = 50   # Quality vs. speed tradeoff
    guidance_scale: float = 6       # Prompt adherence strength
```

The overall workflow:

- Download: Fetches the FLUX.1-dev base model from Hugging Face
- Training: Uses DreamBooth with LoRA to fine-tune the model on custom images
  - Creates a unique instance phrase: "pngnh the Stuffed Penguin"
  - Trains the model to recognize and generate this specific character
  - Saves LoRA weights for efficient storage and loading
- Inference: Loads the base model + LoRA weights to generate new images
- Web Interface: Provides an easy-to-use Gradio UI for image generation
Training details:

- Method: DreamBooth with LoRA (Low-Rank Adaptation)
- Base Model: FLUX.1-dev (from Black Forest Labs)
- Precision: BFloat16 (when CUDA is available)
- Optimizer: AdamW with configurable learning rate
- Scheduler: Constant learning rate (configurable)
- GPU: A100-80GB recommended for training
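These settings map onto flags of the Hugging Face DreamBooth training script. As a rough, hypothetical sketch of what the wrapper in `main.py` might execute (flag names follow the diffusers `train_dreambooth_lora_flux.py` example; the script path and exact invocation in this repo may differ):

```python
import subprocess

# Launch the diffusers DreamBooth LoRA script for FLUX via accelerate.
# Values mirror the TrainConfig defaults shown above.
subprocess.run(
    [
        "accelerate", "launch", "dreambooth/train_dreambooth_lora_flux.py",
        "--pretrained_model_name_or_path", "black-forest-labs/FLUX.1-dev",
        "--instance_data_dir", "data/images",
        "--instance_prompt", "pngnh the Stuffed Penguin",
        "--output_dir", "model",
        "--mixed_precision", "bf16",
        "--resolution", "512",
        "--train_batch_size", "3",
        "--rank", "16",
        "--learning_rate", "4e-4",
        "--lr_scheduler", "constant",
        "--max_train_steps", "500",
        "--seed", "117",
    ],
    check=True,
)
```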
The deployed Gradio interface allows users to:
- Enter text prompts describing the desired image
- Generate 512x512 images featuring Penguinoh
- Experiment with different artistic styles and scenarios
Example prompts:
- "pngnh the Stuffed Penguin in space"
- "pngnh the Stuffed Penguin painted by Van Gogh"
- "pngnh the Stuffed Penguin as a superhero"
Credits:

- Built with Hugging Face Diffusers
- Deployed on Modal
- UI powered by Gradio
- Base model: FLUX.1-dev