This project implements an evaluation platform for image generation models, focusing on their ability to generate a specific number of objects. It supports direct generation, two-pass generation (base + edit), and automated analysis using Vision Language Models (VLMs).
- Direct Generation: Generate images using SOTA models (Gemini 2.5 Flash Image, GPT Image 1, Recraft V3).
- Automated Analysis: Count objects in generated images using VLMs (Qwen3 VL, Gemini 3 Pro).
- Auto-Correction Loop: Automatically attempt to fix incorrect counts by editing the image (GPT Image 1, Recraft V3).
- Modular Architecture: Easily extensible interfaces for Generators, Editors, and Analyzers.
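The generator, editor, and analyzer roles and the auto-correction loop described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Generator`, `Editor`, and `Analyzer` protocols, their method names, `LoopResult`, and `run_loop` are all hypothetical names chosen for this example, and images are represented as raw bytes purely for simplicity.

```python
from dataclasses import dataclass
from typing import Protocol


class Generator(Protocol):
    """Hypothetical interface: turns a text prompt into image bytes."""
    def generate(self, prompt: str) -> bytes: ...


class Editor(Protocol):
    """Hypothetical interface: edits an existing image per an instruction."""
    def edit(self, image: bytes, instruction: str) -> bytes: ...


class Analyzer(Protocol):
    """Hypothetical interface: counts instances of an object in an image."""
    def count(self, image: bytes, object_name: str) -> int: ...


@dataclass
class LoopResult:
    image: bytes
    counted: int
    attempts: int  # number of edit passes performed
    success: bool


def run_loop(generator: Generator, editor: Editor, analyzer: Analyzer,
             prompt: str, target: int, object_name: str,
             max_retries: int = 2) -> LoopResult:
    """Generate once, then edit until the count matches or retries run out."""
    image = generator.generate(prompt)
    counted = analyzer.count(image, object_name)
    attempts = 0
    while counted != target and attempts < max_retries:
        # The instruction wording is an assumption made for this sketch.
        instruction = (
            f"Change the image so it shows exactly {target} {object_name} "
            f"(currently {counted})."
        )
        image = editor.edit(image, instruction)
        counted = analyzer.count(image, object_name)
        attempts += 1
    return LoopResult(image, counted, attempts, counted == target)
```

Because the three roles are structural protocols, any object with a matching method can be passed in, which is one way the "easily extensible" property could be achieved.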
This project uses uv for dependency management.
- Install uv:

      curl -LsSf https://astral.sh/uv/install.sh | sh

- Sync dependencies:

      uv sync

- Environment Setup: Create a `.env` file in the root directory with your API keys:

      GEMINI_API_KEY=your_gemini_key
      OPENAI_API_KEY=your_openai_key
      OPENROUTER_API_KEY=your_openrouter_key
      FAL_KEY=your_fal_key
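Before running an evaluation, it can help to check that the keys listed above are actually present. The sketch below parses `KEY=value` text and reports which keys are missing or empty. The helper names `parse_env_text` and `missing_keys` are hypothetical, and treating all four keys as required at once is an assumption; the project may only need the keys for the models you select.

```python
REQUIRED_KEYS = (
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "FAL_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list when every key has a value.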
Run the evaluation CLI using `uv run`. Images are saved to the `output/` directory.
Generate an image and analyze it once.

    uv run python main.py --prompt "3 apples on a table" --count 3 --object "apples" --mode direct --generator gemini --analyzer qwen

Generate an image, analyze it, and if the count is wrong, attempt to edit it (up to 2 retries).

    uv run python main.py --prompt "5 cats" --count 5 --mode loop --generator openai --editor openai --analyzer qwen

| Type | Model | CLI Argument |
|---|---|---|
| Generator | Gemini 2.5 Flash Image | gemini |
| Generator | GPT Image 1 | openai |
| Generator | Recraft V3 (via Fal) | fal |
| Editor | GPT Image 1 | openai |
| Editor | Recraft V3 (via Fal) | fal |
| Editor | Gemini Editor | (Coming Soon) |
| Analyzer | Qwen3 VL 235B (OpenRouter) | qwen |
| Analyzer | Gemini 3 Pro | gemini |
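One way the CLI arguments in the table might map onto an `argparse` parser is sketched below. The flag names and accepted values come from the commands and table above; everything else (which flags are required, the default mode, the `object_name` destination, and the rule that loop mode needs an editor) is an assumption made for illustration and may differ from the real `main.py`.

```python
import argparse

# CLI argument -> model name, copied from the table above.
GENERATORS = {
    "gemini": "Gemini 2.5 Flash Image",
    "openai": "GPT Image 1",
    "fal": "Recraft V3 (via Fal)",
}
EDITORS = {
    "openai": "GPT Image 1",
    "fal": "Recraft V3 (via Fal)",
}
ANALYZERS = {
    "qwen": "Qwen3 VL 235B (OpenRouter)",
    "gemini": "Gemini 3 Pro",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Object-count evaluation (sketch)")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--count", type=int, required=True)
    # Assumed optional, since the loop-mode example above omits it.
    parser.add_argument("--object", dest="object_name", default=None)
    parser.add_argument("--mode", choices=("direct", "loop"), default="direct")
    parser.add_argument("--generator", choices=sorted(GENERATORS), required=True)
    parser.add_argument("--editor", choices=sorted(EDITORS))
    parser.add_argument("--analyzer", choices=sorted(ANALYZERS), required=True)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: loop mode cannot run without an editor to apply corrections.
    if args.mode == "loop" and args.editor is None:
        parser.error("--mode loop requires --editor")
    return args
```

Keeping the model tables as plain dictionaries means a new backend only needs one new entry to become a valid choice for its flag.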
See docs/ for more detailed documentation.