Image Generation and Analysis Platform

This project implements an evaluation platform for image generation models that measures how reliably they render a requested number of objects. It supports direct generation, two-pass generation (a base image followed by an edit), and automated analysis using Vision Language Models (VLMs).

Features

Direct Generation: Generate images using SOTA models (Gemini 2.5 Flash Image, GPT Image 1, Recraft V3).
Automated Analysis: Count objects in generated images using VLMs (Qwen3 VL, Gemini 3 Pro).
Auto-Correction Loop: Automatically attempt to fix incorrect counts by editing the image (GPT Image 1, Recraft V3).
Modular Architecture: Easily extensible interfaces for Generators, Editors, and Analyzers.
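
To illustrate the modular architecture, the three roles could be expressed as small Python protocols. This is a minimal sketch, not the project's actual code: the method names (`generate`, `edit`, `count`), the `AnalysisResult` dataclass, and the `FixedCountAnalyzer` stand-in are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AnalysisResult:
    """Outcome of counting objects in one image (hypothetical shape)."""

    expected: int
    found: int

    @property
    def is_correct(self) -> bool:
        return self.found == self.expected


class Generator(Protocol):
    """Anything that turns a text prompt into image bytes."""

    def generate(self, prompt: str) -> bytes: ...


class Editor(Protocol):
    """Anything that modifies an existing image according to an instruction."""

    def edit(self, image: bytes, instruction: str) -> bytes: ...


class Analyzer(Protocol):
    """Anything that counts occurrences of an object in an image."""

    def count(self, image: bytes, object_name: str) -> int: ...


class FixedCountAnalyzer:
    """Toy analyzer that always reports the same count; stands in for a VLM call."""

    def __init__(self, count: int) -> None:
        self._count = count

    def count(self, image: bytes, object_name: str) -> int:
        return self._count


def check(analyzer: Analyzer, image: bytes, object_name: str, expected: int) -> AnalysisResult:
    """Run any Analyzer and compare its count with the expected one."""
    return AnalysisResult(expected=expected, found=analyzer.count(image, object_name))
```

Because the protocols are structural, a new backend under this sketch only needs to provide the matching method; it does not have to inherit from a base class.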

Installation

This project uses uv for dependency management.

Install uv:

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Sync dependencies:
```
uv sync
```

Environment Setup: Create a .env file in the root directory with your API keys:

```
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key
OPENROUTER_API_KEY=your_openrouter_key
FAL_KEY=your_fal_key
```
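
Before a run, it can help to check that the keys are actually set. The sketch below is an assumption-laden illustration: how the project itself loads `.env` is not documented here, the helpers `parse_env_text` and `missing_keys` are hypothetical, and treating all four keys as required is a simplification (a given run may only need the keys for the models it uses).

```python
import os
from typing import Mapping

# Key names are taken from the .env example above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY", "FAL_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    # Check the current process environment (assumes keys were exported or loaded).
    missing = missing_keys(os.environ)
    print("missing keys:", ", ".join(missing) if missing else "none")
```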

Usage

Run the evaluation CLI using uv run. Images are saved to the output/ directory.

Direct Generation Mode

Generate an image and analyze it once.

```
uv run python main.py --prompt "3 apples on a table" --count 3 --object "apples" --mode direct --generator gemini --analyzer qwen
```

Loop Mode (Auto-Correction)

Generate an image, analyze it, and if the count is wrong, attempt to edit it (up to 2 retries).

```
uv run python main.py --prompt "5 cats" --count 5 --mode loop --generator openai --editor openai --analyzer qwen
```
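
The control flow of loop mode can be sketched as follows. This is a simplified illustration rather than the project's implementation: the function `run_loop`, its callable parameters, and the wording of the edit instruction are hypothetical. Only the generate, analyze, edit sequence and the limit of 2 retries come from the description above.

```python
from typing import Callable

MAX_RETRIES = 2  # "up to 2 retries", per the loop mode description


def run_loop(
    generate: Callable[[str], bytes],
    edit: Callable[[bytes, str], bytes],
    count_objects: Callable[[bytes], int],
    prompt: str,
    target: int,
    max_retries: int = MAX_RETRIES,
) -> tuple[bytes, int, int]:
    """Generate once, then edit until the count matches or retries run out.

    Returns (final_image, final_count, edits_used).
    """
    image = generate(prompt)
    found = count_objects(image)
    edits_used = 0
    while found != target and edits_used < max_retries:
        # Hypothetical instruction text; the real prompt to the editor is not documented.
        instruction = f"The image shows {found} objects; change it to show exactly {target}."
        image = edit(image, instruction)
        found = count_objects(image)
        edits_used += 1
    return image, found, edits_used
```

In this sketch the caller decides what a failed run means: if the returned count still differs from `target` after the last retry, the final image is returned as-is along with the mismatching count.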

Supported Models

| Type | Model | CLI Argument |
|---|---|---|
| Generator | Gemini 2.5 Flash Image | `gemini` |
| Generator | GPT Image 1 | `openai` |
| Generator | Recraft V3 (via Fal) | `fal` |
| Editor | GPT Image 1 | `openai` |
| Editor | Recraft V3 (via Fal) | `fal` |
| Editor | Gemini Editor | (Coming Soon) |
| Analyzer | Qwen3 VL 235B (OpenRouter) | `qwen` |
| Analyzer | Gemini 3 Pro | `gemini` |
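
As a rough illustration of how these CLI arguments fit together, the flags from the usage examples and the values from the table could be wired into an `argparse` parser like the one below. The actual parser in `main.py` is not shown in this README, so which flags are required, their defaults, and the `object_name` destination are assumptions.

```python
import argparse

# Allowed values, taken from the "CLI Argument" column of the table above.
GENERATORS = ("gemini", "openai", "fal")
EDITORS = ("openai", "fal")
ANALYZERS = ("qwen", "gemini")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the usage examples (sketch)."""
    parser = argparse.ArgumentParser(description="Object-count evaluation CLI (sketch)")
    parser.add_argument("--prompt", required=True, help="text prompt for the generator")
    parser.add_argument("--count", type=int, required=True, help="expected object count")
    # Assumption: --object is optional, since the loop mode example omits it.
    parser.add_argument("--object", dest="object_name", default=None)
    parser.add_argument("--mode", choices=("direct", "loop"), default="direct")
    parser.add_argument("--generator", choices=GENERATORS, required=True)
    # Assumption: --editor only matters in loop mode, so it is optional here.
    parser.add_argument("--editor", choices=EDITORS, default=None)
    parser.add_argument("--analyzer", choices=ANALYZERS, required=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting each flag with `choices` means an unsupported value, such as `--editor gemini` while the Gemini editor is still pending, is rejected at parse time instead of failing later in the run.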

Documentation

See docs/ for more detailed documentation.
