Text Detection Dataset Generator

A dataset-agnostic engine that consumes YOLOv8-Seg text detection datasets, extracts/generates text assets, and synthesizes new training data with diverse augmentations.

Features

Dataset-agnostic ingestion: Accepts any YOLOv8-Seg formatted datasets
Real word extraction: Extracts text crops with alpha matting using RMBG-2.0
Synthetic text generation: Renders text with fonts, effects, and styling
Background cleaning: Removes text from images using LaMa inpainting
Advanced composition: Places text with warping, blending, and collision avoidance
Comprehensive augmentations: Photometric, motion, scene, and document effects
Automatic train/val splitting: Configurable split ratios

Installation

pip install -r requirements.txt

Requirements

Python 3.8+
CUDA-capable GPU (recommended: A100 for best performance)
~10GB GPU memory for model inference

Usage

Basic Usage

python -m engine.main \
    --dataset-dir /path/to/datasets/ \
    --wordlist /path/to/words.txt \
    --fonts-dir /path/to/fonts/ \
    --output-dir /path/to/output/

With All Options

python -m engine.main \
    --dataset-dir /path/to/datasets/ \
    --wordlist /path/to/words.txt \
    --fonts-dir /path/to/fonts/ \
    --output-dir /path/to/output/ \
    --per-sample 5 \
    --real-ratio 0.5 \
    --seed 42 \
    --preview 10 \
    --batch-size 8 \
    --config config.yaml

Using Config File

python -m engine.main --config my_config.yaml

Input Structure

/dataset-dir/
    /dataset_A/
        /images/
            img001.jpg
            img002.jpg
        /labels/
            img001.txt    # YOLOv8-Seg format
            img002.txt
    /dataset_B/
        /images/
        /labels/
    ...

/wordlist.txt           # One word per line
/fonts/
    font1.ttf
    font2.otf

YOLOv8-Seg Label Format

Each line in a label file:

<class_id> x1 y1 x2 y2 x3 y3 ... xn yn

Example:

0 0.155 0.238 0.201 0.231 0.217 0.289 0.159 0.303
0 0.450 0.120 0.520 0.115 0.525 0.180 0.455 0.185

Output Structure

/output/
    /assets/
        /real_text/           # Extracted word assets
        /synth_text/          # Generated synthetic text
        /backgrounds/         # LaMa-cleaned backgrounds
    /preview/                 # Preview samples (if --preview > 0)
    /train/
        /images/
        /labels/
    /val/
        /images/
        /labels/
    dataset.yaml              # YOLO training config
    generation_report.json    # Statistics

Configuration

CLI Arguments

Argument	Default	Description
`--dataset-dir`	Required	Root directory with dataset subfolders
`--wordlist`	Required	Text file with words (one per line)
`--fonts-dir`	Required	Directory with TTF/OTF fonts
`--output-dir`	Required	Output directory
`--per-sample`	5	Alternatives per input image
`--real-ratio`	0.5	Real vs synthetic asset ratio
`--seed`	None	Random seed for reproducibility
`--preview`	0	Preview samples to generate
`--batch-size`	8	Batch size for model inference
`--device`	cuda	Device (cuda/cpu)
`--config`	None	YAML config file

YAML Configuration

input:
  dataset_dir: /path/to/datasets
  wordlist: /path/to/words.txt
  fonts_dir: /path/to/fonts

output:
  output_dir: /path/to/output
  resolution: [1024, 1024]

generation:
  per_sample: 5
  real_ratio: 0.5
  seed: 42

models:
  lama_checkpoint: big-lama
  rmbg_model: briaai/RMBG-2.0
  batch_size: 8
  device: cuda

text:
  scale_min: 0.03
  scale_max: 0.25

warp:
  types:
    - perspective
    - curve
    - arc
    - sine_wave
    - circular
    - spiral
    - freeform_polygon
  intensity: moderate

blending:
  mode: intermediate
  edge_blur: 1.5
  shadow_opacity: 0.4

augmentation:
  probability: 0.5

preview:
  count: 10

split:
  val_ratio: 0.2

Pipeline Overview

Ingestion: Scan and validate input datasets
Asset Extraction: Extract word crops with RMBG alpha matting
Background Cleaning: Remove text using LaMa inpainting
Preview Generation: Optional preview samples for inspection
Dataset Generation:
- Compose text onto backgrounds
- Apply warps (perspective, curve, arc, sine wave, etc.)
- Blend with color matching and shadows
- Apply augmentations
Splitting: Create train/val sets

Train/Val Split Logic

Train: 100% generated alternatives + 80% original images
Val: 20% original images (held out)

Training with Generated Dataset

yolo segment train \
    model=yolov8m-seg.yaml \
    data=/path/to/output/dataset.yaml \
    epochs=100 \
    imgsz=1024

Warp Types

Type	Description
perspective	3D perspective transformation
curve	Bezier curve bending
arc	Circular arc deformation
sine_wave	Sinusoidal wave distortion
circular	Cylindrical wrap
spiral	Twist/spiral effect
freeform_polygon	Random grid displacement

Augmentations

Photometric: Brightness, contrast, exposure, color temperature, JPEG artifacts

Motion/Sensor: Camera shake, Gaussian noise, lens distortion

Scene: Fog, rain streaks

Document: Paper stains, folds, crumple texture

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
dataprep		dataprep
engine		engine
README.md		README.md
SPEC.md		SPEC.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Detection Dataset Generator

Features

Installation

Requirements

Usage

Basic Usage

With All Options

Using Config File

Input Structure

YOLOv8-Seg Label Format

Output Structure

Configuration

CLI Arguments

YAML Configuration

Pipeline Overview

Train/Val Split Logic

Training with Generated Dataset

Warp Types

Augmentations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Text Detection Dataset Generator

Features

Installation

Requirements

Usage

Basic Usage

With All Options

Using Config File

Input Structure

YOLOv8-Seg Label Format

Output Structure

Configuration

CLI Arguments

YAML Configuration

Pipeline Overview

Train/Val Split Logic

Training with Generated Dataset

Warp Types

Augmentations

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages