OCR with Vision Transformer
A simple and understandable vision-transformer-style OCR project.
The model in this repository relies heavily on high-level open-source projects such as timm and x_transformers.
The training procedure is also intuitive, thanks to the legibility of pytorch-lightning.
On a private Korean handwritten text dataset, the accuracy (exact match) is 95.6%.
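Exact match here means a prediction counts as correct only when the decoded string equals the ground-truth label character for character. A minimal sketch of the metric (the function is illustrative, not part of this repo):

```python
def exact_match_accuracy(predictions, labels):
    """Fraction of predictions that equal their label exactly."""
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)

# 2 of 3 predictions match exactly -> 0.666...
print(exact_match_accuracy(["안녕", "세상", "OCR"], ["안녕", "세계", "OCR"]))
```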
```
./data/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt
```
```
# train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...
```
You should preprocess the data first. Crop each image at the word or sentence level and put all image files in one directory. Ground truth is provided as a txt file: each line contains the image file name and its label, separated by a tab (\t).
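As a sketch, the annotation file can be parsed like this; `load_annotations` is an illustrative helper, not a function from this repo:

```python
from pathlib import Path

def load_annotations(txt_path, image_dir):
    """Parse tab-separated lines of 'image_file\\tlabel' into (path, label) pairs."""
    samples = []
    for line in Path(txt_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        name, label = line.split("\t", maxsplit=1)
        samples.append((Path(image_dir) / name, label))
    return samples

samples = load_annotations("data/train.txt", "data/preprocessed_image")
```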
In settings/ you can find default.yaml, where almost every hyper-parameter can be set. Copy it and modify the copy for your own experiment version. I recommend running with the default settings first, before changing anything.
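If you want to copy and modify the setting file programmatically, here is a minimal sketch with PyYAML; the `batch_size` key is a guess, so check default.yaml for the actual key names:

```python
import yaml

with open("settings/default.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Override a hyper-parameter for a new experiment, then save a copy.
cfg["batch_size"] = 64  # hypothetical key; see default.yaml for the real names
with open("settings/my_experiment.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f)
```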
```
python run.py --setting settings/default.yaml --version 0 --max_epochs 100 --num_workers 16 --batch_size 128
```
You can check your training log with tensorboard.
```
tensorboard --logdir tb_logs --bind_all
```
It's not really hard to add a prediction function to the pytorch-lightning module with a fully trained model. I will leave it empty for now, but I would gladly add it if there's any request.
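For anyone who wants to try it themselves, here is a hedged sketch of what such a hook could look like using pytorch-lightning's built-in `predict_step`; `self.model.generate`, `self.tokenizer`, and the batch format are assumptions for illustration, not this repo's actual API:

```python
import torch
import pytorch_lightning as pl

class OCRModule(pl.LightningModule):
    # ... training_step, configure_optimizers, etc. ...

    @torch.no_grad()
    def predict_step(self, batch, batch_idx):
        # `self.model.generate` and `self.tokenizer` are hypothetical names;
        # adapt them to the encoder/decoder actually defined in this repo.
        images = batch["image"]                    # assumed batch format
        token_ids = self.model.generate(images)    # autoregressive decoding
        return [self.tokenizer.decode(ids) for ids in token_ids]
```

A fully trained checkpoint could then be run over a dataloader with `trainer.predict(model, dataloaders=...)`.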
Enjoy the code.