Qengineering/PaddleOCR-Lite-Document

ncnn - PaddleOCR document scanner

output image
Image source: Dutch government.

License

Paper: E-book: Dive Into OCR (PDF)

PaddleOCR is an OCR engine. It can detect text in a scene, determine its orientation, and recognise the characters (Chinese or Latin).
This application uses an ncnn version of PaddleOCR.

In the early days, the PaddleOCR version had to match its corresponding Paddle-Lite version to run properly on a Raspberry Pi. However, since Paddle-Lite has been largely abandoned on GitHub, while the main PaddleOCR project has continued advancing with more powerful models (such as PP-OCRv4 and v5), that compatibility is no longer practical.
For this reason, we use the ncnn port of the latest PaddleOCR version (v5).
It eliminates the need for installing PaddlePaddle or Paddle-Lite, making setup much simpler.
Once everything is installed, it runs entirely on a Raspberry Pi, with no cloud services or expensive licenses required.

Inference time (RPi 4 @ 1925 MHz, 64-bit Bullseye OS):
Detect text: 366 ms.
Recognize text: 134 ms per line.

Inference time (RPi 5 @ 2450 MHz, 64-bit Bookworm OS):
Detect text: 183 ms.
Recognize text: 67 ms per line.
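
The figures above can be combined into a rough per-page estimate. The sketch below is only an illustration: it assumes detection runs once per image and recognition once per detected line, one after the other, and it ignores the orientation classifier and image loading. The names `StageTimes` and `estimate_total_ms` are hypothetical and not part of the repository.

```cpp
#include <cassert>

// Per-stage timings in milliseconds, copied from the README figures.
struct StageTimes {
    int detect_ms;     // one detection pass over the whole image
    int recognize_ms;  // recognition of a single text line
};

const StageTimes kRpi4{366, 134};  // RPi 4 @ 1925 MHz
const StageTimes kRpi5{183, 67};   // RPi 5 @ 2450 MHz

// Rough estimate: one detection pass plus one recognition pass per line.
// Assumption: the stages run sequentially; classifier and I/O are ignored.
inline int estimate_total_ms(const StageTimes& t, int text_lines) {
    assert(text_lines >= 0);
    return t.detect_ms + t.recognize_ms * text_lines;
}
```

For a page with 10 text lines, `estimate_total_ms(kRpi4, 10)` gives 366 + 10 × 134 = 1706 ms, and `estimate_total_ms(kRpi5, 10)` gives 183 + 10 × 67 = 853 ms.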

Specially made for a bare Raspberry Pi 4; see the Q-engineering deep learning examples.


Dependencies.

To run the application, you need:

  • A Raspberry Pi 4 with a 32- or 64-bit operating system. It can be the Raspberry Pi 64-bit OS, or Ubuntu 18.04 / 20.04. (Guide: Install 64-bit OS)
  • ncnn installed. (Guide: Install ncnn)
  • OpenCV 64-bit installed. (Guide: Install OpenCV 4.5)
  • Optionally, Code::Blocks installed. ($ sudo apt-get install codeblocks)

Installing the dependencies.

Before you can run the application, you need to install quite a lot of software. Let's start.

OpenCV

The first package you need is OpenCV. On our website, there are several guides on how to install OpenCV on your Raspberry Pi.
It shouldn't be a problem if you follow the instructions carefully.

ncnn

The next package you need is the ncnn framework. On our website, there is a guide on how to install ncnn.
Again, an easy installation.

Code::Blocks

For compilation, you can choose between CMake and Code::Blocks. If you want to write code, Code::Blocks can be more comfortable to work with because it is a full IDE.

$ sudo apt-get install codeblocks

Installing the app.

With all the tools in place, it is time to build the application.

$ mkdir MyDir
$ cd MyDir
$ git clone https://github.com/Qengineering/PaddleOCR-Lite-Document.git

The cloned PaddleOCR-Lite-Document folder inside MyDir should now look like this:

.
|-- 11.jpg
|-- CMakeLists.txt
|-- include
|   |-- clipper.h
|   |-- cls_process.h
|   |-- crnn_process.h
|   `-- db_post_process.h
|-- LICENSE
|-- models
|   |-- cls-sim-op.bin
|   |-- cls-sim-op.param
|   |-- config.txt
|   |-- PP_OCRv5_mobile_det.bin
|   |-- PP_OCRv5_mobile_det.param
|   |-- PP_OCRv5_mobile_rec.bin
|   |-- PP_OCRv5_mobile_rec.param
|   |-- PP_OCRv5_server_det.bin
|   |-- PP_OCRv5_server_det.param
|   |-- PP_OCRv5_server_rec.bin
|   |-- PP_OCRv5_server_rec.param
|   `-- PP_OCRv5_vocab.txt
|-- PaddleOCR-Lite-Doc.cbp
|-- PaddleOCR-Lite-Doc.depend
|-- PaddleOCR-Lite-Doc.layout
|-- README.md
|-- src
|   |-- clipper.cpp
|   |-- cls_process.cc
|   |-- crnn_process.cc
|   |-- db_post_process.cc
|   `-- main.cpp
|-- WillekePass.jpg
`-- word_1.jpg

Load the PaddleOCR-Lite-Doc.cbp project file in Code::Blocks and build the app.


CMake.

Instead of Code::Blocks, you can also use CMake to build the application.
Follow the instructions below, assuming you are in the project's top-level directory (the one that contains CMakeLists.txt).

$ mkdir build
$ cd build
$ cmake ..
$ make -j4
$ cd ..

Running the app.

Once successfully built, you will find the executable ocr.

The parameter list:

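./ocr <Mode> <Detection model file> <Orientation classifier model file> <Recognition model file> <Hardware> <Precision> <Threads> <Batchsize> <Test image path> <Dictionary>

Not every mode takes every parameter, and the examples below also pass config.txt. When in doubt, follow the argument order used in the examples.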

To detect the locations of text in an image:

./ocr det ./models/PP_OCRv5_mobile_det arm8 FP32 4 1 ./11.jpg ./models/config.txt

To recognise a single word:

./ocr rec ./models/PP_OCRv5_mobile_rec arm8 FP32 4 1 ./word_1.jpg ./models/PP_OCRv5_vocab.txt ./models/config.txt

To process a whole document:

./ocr system ./models/PP_OCRv5_mobile_det ./models/PP_OCRv5_mobile_rec ./models/cls-sim-op arm8 FP32 4 1 ./WillekePass.jpg ./models/config.txt ./models/PP_OCRv5_vocab.txt
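
Note that the commands above pass model paths without a file extension, while the models folder holds a .param and a .bin file for each model. A minimal sketch of how such a base path could be expanded into the two ncnn files is shown here. This is an assumption about the convention, not the code in main.cpp, and `NcnnModelFiles` and `model_files` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// The two files that make up one ncnn model.
struct NcnnModelFiles {
    std::string param;  // network structure
    std::string bin;    // weights
};

// Expand a base path such as "./models/PP_OCRv5_mobile_det" into the
// matching .param / .bin pair. Assumed convention, for illustration only.
inline NcnnModelFiles model_files(const std::string& base) {
    assert(!base.empty());
    return NcnnModelFiles{base + ".param", base + ".bin"};
}
```

With this convention, `model_files("./models/PP_OCRv5_mobile_det")` refers to PP_OCRv5_mobile_det.param and PP_OCRv5_mobile_det.bin in the models folder listed earlier. Swapping "mobile" for "server" in the path would select the larger models in the same folder.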

Notes.

  1. The config.txt used by the detector and classifier is shown below:
max_side_len  960          # Limit the maximum image height and width to 960
det_db_thresh  0.3         # Threshold used to binarise the DB prediction map; values between 0.0 and 0.3 have no obvious effect on the result
det_db_box_thresh  0.5     # DB post-processing box filter threshold; if text boxes are missed, it can be lowered as appropriate
det_db_unclip_ratio  1.6   # Indicates the compactness of the text box; the smaller the value, the closer the text box is to the text
use_direction_classify  0  # Whether to use the direction classifier: 0 means do not use, 1 means use
rec_image_height  48       # The height of the input image of the recognition model: must be 48.
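
The file format above is simple enough to read with a few lines of standard C++: a key, a numeric value, and an optional trailing # comment. The following is a minimal sketch of such a reader, not the parser used in the repository; `parse_config`, `parse_config_text` and `rec_image_height` are hypothetical helpers.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "key value  # comment" lines into a key -> number map.
// Illustrative only; the application's own loader may differ.
inline std::map<std::string, double> parse_config(std::istream& in) {
    std::map<std::string, double> cfg;
    std::string line;
    while (std::getline(in, line)) {
        // Drop everything from the first '#' onward (trailing comment).
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);

        std::istringstream fields(line);
        std::string key;
        double value = 0.0;
        // Lines without both a key and a numeric value are skipped.
        if (fields >> key >> value) cfg[key] = value;
    }
    return cfg;
}

// Convenience wrapper for parsing from an in-memory string.
inline std::map<std::string, double> parse_config_text(const std::string& text) {
    std::istringstream in(text);
    return parse_config(in);
}

// The README states that the recognition input height must be 48.
// Throws std::out_of_range if the key is missing.
inline int rec_image_height(const std::map<std::string, double>& cfg) {
    const int h = static_cast<int>(cfg.at("rec_image_height"));
    assert(h == 48 && "rec_image_height must be 48");
    return h;
}
```

To read the real file, pass a `std::ifstream` opened on ./models/config.txt to `parse_config`. Storing every value as a double keeps the sketch short; flags such as use_direction_classify can be compared against 0 after parsing.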