A receipt scanner and reader built on tesseract-ocr and imagemagick. It performs five basic functions (hence the program’s name):
- scan receipt image (edge detection and warp transformation with opencv)
- preprocess scan (clean, sharpen, and contrast)
- run OCR (tesseract for optical character recognition)
- analyze OCR output (with fuzzy finder and preconfigured dictionary)
- summarize analysis in a csv file
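The last two steps (matching noisy OCR text against a dictionary and writing a csv summary) can be illustrated with a minimal sketch. It is not pentaplex's actual code: it assumes Python's standard `difflib` as the fuzzy matcher, and `KNOWN_ITEMS`, `match_item`, `parse_line`, and `summarize` are hypothetical names chosen only for illustration.

```python
import csv
import difflib
import io

# Hypothetical dictionary of known item names; the program's real
# preconfigured dictionary is not shown in this README.
KNOWN_ITEMS = ["milk", "bread", "butter", "total"]


def match_item(word, vocabulary=KNOWN_ITEMS, cutoff=0.6):
    """Return the closest known item for an OCR'd word, or None."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_line(line):
    """Split an OCR line such as 'M1lk 1.20' into (item, price), or None."""
    parts = line.rsplit(maxsplit=1)
    if len(parts) != 2:
        return None
    name, raw_price = parts
    try:
        # Accept both '2.50' and '2,50' as decimal notation.
        price = float(raw_price.replace(",", "."))
    except ValueError:
        return None
    item = match_item(name)
    return (item, price) if item else None


def summarize(ocr_text):
    """Return the matched (item, price) rows as csv text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item", "price"])
    for line in ocr_text.splitlines():
        row = parse_line(line)
        if row:
            writer.writerow([row[0], f"{row[1]:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(summarize("M1lk 1.20\nBre4d 2,50\nThank you\n"))
```

Lines that do not end in a number, or whose text matches no dictionary entry above the cutoff, are skipped; lowering `cutoff` tolerates more OCR errors at the cost of more false matches.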
To prepare for scanning, create a directory called
imgs/ in the repository and place pictures of the receipts in it;
e.g. in a terminal (cd into the repository first), type something like:
mkdir -p imgs/
cp ~/Downloads/*.JPG imgs/
This program uses
To run pentaplex, type (again, cd into the repository first):
./pentaplex [optional: auto]
For code documentation visit: https://phdenzel.github.io/pentaplex/