This tool automatically detects text, translates it, and overlays the translation right on top of the original text in a matter of seconds. It can also read the translation aloud with TTS.
ATTENTION: Requires a Gemini API key.
(Currently text handling is very basic, and the overlay often has a hard time conforming to polygon areas.)
- Navigate to the folder where you want to install GPOST
- Open CMD or PowerShell
- Use `git clone https://github.com/Anzhc/GPOST`
- Run `setup.bat`
- Launch GPOST with `run_gpost.bat`
- It will ask you for a Gemini API key. Currently I support only Gemini, so you'll have to get one. You can skip this and launch the program to see how it works, but you will not receive translations.
That should take care of everything.
GPOST automatically checks for new base YOLO models from my Hugging Face repo.
I recommend binding 3 shortcuts to either mouse buttons or hotkeys; this will significantly improve your experience.
You need just 3 buttons: Select Sub-Area, Run Clean - Inference - Translate, and Clear Overlays.
Running inference queues YOLO for detection. It will try to detect the text classes it was trained on in the selected area.
Translate sends the detected areas to Gemini. Once a response is received, it is overlaid on top of the original text. If no response arrives, or there is an error, you will see it in the Translation Output window.
There are various other functions that let you tweak the program's performance, but those 3 buttons are all you need to start.
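
If detection runs on a crop of the selected sub-area, the boxes it returns are relative to that crop and have to be shifted back before an overlay can be drawn in the right place on screen. The snippet below is only a sketch of that idea, not GPOST's actual code: the `Box` type, the `to_screen` helper, and the example coordinates are all made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels: left, top, right, bottom."""
    x1: int
    y1: int
    x2: int
    y2: int


def to_screen(box: Box, area: Box) -> Box:
    """Shift a box detected inside `area` into screen coordinates.

    The result is clamped so an overlay never spills outside the selection.
    """
    x1 = min(max(box.x1 + area.x1, area.x1), area.x2)
    y1 = min(max(box.y1 + area.y1, area.y1), area.y2)
    x2 = min(max(box.x2 + area.x1, area.x1), area.x2)
    y2 = min(max(box.y2 + area.y1, area.y1), area.y2)
    return Box(x1, y1, x2, y2)


# Hypothetical selection whose top-left corner sits at (400, 300) on screen.
area = Box(400, 300, 1000, 700)
detected = Box(10, 20, 110, 60)  # relative to the cropped sub-area
print(to_screen(detected, area))  # Box(x1=410, y1=320, x2=510, y2=360)
```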

I use YOLO models for detection and segmentation. The detected areas that require translation are cropped, and those chunks are sent to Gemini. Then the JSON that Gemini returns (if any) is read, and the translation is overlaid on top of the original text.
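
Since the JSON may be missing or malformed, the reading step benefits from being defensive. Here is a minimal sketch of that step using only the standard library. The response shape (a list of objects with `id` and `text` keys) and the `parse_translations` name are assumptions for illustration; the real format GPOST asks Gemini for may differ.

```python
import json
from typing import Optional


def parse_translations(raw: Optional[str]) -> dict[int, str]:
    """Return {region_id: translated_text}.

    An empty dict means the reply was missing or unusable, so the caller
    can report an error instead of drawing overlays.
    """
    if not raw:
        return {}
    # A model may wrap JSON in extra prose or a code fence, so keep only
    # the outermost [...] span before parsing.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return {}
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    result: dict[int, str] = {}
    for item in items:
        # Skip entries that do not match the assumed {"id", "text"} shape.
        if (
            isinstance(item, dict)
            and isinstance(item.get("id"), int)
            and isinstance(item.get("text"), str)
        ):
            result[item["id"]] = item["text"]
    return result


reply = 'Here you go:\n[{"id": 0, "text": "Hello"}, {"id": 1, "text": "World"}]'
print(parse_translations(reply))  # {0: 'Hello', 1: 'World'}
```

Keying translations by region id makes it straightforward to pair each translated string with the crop it came from.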
I have added a UI section that lets the user choose which classes to translate and which to skip. Those classes are populated straight from the loaded models.
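
The class filter can be pictured roughly like this. It is a sketch under assumptions: the class names (`speech_bubble`, `sfx`, `sign`), the detection dictionaries, and both helper functions are hypothetical. The `class_names` mapping stands in for the id-to-name table a loaded model exposes (Ultralytics YOLO models provide one as `model.names`).

```python
def available_classes(class_names: dict[int, str]) -> list[str]:
    """Names to show in the filter UI, taken from the model's class table."""
    return sorted(set(class_names.values()))


def filter_detections(
    detections: list[dict],
    class_names: dict[int, str],
    enabled: set[str],
) -> list[dict]:
    """Keep only detections whose class the user has enabled."""
    return [d for d in detections if class_names.get(d["class_id"]) in enabled]


# Hypothetical class table and detections, for illustration only.
class_names = {0: "speech_bubble", 1: "sfx", 2: "sign"}
detections = [
    {"class_id": 0, "box": (10, 10, 120, 60)},
    {"class_id": 1, "box": (200, 40, 260, 90)},
    {"class_id": 2, "box": (300, 15, 380, 45)},
]
enabled = {"speech_bubble", "sign"}  # the user chose to skip "sfx"

kept = filter_detections(detections, class_names, enabled)
print(available_classes(class_names))             # ['sfx', 'sign', 'speech_bubble']
print([class_names[d["class_id"]] for d in kept])  # ['speech_bubble', 'sign']
```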
If any of the TTS options is selected, the translated text is sent for voiceover. Once the audio is received, it is played and then saved to the TTS folder for future listening if needed (but I think I forgot to add saving for 11Labs TTS).
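
One simple way to handle the saving step is to derive the file name from the voice and the text, so the same line is stored only once. This is a sketch and not how GPOST names its files: `tts_path`, `save_tts`, the `voice_a` label, and the `mp3` extension are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def tts_path(text: str, voice: str, folder: Path, ext: str = "mp3") -> Path:
    """Build a stable file name from the voice and the spoken text."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()[:16]
    return folder / f"{voice}_{digest}.{ext}"


def save_tts(audio: bytes, text: str, voice: str, folder: Path) -> Path:
    """Write received audio to the TTS folder unless that clip already exists."""
    folder.mkdir(parents=True, exist_ok=True)
    path = tts_path(text, voice, folder)
    if not path.exists():
        path.write_bytes(audio)
    return path


# Demo in a temporary directory; the bytes are placeholders, not real audio.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "TTS"
    first = save_tts(b"fake-audio", "Hello", "voice_a", folder)
    again = save_tts(b"other-bytes", "Hello", "voice_a", folder)
    print(first == again, first.read_bytes())
```

The second call returns the existing file without overwriting it, so a repeated line can be replayed from disk instead of being requested again.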