GPOST

General Purpose On-Screen Translator

GPOST automatically detects text on screen, translates it, and overlays the translation right on top of the original text in a matter of seconds. It can also read the translation aloud with TTS.
ATTENTION: Requires a Gemini API key. (Text handling is currently very basic and often has a hard time conforming to polygon areas.)

Installation

Navigate to the folder you want to install GPOST into
Open CMD or PowerShell
Run git clone https://github.com/Anzhc/GPOST
Run setup.bat inside the cloned GPOST folder
Launch GPOST with run_gpost.bat
It will ask you for a Gemini API key. Currently I support only Gemini, so you'll have to get one. You can skip this and launch the program to see how it works, but you will not receive translations.

That should take care of everything.
GPOST automatically checks for new base YOLO models from my Hugging Face repo.

How to use

I recommend binding three shortcuts to mouse buttons or hotkeys; this will significantly improve your experience.
You need just three buttons: Select Sub-Area, Run Clean - Inference - Translate, and Clear Overlays.
Running inference queues YOLO for detection. It will try to detect the text classes it was trained on in the selected area.
Translate sends the detected areas to Gemini. Once a response arrives, it is overlaid on top of the original text. If no response arrives, or there is an error, you will see it in the Translation Output window.
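
Because detection runs on a selected sub-area while overlays are drawn over the original text, detected polygons have to be mapped back to screen coordinates at some point. The sketch below shows one way this could be done. The function names and the (x, y) tuple layout are assumptions made for illustration, not GPOST's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_screen_coords(polygon: List[Point], sub_area_origin: Point) -> List[Point]:
    """Shift a polygon detected inside a cropped sub-area into full-screen coordinates.

    sub_area_origin is the top-left corner of the selected sub-area on screen
    (hypothetical convention, chosen for this example).
    """
    ox, oy = sub_area_origin
    return [(x + ox, y + oy) for x, y in polygon]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the axis-aligned box around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# A triangle detected inside a sub-area whose top-left corner is at (100, 200).
detected = [(10, 5), (30, 5), (30, 20)]
on_screen = to_screen_coords(detected, (100, 200))
overlay_box = bounding_box(on_screen)
```

The bounding box is the simplest region to place an overlay in; fitting text to the polygon itself is harder, which matches the limitation noted at the top of this README.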

There are various other functions that let you tweak the program's performance, but those three buttons are all you need to start.

How it works

I use YOLO models for detection and segmentation. The detected areas that require translation are cropped, and those chunks are sent to Gemini. GPOST then reads the JSON that Gemini returns (if any) and overlays the translation on top of the original text.
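
Reading the returned JSON defensively matters because a reply can be missing, malformed, or wrapped in a Markdown code fence. The following is a minimal sketch of such a parser. The reply schema (a list of objects with "id" and "translation" keys) and the function name are assumptions for illustration; the format GPOST actually requests from Gemini may differ.

```python
import json
import re
from typing import Dict, List

# Built programmatically so this example does not contain a literal fence.
FENCE = "`" * 3


def parse_translation_reply(reply_text: str) -> List[Dict[str, str]]:
    """Extract translation items from a model reply, or return [] if none are usable.

    Assumed (hypothetical) schema: [{"id": "...", "translation": "..."}, ...]
    """
    # Models sometimes wrap JSON in a Markdown code fence; unwrap it if present.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply_text, re.DOTALL)
    payload = match.group(1) if match else reply_text
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only well-formed items so one bad entry does not discard the rest.
    return [
        item for item in data
        if isinstance(item, dict) and "id" in item and "translation" in item
    ]


fenced_reply = FENCE + 'json\n[{"id": "0", "translation": "Hello"}]\n' + FENCE
items = parse_translation_reply(fenced_reply)
```

Returning an empty list instead of raising lets the caller decide how to report the failure, for example by showing a message in the Translation Output window.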
I have added a UI section that lets the user choose which classes to translate and which to skip. Those classes are populated straight from the loaded models.
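
As a rough illustration of class filtering: Ultralytics YOLO models expose their labels as an id-to-name mapping (model.names), so a class list can be built from the loaded models and then used to drop unwanted detections. Whether GPOST does it exactly this way is an assumption; the detection dictionaries and the class names below are made up for the example.

```python
from typing import Dict, Iterable, List, Set


def class_names_from_models(names_per_model: Iterable[Dict[int, str]]) -> List[str]:
    """Collect unique class names from several id-to-name mappings, keeping first-seen order."""
    seen: List[str] = []
    for names in names_per_model:
        for name in names.values():
            if name not in seen:
                seen.append(name)
    return seen


def filter_detections(detections: List[dict], enabled: Set[str]) -> List[dict]:
    """Keep only detections whose class the user has enabled."""
    return [d for d in detections if d["class_name"] in enabled]


# Hypothetical mappings standing in for model.names of two loaded models.
available = class_names_from_models([
    {0: "speech_bubble", 1: "sign"},
    {0: "sign", 1: "caption"},
])

# Hypothetical detections; only "sign" is enabled by the user.
detections = [
    {"class_name": "sign", "conf": 0.9},
    {"class_name": "caption", "conf": 0.8},
]
to_translate = filter_detections(detections, {"sign"})
```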
If any of the TTS options is selected, the translated text is sent for voiceover. Once the audio is received, it is played and then saved to the TTS folder for future listening, if needed (but I think I forgot to add saving for the 11Labs TTS).
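
Saving received audio for later listening could look roughly like the sketch below. The function name, the file naming scheme, and the mp3 extension are assumptions for illustration only; they do not describe how GPOST names or stores its files.

```python
import time
from pathlib import Path


def save_tts_audio(audio_bytes: bytes, folder: Path, extension: str = "mp3") -> Path:
    """Write audio bytes to a timestamped file inside folder and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = folder / f"tts_{stamp}.{extension}"
    # Several clips can arrive within the same second; add a counter to avoid overwriting.
    counter = 1
    while path.exists():
        path = folder / f"tts_{stamp}_{counter}.{extension}"
        counter += 1
    path.write_bytes(audio_bytes)
    return path
```

Calling save_tts_audio(audio, Path("TTS")) after playback would keep every clip; routing every TTS backend through one such helper is one way to avoid a backend being left without saving.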
