Labels
question: Further information is requested
Description
Question
I am trying to convert a test document to Markdown. The document contains both English and Arabic text, each present as live text and as clean text rendered in an image. Text extraction works in the standalone RapidOCR library with python-bidi installed, using these params:
"Rec.engine_type": EngineType.ONNXRUNTIME,
"Rec.lang_type": LangRec.ARABIC, # ARABIC uses python-bidi
"Rec.model_type": ModelType.MOBILE,
"Rec.ocr_version": OCRVersion.PPOCRV5,
From Docling, I tried passing the .onnx path in RapidOcrOptions and patching the model (PatchedRapidOcrModel(RapidOcrModel)), but without success.
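For context, here is a minimal sketch of how custom RapidOCR model paths can be handed to Docling through RapidOcrOptions. It is not the exact code from the attempt above and leaves out the patched model. The helper names (`missing_model_files`, `convert_pdf_with_custom_rapidocr`) and the dictionary layout are hypothetical, and the sketch assumes that the `det_model_path`, `rec_model_path`, `cls_model_path` and `rec_keys_path` fields are available in the installed Docling version. Whether this path applies python-bidi reordering to Arabic output is exactly the open question.

```python
from pathlib import Path


def missing_model_files(paths: dict[str, str]) -> list[str]:
    """Return the option names whose model file does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).is_file()]


def convert_pdf_with_custom_rapidocr(pdf_path: str, model_paths: dict[str, str]) -> str:
    """Convert a PDF to Markdown using RapidOCR with user-supplied models.

    `model_paths` maps RapidOcrOptions field names (for example
    "rec_model_path") to local files. This layout is an assumption made
    for this sketch, not something Docling prescribes.
    """
    # Imported lazily so the path check above can be used without Docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Fail early with a clear message if any supplied path is wrong.
    missing = missing_model_files(model_paths)
    if missing:
        raise FileNotFoundError(f"Model files not found for: {missing}")

    ocr_options = RapidOcrOptions(
        det_model_path=model_paths.get("det_model_path"),
        rec_model_path=model_paths.get("rec_model_path"),  # e.g. an Arabic recognition model
        cls_model_path=model_paths.get("cls_model_path"),
        rec_keys_path=model_paths.get("rec_keys_path"),  # character dictionary, if the model needs one
    )
    pipeline_options = PdfPipelineOptions(ocr_options=ocr_options)
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
    result = converter.convert(source=pdf_path)
    return result.document.export_to_markdown()
```

Calling `convert_pdf_with_custom_rapidocr("docling_test_ak.pdf", {...})` with local paths to the detection, classification and Arabic recognition models would return the Markdown string. The early path check is there only so that a wrong path is not mistaken for an unsupported language.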
Does Docling currently support languages other than English and Chinese?
If yes, how do I achieve that?
Here's a sample file for testing:
docling_test_ak.pdf