How to Extract Scanned Arabic Text? #3021

### Question
I am trying to convert a test document to Markdown. The document contains both English and Arabic, each as live text and as clean text rendered as an image. Text extraction works in the standalone RapidOCR library using these params:
```
from rapidocr import EngineType, LangRec, ModelType, OCRVersion

params = {
    "Rec.engine_type": EngineType.ONNXRUNTIME,
    "Rec.lang_type": LangRec.ARABIC,  # ARABIC uses python-bidi
    "Rec.model_type": ModelType.MOBILE,
    "Rec.ocr_version": OCRVersion.PPOCRV5,
}
```
with `python-bidi` installed. 
I tried the same from Docling (I passed the `.onnx` path in `RapidOcrOptions` and patched the model with a subclass, `PatchedRapidOcrModel(RapidOcrModel)`), but had no success.
Does Docling currently support languages other than English and Chinese? 
If yes, how do I achieve that?
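
For reference, the Docling side of the attempt looks roughly like the sketch below. It is a minimal sketch and not a confirmed fix: the helper names `rapidocr_option_kwargs` and `build_converter` and the example file paths are hypothetical, and whether passing a recognition model (and possibly a character dictionary via `rec_keys_path`) is enough for Arabic is exactly what this question is about. The `PatchedRapidOcrModel` subclass is not shown.

```python
def rapidocr_option_kwargs(rec_model_path, rec_keys_path=None, force_full_page_ocr=True):
    """Collect keyword arguments intended for docling's RapidOcrOptions.

    Hypothetical helper: it only assembles a plain dict, so it can be
    inspected without docling being installed.
    """
    kwargs = {
        "rec_model_path": str(rec_model_path),
        # Assumption: OCR the whole page so scanned text is not skipped.
        "force_full_page_ocr": force_full_page_ocr,
    }
    if rec_keys_path is not None:
        # Assumption: the Arabic model may need its own character dictionary.
        kwargs["rec_keys_path"] = str(rec_keys_path)
    return kwargs


def build_converter(**ocr_kwargs):
    """Build a DocumentConverter whose PDF pipeline uses RapidOCR."""
    # Imported lazily so the helper above stays usable without docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    ocr_options = RapidOcrOptions(**ocr_kwargs)
    pipeline_options = PdfPipelineOptions(do_ocr=True, ocr_options=ocr_options)
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        }
    )


if __name__ == "__main__":
    # Hypothetical path; replace with the actual Arabic recognition model.
    kwargs = rapidocr_option_kwargs("models/arabic_rec.onnx")
    converter = build_converter(**kwargs)
    result = converter.convert("docling_test_ak.pdf")
    print(result.document.export_to_markdown())
```
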
Here's a sample file for testing: 
[docling_test_ak.pdf](https://github.com/user-attachments/files/25468728/docling_test_ak.pdf)
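
When testing against the sample file, a quick way to tell whether any Arabic reached the exported Markdown is to measure how many letters fall in the Arabic Unicode blocks. This is only a rough diagnostic sketch using the standard library; the function name `arabic_ratio` is hypothetical, and it says nothing about whether the recognized text is correct or in the right reading order.

```python
# Unicode blocks that cover Arabic script and its presentation forms.
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)


def arabic_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are Arabic letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1
        for ch in letters
        if any(lo <= ord(ch) <= hi for lo, hi in ARABIC_RANGES)
    )
    return arabic / len(letters)
```

A ratio of `0.0` on a page known to contain Arabic suggests the OCR step recognized only Latin text (or nothing), which helps separate a model-selection problem from a text-ordering problem.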