Enable XLSX and PPTX support for OCR (when either Tesseract or Textract is chosen as engine) #409

Currently, only DOC/DOCX are handled for text extraction when OCR is requested, but extending this functionality to both XLS/XLSX and PPT/PPTX should be reasonable.
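As a rough illustration of why this should be feasible: XLSX and PPTX files are zip archives of XML parts, so their text can be read with the standard library alone. The sketch below is not this project's implementation. The names `extract_ooxml_text` and `_text_runs` are hypothetical, and it only covers slide text in PPTX and the shared string table in XLSX (numeric cells and inline strings are ignored).

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET


def _text_runs(xml_bytes):
    """Collect the text of every element whose local tag name is 't'.

    Hypothetical helper: matches <a:t> runs in PPTX slides and <t> entries
    in the XLSX shared string table, ignoring the XML namespace.
    """
    root = ET.fromstring(xml_bytes)
    return [
        el.text
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "t" and el.text
    ]


def extract_ooxml_text(data: bytes, kind: str) -> str:
    """Return plain text from a PPTX or XLSX file supplied as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        if kind == "pptx":
            # Slides live at ppt/slides/slideN.xml; sort them numerically.
            pattern = re.compile(r"ppt/slides/slide(\d+)\.xml")
            matches = [m for m in map(pattern.fullmatch, names) if m]
            matches.sort(key=lambda m: int(m.group(1)))
            parts = [m.group(0) for m in matches]
        elif kind == "xlsx":
            # Most cell strings are stored once in the shared string table.
            parts = [n for n in names if n == "xl/sharedStrings.xml"]
        else:
            raise ValueError(f"unsupported kind: {kind!r}")

        lines = []
        for part in parts:
            lines.extend(_text_runs(zf.read(part)))
    return "\n".join(lines)
```

A caller could pick `kind` from the file extension and pass the result on in the same way the existing DOCX text is handled.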

NOTE: It is not clear whether the legacy DOC/XLS/PPT formats are as simple to handle, or whether DOC extraction is actually working, since these older formats may not share the open format that makes extraction possible.
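One way to check which case a given upload falls into is to look at the container rather than the extension. Legacy DOC/XLS/PPT files are OLE2 compound binary files that start with a fixed 8-byte signature, while DOCX/XLSX/PPTX are zip archives that contain a `[Content_Types].xml` part. The following is a minimal sketch under those assumptions; the function name `sniff_office_container` and its return values are hypothetical, not part of this project.

```python
import io
import zipfile

# Signature at the start of every OLE2 compound file (legacy DOC/XLS/PPT).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_office_container(data: bytes) -> str:
    """Classify file bytes as 'ole2', 'ooxml', or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            # A plain zip is not enough; OOXML packages carry this part.
            if "[Content_Types].xml" in zf.namelist():
                return "ooxml"
    return "unknown"
```

Files reported as `ole2` would need a separate extraction path, which is the open question raised in the note above.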

Ideally, any OCR request should extract text from these documents, regardless of whether the engine is Tesseract or Textract.
