Currently, only DOC/DOCX files are handled for text extraction when OCR is requested, but extending this to XLS/XLSX and PPT/PPTX should be reasonable.
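For the ZIP/XML-based formats, a rough sketch of what engine-independent extraction could look like using only the Python standard library. The helper name `extract_ooxml_text` is hypothetical and not the project's existing code. It reads only the main text parts (document body, shared-string table, slides), so headers, footers, speaker notes and numeric cell values are skipped:

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes for the OOXML parts read below.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _blocks(xml_bytes, block_tag, text_tag):
    """Join the text runs inside each block element (paragraph or string item)."""
    root = ET.fromstring(xml_bytes)
    out = []
    for block in root.iter(block_tag):
        text = "".join(t.text or "" for t in block.iter(text_tag))
        if text:
            out.append(text)
    return out


def _slide_number(name):
    return int(re.search(r"slide(\d+)\.xml$", name).group(1))


def extract_ooxml_text(source):
    """Hypothetical helper: plain text from a DOCX, XLSX or PPTX path or file object."""
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "word/document.xml" in names:
            # DOCX: body paragraphs only (headers, footers, comments are skipped).
            lines = _blocks(zf.read("word/document.xml"), W + "p", W + "t")
        elif "xl/workbook.xml" in names:
            # XLSX: shared-string table only; numbers and inline strings are skipped,
            # and the table may not follow cell order.
            if "xl/sharedStrings.xml" in names:
                lines = _blocks(zf.read("xl/sharedStrings.xml"), S + "si", S + "t")
            else:
                lines = []
        elif "ppt/presentation.xml" in names:
            # PPTX: slide text in numeric slide order (speaker notes are skipped).
            slides = sorted(
                (n for n in names if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)),
                key=_slide_number,
            )
            lines = [line for s in slides for line in _blocks(zf.read(s), A + "p", A + "t")]
        else:
            raise ValueError("not a DOCX, XLSX or PPTX package")
    return "\n".join(lines)
```

Because `zipfile.ZipFile` accepts either a path or a binary file object, the same helper could be fed a file already held in memory.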
NOTE: it's not clear whether the legacy DOC/XLS/PPT formats are as simple to handle, or even whether DOC extraction works today, since unlike the ZIP/XML-based DOCX/XLSX/PPTX formats they are binary and may not be extractable the same way.
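The two families can be told apart by their leading bytes: legacy DOC/XLS/PPT files are OLE2 (Compound File Binary) containers, while DOCX/XLSX/PPTX are ZIP packages. A minimal sketch, with hypothetical helper names, that could route files before choosing an extractor; it does not attempt to parse the legacy binary formats, which would need a separate parser or a conversion step:

```python
# Signature of the OLE2 / Compound File Binary container used by legacy DOC, XLS and PPT.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Local-file-header signature that starts a ZIP archive (DOCX, XLSX and PPTX are ZIP packages).
ZIP_SIGNATURE = b"PK\x03\x04"


def classify_office_header(header: bytes) -> str:
    """Classify an Office file by its first bytes (hypothetical helper).

    Returns "ole2" for a legacy binary container (any OLE2 file matches),
    "zip" for a ZIP package (likely DOCX/XLSX/PPTX, but any ZIP matches),
    otherwise "unknown".
    """
    if header.startswith(OLE2_SIGNATURE):
        return "ole2"
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def classify_office_file(path: str) -> str:
    # Eight bytes covers the longer of the two signatures.
    with open(path, "rb") as f:
        return classify_office_header(f.read(8))
```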
Ideally, any OCR request, regardless of whether the engine is Tesseract or Textract, should extract text from these documents.