Environment:
llmware v0.3.8
macOS 15
active DB: SQLite
vector DB: ChromaDB
For illustration, the issue is reproduced with the example file slicing_and_dicing_office_docs.py and the Microsoft Investor Relations sample data; however, it was discovered initially on our private data, which is very OCR-heavy.

Issue:
Run lib.add_files() to ingest documents from which the C parser extracts images, pending downstream OCR.

Next, perform the OCR with llmware's "convenience" method on the images extracted to the image directory:

lib.run_ocr_on_images(add_to_library=True, other_params)
The result is a new collection written to the DB, one entry per image, each referencing the originating doc by 'doc_ID' (and so forth), with block_IDs starting at 100,000 and incrementing, and with the text chunks extracted by Tesseract OCR populating only 'text_search'.
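To make the observed record shape concrete, here is a minimal sketch in plain Python (not llmware internals; the field names 'doc_ID', 'block_ID', 'text_search', and 'text_block' are copied from the DB entries as observed, and the values are invented for illustration) of what one OCR-derived block looks like:

```python
# Illustrative sketch of an OCR-derived block entry as observed in the DB.
# Field names are copied from the collection entries; values are made up.

OCR_BLOCK_ID_START = 100_000  # OCR blocks start at 100,000 and increment

def make_ocr_block(doc_id, ocr_index, ocr_text):
    """Build a dict mirroring one OCR-derived entry in the new collection."""
    return {
        "doc_ID": doc_id,                          # references originating doc
        "block_ID": OCR_BLOCK_ID_START + ocr_index,
        "text_search": ocr_text,                   # Tesseract output lands here
        "text_block": "",                          # ...while this stays empty
    }

block = make_ocr_block(doc_id=7, ocr_index=0, ocr_text="Revenue grew 12% YoY")
print(block["block_ID"], repr(block["text_block"]))  # → 100000 ''
```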
Perform a new embedding with llmware's

lib.install_new_embedding(params)

Chunks/sentences for embedding are retrieved from 'text_search' and collated into batches.
so far so good
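The batch-collation step above can be sketched as follows; this is a plain-Python simulation of the observed behavior, not llmware's implementation. Because batches are built from 'text_search', the resulting embeddings are genuine even though 'text_block' is empty:

```python
# Simulation of batch collation for embedding (not llmware's actual code):
# chunks are pulled from 'text_search', so the OCR text does get embedded.

def collate_batches(blocks, batch_size):
    """Collect non-empty 'text_search' chunks into fixed-size batches."""
    chunks = [b["text_search"] for b in blocks if b["text_search"]]
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

blocks = [
    {"block_ID": 100_000, "text_search": "Revenue grew 12% YoY", "text_block": ""},
    {"block_ID": 100_001, "text_search": "Operating margin was 43%", "text_block": ""},
    {"block_ID": 100_002, "text_search": "Cloud revenue led growth", "text_block": ""},
]

batches = collate_batches(blocks, batch_size=2)
# → [['Revenue grew 12% YoY', 'Operating margin was 43%'],
#    ['Cloud revenue led growth']]
```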
At query time,

Query.query(query="a query highly pertaining to the corpus", query_type="semantic", other_params)

returns results where 'text' is empty! A little digging reveals that while the query text is indeed being compared to embedded chunks that are bona fide, the 'text' in the returned results is retrieved from 'text_block', which remains empty after OCR.
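The mismatch can be reproduced in miniature (again plain Python that only illustrates the observed behavior; the matching logic here is a keyword stand-in for vector search, not llmware internals): the semantic match is found against content embedded from 'text_search', but each result's 'text' is filled from 'text_block':

```python
# Miniature reproduction of the observed mismatch (not llmware internals):
# matching happens on content from 'text_search', but the result's 'text'
# is filled from 'text_block', which OCR left empty.

blocks = [
    {"block_ID": 100_000, "text_search": "Revenue grew 12% YoY", "text_block": ""},
    {"block_ID": 100_001, "text_search": "Cloud revenue led growth", "text_block": ""},
]

def semantic_query_sim(query, blocks):
    """Stand-in for vector search: match on the embedded 'text_search' content,
    then assemble result records the way the returned results look."""
    hits = [b for b in blocks
            if any(w in b["text_search"].lower() for w in query.lower().split())]
    # the observed bug: 'text' is sourced from 'text_block', not 'text_search'
    return [{"block_ID": b["block_ID"], "text": b["text_block"]} for b in hits]

results = semantic_query_sim("revenue", blocks)
print(results)  # every hit comes back with an empty 'text'
```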
the following images show this clearly...


