Labels: bug (Something isn't working)
Description
Bug
When processing a .docx file with the simple pipeline, the doctags output correctly preserves the hierarchical structure of headings (section_header_level_1 through section_header_level_4).
However, when the same document is converted to .pdf and processed using pdfpipeline(), the doctags output only contains section_header_level_1. All nested heading levels (2–4) are flattened or lost.
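To make the comparison between the two outputs concrete, here is a minimal sketch that counts heading tags per level in a DocTags string. It assumes headings are serialized as `<section_header_level_N>…</section_header_level_N>`, matching the tag names above; `count_header_levels` is a hypothetical helper, not part of Docling. The input strings would come from each conversion result, e.g. via `DoclingDocument.export_to_doctags()`, assuming your docling-core version provides that method.

```python
import re
from collections import Counter

# Assumption: each heading opens with a tag like <section_header_level_2>.
# The "<" must directly precede "section", so closing tags (</...>) are not counted.
_HEADER_TAG = re.compile(r"<section_header_level_(\d+)>")


def count_header_levels(doctags: str) -> dict[int, int]:
    """Hypothetical helper: count opening section-header tags per level."""
    counts = Counter(int(level) for level in _HEADER_TAG.findall(doctags))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    docx_like = (
        "<section_header_level_1>Intro</section_header_level_1>"
        "<section_header_level_2>Scope</section_header_level_2>"
    )
    pdf_like = (
        "<section_header_level_1>Intro</section_header_level_1>"
        "<section_header_level_1>Scope</section_header_level_1>"
    )
    print(count_header_levels(docx_like))  # {1: 1, 2: 1}
    print(count_header_levels(pdf_like))   # {1: 2}
```

With the behavior described above, the .docx output would show counts spread across levels 1 to 4, while the PDF output would show only level 1.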
Does this mean that hybrid chunking does not work correctly for Docling documents produced from PDFs, i.e. that the chunker only has section_header_level_1 headings to work with?
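A simplified illustration of why the flattening could matter for chunking: if a chunker derives each chunk's heading context from the heading hierarchy, then flattening every heading to level 1 discards the ancestor titles. This is an assumption about the general mechanism, not Docling's `HybridChunker` implementation; `heading_paths` is a hypothetical helper that builds the ancestor chain for each heading from `(level, title)` pairs.

```python
def heading_paths(headings: list[tuple[int, str]]) -> list[list[str]]:
    """Hypothetical helper: for each heading, return its chain of ancestor titles."""
    stack: list[tuple[int, str]] = []
    paths: list[list[str]] = []
    for level, title in headings:
        # Headings at the same or a deeper level are siblings/descendants, not ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        paths.append([t for _, t in stack])
    return paths


if __name__ == "__main__":
    nested = [(1, "Intro"), (2, "Scope"), (3, "Limits")]  # hierarchy as in the .docx output
    flat = [(1, "Intro"), (1, "Scope"), (1, "Limits")]    # everything at level 1, as in the PDF output
    print(heading_paths(nested))  # [['Intro'], ['Intro', 'Scope'], ['Intro', 'Scope', 'Limits']]
    print(heading_paths(flat))    # [['Intro'], ['Scope'], ['Limits']]
```

With nested levels, a chunk under "Limits" would carry the context "Intro > Scope > Limits"; with flattened levels it would carry only "Limits".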
...
Docling version
Docling version: 2.32.0
...
Python version
Python 3.10.12
...