I am testing docling with PDF documents that contain footnotes. The footnotes are identified, but they are not linked to the body text that references them (see the example below). Note that the PDF files contain real text, not scanned images, and the footnote references in the body are in superscript.
It would be very useful if:
- the footnote text were linked (parent-child) to its "parent" text;
- there were a CLI option `--merge_footnotes` to automatically embed the footnotes into the main text as separate sentences (e.g. in brackets, in place of the original footnote reference), for RAG applications.
With kind regards
Nikos
{
"self_ref": "#/texts/1",
"parent": {
"$ref": "#/body"
},
"children": [],
"content_layer": "body",
"label": "text",
"prov": [
{
"page_no": 1,
"bbox": {
"l": 72.025,
"t": 744.1780000000001,
"r": 508.35599999999994,
"b": 703.653,
"coord_origin": "BOTTOMLEFT"
},
"charspan": [
0,
223
]
}
],
"orig": "5G is reaching a ... areas outside of premises. 1",
"text": "5G is reaching a ... areas outside of premises. 1"
},
{
"self_ref": "#/texts/5",
"parent": {
"$ref": "#/body"
},
"children": [],
"content_layer": "body",
"label": "footnote",
"prov": [
{
"page_no": 1,
"bbox": {
"l": 72.047,
"t": 80.79499999999996,
"r": 496.556,
"b": 69.98800000000006,
"coord_origin": "BOTTOMLEFT"
},
"charspan": [
0,
105
]
}
],
"orig": "1 Please refer to ... are defined",
"text": "1 Please refer to ... are defined"
},
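To illustrate the merge behaviour I have in mind, here is a minimal Python sketch that post-processes the `texts` items of the exported JSON shown above. It does not call docling itself, and the names `merge_footnotes` and `page_of` are hypothetical. It also assumes that the reference marker is a plain number at the very end of the body text and at the very start of the footnote on the same page, as in my example; markers in the middle of a paragraph are not handled.

```python
import re

# Assumption: a footnote starts with its numeric marker, e.g. "1 Please refer to ..."
FOOTNOTE_RE = re.compile(r"^(\d+)\s+(.*)$", re.DOTALL)
# Assumption: the body text ends with the marker, e.g. "... of premises. 1"
TRAILING_REF_RE = re.compile(r"^(.*\S)\s+(\d+)$", re.DOTALL)


def page_of(item):
    """Page number of the first provenance entry, or None if absent."""
    prov = item.get("prov") or [{}]
    return prov[0].get("page_no")


def merge_footnotes(texts):
    """Return the item texts with same-page footnotes inlined in brackets."""
    # First pass: index footnote bodies by (page, marker).
    footnotes = {}
    for item in texts:
        if item.get("label") == "footnote":
            m = FOOTNOTE_RE.match(item.get("text", ""))
            if m:
                footnotes[(page_of(item), m.group(1))] = m.group(2)

    # Second pass: replace trailing markers and remember which footnotes were used.
    out, used = [], set()
    for item in texts:
        text = item.get("text", "")
        key = None
        if item.get("label") == "footnote":
            m = FOOTNOTE_RE.match(text)
            key = (page_of(item), m.group(1)) if m else None
        else:
            m = TRAILING_REF_RE.match(text)
            ref = (page_of(item), m.group(2)) if m else None
            if ref in footnotes:
                text = f"{m.group(1)} [{footnotes[ref]}]"
                used.add(ref)
        out.append((key, text))

    # Drop only the footnotes that were merged; keep unmatched ones as they are.
    return [text for key, text in out if key is None or key not in used]


sample = [
    {"label": "text", "prov": [{"page_no": 1}],
     "text": "5G is reaching a ... areas outside of premises. 1"},
    {"label": "footnote", "prov": [{"page_no": 1}],
     "text": "1 Please refer to ... are defined"},
]
print(merge_footnotes(sample))
```

Matching on text alone is fragile (a paragraph that happens to end in a number could be mistaken for a reference), which is why a proper parent-child link produced by the parser would be preferable.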