feat: added the XML export #16

Merged
PeterStaar-IBM merged 10 commits into main from dev/add-export-to-xml-and-html
Sep 9, 2024

@PeterStaar-IBM PeterStaar-IBM commented Sep 9, 2024

  • Adds the option to export documents as document tokens
  • Lets you choose which object types are included in the exports (tokens and Markdown)
  • Introduces a specialized type for Figure objects
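
To illustrate the second bullet, here is a minimal sketch of label-based selection during export. The `Item` class, the `export_tokens` function, and the tag format are hypothetical and are not the API added in this pull request; they only show the idea of passing in the set of object types to include.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical document item; `label` is its object type."""
    label: str
    text: str


def export_tokens(items, labels):
    """Wrap each selected item in a tag named after its label.

    Only items whose label is in `labels` are exported (assumed behavior).
    """
    return "".join(
        f"<{item.label}>{item.text}</{item.label}>"
        for item in items
        if item.label in labels
    )


doc = [
    Item("title", "Report"),
    Item("paragraph", "Hello."),
    Item("figure", "Fig. 1"),
]

# Export only titles and paragraphs; the figure is skipped.
print(export_tokens(doc, {"title", "paragraph"}))
```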

Signed-off-by: Peter Staar <[email protected]>
@PeterStaar-IBM PeterStaar-IBM marked this pull request as ready for review September 9, 2024 12:15
"paragraph",
"caption",
}:
if isinstance(item, BaseText) and item_type in {}:

I guess this was intended this way:

Suggested change
- if isinstance(item, BaseText) and item_type in {}:
+ if isinstance(item, BaseText) and item_type in main_text_labels:
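
The reason the original condition can never be true: in Python, `{}` is an empty dict, so `item_type in {}` is always `False` and the branch is skipped for every item. A minimal sketch of the difference follows, assuming a stand-in `BaseText` class and a `main_text_labels` set that contains only the two labels visible in the excerpt (the full set in the pull request is not shown here). The helper names `keep_buggy` and `keep_fixed` are hypothetical.

```python
class BaseText:
    """Stand-in for the project's BaseText; only needed for the isinstance check."""

    def __init__(self, text):
        self.text = text


# Assumption: only the labels visible in the excerpt; the real set may be larger.
main_text_labels = {"paragraph", "caption"}


def keep_buggy(item, item_type):
    # `{}` is an empty dict, so this membership test is always False.
    return isinstance(item, BaseText) and item_type in {}


def keep_fixed(item, item_type):
    # Membership is checked against the intended set of labels.
    return isinstance(item, BaseText) and item_type in main_text_labels


item = BaseText("Some paragraph text")
print(keep_buggy(item, "paragraph"))  # False
print(keep_fixed(item, "paragraph"))  # True
print(keep_fixed(item, "table"))      # False
```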

@PeterStaar-IBM PeterStaar-IBM changed the title from "added the XML export" to "feat: added the XML export" Sep 9, 2024
@PeterStaar-IBM PeterStaar-IBM merged commit acdf816 into main Sep 9, 2024
@PeterStaar-IBM PeterStaar-IBM deleted the dev/add-export-to-xml-and-html branch September 9, 2024 15:42
muhark added a commit to muhark/docling-core that referenced this pull request Mar 19, 2025
* added the XML export

Signed-off-by: Peter Staar <[email protected]>

* reformatted all

Signed-off-by: Peter Staar <[email protected]>

* fixed tests

Signed-off-by: Peter Staar <[email protected]>

* added the DocumentTokens class

Signed-off-by: Peter Staar <[email protected]>

* updating the to-xml method

Signed-off-by: Peter Staar <[email protected]>

* updating the to-xml method

Signed-off-by: Peter Staar <[email protected]>

* fixed the to-md method

Signed-off-by: Peter Staar <[email protected]>

* added the strict-text in the to-md method

Signed-off-by: Peter Staar <[email protected]>

* added page-tokens

Signed-off-by: Peter Staar <[email protected]>

* updated the location/page tokens

Signed-off-by: Peter Staar <[email protected]>

---------

Signed-off-by: Peter Staar <[email protected]>