Skip to content

feat: Support rich table cell content#285

Closed
cau-git wants to merge 5 commits intomainfrom
cau/rich-table-cells
Closed

feat: Support rich table cell content#285
cau-git wants to merge 5 commits intomainfrom
cau/rich-table-cells

Conversation

@cau-git
Copy link
Copy Markdown
Member

@cau-git cau-git commented May 6, 2025

Feature

This PR introduces changes to allow table cells to contain any rich content instead of a basic text string. To enable this, the TableCell model now inherits from NodeItem. In order to remain backward compatible, the TableCell.text field is still present (default = "") and the self_ref uses a relative JSON pointer ("0") to refer itself as long as the TableCell is constructed in isolation before belonging to a table in a doc (as in previous usage conventions). A new method TableCell.has_rich_content() reports if the table cell contains child nodes.

New methods in TableItem allow to modify a table's cells after adding the table to a DoclingDocument, which ensures proper reference handling of table cells:

  • TableItem.update_cell is used to insert or overwrite a cell in a table, which allows to use DoclingDocument.add_text and other APIs with the returned cell as parent argument.
  • TableItem.delete_cells allows to delete cells in a given row/col index range

Table cells with valid self_ref are created by the update_cell method and will look like this example: #/tables/0/data/table_cells/2

The serializers are updated to consider table cells with rich content to ensure the text field is not considered when children are present. Children are serialized like any other subtree in a document.

@cau-git cau-git requested a review from vagenas May 6, 2025 11:06
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 6, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2025

Codecov Report

❌ Patch coverage is 64.60177% with 40 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling_core/types/doc/document.py 63.30% 40 Missing ⚠️

📢 Thoughts on this report? Let us know!

Signed-off-by: Christoph Auer <[email protected]>
vagenas and others added 3 commits May 6, 2025 14:48
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git changed the title feat: Make TableCell a NodeItem to support rich table cell content feat: Support rich table cell content May 6, 2025
@cau-git cau-git marked this pull request as ready for review May 6, 2025 19:03
Comment thread test/test_rich_tables.py

doc = _construct_doc()

html_pred = doc.export_to_html()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to other tests (see here for example) we could simplify the test results removing the styling

Suggested change
html_pred = doc.export_to_html()
html_pred = doc.export_to_html(html_head="")

"""TableCell."""

bbox: Optional[BoundingBox] = None
row_span: int = 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a variable here? I think this can be a method computing end_row_offset_idx-start_row_offset_idx?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do something about this but it is unrelated to the changes of this PR, and we cannot break backwards-compatibility.


bbox: Optional[BoundingBox] = None
row_span: int = 1
col_span: int = 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a variable here? I think this can be a method computing end_col_offset_idx-start_col_offset_idx?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do something about this but it is unrelated to the changes of this PR, and we cannot break backwards-compatibility.

row_span: int = 1
col_span: int = 1
start_row_offset_idx: int
end_row_offset_idx: int
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check somewhere that end_*>start_*?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can generally put a model validator for such things.

@PeterStaar-IBM
Copy link
Copy Markdown
Member

superseded by #368

@vagenas vagenas deleted the cau/rich-table-cells branch March 20, 2026 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants