
feat: Implementation of HTML backend with headless browser #2969

Draft
maxmnemonic wants to merge 19 commits into main from dev/html_backend_rendered

Conversation

@maxmnemonic
Member

Implementation of an HTML backend that uses a headless browser (via Playwright) to render HTML pages into images, and to add provenance information with bounding boxes to all elements in the converted Docling document.
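The rendering-plus-bounding-box idea can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the names `COLLECT_BOXES_JS`, `to_image_bbox` and `render_html` are hypothetical, Docling's own provenance types are not used, and the mapping assumes the screenshot is taken at `device_scale_factor = scale` with a top-left coordinate origin.

```python
from typing import Dict, List, Tuple

# JavaScript evaluated inside the page: collect each element's box in document
# coordinates (viewport rect plus scroll offset), measured in CSS pixels.
COLLECT_BOXES_JS = """
() => Array.from(document.querySelectorAll('body *')).map((el) => {
  const r = el.getBoundingClientRect();
  return {
    tag: el.tagName.toLowerCase(),
    x: r.left + window.scrollX,
    y: r.top + window.scrollY,
    width: r.width,
    height: r.height,
  };
})
"""


def to_image_bbox(rect: Dict[str, float], scale: float) -> Tuple[float, float, float, float]:
    """Map a CSS-pixel rect to (left, top, right, bottom) in screenshot pixels."""
    left = rect["x"] * scale
    top = rect["y"] * scale
    return (left, top, left + rect["width"] * scale, top + rect["height"] * scale)


def render_html(html: str, width: int = 1280, scale: float = 2.0) -> Tuple[bytes, List[Dict]]:
    """Render HTML in headless Chromium; return a full-page PNG and per-element boxes."""
    # Imported lazily so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": width, "height": 800}, device_scale_factor=scale
        )
        page.set_content(html)
        png = page.screenshot(full_page=True)  # image is CSS size * scale
        rects = page.evaluate(COLLECT_BOXES_JS)
        browser.close()
    # Attach image-space boxes so each element can be tied to a region of the PNG.
    boxes = [{**r, "bbox": to_image_bbox(r, scale)} for r in rects]
    return png, boxes
```

Calling `render_html("<p>Hello</p>")` requires `pip install playwright` and `playwright install chromium`. Keeping the coordinate mapping in a separate pure function makes the scale handling easy to test without launching a browser.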

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

…ight) to materialize HTML pages into images, and add provenances with bboxes to all elements in the converted docling document

Signed-off-by: Maksym Lysak <[email protected]>
maxmnemonic self-assigned this on Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

DCO Check Failed

Hi @maxmnemonic, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for Maksym Lysak <[email protected]>

I, Maksym Lysak <[email protected]>, hereby add my Signed-off-by to this commit: a59cc62c2628bdb6723c2b170ca7d4f83158b4cf"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report
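A local pre-push check for the sign-off trailer that `git commit -s` adds could look like the sketch below. It is an assumption-based illustration, not the DCO bot's actual logic: `SIGNOFF_RE` and `has_dco_signoff` are hypothetical names, and the pattern only requires a line of the form `Signed-off-by: Name <email>`. Under this sketch, a trailer written without the colon (as in one of the commit messages further down) would not count as a sign-off.

```python
import re

# One DCO trailer line: "Signed-off-by: Full Name <email>" (the colon is required).
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def has_dco_signoff(message: str) -> bool:
    """Return True if the commit message contains at least one sign-off line."""
    return SIGNOFF_RE.search(message) is not None
```

The message of the latest commit can be fed in from `git log -1 --format=%B`.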

maxmnemonic added the html label (issue related to html backend) on Feb 9, 2026
@mergify
mergify bot commented Feb 9, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
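The title rule above can be checked locally before opening a PR. The sketch below reuses the exact pattern from the rule and treats Mergify's `~=` operator as a regular-expression search; `CONVENTIONAL_TITLE` and `is_conventional` are hypothetical names.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if a PR title starts with a conventional-commit type prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None
```

With this check, `"feat: Implementation of HTML backend with headless browser"` passes, and so does a scoped breaking-change title such as `"fix(html)!: ..."`, while `"feature: ..."` fails because the type must be followed directly by a scope, `!`, or `:`.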

@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Maksym Lysak added 7 commits February 9, 2026 17:57
…er values, and restricting key-value only for the ones that satisfy scope if there are such.

Signed-off-by: Maksym Lysak <[email protected]>
…e and render scale compute, and an example on how to run html_backend with rendering

Signed-off-by: Maksym Lysak <[email protected]>
Maksym Lysak added 11 commits February 23, 2026 17:00
… Example that uses multi-processing for conversion;

Signed-off-by: Maksym Lysak <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
…when inside key-value pair

Signed-off-by: Maksym Lysak <[email protected]>
…ding order inside the field_item

Signed-off-by: Maksym Lysak <[email protected]>
…compute checkbox bboxes, improved handling of single-character inline groups

Signed-off-by Maksym Lysak <[email protected]>
… overflowing viewport, removal of empty inline groups and elements with negative bounding boxes

Signed-off-by: Maksym Lysak <[email protected]>
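The last commit mentions content overflowing the viewport and the removal of elements with negative bounding boxes. One plausible policy is sketched below, under stated assumptions: boxes are in page-image pixels with a top-left origin, boxes starting at negative coordinates are dropped, overflowing boxes are clipped to the page, and boxes left with no area are discarded. `Box` and `clean_box` are hypothetical and not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    # Hypothetical box in page pixels, top-left origin: (l, t) to (r, b).
    l: float
    t: float
    r: float
    b: float


def clean_box(box: Box, page_width: float, page_height: float) -> Optional[Box]:
    """Return the box clipped to the page, or None if it should be dropped."""
    # Assumed policy: drop boxes starting at negative coordinates (off-page content).
    if box.l < 0 or box.t < 0:
        return None
    # Clip boxes that overflow the right or bottom edge of the rendered page.
    clipped = Box(box.l, box.t, min(box.r, page_width), min(box.b, page_height))
    # Drop degenerate boxes that have no area after clipping.
    if clipped.r <= clipped.l or clipped.b <= clipped.t:
        return None
    return clipped
```

Clipping rather than dropping overflowing boxes keeps partially visible elements attached to the region of the image where they actually appear.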

Labels

html issue related to html backend


2 participants