PI: Optimize loop for layout mode text extraction #3543
Merged
stefan6419846 merged 9 commits into py-pdf:main on Dec 3, 2025
Conversation
…able in C. It doesn't perform loops and is ideal for very large strings.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #3543   +/-   ##
=======================================
  Coverage   97.15%   97.15%
=======================================
  Files          56       56
  Lines        9772     9783   +11
  Branches     1783     1784    +1
=======================================
+ Hits         9494     9505   +11
  Misses        167      167
  Partials      111      111
stefan6419846 (Collaborator) requested changes on Dec 2, 2025 and left a comment:
Thanks for the PR. Could you please add a corresponding timeout-based test as well?
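One way such a timeout-based test could look is sketched below. This is only an illustration, not the test that was added in this PR: `fake_extract` is a hypothetical stand-in for the slow extraction call, and `assert_finishes_within` is a made-up helper that checks a time budget after the call returns (it does not abort a hanging call).

```python
import time


def fake_extract(text: str) -> str:
    # Hypothetical stand-in for the extraction call under test.
    return text.upper()


def assert_finishes_within(budget_seconds, func, *args, **kwargs):
    """Run func and fail if it takes longer than budget_seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    assert elapsed < budget_seconds, (
        f"took {elapsed:.2f}s, budget was {budget_seconds}s"
    )
    return result


def test_extraction_is_fast():
    # A large input makes a performance regression visible as a failure.
    result = assert_finishes_within(5.0, fake_extract, "x" * 1_000_000)
    assert len(result) == 1_000_000
```

In a pytest suite with the pytest-timeout plugin installed, the same idea is usually expressed with the `@pytest.mark.timeout(...)` marker, which also interrupts a test that never returns.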
Added 2 commits on December 2, 2025 at 16:43
Contributor (Author) commented:
The test now downloads the file as requested. With the new optimizations, the method that previously took 6-10 minutes locally now averages 10 seconds!
stefan6419846 added a commit that referenced this pull request on Dec 7, 2025
## What's new

### Performance Improvements (PI)
- Optimize loop for layout mode text extraction (#3543) by @FelipeErmeson

### Bug Fixes (BUG)
- Do not fail on choice field without /Opt key (#3540) by @jhuber-de

### Documentation (DOC)
- Document possible issues with merge_page and clipping (#3546) by @stefan6419846
- Add some notes about library security (#3545) by @stefan6419846

### Maintenance (MAINT)
- Use CORE_FONT_METRICS for widths where possible (#3526) by @PJBrs

[Full Changelog](6.4.0...6.4.1)
This PR aims to remove the repeated calls to the `ord` function and to eliminate the per-character loop over the string by using an optimized lookup table implemented in C. This should improve processing time.
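The general technique can be sketched as follows. This is not the code from the PR: the function names and the choice of characters to replace (code points 14 to 31) are assumptions made for illustration, and it assumes the C-implemented table lookup is Python's built-in `str.translate`.

```python
def strip_control_loop(text: str) -> str:
    """Baseline: Python-level loop with one ord() call per character."""
    return "".join(" " if 14 <= ord(ch) <= 31 else ch for ch in text)


# Hypothetical table, built once: keys are code points, values are replacements.
CONTROL_TO_SPACE = {code: " " for code in range(14, 32)}


def strip_control_translate(text: str) -> str:
    """Table-based variant: str.translate does the per-character work in C."""
    return text.translate(CONTROL_TO_SPACE)


sample = "a\x0eb\x1fc\td"
# Both variants agree; the tab (code point 9) is outside the table and is kept.
assert strip_control_loop(sample) == strip_control_translate(sample)
```

Because the table is built once and the lookup happens inside the interpreter's C code, the cost per character no longer includes Python bytecode execution, which matters most for very large strings.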