PI: Optimize loop for layout mode text extraction #3543

Merged: stefan6419846 merged 9 commits into py-pdf:main from FelipeErmeson:add-maketrans-extract-text-mode-layout on Dec 3, 2025

@FelipeErmeson (Contributor)

This PR removes the repeated calls to `ord` and eliminates the per-character loop over the string, replacing them with a translation table (built with `maketrans`) that is applied in C. This should reduce processing time, especially for very large strings.
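For readers unfamiliar with the technique, here is a minimal sketch of the general pattern. It is not the actual pypdf diff: the cleanup rule (mapping ASCII control characters other than newline to a space) and the names `clean_with_loop` and `clean_with_translate` are hypothetical, chosen only to contrast a per-character `ord()` loop with `str.maketrans` plus `str.translate`.

```python
import timeit

# Hypothetical cleanup rule, for illustration only: map ASCII control
# characters (code points 0-31) except "\n" to a plain space.


def clean_with_loop(text: str) -> str:
    # Loop style: one ord() call and one Python-level iteration per character.
    return "".join(" " if ord(c) < 32 and c != "\n" else c for c in text)


# Built once; str.translate then performs the per-character lookups in C.
_CONTROL_TO_SPACE = str.maketrans({chr(i): " " for i in range(32) if i != 10})


def clean_with_translate(text: str) -> str:
    return text.translate(_CONTROL_TO_SPACE)


if __name__ == "__main__":
    sample = "abc\tdef\x07ghi\n" * 50_000
    # Both versions must produce identical output.
    assert clean_with_loop(sample) == clean_with_translate(sample)
    loop_t = timeit.timeit(lambda: clean_with_loop(sample), number=3)
    table_t = timeit.timeit(lambda: clean_with_translate(sample), number=3)
    print(f"loop: {loop_t:.3f}s  translate: {table_t:.3f}s")
```

The table is built once, so each call only pays for a single C-level pass over the string; the actual speedup depends on the input and the Python version.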

…able in C. It avoids a Python-level loop and is well suited to very large strings.

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.15%. Comparing base (c90fb72) to head (9a4f4de).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #3543   +/-   ##
=======================================
  Coverage   97.15%   97.15%
=======================================
  Files          56       56
  Lines        9772     9783   +11
  Branches     1783     1784    +1
=======================================
+ Hits         9494     9505   +11
  Misses        167      167
  Partials      111      111
```


@stefan6419846 (Collaborator) left a comment

Thanks for the PR. Could you please add a corresponding timeout-based test as well?

@FelipeErmeson (Contributor, Author)

The test now downloads the sample file, as requested. With the new optimization, the extraction that previously took 6 to 10 minutes locally now averages about 10 seconds.
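The actual test added in this PR is not reproduced here. The sketch below only illustrates the idea of a timeout-based regression test using the standard library, assuming a Unix platform where `signal.SIGALRM` and `signal.setitimer` are available and the test runs in the main thread. `time_limit`, `clean`, and the 10-second budget are hypothetical. In a pytest suite, the pytest-timeout plugin's `@pytest.mark.timeout(...)` marker is a common alternative.

```python
import signal
from contextlib import contextmanager


@contextmanager
def time_limit(seconds: float):
    """Raise TimeoutError if the block runs longer than `seconds`.

    Assumption: Unix only, and must be used from the main thread.
    """
    def _on_alarm(signum, frame):
        raise TimeoutError(f"block exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


# Hypothetical stand-in for the code path under test.
_CONTROL_TO_SPACE = str.maketrans({chr(i): " " for i in range(32) if i != 10})


def clean(text: str) -> str:
    return text.translate(_CONTROL_TO_SPACE)


def test_large_input_finishes_in_time():
    text = "word\x07\tword\n" * 500_000  # about 5.5 million characters
    with time_limit(10.0):
        result = clean(text)
    assert len(result) == len(text)
    assert "\x07" not in result
```

A slow regression then fails loudly with `TimeoutError` instead of stalling CI for minutes.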

@stefan6419846 (Collaborator) left a comment

Thanks.

stefan6419846 changed the title from "PI: Loop optimization on a huge string" to "PI: Optimize loop for layout mode text extraction" on Dec 3, 2025
stefan6419846 merged commit 66f97a3 into py-pdf:main on Dec 3, 2025
17 checks passed
stefan6419846 added a commit that referenced this pull request Dec 7, 2025
## What's new

### Performance Improvements (PI)
- Optimize loop for layout mode text extraction (#3543) by @FelipeErmeson

### Bug Fixes (BUG)
- Do not fail on choice field without /Opt key (#3540) by @jhuber-de

### Documentation (DOC)
- Document possible issues with merge_page and clipping (#3546) by @stefan6419846
- Add some notes about library security (#3545) by @stefan6419846

### Maintenance (MAINT)
- Use CORE_FONT_METRICS for widths where possible (#3526) by @PJBrs

[Full Changelog](6.4.0...6.4.1)