@Ashhhh010101

This PR introduces a new feature for the PdfConverter:

  • Extracts tables from PDFs using pdfplumber.
  • Converts tables into properly aligned Markdown format.
  • Preserves all rows in the tables exactly as they appear.
  • Normalizes whitespace in text blocks.
  • Falls back to pdfminer if pdfplumber fails.

This improves Markdown output from PDFs, making tables clean, readable, and aligned without altering the original table content.
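
The pdfminer fallback mentioned in the list could be structured roughly as follows. This is a sketch under assumptions, not the PR's implementation: `extract_with_fallback`, `plumber_text`, and `pdfminer_text` are hypothetical names, and "fails" is interpreted here as "raises an exception", which the description does not specify. The third-party imports are deferred so the fallback logic itself has no dependencies.

```python
def extract_with_fallback(path, primary, fallback):
    """Try the primary extractor; if it raises, use the fallback instead."""
    try:
        return primary(path)
    except Exception:
        # Assumption: any exception from the primary extractor counts as failure.
        return fallback(path)


def plumber_text(path):
    """Extract page text with pdfplumber (third-party, imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages with no text layer.
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def pdfminer_text(path):
    """Extract text with pdfminer.six (third-party, imported lazily)."""
    from pdfminer.high_level import extract_text

    return extract_text(path)


# Hypothetical usage:
# text = extract_with_fallback("report.pdf", plumber_text, pdfminer_text)
```

Passing the extractors in as callables keeps the fallback decision separate from either library, which also makes it easy to exercise with stand-in functions in tests.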
