Describe the bug
When using extract_table() with vertical_strategy="explicit" and horizontal_strategy="text", the last column of a table is often missed or misinterpreted—especially when the column is narrow or contains multi-line text. This leads to incomplete or split rows in the output.
Have you tried repairing the PDF?
Yes, tested with repair=True. The issue persists.
Code to reproduce the problem
import pdfplumber
with pdfplumber.open("sample.pdf", repair=True) as pdf:
    page = pdf.pages[0]
    table = page.extract_table({
        "vertical_strategy": "explicit",
        "horizontal_strategy": "text",
        "intersection_tolerance": 5,
        "snap_tolerance": 3,
        "join_tolerance": 5
    })
    for row in table:
        print(row)
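For context, the "explicit" vertical strategy takes its column boundaries from the explicit_vertical_lines setting, which the snippet above does not show. The sketch below is one way the settings might be assembled so that the right edge of the last column is always included. The helper build_table_settings and the coordinates are hypothetical placeholders, not values measured from sample.pdf.

```python
def build_table_settings(column_edges):
    """Build a settings dict for page.extract_table().

    column_edges: x-coordinates of every column boundary, including the
    left edge of the first column and the right edge of the last one.
    (Hypothetical helper, used here only for illustration.)
    """
    # De-duplicate and sort so the edges run left to right.
    edges = sorted({float(x) for x in column_edges})
    if len(edges) < 2:
        raise ValueError("need at least two vertical lines to form a column")
    return {
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": edges,
        "horizontal_strategy": "text",
        "intersection_tolerance": 5,
        "snap_tolerance": 3,
        "join_tolerance": 5,
    }


# Placeholder coordinates (assumed), not taken from the actual PDF.
settings = build_table_settings([50, 180, 320, 460, 540])
```

The resulting dict can be passed as page.extract_table(settings). N edges can bound at most N - 1 columns, so a missing rightmost edge would by itself drop the last column; checking that edge first may help separate a settings problem from a genuine extraction bug.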