Table extraction fails when last column is misaligned or multi-line using vertical_strategy="explicit" #1335

@03musab

Describe the bug

When using extract_table() with vertical_strategy="explicit" and horizontal_strategy="text", the last column of a table is often missed or misinterpreted, especially when the column is narrow or contains multi-line text. The output then has incomplete or split rows.

Have you tried repairing the PDF?

Yes, tested with repair=True. The issue persists.

Code to reproduce the problem

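import pdfplumber

# repair=True asks pdfplumber to repair the PDF before parsing it.
with pdfplumber.open("sample.pdf", repair=True) as pdf:
    page = pdf.pages[0]
    # Note: the "explicit" strategy also expects "explicit_vertical_lines"
    # (two or more x-positions); those values are not shown in this report.
    table = page.extract_table({
        "vertical_strategy": "explicit",
        "horizontal_strategy": "text",
        "intersection_tolerance": 5,
        "snap_tolerance": 3,
        "join_tolerance": 5
    })
    # extract_table() returns None when no table is found.
    for row in table or []:
        print(row)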
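
One possible cause, not confirmed in this report, is that the right-most explicit line sits at or inside the last column's text, so that column is clipped. The sketch below is a hypothetical helper, not part of pdfplumber: it builds the "explicit_vertical_lines" list from hand-picked column start positions and pushes the final boundary just past the right-most word. The names vertical_lines_from_words, extract_with_padded_lines, column_starts, and padding are assumptions made for illustration. The only pdfplumber calls used are page.extract_words() and page.extract_table().

```python
def vertical_lines_from_words(words, column_starts, padding=2.0):
    """Build x-positions for the "explicit_vertical_lines" table setting.

    words: dicts with "x0" and "x1" keys, the shape returned by
        pdfplumber's page.extract_words().
    column_starts: hand-picked left edge of each column (hypothetical input).
    padding: extra space added to the right of the right-most word.
    """
    if not words or not column_starts:
        raise ValueError("need at least one word and one column start")
    starts = sorted(column_starts)
    # Place the closing boundary just past the right-most word so the
    # last column is fully enclosed.
    right_edge = max(w["x1"] for w in words) + padding
    if right_edge <= starts[-1]:
        raise ValueError("last column start lies to the right of all text")
    return starts + [right_edge]


def extract_with_padded_lines(path, column_starts):
    """Run extract_table() on the first page using the helper above."""
    import pdfplumber  # imported here so the helper above runs without it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[0]
        lines = vertical_lines_from_words(page.extract_words(), column_starts)
        return page.extract_table({
            "vertical_strategy": "explicit",
            "explicit_vertical_lines": lines,
            "horizontal_strategy": "text",
        })
```

If the last column still comes back split across rows, the multi-line cell text is likely being separated by horizontal_strategy="text" rather than by the vertical lines, so widening the boundary alone may not be enough.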
