ValueError when parsing arabic numbering #7

@krrome

Input data

headings = [
    {'text': 'APPENDIX 1 TO ANNEX I  ', 'font_size': np.float64(10.331999999999994), 'is_bold': False, 'is_italic': False, 'top_left': 111.356, 'text_direction:': <TextDirection.LEFT_TO_RIGHT: 'left_to_right'>, 'font': '/Times New Roman', 'reference': '#/texts/0'},
    {'text': 'PRODUCT-SPECIFIC RULES ', 'font_size': np.float64(10.331999999999994), 'is_bold': False, 'is_italic': False, 'top_left': 138.9559999999999, 'text_direction:': <TextDirection.LEFT_TO_RIGHT: 'left_to_right'>, 'font': '/Times New Roman', 'reference': '#/texts/1'},
    {'text': 'Interpretative Notes ', 'font_size': np.float64(10.093000000000075), 'is_bold': False, 'is_italic': False, 'top_left': 180.54899999999998, 'text_direction:': <TextDirection.LEFT_TO_RIGHT: 'left_to_right'>, 'font': '/Times New Roman,Bold', 'reference': '#/texts/2'},
    {'text': '25-97 EFTA-Central America ', 'font_size': np.float64(10.092999999999961), 'is_bold': False, 'is_italic': False, 'top_left': 83.92899999999997, 'text_direction:': <TextDirection.LEFT_TO_RIGHT: 'left_to_right'>, 'font': '/Times New Roman,Bold', 'reference': '#/texts/33'},
]

builder = DocumentHierarchyBuilder(headings)
return builder.infer()

This raises ValueError: invalid literal for int() with base 10: '25-97'

Either fix the regex for numerical header detection so that it no longer treats strings like '25-97' as arabic numbering, or handle the ValueError raised by the int conversion.
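
Both options can be combined. The following is only a minimal sketch of the idea, not the library's code: the helper `parse_arabic_numbering` and the pattern `ARABIC_PREFIX` are hypothetical names, and the regex actually used by DocumentHierarchyBuilder is not shown in this issue. The pattern accepts a prefix only if it is made of dot-separated digit groups followed by whitespace or the end of the string, and the int conversion is guarded as a second line of defence.

```python
import re

# Hypothetical pattern (an assumption, not the library's regex):
# dot-separated digit groups such as "1", "2.3" or "4.1.2", optionally
# followed by a trailing dot, then whitespace or end of string.
ARABIC_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)\.?(?=\s|$)")


def parse_arabic_numbering(text):
    """Return the numbering levels as a list of ints, or None if the
    heading does not start with arabic numbering."""
    match = ARABIC_PREFIX.match(text.strip())
    if match is None:
        return None
    try:
        return [int(part) for part in match.group(1).split(".")]
    except ValueError:
        # Defensive fallback: treat anything int() rejects as unnumbered
        # instead of letting the exception propagate.
        return None


print(parse_arabic_numbering("1.2 Scope"))                    # [1, 2]
print(parse_arabic_numbering("3. Rules"))                     # [3]
print(parse_arabic_numbering("25-97 EFTA-Central America "))  # None
print(parse_arabic_numbering("APPENDIX 1 TO ANNEX I  "))      # None
```

With a stricter pattern like this, '25-97' is rejected before int() is ever called, so the heading from the input data above would simply be treated as unnumbered.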
