Egret layout models fail with RuntimeError: label hyphen/underscore mismatch in _build_label_map #3053

@Steve-Allison

Bug

All Egret layout models (DOCLING_LAYOUT_EGRET_MEDIUM, DOCLING_LAYOUT_EGRET_LARGE, DOCLING_LAYOUT_EGRET_XLARGE) fail with a RuntimeError when initializing the layout pipeline. The Heron models work fine.

Root Cause

The Egret model configs on HuggingFace use hyphenated, and in some cases space-separated, label names in id2label, for example:

{
  "3": "List-item",
  "4": "Page-footer",
  "5": "Page-header",
  "7": "Section-header",
  "11": "Document Index",
  "13": "Checkbox-Selected",
  "14": "Checkbox-Unselected",
  "16": "Key-Value Region"
}

_build_label_map() in layout_object_detection_model.py normalizes with .upper() only, producing LIST-ITEM, but the DocItemLabel enum expects LIST_ITEM (underscore):

label_enum_name = label_name.upper()  # "List-item" -> "LIST-ITEM"
label_map[label_id] = DocItemLabel[label_enum_name]  # KeyError: 'LIST-ITEM'

The Heron models use underscored labels (list_item, page_footer, etc.) which .upper() maps correctly to the enum.
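The mismatch can be reproduced in isolation, without downloading any model. The snippet below is a minimal sketch: StandInLabel is a hypothetical stand-in for DocItemLabel that mirrors only two member names, and lookup_upper_only is an illustrative helper that copies the .upper()-only normalization quoted above. Neither name is part of docling.

```python
from enum import Enum


class StandInLabel(str, Enum):
    """Hypothetical stand-in for DocItemLabel (member names are assumptions)."""

    LIST_ITEM = "list_item"
    PAGE_FOOTER = "page_footer"


def lookup_upper_only(label_name: str) -> StandInLabel:
    """Look up an enum member by name after upper-casing only, as in the issue."""
    return StandInLabel[label_name.upper()]


# Heron-style label: underscores survive .upper(), so the lookup succeeds.
heron_result = lookup_upper_only("list_item")

# Egret-style label: the hyphen survives .upper(), so the lookup raises KeyError.
try:
    lookup_upper_only("List-item")
    egret_error = None
except KeyError as exc:
    egret_error = exc.args[0]

print(heron_result.name, egret_error)
```

Enum lookup by name is an exact string match, so any separator other than an underscore has to be handled before indexing into the enum.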

Steps to Reproduce

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import LayoutOptions
from docling.datamodel.layout_model_specs import DOCLING_LAYOUT_EGRET_LARGE
from docling.document_converter import DocumentConverter, PdfFormatOption

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options={"layout_options": LayoutOptions(model_spec=DOCLING_LAYOUT_EGRET_LARGE)}
        ),
    }
)
# Convert any PDF
result = converter.convert("any_document.pdf")

Error

File "docling/models/stages/layout/layout_object_detection_model.py", line 67, in _build_label_map
    label_map[label_id] = DocItemLabel[label_enum_name]
KeyError: 'LIST-ITEM'

RuntimeError: Label 'List-item' (ID 3) from model config does not match any DocItemLabel enum value.

Suggested Fix

Normalize hyphens and spaces to underscores in _build_label_map:

label_enum_name = label_name.upper().replace("-", "_").replace(" ", "_")
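To show the suggested normalization in context, here is a self-contained sketch. It does not reproduce the actual body of _build_label_map: StandInLabel is a hypothetical stand-in for DocItemLabel (its member names are assumptions based on the labels listed above), and normalize_label_name and build_label_map are illustrative helper names.

```python
from enum import Enum


class StandInLabel(str, Enum):
    """Hypothetical stand-in for DocItemLabel (member names are assumptions)."""

    LIST_ITEM = "list_item"
    PAGE_FOOTER = "page_footer"
    DOCUMENT_INDEX = "document_index"
    KEY_VALUE_REGION = "key_value_region"


def normalize_label_name(label_name: str) -> str:
    """Upper-case the label and turn hyphens and spaces into underscores."""
    return label_name.upper().replace("-", "_").replace(" ", "_")


def build_label_map(id2label: dict[str, str]) -> dict[int, StandInLabel]:
    """Map integer class IDs to enum members, failing loudly on unknown labels."""
    label_map: dict[int, StandInLabel] = {}
    for raw_id, label_name in id2label.items():
        enum_name = normalize_label_name(label_name)
        try:
            label_map[int(raw_id)] = StandInLabel[enum_name]
        except KeyError as exc:
            raise RuntimeError(
                f"Label {label_name!r} (ID {raw_id}) does not match any enum member."
            ) from exc
    return label_map


# Egret-style (hyphen, space) and Heron-style (underscore) labels both resolve.
label_map = build_label_map(
    {"3": "List-item", "11": "Document Index", "16": "Key-Value Region", "4": "page_footer"}
)
print(label_map)
```

Because the replacements leave underscored names unchanged, Heron-style labels resolve the same way as before.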

Affected Models

I verified that all three Egret model configs on HuggingFace use hyphenated labels:

  • docling-project/docling-layout-egret-medium
  • docling-project/docling-layout-egret-large
  • docling-project/docling-layout-egret-xlarge

Heron models (docling-layout-heron, docling-layout-heron-101) use underscored labels and work correctly.

Docling version

docling 2.75.0

Python version

Python 3.14.3
