Description
Bug
All Egret layout models (DOCLING_LAYOUT_EGRET_MEDIUM, DOCLING_LAYOUT_EGRET_LARGE, DOCLING_LAYOUT_EGRET_XLARGE) fail with a RuntimeError when initializing the layout pipeline. The Heron models work fine.
Root Cause
The Egret model configs on HuggingFace use hyphens and spaces in the label names in id2label (excerpt):
{
"3": "List-item",
"4": "Page-footer",
"5": "Page-header",
"7": "Section-header",
"11": "Document Index",
"13": "Checkbox-Selected",
"14": "Checkbox-Unselected",
"16": "Key-Value Region"
}
_build_label_map() in layout_object_detection_model.py normalizes with .upper() only, producing LIST-ITEM, but the DocItemLabel enum expects LIST_ITEM (underscore):
label_enum_name = label_name.upper() # "List-item" -> "LIST-ITEM"
label_map[label_id] = DocItemLabel[label_enum_name] # KeyError: 'LIST-ITEM'
The Heron models use underscored labels (list_item, page_footer, etc.), which .upper() maps correctly to the enum.
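The mismatch can be reproduced without downloading any model. The sketch below uses a hypothetical stand-in enum (StandInLabel, not the real DocItemLabel) and a hypothetical helper (lookup_upper_only) that mirrors the upper-case-only normalization described above. It is an illustration of the failure mode, not the actual docling code.

```python
from enum import Enum


# Hypothetical stand-in for DocItemLabel; only two members are modelled.
class StandInLabel(str, Enum):
    LIST_ITEM = "list_item"
    PAGE_FOOTER = "page_footer"


def lookup_upper_only(label_name: str) -> StandInLabel:
    """Look up an enum member by name after upper-casing only."""
    return StandInLabel[label_name.upper()]


# Heron-style label: "list_item" -> "LIST_ITEM", which is a member name.
heron_result = lookup_upper_only("list_item")

# Egret-style label: "List-item" -> "LIST-ITEM", which is not a member name.
try:
    lookup_upper_only("List-item")
    egret_error = None
except KeyError as exc:
    egret_error = exc.args[0]

print(heron_result.name, egret_error)
```

Lookup by name with square brackets is exact, so the hyphen alone is enough to make it fail.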
Steps to Reproduce
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import LayoutOptions
from docling.datamodel.layout_model_specs import DOCLING_LAYOUT_EGRET_LARGE
from docling.document_converter import DocumentConverter, PdfFormatOption
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options={"layout_options": LayoutOptions(model_spec=DOCLING_LAYOUT_EGRET_LARGE)}
        ),
    }
)
# Convert any PDF
result = converter.convert("any_document.pdf")
Error
File "docling/models/stages/layout/layout_object_detection_model.py", line 67, in _build_label_map
label_map[label_id] = DocItemLabel[label_enum_name]
KeyError: 'LIST-ITEM'
RuntimeError: Label 'List-item' (ID 3) from model config does not match any DocItemLabel enum value.
Suggested Fix
Normalize hyphens and spaces to underscores in _build_label_map:
label_enum_name = label_name.upper().replace("-", "_").replace(" ", "_")
Affected Models
Verified all three Egret model configs on HuggingFace use hyphenated labels:
docling-project/docling-layout-egret-medium
docling-project/docling-layout-egret-large
docling-project/docling-layout-egret-xlarge
Heron models (docling-layout-heron, docling-layout-heron-101) use underscored labels and work correctly.
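As a rough check that normalizing hyphens and spaces to underscores covers both label styles, here is a minimal standalone sketch. StandInLabel is a hypothetical stand-in whose member names are assumed to match the corresponding DocItemLabel members, and normalize_label_name and build_label_map are illustrative helpers, not docling's actual functions. The Egret entries are the excerpt quoted in this report; the IDs in the Heron-style dictionary are made up for the example.

```python
from enum import Enum


# Hypothetical stand-in for DocItemLabel; member names are an assumption.
class StandInLabel(str, Enum):
    LIST_ITEM = "list_item"
    PAGE_FOOTER = "page_footer"
    PAGE_HEADER = "page_header"
    SECTION_HEADER = "section_header"
    DOCUMENT_INDEX = "document_index"
    CHECKBOX_SELECTED = "checkbox_selected"
    CHECKBOX_UNSELECTED = "checkbox_unselected"
    KEY_VALUE_REGION = "key_value_region"


def normalize_label_name(label_name: str) -> str:
    """Upper-case the label and turn hyphens and spaces into underscores."""
    return label_name.upper().replace("-", "_").replace(" ", "_")


def build_label_map(id2label: dict[str, str]) -> dict[int, StandInLabel]:
    """Map integer label IDs to enum members, failing loudly on unknown names."""
    label_map: dict[int, StandInLabel] = {}
    for label_id, label_name in id2label.items():
        enum_name = normalize_label_name(label_name)
        try:
            label_map[int(label_id)] = StandInLabel[enum_name]
        except KeyError as exc:
            raise RuntimeError(
                f"Label {label_name!r} (ID {label_id}) has no matching enum member"
            ) from exc
    return label_map


# Excerpt of the Egret id2label quoted in this report.
egret_excerpt = {
    "3": "List-item",
    "4": "Page-footer",
    "5": "Page-header",
    "7": "Section-header",
    "11": "Document Index",
    "13": "Checkbox-Selected",
    "14": "Checkbox-Unselected",
    "16": "Key-Value Region",
}
# Heron-style labels; these IDs are illustrative only.
heron_style = {"3": "list_item", "4": "page_footer"}

egret_map = build_label_map(egret_excerpt)
heron_map = build_label_map(heron_style)
```

Because the replacements only touch hyphens and spaces, labels that are already underscored pass through unchanged, so the Heron-style names should keep resolving as before.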
Docling version
docling 2.75.0
Python version
Python 3.14.3