Description
The Error
When cuda_use_flash_attention2=True is set in AcceleratorOptions, CodeFormulaV2 loads successfully but every
inference batch fails with:
FlashAttention only support fp16 and bf16 data type
The model loads in fp32 (the HuggingFace default when dtype=None), but the attention implementation is set to
flash_attention_2, which requires fp16/bf16 tensors.
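The failure mode can be reproduced with a small stand-alone sketch (illustrative only; the real check lives inside the FlashAttention CUDA kernels, and the helper name below is made up):

```python
def check_flash_attn_dtype(torch_dtype):
    """Illustrative stand-in for FlashAttention's runtime dtype check.

    HuggingFace loads weights in float32 when no dtype is configured,
    and the FA2 kernels then reject the fp32 tensors at inference time.
    """
    effective = torch_dtype or "float32"  # HF default when dtype=None
    if effective not in ("float16", "bfloat16"):
        raise RuntimeError("FlashAttention only support fp16 and bf16 data type")
    return effective

check_flash_attn_dtype("bfloat16")  # OK: this is what smoldocling configures
# check_flash_attn_dtype(None)     # raises: this is the CodeFormulaV2 path
```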
Issue 1: Preset definition omits torch_dtype
File: docling/datamodel/stage_model_specs.py, lines 1108-1129
CODE_FORMULA_CODEFORMULAV2 = StageModelPreset(
preset_id="codeformulav2",
...
model_spec=VlmModelSpec(
...
engine_overrides={
VlmEngineType.TRANSFORMERS: EngineModelConfig(
extra_config={
"transformers_model_type": TransformersModelType.AUTOMODEL_IMAGETEXTTOTEXT,
"extra_generation_config": {"skip_special_tokens": False},
}
                # NOTE: no torch_dtype specified here
),
},
),
...
default_engine_type=VlmEngineType.AUTO_INLINE,
)
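One possible fix for Issue 1 is to declare the dtype in the preset override, the same way the smoldocling and smolvlm presets do (sketch; assumes the engine override's torch_dtype field is actually propagated, which Issue 2 below shows it currently is not):

```python
# Sketch of a fixed CodeFormulaV2 preset override (config fragment).
engine_overrides={
    VlmEngineType.TRANSFORMERS: EngineModelConfig(
        torch_dtype="bfloat16",  # FA2-compatible, matching smoldocling/smolvlm
        extra_config={
            "transformers_model_type": TransformersModelType.AUTOMODEL_IMAGETEXTTOTEXT,
            "extra_generation_config": {"skip_special_tokens": False},
        },
    ),
},
```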
Issue 2: get_engine_config() drops torch_dtype
File: docling/datamodel/stage_model_specs.py, lines 216-241
Even if torch_dtype were added to the CodeFormulaV2 preset, it would be silently discarded by
VlmModelSpec.get_engine_config():
def get_engine_config(self, engine_type: VlmEngineType) -> EngineModelConfig:
repo_id = self.get_repo_id(engine_type)
revision = self.get_revision(engine_type)
extra_config = {}
if engine_type in self.engine_overrides:
extra_config = self.engine_overrides[engine_type].extra_config.copy()
return EngineModelConfig(
repo_id=repo_id,
revision=revision,
extra_config=extra_config,
        # NOTE: torch_dtype is NOT extracted from self.engine_overrides[engine_type]
)
This method constructs a new EngineModelConfig with only repo_id, revision, and extra_config. The torch_dtype
field from the engine override is never read.
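A minimal sketch of the fix, using stand-in dataclasses rather than the real docling types (field names are taken from the snippet above; the free-function signature is simplified for illustration):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EngineModelConfig:  # stand-in; the real class lives in docling
    repo_id: str = ""
    revision: Optional[str] = None
    extra_config: dict = field(default_factory=dict)
    torch_dtype: Optional[str] = None

def get_engine_config(overrides, engine_type, repo_id, revision):
    """Patched version: copies torch_dtype from the override instead of dropping it."""
    override = overrides.get(engine_type)
    return EngineModelConfig(
        repo_id=repo_id,
        revision=revision,
        extra_config=dict(override.extra_config) if override else {},
        torch_dtype=override.torch_dtype if override else None,  # the fix
    )

cfg = get_engine_config(
    {"transformers": EngineModelConfig(torch_dtype="bfloat16")},
    "transformers", "some/repo", "main",
)
print(cfg.torch_dtype)  # bfloat16
```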
Issue 3: AutoInlineVlmEngine creates engine options with default torch_dtype=None
File: docling/models/inference_engines/vlm/auto_inline_engine.py, lines 196-207
When AutoInlineVlmEngine (CodeFormulaV2’s default engine) selects the Transformers backend, it creates
TransformersVlmEngineOptions() with no arguments:
else: ### TRANSFORMERS
transformers_options = TransformersVlmEngineOptions() # torch_dtype=None
self.actual_engine = TransformersVlmEngine(
options=transformers_options,
accelerator_options=self.accelerator_options,
artifacts_path=self.artifacts_path,
model_config=model_config, # model_config.torch_dtype is also None (Issue 2)
)
TransformersVlmEngineOptions defaults torch_dtype to None (line 70 in vlm_engine_options.py):
class TransformersVlmEngineOptions(BaseVlmEngineOptions):
torch_dtype: Optional[str] = Field(
default=None, description="PyTorch dtype (e.g., 'float16', 'bfloat16')"
)
And even though model_config is passed, it also has torch_dtype=None due to Issue 2.
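Putting Issues 2 and 3 together, the effective resolution order can be sketched as follows (illustrative; the real engine delegates the final fallback to HuggingFace's from_pretrained):

```python
def resolve_torch_dtype(options_dtype, model_config_dtype):
    # In the CodeFormulaV2 path both sources are None, so from_pretrained
    # falls back to its float32 default and FA2 fails later at inference.
    return options_dtype or model_config_dtype or "float32"

print(resolve_torch_dtype(None, None))        # float32  <- the bug
print(resolve_torch_dtype(None, "bfloat16"))  # bfloat16 <- once Issue 2 is fixed
```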
Suggested Fixes
Ideal fix: Automatic dtype/attention compatibility
When cuda_use_flash_attention2=True and torch_dtype=None, the engine should automatically cast to bf16 rather
than loading in fp32 and failing at inference time. This mirrors how HuggingFace's from_pretrained handles
attn_implementation="flash_attention_2": when torch_dtype=torch.float32 is explicitly passed, it raises an
error, but when the dtype is unset, it could default to bf16 for compatibility.
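The suggested behavior could look like this hypothetical shim (names are illustrative, not the docling API):

```python
def effective_dtype(torch_dtype, cuda_use_flash_attention2):
    """Pick a dtype compatible with the requested attention implementation.

    If FA2 was requested but no dtype was configured, default to bfloat16
    instead of letting from_pretrained load the model in float32.
    """
    if cuda_use_flash_attention2 and torch_dtype is None:
        return "bfloat16"
    if cuda_use_flash_attention2 and torch_dtype not in ("float16", "bfloat16"):
        # Fail fast at load time instead of at the first inference batch.
        raise ValueError(
            f"cuda_use_flash_attention2=True is incompatible with torch_dtype={torch_dtype!r}"
        )
    return torch_dtype

print(effective_dtype(None, True))       # bfloat16
print(effective_dtype("float16", True))  # float16
print(effective_dtype(None, False))      # None (keep the HF default)
```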
Affected Presets

| Preset | Stage | torch_dtype in TRANSFORMERS override | FA2 compatible? |
|---|---|---|---|
| smoldocling | VLM_CONVERT | "bfloat16" | Yes |
| smolvlm | PICTURE_DESC | "bfloat16" | Yes |
| codeformulav2 | CODE_FORMULA | Not set (None) | No |
| granite_docling | CODE_FORMULA | Not set (None) | No |
| granite_vision | PICTURE_DESC | Not set (None) | No (but typically API-based) |