
[HybridAttention] Support past key values in attention_mask for stateful inference #1641

Open
apaniukov wants to merge 9 commits into huggingface:main from apaniukov:fix-granite-moe-hybrid-state

Conversation

@apaniukov
Contributor

@apaniukov apaniukov commented Mar 19, 2026

What does this PR do?

GraniteMoeHybrid crashes with a shape mismatch error:

RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:224:
Exception from src/plugins/intel_cpu/src/node.cpp:787:
[CPU] Add node with name '__module.model/aten::add/Add_1' Check 'input_shape[j] == 1' failed at src/plugins/intel_cpu/src/shape_inference/custom/eltwise.cpp:52:
Eltwise shape infer input shapes dim index: 3 mismatch

The RuntimeError occurs at the Add node that combines the causal mask with the position-expanded attention_mask:

  • port 0 (causal mask): [batch, 1, seq_len, seq_len]
  • port 1 (attn mask): [batch, 1, 1, total_seq_len]
    where seq_len != total_seq_len on the second and later generation steps with seq_len > 1 (e.g. a second prefill stage).
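The mismatch above can be sketched with plain NumPy (hypothetical shapes, standing in for the OpenVINO Add node): the add cannot broadcast dim 3 when the causal mask only spans the new tokens while the attention mask spans the whole KV history.

```python
import numpy as np

# Second prefill step: e.g. 8 cached tokens + 4 new ones (illustrative numbers)
batch, seq_len, total_seq_len = 1, 4, 12

causal_mask = np.zeros((batch, 1, seq_len, seq_len))  # port 0
attn_mask = np.zeros((batch, 1, 1, total_seq_len))    # port 1

try:
    causal_mask + attn_mask  # dim 3: 4 vs 12, neither is 1 -> broadcast fails
except ValueError as e:
    print("shape mismatch:", e)

# When the causal mask covers past + current positions, the add broadcasts fine:
fixed_causal_mask = np.zeros((batch, 1, seq_len, total_seq_len))
out = fixed_causal_mask + attn_mask
print(out.shape)  # (1, 1, 4, 12)
```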

The issue is caused by the patched _update_causal_mask method. Hybrid models that don't patch it (Qwen3-Next and LFM2) work fine.

Fix:

  • Update the inputs property to use "past_sequence_length + sequence_length" for attention_mask in the -with-past conversion
  • [unrelated] remove the unnecessary triu operation
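The first fix item can be sketched as follows. Export configs in optimum describe dynamic axes as a dict of {input_name: {axis: symbolic_dim}}; the function name and exact structure here are hypothetical, but they show the intended change: with a KV cache, the attention_mask's sequence axis must cover both the cached and the newly fed tokens.

```python
# Hypothetical sketch of an export-config `inputs` property (names assumed):
def hybrid_inputs(use_past: bool) -> dict:
    # With past key values, the mask spans the cache plus the current chunk.
    mask_seq = "past_sequence_length + sequence_length" if use_past else "sequence_length"
    return {
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: mask_seq},
    }

print(hybrid_inputs(True)["attention_mask"][1])
# past_sequence_length + sequence_length
```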

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@apaniukov apaniukov changed the title Support past key values in attention_mask for stateful inference [GraniteMoeHybrid] Support past key values in attention_mask for stateful inference Mar 19, 2026
@apaniukov apaniukov changed the title [GraniteMoeHybrid] Support past key values in attention_mask for stateful inference [HybridAttention] Support past key values in attention_mask for stateful inference Mar 19, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@rkazants rkazants left a comment


Can you please add a test to reproduce this case?

@apaniukov
Contributor Author

Can you please add a test to reproduce this case?

Done.

@apaniukov apaniukov requested a review from rkazants March 19, 2026 18:30