[Snippets][CPU] Introduce MatMul tokenization config, do not tokenize Transpose after MatMul on ARM64 #32592
Conversation
```cpp
auto label = ov::pass::pattern::any_input([config](const ov::Output<ov::Node>& out) {
    const auto n = out.get_node_shared_ptr();
    // Config-aware gating: optionally reject specific Transpose cases around MatMul
    if (ov::is_type<ov::op::v1::Transpose>(n)) {
```
Let's move this check to the `is_supported_transpose` lambda in `is_supported_op`. We will need to change their signatures, but I think that is okay.
Good idea, done
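For illustration, a minimal sketch of what the requested refactor might look like. This is not the PR's actual diff: the `MatMulConfig` stand-in mirrors the fields quoted further below, and the `is_supported_op` signature extension is an assumption.

```cpp
#include <memory>

#include "openvino/core/type.hpp"
#include "openvino/op/transpose.hpp"

// Stand-in for the PR's config; the real fields are quoted further below.
struct MatMulConfig {
    bool is_supported_transpose_c = true;
};

// Sketch of the refactor: the Transpose gating lives in an
// is_supported_transpose lambda inside is_supported_op, whose signature is
// extended to accept the config (hypothetical shape).
bool is_supported_op(const std::shared_ptr<const ov::Node>& n, const MatMulConfig& config) {
    const auto is_supported_transpose = [&config](const std::shared_ptr<const ov::Node>&) {
        // Config-aware gating: a Transpose on the MatMul output is tokenized
        // only if the platform config allows it.
        return config.is_supported_transpose_c;
    };
    if (ov::is_type<ov::op::v1::Transpose>(n))
        return is_supported_transpose(n);
    // ... checks for other op types would follow here ...
    return true;
}
```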
```cpp
// Transpose before MatMul on input 0
bool is_supported_transpose_a = true;
// Transpose before MatMul on input 1
bool is_supported_transpose_b = true;
// Transpose after MatMul on its output
bool is_supported_transpose_c = true;
```
I suppose just boolean values are not enough here: not all types of transposes are actually supported even on X86. Ideally, we need lambdas here which would replace `is_supported_transpose` in `collapse_subgraph.cpp` and `is_valid_transpose` in `mha_tokenization.cpp`.
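A hedged sketch of the reviewer's suggestion follows; this is hypothetical, not merged code. The predicate type and default values are assumptions, while the field names come from the flags quoted above.

```cpp
#include <functional>
#include <memory>

#include "openvino/core/node.hpp"

// Per-position predicates instead of plain booleans, so each backend can
// express exactly which Transpose layouts it supports. These could absorb
// is_supported_transpose from collapse_subgraph.cpp and is_valid_transpose
// from mha_tokenization.cpp.
struct MatMulConfig {
    using TransposePredicate = std::function<bool(const std::shared_ptr<const ov::Node>&)>;
    // Each predicate decides whether a concrete Transpose (order, rank,
    // position relative to MatMul) may be tokenized on this platform.
    TransposePredicate is_supported_transpose_a = [](const std::shared_ptr<const ov::Node>&) { return true; };
    TransposePredicate is_supported_transpose_b = [](const std::shared_ptr<const ov::Node>&) { return true; };
    TransposePredicate is_supported_transpose_c = [](const std::shared_ptr<const ov::Node>&) { return true; };
};
```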
That's a good point to discuss. Right now these flags are only used in `is_supported_transpose`. It seemed to me that this way we get less code duplication, and the logic looks simpler. What do you think?
But the original ideas of this PR are to avoid tokenization of unsupported transposes and to make the supported-transpose configuration device specific. The current behavior doesn't fully solve both problems:
- We still tokenize transposes in MHATokenization and then have to move them out of the Subgraph
- The supported-transpose checks in MHATokenization, which are specific to X64, are performed for ARM as well
Disabled all transposes on ARM
```cpp
// Disable Transpose after MatMul in general tokenization on ARM64 by default.
// Keep Transpose before MatMul inputs allowed for flexibility.
ov::snippets::pass::MatMulConfig mm_cfg;
mm_cfg.is_supported_transpose_c = false;
```
Shouldn't `is_supported_transpose_a`/`is_supported_transpose_b` be false as well on ARM?
They are converted to `GemmCopyB`, actually. The output one is the one that is troublesome right now.
Not really:
- Transpose A is placed on input A, not input B, so we don't even insert `GemmCopyB` there
- Transpose B (`transpose_b = true`) is not supported by `GemmCopyB`. Please see the `ExplicitTransposeMatMulInputs` callback for the details
Fair enough, fixed
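A sketch of the resulting ARM64 defaults after this exchange (per "Disabled all transposes on ARM" above); the field names follow the earlier quotes, but the exact file and surrounding code are not shown in this thread.

```cpp
ov::snippets::pass::MatMulConfig mm_cfg;
mm_cfg.is_supported_transpose_a = false;  // Transpose on input A: no GemmCopyB is inserted there
mm_cfg.is_supported_transpose_b = false;  // transpose_b is not handled by GemmCopyB either
mm_cfg.is_supported_transpose_c = false;  // Transpose after MatMul stays disabled
```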
Decided to switch to an alternative approach. PR can be found here: #32676
### Details:
Introduce a callback function for the `ExtractUnsupportedTransposes` pass as a part of `CommonOptimizations::Config` to customize the pass behavior depending on Transpose support. For example, the ARM64 platform supports transpose decomposition, but MatMul with Transpose A/B is not supported so far. The rest of the (potential) platforms mark Transpose as not supported completely.

An alternative approach for #32592

### Tickets:
- 176061
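To make the description above concrete, here is a hedged sketch of what such a callback might look like. The member name, signature, and default are assumptions; the actual code lives in #32676 and may differ.

```cpp
#include <functional>
#include <memory>

#include "openvino/core/node.hpp"

// Hypothetical shape of a CommonOptimizations-style config carrying the
// callback; platforms without Transpose support keep the default.
struct CommonOptimizationsConfig {
    // Returns true if this Transpose is supported inside the Subgraph on the
    // target platform; ExtractUnsupportedTransposes would move it out otherwise.
    std::function<bool(const std::shared_ptr<const ov::Node>&)> is_transpose_supported =
        [](const std::shared_ptr<const ov::Node>&) { return false; };
};
```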