
Conversation

@aobolensk (Contributor) commented Oct 28, 2025

Details:

  • Introduce MatMulConfig as part of TokenizationConfig in snippets (a rough sketch of the intent follows this list)
  • Disable tokenization of the MatMul (Gemm) operation result on ARM64, since it cannot be fused there for now
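A minimal sketch of what these Details describe. The MatMulConfig flags match the diff hunk reviewed below; the mm_config member name and the exact shape of TokenizationConfig are illustrative assumptions, not the actual API:

namespace ov::snippets::pass {

// Per-MatMul tokenization knobs (flags as in the diff hunk reviewed below)
struct MatMulConfig {
    bool is_supported_transpose_a = true;  // Transpose before MatMul on input 0
    bool is_supported_transpose_b = true;  // Transpose before MatMul on input 1
    bool is_supported_transpose_c = true;  // Transpose after MatMul on its output
};

// Hypothetical shape of the enclosing tokenization config
struct TokenizationConfig {
    MatMulConfig mm_config;  // member name is an assumption
};

}  // namespace ov::snippets::pass

// An ARM64 backend would then disable the result (output) transpose, e.g.:
// ov::snippets::pass::TokenizationConfig cfg;
// cfg.mm_config.is_supported_transpose_c = false;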

Tickets:

  • 176061

@aobolensk requested review from a team as code owners on October 28, 2025 at 16:13
auto label = ov::pass::pattern::any_input([config](const ov::Output<ov::Node>& out) {
const auto n = out.get_node_shared_ptr();
// Config-aware gating: optionally reject specific Transpose cases around MatMul
if (ov::is_type<ov::op::v1::Transpose>(n)) {
Contributor:
Let's move this check to the is_supported_transpose lambda in is_supported_op. We will need to change their signatures, but I think that is okay.

Contributor Author (@aobolensk):
Good idea, done
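For context, a rough sketch of what the agreed change could look like: the config-aware gating moves into an is_supported_transpose helper whose signature accepts the config. The actual signatures in collapse_subgraph.cpp are not shown in this excerpt, so the parameter list and the input/output classification below are approximations only:

#include "openvino/core/node.hpp"
#include "openvino/core/type.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/transpose.hpp"

// MatMulConfig flags as introduced by this PR (see the diff hunk below)
struct MatMulConfig {
    bool is_supported_transpose_a = true;
    bool is_supported_transpose_b = true;
    bool is_supported_transpose_c = true;
};

bool is_supported_transpose(const ov::Output<ov::Node>& out, const MatMulConfig& config) {
    const auto node = out.get_node_shared_ptr();
    if (!ov::is_type<ov::op::v1::Transpose>(node))
        return false;
    // Transpose after MatMul: its data input is produced by a MatMul
    if (ov::is_type<ov::op::v0::MatMul>(node->get_input_node_shared_ptr(0)) &&
        !config.is_supported_transpose_c)
        return false;
    // Transpose before MatMul: one of its consumers is a MatMul input 0/1
    for (const auto& target : node->get_output_target_inputs(0)) {
        if (!ov::is_type<ov::op::v0::MatMul>(target.get_node()))
            continue;
        if (target.get_index() == 0 && !config.is_supported_transpose_a)
            return false;
        if (target.get_index() == 1 && !config.is_supported_transpose_b)
            return false;
    }
    return true;  // the existing target-independent checks would follow here
}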

Comment on lines +20 to +25
// Transpose before MatMul on input 0
bool is_supported_transpose_a = true;
// Transpose before MatMul on input 1
bool is_supported_transpose_b = true;
// Transpose after MatMul on its output
bool is_supported_transpose_c = true;
Contributor:
I suppose plain boolean values are not enough here: not all types of transposes are actually supported, even on X86. Ideally, we need lambdas here that would replace is_supported_transpose in collapse_subgraph.cpp and is_valid_transpose in mha_tokenization.cpp.

Contributor Author (@aobolensk):
That's a good point to discuss. Right now, these flags are only used in is_supported_transpose. It seemed to me that this way we get less code duplication here and the logic looks simpler. What do you think?

Contributor:
But the original ideas of this PR are to avoid tokenizing unsupported transposes and to make the supported-transposes configuration device specific. The current behavior doesn't fully solve either problem:

  1. We still tokenize transposes in MHATokenization and then have to move them out of the Subgraph
  2. The supported-transposes checks in MHATokenization, which are X64-specific, are performed for ARM as well

Contributor Author (@aobolensk):
Disabled all transposes on ARM

// Disable Transpose after MatMul in general tokenization on ARM64 by default.
// Keep Transpose before MatMul inputs allowed for flexibility.
ov::snippets::pass::MatMulConfig mm_cfg;
mm_cfg.is_supported_transpose_c = false;
Contributor:
Shouldn't is_supported_transpose_a/is_supported_transpose_b be false as well on ARM?

Contributor Author (@aobolensk):
They are converted to GemmCopyB, actually. The output one is the one that is troublesome right now.

Contributor:
Not really:

  1. Transpose a is placed on the A input, not B, so we don't even insert GemmCopyB there
  2. Transpose b = true is not supported by GemmCopyB. Please see the ExplicitTransposeMatMulInputs callback for details

Contributor Author (@aobolensk):
Fair enough, fixed
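Given the exchange above, the fixed ARM64 setup presumably disables all three flags. A sketch only; the actual change lives in the PR diff:

// ARM64: neither input nor output transposes around MatMul can be tokenized for now
ov::snippets::pass::MatMulConfig mm_cfg;
mm_cfg.is_supported_transpose_a = false;
mm_cfg.is_supported_transpose_b = false;
mm_cfg.is_supported_transpose_c = false;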

@aobolensk (Contributor Author):
Decided to switch to an alternative approach. PR can be found here: #32676

@aobolensk closed this on Nov 4, 2025
github-merge-queue bot pushed a commit that referenced this pull request Nov 5, 2025
### Details:
Introduce a callback function for the `ExtractUnsupportedTransposes` pass as
part of `CommonOptimizations::Config` to customize the pass behavior
depending on Transpose support.

For example, the ARM64 platform supports transpose decomposition, but MatMul
with Transpose A/B is not supported there so far. The rest of the (potential)
platforms mark Transpose as not supported at all.

An alternative approach for #32592

### Tickets:
 - 176061
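A minimal sketch of the callback-based idea described above, under an assumed config shape. The type and member names below are illustrative, not the actual OpenVINO API; only the existence of a callback for `ExtractUnsupportedTransposes` inside `CommonOptimizations::Config` is stated in the commit message:

#include <functional>
#include <memory>
#include "openvino/core/node.hpp"

namespace ov::snippets::pass {

// Illustrative stand-in for CommonOptimizations::Config
struct CommonOptimizationsConfig {
    // Returns true if the given Transpose can stay inside the Subgraph on this device;
    // ExtractUnsupportedTransposes would move the remaining transposes out.
    std::function<bool(const std::shared_ptr<const ov::Node>&)> is_supported_transpose;
};

}  // namespace ov::snippets::pass

// ARM64 sketch: transpose decomposition is supported, but MatMul with Transpose A/B is not,
// so the callback could reject transposes adjacent to a MatMul (hypothetical helper):
//   config.is_supported_transpose = [](const std::shared_ptr<const ov::Node>& t) {
//       return !is_adjacent_to_matmul(t);  // hypothetical helper
//   };
// Other (potential) platforms would simply return false to mark Transpose as unsupported.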