sycl: addressing non-contiguous src1 mul_mats (nc and batched)#13343
Merged
Alcpz merged 2 commits intoggml-org:masterfrom May 8, 2025
Merged
sycl: addressing non-contiguous src1 mul_mats (nc and batched)#13343Alcpz merged 2 commits intoggml-org:masterfrom
Alcpz merged 2 commits intoggml-org:masterfrom
Conversation
s-Nick
reviewed
May 7, 2025
ShanoToni
approved these changes
May 7, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
#13308 disabled the mul_mats for non-contiguous matrices since we were producing wrong results.
This PR fixes the non-contiguous mulmats based on #13155.
I had to disable the path for
ggml_sycl_mul_mat_vec_ncsince that doesn't deal with non-contiguous src1.27aa259 broken mul_mat performance (Before disabling non-contiguous src1)
5215b91 the current performance
e3177917 this PR
We have lost a little bit of performance due to disabling
ggml_sycl_mul_mat_vec_nc, but I think it's better to rethink the whole mul_mat dispatch and refactor matrix multiplications.I'd prefer to address non-critical concerns in a different PR due to the massive performance regression we have (assuming I didn't mess up something along the way).
I am not entirely sure if the performance loss in qwen2 tg128 is noise or caused by this PR