sycl: addressing non-contiguous src1 mul_mats (nc and batched) by Alcpz · Pull Request #13343 · ggml-org/llama.cpp

Alcpz · 2025-05-06T17:09:45Z

#13308 disabled mul_mat for non-contiguous matrices because it was producing wrong results.

This PR fixes the non-contiguous mul_mats, based on #13155.
I had to disable the ggml_sycl_mul_mat_vec_nc path because it does not handle non-contiguous src1.
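
For readers unfamiliar with the term, here is a minimal sketch of what "non-contiguous src1" means, assuming a simplified 4-D view with strides counted in elements. `TensorView`, `is_contiguous`, and `to_contiguous` are hypothetical names used only for illustration; they are not the ggml or SYCL backend API (ggml itself stores byte strides in `nb[]`), and this is not the kernel code changed in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 4-D tensor view; NOT the real ggml_tensor.
struct TensorView {
    const float *          data;
    std::array<int64_t, 4> ne;  // number of elements per dimension
    std::array<int64_t, 4> s;   // stride per dimension, in elements (assumption: ggml uses bytes)
};

// True when the elements are densely packed, dimension 0 varying fastest.
bool is_contiguous(const TensorView & t) {
    int64_t expected = 1;
    for (int i = 0; i < 4; ++i) {
        if (t.s[i] != expected) {
            return false;
        }
        expected *= t.ne[i];
    }
    return true;
}

// Gather a strided view into a densely packed buffer by honouring every stride.
// Code that assumes packed rows would read the wrong elements for such a view.
std::vector<float> to_contiguous(const TensorView & t) {
    assert(t.data != nullptr);
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(t.ne[0] * t.ne[1] * t.ne[2] * t.ne[3]));
    for (int64_t i3 = 0; i3 < t.ne[3]; ++i3) {
        for (int64_t i2 = 0; i2 < t.ne[2]; ++i2) {
            for (int64_t i1 = 0; i1 < t.ne[1]; ++i1) {
                for (int64_t i0 = 0; i0 < t.ne[0]; ++i0) {
                    out.push_back(t.data[i0 * t.s[0] + i1 * t.s[1] + i2 * t.s[2] + i3 * t.s[3]]);
                }
            }
        }
    }
    return out;
}
```

A transposed or permuted view is the typical case: the data is unchanged, only `ne` and the strides are swapped, so any code path that ignores the strides silently computes on the wrong values.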

| model / test | size | params | backend | ngl | sm | t/s (`27aa259`) | t/s (`5215b91`) | t/s (`e3177917`) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 · pp512 | 1013.62 MiB | 1.78 B | SYCL | 99 | none | 6309.62 ± 33.92 | 2381.98 ± 33.28 | 6011.69 ± 14.02 |
| qwen2 1.5B Q4_0 · tg128 | 1013.62 MiB | 1.78 B | SYCL | 99 | none | 105.86 ± 0.50 | 102.06 ± 2.82 | 102.55 ± 0.71 |
| llama 7B Q4_0 · pp512 | 3.57 GiB | 6.74 B | SYCL | 99 | none | 1649.66 ± 2.95 | 492.51 ± 16.00 | 1622.15 ± 2.65 |
| llama 7B Q4_0 · tg128 | 3.57 GiB | 6.74 B | SYCL | 99 | none | 40.79 ± 0.28 | 40.76 ± 0.16 | 40.39 ± 0.14 |
| phi3 3B Q4_0 · pp512 | 2.03 GiB | 3.82 B | SYCL | 99 | none | 2465.55 ± 5.34 | 588.31 ± 10.27 | 2404.85 ± 4.10 |
| phi3 3B Q4_0 · tg128 | 2.03 GiB | 3.82 B | SYCL | 99 | none | 61.67 ± 0.70 | 62.00 ± 0.62 | 62.50 ± 0.64 |

`27aa259`: broken mul_mat performance (before disabling non-contiguous src1)
`5215b91`: the current performance
`e3177917`: this PR

We have lost a little bit of performance due to disabling ggml_sycl_mul_mat_vec_nc, but I think it's better to rethink the whole mul_mat dispatch and refactor matrix multiplications.

I'd prefer to address non-critical concerns in a different PR due to the massive performance regression we have (assuming I didn't mess up something along the way).

I am not entirely sure whether the performance loss in qwen2 tg128 is noise or caused by this PR.

ggml/src/ggml-sycl/convert.cpp

s-Nick

LGTM

sycl: fixed non-contiguous src1 mul_mats (nc and batched)

31eec7c

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on May 6, 2025

s-Nick reviewed May 7, 2025


ggml/src/ggml-sycl/convert.cpp (outdated, resolved)

ShanoToni approved these changes May 7, 2025


ggml/src/ggml-sycl/convert.cpp (outdated, resolved)

Fixed wrong static_cast inside kernel

c0d42d5

s-Nick approved these changes May 7, 2025


Alcpz merged commit 8733e0c into ggml-org:master May 8, 2025
46 checks passed

Alcpz mentioned this pull request May 12, 2025

sycl: use oneDNN for matrices multiplication #12972

Merged

Alcpz deleted the Alcpz/non-cont-batched-mul_mat branch November 27, 2025 12:05

Alcpz merged 2 commits into ggml-org:master from Alcpz:Alcpz/non-cont-batched-mul_mat


3 participants
