SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend by s-Nick · Pull Request #10584 · ggml-org/llama.cpp

s-Nick · 2024-11-29T16:29:53Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

This patch moves the oneMKL Interface backend selection for gemm calls from run time to compile time for the NVIDIA backend, bringing improvements especially in text generation.
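To make the run-time versus compile-time distinction concrete, here is a toy C++ model. It does not use the real oneMKL Interface or SYCL headers: `queue`, `backend`, `backend_selector` and the `gemm` overloads below are simplified stand-ins written for this illustration, and the idea that run-time selection amounts to inspecting the queue on every call is an assumption of the sketch, not something stated in the patch.

```cpp
#include <cassert>
#include <string>

// Stand-in types: NOT the oneMKL or SYCL types, only a toy model.
enum class backend { mklgpu, cublas };

struct queue {
    backend device_backend;  // what a run-time query of the device would report
};

template <backend B>
struct backend_selector {
    queue & q;
};

// Toy "library entry points", one per backend.
inline std::string gemm_cublas(queue &) { return "cublas"; }
inline std::string gemm_mklgpu(queue &) { return "mklgpu"; }

// Run-time selection: the backend is looked up from the queue on every call.
inline std::string gemm(queue & q) {
    switch (q.device_backend) {
        case backend::cublas: return gemm_cublas(q);
        case backend::mklgpu: return gemm_mklgpu(q);
    }
    assert(false && "unknown backend");
    return {};
}

// Compile-time selection: the overload is fixed by the selector's type,
// so no per-call lookup is needed.
inline std::string gemm(backend_selector<backend::cublas> sel) { return gemm_cublas(sel.q); }
inline std::string gemm(backend_selector<backend::mklgpu> sel) { return gemm_mklgpu(sel.q); }
```

In this model, `gemm(q)` and `gemm(backend_selector<backend::cublas>{ q })` reach the same entry point, but only the first one branches on the device at run time.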

Tested on A100:
Current

| model | size | params | backend | ngl | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | none | pp512 | 705.51 ± 2.23 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | none | tg128 | 14.28 ± 0.05 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | pp512 | 5426.17 ± 29.64 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | tg128 | 81.36 ± 1.29 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | none | pp512 | 5592.87 ± 89.22 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | none | tg128 | 72.96 ± 0.91 |

build: 0f77aae (20)

With changes

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | pp512 | 720.68 ± 1.62 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | tg128 | 18.52 ± 0.07 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5489.17 ± 30.44 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 91.99 ± 0.05 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5439.13 ± 216.28 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 89.34 ± 0.05 |

build: ffd0a99 (4222)
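For readers who want the relative changes rather than raw t/s, the small C++ sketch below computes them from the mean values in the two tables above. The helper name `percent_change` and the `bench_row` struct are made up for this illustration, and the ± spreads are ignored. With these inputs the tg128 rows come out at roughly +29.7% (70B Q4_K), +13.1% (8B Q4_K) and +22.5% (8B Q8_0), while the pp512 rows change by only a few percent in either direction.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in throughput, in percent, going from `before` to `after`.
inline double percent_change(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

struct bench_row {
    const char * name;
    double before_tps;  // mean t/s from the "Current" table
    double after_tps;   // mean t/s from the "With changes" table
};

// Mean t/s values copied from the two tables above.
static const bench_row rows[] = {
    { "70B Q4_K - Small  pp512",  705.51,  720.68 },
    { "70B Q4_K - Small  tg128",   14.28,   18.52 },
    { "8B Q4_K - Medium  pp512", 5426.17, 5489.17 },
    { "8B Q4_K - Medium  tg128",   81.36,   91.99 },
    { "8B Q8_0           pp512", 5592.87, 5439.13 },
    { "8B Q8_0           tg128",   72.96,   89.34 },
};

// Print one line per benchmark row with its relative change.
inline void print_changes() {
    for (const bench_row & r : rows) {
        std::printf("%-26s %+6.1f%%\n", r.name, percent_change(r.before_tps, r.after_tps));
    }
}
```

Note that the two runs were taken from different builds and the second table reports an extra threads column, so the percentages are indicative only.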

… NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

s-Nick · 2024-11-29T16:30:52Z

@Alcpz Could you check it out?

Alcpz · 2024-11-29T16:34:13Z

@Rbiessy Feel free to give it a look, since you have experience working with oneMKL interface

NeoZhangJianyu

The same comment applies to the other updated calls.

NeoZhangJianyu · 2024-12-02T07:55:08Z

ggml/src/ggml-sycl/dpct/helper.hpp

            oneapi::mkl::blas::column_major::gemm(
-                q, a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
-                data_b, ldb, beta_value, data_c, ldc);
+#ifdef GGML_SYCL_NVIDIA


The macro makes the code hard to understand.
I suggest:

    #ifdef GGML_SYCL_NVIDIA
                oneapi::mkl::blas::column_major::gemm(
                    oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas>{ q },
                    a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
                    data_b, ldb, beta_value, data_c, ldc);
            }
    #else
                oneapi::mkl::blas::column_major::gemm(
                    q, a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
                    data_b, ldb, beta_value, data_c, ldc);
            }
    #endif

Rbiessy

If we start adding support for Intel GPUs as well, I think it would make more sense to have a helper function that returns either a backend_selector or a queue depending on the backend.
That would avoid duplicating the call to gemm, which I think is a risk.
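The helper-function idea can be sketched as follows. This is only an illustration under stated assumptions: the types are local stand-ins rather than the real oneMKL Interface or SYCL classes, and the names `get_onemkl_target` and `run_gemm` are hypothetical and do not come from the patch.

```cpp
#include <string>

// Stand-in types for illustration only; the real ones live in oneMKL / SYCL.
enum class backend { cublas };
struct queue {};
template <backend B>
struct backend_selector {
    queue & q;
};

// Toy gemm overloads that report which path was taken.
inline std::string gemm(queue &) { return "run-time dispatch"; }
inline std::string gemm(backend_selector<backend::cublas>) { return "compile-time cublas"; }

// Hypothetical helper (the name is an assumption): returns whatever should be
// passed as the first argument of gemm for the configured backend.
#ifdef GGML_SYCL_NVIDIA
inline backend_selector<backend::cublas> get_onemkl_target(queue & q) {
    return backend_selector<backend::cublas>{ q };
}
#else
inline queue & get_onemkl_target(queue & q) {
    return q;
}
#endif

// Single call site: the gemm call itself is written only once.
inline std::string run_gemm(queue & q) {
    return gemm(get_onemkl_target(q));
}
```

The preprocessor branch is confined to the helper, so adding another backend would mean adding one more branch there instead of another copy of every gemm call.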

NeoZhangJianyu

Please remember that the SYCL backend was originally created to support Intel GPUs. :)
Support for other vendors' GPUs was added later.
The default code path should be optimized for Intel GPUs.

It's OK to set a special queue for other vendors' GPUs.

s-Nick

Code updated for readability in f6e6fc4

NeoZhangJianyu · 2024-12-02T08:49:10Z

@s-Nick
I guess the same method could also help on Intel GPUs.
Is it possible to test it on an Intel GPU too, for example with oneapi::mkl::backend::mklgpu?

s-Nick · 2024-12-02T10:41:53Z

Thank you for your review @NeoZhangJianyu
Currently the Intel GPU implementation uses the closed-source oneMKL library directly, which neither has nor needs a backend_selector, so these changes aren't required or useful there.

Rbiessy

The oneMKL Interface changes look good to me.

NeoZhangJianyu · 2024-12-02T14:30:04Z

> Thank you for your review @NeoZhangJianyu
> Currently Intel GPU implementation uses oneMKL closed source library directly and it doesn't have nor need a backend_selector, therefore these changes aren't required or useful

OK, I see!

Alcpz

changes lgtm. Let's wait for the remaining thread to be resolved before merging.

…IDIA backend (ggml-org#10584)

* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend

  Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend.

  Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* Formatting
* Address PR comments to increase readibility

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

s-Nick added 2 commits November 29, 2024 13:48

[SYCL] Move to Compile Time backend selection on oneMKL Interface for…

a7e15b0

Formatting

ffd0a99

github-actions bot added the ggml and SYCL labels Nov 29, 2024

Alcpz requested review from NeoZhangJianyu and airMeng November 29, 2024 16:33

NeoZhangJianyu reviewed Dec 2, 2024

Rbiessy approved these changes Dec 2, 2024

Address PR comments to increase readibility

f6e6fc4

Alcpz approved these changes Dec 3, 2024

NeoZhangJianyu approved these changes Dec 4, 2024

NeoZhangJianyu merged commit 40c6d79 into ggml-org:master Dec 4, 2024

Rbiessy mentioned this pull request Dec 16, 2024

SYCL: Fixes for building SYCL backend for AMD GPUs #10851

Closed

NeoZhangJianyu merged 3 commits into ggml-org:master from s-Nick:onemkl_nvidia_CT
