SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend#10584
Conversation
… NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
|
@Alcpz Could you check it out? |
|
@Rbiessy Feel free to give it a look, since you have experience working with oneMKL interface |
NeoZhangJianyu
left a comment
There was a problem hiding this comment.
Same comments for other update.
| oneapi::mkl::blas::column_major::gemm( | ||
| q, a_trans, b_trans, m, n, k, alpha_value, data_a, lda, | ||
| data_b, ldb, beta_value, data_c, ldc); | ||
| #ifdef GGML_SYCL_NVIDIA |
There was a problem hiding this comment.
The macro make the code is hard to understand.
I suggest:
#ifdef GGML_SYCL_NVIDIA
oneapi::mkl::blas::column_major::gemm(
oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas>{ q },
a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
data_b, ldb, beta_value, data_c, ldc);
}
#else
oneapi::mkl::blas::column_major::gemm(
q,
a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
data_b, ldb, beta_value, data_c, ldc);
}
#endif
There was a problem hiding this comment.
If we start adding support for Intel GPU as well I think it would make more sense to have a helper function that returns either a backend_selector or a queue based on the backend.
It would avoid duplicating the call to gemm which I think is a risk.
There was a problem hiding this comment.
Please remember, the SYCL backend is initiated to support Intel GPU. :)
Support more vendor GPUs is added later.
The default code path should be optimized for Intel GPU.
It's OK to set special queue for other vendor GPUs.
|
@s-Nick |
|
Thank you for your review @NeoZhangJianyu |
Rbiessy
left a comment
There was a problem hiding this comment.
The oneMKL Interface changes look good to me.
OK, I see! |
Alcpz
left a comment
There was a problem hiding this comment.
changes lgtm. Let's wait for the remaining thread to be resolved before merging.
…IDIA backend (ggml-org#10584) * [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <nicolo.scipione@codeplay.com> * Formatting * Address PR comments to increase readibility --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
This patch move oneMKL interface calls to gemm from real time to compile time for NVIDIA backend, bringing improvements specially in text generation.
Tested on A100:
Current
build: 0f77aae (20)
With changes
build: ffd0a99 (4222)