ggml: aarch64: Implement SVE F16 kernels for vector functions #15115
Merged
CISC merged 7 commits into ggml-org:master on Sep 1, 2025
Conversation
compilade
reviewed
Aug 6, 2025
Contributor
@ggerganov, requesting you to kindly review the PR and support the merge.
Contributor
Author
@ggerganov, @compilade, please review this PR.
ggerganov
approved these changes
Aug 28, 2025
Member
ggerganov
left a comment
Same comment as in #15057 (review). We don't even have CI hardware to test these changes, so it's difficult to approve these.
Let's merge after you fix the editor config errors.
walidbr
pushed a commit
to walidbr/llama.cpp
that referenced
this pull request
Sep 7, 2025
…rg#15115)

* Added sve implementation for vec_dot_fp16 Kernel
* removed white spaces
* Added comment
* removed white spaces
* changed GGML_F16x_VEC_FMA for code consistency
* Update vec.h

Co-authored-by: vithulep <[email protected]>
Nexesenex
added a commit
to Nexesenex/croco.cpp
that referenced
this pull request
Oct 7, 2025
…ggml-org#15115)" This reverts commit a0c2b20.
Nexesenex
added a commit
to Nexesenex/croco.cpp
that referenced
this pull request
Oct 26, 2025
…ggml-org#15115)" This reverts commit a0c2b20.
blime4
referenced
this pull request
in blime4/llama.cpp
Feb 5, 2026
* Added sve implementation for vec_dot_fp16 Kernel
* removed white spaces
* Added comment
* removed white spaces
* changed GGML_F16x_VEC_FMA for code consistency
* Update vec.h

Co-authored-by: vithulep <[email protected]>
This PR adds SVE kernel support for the F16 data type (the ggml_vec_dot_f16() kernel) to reduce the time required for image encoding during multimodal (LMM) model inference (llava-v1.6-mistral) on ARM architecture.
Major code changes:
- In vec.cpp
- In vec.h
- In simd-mappings.h
Performance: Graviton3E
On Graviton3E with different thread counts, image-encoding time for multimodal (LMM) inference improved by 5-15%.
Model: llava-v1.6-mistral-7b.Q4_K_M
Machine: Graviton3E
Command Used:
Perplexity
I ran perplexity with the NEON (original) and SVE (this PR) implementations; a summary is below.
This change does not appear to have any impact on accuracy.