ggml-cpu : add basic RVV support for vector f32 ops by xctan · Pull Request #15057 · ggml-org/llama.cpp

xctan · 2025-08-03T16:55:03Z

This PR introduces RVV (RISC-V Vector extension) support for several f32 vector kernels.

The implementation required refactoring the vectorization logic. Because RVV has a flexible vector length, its intrinsic types are sizeless, so they cannot be used as array elements and the compiler cannot create arrays of vector registers (Arm's SVE has the same limitation). The existing loop unrolling scheme, which keeps several vector registers in such an array, therefore does not work, and the code had to be rewritten to fit RVV's architecture.
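A minimal sketch of the loop shape that vector-length-agnostic code needs (strip-mining), written in portable C so it runs on any CPU. `sketch_vlmax` and `sketch_vsetvl` are hypothetical stand-ins: on real RVV hardware the granted length would come from an intrinsic such as `__riscv_vsetvl_e32m8`, and the inner loop would be a single vector load, fused multiply-add, and store. The fixed-width alternative, something like `vfloat32m1_t acc[4];`, is rejected by the compiler because the type is sizeless, so RVV code usually makes each instruction cover more data through a larger register group (LMUL, e.g. `vfloat32m8_t`) instead of unrolling over an array. This illustrates the general technique, not the code merged in this PR.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware vector length. On RVV it depends on
 * the implementation's VLEN and is only known at run time, which is why the
 * intrinsic vector types have no compile-time size. */
static size_t sketch_vlmax = 8;

/* Hypothetical, simplified emulation of vsetvl: grant at most sketch_vlmax
 * elements out of the avl ("application vector length") still to process.
 * (The RVV spec also allows a smaller grant when avl < 2 * VLMAX; the loop
 * below is correct for any nonzero grant.) */
static size_t sketch_vsetvl(size_t avl) {
    return avl < sketch_vlmax ? avl : sketch_vlmax;
}

/* y[i] += x[i] * v, processed in strips whose width is chosen at run time.
 * No array of vector registers and no separate scalar tail loop are needed:
 * the final strip is simply shorter. */
static void sketch_vec_mad_f32_vla(size_t n, float *y, const float *x, float v) {
    size_t i = 0;
    while (i < n) {
        size_t vl = sketch_vsetvl(n - i);
        /* On RVV this inner loop would be one vector load of x and y,
         * one fused multiply-add, and one vector store, all with length vl. */
        for (size_t k = 0; k < vl; ++k) {
            y[i + k] += x[i + k] * v;
        }
        i += vl;
    }
}
```

Because the remainder is absorbed into the last strip, the same loop works unchanged for whatever vector length the hardware reports.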

ggerganov

A bit hesitant about these changes, because the GGML_SIMD abstraction did a good job of unifying the various SIMD implementations in a simple way. Now, with SVE and RVV not fitting into GGML_SIMD, we add extra branches that complicate the code, while AFAIK SVE and RVV hardware is not very widespread today. Hence the maintenance effort grows quite disproportionately to the benefits.

Would be nice to think about some refactoring here to make the implementation more manageable in the future.
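For context, the fixed-width pattern that GGML_SIMD unifies can be sketched as below. Everything named `SKETCH_*` or `sketch_*` is hypothetical and only loosely modeled on the shape of ggml's f32 kernels; a 4-lane vector is emulated with a struct so the sketch runs on any CPU, whereas a real backend would map the same helpers onto AVX, NEON, and so on. Two assumptions are baked in: the elements-per-register count is a compile-time constant, and the vector type has a size, so `ax[SKETCH_F32_ARR]` can hold several registers for unrolling. Sizeless SVE and RVV types satisfy neither.

```c
#include <stddef.h>

/* Hypothetical "backend": a 4-lane f32 vector emulated with a struct. */
typedef struct { float lane[4]; } sketch_f32x4;

#define SKETCH_F32_EPR  4   /* elements per register: fixed at compile time */
#define SKETCH_F32_ARR  4   /* registers per unrolled step */
#define SKETCH_F32_STEP (SKETCH_F32_EPR * SKETCH_F32_ARR)  /* must be a power of two */

static sketch_f32x4 sketch_load(const float *p) {
    sketch_f32x4 r;
    for (int k = 0; k < SKETCH_F32_EPR; ++k) r.lane[k] = p[k];
    return r;
}

static void sketch_store(float *p, sketch_f32x4 a) {
    for (int k = 0; k < SKETCH_F32_EPR; ++k) p[k] = a.lane[k];
}

static sketch_f32x4 sketch_set1(float s) {
    sketch_f32x4 r;
    for (int k = 0; k < SKETCH_F32_EPR; ++k) r.lane[k] = s;
    return r;
}

/* acc + a * b, lane by lane */
static sketch_f32x4 sketch_fma(sketch_f32x4 acc, sketch_f32x4 a, sketch_f32x4 b) {
    for (int k = 0; k < SKETCH_F32_EPR; ++k) acc.lane[k] += a.lane[k] * b.lane[k];
    return acc;
}

/* y[i] += x[i] * v, written once against the helpers above. */
static void sketch_vec_mad_f32_fixed(int n, float *y, const float *x, float v) {
    const int np = n & ~(SKETCH_F32_STEP - 1);  /* largest multiple of STEP <= n */
    const sketch_f32x4 vx = sketch_set1(v);
    sketch_f32x4 ax[SKETCH_F32_ARR];  /* arrays of registers: need a sized vector type */
    sketch_f32x4 ay[SKETCH_F32_ARR];

    for (int i = 0; i < np; i += SKETCH_F32_STEP) {
        for (int j = 0; j < SKETCH_F32_ARR; ++j) {
            ax[j] = sketch_load(x + i + j * SKETCH_F32_EPR);
            ay[j] = sketch_load(y + i + j * SKETCH_F32_EPR);
            ay[j] = sketch_fma(ay[j], ax[j], vx);
            sketch_store(y + i + j * SKETCH_F32_EPR, ay[j]);
        }
    }
    /* scalar tail for the leftover n - np elements */
    for (int i = np; i < n; ++i) {
        y[i] += x[i] * v;
    }
}
```

Supporting another fixed-width ISA in this style means redefining only the vector type and the helpers; the kernel body stays untouched, which is the simplicity the comment above refers to.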

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Aug 3, 2025

xctan requested a review from ggerganov August 3, 2025 18:05

ggml-cpu : add basic RVV support for vector f32 ops

2231433

xctan force-pushed the rvv-vec-h branch from 835397f to 960c11e on August 26, 2025 10:34

ggml-cpu : add RVV support for f32 softmax

7d30c42

xctan force-pushed the rvv-vec-h branch from 960c11e to 7d30c42 on August 26, 2025 11:27

ggerganov approved these changes Aug 27, 2025

xctan merged commit 1cf123a into ggml-org:master Aug 27, 2025
47 of 48 checks passed

ggerganov mentioned this pull request Aug 28, 2025

ggml: aarch64: Implement SVE F16 kernels for vector functions #15115 (Merged)

Minh141120 pushed a commit to janhq/llama.cpp that referenced this pull request Aug 29, 2025

ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)

dbd8ebc

* ggml-cpu : add basic RVV support for vector f32 ops
* ggml-cpu : add RVV support for f32 softmax

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 6, 2025

Revert "ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)"

a03e091

This was referenced Nov 12, 2025

[RISC-V] Extend support for RVV floating-point kernels riseproject-dev/llama.cpp#1 (Merged)

ggml-cpu: extend support for RVV floating-point kernels #17318 (Merged)

This was referenced Nov 20, 2025

UPSTREAM PR #17318: ggml-cpu: extend support for RVV floating-point kernels auroralabs-loci/llama.cpp#264 (Closed)

UPSTREAM PR #17318: ggml-cpu: extend support for RVV floating-point kernels auroralabs-loci/llama.cpp#318 (Open)

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

ggml-cpu : add basic RVV support for vector f32 ops (#15057)

e01bfb9

* ggml-cpu : add basic RVV support for vector f32 ops
* ggml-cpu : add RVV support for f32 softmax

xctan merged 2 commits into ggml-org:master from xctan:rvv-vec-h
