UPSTREAM PR #17951: ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting by loci-dev · Pull Request #531 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-12T05:38:45Z

Mirrored from ggml-org/llama.cpp#17951

Changes included:

Add ggml_cpu_get_rvv_cnt() and RVV vector-length initialization.
Export RVV_CNT in CPU feature list.
Update ggml_repack_get_optimal_repack_type() to enable Q4_0 repack when ggml_cpu_has_riscv_v() is true and rvv_cnt >= QK4_0.
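
The selection condition in the last item can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `CpuFeatures`, `RepackKind`, and `select_q4_0_repack` are hypothetical names, and the sketch assumes `rvv_cnt` is the vector register length in bytes, so that the `QK4_0` threshold (32 in ggml) corresponds to 256-bit vectors.

```cpp
// Q4_0 block size in ggml (32 weights per block).
constexpr int QK4_0 = 32;

// Hypothetical stand-in for the repack layouts the real selector chooses between.
enum class RepackKind { None, Q4_0_8x8 };

// Hypothetical snapshot of the two values named in the PR description.
struct CpuFeatures {
    bool has_riscv_v; // what ggml_cpu_has_riscv_v() would report
    int  rvv_cnt;     // assumption: vector register length in bytes
};

// Enable the 8x8 Q4_0 repack only when RVV is present and wide enough.
RepackKind select_q4_0_repack(const CpuFeatures & cpu) {
    if (cpu.has_riscv_v && cpu.rvv_cnt >= QK4_0) {
        return RepackKind::Q4_0_8x8;
    }
    return RepackKind::None;
}
```

Under these assumptions, a core with 128-bit vectors (16 bytes) would not select the 8x8 layout, while a core with 256-bit vectors (32 bytes) would.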

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

loci-review · 2025-12-12T06:34:18Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #531

Overview

PR #531 introduces RISC-V vector extension (RVV) support for Q4_0 quantization repacking. The changes add runtime detection of RVV vector length and enable optimized 8x8 block processing when hardware supports vectors ≥256 bits. This is a platform-specific enhancement affecting 4 files with 33 additions and 1 deletion.

Code Changes Analysis

The implementation adds ggml_cpu_get_rvv_cnt() to query RISC-V vector register length at runtime, mirroring the existing ARM SVE pattern. The core modification updates ggml_repack_get_optimal_repack_type() in repack.cpp to include RISC-V in the platform selection logic alongside existing AVX2 and SVE paths. The changes are isolated to RISC-V-specific code paths with appropriate compilation guards, ensuring zero impact on x86-64 and ARM platforms.
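
A guarded runtime query of the vector register length might look like the sketch below. It is not the PR's implementation: `query_rvv_vlen_bytes` is a hypothetical name, and the sketch assumes a toolchain whose `riscv_vector.h` provides the `__riscv_vlenb()` intrinsic. On any other target the guard compiles the query out and the function returns 0, which illustrates how a compilation guard can leave x86-64 and ARM builds unaffected.

```cpp
#if defined(__riscv) && defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

// Hypothetical helper: RVV register length in bytes, or 0 when the build
// does not target RISC-V with the vector extension enabled.
int query_rvv_vlen_bytes() {
#if defined(__riscv) && defined(__riscv_v_intrinsic)
    // Assumption: __riscv_vlenb() is available and returns VLEN / 8.
    return static_cast<int>(__riscv_vlenb());
#else
    return 0;
#endif
}
```

A caller would typically read this value once during CPU feature initialization and cache it, rather than querying it on every use.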

Performance Impact

Inference Performance:
No functions in the critical inference path (llama_decode, llama_encode, llama_tokenize) were modified. The changes affect preprocessing during model loading, not the hot path execution. Therefore, tokens per second remains unchanged for all platforms.

Power Consumption:
Analysis shows negligible power consumption changes across all binaries:

libggml-cpu.so: -0.3% (116,901 nJ → 116,550 nJ)
libllama.so: 0.0% (195,495 nJ, no meaningful change)
All other binaries: 0.0% change

The 0.3% reduction in libggml-cpu.so corresponds to an absolute change of 351 nJ, which is within measurement noise and does not indicate actual power savings.
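
The libggml-cpu.so figures can be checked with a few lines of arithmetic. `percent_change` is a hypothetical helper written for this check; the two energy values are the ones reported in the list above.

```cpp
#include <cmath>

// Relative change from a baseline measurement to a new one, in percent.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// Reported energy for libggml-cpu.so, in nanojoules.
constexpr double kBeforeNj = 116901.0;
constexpr double kAfterNj  = 116550.0;
```

With these values, `kBeforeNj - kAfterNj` is 351 and `percent_change(kBeforeNj, kAfterNj)` is roughly -0.30, consistent with the reported -0.3%.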

RISC-V-Specific Impact:
For RISC-V platforms with RVV ≥256 bits, the repack optimization enables vectorized 8x8 block processing during model loading. This is a one-time preprocessing cost with no runtime inference impact. The optimization improves memory access patterns for subsequent Q4_0 matrix operations but does not affect the functions analyzed in previous performance reports (quantize_row_q4_K, quantize_row_q6_K, parameter setters).

Key Findings

No Impact on Analyzed Performance Metrics:
The 10 functions with the largest response-time changes identified in the prior analysis (ggml_vec_argmax_f32 +74 ns, quantize_row_q6_K +14 ns, parameter setters +11 to +21 ns) are unrelated to this PR. Those regressions stem from validation logic and quantization algorithm changes in the baseline comparison, not from the RVV support additions.

Platform Isolation:
All changes are conditionally compiled for RISC-V only. x86-64 and ARM code paths remain identical, confirmed by zero performance delta on non-RISC-V binaries.

Preprocessing vs Runtime:
The repack selection logic executes during model loading, not during token generation. The ggml_repack_get_optimal_repack_type() function determines data layout for subsequent operations but is not called in the inference loop.

loci-review · 2025-12-12T10:29:24Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #531

Version Comparison: 09fbc8c1 vs fd9769c0
Binary: build.bin.libggml-cpu.so

Analysis Classification: Condition 1

This PR introduces RISC-V RVV support for Q4_0 quantization without modifying core computational logic on x86_64 architecture. The observed performance variations are within measurement noise and do not represent functional changes to the inference pipeline.

Performance Metrics:

Power consumption change: -0.48% (-566 nJ) in libggml-cpu.so
All other binaries: 0.00% change
Largest absolute changes: 74 ns (ggml_vec_argmax_f32), 11 ns (parameter accessors)

Code Changes:

Added ggml_cpu_get_rvv_vlen() API for RISC-V vector length detection
Modified ggml_repack_get_optimal_repack_type() to enable Q4_0 8x8 repacking on RISC-V when vector length is sufficient
Changes are architecture-specific and conditionally compiled for RISC-V only

Inference Impact:
No functions in the tokenization or inference pipeline (llama_decode, llama_encode, llama_tokenize) were modified. The changes affect only RISC-V-specific feature detection and quantization path selection. On x86_64 systems, tokens per second remains unchanged.

ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting

f1df1fb

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

loci-dev temporarily deployed to PROD__AL_DEMO December 12, 2025 05:38 — with GitHub Actions Inactive

using the name VLEN instead of CNT

09d31ef

loci-dev had a problem deploying to PROD__AL_DEMO December 12, 2025 09:39 — with GitHub Actions Failure

loci-dev force-pushed the main branch 24 times, most recently from f70847d to 45e0e28, December 14, 2025 22:08

loci-dev force-pushed the main branch 30 times, most recently from 9f1f66d to ec69147, December 19, 2025 12:14

loci-dev wants to merge 2 commits into main from upstream-PR17951-branch_ixgbe-fix_riscv_q4_0_repack_selection
