UPSTREAM PR #17461: ggml: add RISC-V cpu-feats #301

Open
loci-dev wants to merge 1 commit into main from upstream-PR17461-branch_ixgbe-add_riscv_cpu_feats

@loci-dev

Mirrored from ggml-org/llama.cpp#17461

This PR introduces the CPU features detection for the RISC-V platform and allows for dynamic backend loading when compiled with -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON.

1. Build this PR using:

cmake -B build -DLLAMA_CURL=OFF -DCMAKE_BUILD_TYPE=Release -DGGML_OPENMP=OFF -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON -DGGML_RV_ZICBOP=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON
cmake --build build --config Release -j $(nproc)

2. Check that two libggml-cpu*.so files were built:

ls -la build/bin | grep libggml-cpu-
-rwxr-xr-x  1 root root  499184 Nov 21 16:59 libggml-cpu-riscv64_0.so
-rwxr-xr-x  1 root root  552544 Nov 21 17:00 libggml-cpu-riscv64_v.so

3. Run a test prompt and report which library is loaded:

build/bin/llama-cli -m Qwen3-0.6B-Q4_K_M.gguf -no-cnv --seed 42 -n 50 -p "Write me a dog walking business idea 1. " 2>&1 | less

Please paste the first few lines of the output. The top of the log should look something like the following, and the prompt should run to completion without errors.

load_backend: loaded CPU backend from /home/yangwang/llama.cpp/build/bin/libggml-cpu-riscv64_v.so
build: 7083 (2376b7758) with cc (Bianbu 14.2.0-4ubuntu2~24.04bb1) 14.2.0 for riscv64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 32 key-value pairs and 310 tensors from Qwen3-0.6B-Q4_K_M.gguf (version GGUF V3 (latest))
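
To report which variant was picked without reading the log by eye, the `load_backend:` line can be parsed mechanically. The following is a small sketch, assuming the log line keeps the shape shown above; the helper name `cpu_variant_from_log` is hypothetical and is not part of llama.cpp.

```cpp
#include <cassert>   // used only by the checks in the usage note
#include <string>

// Hypothetical helper: extracts the variant suffix (e.g. "riscv64_v") from a
// line such as
//   load_backend: loaded CPU backend from /path/to/libggml-cpu-riscv64_v.so
// Returns an empty string when the line is not a matching load_backend line.
inline std::string cpu_variant_from_log(const std::string & line) {
    static const std::string tag    = "load_backend:";
    static const std::string prefix = "libggml-cpu-";
    static const std::string suffix = ".so";

    // The line must start with the load_backend tag.
    if (line.compare(0, tag.size(), tag) != 0) {
        return "";
    }
    std::size_t start = line.find(prefix);
    if (start == std::string::npos) {
        return "";
    }
    start += prefix.size();
    const std::size_t end = line.find(suffix, start);
    if (end == std::string::npos) {
        return "";
    }
    return line.substr(start, end - start);
}
```

Applied to the first line of the log above, the helper yields `riscv64_v`; for lines such as `main: llama backend init` it yields an empty string.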

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
@loci-review

loci-review bot commented Nov 24, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #301 - RISC-V CPU Features Detection

Assessment

This PR introduces RISC-V CPU feature detection infrastructure without modifying core inference logic. Performance analysis shows no measurable impact across all binaries, with power consumption changes below 0.001% (about 1 nJ absolute delta at most). No performance-critical functions were modified.

Change Overview

The PR adds three components:

  1. New file: ggml/src/ggml-cpu/arch/riscv/cpu-feats.cpp - Runtime detection of RISC-V Vector (RVV) extensions using Linux auxiliary vectors
  2. Build system updates: CMake configuration for multi-variant RISC-V backend compilation (baseline and RVV-optimized)
  3. Backend scoring: Dynamic selection mechanism to load optimal backend variant at runtime

These changes are confined to build infrastructure and backend selection at load time. Model loading, tokenization, batch processing, and inference paths are not modified.
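
The runtime detection in item 1 can be sketched as follows. This is a minimal illustration and not the PR's actual code: it assumes the kernel reports single-letter ISA extensions in `AT_HWCAP` with one bit per letter (so `V` is bit 21), and the names `isa_ext_bit`, `hwcap_has_ext`, and `riscv_has_rvv` are hypothetical.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__riscv)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif

// Linux on RISC-V encodes single-letter ISA extensions in AT_HWCAP as one bit
// per letter: bit 0 is 'A', bit 1 is 'B', ..., bit 21 is 'V'.
inline unsigned long isa_ext_bit(char letter) {
    assert(letter >= 'A' && letter <= 'Z');
    return 1UL << (letter - 'A');
}

// Pure helper, so the bit test can be exercised without RISC-V hardware.
inline bool hwcap_has_ext(unsigned long hwcap, char letter) {
    return (hwcap & isa_ext_bit(letter)) != 0;
}

// Hypothetical entry point: true only when running on Linux/RISC-V and the
// kernel advertises the vector extension. Always false on other platforms.
inline bool riscv_has_rvv() {
#if defined(__linux__) && defined(__riscv)
    return hwcap_has_ext(getauxval(AT_HWCAP), 'V');
#else
    return false;
#endif
}
```

Keeping the bit test separate from the `getauxval` call means the logic can be checked on any host, while the platform guard keeps the file buildable where `AT_HWCAP` has a different meaning.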

Performance Metrics

Power Consumption (Binary-Level):

  • libllama.so: -0.19 nJ (0.0% change)
  • llama-cvector-generator: +1.02 nJ (0.0% change)
  • All other binaries: No measurable change

Function-Level Analysis:

  • No functions with Response Time or Throughput changes detected
  • Core inference functions (llama_decode, llama_encode, llama_tokenize) unmodified
  • Tokens per second impact: None - inference pipeline unchanged

Flame Graph & CFG Analysis:

  • Not applicable - no function implementations modified
  • Changes limited to build configuration and feature detection initialization

Code Review Findings

Strengths:

  • Follows established ARM/x86 architecture patterns
  • Maintains backward compatibility with single-variant builds
  • Proper platform validation (Linux-only for RISC-V)
  • Clean separation between baseline (rv64gc) and optimized (RVV) variants

Implementation Quality:

  • Feature detection uses standard Linux APIs (getauxval)
  • Backend scoring prevents loading incompatible variants
  • Build system correctly propagates feature flags
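
The scoring idea noted above can be illustrated with a short sketch: a variant that needs a feature the CPU lacks scores 0 and is never chosen, and otherwise the highest score wins. All types and names here (`CpuFeatures`, `Variant`, `score_variant`, `pick_best`) and the score weights are assumptions made for illustration, not the ggml API.

```cpp
#include <cassert>   // used only by the checks in the usage note
#include <string>
#include <vector>

// Hypothetical feature set reported by runtime detection.
struct CpuFeatures {
    bool has_rvv = false;
};

// Hypothetical description of one compiled backend variant.
struct Variant {
    std::string name;       // e.g. "riscv64_0" or "riscv64_v"
    bool        needs_rvv;  // built with the vector extension enabled
};

// Score 0 means "must not be loaded on this CPU"; higher means preferred.
inline int score_variant(const Variant & v, const CpuFeatures & cpu) {
    int score = 1;  // every compatible variant gets a baseline score
    if (v.needs_rvv) {
        if (!cpu.has_rvv) {
            return 0;  // would execute unsupported instructions
        }
        score += 2;    // prefer the vector build when it is usable
    }
    return score;
}

// Returns the best compatible variant, or nullptr if none is compatible.
inline const Variant * pick_best(const std::vector<Variant> & variants,
                                 const CpuFeatures & cpu) {
    const Variant * best = nullptr;
    int best_score = 0;
    for (const Variant & v : variants) {
        const int s = score_variant(v, cpu);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

With the two variants from this PR, a CPU reporting RVV would select `riscv64_v` under this sketch, and a CPU without it would fall back to `riscv64_0`.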

Conclusion

This PR establishes a foundation for RISC-V optimization without affecting current performance. The infrastructure enables future performance gains (a projected 4-8x on vector operations) once RVV-optimized code paths run on compatible hardware. No action is required for existing deployments.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from a89c6ad to ad5ad9a Compare November 27, 2025 14:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 38683c7 to fa6cdcc Compare December 3, 2025 09:11