
UPSTREAM PR #17244: vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths#194

Open
DajanaV wants to merge 2 commits into main from upstream-PR17244-branch_jeffbolznv-mul_mat_vec_subbuffer

Conversation


@DajanaV (Collaborator) commented Nov 13, 2025

Mirrored from ggml-org/llama.cpp#17244


loci-review bot commented Nov 13, 2025

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of version ad307b9e-2eb6-4f60-a555-6db11810d370 compared to baseline bbaaf630-901f-4d9a-a608-cba4e19ac3bc reveals minimal performance variations with no meaningful changes to core inference functions.

Key Findings

Performance Metrics:

  • Highest Response Time change: std::vector<llm_bigram_spm>::pop_back() (+0.10%, 67 ns)
  • Highest Throughput change: llama_context::clear_adapter_lora() (+0.13%, 47 ns)
  • Both functions show measurement-level variations rather than functional changes

Core Function Impact:
No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize). The affected functions are peripheral utilities not involved in the primary tokenization/inference pipeline. Therefore, no impact on tokens per second performance is expected.

Power Consumption Analysis:
Negligible changes across all binaries:

  • build.bin.libllama.so: -0.0002% (280,855 nJ → 280,855 nJ; the delta is below the displayed rounding precision)
  • build.bin.llama-run: -0.0001% (282,849 nJ → 282,848 nJ)
  • All other binaries show zero measurable change

Flame Graph & CFG Analysis:
The std::vector<llm_bigram_spm>::pop_back() function exhibits identical assembly code between versions (20 instructions, 66 ns execution time). The 0.06 ns timing difference represents microarchitectural variations rather than code changes, confirming measurement noise.
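As a rough illustration of why a +0.10% shift on a ~67 ns function is classified as noise rather than a real change, the check can be sketched as a relative-delta comparison. This is illustrative only: the 0.5% threshold is an assumption for the sketch, not a value taken from the analysis.

```cpp
#include <cmath>

// Classify a timing delta as measurement noise when the relative change
// stays below a chosen threshold (0.5% here is an assumed cutoff).
static bool is_measurement_noise(double baseline_ns, double current_ns,
                                 double threshold = 0.005) {
    if (baseline_ns == 0.0) {
        return false; // cannot compute a relative change from a zero baseline
    }
    return std::fabs(current_ns - baseline_ns) / baseline_ns < threshold;
}

// Usage with the pop_back() figures above (~67 ns, +0.10%):
//   is_measurement_noise(66.93, 67.0)  -> treated as noise
//   is_measurement_noise(100.0, 110.0) -> treated as a real change
```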

GitHub Code Review:
PR #194 implements Vulkan backend improvements for matrix-vector operations, introducing centralized buffer management through ggml_vk_tensor_subbuffer(). The changes consolidate 411 lines into 220 lines while maintaining identical computational behavior. No functional regressions identified.
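The kind of consolidation described can be sketched as follows. This is a hypothetical illustration, not the actual llama.cpp code: the `vk_buffer`, `tensor_view`, and `tensor_subbuffer()` names and signatures are stand-ins for the real `ggml_vk_tensor_subbuffer()`, which this sketch only approximates. The idea is that each dispatch path receives one `{buffer, offset, size}` descriptor per tensor instead of repeating the offset arithmetic inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Opaque stand-in for a Vulkan device buffer allocation.
struct vk_buffer {
    uint64_t handle;      // device buffer handle
    size_t   base_offset; // offset of this allocation within device memory
};

// One descriptor per tensor: everything a dispatch needs to bind it.
struct vk_subbuffer {
    vk_buffer buf;    // underlying device buffer
    size_t    offset; // byte offset of the tensor within the buffer
    size_t    size;   // byte size of the tensor's data
};

// Stand-in for the tensor fields the real helper would read.
struct tensor_view {
    vk_buffer *device_buf; // buffer backing this tensor
    size_t     view_offs;  // tensor's offset inside the buffer
    size_t     nbytes;     // total bytes of tensor data
};

// Assumed shape of the helper: derive one subbuffer descriptor per tensor,
// centralizing the offset math that the mul_mat_vec paths used to repeat.
static vk_subbuffer tensor_subbuffer(const tensor_view &t) {
    assert(t.device_buf != nullptr);
    return { *t.device_buf,
             t.device_buf->base_offset + t.view_offs,
             t.nbytes };
}
```

Centralizing the descriptor construction this way is what lets the dispatch code shrink: each caller binds the returned subbuffer instead of recomputing buffer, offset, and size by hand.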

Conclusion:
The analysis reveals a stable codebase with only statistical measurement variations. The Vulkan improvements enhance code maintainability without affecting performance. No actionable recommendations are required as no verifiable issues or performance regressions were identified.

@DajanaV DajanaV force-pushed the main branch 26 times, most recently from 4fb52c0 to 88cd3fd Compare November 16, 2025 22:07
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 53eeb3f to 2531f8a Compare November 26, 2025 08:11