Get Rows & Dequantize Function Adapted to repacked weights of type q4_K by swetha097 · Pull Request #4 · swetha097/whisper.cpp

swetha097 · 2025-06-20T06:32:04Z

No description provided.

…ghts of type q4_0

…r.cpp into swe_pr/get_rows

…cked weights of type q4_K

…ghts of type q4_0

…cked weights of type q4_K

…hisper.cpp into swe_pr/get_rows_q4_K

…g#3268) * android : update CMakeLists.txt to use FetchContent for ggml This commit updates the CMakeLists.txt file for the Android Whisper example to use FetchContent for managing the ggml library. The motivation for this change is avoid having to make manual changes to the CMakeLists.txt file after syncing the ggml library. I've built and run the example locally to verify that it works as expected. Refs: ggml-org#3265 (comment) * android.java : update cmake to use FetchContent for ggml This commit updates the CMake configuration for the Android Java example to use `FetchContent` for including the `ggml` library. Do be able to use FetchContent we also update the `compileSdkVersion` and `targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'. This also required a an update to the Gradle plugin version to 7.4.0. The motivation for this change is avoid having to make manual changes to the CMakeLists.txt file after syncing the ggml library.

* ggml : add ggml_roll * use set/get_op_params & std::min

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* metal : add mean kernel ggml-ci * cont : dedup implementation ggml-ci

…fallback to CPU buffer (llama/14249)

Addresses unused reorder path

* Change _contains_any() substrs to std::string_view and fix the find comparison logic.

Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.

* CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const

ggml-ci

…/14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread

* Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

…g (llama/14254) * Move profiling info into `ggml_backend_opencl_context` * Add `enqueue_ndrange_kernel` to launch kernel

…a/13973)

…ma/14381)

* ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1) * ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929) * ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627) * ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f) * docs: update s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7) * ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: disable fp32->fp16 nnpa conversions for now there are some conversion failures in nnpa that requires the eyes of an ibm stsm. will create a separate pr to introduce the fp32->fp16 change. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to elif macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix compiler types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: change to typedef vector types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add 4 element loops for fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarified vector naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back fp32->fp16 store nnpa Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add nnpa macro check in ggml-impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add missing __func__ Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: diagnose why __NNPA__ macro is not being defined Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: import vecintrin.h to fix compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update macro tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit 157f856c34589566151630e294563a420702db39. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to importing ggml-cpu-impl instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix macro declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test more macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add debug prints Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bruteforce macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h to cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to private macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 157f856c34589566151630e294563a420702db39) * ggml-cpu: move things around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to quotes for import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add compiler error macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add s390x detection in ggml-src Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: undo cmakelists work Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedefs.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedef from cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h future notes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add todo comment for future reference Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify naming of dlf16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove unnecessary target compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update broken huggingface link for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix duplicate func names during compile Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: fix duplicate func names during compile" This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu" This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp16<->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h import in quants.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h within repack Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix amx mmq missing simd-mappings.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: attempt at fixing loongarch failing build Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa together with other fp16<->fp32 simd Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix wrong refactor of ggml-base ref: ggml-org/llama.cpp#14317 (comment) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove dependency on ggml-cpu from ggml-base Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu ref: ggml-org/llama.cpp#14317 (comment) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove mistaken fallback macro fallback logic was already implemented but i was too sleepy to realise Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: ggml-org/llama.cpp#14317 (comment) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures" This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: move ggml_table_f32_f16 to ggml-cpu" This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: ggml-org/llama.cpp#14317 (comment) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4) * ggml: move ggml_table_f32_f16 to ggml-cpu.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: extern c ggml_table_f32_f16 + chore docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h we rely on the variable declaration in ggml-cpu.c instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h" This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back ggml_table_f32_f16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: bring back ggml_table_f32_f16" This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * fix ggml time initialization * fix f32_f16 table init * remove extra line --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: slaren <slarengh@gmail.com>

* musa: enable fp16 mma (all) and cublas on qy2 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* metal : batch rows copy in a single threadgroup ggml-ci * metal : handle some edge cases when threadgroup size is not a power of 2 ggml-ci

ggml-ci

…398) * Add shaders-gen sources as target deps

* ggml : add ggml_set_rows Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using indices from 'c'. ref: #8366 * use I64 for indices * ggml : add repeat impl for i64 * ggml : add ggml_is_contiguous_rows * ggml : ggml_set_rows support broadcast * ggml : ggml_set_rows support quantized dst ggml-ci * ggml : support GGML_TYPE_F32 ".from_float" trait * ggml : ggml_set_rows update comment + better index name * tests : add ggml_set_rows * metal : add ggml_set_rows implementation ggml-ci * ggml : simplify forward_dup_f32 * ggml : fix supports_op * tests : add comment to set_rows * ggml : leave the repeat_i64 for a separate PR ggml-ci * ggml : set_rows use std::min instead of MIN * ggml : better error message for set_rows unsupported type * metal : perform op->type check only once * tests : more consistent implementation + more tests ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

This setting needs to be passed through to vulkan-shaders-gen

…e (llama/14378)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched * Review: add type traits and make function more generic * Review: make check more explicit, add back comments, and fix formatting * Review: fix formatting, remove useless type conversion, fix naming for bools

* vulkan: Add fusion support for RMS_NORM+MUL - Add a use_count to ggml_tensor, so we can detect if an output is used more than once. - Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor. - Add detection logic and basic fusion logic in ggml-vulkan. - Add some testing support for fusion. Rather than computing one node at a time, allow for computing the whole graph and just testing one node's results. Add rms_norm_mul tests and enable a llama test. * extract some common fusion logic * fix -Winconsistent-missing-override * move ggml_can_fuse to a common function * build fix * C and C++ versions of can_fuse * move use count to the graph to avoid data races and double increments when used in multiple threads * use hash table lookup to find node index * change use_counts to be indexed by hash table slot * minimize hash lookups style fixes * last node doesn't need single use. fix type. handle mul operands being swapped. * remove redundant parameter --------- Co-authored-by: slaren <slarengh@gmail.com>

* implement unary REGLU/GEGLU/SWIGLU cpu ops * relax constraints * duplicate shape of source * fix ggml_vec_geglu_f16 * special case gated ops * implement unary REGLU/GEGLU/SWIGLU cuda ops * tighten constraints again * refactor into GGML_GLU_OP * metal : add glu kernels ggml-ci * add CUDA_GLU_BLOCK_SIZE [no ci] * more constraints and use 64bit ints ggml-ci * 64bit multiplication [no ci] * implement swapped variants (cpu/cuda) * update comment [no ci] ggml-ci * Vulkan: Add GLU ops and shaders * SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate * ggml : implement GLU for split up/gate (llama/14181) * implement GLU for split up/gate * add tests for ggml_glu_split * Vulkan: Implement glu_split logic and shader support * add split to logging [no ci] * SYCL: refactor element_size ops and add split up and gate support to gated kernels * SYCL: switch GEGLU to use tanh approximation --------- Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> * GGML: increase OP count in assertion * Refactor: Optimize SYCL element-wise operations with unary function inlining This commit refactors the SYCL element-wise operations to improve performance by: - Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead. - Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic. - Replacing direct kernel calls with calls to these inlined functions. - Using `__dpct_inline__` to encourage compiler inlining. - Minor code cleanup and consistency improvements. The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices. * vulkan: Increase workgroup size for GLU, for performance (llama/14345) * vulkan: Increase workgroup size for GLU, for performance * vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup * merge fix * metal : add support for split and swap ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

* SYCL: disable faulty fp16 CPU exponent for now * Revert "SYCL: disable faulty fp16 CPU exponent for now" This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202. * SYCL: disable faulty fp16 CPU exponent for now * Fix logic of disabling exponent kernel

* Update docker.yml 修改docker.yml文件中的内容使其停止周期性的运行该workflow，如果想要运行该workflow可以手动启动 * Remove redundant include path in CMakeLists.txt The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths. * Enable scheduled Docker image builds Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

* metal : disable fast-math for some cpy kernels ggml-ci * cont : disable for q4_1 ggml-ci * cont : disable for iq4_nl ggml-ci

* Conv2D: Add CPU version * Half decent * Tiled approach for F32 * remove file * Fix tests * Support F16 operations * add assert about size * Review: further formatting fixes, add assert and use CPU version of fp32->fp16

ggml-ci

Co-authored-by: linxiaodong <calm.lin@wukongsch.com>

swetha097 and others added 30 commits June 3, 2025 02:56

Get_Rows & Dequantize implementation adapted to work for repacked wei…

d6cc466

…ghts of type q4_0

Resolve PR comments

994e02a

Get_Rows & Dequantize implementation adapted to work for repacked wei…

ed1d3a2

…ghts of type q4_0

Resolve PR comments

6959d41

Merge branch 'swe_pr/get_rows' of https://github.com/swetha097/whispe…

1a79d18

…r.cpp into swe_pr/get_rows

Merge branch 'master' into swe_pr/get_rows

1e72e4b

Add the Get_Rows & Dequantize implementation adapted to work for repa…

066b47a

…cked weights of type q4_K

Get_Rows & Dequantize implementation adapted to work for repacked wei…

b9e152d

…ghts of type q4_0

Resolve PR comments

2705c08

Add the Get_Rows & Dequantize implementation adapted to work for repa…

70cf05a

…cked weights of type q4_K

Merge branch 'swe_pr/get_rows_q4_K' of https://github.com/swetha097/w…

e53bb02

…hisper.cpp into swe_pr/get_rows_q4_K

Remove q4_0 code implementation for get_rows & dequantize

1446e64

Fix warning

099aa24

Add ggml_roll (ggml/1274)

471df13

* ggml : add ggml_roll * use set/get_op_params & std::min

ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (llama/14258)

4fc9c34

ggml-cpu: fix uncaught underscore terminators (llama/14023)

34940ab

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-cpu: reduce asm calls for hsum (llama/14037)

203451b

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

metal : add mean kernel (llama/14267)

b251d73

* metal : add mean kernel ggml-ci * cont : dedup implementation ggml-ci

Vulkan: Set device max size for host memory to avoid OOM warning and …

1aca7b5

…fallback to CPU buffer (llama/14249)

llamafile : support s390x SIMD instruction set (llama/14273)

be4ea08

sycl: Cleanup codepaths in Get Rows in sycl backend (llama/14215)

a02a2d4

Addresses unused reorder path

build : suppress gcc15 compile warnings (llama/14261)

cef59c1

* Change _contains_any() substrs to std::string_view and fix the find comparison logic.

ggml-cpu : remove unnecesary arm feature detection (llama/14281)

71adde9

Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.

CUDA: add conv_2d_dw (llama/14265)

5efd43c

* CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const

ggml: Update KleidiAI to v1.9.0 (llama/14277)

694f435

ggml : fix repack work size for mul_mat_id (llama/14292)

018b2d3

ggml-ci

cuda : synchronize graph capture and cublas handle destruction (llama…

33d1f0a

…/14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread

IMbackK and others added 30 commits July 1, 2025 17:54

CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324)

6a1d12a

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

opencl: ref count ggml_backend_opencl_context and refactor profilin…

acfbf29

…g (llama/14254) * Move profiling info into `ggml_backend_opencl_context` * Add `enqueue_ndrange_kernel` to launch kernel

sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (llam…

fc28594

…a/13973)

ggml : do not output unprintable characters on GGUF load failure (lla…

99764f5

…ma/14381)

metal : batch rows copy in a single threadgroup (llama/14384)

c848b9f

* metal : batch rows copy in a single threadgroup ggml-ci * metal : handle some edge cases when threadgroup size is not a power of 2 ggml-ci

metal : add special-case mat-vec mul for ne00 == 4 (llama/14385)

c300f1e

ggml-ci

cmake: regen vulkan shaders when shaders-gen sources change (llama/14…

566462a

…398) * Add shaders-gen sources as target deps

vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (llama/14427)

0b09f5b

This setting needs to be passed through to vulkan-shaders-gen

fix async_mode bug (llama/14432)

f31ed38

vulkan: lock accesses of pinned_memory vector (llama/14333)

9e48afb

vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

f641a4c

…e (llama/14378)

ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (llama/14443)

130044f

scripts : make the shell scripts cross-platform (llama/14341)

3239359

ggml-cpu: sycl: Re-enable exp f16 (llama/14462)

e29e36a

metal : disable fast-math for some cpy kernels (llama/14460)

679f31a

* metal : disable fast-math for some cpy kernels ggml-ci * cont : disable for q4_1 ggml-ci * cont : disable for iq4_nl ggml-ci

Add Conv2d for CPU (llama/14388)

b5b237d

* Conv2D: Add CPU version * Half decent * Tiled approach for F32 * remove file * Fix tests * Support F16 operations * add assert about size * Review: further formatting fixes, add assert and use CPU version of fp32->fp16

opencl : add GEGLU, REGLU, SWIGLU (llama/14456)

1e930ab

ggml : remove trailing whitespace (llama/0)

c4ea72b

talk-llama : sync llama.cpp

1f816de

sync : ggml

bca021c

ggml-ci

feat: support vad for addon.node (ggml-org#3301)

d9999d5

Co-authored-by: linxiaodong <calm.lin@wukongsch.com>

Merge remote-tracking branch 'origin/master' into swe_pr/q4_K_get_rows

74bc3c6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get Rows & Dequantize Function Adapted to repacked weights of type q4_K#4

Get Rows & Dequantize Function Adapted to repacked weights of type q4_K#4
swetha097 wants to merge 91 commits intomasterfrom
swe_pr/q4_K_get_rows

swetha097 commented Jun 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

swetha097 commented Jun 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants