Merged
abetlen pushed a commit to abetlen/llama.cpp that referenced this pull request on Apr 10, 2023:
…antization-PR Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
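The Q4_1_O commit subject above describes a 4-bit quantization that keeps a few large-magnitude "outlier" weights in full FP32 and performs the dot product in FP32. A minimal NumPy sketch of that idea, assuming a Q4_1-style min/scale layout; all names and the one-outlier-per-row choice here are illustrative, not llama.cpp's actual code:

```python
import numpy as np

def quantize_q4_1_o(row, outlier_count=1):
    """Hypothetical sketch: preserve the largest-magnitude weights in FP32,
    quantize the rest Q4_1-style (min + scale, 4-bit codes)."""
    row = np.asarray(row, dtype=np.float32)
    # Indices of the outliers to store verbatim.
    out_idx = np.argsort(np.abs(row))[-outlier_count:]
    mask = np.ones(row.size, dtype=bool)
    mask[out_idx] = False
    rest = row[mask]
    lo, hi = rest.min(), rest.max()
    scale = (hi - lo) / 15.0 if hi > lo else np.float32(1.0)
    codes = np.clip(np.round((rest - lo) / scale), 0, 15).astype(np.uint8)
    return codes, np.float32(lo), np.float32(scale), out_idx, row[out_idx]

def dequantize_q4_1_o(codes, lo, scale, out_idx, out_vals, size):
    row = np.empty(size, dtype=np.float32)
    mask = np.ones(size, dtype=bool)
    mask[out_idx] = False
    row[mask] = codes.astype(np.float32) * scale + lo
    row[out_idx] = out_vals  # outliers restored exactly
    return row

def dot_fp32(q, x):
    """Dot product performed in FP32 after dequantization."""
    codes, lo, scale, oi, ov = q
    w = dequantize_q4_1_o(codes, lo, scale, oi, ov, x.size)
    return float(w @ x.astype(np.float32))
```

Storing outliers separately keeps the quantization range tight for the remaining weights, so the 4-bit grid is not stretched by a single extreme value.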
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this pull request on Jun 2, 2023:
Replace invalid characters instead of crashing.
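That commit subject describes lossy decoding: substituting a replacement character for invalid byte sequences rather than aborting. The same behavior in Python is one flag away (a sketch of the idea only; the actual llama.cpp change operates on its own C++ token-to-text path):

```python
def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for invalid sequences
    instead of raising UnicodeDecodeError."""
    return data.decode("utf-8", errors="replace")

# An invalid lead byte becomes the replacement character:
print(decode_lossy(b"\xffhello"))  # \ufffdhello
```

Valid input passes through unchanged, so the replacement path only affects bytes that would otherwise crash the decoder.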
flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request on Jun 27, 2023:
* chunked RMS and mulmat for testing
* linux compilation fix - not super clean
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request on Aug 2, 2023:
Remove unused config parameter
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request on Dec 19, 2023.
jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request on Sep 23, 2025:
Quality and Speed tuning scripts
This was referenced Nov 28, 2025.
rururush pushed a commit to USTC-ADSL/llama.cpp that referenced this pull request on Mar 16, 2026:
* more log
* split graph implementation into cpp file
* rename: ggml_qnn_graph -> qnn_graph
* add input/output tensor to graph
* fix assert
* wip
* add _ggml_tensor field in qnn tensor
* add comments
* add set_data_buffer with raw memory buffer
* use set_data_buffer
* op param buffer use qnn_buffer_ptr
* add qnn_mem_buffer_slice
* use qnn_buffer_ptr as tensor buffer
* use new set_data_buffer to reduce copy
* ggml_qnn_op_config: add function to set input/output tensor before init node
* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead
* wip
* add initialize_op_nodes without tensor params
* wip
* add op caps table
* merge kGgmlOpToQnnOp and kOpCaps tables
* wip
* add cache parameter to create_tensors
* add init_from_ggml_graph
* disable gelu for all backends
* wip
* move op index calc to op config module
* use the ggml_tensor as parameter of build_graph
* add log
* use create_operation_from_op_tensor in old build_graph function
* remove unused constructors
* fix parameter count
* remove unused member func/var
* make init_from_ggml_graph a class member: build_graph_from_ggml_graph
* move graph finalize into member function `finalize()`
* get graph key from ggml op tensor directly
* append output type
* reduce tensor key length
* add function to generate key from ggml_cgraph
* simplify graph cache insert and delete
* remove template param at get_qnn_graph_from_cache
* wip
* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable
* refactor device_supports_op
* add log
* wip
* use framework function to check same shape
* wip
* extract some logic into separate function
* wip
* add execution function that runs graph
* add function to create qnn graph from ggml_cgraph with cache
* execute graph directly
* return null graph key for empty graph
* add more qualcomm chipset enums
* add cap for reshape
* disable some ops
* try to skip GGML_OP_VIEW
* more log for view tensor
* append param tensor into intermediate tensor key
* use 'ordered' set
* fix warning in release
* wip
julien-c pushed a commit to julien-c/llama.cpp that referenced this pull request on Mar 17, 2026:
agent: make subagents opt-in via --subagents flag
spiritbuun added a commit to spiritbuun/llama-cpp-turboquant-cuda that referenced this pull request on Mar 27, 2026:
turbo4 prefill dequant+MMA disabled due to QJL fp16 precision loss. Added experiment #16b for potential solutions (float32 buffer or inline MMA dequant). Co-Authored-By: Claude Opus 4.6 <[email protected]>
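The fp16 precision loss mentioned in this commit message is a general effect: a running sum kept in fp16 stalls once the accumulator's spacing (one ulp) grows past the addends, which is why a float32 accumulation buffer is floated as a fix. A small NumPy illustration of the effect only, not the turboquant code itself:

```python
import numpy as np

# 4096 small addends; the true sum is about 4.0977 in float32.
vals = np.full(4096, 0.001, dtype=np.float16)

acc16 = np.float16(0.0)
for v in vals:                  # naive running sum kept in fp16
    acc16 = np.float16(acc16 + v)

# The same data accumulated in a float32 buffer stays accurate.
acc32 = np.float32(vals.astype(np.float32).sum())

# Once acc16 reaches 4.0, the fp16 ulp there (~0.0039) exceeds 2x the
# addend, so every further addition rounds back to 4.0 and the sum stalls.
print(float(acc16), float(acc32))
```

The fix costs only one float32 buffer per accumulator while leaving the stored data in fp16, which is the trade-off the commit's "float32 buffer" option refers to.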
No description provided.