
Fix a typo in model name #16

Merged
ggerganov merged 1 commit into ggml-org:master from jooray:patch-1 on Mar 11, 2023

Conversation

@jooray (Contributor) commented on Mar 11, 2023

No description provided.

@ggerganov ggerganov merged commit 6b2cb63 into ggml-org:master Mar 11, 2023
abetlen pushed a commit to abetlen/llama.cpp that referenced this pull request Apr 10, 2023
…antization-PR

Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this pull request Jun 2, 2023
Replace invalid characters instead of crashing.
flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request Jun 27, 2023
* chunked RMS and mulmat for testing
* linux compilation fix - not super clean
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request Aug 2, 2023
Remove unused config parameter
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023
jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request Sep 23, 2025
This was referenced Nov 28, 2025
rururush pushed a commit to USTC-ADSL/llama.cpp that referenced this pull request Mar 16, 2026
* more log

* split graph implementation into cpp file

* rename: ggml_qnn_graph -> qnn_graph

* add input/output tensors to graph

* fix assert

* wip

* add _ggml_tensor field in qnn tensor

* add comments

* add set_data_buffer with raw memory buffer

* use set_data_buffer

* op param buffer use qnn_buffer_ptr

* add qnn_mem_buffer_slice

* use qnn_buffer_ptr as tensor buffer

* use new set_data_buffer to reduce copy

* ggml_qnn_op_config: add function to set input/output tensor before init node

* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead

* wip

* add initialize_op_nodes without tensor params

* wip

* add op caps table

* merge kGgmlOpToQnnOp and kOpCaps tables

* wip

* add cache parameter to create_tensors

* add init_from_ggml_graph

* disable gelu for all backends

* wip

* move op index calc to op config module

* use the ggml_tensor as parameter of build_graph

* add log

* use create_operation_from_op_tensor in old build_graph function

* remove unused constructors

* fix parameter count

* remove unused member func/var

* make init_from_ggml_graph a class member: build_graph_from_ggml_graph

* move graph finalize into member function `finalize()`

* get graph key from ggml op tensor directly

* append output type

* reduce tensor key length

* add function to generate key from ggml_cgraph

* simplify graph cache insert and delete

* remove template param at get_qnn_graph_from_cache

* wip

* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable

* refactor device_supports_op

* add log

* wip

* use framework function to check same shape

* wip

* extract some logic into separated function

* wip

* add execution function that runs graph

* add function to create qnn graph from ggml_cgraph with cache

* execute graph directly

* return null graph key for empty graph

* add more qualcomm chipset enums

* add cap for reshape

* disable some ops

* try to skip GGML_OP_VIEW

* more logging for view tensor

* append param tensor into intermediate tensor key

* use 'ordered' set

* fix warning in release

* wip
julien-c pushed a commit to julien-c/llama.cpp that referenced this pull request Mar 17, 2026
agent: make subagents opt-in via --subagents flag
spiritbuun added a commit to spiritbuun/llama-cpp-turboquant-cuda that referenced this pull request Mar 27, 2026
turbo4 prefill dequant+MMA disabled due to QJL fp16 precision loss.
Added experiment #16b for potential solutions (float32 buffer or
inline MMA dequant).

Co-Authored-By: Claude Opus 4.6 <[email protected]>


2 participants