Merged
abetlen pushed a commit to abetlen/llama.cpp that referenced this pull request on Apr 10, 2023:
…antization-PR Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
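The Q4_1_O commit subject above describes a 4-bit quantization that keeps a few large-magnitude "outlier" weights in full FP32 and performs the dot product in FP32. A minimal NumPy sketch of that idea, assuming a Q4_1-style min/scale layout; all names and the one-outlier-per-row choice here are illustrative, not llama.cpp's actual code:

```python
import numpy as np

def quantize_q4_1_o(row, outlier_count=1):
    """Hypothetical sketch: preserve the largest-magnitude weights in FP32,
    quantize the rest Q4_1-style (min + scale, 4-bit codes)."""
    row = np.asarray(row, dtype=np.float32)
    # Indices of the outliers to store verbatim.
    out_idx = np.argsort(np.abs(row))[-outlier_count:]
    mask = np.ones(row.size, dtype=bool)
    mask[out_idx] = False
    rest = row[mask]
    lo, hi = rest.min(), rest.max()
    scale = (hi - lo) / 15.0 if hi > lo else np.float32(1.0)
    codes = np.clip(np.round((rest - lo) / scale), 0, 15).astype(np.uint8)
    return codes, np.float32(lo), np.float32(scale), out_idx, row[out_idx]

def dequantize_q4_1_o(codes, lo, scale, out_idx, out_vals, size):
    row = np.empty(size, dtype=np.float32)
    mask = np.ones(size, dtype=bool)
    mask[out_idx] = False
    row[mask] = codes.astype(np.float32) * scale + lo
    row[out_idx] = out_vals  # outliers restored exactly
    return row

def dot_fp32(q, x):
    """Dot product performed in FP32 after dequantization."""
    codes, lo, scale, oi, ov = q
    w = dequantize_q4_1_o(codes, lo, scale, oi, ov, x.size)
    return float(w @ x.astype(np.float32))
```

Storing outliers separately keeps the quantization range tight for the remaining weights, so the 4-bit grid is not stretched by a single extreme value.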
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this pull request on Jun 2, 2023:
Replace invalid characters instead of crashing.
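That commit subject describes lossy decoding: substituting a replacement character for invalid byte sequences rather than aborting. The same behavior in Python is one flag away (a sketch of the idea only; the actual llama.cpp change operates on its own C++ token-to-text path):

```python
def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for invalid sequences
    instead of raising UnicodeDecodeError."""
    return data.decode("utf-8", errors="replace")

# An invalid lead byte becomes the replacement character:
print(decode_lossy(b"\xffhello"))  # \ufffdhello
```

Valid input passes through unchanged, so the replacement path only affects bytes that would otherwise crash the decoder.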
flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request on Jun 27, 2023:
* chunked RMS and mulmat for testing
* linux compilation fix - not super clean
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request on Aug 2, 2023:
Remove unused config parameter
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request on Dec 19, 2023.
jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request on Sep 23, 2025:
Quality and Speed tuning scripts
This was referenced Nov 28, 2025.
rururush pushed a commit to USTC-ADSL/llama.cpp that referenced this pull request on Mar 16, 2026:
* more log
* split graph implementation into cpp file
* rename: ggml_qnn_graph -> qnn_graph
* add input/output tensor to graph
* fix assert
* wip
* add _ggml_tensor field in qnn tensor
* add comments
* add set_data_buffer with raw memory buffer
* use set_data_buffer
* op param buffer use qnn_buffer_ptr
* add qnn_mem_buffer_slice
* use qnn_buffer_ptr as tensor buffer
* use new set_data_buffer to reduce copy
* ggml_qnn_op_config: add function to set input/output tensor before init node
* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead
* wip
* add initialize_op_nodes without tensor params
* wip
* add op caps table
* merge kGgmlOpToQnnOp and kOpCaps tables
* wip
* add cache parameter to create_tensors
* add init_from_ggml_graph
* disable gelu for all backends
* wip
* move op index calc to op config module
* use the ggml_tensor as parameter of build_graph
* add log
* use create_operation_from_op_tensor in old build_graph function
* remove unused constructors
* fix parameter count
* remove unused member func/var
* make init_from_ggml_graph a class member: build_graph_from_ggml_graph
* move graph finalize into member function `finalize()`
* get graph key from ggml op tensor directly
* append output type
* reduce tensor key length
* add function to generate key from ggml_cgraph
* simplify graph cache insert and delete
* remove template param at get_qnn_graph_from_cache
* wip
* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable
* refactor device_supports_op
* add log
* wip
* use framework function to check same shape
* wip
* extract some logic into separate function
* wip
* add execution function that runs graph
* add function to create qnn graph from ggml_cgraph with cache
* execute graph directly
* return null graph key for empty graph
* add more qualcomm chipset enums
* add cap for reshape
* disable some ops
* try to skip GGML_OP_VIEW
* more log for view tensor
* append param tensor into intermediate tensor key
* use 'ordered' set
* fix warning in release
* wip
julien-c pushed a commit to julien-c/llama.cpp that referenced this pull request on Mar 17, 2026:
agent: make subagents opt-in via --subagents flag
spiritbuun added a commit to spiritbuun/llama-cpp-turboquant-cuda that referenced this pull request on Mar 27, 2026:
turbo4 prefill dequant+MMA disabled due to QJL fp16 precision loss. Added experiment #16b for potential solutions (float32 buffer or inline MMA dequant). Co-Authored-By: Claude Opus 4.6 <[email protected]>
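The fp16 precision loss mentioned in this commit message is a general effect: a running sum kept in fp16 stalls once the accumulator's spacing (one ulp) grows past the addends, which is why a float32 accumulation buffer is floated as a fix. A small NumPy illustration of the effect only, not the turboquant code itself:

```python
import numpy as np

# 4096 small addends; the true sum is about 4.0977 in float32.
vals = np.full(4096, 0.001, dtype=np.float16)

acc16 = np.float16(0.0)
for v in vals:                  # naive running sum kept in fp16
    acc16 = np.float16(acc16 + v)

# The same data accumulated in a float32 buffer stays accurate.
acc32 = np.float32(vals.astype(np.float32).sum())

# Once acc16 reaches 4.0, the fp16 ulp there (~0.0039) exceeds 2x the
# addend, so every further addition rounds back to 4.0 and the sum stalls.
print(float(acc16), float(acc32))
```

The fix costs only one float32 buffer per accumulator while leaving the stored data in fp16, which is the trade-off the commit's "float32 buffer" option refers to.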
No description provided.