Remove Unprintable by beiller · Pull Request #26 · ggml-org/llama.cpp

beiller · 2023-03-11T22:14:30Z

Fixes #11

This fixes a Japanese prompt I was attempting to run.

E.g.:

./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128 -n 512 -p $'人生の意味は'

Output before change:

So it outputs some characters correctly, but others come out as �.

Output after change:

人生の意当者： Dr. Yukari Takamatsu 作成時間: 2015年9月8日（金）、第3回ルプセンター上研修会「Mini-Workshop」で学生がしたことについて書き伝えます。ニュアスミレショナの実行は、10位けんだあるから重要なメッセージを与り開くがうれやで報告したことについて書き伝えます。当者: Dr. Yukari Takamatsu, MD PhD FRCR FRCP (Hon) Prof Emeritus of Hokkaido Univ School Med Sys Biol and Nanboku University Medical Sch Professor at Imperial College London Senior Member ESMO IASLC

Fixes ggml-org#11

This fixes a Japanese prompt I was attempting to run

EG: `./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128 -n 512 -p $'人生の意味は'`

Output before change: `人生の意��、フロントカードに��いてる。 2019年3月　© All Rights Reserved. [end of text]`

So it is outputting some characters but some �

Output after change: `人生の意は、一人が一人ということであります。は安部が立していたので、去からは一人の人にれるのはにとどまったのですが、そう`

beiller · 2023-03-12T04:30:57Z

Closing the PR because what is really needed is a different tokenization mechanism; see the discussion here:

#11
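To illustrate why � can show up in streamed output, here is a minimal sketch that holds back bytes until they form a complete UTF-8 sequence. It assumes (this is not stated in the PR) that a multi-byte character can arrive split across two output pieces. The helpers `utf8_seq_len`, `utf8_complete_prefix`, and `utf8_feed` are hypothetical names, not llama.cpp functions, and this is neither the change made in this PR nor the tokenizer fix discussed in #11.

```cpp
#include <cstddef>
#include <string>

// Expected total length of a UTF-8 sequence given its lead byte,
// or 0 if the byte cannot start a sequence (continuation or invalid byte).
static std::size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: length of the longest prefix of `buf` that does not
// end in the middle of a multi-byte sequence. Only lead bytes are inspected;
// continuation bytes are not validated in this sketch.
static std::size_t utf8_complete_prefix(const std::string & buf) {
    std::size_t i = 0;
    while (i < buf.size()) {
        std::size_t n = utf8_seq_len(static_cast<unsigned char>(buf[i]));
        if (n == 0) n = 1;              // stray byte: pass it through unchanged
        if (i + n > buf.size()) break;  // sequence is cut off: wait for more bytes
        i += n;
    }
    return i;
}

// Hypothetical helper: append one output piece to `pending`, return the part
// that is safe to print, and keep any incomplete tail in `pending`.
static std::string utf8_feed(std::string & pending, const std::string & piece) {
    pending += piece;
    const std::size_t n = utf8_complete_prefix(pending);
    std::string out = pending.substr(0, n);
    pending.erase(0, n);
    return out;
}
```

For example, 味 is encoded as the three bytes E5 91 B3. Feeding `"\xE5\x91"` returns an empty string and leaves two bytes pending; feeding `"\xB3"` next returns the whole character. Buffering like this only avoids printing half a character; it does not repair a model that emits wrong bytes.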


* fix warning
* wip
* add todo for graph key generate
* rename some file to meet upstream guideline
* remove local .clang-format
* expend supported/unsupported counter to all ops
* append device name to log
* port to ggml logger
* fix warning after adapt to ggml logger
* append \n to all log
* use case op instead of convert
* Revert "use case op instead of convert" This reverts commit e662fc2.
* fix op that needs same shape
* opt kQnnOpsTable
* refresh params name field when getting op config
* opt npu log print
* remove unused functions

…gml-org#26

Massive reduction in constant memory and compute:
- 256KB of dense matrices → 512 bytes of sign arrays
- O(d²) = 16,384 ops → O(d log d) = 896 ops per rotation
- Metal shader file: 1.5MB → 432KB

Speed: still 2.4 tok/s. WHT reduced per-rotation cost but the bottleneck is redundant calls (8-32× per block from flash attention). The dequantize function is called per 4/16-element chunk, each time doing the full 128-element WHT. Need to modify the flash attention kernel to dequantize once per block.

Quality: WHT+signs gives BETTER quality than dense QR on real KV tensors (cosine 0.94 vs 0.79 at 2-bit). Sub-Gaussian distribution (kurtosis 1.53) means fewer outliers hitting extreme centroids.

Reviewed by Codex: WHT butterfly correct, inverse order verified, QJL correction matches reference C implementation.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
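The "WHT + signs" rotation described in that commit message can be sketched roughly as below. This is a CPU-side illustration in C++, not the Metal shader from the referenced commit. The function names (`fwht`, `wht_rotate`, `wht_unrotate`), the orthonormal 1/√d scaling, and the storage of signs as ±1 bytes are all assumptions made for the example. With d = 128 the butterfly runs 7 stages of 64 add/subtract pairs, which is where the d log₂ d = 896 figure comes from.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// In-place fast Walsh-Hadamard transform with orthonormal scaling.
// Assumes x.size() is a power of two. With this scaling the transform
// is its own inverse.
void fwht(std::vector<float> & x) {
    const std::size_t d = x.size();
    for (std::size_t h = 1; h < d; h *= 2) {            // log2(d) stages
        for (std::size_t i = 0; i < d; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {   // d/2 butterflies per stage
                const float a = x[j];
                const float b = x[j + h];
                x[j]     = a + b;
                x[j + h] = a - b;
            }
        }
    }
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (float & v : x) v *= scale;
}

// Forward rotation y = H * D * x: flip signs first, then transform.
// `signs` holds +1 or -1 per element (an assumed representation).
void wht_rotate(std::vector<float> & x, const std::vector<signed char> & signs) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] *= static_cast<float>(signs[i]);
    fwht(x);
}

// Inverse rotation x = D * H * y: transform first, then flip signs.
// The order is reversed relative to wht_rotate.
void wht_unrotate(std::vector<float> & y, const std::vector<signed char> & signs) {
    fwht(y);
    for (std::size_t i = 0; i < y.size(); ++i) y[i] *= static_cast<float>(signs[i]);
}
```

Applying `wht_rotate` followed by `wht_unrotate` with the same sign array recovers the input up to floating-point rounding, which is the "inverse order" property the message refers to. Only the sign array needs to be stored, instead of a dense d × d matrix.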

beiller closed this Mar 12, 2023

Hades32 pushed a commit to Hades32/llama.cpp that referenced this pull request Mar 21, 2023

Merge pull request ggml-org#26 from mcmonkey4eva/master

1e82fa8

add easy Windows install instructions to the readme

flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request Jun 27, 2023

fix --reverse-prompt parameter bug (ggml-org#26)

e97d148

fix bug: Parameter --reverse-prompt won't accept text

Edisonwei54 mentioned this pull request Oct 30, 2023

CUDA error 9 at ggml-cuda.cu:6863: invalid configuration argument #3855

Closed

4 tasks

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

atrivedi-tsavoritesi mentioned this pull request Jun 12, 2025

Llama.cpp: Webserver & HTML pages support tsisw/llama.cpp#8

Merged

jzju mentioned this pull request Oct 11, 2025

Misc. bug: -dev CUDA0 uses 496MiB on device 1 #16509

Closed

jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request Oct 30, 2025

Merge pull request ggml-org#26 from jesusmb1995/jmb/fix-ci-macos-x64

7e7b7f4

uttampc1 mentioned this pull request Nov 18, 2025

Throughput improvement for small batch sizes #17342

Open

sainnhe mentioned this pull request Jan 25, 2026

Eval bug: coredump due to ops of discontinuous tensor memory #19078

Closed

beiller wants to merge 1 commit into ggml-org:master from beiller:feature/remove_unprintable
