|
In the next PR I will try to rework these 3 methods with something like:
llama.cpp/src/llama-kv-cache.h, lines 45 to 56 in 9023ae3
The main goal is to be able to run SWA caches with just
When this rework is ready, I will use the new
llama.cpp/src/llama-kv-cache.h, lines 37 to 41 in 9023ae3
Simulating a full cache will now be achieved by initializing the appropriate batches and just not processing them. Any suggestions about the plan are welcome. |
|
While this change does not have a measurable impact on the performance under normal conditions, when building in
./scripts/compare-commits.sh master gg/kv-cache-simplify-part2 -m ./models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -fa 1 -d 8192 -n 128 -p 0,1024 -r 5
diff --git a/scripts/compare-commits.sh b/scripts/compare-commits.sh
index e40d1cc6d..7d9ca79cf 100755
--- a/scripts/compare-commits.sh
+++ b/scripts/compare-commits.sh
@@ -24,7 +24,7 @@ dir="build-bench"
 function run {
     rm -fr ${dir} > /dev/null
-    cmake -B ${dir} -S . $cmake_opts > /dev/null
+    cmake -DCMAKE_BUILD_TYPE=Debug -B ${dir} -S . $cmake_opts > /dev/null
     cmake --build ${dir} -t llama-bench > /dev/null
     ${dir}/bin/llama-bench -o sql -oe md $bench_args | sqlite3 llama-bench.sqlite
 } |
cont #13194
The KV cells editing logic is now implemented via the new struct llama_kv_cells_unified in the new src/llama-kv-cells.h source. The goal is to simplify the implementation in llama-kv-cache.cpp and make it easier to understand and update in the future.

One of the primary simplifications is that llama_kv_cache_unified no longer tracks the number of used cells manually. This is now automatically tracked by the llama_kv_cells_unified based on the edits that we apply, such as adding and removing sequences from the cells. Same for the has_shift flag.

pos, delta, seq) is now a structure of arrays for better cache locality
std::bitset instead of std::set

Here is an example of the position shift logic before and after the change:
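The before/after snippet from the PR did not survive in this copy of the page. As a rough stand-in, the idea described above can be sketched as follows. This is a minimal illustration, not the code from src/llama-kv-cells.h: the type name kv_cells_sketch, its method names, and the MAX_SEQ bound are all assumptions made for the example. It shows a structure-of-arrays layout, a std::bitset per cell for sequence membership, and used / has_shift bookkeeping that is updated as a side effect of each edit instead of by the caller.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch only: names and layout are assumptions for illustration,
// not the real llama_kv_cells_unified API.
constexpr std::size_t MAX_SEQ = 64; // assumed upper bound on sequence ids

class kv_cells_sketch {
public:
    explicit kv_cells_sketch(std::size_t n) : pos(n, -1), shift(n, 0), seq(n) {}

    std::uint32_t get_used()      const { return used; }
    bool          get_has_shift() const { return has_shift; }
    bool          is_empty(std::size_t i) const { return pos[i] == -1; }
    std::int32_t  pos_get(std::size_t i)  const { return pos[i]; }

    // Occupy an empty cell; the used counter is updated here, not by the caller.
    void pos_set(std::size_t i, std::int32_t p) {
        assert(i < pos.size() && pos[i] == -1 && p >= 0);
        pos[i] = p;
        ++used;
    }

    // Mark an occupied cell as belonging to sequence s.
    void seq_add(std::size_t i, std::size_t s) {
        assert(i < pos.size() && pos[i] != -1 && s < MAX_SEQ);
        seq[i].set(s);
    }

    // Remove sequence s from a cell. Returns true if the cell became empty.
    bool seq_rm(std::size_t i, std::size_t s) {
        assert(i < pos.size() && s < MAX_SEQ);
        seq[i].reset(s);
        if (pos[i] != -1 && seq[i].none()) {
            cell_clear(i);
            return true;
        }
        return false;
    }

    // Shift the position of an occupied cell by d and remember the accumulated
    // shift. A cell whose position becomes negative is freed.
    // Returns true if the cell was freed.
    bool pos_add(std::size_t i, std::int32_t d) {
        assert(i < pos.size() && pos[i] != -1);
        pos[i]   += d;
        shift[i] += d;
        has_shift = true;
        if (pos[i] < 0) {
            cell_clear(i);
            return true;
        }
        return false;
    }

private:
    void cell_clear(std::size_t i) {
        seq[i].reset();
        pos[i]   = -1;
        shift[i] = 0;
        --used;
    }

    // Structure of arrays: one contiguous array per field.
    std::vector<std::int32_t>         pos;   // -1 marks an empty cell
    std::vector<std::int32_t>         shift; // accumulated position delta
    std::vector<std::bitset<MAX_SEQ>> seq;   // sequence membership per cell

    std::uint32_t used      = 0;
    bool          has_shift = false;
};
```

With a container like this, calling code that shifts positions only needs to call pos_add on each affected cell. It never decrements a used counter or sets a has_shift flag itself, which is the kind of simplification the description is aiming for.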
Next
n = cell_max()) instead of searching for it on every batch