
Commit 0bf7043

Nexesenex and Kawrakow authored
Display the size of the tensors overriden during the tensor loading (#1318)
* Display the size of the tensors overridden during tensor loading.

  Ex:

      Tensor blk.60.ffn_gate_exps.weight buffer type overriden to CPU
      Tensor blk.60.ffn_up_exps.weight buffer type overriden to CPU

  becomes:

      Tensor blk.60.ffn_up_exps.weight (size = 668467200 bytes) buffer type overriden to CPU
      Tensor blk.60.ffn_gate_exps.weight (size = 668467200 bytes) buffer type overriden to CPU

  Also demote to debug level the later-displayed size of the unnamed buffer overrides, e.g. `llm_load_tensors: CPU buffer size = XXX.XX MiB`. That double display cluttered the screen without being very informative.

* Change the bytes display to MiB.

Co-authored-by: Kawrakow <[email protected]>
1 parent 170467e commit 0bf7043

2 files changed

Lines changed: 4 additions & 2 deletions

File tree

src/llama-load-tensors.cpp

Lines changed: 3 additions & 1 deletion
@@ -312,7 +312,9 @@ ggml_context * create_tensors_helper::get_context_for_tensor(ggml_context * ctx,
     for (const auto * overrides = ml.tensor_buft_overrides; overrides->pattern != nullptr; ++overrides) {
         std::regex pattern(overrides->pattern);
         if (std::regex_search(name, pattern)) {
-            LLAMA_LOG_INFO("Tensor %s buffer type overriden to %s\n", name.c_str(), ggml_backend_buft_name(overrides->buft));
+            const struct ggml_tensor * cur = ml.get_tensor_meta(name.c_str());
+            const size_t nbytes = cur ? ggml_nbytes(cur) : 0;
+            LLAMA_LOG_INFO("Tensor %s (size = %.2f MiB) buffer type overriden to %s\n", name.c_str(), nbytes/1024./1024., ggml_backend_buft_name(overrides->buft));
             ctx = ctx_for_buft(overrides->buft);
             break;
         }

src/llama.cpp

Lines changed: 1 addition & 1 deletion
@@ -2209,7 +2209,7 @@ static bool llm_load_tensors(

     // print memory requirements
     for (ggml_backend_buffer_t buf : model.bufs) {
-        LLAMA_LOG_INFO("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
+        LLAMA_LOG_DEBUG("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
     }

     // populate tensors_by_name
