
Commit 0bf7043

Nexesenex and Kawrakow authored
Display the size of the tensors overriden during the tensor loading (#1318)
* Display the size of the tensors overridden during tensor loading.

  Ex:

      Tensor blk.60.ffn_gate_exps.weight buffer type overriden to CPU
      Tensor blk.60.ffn_up_exps.weight buffer type overriden to CPU

  becomes:

      Tensor blk.60.ffn_up_exps.weight (size = 668467200 bytes) buffer type overriden to CPU
      Tensor blk.60.ffn_gate_exps.weight (size = 668467200 bytes) buffer type overriden to CPU

  Also demote to debug level the later-displayed size of the unnamed buffer overrides, e.g. `llm_load_tensors: CPU buffer size = XXX.XX MiB`. That double display cluttered the screen without being very informative.

* Change the bytes display to MiB.

Co-authored-by: Kawrakow <[email protected]>
1 parent 170467e commit 0bf7043

2 files changed

Lines changed: 4 additions & 2 deletions

File tree

src/llama-load-tensors.cpp

Lines changed: 3 additions & 1 deletion
@@ -312,7 +312,9 @@ ggml_context * create_tensors_helper::get_context_for_tensor(ggml_context * ctx,
     for (const auto * overrides = ml.tensor_buft_overrides; overrides->pattern != nullptr; ++overrides) {
         std::regex pattern(overrides->pattern);
         if (std::regex_search(name, pattern)) {
-            LLAMA_LOG_INFO("Tensor %s buffer type overriden to %s\n", name.c_str(), ggml_backend_buft_name(overrides->buft));
+            const struct ggml_tensor * cur = ml.get_tensor_meta(name.c_str());
+            const size_t nbytes = cur ? ggml_nbytes(cur) : 0;
+            LLAMA_LOG_INFO("Tensor %s (size = %.2f MiB) buffer type overriden to %s\n", name.c_str(), nbytes/1024./1024., ggml_backend_buft_name(overrides->buft));
             ctx = ctx_for_buft(overrides->buft);
             break;
         }

src/llama.cpp

Lines changed: 1 addition & 1 deletion
@@ -2209,7 +2209,7 @@ static bool llm_load_tensors(

     // print memory requirements
     for (ggml_backend_buffer_t buf : model.bufs) {
-        LLAMA_LOG_INFO("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
+        LLAMA_LOG_DEBUG("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
     }

     // populate tensors_by_name
