
ggml : add ggml_backend_sched_debug_tensor ggml_backend API#18019

Closed
danbev wants to merge 1 commit into ggml-org:master from danbev:ggml-debug-tensor

Conversation

@danbev danbev (Member) commented Dec 14, 2025

This commit adds a new function `ggml_backend_sched_debug_tensor` to the
ggml_backend API. This function allows users to print the values of a
specified tensor after graph computation, along with the mean squared
value.

The motivation for this addition is that it can be useful as a
"ballpark" check on tensors before/after operations have been executed.
This came out of use cases when converting new models to llama.cpp and
the need to track down discrepancies in tensor values.

As an example of usage, this function can be called after the graph has
been executed, for example in `process_ubatch` in llama-context.cpp:
```c++
ggml_backend_sched_debug_tensor(sched.get(), res->get_gf(), "inp_embd", 10);
```
This will log something like the following, assuming logging is set to
debug/verbose level:
```console
ggml_backend_sched_debug_tensor: Tensor 'inp_embd', type: f32
ggml_backend_sched_debug_tensor: ne = [2048 6 1 1]
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 0]: 7.241361
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 1]: 5.649519
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 2]: 9.418730
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 3]: 8.292873
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 4]: 9.473540
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 5]: 9.034624
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 6]: 9.187912
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 7]: 1.406322
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 8]: 4.729420
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 9]: 4.343110
ggml_backend_sched_debug_tensor: inp_embd mean_sq = 41.4566065470
```
One thing to keep in mind is that the tensor needs to have a name, and
we also need to ensure that the scheduler does not reuse the tensor
during graph computation. This can be done by marking the tensor as an
output to preserve it.
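Concretely, that preparation could look like the following fragment (a sketch of the prerequisite step in graph-build code; `inp_embd` stands in for whichever tensor you want to inspect, and `ggml_set_name`/`ggml_set_output` are the existing ggml API for naming a tensor and flagging it as an output so its buffer is preserved):

```c++
// Name the tensor so it can be looked up in the graph by name, and mark it
// as an output so the scheduler keeps its buffer alive after computation.
ggml_set_name(inp_embd, "inp_embd");
ggml_set_output(inp_embd);
```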
}
}

void ggml_backend_sched_debug_tensor(ggml_backend_sched_t sched, struct ggml_cgraph * graph, const char * name, size_t n_values_to_log) {

Contributor:

It might be useful to have a default value for n_values_to_log, and also to print the indices of NaN/inf values.

        for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
            for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                const float v = ggml_get_float_value(d, t->type, t->nb, i0, i1, i2, i3);
                sum_sq += v * v;

@am17an am17an (Contributor) commented Dec 14, 2025:

This can easily overflow; perhaps sum_sq should be a double, or we could maintain a running mean.

@ggerganov ggerganov (Member) commented Dec 14, 2025

@danbev What do you plan to use it for? I think the llama-eval-callback already provides this functionality, so this seems a bit redundant.

@danbev danbev (Member, Author) commented Dec 14, 2025

What do you plan to use it for? I think the llama-eval-callback already provides this functionality, so this seems a bit redundant.

I found myself wanting this when debugging models, where I've been adding it for specific tensors to be able to compare with the original model's tensors. I found this approach convenient because it is simple to select a specific tensor, and it also allows me to use different executables (llama-logits, llama-completion) without having to modify them.

But I'll take a closer look at eval_callback as I have not really tried using it and perhaps it could be used instead. That was part of my motivation for opening this as a draft to see what others thought about this.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Dec 14, 2025
@ggerganov ggerganov (Member):

It would be better to consolidate things into llama-eval-callback. llama-logits can be completely merged into llama-eval-callback by adding additional options for the output (i.e. logits/embeddings/none). We can expand it with regex matching on the tensor names that we want to observe; this way we can filter only the relevant information and get the convenience that you are looking for.

Btw llama-eval-callback already supports all standard parameters, which would allow us to do tests with different devices, number of gpu layers, flash attention on/off, etc. So that's also a reason to look into merging the 2 tools.

@danbev danbev (Member, Author) commented Dec 14, 2025

Btw llama-eval-callback already supports all standard parameters, which would allow us to do tests with different devices, number of gpu layers, flash attention on/off, etc. So that's also a reason to look into merging the 2 tools.

Sounds much better. I'll take a look at that, thanks!
