ggml : add ggml_backend_sched_debug_tensor ggml_backend API #18019
danbev wants to merge 1 commit into ggml-org:master
Conversation
This commit adds a new function `ggml_backend_sched_debug_tensor` to the
ggml_backend API. This function allows users to print the values of a
specified tensor after graph computation, along with the mean squared
value.
The motivation for this addition is that it can be useful as a
"ballpark" check of tensors before/after operations have been
executed. This came out of use cases when converting new models to
llama.cpp and needing to track down discrepancies in tensor values.
As an example of usage, this function can be called after the graph has
been executed, for example in `process_ubatch` in llama-context.cpp:
```c++
ggml_backend_sched_debug_tensor(sched.get(), res->get_gf(), "inp_embd", 10);
```
This will log something like the following, assuming logging is set to
debug/verbose level:
```console
ggml_backend_sched_debug_tensor: Tensor 'inp_embd', type: f32
ggml_backend_sched_debug_tensor: ne = [2048 6 1 1]
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 0]: 7.241361
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 1]: 5.649519
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 2]: 9.418730
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 3]: 8.292873
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 4]: 9.473540
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 5]: 9.034624
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 6]: 9.187912
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 7]: 1.406322
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 8]: 4.729420
ggml_backend_sched_debug_tensor: Tensor value at [0, 0, 0, 9]: 4.343110
ggml_backend_sched_debug_tensor: inp_embd mean_sq = 41.4566065470
```
One thing to keep in mind is that the tensor needs to have a name, and
that the graph must not reuse the tensor's buffer during scheduling.
The latter can be ensured by marking the tensor as an output, which
preserves it.
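As a sketch of the step above (illustrative only; `cur` stands for whichever tensor is being built in the graph), the tensor can be named and flagged as an output during graph construction using the existing `ggml_set_name` and `ggml_set_output` helpers:

```c++
// During graph construction (illustrative; cur is the tensor to inspect later):
ggml_set_name(cur, "inp_embd"); // the debug function looks tensors up by name
ggml_set_output(cur);           // prevents the scheduler from reusing the buffer
```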
```c++
void ggml_backend_sched_debug_tensor(ggml_backend_sched_t sched, struct ggml_cgraph * graph, const char * name, size_t n_values_to_log) {
```
It might be useful to have a default value for `n_values_to_log`, and also to print the indices of NaN/inf values.
```c++
for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
    for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
        const float v = ggml_get_float_value(d, t->type, t->nb, i0, i1, i2, i3);
        sum_sq += v * v;
```
This can easily overflow; perhaps `sum_sq` should be a `double`, or we could maintain a running mean.
@danbev What do you plan to use it for? I think the
I found myself wanting to have this when debugging models, where I've been adding this for specific tensors to be able to compare with an original model's tensors. I found it convenient to use this approach as it is simple to select a specific tensor, and it also allows me to use different executables (llama-logits, llama-completion) without having to modify them. But I'll take a closer look at eval_callback as I have not really tried using it, and perhaps it could be used instead. That was part of my motivation for opening this as a draft, to see what others thought about this.
|
It would be better to consolidate things into
Sounds much better. I'll take a look at that, thanks!