Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063)
ggerganov merged 30 commits into ggml-org:master
Conversation
@gabe-l-hart guess you'll be interested in this one as well :)
gabe-l-hart
left a comment
A few comments on the op signatures and how they relate to the versions I have on the SSD branch
@@ -18,29 +18,31 @@ Legend:
| ACC | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
Whelp, glad I know about this file now
        struct ggml_context * ctx,
        struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_softplus(
@@ -983,6 +1011,10 @@ extern "C" {
        struct ggml_context * ctx,
        struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_cumsum(
In an effort to remove permutations and conts, I updated this to also allow an additional dim argument and then added a ggml_cumsum_0 convenience wrapper for dim 0 https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R997
I have no strong opinion, but if we want the dim version, we could bring that in here, or we could consider changing that one to ggml_cumsum_dim on my branch to keep the function signature after this is merged.
To be honest, I just made it to mirror the behavior of ggml_sum exactly. I do agree the PyTorch approach (of specifying the dimension) is better and I think minimizing permutes / conts is good, my perfectionist self just thinks that should entail the refactoring of OP_SUM as well :>
This should be OK. If we need to sum along dim 1, we can transpose the input and extend ggml_cumsum to work with transposed data. As initial pass, assert data is contiguous.
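The dim-0 convention agreed on above can be illustrated with NumPy. This is a reference-semantics sketch, not the ggml kernel: ggml's dim 0 is the contiguous (fastest-varying) dimension, which corresponds to NumPy's last axis for a row-major array, and the `cumsum_dim1` helper below is hypothetical, just showing the transpose trick suggested in the comment.

```python
import numpy as np

def cumsum_dim0(a: np.ndarray) -> np.ndarray:
    # ggml dim 0 == contiguous dimension == NumPy's last axis (row-major).
    return np.cumsum(a, axis=-1)

def cumsum_dim1(a: np.ndarray) -> np.ndarray:
    # Cumulative sum along dim 1 via the transpose trick:
    # transpose, cumsum along dim 0, transpose back.
    return cumsum_dim0(a.swapaxes(-1, -2)).swapaxes(-1, -2)

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(cumsum_dim0(a))  # [[1. 3. 6.], [4. 9. 15.]]
print(cumsum_dim1(a))  # [[1. 2. 3.], [5. 7. 9.]]
```

An actual ggml implementation of the transposed case would additionally need the contiguity assert mentioned above, since the initial pass only handles contiguous data.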
@@ -2187,6 +2219,17 @@ extern "C" {
        int                   shift2,
        int                   shift3);

    // Make matrix into a triangular one (upper, upper + diagonal, lower or lower + diagonal) with constant value
    GGML_API struct ggml_tensor * ggml_tri(
Similar to cumsum, I added a version of this with the ability to specify dimensions https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R2219
Here I'd be a bit skeptical. The convention is that the first two dimensions are the matrix dimensions. All the code is written under that assumption. PyTorch also defines .tril() without the dimension parameters. I don't think extending that is worth it tbh and it makes it much harder to write optimized kernels, so I'd drop it, but I guess @ggerganov should have a say.
Yep, totally fair. This was very much "let's see if I can engineer any improvements." I'm definitely not clear if it makes any significant difference in the places I've used it currently.
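The header comment's behavior ("make matrix into a triangular one ... with constant value") can be sketched in NumPy under one plausible reading: keep the selected triangle of the input and overwrite the remaining elements with the constant. The `tri_fill` helper and its flag names are illustrative, not the ggml API; the convention that the first two dimensions are the matrix dimensions matches the comment above.

```python
import numpy as np

def tri_fill(a: np.ndarray, value: float,
             lower: bool = True, diag: bool = True) -> np.ndarray:
    # Keep the lower (or upper) triangle of `a`, optionally including the
    # diagonal; fill everything else with `value`.
    n, m = a.shape
    i = np.arange(n)[:, None]  # row indices
    j = np.arange(m)[None, :]  # column indices
    if lower:
        keep = (j <= i) if diag else (j < i)
    else:
        keep = (j >= i) if diag else (j > i)
    return np.where(keep, a, value)

m = np.arange(1.0, 10.0).reshape(3, 3)
print(tri_fill(m, 0.0, lower=True, diag=True))   # same as np.tril(m)
print(tri_fill(m, 0.0, lower=False, diag=False)) # same as np.triu(m, k=1)
```

With `value = 0.0` this degenerates to PyTorch's `.tril()` / `.triu()`, which is the precedent cited in the discussion above.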
@slaren @ggerganov Should be ready for final review.
@ggerganov Aight, parallelized CUMSUM, added docs, renamed TRI_KEEP to TRI, and added CONST with const1234d helpers.
Aight, @ggerganov @slaren @CISC it's ready to merge I think.
ggerganov
left a comment
@pwilkin Thanks for the contribution.
As constructive feedback for the future: try to split the ggml changes into even smaller parts. It would improve the review process, because there are many little details (naming, API design, code formatting) that are not obvious at first, and it takes some time to get accustomed to them.
    void initialize_tensors(ggml_context * ctx) override {
        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
            if (strcmp(t->name, "a") == 0) {
                init_tensor_tril(t, 0.1, 1.0f);
            } else {
                init_tensor_uniform(t, 0.1, 1.0f);
            }
Any reason to not include negative numbers here?
Not really, other than that we really don't want zeroes, and I don't think there's a way to exclude them.
Unless you mean for the second tensor - I guess changing it to (-1, 1) should be fine there.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update tests/test-backend-ops.cpp (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp (Co-authored-by: Diego Devesa <slarengh@gmail.com>)
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update ggml/src/ggml.c (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update ggml/src/ggml.c (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update ggml/src/ggml-cuda/unary.cu (Co-authored-by: Aman Gupta <amangupta052@gmail.com>)
* Remove extra script
* Update ggml/src/ggml.c (Co-authored-by: Diego Devesa <slarengh@gmail.com>)
* Update tests/test-backend-ops.cpp (Co-authored-by: Diego Devesa <slarengh@gmail.com>)
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp (Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>)
* Update tests/test-backend-ops.cpp (Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>)
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from ggml-org#14596, regenerate
* Don't make commits at night...
These are the ops needed for the new hybrid models, including Qwen3 Next and Kimi Linear.
Prerequisite to merging #16095
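For reference, the semantics of the remaining new ops can be sketched in NumPy. These are illustrative stand-ins, not the ggml kernels: EXPM1 corresponds to `np.expm1` (accurate `exp(x) - 1` for small `x`), SOFTPLUS to a numerically stable `log(1 + exp(x))`, and the `solve_tri_lower` helper below is a naive forward-substitution reading of SOLVE_TRI for a lower-triangular system.

```python
import numpy as np

def softplus(x):
    # Stable softplus: log(1 + exp(x)) without overflow for large x,
    # using log(1 + e^x) = max(x, 0) + log(1 + e^-|x|).
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def solve_tri_lower(L: np.ndarray, B: np.ndarray) -> np.ndarray:
    # Solve L @ X = B for X, with L lower-triangular, by forward
    # substitution (one row at a time).
    n = L.shape[0]
    X = np.zeros_like(B, dtype=float)
    for i in range(n):
        X[i] = (B[i] - L[i, :i] @ X[:i]) / L[i, i]
    return X

print(softplus(0.0))        # log(2) ≈ 0.6931
print(softplus(1000.0))     # 1000.0, no overflow
print(np.expm1(1e-10))      # ≈ 1e-10, where exp(x) - 1 would lose precision
```

The stable-softplus identity matters for the hybrid-model gating paths these ops feed; the naive `np.log(1 + np.exp(1000.0))` overflows to `inf`.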