ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 #16805
ggerganov merged 7 commits into ggml-org:master
Conversation
ggerganov
left a comment
Can you demonstrate the performance is preserved?
ggml/src/ggml-cpu/ops.cpp
Outdated
```cpp
    break;
default:
    // rope type not supported, silently default to NORMAL
    rotate_pairs<T>(n_dims, 1, cache, src, dst_data, 1);
```
Isn't it better to GGML_ABORT here?
I thought so too, but I was unsure because test-rope.cpp tests an unsupported (perhaps not yet implemented?) rope type. Shall I change both?
The GLM rope type was removed - we should remove it from the test-rope.
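To illustrate the suggestion, here is a minimal sketch of a dispatch switch that aborts on unsupported rope types instead of silently falling back to the NORMAL path. The enum values and the `rope_mode_name` helper are illustrative placeholders, not the exact ggml code; in ggml the failure path would be `GGML_ABORT(...)` rather than `abort()`.

```cpp
#include <cstdlib>
#include <cstring>

// Illustrative rope type enum (placeholder names, not ggml's exact values).
enum rope_type { ROPE_TYPE_NORMAL, ROPE_TYPE_NEOX, ROPE_TYPE_MROPE, ROPE_TYPE_VISION };

static const char * rope_mode_name(rope_type t) {
    switch (t) {
        case ROPE_TYPE_NORMAL: return "normal";
        case ROPE_TYPE_NEOX:   return "neox";
        case ROPE_TYPE_MROPE:  return "mrope";
        case ROPE_TYPE_VISION: return "vision";
        default:
            // fail loudly on an unsupported type instead of silently
            // defaulting to the NORMAL path; in ggml this would be
            // GGML_ABORT("unsupported rope type")
            abort();
    }
}
```

Failing loudly turns a silent wrong-result bug into an immediate, debuggable crash, which is why the review leans toward aborting here.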
@ggerganov It seems there's a performance improvement, which I didn't expect (see my updated first comment).
…g#16805)
* extract rotate_pairs logic from ggml_compute_forward_rope_f32
* templateify ggml_compute_forward_rope_f32 and _f16
* abort when rope type not supported, remove GLM from test-rope
* add imrope branch to switch
* add rope tests for perf
* Update ggml/src/ggml-cpu/ops.cpp
* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This PR is a small refactoring of ggml_compute_forward_rope_f32 and _f16.
I extracted the rotate_pairs logic to remove some duplicate code.
Also, I kept the current behavior for an unsupported rope type: it silently defaults to the normal type (consecutive values form the pairs).
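The shared rotation logic can be sketched as a single helper templated on the element type, so the f32 and f16 loops compute the rotation in float and only convert at the edges. This is a minimal sketch; the function name, signature, and `stride` parameter are illustrative, not the exact `rotate_pairs` in ops.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Rotate one pair of values (src[0], src[stride]) by the angle given as
// (cos_theta, sin_theta), templated on the element type so f32 and f16
// paths can share the loop body. Illustrative sketch, not ggml's code.
template <typename T>
static void rotate_pair(float cos_theta, float sin_theta,
                        const T * src, T * dst, int64_t stride) {
    const float x0 = static_cast<float>(src[0]);
    const float x1 = static_cast<float>(src[stride]);
    // standard 2D rotation, computed in float regardless of T
    dst[0]      = static_cast<T>(x0 * cos_theta - x1 * sin_theta);
    dst[stride] = static_cast<T>(x0 * sin_theta + x1 * cos_theta);
}
```

With `stride = 1` this matches the "normal" pairing of consecutive values; a NEOX-style pairing would pass a larger stride.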
Later edit: I added some performance tests to test-backend-ops.
Here is the output from compare-llama-bench.py