UPSTREAM PR #17737: CANN: implement the SSM_CONV operator#416
UPSTREAM PR #17737: CANN: implement the SSM_CONV operator#416
Conversation
Co-authored-by: Aleksei Lobanov, <[email protected]> Co-authored-by: Sujin Kang, <[email protected]>
|
Explore the complete analysis inside the Version Insights Performance Analysis Summary: PR #416 - CANN SSM_CONV Operator ImplementationOverviewPR #416 implements the SSM_CONV operator for the CANN backend, adding support for state-space model convolution operations on Ascend NPUs. The changes introduce 137 new lines across 4 files with no deletions, representing a pure feature addition rather than a modification of existing code paths. Performance Impact AnalysisPower Consumption: Analysis across all binaries shows 0.0% change in power consumption between versions. The measured values for key binaries remain identical:
Inference Performance: No functions in the core inference path (llama_decode, llama_encode, llama_tokenize) were modified. The new ggml_cann_ssm_conv function is an isolated addition to the CANN backend operator set and does not affect existing CPU or GPU inference paths. Tokens per second for standard transformer models remains unchanged. Code Changes:
Scope: This PR exclusively affects state-space models (Mamba, RWKV architectures) running on CANN backend. Standard transformer models and non-CANN backends are unaffected. The implementation adds 123 lines of tensor manipulation and convolution setup code without modifying any existing operator implementations. |
d15b30f to
738bfbf
Compare
f01b714 to
47d1dc9
Compare
ca4155f to
b86b588
Compare
1daebfe to
75a97fd
Compare
|
Explore the complete analysis inside the Version Insights Performance Analysis Summary: PR #416Analysis Scope: CANN backend SSM_CONV operator implementation SummaryThis PR adds SSM convolution operator support for CANN backend without measurable performance impact. Analysis shows zero performance change across all 16 binaries and no function-level metrics available for comparison. The code introduces 109 new lines implementing Power Consumption: All binaries show 0.0% change. Four binaries have negligible absolute deltas under 1.1 nJ due to floating-point precision: Inference Impact: No tokenization or inference functions modified. Functions Code Changes: New operator adds F32-only SSM convolution with tensor reshaping from CLN to NCL format, depthwise convolution with |
Mirrored from ggml-org/llama.cpp#17737
Description
We implement the
SSM_CONVoperator using depthwise 1D convolution.We use high-level builtin
aclnnConvolutionfunction.The goal is to compute the following:
where the shape of$y$ is $[dinner, nt, ns]$ , $x$ is $[dconv - 1 + nt, dinner, ns]$ and $w$ is $[dconv, dinner]$ .
In order to use
aclnnConvolutionto implement this formula, we reshape the tensors and set the groups parameter tod_innerto calculate the convolution for each channel independently.Testing
We ran test-backend-ops test suite for
SSM_CONVon two different cards: 310P3 and 910B3.For the 310P3 card, it requires setting the
cubeMathTypeparameter toALLOW_FP32_DOWN_PRECISION, and it seems that causes the computation to be done not in f32, which in turn causes the tests to not pass with a small error (NMSE 0.000000114, greater than the allowed 1e-7). We had to overridemax_nmse_err()method fortest_ssm_convto set the maximum error to 1e-6 which allows the tests to pass.On the 910B card, the operator runs in f32 natively, it passes the tests at the original 1e-7 precision.