feat(cuda): Add highly optimized CUDA kernel for HardSwish activation by Chandan-Sugreevu · Pull Request #17943 · ggml-org/llama.cpp

Chandan-Sugreevu · 2025-12-11T18:59:58Z

What this PR does: Implements a highly optimized, custom CUDA kernel for the GGML_UNARY_OP_HARDSWISH activation function.

Why it is valuable: This enables faster inference for any model using the HardSwish activation when running on NVIDIA GPUs via the GGML CUDA backend.

Testing: Verified functionality on NVIDIA GeForce MX250. Test passes (OK/Mismatch) when memory-support logic is bypassed.

aviallon

I have a few Code Quality comments, but I did not test the performance impact of this PR.

aviallon · 2025-12-11T19:23:02Z

ggml/src/ggml-cuda/ggml-cuda.cu

        case GGML_OP_COS:
        case GGML_OP_CLAMP:
        case GGML_OP_LOG:
-            return true;


Did you intend to remove this return true; here?
Because it makes the switch fall-through the GGML_OP_SSM_SCAN implementation check, which is most probably not what you intended.

aviallon · 2025-12-11T19:23:26Z

ggml/src/ggml-cuda/unary.cu

 }

+
+// --- Custom HardSwish Implementation by Chandan ---


Why the little comment here?

aviallon · 2025-12-11T19:23:50Z

ggml/src/ggml-cuda/unary.cu

+    cudaStream_t stream = ctx.stream();
+    k_hardswish<<<grid_size, block_size, 0, stream>>>(src_d, dst_d, num_elements);
 }
+// --------------------------------------------------


Unnecessary decoration.

aviallon · 2025-12-11T19:24:39Z

CMakeLists.txt

    endif()
 endfunction()

-llama_option_depr(FATAL_ERROR LLAMA_CUBLAS              GGML_CUDA)


Why did you remove this? This is unrelated with your PR.
If you have a fatal error, just use the correct option.

aviallon · 2025-12-11T19:25:12Z

ggml/src/ggml-cuda/ggml-cuda.cu

                case GGML_UNARY_OP_HARDSIGMOID:
                    ggml_cuda_op_hardsigmoid(ctx, dst);
                    break;
-                case GGML_UNARY_OP_HARDSWISH:


Don't move this block, this is unnecessary noise.

aviallon · 2025-12-11T19:27:39Z

ggml/src/ggml-cuda/ggml-cuda.cu

+
+
+
+void ggml_cuda_op_hardswish(ggml_backend_cuda_context & ctx, ggml_tensor * dst);


Look at how things are done in this repo: we don't define prototypes here, but in an op-specific file.

aviallon · 2025-12-12T09:17:57Z

@CISC did you close because it was a low effort PR?

CISC · 2025-12-12T12:11:58Z

@CISC did you close because it was a low effort PR?

I closed it because it was utter nonsense.

Sad to say you just wasted 5 minutes reviewing pure unadulterated slop. :)

feat(cuda): Add highly optimized CUDA kernel for HardSwish activation

32c0d2c

Chandan-Sugreevu requested a review from ggerganov as a code owner December 11, 2025 18:59

github-actions bot added build Compilation issues Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Dec 11, 2025

aviallon suggested changes Dec 11, 2025

View reviewed changes

loci-dev mentioned this pull request Dec 11, 2025

UPSTREAM PR #17943: feat(cuda): Add highly optimized CUDA kernel for HardSwish activation auroralabs-loci/llama.cpp#525

Open

CISC closed this Dec 11, 2025

wallentri88 mentioned this pull request Feb 24, 2026

Eval bug: qwen35 and qwen35moe graph split issues (Severe PP impact, crashes) #19864

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cuda): Add highly optimized CUDA kernel for HardSwish activation#17943

feat(cuda): Add highly optimized CUDA kernel for HardSwish activation#17943
Chandan-Sugreevu wants to merge 1 commit intoggml-org:masterfrom
Chandan-Sugreevu:feature/cuda-hardswish

Chandan-Sugreevu commented Dec 11, 2025

Uh oh!

aviallon left a comment

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon Dec 11, 2025

Uh oh!

aviallon commented Dec 12, 2025

Uh oh!

CISC commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants




		void ggml_cuda_op_hardswish(ggml_backend_cuda_context & ctx, ggml_tensor * dst);

Conversation

Chandan-Sugreevu commented Dec 11, 2025

Uh oh!

aviallon left a comment

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

aviallon commented Dec 12, 2025

Uh oh!

CISC commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants