Conversation

@zufayu zufayu commented Nov 13, 2025

Motivation

Tune the SiLU and activation (act_and_mul) kernels.

Technical Details

Vectorized loads and stores combined with the packed multiply path
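
A minimal sketch of this pattern is shown below; it is not the PR's code. The kernel name, the [num_tokens, 2 * d] input layout, and the fixed width of 8 halves (one 16-byte float4 per side) are assumptions for illustration only.

```cpp
// Sketch only: vectorized SiLU-and-multiply for FP16 (HIP/CUDA-style).
// input is [num_tokens, 2 * d] (gate half followed by up half); out is [num_tokens, d].
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

__device__ __forceinline__ float silu(float x) {
  return x / (1.0f + __expf(-x));  // silu(x) = x * sigmoid(x)
}

__global__ void silu_and_mul_vec8_kernel(__half* __restrict__ out,
                                         const __half* __restrict__ input,
                                         int d) {
  constexpr int kVec = 8;  // 8 halves = one 16-byte float4 load/store
  const int64_t token = blockIdx.x;
  const __half* x = input + token * 2 * d;  // gate half
  const __half* y = x + d;                  // up half
  __half* o = out + token * d;

  // Assumes d is a multiple of kVec and rows are 16-byte aligned.
  for (int i = threadIdx.x * kVec; i < d; i += blockDim.x * kVec) {
    float4 xr = *reinterpret_cast<const float4*>(x + i);  // vectorized 16-byte load
    float4 yr = *reinterpret_cast<const float4*>(y + i);
    float4 res;
    const __half* xh = reinterpret_cast<const __half*>(&xr);
    const __half* yh = reinterpret_cast<const __half*>(&yr);
    __half* oh = reinterpret_cast<__half*>(&res);
#pragma unroll
    for (int j = 0; j < kVec; ++j) {
      oh[j] = __float2half(silu(__half2float(xh[j])) * __half2float(yh[j]));
    }
    *reinterpret_cast<float4*>(o + i) = res;  // vectorized 16-byte store
  }
}
```

Each thread then moves 16 bytes per side per iteration instead of 2, which is what pushes the kernel closer to the memory-bandwidth roof.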

Test Result

~10% performance improvement for act_and_mul_kernel with the F16 datatype
Peak memory read/write bandwidth increases from ~2.5 to ~2.8 TB/s

Submission Checklist

Copilot AI review requested due to automatic review settings November 13, 2025 09:00
Copilot finished reviewing on behalf of zufayu November 13, 2025 09:04

Copilot AI left a comment

Pull Request Overview

This PR optimizes the SiLU and activation kernels for AMD GPUs by implementing vectorized memory operations and packed multiplication instructions. It achieves approximately a 10% performance improvement for act_and_mul_kernel with the FP16 datatype, with peak memory bandwidth rising from 2.5 to 2.8 TB/s.

Key Changes:

  • Implemented vectorized stores using segmented buffer writes for non-power-of-2 vector sizes
  • Added packed multiplication path using v_pk_mul_f32 inline assembly for processing two elements simultaneously (see the sketch after this list)
  • Added separate read/write bandwidth metrics to performance tests for better analysis
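
As a rough illustration of the packed-multiply bullet above, the helper below wraps v_pk_mul_f32 in inline assembly. This is a sketch, not the PR's implementation: the helper name, the ext_vector_type float pair, and the gfx90a/gfx94x guard are assumptions, and the fallback is a plain element-wise multiply.

```cpp
// Sketch only: multiply two pairs of f32 values with one packed VALU instruction
// on CDNA2/CDNA3 GPUs, halving the instruction count of the multiply step.
typedef float float2_t __attribute__((ext_vector_type(2)));

__device__ __forceinline__ float2_t pk_mul_f32(float2_t a, float2_t b) {
#if defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
  float2_t c;
  // Both lanes of the 2 x f32 operands are multiplied by a single instruction.
  asm volatile("v_pk_mul_f32 %0, %1, %2" : "=v"(c) : "v"(a), "v"(b));
  return c;
#else
  return a * b;  // other targets: let the compiler emit two ordinary multiplies
#endif
}
```

In the kernel's inner loop, the two products silu(x) * y for a pair of adjacent elements can then be formed with one call instead of two scalar multiplies.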

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description

  • op_tests/test_activation.py — Added separate read (RD TB/s) and write (WR TB/s) bandwidth metrics to both test functions for more granular performance analysis
  • csrc/kernels/activation_kernels.cu — Optimized act_and_mul_kernel with vectorized stores and packed multiply operations; added a minimum vec_size constraint of 2; improved code formatting consistency



@valarLip valarLip left a comment

LGTM

@valarLip valarLip merged commit 7d91888 into main Nov 19, 2025
39 of 44 checks passed
@valarLip valarLip deleted the act_silu_hip_tune branch November 19, 2025 06:26