bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large. #718

Merged
yzh119 merged 2 commits into flashinfer-ai:main from bobboli:rmsnorm_smem on Jan 6, 2025
Conversation

@bobboli
Contributor

@bobboli bobboli commented Jan 6, 2025

The original implementation fails with RuntimeError: invalid argument when hidden_size=16384, because at that size the FusedAddRMSNorm kernels request more than 48KB of shared memory.
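
To see why hidden_size=16384 crosses the limit, here is a rough back-of-the-envelope sketch. It assumes the kernel stages one float32 per hidden element in shared memory plus one float32 of scratch per warp; that layout, the 32-warp default, and the helper names `smem_bytes` and `needs_opt_in` are illustrative assumptions, not taken from the PR diff. The 48KB figure is the default per-block dynamic shared memory limit in CUDA; a launch that asks for more fails unless the kernel has opted in through `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`.

```python
# Default per-block dynamic shared-memory limit when a kernel has not opted in.
DEFAULT_SMEM_LIMIT_BYTES = 48 * 1024


def smem_bytes(d: int, num_warps: int = 32, elem_size: int = 4) -> int:
    """Estimate shared memory for one block.

    Assumed layout (hypothetical): d staged elements plus one scratch
    slot per warp, each elem_size bytes wide.
    """
    return (d + num_warps) * elem_size


def needs_opt_in(d: int, num_warps: int = 32, elem_size: int = 4) -> bool:
    """True if the estimate exceeds the default 48KB limit."""
    return smem_bytes(d, num_warps, elem_size) > DEFAULT_SMEM_LIMIT_BYTES


if __name__ == "__main__":
    for d in (4096, 8192, 16384):
        print(f"d={d}: {smem_bytes(d)} bytes, opt-in needed: {needs_opt_in(d)}")
```

Under these assumptions the staged row alone takes 16384 * 4 = 65536 bytes at d=16384, which is already past 48KB before any scratch space is counted, while d=8192 still fits.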

Collaborator

@yzh119 yzh119 left a comment

Nice catch, thank you!

@yzh119 yzh119 merged commit 9a00cc2 into flashinfer-ai:main Jan 6, 2025