Conversation

@ZJY0516
Contributor

@ZJY0516 ZJY0516 commented Nov 7, 2025

Reverts #28095

PR #28095 sometimes causes b to contain NaN values.

@ZJY0516 ZJY0516 requested a review from sighingnow as a code owner November 7, 2025 12:07
@mergify mergify bot added the qwen Related to Qwen models label Nov 7, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request reverts a previous change that fused the computation of g and beta for the Gated Delta Net. The reason for the revert is to fix a bug that reportedly caused NaN values. The changes correctly separate the computation of beta from the fused_gdn_gating Triton kernel, moving it back to a PyTorch sigmoid operation. The modifications to the fused_gdn_gating function and its underlying Triton kernel fused_gdn_gating_kernel are consistent and correctly remove the logic for beta computation. The revert appears to be complete and correct, addressing the reported issue.
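
For readers unfamiliar with the gating step, here is a minimal scalar sketch of an unfused reference computation with beta taken as a plain sigmoid, plus a NaN check of the kind that could be used to compare outputs. The formula for g (negative exp of A_log times softplus of a + dt_bias) is an assumption about the usual Gated Delta Net gating and is not stated in this PR. The names reference_gating and has_nan are hypothetical and do not come from the vLLM code base.

```python
import math


def softplus(x: float, threshold: float = 20.0) -> float:
    """Softplus log(1 + exp(x)), with a linear shortcut for large x."""
    # Above the threshold softplus(x) is approximately x; skipping exp avoids overflow.
    if x > threshold:
        return x
    return math.log1p(math.exp(x))


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so that exp is only called on non-positive values."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reference_gating(a_log: float, a: float, b: float, dt_bias: float):
    """Hypothetical unfused reference: g and beta are computed separately.

    Assumed formulas (not confirmed by this PR):
        g    = -exp(a_log) * softplus(a + dt_bias)
        beta = sigmoid(b)
    """
    g = -math.exp(a_log) * softplus(a + dt_bias)
    beta = sigmoid(b)
    return g, beta


def has_nan(values) -> bool:
    """Return True if any value in the iterable is NaN."""
    return any(math.isnan(v) for v in values)
```

Running a fused kernel and a reference like this on the same inputs, then applying has_nan to both outputs, is one way to narrow down which path introduces the NaN values.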

@ZJY0516 ZJY0516 marked this pull request as draft November 7, 2025 13:20
@ZJY0516 ZJY0516 closed this Nov 10, 2025
@ZJY0516 ZJY0516 deleted the revert-28095-fused_gdn_gating branch November 13, 2025 09:37
Labels

qwen Related to Qwen models