Switch to PyTorch's built-in RMSNorm #2054
    @@ -5,18 +5,15 @@
     # LICENSE file in the root directory of this source tree.

     import torch
    -
    +import torch.nn.functional as F
     from torch import nn


     class RMSNorm(nn.Module):
         """
    -    Implements Root Mean Square Normalization introduced in
    -    https://arxiv.org/abs/1910.07467.
    +    Root Mean Square Normalization in fp32.

    -    Reference implementation (used for correctness verification)
    -    can be found here:
    -    https://github.com/facebookresearch/llama/blob/main/llama/model.py
    +    See: https://pytorch.org/docs/stable/generated/torch.nn.RMSNorm.html

         Args:
             dim (int): embedding size
    @@ -25,6 +22,7 @@ class RMSNorm(nn.Module):

         def __init__(self, dim: int, eps: float = 1e-6) -> None:
             super().__init__()
    +        self.normalized_shape = (dim,)
             self.eps = eps
             self.scale = nn.Parameter(torch.ones(dim))

    @@ -37,8 +35,9 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
                 torch.Tensor: The normalized and scaled tensor having the same shape as ``x``.
             """
             # computation is in fp32
    -        x_fp32 = x.float()
    -        x_normed = (
    -            x_fp32 * torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + self.eps)
    -        ).type_as(x)
    -        return x_normed * self.scale
    +        return F.rms_norm(
    +            x.float(),
    +            normalized_shape=self.normalized_shape,
    +            weight=self.scale,
    +            eps=self.eps,
Comment on lines +41 to +42
Contributor
noob question: when we load the model in bf16, will `self.eps` and `self.scale` also become bf16 or do they stay float32? If they are cast to bf16, it might be worth digging a bit.
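One way to check the question above is a minimal sketch, assuming that "loading in bf16" amounts to casting the module with `nn.Module.to`. `TinyNorm` is a hypothetical stand-in with the same two attributes, not the class from this PR, and other loading paths may behave differently.

```python
import torch
from torch import nn


class TinyNorm(nn.Module):
    """Hypothetical stand-in holding the same two attributes as RMSNorm."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.eps = eps  # plain Python float, not a tensor
        self.scale = nn.Parameter(torch.ones(dim))  # registered parameter


norm = TinyNorm(4)
dtype_before = norm.scale.dtype

# Module.to(dtype) casts floating-point parameters and buffers in place.
norm = norm.to(torch.bfloat16)
dtype_after = norm.scale.dtype

print(dtype_before, dtype_after, type(norm.eps))
```

Under that assumption, `scale` follows the cast because it is a registered parameter, while `eps` is untouched because it is an ordinary Python attribute.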
    +        ).to(x.dtype)
do you know why this wasn't failing before?
yeah the model wasn't cast to fp16, which means the scale parameter was still fp32. And since `x_normed * self.scale` occurred after the cast back to fp16, the output ended up in fp32.