Use Llama RMSNorm for Gemma #2974

WoosukKwon · 2024-02-22T01:01:35Z

Gemma's RMSNorm is only slightly different from Llama's RMSNorm, so we can reuse the existing custom op for it. This optimization reduces latency by ~10%.
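For reference, here is a minimal sketch of why the existing op can be reused. It assumes, as in the Hugging Face `transformers` implementations, that Gemma's RMSNorm scales the normalized input by `(1 + weight)` while Llama's scales by `weight`. Under that assumption, adding 1 to the Gemma weight once (for example, at weight-loading time) lets the Llama-style formula reproduce Gemma's output. The function names are hypothetical, and this is not necessarily how the PR implements it.

```python
import torch


def llama_rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Llama-style RMSNorm: normalize by the root mean square, then scale by `weight`.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight


def gemma_rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Gemma-style RMSNorm (assumed): same normalization, but scales by (1 + weight).
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * (1.0 + weight)


torch.manual_seed(0)
hidden_size = 16
x = torch.randn(4, hidden_size, dtype=torch.float32)
gemma_weight = torch.randn(hidden_size, dtype=torch.float32)

# Fold the "+1" into the weight once so the Llama-style formula can be reused.
folded_weight = gemma_weight + 1.0

ref = gemma_rms_norm(x, gemma_weight)
out = llama_rms_norm(x, folded_weight)
print(torch.allclose(ref, out))  # True
```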

vllm/model_executor/models/gemma.py

Yard1

Nice!

vllm/model_executor/models/gemma.py

Yard1 · 2024-02-22T01:22:01Z

Before we merge, let's make sure it doesn't change the outputs (maybe we could add a test like we have for other models, using transformers as a reference).
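A full model comparison against `transformers` is the real check. Short of that, a layer-level check could look like the sketch below. It assumes a `transformers`-style Gemma reference (compute in FP32, scale by `(1 + weight)`, cast back to the input dtype) and a hypothetical load-time folding of the `+1` into a weight stored in the model dtype. The function names are illustrative and are not vLLM's test utilities. The sketch checks only the norm itself, not the residual handling.

```python
import torch
from torch.testing import assert_close


def reference_gemma_rms_norm(x, weight, eps=1e-6):
    # Assumed transformers-style reference: compute in FP32, scale by (1 + weight),
    # then cast back to the input dtype.
    xf = x.float()
    normed = xf * torch.rsqrt(xf.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * (1.0 + weight.float())).type_as(x)


def llama_style_rms_norm(x, weight, eps=1e-6):
    # Llama-style formula: compute in FP32, scale by `weight`, cast back.
    xf = x.float()
    normed = xf * torch.rsqrt(xf.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * weight.float()).type_as(x)


def test_folded_weight_matches_reference(dtype=torch.bfloat16, hidden_size=256):
    torch.manual_seed(0)
    x = torch.randn(8, hidden_size, dtype=dtype)
    weight = torch.randn(hidden_size, dtype=dtype) * 0.1
    # Hypothetical load-time folding: store (1 + weight) in the model dtype.
    folded = (weight.float() + 1.0).to(dtype)
    expected = reference_gemma_rms_norm(x, weight)
    actual = llama_style_rms_norm(x, folded)
    # Default per-dtype tolerances absorb the extra rounding of the folded weight.
    assert_close(actual, expected)


test_folded_weight_matches_reference(torch.bfloat16)
test_folded_weight_matches_reference(torch.float16)
```

Storing the folded weight in BF16 or FP16 adds one extra rounding step, which is why the comparison uses tolerances rather than exact equality.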

WoosukKwon · 2024-02-22T01:45:49Z

As a note, using the custom op introduces a slight numerical difference in how the residual connection is handled.

The original implementation computes hidden_states + residual in the current dtype (FP16 or BF16). The fused RMSNorm op instead upcasts both operands to FP32 before the addition:

vllm/vllm/model_executor/layers/layernorm.py, line 35 at commit 8fbd84b:

x = x + residual.to(torch.float32)
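Here is a small sketch of the residual difference described above, assuming BF16 activations on CPU with arbitrary tensor shapes. Adding in the current dtype rounds the sum to BF16. Upcasting both operands to FP32 first, as in the line above, keeps the low-order bits that BF16 rounding would discard.

```python
import torch

torch.manual_seed(0)
hidden_states = torch.randn(8, 4096, dtype=torch.bfloat16)
residual = torch.randn(8, 4096, dtype=torch.bfloat16)

# Original path: add in the current dtype, so the sum is rounded to BF16.
sum_bf16 = hidden_states + residual

# Fused-op path: upcast both operands to FP32 before adding.
sum_fp32 = hidden_states.to(torch.float32) + residual.to(torch.float32)

# Compare in FP32; any difference comes from rounding the sum to BF16.
max_abs_diff = (sum_bf16.to(torch.float32) - sum_fp32).abs().max().item()
print(f"max |bf16 sum - fp32 sum| = {max_abs_diff:.3e}")
```

The per-element gap is bounded by half a BF16 ulp of the sum. Whether it affects model outputs is what the end-to-end comparison requested earlier in this thread would show.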

WoosukKwon added 2 commits February 22, 2024 00:59

Use RMSNorm for Gemma

ca6e482

Minor

ef29c95

WoosukKwon requested a review from Yard1 February 22, 2024 01:03

Yard1 reviewed Feb 22, 2024

vllm/model_executor/models/gemma.py

Yard1 approved these changes Feb 22, 2024

Yard1 reviewed Feb 22, 2024

vllm/model_executor/models/gemma.py (outdated)

Address comment

1986eb9

WoosukKwon merged commit 95529e3 into main Feb 22, 2024

WoosukKwon deleted the optimize-gemma branch February 22, 2024 02:28

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

Use Llama RMSNorm custom op for Gemma (vllm-project#2974)

e8f9cc5

3 participants
