[torch.compile] Adding "torch compile" annotations to some models #9758
```diff
@@ -28,6 +28,7 @@
 from transformers.configuration_utils import PretrainedConfig
 
 from vllm.attention import Attention, AttentionMetadata
+from vllm.compilation.decorators import support_torch_compile
 from vllm.config import CacheConfig, LoRAConfig
 from vllm.distributed import get_pp_group, get_tensor_model_parallel_world_size
 from vllm.model_executor.layers.fused_moe import FusedMoE
@@ -429,6 +430,7 @@ def forward(
     return hidden_states, residual
 
 
+@support_torch_compile
 class PhiMoEModel(nn.Module):
```
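For readers unfamiliar with the annotation added above, the sketch below shows the general pattern such a class decorator follows: intercept the module's `forward` and lazily wrap it with `torch.compile` on first call. This is an illustrative simplification only, not vLLM's actual `support_torch_compile` implementation (which integrates with vLLM's compilation config and dispatching); the decorator and model names here are made up for the example.

```python
import torch
import torch.nn as nn


def compile_forward(cls):
    """Hypothetical class decorator: compile `forward` lazily on first call.

    Not vLLM's implementation; only the general shape of such an annotation.
    """
    original_forward = cls.forward

    def forward(self, *args, **kwargs):
        # Compile once per instance and cache the compiled callable.
        if getattr(self, "_compiled_forward", None) is None:
            self._compiled_forward = torch.compile(original_forward,
                                                   dynamic=True)
        return self._compiled_forward(self, *args, **kwargs)

    cls.forward = forward
    return cls


@compile_forward
class TinyModel(nn.Module):
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


if __name__ == "__main__":
    model = TinyModel()
    print(model(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```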
Member:

for this model, it seems directly running it with …
need to investigate it later.
Member:

note: this is unrelated to …
```diff
     def __init__(
```
Member:

to run this model successfully on H100, I have to change the config:

initially, I wanted to simply change `"num_hidden_layers": 35` to `"num_hidden_layers": 2`, but I met various random illegal memory access errors, which might be caused by the fused MoE kernel with extremely large input sizes.