[Model][MM] Extract conv layer as CustomOp #28455
Conversation
Code Review
The pull request successfully extracts the VisionPatchEmbed layer as a CustomOp, improving modularity and extensibility across various vision models. The refactoring in ernie45_vl.py, glm4_1v.py, moonvit.py, qwen2_5_vl.py, qwen2_vl.py, qwen3_omni_moe_thinker.py, and qwen3_vl.py correctly utilizes the new factory function get_vision_patch_embed to instantiate the appropriate patch embedding layers. This change streamlines the codebase by centralizing the logic for creating vision patch embedding layers.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
force-pushed from 7df8f76 to a627804
Isotr0py left a comment
BTW, I think we can finish the convolution layer implementation in this PR and only modify a few typical models' implementations, then migrate the remaining models in a follow-up PR.
Otherwise you have to modify all models' implementations at once every time the layer implementation is refined, which is quite inefficient and error-prone.
```python
self.proj = nn.Conv2d(
    in_channels=in_channels,
    out_channels=out_channels,
    kernel_size=kernel_size,
    stride=stride,
    padding=padding,
    dilation=dilation,
    groups=groups,
    bias=bias,
    padding_mode=padding_mode,
)
```
Suggested change (replace the `nn.Conv2d` construction above with a directly-initialized parameter):
```python
self.weight = nn.Parameter(
    torch.empty(out_channels, in_channels // groups, *kernel_size)
)
```
This will create an nn.Conv2d as a submodule inside Conv2dLayer; let's initialize the weight and bias in Conv2dLayer directly.
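For illustration only, a minimal sketch of what owning the parameters directly could look like, mirroring nn.Conv2d's layout and default init (the class name and the bias handling here are assumptions, not the PR's final code):

```python
import math

import torch
import torch.nn as nn


class DirectConv2dParams(nn.Module):
    """Hypothetical sketch: hold conv weight/bias directly instead of an nn.Conv2d submodule."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: tuple[int, int],
        groups: int = 1,
        bias: bool = True,
    ) -> None:
        super().__init__()
        # Same layout nn.Conv2d uses: (out_channels, in_channels // groups, kH, kW).
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels // groups, *kernel_size)
        )
        self.bias = nn.Parameter(torch.empty(out_channels)) if bias else None
        # Mirror nn.Conv2d's default initialization so behavior matches before
        # checkpoint weights are loaded.
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in = (in_channels // groups) * math.prod(kernel_size)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            nn.init.uniform_(self.bias, -bound, bound)
```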
```python
x = self.proj(x)
return x
```
Suggested change:
```python
return F.conv2d(
    x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
)
```
Then we can control the dispatch at the op level, with functional calls like F.conv2d, etc.
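To make the dispatch idea concrete, here is a rough sketch of per-backend forwards once the layer owns its parameters (the stand-in base class and the exact backend hooks are simplified assumptions about how CustomOp is wired up, not code from this PR):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class _CustomOpStandIn(nn.Module):
    """Stand-in for vLLM's CustomOp: pick a backend-specific forward at call time."""

    def forward(self, *args, **kwargs):
        # The real CustomOp selects an implementation (native, CUDA, CPU, or an
        # out-of-tree platform override) based on the current platform; here we
        # simply fall back to the native path.
        return self.forward_native(*args, **kwargs)


class Conv2dLayerSketch(_CustomOpStandIn):
    """Illustrative only: parameters live on the layer, forwards are plain functional ops."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int) -> None:
        super().__init__()
        self.stride = stride
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size)
        )
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.weight, self.bias, stride=self.stride)

    def forward_cpu(self, x: torch.Tensor) -> torch.Tensor:
        # A backend override can swap in a different kernel (e.g. a linearized
        # GEMM path) without touching any model code that uses this layer.
        return self.forward_native(x)


layer = Conv2dLayerSketch(in_channels=3, out_channels=16, kernel_size=14, stride=14)
print(layer(torch.randn(1, 3, 28, 28)).shape)  # torch.Size([1, 16, 2, 2])
```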
```python
class LinearConvLayer(ConvLayerBase):
    """Conv layer with linear module."""

    def __init__(
        self,
        input_size: int,
        output_size: int,
        bias: bool = True,
        skip_bias_add: bool = False,
        params_dtype: torch.dtype | None = None,
        quant_config: QuantizationConfig | None = None,
        prefix: str = "",
        *,
        return_bias: bool = True,
        disable_tp: bool = False,
    ) -> None:
```
Suggested change:
```python
class LinearConvLayer(ConvLayerBase):
    """Conv layer with linear module."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: int | tuple | str = 0,
        dilation: int | tuple = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "zeros",
    ) -> None:
```
Hmmm, we only use linear to replace convolution when kernel_size == stride, which is a special case of nn.Conv2d/nn.Conv3d. I think we can implement the linearized convolution inside Conv2dLayer; if not, we should at least keep the args consistent with ConvLayer. WDYT?
> Hmmm, we only use linear to replace convolution when kernel_size == stride, which is a special case of nn.Conv2d/nn.Conv3d. I think we can implement the linearized convolution inside Conv2dLayer; if not, we should at least keep the args consistent with ConvLayer. WDYT?
Oh, that's a good suggestion, maybe we can merge these two types into Conv2dLayer and add some checks to dispatch the forward.
We only replace conv3d with linear currently because of a performance regression in torch 2.9 (see #27406).
BTW, I remember that linear was faster than convolution for point-wise convolutions in early PyTorch; perhaps other platforms (like the CPU platform) can also benefit from linearized conv2d/3d, because they have custom optimized GEMM ops.
I think we can implement both conv layers like this to automatically linearize conv ops:
```python
class Conv2dLayer(CustomOp):
    """Conv layer with Conv2d."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int | tuple,
        stride: int | tuple | None,
        padding: int | tuple | str | None,
        dilation: int | tuple | None,
        groups: int | None,
        bias: bool | None,
        padding_mode: str | None,
    ) -> None:
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        ...
        self.can_linearize = (self.kernel_size == self.stride and not self.padding)
        if self.can_linearize:
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels * math.prod(kernel_size))
            )
        else:
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels, *kernel_size)
            )
        ...

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        ...  # <- do some reshape here
        if self.can_linearize:
            return F.linear(x, self.weight, self.bias)
        else:
            return F.conv2d(
                x,
                self.weight,
                bias=self.bias,
                stride=self.stride,
                padding=self.padding,
                dilation=self.dilation,
                groups=self.groups,
            )
```
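As a sanity check of the linearized path above (illustrative only; the shapes and the unfold-based reshape are my own, not the PR's code), Conv2d with kernel_size == stride and no padding is equivalent to a Linear over flattened, non-overlapping patches:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)                       # (N, C, H, W)
weight = torch.randn(16, 3, 4, 4)                 # (out_channels, in_channels, kH, kW)
bias = torch.randn(16)

conv_out = F.conv2d(x, weight, bias, stride=4)    # (2, 16, 2, 2)

# Linearized path: cut the image into non-overlapping 4x4 patches, flatten each
# patch, then apply a single GEMM with the conv weight reshaped to 2D.
patches = F.unfold(x, kernel_size=4, stride=4)    # (2, 3*4*4, 4)
patches = patches.transpose(1, 2)                 # (2, 4, 48)
lin_out = F.linear(patches, weight.reshape(16, -1), bias)  # (2, 4, 16)
lin_out = lin_out.transpose(1, 2).reshape(conv_out.shape)

assert torch.allclose(conv_out, lin_out, atol=1e-5)
```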
@Isotr0py Currently, I find that vLLM only converts Conv3D to Linear with conv3d_to_linear_weight() in vision.py and doesn't apply this optimization to Conv2D. Could we directly implement it as below?
- Conv2dLayer -> only use F.conv2d() (because we don't convert the Conv2D weight when loading).
- Conv3dLayer -> directly use F.linear() when kernel_size == stride for better performance.
> (because we don't convert the Conv2D weight when loading)
In fact, conv3d_to_linear_weight is just a workaround for nn.Conv3d from fixing the performance regression. Since we've decided to expose the conv layer as a CustomOp, we can optimize its usage with a better implementation.
We can implement a weight_loader for the conv layer, like the linear layers do, so that it automatically converts the weight when weight_loader(param, loaded_weight) is called:
```python
set_weight_attrs(weight, {"weight_loader": self.weight_loader})
```

```python
def weight_loader(self, param: Parameter, loaded_weight: torch.Tensor):
    if self.can_linearize:
        out_channels = loaded_weight.shape[0]
        loaded_weight = loaded_weight.reshape(out_channels, -1)
    param.data.copy_(loaded_weight)
```
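Roughly, this is what such a loader would do to a checkpoint's conv-shaped weight when the layer is linearized (the shapes below are made up for illustration, not taken from any particular model):

```python
import torch

# Checkpoint stores the patch-embed weight in conv layout:
# (out_channels, in_channels, kT, kH, kW).
loaded_weight = torch.randn(1152, 3, 2, 14, 14)

# The linearized layer allocates (out_channels, in_channels * kT * kH * kW),
# so the loader just flattens the trailing dims before copying.
param = torch.empty(1152, 3 * 2 * 14 * 14)
param.copy_(loaded_weight.reshape(loaded_weight.shape[0], -1))
print(param.shape)  # torch.Size([1152, 1176])
```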
OK, that's a good idea, I will follow this to implement it today.
BTW, let me update the layer implementation for you tonight. :)
Signed-off-by: shen-shanshan <[email protected]>
force-pushed from 2d9f1c7 to 60111dd
Oh sorry, I didn't notice that and have just force-pushed a new commit.. 😂
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
/gemini review
Code Review
The pull request successfully extracts ConvLayer as CustomOp, which improves modularity and extensibility. The changes correctly integrate the new Conv2dLayer and Conv3dLayer into existing models, replacing direct nn.Conv2d or ReplicatedLinear calls and removing the conv3d_to_linear_weight utility function. This is a positive step towards better management of convolution operations within the framework. However, there are a few areas in the new conv.py file that require attention to ensure robustness and correctness, particularly regarding the handling of kernel_size and stride in CausalConv2dLayer when they are provided as tuples.
Signed-off-by: Isotr0py <[email protected]>
Isotr0py left a comment
Let's see if the CI can pass now.
I have also tested this PR on the Ascend platform and it worked well, even getting better performance.
Also cc @jikunshang @bigPYJ1151. CPU/XPU platform may need to update
Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: George D. Torres <[email protected]>
@Isotr0py thanks for the reminder! For XPU we will call
Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by vllm-project/vllm#26866
2. get_mrope_input_positions is broken by vllm-project/vllm#28399
3. graph mode is broken by vllm-project/vllm#25110; we'll upgrade torch to 2.8 to fix the problem later
4. embedding is broken by vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by vllm-project/vllm#28534
6. spec decode is broken by vllm-project/vllm#28771
7. the sp feature is broken by vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by vllm-project/vllm#26866
11. the `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455; we'll remove model files in the future to avoid this kind of error
2. Engine core is broken by vllm-project/vllm#23691; we'll remove the patch file in the future
3. Ascend scheduler is broken by vllm-project/vllm#28733; we'll remove the Ascend scheduler later
4. qwen3-next is broken by vllm-project/vllm#28083; we'll remove model files in the future to avoid this kind of error
5. qwen vl is broken by vllm-project/vllm#27764; we'll remove model files in the future

Known issues:
1. ray doesn't work
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + ascend scheduler + deepseek v2 lite is broken

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <[email protected]> Signed-off-by: MengqingCao <[email protected]> Signed-off-by: hfadzxy <[email protected]> Signed-off-by: leo-pony <[email protected]> Co-authored-by: MengqingCao <[email protected]> Co-authored-by: hfadzxy <[email protected]> Co-authored-by: leo-pony <[email protected]> Co-authored-by: 22dimensions <[email protected]> Co-authored-by: shen-shanshan <[email protected]>
Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Purpose
Extract ConvLayer as CustomOp for better management and extensibility (especially for plugin devices). Find more details in the comments below.
Test Plan
I have tested this PR on the Ascend platform and it worked well, even getting better performance.
Test Result
See: vllm-project/vllm-ascend#4198.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.