Conversation

Contributor

@shen-shanshan shen-shanshan commented Nov 11, 2025

Purpose

Extract ConvLayer as CustomOp for better management and extensibility (especially for plugin devices).

Find more details in the comments below.

Test Plan

I have tested this PR on the Ascend platform and it works well, even showing better performance.

Test Result

See: vllm-project/vllm-ascend#4198.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@shen-shanshan shen-shanshan marked this pull request as draft November 11, 2025 09:08
@mergify mergify bot added the qwen Related to Qwen models label Nov 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully extracts the VisionPatchEmbed layer as a CustomOp, improving modularity and extensibility across various vision models. The refactoring in ernie45_vl.py, glm4_1v.py, moonvit.py, qwen2_5_vl.py, qwen2_vl.py, qwen3_omni_moe_thinker.py, and qwen3_vl.py correctly utilizes the new factory function get_vision_patch_embed to instantiate the appropriate patch embedding layers. This change streamlines the codebase by centralizing the logic for creating vision patch embedding layers.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@shen-shanshan shen-shanshan marked this pull request as ready for review November 11, 2025 12:10

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Member

@Isotr0py Isotr0py left a comment


BTW, I think we can finish the convolution layer implementation in this PR and only modify a few typical models' implementations, then migrate the remaining models in follow-up PRs.

Otherwise, every time we refine the layer implementation we would have to modify all models' implementations at once, which is inefficient and error-prone.

Comment on lines 51 to 61
self.proj = nn.Conv2d(
    in_channels=in_channels,
    out_channels=out_channels,
    kernel_size=kernel_size,
    stride=stride,
    padding=padding,
    dilation=dilation,
    groups=groups,
    bias=bias,
    padding_mode=padding_mode,
)
Member


Suggested change
self.proj = nn.Conv2d(
    in_channels=in_channels,
    out_channels=out_channels,
    kernel_size=kernel_size,
    stride=stride,
    padding=padding,
    dilation=dilation,
    groups=groups,
    bias=bias,
    padding_mode=padding_mode,
)
self.weight = nn.Parameter(
    torch.empty(
        (out_channels, in_channels // groups, *kernel_size),
    )
)

This will create an nn.Conv2d as a submodule inside Conv2dLayer; let's initialize the weight and bias in Conv2dLayer directly.
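For readers skimming the thread, here is a minimal sketch (not the PR's final code) of what holding the parameters directly could look like; the Conv2dLayer name and argument handling follow the discussion, and the init simply mirrors nn.Conv2d's defaults:

import math

import torch
import torch.nn as nn

from vllm.model_executor.custom_op import CustomOp


class Conv2dLayer(CustomOp):
    """Sketch: keep the conv weight/bias as plain parameters, no nn.Conv2d submodule."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: tuple[int, int], stride: tuple[int, int],
                 padding: int = 0, dilation: int = 1, groups: int = 1,
                 bias: bool = True) -> None:
        super().__init__()
        self.stride, self.padding = stride, padding
        self.dilation, self.groups = dilation, groups
        # Same layout nn.Conv2d uses internally: (out, in // groups, kH, kW).
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels // groups, *kernel_size))
        self.bias = nn.Parameter(torch.empty(out_channels)) if bias else None
        # Mirror nn.Conv2d's default init so numerics match the submodule version.
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in = in_channels // groups * math.prod(kernel_size)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            nn.init.uniform_(self.bias, -bound, bound)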

Comment on lines 64 to 65
x = self.proj(x)
return x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
x = self.proj(x)
return x
return F.conv2d(
    x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
)

Then we can control the dispatch at the ops level (F.conv2d etc.).
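As a rough illustration of that ops-level dispatch (assuming the __init__ sketched above and vLLM's CustomOp dispatch; the registration name here is made up):

import torch
import torch.nn.functional as F

from vllm.model_executor.custom_op import CustomOp


@CustomOp.register("conv2d_layer")  # hypothetical op name, for illustration only
class Conv2dLayer(CustomOp):
    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Reference path: a plain functional conv over the layer's own parameters.
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward_cuda(self, x: torch.Tensor) -> torch.Tensor:
        # CUDA can reuse the native path; out-of-tree platforms (e.g. Ascend)
        # can instead override forward_oot with their own conv kernel.
        return self.forward_native(x)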

Comment on lines 129 to 144
class LinearConvLayer(ConvLayerBase):
    """Conv layer with linear module."""

    def __init__(
        self,
        input_size: int,
        output_size: int,
        bias: bool = True,
        skip_bias_add: bool = False,
        params_dtype: torch.dtype | None = None,
        quant_config: QuantizationConfig | None = None,
        prefix: str = "",
        *,
        return_bias: bool = True,
        disable_tp: bool = False,
    ) -> None:
Member


Suggested change
class LinearConvLayer(ConvLayerBase):
    """Conv layer with linear module."""

    def __init__(
        self,
        input_size: int,
        output_size: int,
        bias: bool = True,
        skip_bias_add: bool = False,
        params_dtype: torch.dtype | None = None,
        quant_config: QuantizationConfig | None = None,
        prefix: str = "",
        *,
        return_bias: bool = True,
        disable_tp: bool = False,
    ) -> None:

class LinearConvLayer(ConvLayerBase):
    """Conv layer with linear module."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: int | tuple | str = 0,
        dilation: int | tuple = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "zeros",
    ) -> None:

Hmmm, we only use linear to replace convolution when kernel_size == stride, which is a special case for nn.Conv2d/nn.Conv3d. I think we can implement the linearized convolution inside Conv2dLayer, if not we should at least keep args consistent with ConvLayer. WDYT?

Contributor Author

@shen-shanshan shen-shanshan Nov 13, 2025


Hmmm, we only use linear to replace convolution when kernel_size == stride, which is a special case for nn.Conv2d/nn.Conv3d. I think we can implement the linearized convolution inside Conv2dLayer, if not we should at least keep args consistent with ConvLayer. WDYT?

Oh, that's a good suggestion; maybe we can merge these two types into Conv2dLayer and add some checks to dispatch the forward.
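For context, a tiny standalone PyTorch check (not vLLM code) of why the kernel_size == stride, zero-padding case can be expressed as a single matmul over non-overlapping patches:

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)                     # (N, C, H, W)
weight = torch.randn(16, 3, 4, 4)               # kernel == stride == 4, no padding
bias = torch.randn(16)

conv_out = F.conv2d(x, weight, bias, stride=4)  # -> (2, 16, 2, 2)

# Equivalent linear form: cut the image into non-overlapping patches, then one matmul.
patches = F.unfold(x, kernel_size=4, stride=4)  # -> (2, 3*4*4, 4 patches)
lin_out = (weight.view(16, -1) @ patches + bias.view(1, -1, 1)).view(2, 16, 2, 2)

assert torch.allclose(conv_out, lin_out, atol=1e-4)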

Member


We currently only replace conv3d with linear because of a performance regression in torch 2.9 (see #27406).

BTW, I remember linear being faster than convolution for point-wise convolutions in early PyTorch, so other platforms (like CPU) may also benefit from linearized conv2d/conv3d because they have custom optimized GEMM ops.

I think we can implement both conv layers like this to automatically linearize the conv ops:

class Conv2dLayer(CustomOp):
    """Conv layer with Conv2d."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int | tuple,
        stride: int | tuple | None,
        padding: int | tuple | str | None,
        dilation: int | tuple | None,
        groups: int | None,
        bias: bool | None,
        padding_mode: str | None,
    ) -> None:
        super().__init__()

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        ...

        self.can_linearize = (self.kernel_size == self.stride and not self.padding)

        if self.can_linearize:
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels * math.prod(kernel_size))
            )
        else:
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels, *kernel_size)
            )
        ...
    
    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        .... # <- do some reshape here
        if self.can_linearize:
            return F.linear(x, self.weight, self.bias)
        else:
            return F.conv2d(
                x,
                self.weight,
                bias=self.bias,
                stride=self.stride,
                padding=self.padding,
                dilation=self.dilation,
                groups=self.groups,
            )

Contributor Author


@Isotr0py Currently, vLLM only converts Conv3D to Linear with conv3d_to_linear_weight() in vision.py and doesn't apply this optimization to Conv2D. Could we directly implement it as below?

  • Conv2dLayer -> only use F.conv2d(). (because we don't convert the Conv2D weight when loading)
  • Conv3dLayer -> directly use F.linear() when kernel_size == stride for better performance (sketched below).
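A minimal sketch of that split, assuming the Conv3dLayer name from the discussion and eliding the patch reshape (as in the earlier example):

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

from vllm.model_executor.custom_op import CustomOp


class Conv3dLayer(CustomOp):
    """Sketch: linearize only when it is exactly equivalent (kernel_size == stride, no padding)."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: tuple[int, int, int],
                 stride: tuple[int, int, int], bias: bool = True) -> None:
        super().__init__()
        self.stride = stride
        self.can_linearize = tuple(kernel_size) == tuple(stride)
        if self.can_linearize:
            # Store the weight already flattened for F.linear.
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels * math.prod(kernel_size)))
        else:
            self.weight = nn.Parameter(
                torch.empty(out_channels, in_channels, *kernel_size))
        self.bias = nn.Parameter(torch.empty(out_channels)) if bias else None

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        if self.can_linearize:
            # x is expected as flattened patches: (..., in_channels * prod(kernel_size)).
            return F.linear(x, self.weight, self.bias)
        # Otherwise x is the usual (N, C, D, H, W) volume.
        return F.conv3d(x, self.weight, self.bias, stride=self.stride)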

Member


(because we don't convert the Conv2D weight when loading)

In fact, conv3d_to_linear_weight is just a workaround for nn.Conv3d introduced while fixing the performance regression. Since we have decided to expose the conv layer as a CustomOp, we can optimize its usage with a better implementation.

We can implement a weight_loader for the conv layer, like the linear layers do, so that the weight is converted automatically when calling weight_loader(param, loaded_weight):

set_weight_attrs(weight, {"weight_loader": self.weight_loader})
def weight_loader(self, param: Parameter, loaded_weight: torch.Tensor):
    if self.can_linearize:
        out_channels = loaded_weight.shape[0]
        loaded_weight = loaded_weight.reshape(out_channels, -1)
    param.data.copy_(loaded_weight)
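Fleshing that out a little (these are meant as methods on the conv layer; can_linearize and the flattened parameter shape follow the earlier sketches, not the PR's final code):

import torch
from torch.nn import Parameter

from vllm.model_executor.utils import set_weight_attrs


def create_conv_weight(self, out_channels: int, flattened_in_features: int) -> None:
    # Create the (already flattened) parameter and attach the loader so
    # vLLM's model loader calls it for us during checkpoint loading.
    weight = Parameter(torch.empty(out_channels, flattened_in_features))
    set_weight_attrs(weight, {"weight_loader": self.weight_loader})
    self.register_parameter("weight", weight)


def weight_loader(self, param: Parameter, loaded_weight: torch.Tensor) -> None:
    # Checkpoints store conv weights as (out, in, *kernel); flatten on the fly
    # when the layer runs in its linearized form.
    if self.can_linearize:
        loaded_weight = loaded_weight.reshape(loaded_weight.shape[0], -1)
    assert param.data.shape == loaded_weight.shape
    param.data.copy_(loaded_weight)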

Contributor Author


OK, that's a good idea; I will follow this to implement it today.

@shen-shanshan shen-shanshan changed the title from [Model][MM] Extract VisionPatchEmbed as CustomOp to [Model][MM] Extract conv layer as CustomOp Nov 12, 2025
@Isotr0py
Member

BTW, let me update the layer implementation for you tonight. :)

Signed-off-by: shen-shanshan <[email protected]>
@shen-shanshan
Contributor Author

BTW, let me update the layer implementation for you tonight. :)

Oh sorry, I didn't notice that and have just force-pushed a new commit... 😂

Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py
Member

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully extracts ConvLayer as CustomOp, which improves modularity and extensibility. The changes correctly integrate the new Conv2dLayer and Conv3dLayer into existing models, replacing direct nn.Conv2d or ReplicatedLinear calls and removing the conv3d_to_linear_weight utility function. This is a positive step towards better management of convolution operations within the framework. However, there are a few areas in the new conv.py file that require attention to ensure robustness and correctness, particularly regarding the handling of kernel_size and stride in CausalConv2dLayer when they are provided as tuples.
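One common way to make that tuple handling robust (a generic helper for illustration, not necessarily what the PR does) is to normalize the arguments up front:

def to_2tuple(v: int | tuple[int, int]) -> tuple[int, int]:
    """Normalize an int-or-tuple conv argument to a 2-tuple."""
    return (v, v) if isinstance(v, int) else tuple(v)


# All downstream padding/stride arithmetic can then index per dimension safely.
kernel_size = to_2tuple(3)   # -> (3, 3)
stride = to_2tuple((2, 1))   # -> (2, 1)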

@Isotr0py Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 13, 2025
Member

@Isotr0py Isotr0py left a comment


Let's see if the CI can pass now.

@shen-shanshan
Contributor Author

shen-shanshan commented Nov 14, 2025

I have also tested this PR on the Ascend platform and it works well, even showing better performance.

See: vllm-project/vllm-ascend#4198.

@Isotr0py Isotr0py merged commit 41b92f7 into vllm-project:main Nov 14, 2025
57 checks passed
@Isotr0py
Member

Also cc @jikunshang @bigPYJ1151: the CPU/XPU platforms may need to update forward_cpu/forward_xpu for the convolution layer optimization as well.
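For illustration only, a hypothetical platform hook (assuming the Conv2dLayer sketched earlier in this thread) would look like:

import torch

from vllm.model_executor.custom_op import CustomOp


class Conv2dLayer(CustomOp):
    # __init__ / forward_native as sketched earlier in the thread.

    def forward_cpu(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder: fall back to the reference path until a CPU-specific
        # optimization (e.g. a linearized GEMM path) is measured to help.
        return self.forward_native(x)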

geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: George D. Torres <[email protected]>
@jikunshang
Collaborator

@Isotr0py thanks for the reminder! For XPU we call forward_native by default, and for CPU it calls forward_cuda. Ideally both should work, since these are just torch operator calls here. We will check whether this affects functionality/perf on XPU & CPU.

bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Bram Wasti <[email protected]>
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Nov 26, 2025
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by
vllm-project/vllm#26866
2. get_mrope_input_positions is broken by
vllm-project/vllm#28399
3. graph mode is broken by
vllm-project/vllm#25110 we'll upgrade torch to
2.8 to fix the problem later
4. embedding is broken by
vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by
vllm-project/vllm#28534
6. spec decode is broken by
vllm-project/vllm#28771
7. sp feature is broken by
vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by
vllm-project/vllm#26866
11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by
vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

 
What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455
We'll remove model files in the future to avoid this kind of error
2. Engine core is broken by
vllm-project/vllm#23691 We'll remove the patch
file in the future.
3. Ascend scheduler is broken by
vllm-project/vllm#28733 We'll remove ascend
scheduler later.
4. qwen3-next is broken by
vllm-project/vllm#28083 We'll remove model files
in the future to avoid this kind of error
5. qwen vl is broken by vllm-project/vllm#27764.
We'll remove model files in the future

Known issue:
1. ray doesn't work 
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache+ ascend scheduler + deepseek v2 lite is broken.

Co-authored-by: MengqingCao <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: 22dimensions <[email protected]>
Co-authored-by: shen-shanshan <[email protected]>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: hfadzxy <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Co-authored-by: MengqingCao <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Kurumi5210 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Nov 26, 2025
Bump vLLM version to v0.11.2

Signed-off-by: Kurumi5210 <[email protected]>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 29, 2025
Bump vLLM version to v0.11.2

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>