
Conversation

@ChenxiQ (Contributor) commented on Dec 1, 2025

What this PR does / why we need it?

  1. Fixes the environment path used to locate custom op shared libraries.
  2. Uses empty tensor initialization for op outputs instead of zero-initialization, avoiding an unnecessary fill pass (see the sketch below).
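
For illustration only, a minimal PyTorch analogue of item 2; the actual change swaps at::zeros for at::empty in csrc/torch_binding.cpp, the shapes and dtypes below mirror the op's outputs, and the concrete values of m and n are made up:

```python
import torch

# Example sizes; in the real op, n is derived from the weight tensor list.
m, n = 1024, 4096

# Before: zero-initialized outputs pay for an extra fill pass over each buffer.
out_before = torch.zeros((m, n // 2), dtype=torch.int8)

# After: uninitialized outputs skip the fill. This is safe only because the
# kernel writes every element of each output before it is ever read.
out_quant = torch.empty((m, n // 2), dtype=torch.int8)
out_scale = torch.empty((m,), dtype=torch.float32)
out_offset = torch.empty((m,), dtype=torch.float32)
```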

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: QianChenxi <[email protected]>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request includes a bug fix and a performance optimization. In vllm_ascend/platform.py, a duplicated path segment in the construction of CUSTOM_OPP_PATH is corrected, which is a necessary fix. In csrc/torch_binding.cpp, at::zeros is replaced with at::empty for output tensor allocation, which is a good performance improvement. However, I've identified a related critical issue that should be addressed.

Comment on lines +571 to +573
at::Tensor output = at::empty({m, n/2}, x.options().dtype(at::kChar));
at::Tensor output_scale = at::empty({m}, x.options().dtype(at::kFloat));
at::Tensor output_offset = at::empty({m}, x.options().dtype(at::kFloat));

Severity: critical

The size of these output tensors depends on n, which is calculated on line 567 from the weight TensorList (weight[0].sizes()[1]). This is unsafe because if the weight TensorList is empty, accessing weight[0] will cause a crash. It's crucial to add a check to ensure weight is not empty before its elements are accessed.

For example, you could add the following check before line 567:

TORCH_CHECK(!weight.empty(), "weight tensor list cannot be empty");

@github-actions bot commented on Dec 1, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill in the PR description when writing the commit message, so that reviewers and future developers can understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@ChenxiQ changed the title from "[Bugfix] custom op fix" to "[Bugfix] fix custom op GmmSwigluQuantWeightNzTensorList" on Dec 1, 2025
@ChenxiQ force-pushed the br_custom_op_fix branch 3 times, most recently from b1c3922 to 29191fb on December 2, 2025 at 13:04
Signed-off-by: QianChenxi <[email protected]>
@wangxiyuan merged commit 4588cda into vllm-project:main on Dec 2, 2025
21 checks passed
global _CUSTOM_OP_ENABLED

# set custom ops path
CUR_DIR = os.path.dirname(os.path.realpath(__file__))

@wangxiyuan This line causes the test tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ to fail.
os.path.realpath involves system calls that are not supported by torch.compile (Dynamo) during graph capture. This triggers a runtime crash when graph mode is enabled.
Consider moving the CUR_DIR calculation to module level (global scope) to avoid the trace error and the unnecessary recomputation on every call.
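
As a minimal sketch of that suggestion (only CUR_DIR and _CUSTOM_OP_ENABLED come from the quoted snippet; the helper name, the custom_ops subdirectory, and setting CUSTOM_OPP_PATH through os.environ are assumptions for illustration):

```python
import os

_CUSTOM_OP_ENABLED = False

# Computed once at import time, at module scope, so torch.compile (Dynamo)
# never has to trace os.path.realpath during graph capture.
CUR_DIR = os.path.dirname(os.path.realpath(__file__))


def enable_custom_ops() -> None:
    """Hypothetical helper that points the runtime at the custom op libraries."""
    global _CUSTOM_OP_ENABLED
    # Reuse the precomputed module-level CUR_DIR instead of recomputing it
    # inside code that Dynamo may trace; the subdirectory name is assumed.
    os.environ["CUSTOM_OPP_PATH"] = os.path.join(CUR_DIR, "custom_ops")
    _CUSTOM_OP_ENABLED = True
```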

Collaborator


Yes, I found it. Thanks for the reminder.

Collaborator


@hukongyi I think #4675 should fix the CI.

wangxiyuan added a commit to wangxiyuan/vllm-ascend that referenced this pull request Dec 3, 2025
ChenCangtao pushed a commit to ChenCangtao/vllm-ascend that referenced this pull request Dec 3, 2025
…#4593)

### What this PR does / why we need it?

1. Fixes the environment path used to locate custom op shared libraries.
2. Uses empty tensor initialization for op outputs instead of zero-initialization for better efficiency.

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: QianChenxi <[email protected]>
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 4, 2025
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 4, 2025
Meihan-chen pushed a commit to Meihan-chen/vllm-ascend that referenced this pull request Dec 5, 2025
