[fsdp] feat: add NPU fusion kernels for Qwen2 and Qwen2.5 dense model. #3923
base: main
Code Review
This pull request introduces NPU fusion kernel patches for the Qwen2 model and refactors existing patches to be more generic. The changes are well-structured, particularly the addition of .contiguous() calls in apply_rotary_pos_emb_npu, which is a crucial correctness fix for NPU operations. My review includes a suggestion to improve code clarity by reverting a function rename to be more specific to its usage, which will enhance long-term maintainability.
@FightingZhen The issues raised have been resolved. Please review them again. : )
@ZLiao097 please pull and rebase the newest code from
Done~
What does this PR do?
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (this will be checked by the CI).
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data; multiple modules are comma-separated, like [megatron, fsdp, doc].
{type} is one of feat, fix, refactor, chore, test.
For a breaking change, add [BREAKING] to the beginning of the title, e.g. [BREAKING][fsdp, megatron] feat: dynamic batching.
Test
Tested with Qwen2-32B on Ascend A2 with train_prompt_bsz=512, sp8, and fsdp64. In the plots, the red line is the run without the fused operators and the yellow line is the NPU run with the fused operators.

Rewards:
Throughput:

API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
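The title convention from Checklist Before Starting ([{modules}] {type}: {description}, with an optional leading [BREAKING]) can be checked locally before pushing. The snippet below is only a rough sketch of such a check, not the project's actual CI script: the function name check_pr_title and the regular expression are hypothetical, and only the module and type names are taken from the checklist.

```python
import re
from typing import List

# Module and type names copied from the checklist in the PR template.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern (an assumption, not the real CI rule):
# optional [BREAKING], then [modules] type: description.
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>[a-z]+):\s+"
    r"(?P<description>\S.*)$"
)


def check_pr_title(title: str) -> List[str]:
    """Return a list of problems; an empty list means the title looks valid."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return ["title does not match '[{modules}] {type}: {description}'"]

    problems = []
    # Modules are comma-separated inside the brackets.
    for module in (m.strip() for m in match.group("modules").split(",")):
        if module not in ALLOWED_MODULES:
            problems.append(f"unknown module: {module!r}")
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')!r}")
    return problems


if __name__ == "__main__":
    for title in (
        "[fsdp] feat: add NPU fusion kernels for Qwen2 and Qwen2.5 dense model.",
        "[BREAKING][fsdp, megatron] feat: dynamic batching",
    ):
        print(title, "->", check_pr_title(title) or "ok")
```

Returning a list of problems rather than a boolean lets one run report every issue with a title at once.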