[TSP] last_norm allgather move to model.py #5911
Conversation
Thanks for your contribution!
```python
out = self.norm(hidden_states, residual, forward_meta=forward_meta)[0]
hidden_states = hidden_states + residual

if self.norm.is_last_norm and self.norm.fd_config.parallel_config.use_sequence_parallel_moe:
    hidden_states = self.norm.allgather(hidden_states, forward_meta.ids_remove_padding.shape[0])

out = self.norm(hidden_states, forward_meta=forward_meta)[0]
```
Can the all_gather be placed after the norm?
Yes, that makes sense, done~
Tested with Qwen, no issues.
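As a side note, here is a small self-contained sanity check of why this reordering is safe, assuming the last norm is a per-token RMSNorm with no cross-token statistics (my reading of the diff, not something stated in the thread): normalizing each local shard and then gathering along the token dimension gives the same result as gathering first, while normalizing locally avoids redundant work.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # per-token RMSNorm: each row (token) is normalized independently
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

# two sequence-parallel shards along the token dimension
shard_a = np.random.randn(3, 8).astype("float32")
shard_b = np.random.randn(3, 8).astype("float32")

gather_then_norm = rms_norm(np.concatenate([shard_a, shard_b], axis=0))
norm_then_gather = np.concatenate([rms_norm(shard_a), rms_norm(shard_b)], axis=0)
assert np.allclose(gather_then_norm, norm_then_gather, atol=1e-5)
```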
```python
if self.norm.is_last_norm and self.norm.fd_config.parallel_config.use_sequence_parallel_moe:
    out = self.norm.allgather(out, forward_meta.ids_remove_padding.shape[0])

return out

if current_platform.is_iluvatar() and forward_meta.attn_backend.mixed:
    out = forward_meta.attn_backend.reverse_transpose(out)

return out
```
The position of your return here isn't quite right.
Fixed, done~
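Presumably the fix looks roughly like the sketch below, with the early return removed so the Iluvatar-specific branch stays reachable; this is only an illustration based on the hunk above, not the exact merged code:

```python
if self.norm.is_last_norm and self.norm.fd_config.parallel_config.use_sequence_parallel_moe:
    # gather the sequence-parallel shards of the normalized output
    out = self.norm.allgather(out, forward_meta.ids_remove_padding.shape[0])

if current_platform.is_iluvatar() and forward_meta.attn_backend.mixed:
    # platform-specific layout restore still has to run before returning
    out = forward_meta.attn_backend.reverse_transpose(out)

return out
```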
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff                           @@
##        release/online/20251131    #5911     +/- ##
=====================================================
  Coverage          ?   58.46%
=====================================================
  Files             ?      320
  Lines             ?    39195
  Branches          ?     5916
=====================================================
  Hits              ?    22916
  Misses            ?    14436
  Partials          ?     1843
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Merged commit 799bb56 into PaddlePaddle:release/online/20251131.
Motivation
When SP (sequence parallelism) is enabled, the last norm of the main LLM model is coupled with the all-gather step, which makes it inconvenient for multi-step speculative decoding to call them. This PR therefore decouples this special norm from the gather.
Modifications
Modify the network definition of each MoE model and decouple the last norm from the all-gather there.
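For illustration only, a hedged sketch of what the decoupled pattern in a model's forward looks like after this change, pieced together from the diff hunks above (the exact merged code may differ):

```python
# The last norm no longer fuses the sequence-parallel all-gather; the model
# code gathers the normalized output explicitly, so callers such as a
# multi-step speculative-decoding loop can invoke the two steps separately.
hidden_states = hidden_states + residual
out = self.norm(hidden_states, forward_meta=forward_meta)[0]

if self.norm.is_last_norm and self.norm.fd_config.parallel_config.use_sequence_parallel_moe:
    # restore the full token dimension after normalizing the local shard
    out = self.norm.allgather(out, forward_meta.ids_remove_padding.shape[0])
```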
Usage or Command
Accuracy Tests
Checklist
- Use at least one PR tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- When submitting to a `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.