[Distributed] enable tensor_parallel_output for finetuning#8370
Conversation
Thanks for your contribution!
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #8370      +/-   ##
===========================================
- Coverage    55.43%   55.36%   -0.07%
===========================================
  Files          616      614       -2
  Lines        96229    96016     -213
===========================================
- Hits         53346    53164     -182
+ Misses      42883    42852      -31

☔ View full report in Codecov by Sentry.
  model_config = AutoConfig.from_pretrained(
      model_args.model_name_or_path,
-     tensor_parallel_output=False,
+     tensor_parallel_output=True,
Is tensor_parallel_output=True mainly for speedup?

Setting tensor_parallel_output=True causes the model's ACC metric to be computed incorrectly, because no all-gather is performed on the results.
If it is not set to True, it hurts both performance and memory usage; in LLM scenarios the performance impact is substantial. Could the ACC metric computation be adapted to the mp (model-parallel) scenario instead?
We could add a switch here, but I suggest keeping the default as False; generation has not been adapted yet and will break.
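A switch along the lines the reviewer suggests could be sketched as follows. This is an illustrative assumption only: the dataclass and field names here are not the actual PaddleNLP ModelArguments, and the forwarding into AutoConfig is shown as a comment.

```python
# Illustrative sketch (assumed names, NOT the real PaddleNLP ModelArguments):
# expose tensor_parallel_output as an opt-in flag defaulting to False, since
# generation is not yet adapted to tensor-parallel (vocab-sharded) logits.
from dataclasses import dataclass, field


@dataclass
class ModelArguments:
    model_name_or_path: str = "facebook/llama-7b"  # placeholder path
    tensor_parallel_output: bool = field(
        default=False,
        metadata={
            "help": "Keep lm_head logits sharded across tensor-parallel ranks "
            "(faster, less memory) instead of all-gathering them."
        },
    )


args = ModelArguments()
# The flag would then be forwarded on, e.g.:
# AutoConfig.from_pretrained(args.model_name_or_path,
#                            tensor_parallel_output=args.tensor_parallel_output)
print(args.tensor_parallel_output)  # False by default
```

With a default of False the current generation path keeps working, while finetuning jobs can opt in explicitly.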
Force-pushed 8fd9ff9 to d162d0c
    # Metrics!
    if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
        if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:
Suggested change:
- if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:
+ if self.args.tensor_parallel_degree > 1 and isinstance(all_preds, paddle.Tensor) and all_preds.shape != all_labels.shape:
Also add a comment here: all_gather logits for tp.
    # Metrics!
    if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
        if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:
The all-gather on the logits should be added in CausalLMTrainer at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/utils.py#L208, rather than here.
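The operation being discussed is gathering the vocab-sharded logits from all tensor-parallel ranks before metrics are computed. The sketch below simulates this with NumPy so it runs without a distributed launch; in PaddleNLP this would instead use paddle.distributed.all_gather inside the trainer, and the function name and shapes here are assumptions for illustration.

```python
# Sketch: reassembling tp-sharded logits along the vocab axis so that
# argmax/ACC sees the full vocabulary. Simulated with NumPy; a real
# implementation would call paddle.distributed.all_gather per rank.
import numpy as np


def all_gather_logits(shards):
    """Concatenate per-rank logit shards [batch, vocab/tp] -> [batch, vocab]."""
    return np.concatenate(shards, axis=-1)


tp_degree = 2
batch, vocab = 4, 8
full = np.arange(batch * vocab, dtype=np.float32).reshape(batch, vocab)

# What each tensor-parallel rank holds when tensor_parallel_output=True:
shards = np.split(full, tp_degree, axis=-1)

gathered = all_gather_logits(shards)
print(gathered.shape)                 # (4, 8)
print(np.array_equal(gathered, full))  # True
```

Without this gather, all_preds has shape [batch, vocab/tp] on each rank while all_labels indexes the full vocabulary, which is why the ACC metric comes out wrong.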
Force-pushed 9a8b420 to eaf6453
Force-pushed eaf6453 to 176891c
* [XPU] llama add xpu support (#8282)
  * [XPU] llama add xpu support
  * fix
  * use try import
  * fix
  * refine
  * refine
  * refine
  * refine
* update (#8399)
* [LLM] Support fuse attention q, k, v weights (#8202)
  1. add use-interface & fuse action
  1.1. modify 1., code order
  2. switch to name_mapping
  3. solve tp branch
  3.2 follow hui, handle qkv separately
  3.3 handle pdparams
  3.4 from torch
  3.5 abandon low_cpu_mem_usage
  3.6 solve shard branch
  * 3.6.1 solve shard branch after rebase develop
  * code clean
  * remove debug comment
  * Redefine fuse and split functions
  * Redefine fuse and split functions
  * comment and fix
  * update method
  * update QKV fuse and split
  * support fuse weights in multi-files
  * add precision compare
  * simplify function call
  * support use_fast_ffn
  * clean modeling and configuration
  * add test for gpt and opt
  * fix tp_actions get
  * add fast_ffn test
  * add Qwen2Moe
  * Revert "add Qwen2Moe" (reverts commit 113b883)
  * add test for split
  * update doc
  * update filter_dict_keys
  Co-authored-by: Zii <[email protected]>
* [LLM] Fix fuse or split with same key (#8378)
  * fix fuse or split with same key
  * fix
  * fix eps
  * update format
* [LLM] add decay steps option for finetuning (#8251)
* [LLM] add memory stats to logger of trainer (#8269)
* [Distributed] fix lora (#8325)
* [LLM] fix lora target modules on llama (#8372)
* [Distributed] metric calculation supports tp logits (#8370)
* Update model_utils.py
* Update model_utils.py
* Update model_utils.py

Co-authored-by: Jianbang Yang <[email protected]>
Co-authored-by: DrownFish19 <[email protected]>
Co-authored-by: Zii <[email protected]>
Co-authored-by: Tian <[email protected]>
PR types
Bug fixes (performance optimization)
PR changes
Others
Description
Enable tensor_parallel_output by default for better performance during finetuning.