[XPU] llama add xpu support#8282
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8282      +/-   ##
===========================================
+ Coverage    55.25%   55.35%   +0.10%
===========================================
  Files          613      614       +1
  Lines        95626    95924     +298
===========================================
+ Hits         52837    53103     +266
- Misses       42789    42821      +32

☔ View full report in Codecov by Sentry.
LGTM
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
When the x and y passed in are integer scalars, paddle.where treats them as int64 tensors of shape [1] and performs a broadcast_add; see search.py for details.
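A minimal, self-contained sketch of that behavior and of the workaround used in the diff above (the dtype and mask shape here are illustrative; on XPU the dtype would be float16 or bfloat16):

```python
import paddle

dtype = paddle.float32  # float32 keeps this sketch runnable on any device
mask = paddle.ones([1, 1, 4, 4], dtype="bool")

# Passing plain Python scalars to paddle.where would make it build int64 tensors of
# shape [1] and go through a broadcast_add, so the diff materializes x and y as
# tensors of the target dtype first.
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = paddle.where(mask, x, y).astype(dtype)
print(expanded_attn_mask.shape)  # [1, 1, 4, 4]
```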
LGTM
    LinearConfig.enable_accumulate_steps_opt()
    LinearConfig.set_accumulate_steps(training_args.gradient_accumulation_steps)
except ImportError:
    pass
This XPU optimization targets the accumulate_steps > 1 case and is used together with the Linear layers from paddle_xpu below.
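For reference, a sketch of the guarded import this diff sits inside. Only the two LinearConfig calls come from the diff above; the paddle_xpu import path and the `_Args` stand-in are assumptions for illustration:

```python
class _Args:
    # stand-in for the trainer's TrainingArguments referenced in the diff
    gradient_accumulation_steps = 4

training_args = _Args()

try:
    # import path assumed for illustration; paddle_xpu is an optional dependency
    from paddle_xpu.layers.nn.linear import LinearConfig

    # hand the accumulation step count to the XPU Linear layers so they can apply
    # their accumulate_steps > 1 optimization
    LinearConfig.enable_accumulate_steps_opt()
    LinearConfig.set_accumulate_steps(training_args.gradient_accumulation_steps)
except ImportError:
    # paddle_xpu not installed: keep the stock Linear layers unchanged
    pass
```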
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
In theory it could be reused, but the NPU code hard-codes the dtype as float16, while programs running on XPU may use either float16 or bfloat16. Do we need to modify the NPU module?
Following @wuhuachaocoding's suggestion, keep them as two separate if/elif branches instead.
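A hedged sketch of what that if/elif split looks like. Only the xpu branch mirrors the diff above; the npu branch, the function name, and the `get_env_device` import location are assumptions:

```python
import paddle
from paddlenlp.utils.tools import get_env_device  # assumed helper location


def finalize_attn_mask(expanded_attn_mask, dtype):
    """Sketch of the per-device branches; only the xpu branch mirrors the diff."""
    if get_env_device() == "npu":
        # assumed: the existing npu path keeps its hard-coded float16 handling
        x = paddle.to_tensor(0.0, dtype="float16")
        y = paddle.to_tensor(paddle.finfo(paddle.float16).min, dtype="float16")
        expanded_attn_mask = paddle.where(expanded_attn_mask.astype("bool"), x, y).astype(dtype)
    elif get_env_device() == "xpu":
        # xpu runs can be float16 or bfloat16, so follow the requested dtype instead
        x = paddle.to_tensor(0.0, dtype=dtype)
        y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
        expanded_attn_mask = expanded_attn_mask.astype(dtype)
        expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
    return expanded_attn_mask
```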
logits = self.xpu_parallel_matmul(
    hidden_states, self.weight, tensor_parallel_output=tensor_parallel_output, training=self.training
)
Is the training argument really necessary? If the arguments could stay the same, wouldn't it be enough to just swap in an XPU implementation of parallel_matmul?
There are two reasons for this (a sketch follows below):
- One of the XPU optimizations needs parallel_matmul to be an object so that it can store some state.
- XPU needs the training information to apply its optimizations.
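A hedged sketch of that design, i.e. why a callable object rather than a plain function swap. The class name, cached state, and kernel call below are illustrative stand-ins, not the paddle_xpu implementation; only the call signature mirrors the diff above:

```python
import paddle


class XPUParallelMatmul:
    """Illustrative stand-in for the stateful object behind self.xpu_parallel_matmul."""

    def __init__(self):
        # state kept across calls, e.g. a weight layout prepared once and then reused
        self._prepared_weight = None

    def __call__(self, x, weight, tensor_parallel_output=True, training=True):
        if training and self._prepared_weight is None:
            # the training flag lets the object decide when the one-time preparation pays off
            self._prepared_weight = self._prepare(weight)
        w = self._prepared_weight if self._prepared_weight is not None else weight
        # a plain matmul stands in for the XPU kernel; tensor_parallel_output would
        # control whether tp-sharded partial results get gathered
        return paddle.matmul(x, w)

    def _prepare(self, weight):
        return weight  # placeholder for XPU-specific preprocessing
```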
* [XPU] llama add xpu support (#8282)
  * [XPU] llama add xpu support
  * fix
  * use try import
  * fix
  * refine
  * refine
  * refine
  * refine
* update (#8399)
* [LLM] Support fuse attention q, k, v weights (#8202)
  1. add use-interface & fuse action
  1.1. modify 1., code order
  2. switch to name_mapping
  3. solve tp branch
  3.2 follow hui, handel qkv separately
  3.3 handle pdparams
  3.4 from torch
  3.5 abandon low_cpu_mem_usage
  3.6 solve shard branch
  * 3.6.1 solve shard branch after rebase develop
  * code clean
  * remove debug comment
  * Redefine fuse and split functions
  * Redefine fuse and split functions
  * comment and fix
  * update method
  * update QKV fuse and split
  * support fuse weights in multi-files
  * add precision compare
  * simplify function call
  * support use_fast_ffn
  * clean modeling and configuration
  * add test for gpt and opt
  * fix tp_actions get
  * add fast_ffn test
  * add Qwen2Moe
  * Revert "add Qwen2Moe" (This reverts commit 113b883.)
  * add test for split
  * update doc
  * update filter_dict_keys
  ---------
  Co-authored-by: Zii <[email protected]>
* [LLM] Fix fuse or split with same key (#8378)
  * fix fuse or split with same key
  * fix
  * fix eps
  * update format
* [LLM] add decay steps option for finetuning (#8251)
* [LLM] add memory stats to logger of trainer (#8269)
* [Distributed] fix lora (#8325)
* [LLM] fix lora target modules on llama (#8372)
* [Distributed] metric calculation supports tp logits (#8370)
  * Update model_utils.py
  * Update model_utils.py
  * Update model_utils.py
---------
Co-authored-by: Jianbang Yang <[email protected]>
Co-authored-by: DrownFish19 <[email protected]>
Co-authored-by: Zii <[email protected]>
Co-authored-by: Tian <[email protected]>
PR types
New features
PR changes
Models
Description
Add XPU support for the llama model via paddle_xpu (a.k.a. fast_paddle).
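For context on how paddle_xpu plugs in, a minimal sketch of the optional-dependency pattern the changes above follow. The import path and helper name are assumptions; only the keyword arguments mirror the diff shown in the review thread:

```python
import paddle

try:
    # assumed import path; paddle_xpu (a.k.a. fast_paddle) is an optional package
    from paddle_xpu.layers.nn import xpu_parallel_matmul
    _HAS_PADDLE_XPU = True
except ImportError:
    _HAS_PADDLE_XPU = False


def lm_head_matmul(hidden_states, weight, tensor_parallel_output=True, training=True):
    """Hypothetical dispatcher: use the XPU-optimized matmul when available, else plain matmul."""
    if _HAS_PADDLE_XPU:
        return xpu_parallel_matmul(
            hidden_states, weight, tensor_parallel_output=tensor_parallel_output, training=training
        )
    return paddle.matmul(hidden_states, weight)
```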