[XPU] llama add xpu support#8282
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8282      +/-   ##
===========================================
+ Coverage    55.25%   55.35%   +0.10%
===========================================
  Files          613      614       +1
  Lines        95626    95924     +298
===========================================
+ Hits         52837    53103     +266
- Misses       42789    42821      +32

☔ View full report in Codecov by Sentry.
LGTM
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
When the x and y passed in are integer scalars, paddle.where treats them as int64 tensors of shape [1] and performs a broadcast_add; see search.py for details.
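A minimal, self-contained sketch of that behavior and of the workaround used in the diff above (the dtype and mask shape here are illustrative; on XPU the dtype would be float16 or bfloat16):

```python
import paddle

dtype = paddle.float32  # float32 keeps this sketch runnable on any device
mask = paddle.ones([1, 1, 4, 4], dtype="bool")

# Passing plain Python scalars to paddle.where would make it build int64 tensors of
# shape [1] and go through a broadcast_add, so the diff materializes x and y as
# tensors of the target dtype first.
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = paddle.where(mask, x, y).astype(dtype)
print(expanded_attn_mask.shape)  # [1, 1, 4, 4]
```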
LGTM
    LinearConfig.enable_accumulate_steps_opt()
    LinearConfig.set_accumulate_steps(training_args.gradient_accumulation_steps)
except ImportError:
    pass
This XPU optimization targets the accumulate_steps > 1 case and is used together with the Linear layers from paddle_xpu below.
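For reference, a sketch of the guarded import this diff sits inside. Only the two LinearConfig calls come from the diff above; the paddle_xpu import path and the `_Args` stand-in are assumptions for illustration:

```python
class _Args:
    # stand-in for the trainer's TrainingArguments referenced in the diff
    gradient_accumulation_steps = 4

training_args = _Args()

try:
    # import path assumed for illustration; paddle_xpu is an optional dependency
    from paddle_xpu.layers.nn.linear import LinearConfig

    # hand the accumulation step count to the XPU Linear layers so they can apply
    # their accumulate_steps > 1 optimization
    LinearConfig.enable_accumulate_steps_opt()
    LinearConfig.set_accumulate_steps(training_args.gradient_accumulation_steps)
except ImportError:
    # paddle_xpu not installed: keep the stock Linear layers unchanged
    pass
```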
x = paddle.to_tensor(0.0, dtype=dtype)
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
In theory it could be reused, but the NPU code hard-codes the dtype as float16, while programs running on XPU may use either float16 or bfloat16. Do we need to modify the NPU module?
Following @wuhuachaocoding's suggestion, keep them as two separate if/elif branches instead.
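A hedged sketch of what that if/elif split looks like. Only the xpu branch mirrors the diff above; the npu branch, the function name, and the `get_env_device` import location are assumptions:

```python
import paddle
from paddlenlp.utils.tools import get_env_device  # assumed helper location


def finalize_attn_mask(expanded_attn_mask, dtype):
    """Sketch of the per-device branches; only the xpu branch mirrors the diff."""
    if get_env_device() == "npu":
        # assumed: the existing npu path keeps its hard-coded float16 handling
        x = paddle.to_tensor(0.0, dtype="float16")
        y = paddle.to_tensor(paddle.finfo(paddle.float16).min, dtype="float16")
        expanded_attn_mask = paddle.where(expanded_attn_mask.astype("bool"), x, y).astype(dtype)
    elif get_env_device() == "xpu":
        # xpu runs can be float16 or bfloat16, so follow the requested dtype instead
        x = paddle.to_tensor(0.0, dtype=dtype)
        y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
        expanded_attn_mask = expanded_attn_mask.astype(dtype)
        expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
    return expanded_attn_mask
```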
logits = self.xpu_parallel_matmul(
    hidden_states, self.weight, tensor_parallel_output=tensor_parallel_output, training=self.training
)
Is the training argument really necessary? If the arguments could stay the same, wouldn't it be enough to just swap in an XPU implementation of parallel_matmul?
There are two reasons for this (a sketch follows below):
- One of the XPU optimizations needs parallel_matmul to be an object so that it can store some state.
- XPU needs the training information to apply its optimizations.
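A hedged sketch of that design, i.e. why a callable object rather than a plain function swap. The class name, cached state, and kernel call below are illustrative stand-ins, not the paddle_xpu implementation; only the call signature mirrors the diff above:

```python
import paddle


class XPUParallelMatmul:
    """Illustrative stand-in for the stateful object behind self.xpu_parallel_matmul."""

    def __init__(self):
        # state kept across calls, e.g. a weight layout prepared once and then reused
        self._prepared_weight = None

    def __call__(self, x, weight, tensor_parallel_output=True, training=True):
        if training and self._prepared_weight is None:
            # the training flag lets the object decide when the one-time preparation pays off
            self._prepared_weight = self._prepare(weight)
        w = self._prepared_weight if self._prepared_weight is not None else weight
        # a plain matmul stands in for the XPU kernel; tensor_parallel_output would
        # control whether tp-sharded partial results get gathered
        return paddle.matmul(x, w)

    def _prepare(self, weight):
        return weight  # placeholder for XPU-specific preprocessing
```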
* [XPU] llama add xpu support (#8282)
  * [XPU] llama add xpu support
  * fix
  * use try import
  * fix
  * refine
  * refine
  * refine
  * refine
* update (#8399)
* [LLM] Support fuse attention q, k, v weights (#8202)
  1. add use-interface & fuse action
  1.1. modify 1., code order
  2. switch to name_mapping
  3. solve tp branch
  3.2 follow hui, handel qkv separately
  3.3 handle pdparams
  3.4 from torch
  3.5 abandon low_cpu_mem_usage
  3.6 solve shard branch
  * 3.6.1 solve shard branch after rebase develop
  * code clean
  * remove debug comment
  * Redefine fuse and split functions
  * Redefine fuse and split functions
  * comment and fix
  * update method
  * update QKV fuse and split
  * support fuse weights in multi-files
  * add precision compare
  * simplify function call
  * support use_fast_ffn
  * clean modeling and configuration
  * add test for gpt and opt
  * fix tp_actions get
  * add fast_ffn test
  * add Qwen2Moe
  * Revert "add Qwen2Moe" (This reverts commit 113b883.)
  * add test for split
  * update doc
  * update filter_dict_keys
  ---------
  Co-authored-by: Zii <[email protected]>
* [LLM] Fix fuse or split with same key (#8378)
  * fix fuse or split with same key
  * fix
  * fix eps
  * update format
* [LLM] add decay steps option for finetuning (#8251)
* [LLM] add memory stats to logger of trainer (#8269)
* [Distributed] fix lora (#8325)
* [LLM] fix lora target modules on llama (#8372)
* [Distributed] metric calculation supports tp logits (#8370)
  * Update model_utils.py
  * Update model_utils.py
  * Update model_utils.py
---------
Co-authored-by: Jianbang Yang <[email protected]>
Co-authored-by: DrownFish19 <[email protected]>
Co-authored-by: Zii <[email protected]>
Co-authored-by: Tian <[email protected]>
PR types
New features
PR changes
Models
Description
Add XPU support for the llama model via paddle_xpu (a.k.a. fast_paddle).
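For context on how paddle_xpu plugs in, a minimal sketch of the optional-dependency pattern the changes above follow. The import path and helper name are assumptions; only the keyword arguments mirror the diff shown in the review thread:

```python
import paddle

try:
    # assumed import path; paddle_xpu (a.k.a. fast_paddle) is an optional package
    from paddle_xpu.layers.nn import xpu_parallel_matmul
    _HAS_PADDLE_XPU = True
except ImportError:
    _HAS_PADDLE_XPU = False


def lm_head_matmul(hidden_states, weight, tensor_parallel_output=True, training=True):
    """Hypothetical dispatcher: use the XPU-optimized matmul when available, else plain matmul."""
    if _HAS_PADDLE_XPU:
        return xpu_parallel_matmul(
            hidden_states, weight, tensor_parallel_output=tensor_parallel_output, training=training
        )
    return paddle.matmul(hidden_states, weight)
```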