[LLM] Add DeepseekV3 #9738
Merged
ZHUI merged 9 commits on Jan 10, 2025
Codecov Report
Coverage diff, develop vs. #9738:
| Metric | develop | #9738 | +/- |
|---|---|---|---|
| Coverage | 52.35% | 52.38% | +0.02% |
| Files | 729 | 730 | +1 |
| Lines | 117835 | 115230 | -2605 |
| Hits | 61694 | 60359 | -1335 |
| Misses | 56141 | 54871 | -1270 |
DrownFish19 commented on Jan 7, 2025
from .bit.modeling import *
from .bit.configuration import *
from .bit.image_processing import *
from .artist.configuration import *
Re-sorted the imports by name and added the DeepseekV2/V3-related imports.
Force-pushed from 31a383a to 1d74d62
…PaddleNLP into dev_20241231_add_deepseekv3
DrownFish19 commented on Jan 8, 2025
if x.endswith(key):
    state_keys_map[key] = x
    break
    # break # remove break for math A.key B.key ...
This avoids the case where model parameters share the same suffix and the TP action cannot be retrieved for them.
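To illustrate the ambiguity described above, here is a minimal sketch in plain Python. The helpers `match_first` and `match_all` and the names `A.key` and `B.key` are hypothetical, chosen to mirror the code comment; this is not PaddleNLP's implementation. It only shows that stopping at the first suffix match hides every other parameter that ends with the same suffix.

```python
# Hypothetical sketch: these helpers are illustrative, not PaddleNLP functions.

def match_first(base_keys, real_keys):
    """Map each base key to the first real name ending with it, then stop."""
    mapping = {}
    for key in base_keys:
        for name in real_keys:
            if name.endswith(key):
                mapping[key] = name
                break  # later names sharing the same suffix are never examined
    return mapping


def match_all(base_keys, real_keys):
    """Map each base key to every real name that ends with it."""
    return {
        key: [name for name in real_keys if name.endswith(key)]
        for key in base_keys
    }


# Two parameters share the suffix "key", as in the "A.key B.key" comment.
real_keys = ["A.key", "B.key"]

first = match_first(["key"], real_keys)
every = match_all(["key"], real_keys)

print(first)  # only one of the two parameters is mapped
print(every)  # both parameters are visible
```

With first-match semantics, `B.key` never receives a mapping, which is the kind of situation the comment says the change is meant to avoid.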
ZHUI previously approved these changes on Jan 8, 2025
class DeepseekV3PretrainedModel(DeepseekV2PretrainedModel):
    config_class = DeepseekV2Config
    base_model_prefix = "deepseek_v3"
Contributor
Since we are inheriting anyway, how about changing base_model_prefix to match HF? If the parameters are hard to handle, never mind.
Collaborator
Author
- The parameters are fairly easy to handle; they just need to be rewritten.
- base_model_prefix = "model" saves a lot of code: later models can inherit directly from CausalLM instead of starting their changes from DeepseekV3PretrainedModel.
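The effect of the prefix choice on checkpoint key names can be sketched as follows. The helper `rename_prefix` and the example key names are hypothetical and assume an HF-style checkpoint whose base-model weights start with `model.`; this is not the PR's conversion code.

```python
# Hypothetical sketch: the helper and the key names are illustrative only.

def rename_prefix(keys, src_prefix, dst_prefix):
    """Rewrite a leading `src_prefix.` on each key to `dst_prefix.`."""
    renamed = []
    for key in keys:
        head, sep, tail = key.partition(".")
        if sep and head == src_prefix:
            renamed.append(f"{dst_prefix}.{tail}")
        else:
            renamed.append(key)  # e.g. lm_head.weight carries no base prefix
    return renamed


# Assumed HF-style key names, used here only as an example.
checkpoint_keys = [
    "model.embed_tokens.weight",
    "model.norm.weight",
    "lm_head.weight",
]

# base_model_prefix = "deepseek_v3": every base-model key must be renamed.
with_custom_prefix = rename_prefix(checkpoint_keys, "model", "deepseek_v3")

# base_model_prefix = "model": the names already line up, nothing changes.
with_shared_prefix = rename_prefix(checkpoint_keys, "model", "model")

print(with_custom_prefix)
print(with_shared_prefix)
```

When the prefix matches the checkpoint, the mapping is the identity, so a subclass has no prefix-specific renaming to override.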
ZHUI reviewed on Jan 10, 2025
y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
y = paddle.to_tensor(-1.7005809656952787e38, dtype="float32")
@DrownFish19 @ZHUI Does Paddle currently support training MTP-type models? If so, is there a unit test or demo? Thanks!
PR types
New features
PR changes
Models
Description
Add DeepseekV3.