forked from PaddlePaddle/Paddle
[Auto Parallel] Speed completion #3
Closed
* refactor mp. * update setup.py. * update mp_layers.py for compatibility. * add documents for mp_layers.py * update init.py * update collective.py. * update. * update mp_ops.py * update. * update code style. * update code style.
…addlePaddle#46169) Co-authored-by: RichardWooSJTU <[email protected]>
…addlePaddle#46132) * [PHI] Support bmm and bmm_grad in xpu (PaddlePaddle#45887) * support bmm and bmm_grad in xpu * add error removal * test=kunlun * refactor code for better structure * test=kunlun * add fp16 kernel for bmm * test=kunlun * test=kunlun
) * add unit test for sum higher level op (PaddlePaddle#45961) * support slice op backward refuse forward and add high level unit test (PaddlePaddle#45960) * support tile op backward refuse forward (PaddlePaddle#45942) * support expand_v2 op backward refuse forward (PaddlePaddle#45941) * support concat backward refuse forward (PaddlePaddle#45940)
…ram (PaddlePaddle#46194) * [dy2static] support user to use decorator in their program (PaddlePaddle#45768) * support deco * fix deco ast type * arg_str * 1 * support callable deco * code style * codestyle * test_error * fix decos in another file * recover conflict codes * [BugFix] fixed a bug in decorator transformer, it can not analyze decorator with params correctly (PaddlePaddle#46055) * fix deco call * add raise * add test * add warn, fix paddle api * fix error type * fix coverage
…tion operators (PaddlePaddle#46184) * [cherry-pick] extend reduce_sum,reduce_sum,eq,ne,ge,abs,pow,etc higher order operators * add reduce_mean,reduce_sum primitive ops * add ne_p gt_p primitive operators * add ge_p abs_p primitive operators * add cast primitive operators * add pow,square prim2orig rules * add elementwise_div orig2prim rule * [cherry-pick] add mean,sum,ge,gt,ne,abs,etc higher-order differentiation operators(PaddlePaddle#45888) * add reduce_mean,reduce_sum primitive ops * add ne_p gt_p primitive operators * add ge_p abs_p primitive operators
…ecific inputs (PaddlePaddle#46148) (PaddlePaddle#46193) * fix return order error and duplicate results with specific inputs
* fix wrong eigen header include * fix compile bug * fix nan_inf_utils_detail * fix resource_manager * fix conv_miopen_helper
* fix static_check error when compile twice (PaddlePaddle#46140) * [CI] fix static check in build_pr_dev (PaddlePaddle#46192) Co-authored-by: Zhou Wei <[email protected]>
…addlePaddle#46226) cherry-pick from PaddlePaddle#45826 * LayoutAutotune supports inplace-type OPs * Adjust UseAutotune according to the review comments on "Add eager layout autotune" (PaddlePaddle#45409) * Move the LayoutAutotune check into the controller, consistent with the AMP check
…ddle#46223) * add scope cache & reuse * add gc scope for end of each train step * del scope reuse for jit * refine code * test
…addlePaddle#46178) (PaddlePaddle#46260) * fix preln_residual_bias_fuse_pass bug in TNT_small model
…Paddle#46211) * support cast op backward refuse forward and fix some bugs (PaddlePaddle#46173) * support cast op backward refuse forward * Fix the bug of high order unit test framework * support sign op backward refuse forward (PaddlePaddle#46002)
…46206) * fix linspace error in amp * fix log * fix amp error
cherry-pick : PaddlePaddle#46016, PaddlePaddle#46021, PaddlePaddle#45974 * [Sparse]Sparse add support gpu (PaddlePaddle#45974) * [Sparse]Remove unused code (PaddlePaddle#46021) * [Sparse] Add infer meta (PaddlePaddle#46016)
…lePaddle#46094) (PaddlePaddle#46186) * Fix TransDataBackend Error when call unsqueeze using MKL Tensor * Add UT * Refine UT
…dle#46219) * add config * add config * follow comments * fix serial run
* Support matmul_v2 in Paddle-TensorRT converter.
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * Cherry-pick of PR 46045 * Fix bug of reduce_sum kp op. * Fix bug of reduce_sum kp operator compilation. If compilation device is XPU, eigen kernel should be ignored.
* [Eager] Fix ocr (PaddlePaddle#46124) * fix linspace error in amp * fix log * fix amp error * fix ocr error which caused by amp * add more check * rename dtype ns * [Eager Bug fix]Fix Detection (PaddlePaddle#46147) * fix linspace error in amp * fix log * fix amp error * Revert "Simplify size op impl (PaddlePaddle#45808)" This reverts commit c252b1d. * fix_seg * fix detection Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: Chen Weihang <[email protected]>
…ePaddle#46270) * [Auto Parallel] Change the import way of Auto Parallel (PaddlePaddle#46115) * fix strategy (PaddlePaddle#46256) * [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (PaddlePaddle#46180) * remove no need grad allreduce communication when sharding-dp * remove no need grad allreduce communication when sharding-dp * bugfix * bugfix * bugfix Co-authored-by: Yulong Ao <[email protected]> Co-authored-by: JZ-LIANG <[email protected]>
…lePaddle#46261) * polish code comments * polish data_device_transform.cc
…e#45545) (PaddlePaddle#46280) * Move ITensor construction for Weight (persistable variable) from OpConvert to TensorRTEngine.
* fix fc kernel diff * disable fc_elementwise_layernorm_fuse_pass
solve comm problem
PR types
New features
PR changes
Others
Describe
This PR speeds up completion. The work is by @aoyulong, and it will be merged into Paddle's develop branch.