[Auto Parallel] Speed completion #3

Caozhou1995 · 2023-02-13T08:29:10Z

PR types

New features

PR changes

Others

Describe

This PR speeds up completion in Auto Parallel. The work is by @aoyulong and will be merged into Paddle develop.

…addlePaddle#46132)
* [PHI] Support bmm and bmm_grad in xpu (PaddlePaddle#45887)
* support bmm and bmm_grad in xpu
* add error removal
* test=kunlun
* refactor code for better structure
* test=kunlun
* add fp16 kernel for bmm
* test=kunlun
* test=kunlun

)
* add unit test for sum higher level op (PaddlePaddle#45961)
* support slice op backward refuse forward and add high level unit test (PaddlePaddle#45960)
* support tile op backward refuse forward (PaddlePaddle#45942)
* support expand_v2 op backward refuse forward (PaddlePaddle#45941)
* support concat backward refuse forward (PaddlePaddle#45940)

…ram (PaddlePaddle#46194)
* [dy2static] support user to use decorator in their program (PaddlePaddle#45768)
* support deco
* fix deco ast type
* arg_str
* 1
* support callable deco
* code style
* codestyle
* test_error
* fix decos in another file
* recover conflict codes
* [BugFix] fixed a bug in decorator transformer, it can not analyze decorator with params correctly (PaddlePaddle#46055)
* fix deco call
* add raise
* add test
* add warn, fix paddle api
* fix error type
* fix coverage

…tion operators (PaddlePaddle#46184)
* [cherry-pick] extend reduce_sum,reduce_sum,eq,ne,ge,abs,pow,etc higher order operators
* add reduce_mean,reduce_sum primitive ops
* add ne_p gt_p primitive operators
* add ge_p abs_p primitive operators
* add cast primitive operators
* add pow,square prim2orig rules
* add elementwise_div orig2prim rule
* [cherry-pick] add mean,sum,ge,gt,ne,abs,etc higher-order differentiation operators(PaddlePaddle#45888)
* add reduce_mean,reduce_sum primitive ops
* add ne_p gt_p primitive operators
* add ge_p abs_p primitive operators

…addlePaddle#46226)
cherry-pick from PaddlePaddle#45826
LayoutAutotune supports inplace-type OPs.
Adjust UseAutotune according to the review comments on "Add eager layout autotune" (PaddlePaddle#45409).
Move the LayoutAutotune check into the controller, consistent with the AMP check.

…Paddle#46211)
* support cast op backward refuse forward and fix some bugs (PaddlePaddle#46173)
* support cast op backward refuse forward
* Fix the bug of high order unit test framework
* support sign op backward refuse forward (PaddlePaddle#46002)

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.
* Cherry-pick of PR 46045
* Fix bug of reduce_sum kp op.
* Fix bug of reduce_sum kp operator compilation. If compilation device is XPU, eigen kernel should be ignored.

* [Eager] Fix ocr (PaddlePaddle#46124)
* fix linspace error in amp
* fix log
* fix amp error
* fix ocr error which caused by amp
* add more check
* rename dtype ns
* [Eager Bug fix]Fix Detection (PaddlePaddle#46147)
* fix linspace error in amp
* fix log
* fix amp error
* Revert "Simplify size op impl (PaddlePaddle#45808)" This reverts commit c252b1d.
* fix_seg
* fix detection

Co-authored-by: Chen Weihang

…ePaddle#46270)
* [Auto Parallel] Change the import way of Auto Parallel (PaddlePaddle#46115)
* fix strategy (PaddlePaddle#46256)
* [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (PaddlePaddle#46180)
* remove no need grad allreduce communication when sharding-dp
* remove no need grad allreduce communication when sharding-dp
* bugfix
* bugfix
* bugfix

Co-authored-by: Yulong Ao
Co-authored-by: JZ-LIANG

wuhuachaocoding and others added 30 commits September 19, 2022 14:13

refactor mp. (PaddlePaddle#45803) (PaddlePaddle#46121)

e5dc9d6

* refactor mp.
* update setup.py.
* update mp_layers.py for compatibility.
* add documents for mp_layers.py
* update init.py
* update collective.py.
* update.
* update mp_ops.py
* update.
* update code style.
* update code style.

Add INT8 support for fused_multi_transformer_op (PaddlePaddle#45284) (PaddlePaddle#46169)

db368d5

Co-authored-by: RichardWooSJTU

fix_recover_remove_padding kernel (PaddlePaddle#46050) (PaddlePaddle#46198)

6b59a07

fix (PaddlePaddle#46125)

855fdde

Remove redundant code in pe engine (PaddlePaddle#46110) (PaddlePaddle#46145)

7f0c1f0

cherry-pick 46152 (PaddlePaddle#46183)

707d838

Add symbolic shape deduction function for general Plugin mechanism (PaddlePaddle#46179)

a056601

[JitLayer]Save property meta file to correct path (PaddlePaddle#46131) (PaddlePaddle#46195)

45a3c65

[vision.ops.nms] Fix return order error and duplicate results with specific inputs (PaddlePaddle#46148) (PaddlePaddle#46193)

be84cac

* fix return order error and duplicate results with specific inputs

Recompute unify incubate (PaddlePaddle#46073) (PaddlePaddle#46210)

4bced24

Fix wrong eigen header include (PaddlePaddle#46082) (PaddlePaddle#46202)

ac8cce2

* fix wrong eigen header include
* fix compile bug
* fix nan_inf_utils_detail
* fix resource_manager
* fix conv_miopen_helper

[Cherry-pick] Fix static check (PaddlePaddle#46253)

7712ce1

* fix static_check error when compile twice (PaddlePaddle#46140)
* [CI] fix static check in build_pr_dev (PaddlePaddle#46192)

Co-authored-by: Zhou Wei

Run_program_op add scope cache & reuse (PaddlePaddle#45813) (PaddlePaddle#46223)

4f28a4c

* add scope cache & reuse
* add gc scope for end of each train step
* del scope reuse for jit
* refine code
* test

[Inference] fix preln_residual_bias_fuse_pass bug in TNT_small model (PaddlePaddle#46178) (PaddlePaddle#46260)

c384b00

* fix preln_residual_bias_fuse_pass bug in TNT_small model

[Eager] Fix linspace error in amp (PaddlePaddle#46088) (PaddlePaddle#46206)

38c0fd0

* fix linspace error in amp
* fix log
* fix amp error

[Cherry-pick] Sparse add InferMeta (PaddlePaddle#46235)

fd8ec4a

cherry-pick : PaddlePaddle#46016, PaddlePaddle#46021, PaddlePaddle#45974
* [Sparse]Sparse add support gpu (PaddlePaddle#45974)
* [Sparse]Remove unused code (PaddlePaddle#46021)
* [Sparse] Add infer meta (PaddlePaddle#46016)

Fix TransDataBackend Error when call unsqueeze using MKL Tensor (PaddlePaddle#46094) (PaddlePaddle#46186)

5034030

* Fix TransDataBackend Error when call unsqueeze using MKL Tensor
* Add UT
* Refine UT

[cherry-pick] Refine thread pool config of interpretercore (PaddlePaddle#46219)

1418a71

* add config
* add config
* follow comments
* fix serial run

[Paddle-TRT] Support matmul_v2 in Paddle-TensorRT (PaddlePaddle#46177)

654807c

* Support matmul_v2 in Paddle-TensorRT converter.

[PolishComments] Polish some code comments (PaddlePaddle#46032) (PaddlePaddle#46261)

42e56f6

* polish code comments
* polish data_device_transform.cc

[Paddle-TRT] Full support for ops with persistable input (PaddlePaddle#45545) (PaddlePaddle#46280)

adb2f5e

* Move ITensor construction for Weight (persistable variable) from OpConvert to TensorRTEngine.

xiaoxiaohehe001 and others added 29 commits January 12, 2023 15:03

fix_split_infermeta (PaddlePaddle#49745)

8a93404

fix fc kernel diff (PaddlePaddle#49781)

01c26ab

* fix fc kernel diff
* disable fc_elementwise_layernorm_fuse_pass

fix_arg_release24 (PaddlePaddle#49771)

0699afb

remove ps_core dependency

d8eb448

fix both with_rpc and with_distributed on

d92cf96

add cond interceptor

5bacf03

remove max_slot_num

04627d6

fix test case

563055f

fix cmake

006bf14

Modified compute and amplifier interceptor

28668c2

add multi fetch

88f6413

fix gc and infinite buffer size

f616a58

fix start message

a19f9dc

Add start interceptor and fix bug in switch scope

ff2c262

Optimize gc in executor (PaddlePaddle#50301)

0db7f11

auto parallel for generation

5aceb69

fix gc bug and start interceptor

51d1b69

add config generation_batch_size

a40fad6

speed up remove op

191f93d

modify send recv kernel to broadcast

c702e66

send cond in message

f697337

fix scope

2f0a1a3

fix scope

8cc4602

serialize tensor

d46aebc

adapt for message

338df6e

solve comm cross

3f60b54

Merge pull request #2 from ForFishes/liyurui

fde0b14

solve comm problem

modify vlog

e84b4f9

speed completion

3d077a3

Caozhou1995 closed this Feb 13, 2023

20 participants
