develop #8

esythan · 2021-10-27T06:55:14Z

PR types

PR changes

Describe

* [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group_norm op

* [npu] add box coder * [npu] add box coder

* add align for WorkQueue * add spinlock * merge spinlock

* init functional jacobian api * finish test with dtype float32 * add float64 test case * polish code * use atol=1e-5 with dtype float64 * fix for ci * set timeout for test_jacobian * init hessian API * save status * polish API docstring * modify docstring * add utils.py * save status * fix dygraph double grad dtype error when calling for high differential scenario * reinvoke ci * test_hessian.py is ok * polish hessian API * init vhp * Revert "init vhp" This reverts commit cbd4d3b. * add test for partial_engine.cc * modify numerical_delta with dtype float32 * merge fix for dtype float64 * spell fix * polish code * rm _stop_gradient_pre_process Co-authored-by: JiabinYang <[email protected]>

* fix cusparse compile problem, test=develop * Modify file permissions

* update func name * skip cpu * update unittest * update unittest

* add basic support for CUDA Graph * fix ci compile error * fix LOG print, fix windows CI * follow comments and update * small fix for default ctor * fix rocm compile error * fix CPU compile error

* fix raw optim * pre-commit test file Co-authored-by: sneaxiy <[email protected]>

…#36214) * [NPU] modify transpose2 and index_select_grad kernels for model xlnet * add transpose2 int64_t unit test * add more transpose2 unit tests * update test_transpose_op_npu.py

…age (#36254)

* tmp * added concat BF16/FP32 BWD oneDNN kernel * minor change * minor change * fix for CI * added formatting * Reverted deleting static keyword * added reviewers suggestions * reverted deleting concat bf16 test file * fixed concat tests

* Remove unused header. * Use ConvMKLDNNHandlerT for conv2d INT8. * Use absolute module path to import.

* fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer * update * update

* Added oneDNN BF16 relu * fixed typo * refactored test, review fixes

Add python interface of subgraph: 1. all_sub_graphs() 2. get_sub_graph(idx)

* [NPU] support NCL and NLC for BatchNorm, test=develop * [NPU] remove debug files, test=develop * update, test=develop

* support CUDA Graph on PE * add ut, fix CI compile * reduce memory consumption * fix CUDA 10 CI * improve coverage * improve python coverage

* fix for conv op * Minor change

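After `__getattr__` is overloaded, any argument that does not satisfy the conditions now raises AttributeError, matching the behavior of the non-overloaded version.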

Support registering a GeneratePass from C++, which simplifies development of subgraph optimizations such as fusion.

* support ClipGradByGlobalNorm in sharding * support ClipGradByGlobalNorm in sharding * test=allcase

* update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <[email protected]>

* fix pool2d convert case * add pool2d convert test case for trt6

…36637) * In cpu parallel using gloo, add various length support for SelectedRows * fix bug * fix bugs * fix by code review * remove timeout

* optimize fast tokenizer

…matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652) * new_Matmul2ToMatmulToMul * new_Matmul2ToMatmulToMul * fix paddle_pass_builder * fix paddle_pass_builder * fix paddle_pass_builder * tem * tem * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * add matmul_broadcast_unitest * fix op_teller

* move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos

* Fix the null ptr bug in build_cinn_pass. * Add test for empty&ctrl var.

* fix wrong trt dim when input dim is 2 * update leaky_relu and instance_norm converter unit test * add instance_norm input dim check

* [NPU] fix argsort op, test=develop * remove debug files, test=develop * fix typo, test=develop * address review comments, test=develop

…ubate (#36704) Move the Python API interfaces newly added in PRs #35905 and #35843 into the incubate directory.

Update `cond` English document

* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter * Add Cancel For ThreadPool * Add UT for Cancel

* cache exception in child thread * add ut * fix ut

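Function: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, the attention module is implemented by hand in the C++ layer and exposed as one large attention op. To reduce memory-access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the shared input X is used so that the gemm, transpose, and bias add at that step are called once instead of three times; (2) kernel fusion is used so that data is passed through registers between different CUDA kernels.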
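Optimization (1) can be illustrated without any framework code. The following is a minimal pure-Python sketch, not the PR's C++/CUDA implementation: it assumes small integer matrices, omits the transpose step, and uses hypothetical helper names (`matmul`, `add_bias`, `hconcat`). It only shows why concatenating the three projection weights lets a single gemm plus a single bias add reproduce the three separate q, k, v projections.

```python
def matmul(a, b):
    """Plain-Python matrix product of row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of m."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def hconcat(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum(rows, []) for rows in zip(*mats)]


# Shared input X: 2 tokens, hidden size 3 (illustrative values).
X = [[1, 2, 3], [4, 5, 6]]
Wq = [[1, 0], [0, 1], [1, 1]]
Wk = [[2, 1], [1, 2], [0, 1]]
Wv = [[1, 1], [1, 0], [0, 2]]
bq, bk, bv = [1, 1], [2, 2], [3, 3]

# Unfused: three gemm calls and three bias adds.
q = add_bias(matmul(X, Wq), bq)
k = add_bias(matmul(X, Wk), bk)
v = add_bias(matmul(X, Wv), bv)

# Fused: one gemm and one bias add on the concatenated weights, then a split.
W_qkv = hconcat(Wq, Wk, Wv)          # shape 3 x 6
b_qkv = bq + bk + bv                 # length 6
qkv = add_bias(matmul(X, W_qkv), b_qkv)
q_fused = [row[0:2] for row in qkv]
k_fused = [row[2:4] for row in qkv]
v_fused = [row[4:6] for row in qkv]
```

The fused and unfused results are identical; the gain in a real kernel comes from reading X once and launching one operation instead of three.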

* remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to

* add eigvalsh with is_test * add eigvalsh op * fix backward bug * forward and backward, float and complex, unittest * remove eigvalsh_helper.h * remove changes of cusolver.h * fix unittest * fix unittest bug * update code following eigh * fix test * update lapack * pull develop * update funcor * fix unittest bug * fix details * add tensor_method_func * fix notes

* fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift

* fix matmul dim error * fix wrong dim check in matmul

* bugfix: only check backend when mode == Collecive * fix bug

This PR contains the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.

…addlePaddle#38914) * Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Patched python level LoDTensor * Merge Tensor into DenseTensor * Fixed namespace issues,test=allcases * Fixed merge issues * Fixed inference issues * Fixed NPU test issues * Fixed merge issues

…ePaddle#41387) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues * [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul * Fixed issues with phi kernel * Added triple grad test case * Fixed minor issue

Ray2020BD and others added 30 commits September 29, 2021 14:44

[NPU] Add group norm (#35937)

c79de72

* [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group norm * [NPU] Add group_norm op

fix bug of top_k npu op (#36175)

2b8fd70

[npu] add box coder (#36171)

83578cf

* [npu] add box coder * [npu] add box coder

add slot record dataset (#36200)

79bd5f9

Spinlock (#36030)

a9ea41c

* add align for WorkQueue * add spinlock * merge spinlock

fix cusparse compile problem, test=develop (#36199)

3eb5071

* fix cusparse compile problem, test=develop * Modify file permissions

add optest for adamw (#36148)

69eed34

* update func name * skip cpu * update unittest * update unittest

Add basic support for CUDA Graph (#36190)

21b93c3

* add basic support for CUDA Graph * fix ci compile error * fix LOG print, fix windows CI * follow comments and update * small fix for default ctor * fix rocm compile error * fix CPU compile error

fix the undefined variable bug in dist_transformer file (#36211)

8af939f

Fix raw optim (#36176)

5e0f199

* fix raw optim * pre-commit test file Co-authored-by: sneaxiy <[email protected]>

[NPU] modify transpose2 and index_select_grad kernels for model xlnet (…

a66b9fb

…#36214) * [NPU] modify transpose2 and index_select_grad kernels for model xlnet * add transpose2 int64_t unit test * add more transpose2 unit tests * update test_transpose_op_npu.py

add test_hessian time out (#36234)

56b04bc

fix yolo (#36240)

c12176e

add slotrecord datafeed (#36099)

0a3dbe8

added Piotr to authors.md and updated Intel-related paddle authors im…

2cee0ea

…age (#36254)

[OneDNN] Conv op refactor. (#36252)

e928834

* Remove unused header. * Use ConvMKLDNNHandlerT for conv2d INT8. * Use absolute module path to import.

fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer (#36237)

730dcaf

* fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer * update * update

fix cast cuda implementation (#36266)

9814f89

Added oneDNN BF16 relu (#36265)

1bd9cfe

* Added oneDNN BF16 relu * fixed typo * refactored test, review fixes

add python interface of sub_graph (#36120)

a29ff4c

Add python interface of subgraph: 1. all_sub_graphs() 2. get_sub_graph(idx)

[NPU] BatchNorm support layout of NCL and NLC, test=develop (#35668)

7cb19f5

* [NPU] support NCL and NLC for BatchNorm, test=develop * [NPU] remove debug files, test=develop * update, test=develop

add fs list_files_info (#36224)

ca16e8f

Support CUDA Graph on ParallelExecutor (#36250)

f9591bb

* support CUDA Graph on PE * add ut, fix CI compile * reduce memory consumption * fix CUDA 10 CI * improve coverage * improve python coverage

Fix for oneDNN conv op (#36284)

57e8cbe

* fix for conv op * Minor change

fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (#36229)

d8887af

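After `__getattr__` is overloaded, any argument that does not satisfy the conditions now raises AttributeError, matching the behavior of the non-overloaded version.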
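The reason the exception type matters for `hasattr` can be shown with a small standalone sketch. The classes below are hypothetical stand-ins, not the actual `PassDesc.OP` implementation: in Python 3, `hasattr` returns `False` only when attribute lookup raises `AttributeError`, and any other exception propagates to the caller.

```python
class StrictOps:
    """Hypothetical object whose __getattr__ raises AttributeError on a miss."""

    _known = {"relu", "conv2d"}

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        if name in self._known:
            return f"op:{name}"
        # hasattr() interprets AttributeError as "attribute is missing".
        raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")


class LooseOps:
    """Counter-example: a miss raises a different exception type."""

    def __getattr__(self, name):
        raise ValueError(f"unsupported op {name!r}")


ops = StrictOps()
has_relu = hasattr(ops, "relu")              # resolved by __getattr__
has_dunder_name = hasattr(ops, "__name__")   # miss is reported cleanly

try:
    hasattr(LooseOps(), "__name__")
    loose_propagated = False
except ValueError:
    # The non-AttributeError escapes hasattr() instead of yielding False.
    loose_propagated = True
```

With the strict variant, probing for `__name__` behaves as it would on an object that does not overload `__getattr__` at all.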

C++ support register pass via PassDesc (#36095)

2fd8dee

Support registering a GeneratePass from C++, which simplifies development of subgraph optimizations such as fusion.

support ClipGradByGlobalNorm in sharding (#36012)

623df42

* support ClipGradByGlobalNorm in sharding * support ClipGradByGlobalNorm in sharding * test=allcase

update fft api path (#36219)

c8a0101

* update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <[email protected]>

feng_shuai and others added 27 commits October 26, 2021 10:08

Pool3d 2.0 (#36545)

229bae8

Fix conv2d convert case (#36699)

db633af

* fix pool2d convert case * add pool2d convert test case for trt6

Support various length support for SelectedRows in GLOO::AllGather (#…

eca78a9

…36637) * In cpu parallel using gloo, add various length support for SelectedRows * fix bug * fix bugs * fix by code review * remove timeout

Optimize FasterTokenizer (#36701)

290ded7

* optimize fast tokenizer

enable flags_benchmark for dygraph (#36686)

21bece3

move fft and signal files, move signal APIs (#36540)

81e0c1b

* move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos

Fix the null ptr bug in build_cinn_pass. (#36698)

28bab07

* Fix the null ptr bug in build_cinn_pass. * Add test for empty&ctrl var.

fix wrong trt dim when input dim is 2 (#36614)

43dcf23

* fix wrong trt dim when input dim is 2 * update leaky_relu and instance_norm converter unit test * add instance_norm input dim check

[NPU] fix argsort op, test=develop (#36576)

3523bbe

* [NPU] fix argsort op, test=develop * remove debug files, test=develop * fix typo, test=develop * address review comments, test=develop

Move fused_attention and fused_feedforward functional api path to inc…

9aeca2f

…ubate (#36704) Move the Python API interfaces newly added in PRs #35905 and #35843 into the incubate directory.

Modify paddle.static.nn.cond doc (#36694)

eb9ef88

Update `cond` English document

[new-exec] Add cancel for thread pool (#36688)

fe6dbdd

* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter * Add Cancel For ThreadPool * Add UT for Cancel

[new-exec] cache exception in child thread (#36692)

87fbbd3

* cache exception in child thread * add ut * fix ut
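The commit message does not describe the mechanism, so the following is only a generic sketch of one common pattern, written in Python rather than the executor's C++: a worker thread stores any exception it hits, and the parent re-raises it after joining. The `Worker` class, `join_and_raise` method, and `failing_task` function are hypothetical names introduced for illustration.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical worker that records an exception instead of losing it."""

    def __init__(self, task):
        super().__init__()
        self._task = task
        self.error = None

    def run(self):
        try:
            self._task()
        except Exception as exc:
            # Keep the exception so the parent thread can inspect it later.
            self.error = exc

    def join_and_raise(self):
        """Wait for the thread, then surface any stored exception."""
        self.join()
        if self.error is not None:
            raise self.error


def failing_task():
    raise RuntimeError("op failed in child thread")


worker = Worker(failing_task)
worker.start()
try:
    worker.join_and_raise()
    outcome = "no error"
except RuntimeError as exc:
    outcome = f"re-raised: {exc}"
```

Without such a hand-off, an exception raised in a child thread ends that thread but never reaches the code that launched it.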

Add roi_align grad (#36724)

236ed94

roll_op: support Tensor as input for shifts (#36727)

7b1e30f

Remove additional warnning in layer.to (#36700)

63f1e6b

* remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to * remove additional warnning in layer.to

show paddle traceback after last user code traceback (#36741)

63f3ae0

Fix inverse in fake quant (#36762)

542ba21

enable trt test check and fix trt ut error（3/3） (#36581)

8c1c72a

add fp16 unittests for kl2 (#36583)

6838a18

fix fftshift/ifftshift on static mode (#36748)

34b6860

* fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift

add matmul_v2 to v1 CPU pass and fix matmul dim error (#36731)

d5245a3

* fix matmul dim error * fix wrong dim check in matmul

bugfix: only check backend when mode == Collecive (#36758)

e625315

* bugfix: only check backend when mode == Collecive * fix bug

Fused transformer encoder layer and fused feedforward layer (#36604)

9f3613f

This PR contains the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.

esythan merged commit b39c1e6 into esythan:develop Oct 27, 2021


85 participants
