esythan commented Oct 27, 2021

PR types

PR changes

Describe

Ray2020BD and others added 30 commits September 29, 2021 14:44
* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group_norm op
* [npu] add box coder

* [npu] add box coder
* add align for WorkQueue

* add spinlock

* merge spinlock
* init functional jacobian api

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* init hessian API

* save status

* polish API docstring

* modify docstring

* add utils.py

* save status

* fix dygraph double grad dtype error in higher-order differentiation scenario

* reinvoke ci

* test_hessian.py is ok

* polish hessian API

* init vhp

* Revert "init vhp"

This reverts commit cbd4d3b.

* add test for partial_engine.cc

* modify numerical_delta with dtype float32

* merge fix for dtype float64

* spell fix

* polish code

* rm _stop_gradient_pre_process

Co-authored-by: JiabinYang
* fix cusparse compile problem, test=develop

* Modify file permissions
* update func name

* skip cpu

* update unittest

* update unittest
* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error
* fix raw optim

* pre-commit test file

Co-authored-by: sneaxiy
…#36214)

* [NPU] modify transpose2 and index_select_grad kernels for model xlnet

* add transpose2 int64_t unit test

* add more transpose2 unit tests

* update test_transpose_op_npu.py
* tmp

* added concat BF16/FP32 BWD oneDNN kernel

* minor change

* minor change

* fix for CI

* added formatting

* Reverted deleting static keyword

* added reviewers suggestions

* reverted deleting concat bf16 test file

* fixed concat tests
* Remove unused header.

* Use ConvMKLDNNHandlerT for conv2d INT8.

* Use absolute module path to import.
* fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer

* update

* update
* Added oneDNN BF16 relu

* fixed typo

* refactored test, review fixes
Add python interface of subgraph: 1. all_sub_graphs() 2. get_sub_graph(idx)
* [NPU] support NCL and NLC for BatchNorm, test=develop

* [NPU] remove debug files, test=develop

* update, test=develop
* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage
* fix for conv op

* Minor change
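For arguments that do not meet the conditions after `__getattr__` is overloaded, always raise an AttributeError so that the behavior matches the non-overloaded version.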
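The `__getattr__` behavior described above can be illustrated with a minimal sketch. The `Wrapper` class, its `_forwarded` tuple, and the `SimpleNamespace` stand-in are hypothetical names chosen for illustration and are not taken from the PR; the only point is that every unsupported name ends in `AttributeError`, which is what `hasattr` and three-argument `getattr` rely on.

```python
from types import SimpleNamespace


class Wrapper:
    """Hypothetical wrapper that forwards a few attributes to an inner object."""

    _forwarded = ("shape", "dtype")  # assumed whitelist, for illustration only

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        if name in self._forwarded:
            return getattr(self._inner, name)
        # Any other name raises AttributeError, as a class without
        # an overloaded __getattr__ would.
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


w = Wrapper(SimpleNamespace(shape=(2, 3), dtype="float32"))
print(w.shape)                       # forwarded to the inner object
print(hasattr(w, "missing"))         # False, because AttributeError was raised
print(getattr(w, "missing", None))   # default is returned instead of crashing
```

Raising any other exception type here (for example `KeyError` or `ValueError`) would propagate out of `hasattr` and `getattr(obj, name, default)` instead of being treated as "attribute not found".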
Support registering GeneratePass from C++, simplifying development for subgraph optimization scenarios such as fusion.
* support ClipGradByGlobalNorm in sharding

* support ClipGradByGlobalNorm in sharding

* test=allcase
* update fft api path
* add sample code for ihfft2

Co-authored-by: chenfeiyu
feng_shuai and others added 27 commits October 26, 2021 10:08
* fix pool2d convert case

* add pool2d convert test case for trt6
…36637)

* In cpu parallel using gloo, add variable-length support for SelectedRows

* fix bug

* fix bugs

* fix by code review

* remove timeout
* optimize fast tokenizer
…matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652)

* new_Matmul2ToMatmulToMul

* new_Matmul2ToMatmulToMul

* fix paddle_pass_builder

* fix paddle_pass_builder

* fix paddle_pass_builder

* tem

* tem

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* add matmul_broadcast_unitest

* fix op_teller
* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos
* Fix the null ptr bug in build_cinn_pass.

* Add test for empty&ctrl var.
* fix wrong trt dim when input dim is 2

* update leaky_relu and instance_norm converter unit test

* add instance_norm input dim check
* [NPU] fix argsort op, test=develop

* remove debug files, test=develop

* fix typo, test=develop

* address review comments, test=develop
…ubate (#36704)

Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.
Update `cond` English document
* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* update

* update

* update Error MSG

* update EventsWaiter

* Add Cancel For ThreadPool

* Add UT for Cancel
* cache exception in child thread

* add ut

* fix ut
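Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op scheduling overhead, this PR implements the attention module by hand at the C++ level and exposes it as a single large attention op.
To reduce memory access overhead, this PR applies two optimizations:
(1) when computing q, k, and v, the input X is shared so that the gemm, transpose, and bias add are called once instead of three times;
(2) kernel fusion is used so that data is passed through registers between different CUDA kernels.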
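Optimization (1) above can be sketched in plain Python. This is only an illustration of the idea, not the PR's C++/CUDA implementation: the helper names (`matmul`, `add_bias`, `qkv_separate`, `qkv_fused`) and the toy weights are assumptions, the transpose step is omitted, and all three projections are assumed to have the same output width. Because q, k, and v share the input X, their weight matrices can be concatenated column-wise so that one gemm and one bias add replace three of each.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def qkv_separate(x, weights, biases):
    """Baseline: one gemm and one bias add per projection (three calls each)."""
    return [add_bias(matmul(x, w), b) for w, b in zip(weights, biases)]


def qkv_fused(x, weights, biases):
    """One gemm and one bias add over column-concatenated weights, then split."""
    fused_w = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    fused_b = sum(biases, [])
    out = add_bias(matmul(x, fused_w), fused_b)
    width = len(biases[0])  # assumes q, k, v share the same output width
    return [[row[k * width:(k + 1) * width] for row in out]
            for k in range(len(weights))]


# Toy integer data so the comparison is exact.
x = [[1, 2], [3, 4]]
weights = [[[1, 0], [0, 1]], [[2, 1], [1, 2]], [[0, 1], [1, 0]]]  # W_q, W_k, W_v
biases = [[1, 1], [0, 0], [-1, 2]]                                # b_q, b_k, b_v

q, k, v = qkv_fused(x, weights, biases)
print(q, k, v)
print(qkv_fused(x, weights, biases) == qkv_separate(x, weights, biases))
```

The fused variant reads X once and launches a single matrix multiply, which is where the reduction in call count and memory traffic comes from; the split at the end is only slicing.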
* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to
* add eigvalsh with is_test

* add eigvalsh op

* fix backward bug

* forward and backward, float and complex, unittest

* remove eigvalsh_helper.h

* remove changes of cusolver.h

* fix unittest

* fix unittest bug

* update code following eigh

* fix test

* update lapack

* pull develop

* update functor

* fix unittest bug

* fix details

* add tensor_method_func

* fix notes
* fix fftshift/ifftshift on static mode
* update roll_op version
* add more test cases for fftshift/ifftshift
* fix matmul dim error

* fix wrong dim check in matmul
* bugfix: only check backend when mode == Collective

* fix bug
This PR adds the layer-level code for fused_transformer, including the layer code for FusedFeedForward and for FusedTransformerEncoderLayer.
esythan merged commit b39c1e6 into esythan:develop on Oct 27, 2021
esythan pushed a commit that referenced this pull request Feb 11, 2022
…addlePaddle#38914)

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues
esythan pushed a commit that referenced this pull request Apr 11, 2022
…ePaddle#41387)

* [Refactor] refactored eager_gen.py PR #2

* [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes

* Fixed minor issue

* Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition

* Fixed issues

* Supported higher-order grad node generation

* [DoubleGrad PR #4] Supported higher-order GradNode generation

* [DoubleGrad #4] Bug Fixes to Double Grad Node Generation

* Fixed yaml typo

* Fixed yaml typo

* fixed minor issues

* [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad()

* Fixed minor issue

* Fixed CI-Inference issue

* Fixed CI-inference issues

* [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run

* Fixed minor issues

* Fixed issue with backward graph construction logic

* Fixed implementation issues with backward graph reconstruction

* Fixed unittest issue

* Fixed issues

* [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul

* Fixed issues with phi kernel

* Added triple grad test case

* Fixed minor issue