Instance norm0828 by ceci3 · Pull Request #5 · ceci3/Paddle

ceci3 · 2019-08-28T11:07:40Z

No description provided.

…le#18672) * remove ctr reader, all functions are satisfied in dataset

* hash_op support int64 hash_size * add corresponding UT

* feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop

… and official document web sites (PaddlePaddle#18671) optimize the error reporting information of cuda related API index on develop: 130ac17 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

…dle#18683)

* add check of executor, test=develop

test=develop

Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
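The relative gains behind the PaddingRNN figures above work out as follows. This is plain arithmetic on the numbers reported in the commit message, not an additional benchmark.

```python
# Figures reported above (V100, large model, padding mode, one GPU).
mem_before, mem_after = 6837, 6414        # GPU memory in MiB, without / with this PR
speed_before, speed_after = 9.89, 10.28   # steps per second, without / with this PR

mem_saving = (mem_before - mem_after) / mem_before
speedup = (speed_after - speed_before) / speed_before

print(f"memory saved: {mem_saving:.1%}")  # memory saved: 6.2%
print(f"speedup: {speedup:.1%}")          # speedup: 3.9%
```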

* fix clip_by_norm doc, test=develop

add license

…dle#18715) * split different comm method for mnist distributed training

do some odd jobs, test=develop

test=develop

…pile time. (PaddlePaddle#18695) test=develop

…dle#18723) test=develop

The change includes three things: 1. Set CPU_NUM to 1 in the tests, because otherwise ParallelExecutor prints a warning that CPU_NUM is not set and falls back to the default of 1. 2. The old tests compared two RNNs, a hand-written simple RNN and the same RNN built with Paddle, but initialized the RNN weights separately with NumPy random and Paddle random. Fixed this by setting the weight and bias values explicitly. 3. Also set the NumPy random seed in the tests. The difference between the two RNNs is now smaller (rtol reduced from 0.1 and 0.2 to 0.01). test=develop
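A minimal pure-Python sketch of the testing idea in point 2: when two independently written RNN implementations receive the same explicitly set weights (instead of one drawing them from NumPy's RNG and the other from Paddle's initializer), their outputs agree, so a tight relative tolerance such as 0.01 becomes meaningful. The scalar recurrence and the function names are illustrative assumptions, not the code of the actual test.

```python
import math
import random


def rnn_loop(xs, w_x, w_h, b, h0=0.0):
    """Reference RNN with a scalar hidden state: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    outs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        outs.append(h)
    return outs


def rnn_recursive(xs, w_x, w_h, b, h0=0.0):
    """A second, independently written implementation of the same recurrence."""
    if not xs:
        return []
    h = math.tanh(w_x * xs[0] + w_h * h0 + b)
    return [h] + rnn_recursive(xs[1:], w_x, w_h, b, h)


random.seed(42)                       # fix the input data, like seeding NumPy in the test
xs = [random.uniform(-1.0, 1.0) for _ in range(10)]
w_x, w_h, b = 0.5, -0.3, 0.1          # shared, explicitly set weights for both models

a = rnn_loop(xs, w_x, w_h, b)
c = rnn_recursive(xs, w_x, w_h, b)
assert all(math.isclose(p, q, rel_tol=0.01) for p, q in zip(a, c))
```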

…#18722)

add parameter server launch so that a user can easily launch parameter server

* supports distributed classification training * update API.spec * fix evenly division in python3 * change "index_range" to "index_num" in shard_index operator test=document_preview test=develop

* rewrite the conv_op using cudnn_conv_helper * add workspace limit for v7 test=develop * fix test=develop * add half float test=develop * fix test=develop * fix test=develop * revise code style test=develop * fix test=develop

…gradients. (PaddlePaddle#18664) * support sparse gradients test=develop

(1) Support patch data (merge slots of instances with the same line id; modify the dense layer, which changes its size). (2) Add the fleet load_one_table interface, supporting loading from a Paddle model and from a pslib model. (3) Fix a push sparse bug that made push sparse take more time (about 10% in my test case). (4) When some slots are not used by one of your networks (join/update, etc.), data feed, label-info collection, and push/pull sparse skip these slots instead of throwing an error. (5) Add more debug info in TrainFilesWithProfiler.

test=develop

test=develop - compilation fix - Yet another compilation fix - Even yet another compilation fix - Surprise! Again compilation fix - lint fixes test=develop - Fix to workspace acquire of LRN test=develop - Fix to hash of BWD LRN test=develop - fix to lrn BWD PD acquire test=develop - Fixing LRN PD creation test=develop - cosmetic fix in comment test=develop - Fixes after review test=develop

* modify install GPU97 * modify install GPU97

* support tensor input with padding for warpctc op * merge with develop * test=develop * modified python API examples test=develop * nn.py is modified for code coverage test=develop * update documents info about warpctc op in API.spec test=develop * add test_warpctc_with_padding in test_layers test=develop * add warning log for cuda_version back to warpctc_op.cc * modify API.spec for warpctc op test=develop * modify API.spec * update warpctc test to new CompiledProgram API test=develop * modify code examples for warpctc op test=develop * modify API.spec for warpctc op test=develop * modify API.spec for warpctc op test=develop
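The warpctc change above lets the op take a padded Tensor in addition to a LoDTensor. Both layouts carry the same information: the padded form is a rectangular [batch, max_len] array plus per-sequence lengths, while the LoD form concatenates the valid steps and records cumulative offsets. The helpers below are hypothetical, framework-free illustrations of that conversion, not Paddle APIs.

```python
def lengths_to_lod(lengths):
    """Convert per-sequence lengths to LoD-style cumulative offsets."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def padded_to_packed(padded, lengths):
    """Drop the padding from a [batch, max_len] batch and concatenate the valid steps."""
    packed = []
    for row, n in zip(padded, lengths):
        packed.extend(row[:n])
    return packed


padded = [[1, 2, 0], [3, 4, 5]]             # 0 is the padding value
lengths = [2, 3]
print(lengths_to_lod(lengths))              # [0, 2, 5]
print(padded_to_packed(padded, lengths))    # [1, 2, 3, 4, 5]
```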

…e#19430) fix the fleet_desc dense_table unsorted bug; format for abacus hotstart is not supported yet.

* improve sequence_conv api doc test=develop * add warning for padding param test=develop modify into deprecated

PaddlePaddle#19324) * save the callstack information to file when exception throws test=develop

…ddlePaddle#19407) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop

* add signal message to stderr, test=develop * add unittests for ugly SignalHandle, test=develop

* enable test_parallel_dygraph_se_resnext test=develop

* update for multiple threading test=develop * remove PADDLE_ENFORCE test=develop

…le#18863) * fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop

…executor (PaddlePaddle#19443) test=develop

…addle#18968) * fix conv2d transpose bias by creating and initializing it in build_once * fix API spec * test=develop, invoke ci * fix bias_attr and act having no effect on layer norm, conv2dTranspose, BilinearTensorProduct, and sequence_conv; fix original_mode not being used in GRUUnit; fix sample_weight not being set in NCE; add UT for all those layers * test=develop, change success standard for conv2dTranspose * test=develop, fix test_layers to invoke some error branch * test=develop, fix sample code * test=develop, fix BilinearTensorProduct failing in dygraph mode * test=develop, fix test_layers segmentation fault

…aneously, test=develop (PaddlePaddle#19412) Support Tensor input for row_conv_op

… instance_norm0828

…1223) * Add the first implementation of fusion_group op PaddlePaddle#19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation of fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. PaddlePaddle#21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearrangement of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shapes of all inputs are the same. * Add debug information. * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE messages. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop
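The fusion_group commits above describe generating CUDA source for a sorted subgraph of elementwise ops, with loads and stores separated from the computation expressions, and compiling it at runtime with NVRTC. The sketch below illustrates only the code-generation step under stated assumptions: the op names, expression templates, and the `generate_kernel` helper are hypothetical, and the emitted kernel uses a standard grid-stride loop rather than whatever launch scheme the real pass uses.

```python
# Hypothetical expression templates; the real pass keeps its own op-to-expression tables.
TEMPLATES = {
    "elementwise_add": "{0} + {1}",
    "elementwise_mul": "{0} * {1}",
    "relu": "{0} > 0.0f ? {0} : 0.0f",
    "sigmoid": "1.0f / (1.0f + expf(-{0}))",
}


def generate_kernel(name, ops, inputs, outputs):
    """ops: topologically sorted list of (op_type, [input vars], output var)."""
    params = [f"const float* {v}" for v in inputs] + [f"float* {v}" for v in outputs]
    lines = [
        f"extern \"C\" __global__ void {name}(int n, " + ", ".join(params) + ") {",
        "  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;",
        "       i += gridDim.x * blockDim.x) {",
    ]
    # 1. Loads: read every subgraph input once.
    for v in inputs:
        lines.append(f"    float {v}_v = {v}[i];")
    # 2. Computation: one expression per op; intermediates stay in registers.
    for op_type, args, out in ops:
        expr = TEMPLATES[op_type].format(*[f"{a}_v" for a in args])
        lines.append(f"    float {out}_v = {expr};")
    # 3. Stores: write back only the subgraph outputs.
    for v in outputs:
        lines.append(f"    {v}[i] = {v}_v;")
    lines += ["  }", "}"]
    return "\n".join(lines)


src = generate_kernel(
    "fused_add_relu",
    [("elementwise_add", ["x", "y"], "tmp"), ("relu", ["tmp"], "out")],
    inputs=["x", "y"],
    outputs=["out"],
)
print(src)
```

The intermediate `tmp` never touches global memory, which is the main payoff of fusing elementwise ops into one kernel.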

* Add elementwise_sub_mkldnn_op without grad * Add test to static_mode_white_list * Refactor code, change license years * Remove invalid grad implementation * Fix element_wise_sub_op test * Fix CI Approval error * Remove unnecessary EltwiseSubMKLDNNGradKernel class * Fix CI Approval 2 * Fix CI Approval 3 * Fix CI Approval Attempt #4 * Fix CI Approval Attempt #5 * Fix CI Approval Attempt #6 * Fix CI Approval Attempt PaddlePaddle#7 * Change test names containing add to sub * Fix old tests testing add instead of sub * Copy grad implementation from elementwise_add_mkldnn * CI test fix attempt * Revert "CI test fix attempt" This reverts commit c647cacf41e6a87c715385a185de5cbf65fc8900. * Fix CI attempt 2 * Fix elementwise_sub tests, temporary mkldnn broadcast test disable * Add working implementation of elementwise_sub grad * Fix build errors caused by pull * Fix format error * Fix format error 2 * Disable elementwise_sub_mkldnn test on GPU * Apply fix for paddle.fluid import * Revert changes of test_elementwise_sub and Fix mkldnn test * Revert "Apply fix for paddle.fluid import" This reverts commit fc3b122. * fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (PaddlePaddle#35862) * Add changes suggested by reviewers * Change @unittest.skipIf... to @OpTestTool.skip_if_not_cpu_bf16() to satisfy Approval CI * Remove check_dygraph=False to satisfy CI Approval Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
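The elementwise_sub grad above started as a copy of the elementwise_add one. Mathematically, the backward passes differ only in the sign of the second input's gradient: for z = x - y, dx = dz and dy = -dz, summed over any axis along which y was broadcast. A framework-independent sketch (not the MKL-DNN kernel), assuming y is a row vector broadcast over the rows of x:

```python
def sub_forward(x, y):
    """z = x - y, where x is [rows][cols] and y (length cols) is broadcast over rows."""
    return [[xv - yv for xv, yv in zip(row, y)] for row in x]


def sub_backward(dout):
    """Gradients of z = x - y: dx = dout, dy = -(dout summed over the broadcast axis)."""
    dx = [row[:] for row in dout]
    cols = len(dout[0])
    dy = [-sum(row[j] for row in dout) for j in range(cols)]
    return dx, dy


dout = [[1.0, 2.0], [3.0, 4.0]]
dx, dy = sub_backward(dout)
print(dx)  # [[1.0, 2.0], [3.0, 4.0]]
print(dy)  # [-4.0, -6.0]
```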

* update fft api path (PaddlePaddle#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> * fix fft axis (PaddlePaddle#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (PaddlePaddle#36114) * fft: modify sample code result (PaddlePaddle#36325) * dynamic load mkl as a fft backend when it is available and requested (PaddlePaddle#36414) * add rocm support for fft api (PaddlePaddle#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cufft and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (PaddlePaddle#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
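The FFT commits above add an LRUCache for cuFFT/hipFFT plans (disabled when CUFFT_VERSION >= 10200). The idea can be sketched in Python with `collections.OrderedDict`: plans are keyed by their configuration, a hit moves the entry to the most-recently-used end, and exceeding the capacity evicts the oldest entry. `PlanCache` and `make_plan` are hypothetical names; the real cache lives in C++ and works with the CuFFTHandle/FFTConfig types mentioned in the commits.

```python
from collections import OrderedDict


class PlanCache:
    """A minimal LRU cache keyed by an FFT configuration (e.g. shape, dtype, axes)."""

    def __init__(self, capacity, create_plan):
        self.capacity = capacity
        self.create_plan = create_plan   # stand-in for an expensive plan-creation call
        self._plans = OrderedDict()

    def get(self, key):
        if key in self._plans:
            self._plans.move_to_end(key)      # mark as most recently used
            return self._plans[key]
        plan = self.create_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)   # evict the least recently used plan
        return plan


created = []


def make_plan(key):
    created.append(key)
    return f"plan-{key}"


cache = PlanCache(2, make_plan)
cache.get((8, "float32"))
cache.get((16, "float32"))
cache.get((8, "float32"))     # hit: no new plan is created
cache.get((32, "float32"))    # miss: evicts (16, "float32")
cache.get((16, "float32"))    # miss again: rebuilt, evicts (8, "float32")
```

Bounding the capacity keeps the number of live plans (and any device workspace they hold) bounded, at the cost of rebuilding plans for configurations that fall out of the cache.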

…t=allcases (PaddlePaddle#38632) * Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes to pten_layout() interface * Removed friend classes * Rearranged function calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues

…lePaddle#39085) * updates callers, test=develop * updates tensor, test=develop * fixes errors, test=develop * remove some dtypes, test=develop * fix errors in the base storage modification, test=develop * fixes a bug, test=develop * fixes the bugs in push the whole, test=develop * updates, test=develop * update * update, test=develop * fixes the mac-py3 CI, test=develop * remove the storage impl, test=develop * updates some codes, test=develop * update, test=develop * updates pten allocation, test=develop

…ddle#39236) * Added selected_rows and rw_lock to pten * Renamed the unit test target to fix CI * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid * Remove rw_lock.h,rw_lock_test.cc in fluid * Use pten::RWLock and pten::AutoRDLock, fix CI * Use pten::SelectedRows * Use pten::SelectedRows * Fix to pass NPU CI * Selected_Rows inherits from TensorBase * Use pten::SelectedRows, to pass NPU CI * To fix NPU CI * To fix NPU CI again * Use paddle/pten/core/enforce and polish code * Use pten::DataType instead of using proto_type * Move part of data_type to pten * Polish Code
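SelectedRows, moved into pten above, stores only a subset of the rows of a logically dense [height, width] tensor, which is how sparse gradients (for example, of lookup tables) are commonly represented. The class below is a hypothetical, framework-free illustration of that layout (row indices, a value block, and the logical height), not the pten::SelectedRows API; accumulating duplicate row ids when densifying is an assumption about merge semantics.

```python
class SelectedRowsSketch:
    """`rows[i]` is the index in the dense [height, width] tensor that `value[i]` belongs to."""

    def __init__(self, rows, value, height):
        self.rows = rows
        self.value = value
        self.height = height

    def to_dense(self):
        width = len(self.value[0]) if self.value else 0
        dense = [[0.0] * width for _ in range(self.height)]
        for r, row_vals in zip(self.rows, self.value):
            for j, v in enumerate(row_vals):
                dense[r][j] += v   # duplicate row ids are accumulated
        return dense


sr = SelectedRowsSketch(rows=[0, 2, 2], value=[[1.0, 1.0], [2.0, 0.0], [0.5, 0.5]], height=4)
print(sr.to_dense())  # [[1.0, 1.0], [0.0, 0.0], [2.5, 0.5], [0.0, 0.0]]
```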

…sed to paddle.grad() (PaddlePaddle#41198) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues

…efore backward run (PaddlePaddle#41306) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR PaddlePaddle#7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues

…atmul (PaddlePaddle#41387) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR PaddlePaddle#7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues * [DoubleGrad PR PaddlePaddle#8] Enabled triple grads for sigmoid and matmul * Fixed issues with phi kernel * Added triple grad test case * Fixed minor issue
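For reference, the closed forms that a sigmoid triple-grad kernel has to reproduce follow from standard calculus, writing $\sigma = \sigma(x)$ (they are not copied from the PR's kernels):

```latex
\begin{aligned}
\sigma(x)      &= \frac{1}{1 + e^{-x}}, \\
\sigma'(x)     &= \sigma (1 - \sigma), \\
\sigma''(x)    &= \sigma (1 - \sigma)(1 - 2\sigma), \\
\sigma'''(x)   &= \sigma (1 - \sigma)\left(1 - 6\sigma + 6\sigma^{2}\right).
\end{aligned}
```

Each line follows from the previous one by the product rule and $\sigma' = \sigma(1 - \sigma)$; for example, $\sigma''' = \sigma'(1 - 2\sigma)^2 - 2(\sigma')^2 = \sigma'\left[(1 - 2\sigma)^2 - 2\sigma(1 - \sigma)\right]$.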

guru4elephant and others added 30 commits July 18, 2019 10:50

remove ctr reader, all functions are satisfied in dataset (PaddlePadd…

5ed713d

…le#18672) * remove ctr reader, all functions are satisfied in dataset

update readme to 1.5.1 (PaddlePaddle#18670)

a5d4c2f

hash_op support int64 hash_size (PaddlePaddle#18674)

bb2f5d2

* hash_op support int64 hash_size * add corresponding UT

Change to use brpc rdma branch instead of personal branch. (PaddlePad…

ec1000c

…dle#18683)

add check of executor (PaddlePaddle#17986)

0b9acb4

* add check of executor, test=develop

Add LeakyRelu MKLDNN support (PaddlePaddle#18656)

d6b6a33

test=develop

MKL-DNN upgrade to 0.20 (PaddlePaddle#18370)

0d8e6c9

test=develop

fix clip_by_norm doc (PaddlePaddle#18688)

185b3ac

* fix clip_by_norm doc, test=develop

test=develop (PaddlePaddle#18701)

ccf06a4

add license

split different comm method for mnist distributed training (PaddlePad…

ebf9797

…dle#18715) * split different comm method for mnist distributed training

add license, test=develop (PaddlePaddle#18709)

7e3963f

do some odd jobs (PaddlePaddle#18641)

d845848

do some odd jobs, test=develop

remove unused gzstream.cmake (PaddlePaddle#18705)

c457a69

test=develop

Make infer shape of pad2d support for input with negative dims in com…

189b08d

…pile time. (PaddlePaddle#18695) test=develop

Change api approval people name (PaddlePaddle#18699)

5846918

Revert "Add LeakyRelu MKLDNN support (PaddlePaddle#18656)" (PaddlePad…

bd22453

…dle#18723) test=develop

add more traceback to py_reader error msg, test=develop (PaddlePaddle…

d07ad4c

…#18722)

add parameter server launch (PaddlePaddle#18687)

70b0376

add parameter server launch so that a user can easily launch parameter server

Fix CPU implementation of roi_align_op backward (PaddlePaddle#18728)

3429e65

supports distributed classification (PaddlePaddle#18690)

157211c

* supports distributed classification training * update API.spec * fix evenly division in python3 * change "index_range" to "index_num" in shard_index operator test=document_preview test=develop
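A pure-Python sketch of the shard_index semantics as documented for Paddle, assuming shard_size = ceil(index_num / nshards): ids owned by `shard_id` are mapped to shard-local ids, all others to `ignore_value`. Note the integer division `//`; in Python 3, `/` would produce a float.

```python
def shard_index(ids, index_num, nshards, shard_id, ignore_value=-1):
    """Map global ids to shard-local ids; ids owned by other shards become ignore_value."""
    shard_size = (index_num + nshards - 1) // nshards   # ceiling division, stays an int
    out = []
    for x in ids:
        if x // shard_size == shard_id:
            out.append(x - shard_id * shard_size)
        else:
            out.append(ignore_value)
    return out


print(shard_index([1, 6, 12, 19], index_num=20, nshards=2, shard_id=0))  # [1, 6, -1, -1]
print(shard_index([1, 6, 12, 19], index_num=20, nshards=2, shard_id=1))  # [-1, -1, 2, 9]
```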

Cudnn convolution reconstruction (PaddlePaddle#18284)

6b78e00

* rewrite the conv_op using cudnn_conv_helper * add workspace limit for v7 test=develop * fix test=develop * add half float test=develop * fix test=develop * fix test=develop * revise code style test=develop * fix test=develop

Make fuse_optimizer_op_pass also work when the model contains sparse …

fd3aad6

…gradients. (PaddlePaddle#18664) * support sparse gradients test=develop

remove unused cmake file (PaddlePaddle#18744)

0ae45f0

test=develop

modify install GPU 97 (PaddlePaddle#18768)

f0cfc3c

* modify install GPU97 * modify install GPU97

vincentXiyu and others added 18 commits August 27, 2019 12:54

Add conv dequant squash for int8 (PaddlePaddle#18905)

2e3ec66

fix fleet_desc bug && support format for abacus hotstart (PaddlePaddl…

0d79498

…e#19430) fix the fleet_desc dense_table unsorted bug; format for abacus hotstart is not supported yet.

improve sequence_conv api doc (PaddlePaddle#19316)

a9cd513

* improve sequence_conv api doc test=develop * add warning for padding param test=develop modify into deprecated

test=develop (PaddlePaddle#19463)

3f392fd

save the callstack information to file when exception throws test=dev… (

b8aa37d

PaddlePaddle#19324) * save the callstack information to file when exception throws test=develop

Delete useless ex-scope in recurrent op (PaddlePaddle#19426)

56dd765

supports multiple NCCL communicators preserved in NCCLCommContext (Pa…

efb05ba

…ddlePaddle#19407) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop

Add signal message to stderr (PaddlePaddle#19421)

caf59d0

* add signal message to stderr, test=develop * add unittests for ugly SignalHandle, test=develop

Open test_parallel_dygraph_se_resnext (PaddlePaddle#19342)

e26411c

* enable test_parallel_dygraph_se_resnext test=develop

Update ngraph engine for multiple threading (PaddlePaddle#19155)

6421c61

* update for multiple threading test=develop * remove PADDLE_ENFORCE test=develop

make PADDLE_ENFORCE ci check rule more robust (PaddlePaddle#19445)

61389ae

adapt fleet api for localsgd and support nccl comm configuration in …

4ef6b84

…executor (PaddlePaddle#19443) test=develop

fix row_conv_op to force it support lodtensor and tensor input simult…

1d0f043

…aneously, test=develop (PaddlePaddle#19412) Support Tensor input for row_conv_op

add instance norm

02cdb88
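The commit above is the one this PR adds ("add instance norm"). For context, instance normalization normalizes each channel of each sample over its spatial positions, y = gamma_c * (x - mean_{n,c}) / sqrt(var_{n,c} + eps) + beta_c. The following is a reference sketch in pure Python, not Paddle's kernel; the [N][C][H*W] nested-list layout and the eps = 1e-5 default are assumptions for illustration.

```python
import math


def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalization for x shaped [N][C][L], where L = H * W (spatial dims flattened).

    Each (n, c) slice is normalized with its own mean and (biased) variance, then scaled
    and shifted by the per-channel gamma[c] and beta[c].
    """
    out = []
    for sample in x:
        out_sample = []
        for c, channel in enumerate(sample):
            mean = sum(channel) / len(channel)
            var = sum((v - mean) ** 2 for v in channel) / len(channel)
            inv_std = 1.0 / math.sqrt(var + eps)
            out_sample.append([gamma[c] * (v - mean) * inv_std + beta[c] for v in channel])
        out.append(out_sample)
    return out


x = [[[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]]]   # N=1, C=2, H*W=4
y = instance_norm(x, gamma=[1.0, 1.0], beta=[0.0, 0.5])
```

Unlike batch normalization, the statistics never mix samples, so a constant channel (the second one here) collapses to its beta value regardless of the rest of the batch.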

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

901c49e

… instance_norm0828

ceci3 closed this Aug 28, 2019

ceci3 deleted the instance_norm0828 branch August 28, 2019 11:07

ceci3 wants to merge 2563 commits into develop from instance_norm0828

ceci3 commented Aug 28, 2019


20 participants
