merge to local#22

Merged
junjun315 merged 64 commits into junjun315:develop from PaddlePaddle:develop
May 7, 2019
Conversation

@junjun315
Owner

No description provided.

wzzju and others added 30 commits April 22, 2019 19:42
fix runtime_context_cache bug when a GPU model has an op that runs only on CPU
…parameter passing (#17058)

* resolve #17057

Fixed the bug that fuse_relu/fuse_residual option couldn't be passed to class TestConv2dInt8Op.

test=develop

* Fix the bug of test_conv2d_int8_mkldnn case which raised by improper parameter passing.

test=develop
add checkpoint functions for graph. test=develop
* implement distributed transpiler with fleet
implement dygraph.parallel.DataParallel to hook reduce op.
* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop
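The mixed precision commits above introduce an fp16 training interface with fp16-capable initializers. A minimal sketch of the usual pattern behind such an interface (fp32 master weights plus loss scaling, shown here in plain NumPy; the function and variable names are ours, not Paddle's API):

```python
import numpy as np

# Hypothetical sketch of the fp16 master-weight pattern commonly used in
# mixed precision training (an illustration, not Paddle's implementation).
def sgd_step_fp16(master_w, grad_fp16, lr=0.1, loss_scale=128.0):
    # Gradients arrive in fp16 from a loss multiplied by loss_scale;
    # unscale them in fp32 to recover the true gradient.
    grad_fp32 = grad_fp16.astype(np.float32) / loss_scale
    # Update the fp32 master copy, then cast to fp16 for the next forward.
    master_w -= lr * grad_fp32
    return master_w, master_w.astype(np.float16)

master = np.array([1.0, 2.0], dtype=np.float32)          # fp32 master weights
g = (np.array([0.5, -0.5]) * 128.0).astype(np.float16)   # scaled fp16 grads
master, w16 = sgd_step_fp16(master, g)
```

Keeping the master copy in fp32 avoids update underflow when `lr * grad` is smaller than fp16 resolution.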
…near interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop
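The speedup described above comes from caching the linear interpolation information computed in forward and reusing it in backward. A small sketch of that idea for bilinear interpolation (names and structure are illustrative, not the op's real internals):

```python
import numpy as np

# Forward computes the four bilinear corner weights once and caches them;
# backward reuses the cache instead of recomputing the weights.
def bilinear_forward(img, y, x):
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    cache = {(y0, x0):         (1 - wy) * (1 - wx),
             (y0, x0 + 1):     (1 - wy) * wx,
             (y0 + 1, x0):     wy * (1 - wx),
             (y0 + 1, x0 + 1): wy * wx}
    out = sum(w * img[i, j] for (i, j), w in cache.items())
    return out, cache

def bilinear_backward(grad_out, cache, img_shape):
    grad_img = np.zeros(img_shape)
    for (i, j), w in cache.items():   # reuse cached weights, no recompute
        grad_img[i, j] += w * grad_out
    return grad_img

img = np.arange(9.0).reshape(3, 3)
out, cache = bilinear_forward(img, 0.5, 0.5)
grad_img = bilinear_backward(1.0, cache, img.shape)
```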
backward of backward: leaky_relu
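Because leaky_relu is piecewise linear, its gradient is a constant mask, and the backward-of-backward simply reapplies that mask to the incoming second-order gradient. A hedged NumPy sketch (our names, not the op's):

```python
import numpy as np

def leaky_relu(x, alpha=0.02):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, dout, alpha=0.02):
    # dy/dx is 1 on the positive side, alpha on the non-positive side.
    mask = np.where(x > 0, 1.0, alpha)
    return dout * mask

def leaky_relu_double_grad(x, ddx, alpha=0.02):
    # dx is linear in dout with the same mask, so ddout = ddx * mask.
    mask = np.where(x > 0, 1.0, alpha)
    return ddx * mask

x = np.array([-1.0, 2.0])
ddout = leaky_relu_double_grad(x, np.array([1.0, 1.0]))
```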
* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop
* fix python/paddle/fluid/__init__.py detecting problems
1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.

test=develop
* refine_dropout_mem,test=develop

* # This is a combination of 14 commits.
# The first commit's message is:
remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (#17066)

# This is the 2nd commit message:

Fleet unify distributed training (#16791)

* implement distributed transpiler with fleet
# This is the 3rd commit message:

ParallelDyGraph with GPU collective mode (#16827)

implement dygraph.parallel.DataParallel to hook reduce op.

# This is the 4th commit message:

Init mixed precision training interface (#16856)

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

# This is the 5th commit message:

fix reference_count_pass,test=develop (#17060)

test=develop
# This is the 6th commit message:

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

# This is the 7th commit message:

remove unnecessary prepare_data (#17080)

test=develop
# This is the 8th commit message:

fix interpolate cu. test=develop (#17101)

# This is the 9th commit message:

test=develop, double backward leaky_relu (#17067)

backward of backward: leaky_relu
# This is the 10th commit message:

fix fuse optimizer ops (#17102)

test=develop
# This is the 11th commit message:

truncated_gaussian_random supported in distributed training, test=develop (#17091)

# This is the 12th commit message:

 Detailed coordinate description for yolov3 loss (#17007)

* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop

# This is the 13th commit message:

fix test_weight_decay (#17109)

test=develop
# This is the 14th commit message:

Path flag (#17105)

* fix python/paddle/fluid/__init__.py detecting problems
* 1. move the API check into CPU process
2. adjust the check order
cvm without LoD.
fix RuntimeError ("dictionary changed size during iteration") when calling uniform_random in Python 3+
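The failure mode behind this fix is a Python 3 behavior: mutating a dict while iterating over it raises `RuntimeError: dictionary changed size during iteration`. A minimal reproduction and the standard fix of iterating over a snapshot (the helper names are illustrative):

```python
def add_suffix_buggy(d):
    for k in d:           # raises RuntimeError in Python 3 when d grows
        d[k + "_new"] = d[k]

def add_suffix_fixed(d):
    for k in list(d):     # iterate over a frozen snapshot of the keys
        d[k + "_new"] = d[k]

d = {"a": 1}
try:
    add_suffix_buggy(dict(d))
    raised = False
except RuntimeError:
    raised = True         # expected in Python 3
add_suffix_fixed(d)
```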
* polish the label_smooth

test=develop

* polish code

test=develop
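For context on the polished label_smooth op: label smoothing with a uniform prior is conventionally defined as `y_smooth = (1 - eps) * y + eps / K` for `K` classes. A minimal sketch of the technique itself (an illustration, not Paddle's label_smooth source):

```python
import numpy as np

def label_smooth(one_hot, eps=0.1):
    # Mix the hard one-hot target with a uniform distribution over K classes.
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

y = np.array([0.0, 0.0, 1.0, 0.0])
ys = label_smooth(y, eps=0.1)
```

The result remains a valid distribution (rows still sum to 1), but the model is no longer pushed toward infinite logit gaps.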
Update the folder name generation mechanism for saving the quantized model and weights.
The folder name would be unique by adding the timestamp postfix.

test=develop
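The folder-name mechanism described above can be sketched in a few lines: append a timestamp postfix so each save directory is unique (the prefix name here is illustrative):

```python
import time

def unique_save_dir(prefix="quantized_model"):
    # e.g. quantized_model_20190507_194200
    return prefix + "_" + time.strftime("%Y%m%d_%H%M%S")

d1 = unique_save_dir()
```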
fix python3 run_time_error in layers.ops caused by locals()
guru4elephant and others added 29 commits May 1, 2019 21:47
* remove async executor python api
test=develop

* remove test_async_executor.py
add executor train_from_dataset demo
test=develop

* fix import bug
test=develop
* remove unnecessary set_devices
* test=develop

* test=develop
* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop
* fix tensor_py,test=develop

* change class name,test=develop
test_distillation_strategy always failed on a machine with only 4 GPUs; disable it temporarily, figure out the root cause, and add it back later
* fix profiler and name_scope API examples

test=develop

* update API.spec

test=develop
* fix distribute fpn proposals, test=develop
* fix unexecutable API comments, test=develop

* add API.spec,test=develop

* call SetNumThreads every time to avoid a missing omp thread setting

resolve #17153
test=develop

* add paddle_num_threads into config for test_analyzer_pyramid_dnn

resolve #17153
test=develop
* refine api comment, test=develop
* cherry-pick commit from 8877054

* cherry-pick commit from 3f0b97d

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab18)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa42)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3e)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea8)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a0)

* Modify symbol export rules

test=develop
* optimize sum op

fuse multi eigen kernel calls into one cuda kernel.
refine code

test=develop.

Signed-off-by: zhaoyuchen <[email protected]>

* Refine code.

test=develop

Signed-off-by: zhaoyuchen <[email protected]>

* Refine code according to comments.

test=develop

* refine code

delete sum_op_gpu.h
test=develop

* Fix test error.

test=develop

Signed-off-by: zhaoyuchen <[email protected]>

* refine code in format.

test=develop.

* refine code

test=develop

Signed-off-by: zhaoyuchen <[email protected]>

* refine code

test=develop

Signed-off-by: zhaoyuchen <[email protected]>
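The sum-op optimization above fuses multiple Eigen kernel calls into one CUDA kernel. The idea, sketched in NumPy with illustrative names: instead of one elementwise-add launch per input (N-1 launches, each re-reading the accumulator), a single fused pass reads all N inputs once:

```python
import numpy as np

def sum_pairwise(tensors):
    # one "kernel launch" per additional input
    out = tensors[0].copy()
    for t in tensors[1:]:
        out += t
    return out

def sum_fused(tensors):
    # a single pass over a stacked view: one "kernel" for all inputs
    return np.sum(np.stack(tensors, axis=0), axis=0)

xs = [np.full(4, float(i)) for i in range(5)]
fused = sum_fused(xs)
```

Both produce the same result; the fused form saves kernel-launch overhead and redundant memory traffic on GPU.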
* Add MovingAverageAbsMaxScale operator which is only used for calculating the quantization scale.

* test=develop

* change the output into inplace. test=develop

* Revert "test=develop"

This reverts commit 696cf62.

* Revert "change the output into inplace. test=develop"

This reverts commit a19acd2.

* test=develop.

* update the MovingAverageAbsMaxScaleOp test. test=develop
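The quantity this operator tracks is a moving average of the per-batch absolute maximum, used as the quantization scale. A hedged sketch of one common state/accumulator formulation (the decay value and names are illustrative, not the op's exact definition):

```python
import numpy as np

def update_scale(state, accum, x, decay=0.9):
    # Exponentially decay both the running abs-max sum and its weight,
    # so scale = state / accum is a bias-corrected moving average.
    abs_max = np.abs(x).max()
    state = decay * state + abs_max
    accum = decay * accum + 1.0
    return state, accum, state / accum

state, accum = 0.0, 0.0
for batch in [np.array([1.0, -2.0]), np.array([4.0, -3.0])]:
    state, accum, scale = update_scale(state, accum, batch)
```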
* add attr axis infershape. test=develop

* add CUDA kernel. test=develop

* fix unittest. test=develop

* fix unittest for soft_label. test=develop

* fix fp16 unittest. test=develop

* remove comment code. test=develop

* refine test for axis. test=develop

* add python api. test=develop

* fix doc. test=develop

* fix fp16 unittest. test=develop

* fix ngraph test. test=develop

* fix ENFORCE for test_imperative_transformer. test=develop

* fit for ngraph test. test=develop

* fix after rebase develop. test=develop

* fix doc. test=develop

* fix API.spec. test=develop

* fix test_layers. test=develop

* fix format. test=develop
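These commits add an `axis` attribute to softmax-with-cross-entropy. The generalized computation, sketched in NumPy for hard labels (our function, not the op's kernel): take a numerically stable log-softmax along `axis`, then gather the negative log-probability of each label along that axis.

```python
import numpy as np

def softmax_cross_entropy(logits, labels, axis=-1):
    # log-softmax with the max subtracted for numerical stability
    shifted = logits - logits.max(axis=axis, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    # pick out -log p(label) along the chosen axis
    idx = np.expand_dims(labels, axis)
    return -np.take_along_axis(log_probs, idx, axis=axis).squeeze(axis)

logits = np.array([[2.0, 1.0, 0.0]])
loss = softmax_cross_entropy(logits, np.array([0]), axis=-1)
```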
* remove unused FLAGS_warpctc_dir

test=develop

* remove FLAGS_warpctc_dir

test=develop
…tivations (#17235)

* fix api doc of hash, relu, concat, argmin, argmax, argsort and all activation funcs with no attrs

test=develop

* refine doc example code

test=develop

* remove >>> in doc example

test=develop

* refine python code block

test=develop

* update API spec

test=develop
…p inplace (#17225)

* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop

* fix potential inplace bug
test=develop

* add more skip vars in mem opt pass,test=develop

* follow comment,test=develop

* follow comments,move duplicate out arg check to program->graph,test=develop
@junjun315 junjun315 merged commit 0c73630 into junjun315:develop May 7, 2019