Speedup roi_perspective_transform op by caching the information of linear interpolation in forward by wanghaoshuang · Pull Request #17090 · PaddlePaddle/Paddle

wanghaoshuang · 2019-04-25T07:20:45Z

test=develop

…n backward. test=develop
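The PR's title describes the core idea. The forward pass already computes the bilinear-interpolation neighbours and weights for every output element, so it can store them and let the backward pass reuse them instead of recomputing them.

Below is a minimal pure-Python sketch of that caching on a single input plane. It assumes four neighbours per output element, stored as flat row-major indices, with -1 marking a neighbour that falls outside the input. The names `bilinear_cache` and `forward` are hypothetical. The sketch also omits the perspective mapping and any ROI boundary handling of the real op.

```python
import math

def bilinear_cache(in_h, in_w, x, y):
    """Hypothetical helper: flat indices and weights of the 4 bilinear
    neighbours of point (x, y) on an in_h x in_w plane (row-major).
    Neighbours outside the plane keep index -1 and weight 0.0."""
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    corners = [
        (y0, x0, (1 - dy) * (1 - dx)),
        (y0, x0 + 1, (1 - dy) * dx),
        (y0 + 1, x0, dy * (1 - dx)),
        (y0 + 1, x0 + 1, dy * dx),
    ]
    idx, w = [-1] * 4, [0.0] * 4
    for k, (cy, cx, weight) in enumerate(corners):
        if 0 <= cy < in_h and 0 <= cx < in_w:
            idx[k] = cy * in_w + cx
            w[k] = weight
    return idx, w

def forward(in_data, in_h, in_w, points):
    """Sample in_data at each (x, y) point and cache the interpolation info
    (4 entries per output element) so a backward pass can reuse it."""
    out, out2in_idx, out2in_w = [], [], []
    for x, y in points:
        idx, w = bilinear_cache(in_h, in_w, x, y)
        # Skip neighbours marked -1; they contribute nothing.
        out.append(sum(in_data[i] * wk for i, wk in zip(idx, w) if i >= 0))
        out2in_idx.extend(idx)
        out2in_w.extend(w)
    return out, out2in_idx, out2in_w

out, idx, w = forward([0.0, 1.0, 2.0, 3.0], 2, 2, [(0.5, 0.5), (1.0, 0.0)])
print(out)  # [1.5, 1.0]
print(idx)  # [0, 1, 2, 3, 1, -1, 3, -1]
```

The second sample point sits on the right edge, so two of its neighbours are out of range and stay at -1.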

heavengate · 2019-04-25T11:36:39Z

paddle/fluid/operators/detection/roi_perspective_transform_op.cu

+                                       T* in_grad_data) {
+  CUDA_1D_KERNEL_LOOP(index, out_size * 4) {
+    int in_idx = out2in_idx_data[index];
+    if (in_idx > 0) {


Should this be >= here?

Thanks. Fixed.
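Why `>=` matters: with flat indices, 0 is a valid input position (the top-left element). Only -1 means "no neighbour".

A minimal sketch of the backward scatter shows that the `> 0` test silently drops the gradient for input element 0. It assumes the layout in the diff above: four cached entries per output element, with one loop iteration per entry, as in `CUDA_1D_KERNEL_LOOP(index, out_size * 4)`. The function and its `use_gt_zero` flag are hypothetical. A GPU version would also need an atomic add, because several output elements can share an input neighbour.

```python
def backward(out_grad, out2in_idx, out2in_w, in_size, use_gt_zero=False):
    """Scatter output gradients to the input using cached indices/weights.
    use_gt_zero=True reproduces the original `in_idx > 0` check."""
    in_grad = [0.0] * in_size
    # One iteration per cached entry, mirroring the CUDA 1-D kernel loop.
    for index in range(len(out_grad) * 4):
        in_idx = out2in_idx[index]
        valid = in_idx > 0 if use_gt_zero else in_idx >= 0
        if valid:
            # Sequential here; concurrent threads would need an atomic add.
            in_grad[in_idx] += out_grad[index // 4] * out2in_w[index]
    return in_grad

cached_idx = [0, 1, 2, 3]
cached_w = [0.25, 0.25, 0.25, 0.25]
print(backward([1.0], cached_idx, cached_w, 4))                    # [0.25, 0.25, 0.25, 0.25]
print(backward([1.0], cached_idx, cached_w, 4, use_gt_zero=True))  # [0.0, 0.25, 0.25, 0.25]
```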

heavengate · 2019-04-25T11:39:15Z

paddle/fluid/operators/detection/roi_perspective_transform_op.cu

+        out2in_w->mutable_data<T>({out->numel(), 4}, ctx.GetPlace());
+
+    math::SetConstant<platform::CUDADeviceContext, int> init;
+    init(ctx.cuda_device_context(), out2in_idx, static_cast<int>(-1));


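Does out2in_w need to be initialized?

Based on the condition on this line:
if out2in_idx[i] == -1, then out2in_w[i] is never used.

So it is sufficient to initialize the elements of out2in_idx to -1.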
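The reply's argument can be checked with a small sketch. It fills the index buffer with -1 and deliberately fills the weight buffer with NaN, as a stand-in for memory returned by `mutable_data<T>()`, which is presumably left uninitialized. A hypothetical forward step then writes only the valid neighbours, and the backward scatter is shown never to read a NaN weight. The helpers `init_cache` and `scatter` are illustrative assumptions, not Paddle APIs.

```python
import math

def init_cache(out_numel):
    """Only the index buffer gets a defined value, as in the reviewed code."""
    out2in_idx = [-1] * (out_numel * 4)      # -1 = "no input neighbour"
    out2in_w = [math.nan] * (out_numel * 4)  # stand-in for uninitialized memory
    return out2in_idx, out2in_w

def scatter(out_grad, out2in_idx, out2in_w, in_size):
    in_grad = [0.0] * in_size
    for index, in_idx in enumerate(out2in_idx):
        if in_idx >= 0:  # the weight is read only when the index is valid
            in_grad[in_idx] += out_grad[index // 4] * out2in_w[index]
    return in_grad

# A hypothetical forward step fills only the neighbours that exist.
idx, w = init_cache(out_numel=2)
idx[0], w[0] = 0, 1.0  # output 0 lands exactly on input element 0
idx[4], w[4] = 3, 0.5  # output 1 sits halfway between input elements 3 and 2
idx[5], w[5] = 2, 0.5
grad = scatter([2.0, 4.0], idx, w, in_size=4)
print(grad)  # [2.0, 0.0, 2.0, 2.0]
```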

heavengate · 2019-04-25T11:46:37Z

python/paddle/fluid/tests/unittests/test_roi_perspective_transform_op.py

+        self.outputs['Out2InIdx'] = np.zeros(
+            [np.product(self.outputs['Out'].shape), 4]).astype("int32")
+        self.outputs['Out2InWeights'] = np.zeros(
+            [np.product(self.outputs['Out'].shape), 4]).astype("float32")


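In test_roi_pooling, test_checkout() also checks the Intermediate output. Could this be changed here as well?

The infer shape of roi_pooling has an ENFORCE CHECK on the Intermediate output, so test_checkout() in that unit test also needs to include the Intermediate output.

The current PR only modifies the CUDA kernel. The CPU kernel's computation does not use the Intermediate output, so the unit test does not check it either.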
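For reference, the intermediate outputs declared in the test have one row per element of `Out` and one column per bilinear neighbour. A minimal stdlib sketch of the same shape calculation follows. The helpers `intermediate_shapes` and `zero_placeholders` are hypothetical. Note that in recent NumPy releases `np.product` has been deprecated and then removed, so `np.prod` is the equivalent call.

```python
import math

def intermediate_shapes(out_shape):
    """Hypothetical helper: shapes of the cached intermediates for a given Out shape."""
    numel = math.prod(out_shape)  # number of output elements
    return {"Out2InIdx": [numel, 4], "Out2InWeights": [numel, 4]}

def zero_placeholders(out_shape):
    """Zero-filled stand-ins, mirroring np.zeros(...).astype(...) in the test."""
    rows, cols = intermediate_shapes(out_shape)["Out2InIdx"]
    return {
        "Out2InIdx": [[0] * cols for _ in range(rows)],         # int32 in the test
        "Out2InWeights": [[0.0] * cols for _ in range(rows)],   # float32 in the test
    }

print(intermediate_shapes((2, 3, 4, 5)))  # {'Out2InIdx': [120, 4], 'Out2InWeights': [120, 4]}
```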

test=develop

heavengate

LGTM

# The first commit's message is: remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (PaddlePaddle#17066)

# This is the 2nd commit message: Fleet unify distributed training (PaddlePaddle#16791) * implement distributed transpiler with fleet

# This is the 3rd commit message: ParallelDyGraph with GPU collective mode (PaddlePaddle#16827) implement dygraph.parallel.DataParallel to hook reduce op.

# This is the 4th commit message: Init mixed precision training interface (PaddlePaddle#16856) * Init mixed precision training interface * Add fp16 test script test=develop * All initializers support float16 test=develop * Code cleanup & add more code annotations test=develop * Update API spec test=develop * Add usage example in doc test=develop

# This is the 5th commit message: fix reference_count_pass,test=develop (PaddlePaddle#17060) test=develop

# This is the 6th commit message: Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (PaddlePaddle#17090) * Cache the information of linear interpolation in forward and use it in backward. test=develop * Fix cuda kernel. test=develop

# This is the 7th commit message: remove unnecessary prepare_data (PaddlePaddle#17080) test=develop

# This is the 8th commit message: fix interpolate cu. test=develop (PaddlePaddle#17101)

# This is the 9th commit message: test=develop, double backward leaky_relu (PaddlePaddle#17067) backward of backward: leaky_relu

# This is the 10th commit message: fix fuse optimizer ops (PaddlePaddle#17102) test=develop

# This is the 11th commit message: truncated_gaussian_random supported in distributed training, test=develop (PaddlePaddle#17091)

# This is the 12th commit message: Detailed coordinate description for yolov3 loss (PaddlePaddle#17007) * Detailed coordinate description for yolov3 loss test=develop * modified api.spec test=develop * modified loss name * fix api.spec test=develop * polish description test=develop * modified api.spec test=develop

# This is the 13th commit message: fix test_weight_decay (PaddlePaddle#17109) test=develop

# This is the 14th commit message: Path flag (PaddlePaddle#17105) * fix python/paddle/fluid/__init__.py detecting problems


wanghaoshuang requested a review from qingqing01 April 25, 2019 07:21

Cache the information of linear interpolation in forward and use it in backward.

f0711c0

test=develop

wanghaoshuang force-pushed the fix_roi_per_trans branch from 5bce6ad to f0711c0 Compare April 25, 2019 11:14

heavengate reviewed Apr 25, 2019


Fix cuda kernel.

4a47b2c

test=develop

heavengate approved these changes Apr 25, 2019


wanghaoshuang merged commit 55ce36e into PaddlePaddle:develop Apr 25, 2019

wanghaoshuang deleted the fix_roi_per_trans branch May 20, 2022 03:55

wanghaoshuang merged 2 commits into PaddlePaddle:develop from wanghaoshuang:fix_roi_per_trans


2 participants
