Download and unzip the demo (Link: https://dubox.com/s/1iudoIXxOAyJoo9Qqaxz_Kg Password: csrp).
Run sh run.sh
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
------------------------- Overhead Summary -------------------------
Total time: 20980.3
Computation time Total: 18912.5 Ratio: 90.1441%
Framework overhead Total: 2067.8 Ratio: 9.8559%
------------------------- GpuMemCpy Summary -------------------------
GpuMemcpy Calls: 0 Total: 0 Ratio: 0%
------------------------- Event Summary -------------------------
Event                    Calls   Total     Min.      Max.      Ave.       Ratio.
thread0::prelu           16830   17924.6   0.173186  10.5915   1.06504    0.854354
ext_reorder              16830   669.975   0.007853  0.922596  0.31244    0.0373774*
thread0::conv2d          25500   2825.22   0.039415  9.08506   0.110793   0.134661
int_reorder              17390   577.194   0.00928   2.28289   1.4228     0.2043*
thread0::load_combine    1       187.326   187.326   187.326   187.326    0.00892868
thread0::reshape2        510     22.038    0.03293   0.257293  0.0432117  0.00105041
ext_reorder              510     2.19305   0.003012  0.029513  0.006885   0.0995125*
thread0::scale           510     18.7042   0.029137  0.312968  0.036675   0.000891514
thread0::ext_reorder     510     2.39843   0.003014  0.011713  0.0047028  0.000114318*
The profiling report above breaks down the inference time for this model: thread0::prelu takes 17924.6 ms of the 20980.3 ms total (about 85%).
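The ratios in the Event Summary can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied by hand from the report, and the helper names (`share`, `avg_ms`) are made up for illustration and are not part of the profiler.

```python
# Figures copied from the Event Summary of the profiling report (times in ms).
TOTAL_MS = 20980.3  # "Total time" from the Overhead Summary

events = {
    "prelu": {"calls": 16830, "total_ms": 17924.6},
    "conv2d": {"calls": 25500, "total_ms": 2825.22},
}


def share(name):
    """Fraction of the total run time spent in the given op."""
    return events[name]["total_ms"] / TOTAL_MS


def avg_ms(name):
    """Average time per call of the given op, in ms."""
    return events[name]["total_ms"] / events[name]["calls"]


if __name__ == "__main__":
    for name in events:
        print(f"{name}: share={share(name):.1%}, avg={avg_ms(name):.4f} ms/call")
    # How much slower one prelu call is than one conv2d call, on average.
    print(f"prelu/conv2d avg ratio: {avg_ms('prelu') / avg_ms('conv2d'):.1f}x")
```

By these figures a single prelu call is, on average, several times more expensive than a conv2d call, which is unexpected for an element-wise activation.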

Please add an MKLDNN kernel for PRelu to reduce the inference time. Other activations may also need MKLDNN kernels.
We hope this can be fixed before 2.2RC.
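For reference, the operation itself is cheap: PRelu passes positive inputs through unchanged and multiplies negative inputs by a learned slope. The pure-Python sketch below shows only the math. It is not Paddle's implementation or an MKLDNN kernel, and it assumes a single shared slope `alpha` (the function name and signature are illustrative).

```python
def prelu(xs, alpha):
    """Element-wise PReLU with one shared slope: x if x > 0, else alpha * x."""
    return [x if x > 0 else alpha * x for x in xs]


if __name__ == "__main__":
    print(prelu([-2.0, 0.0, 3.0], 0.25))
```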