[OpenCL][Kernel] Use FC replace conv1x1 #6365
Merged
zhaoyang-star merged 6 commits into PaddlePaddle:develop on Jul 1, 2021
zhaoyang-star added a commit to zhaoyang-star/Paddle-Lite that referenced this pull request on Jul 1, 2021
daming5432 pushed a commit that referenced this pull request on Jul 1, 2021:
* [OpenCL][Kernel] Use FC replace conv1x1 (#6365)
* test=develop
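[Work in this PR]
Under _specific conditions_, conv2d_1x1 is converted to an FC computation. When input_channel is large, a single thread would otherwise have to loop over input_channel multiply-accumulate operations, so the number of threads is increased 4x: input_channel is split into 4 parts, each thread computes one part, and the 4 threads then add their intermediate multiply-accumulate results together through local memory.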
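The split-and-reduce scheme described above can be sketched on the CPU as follows. This is only an illustration, not the PR's OpenCL kernel: it assumes the conv1x1 input is 1x1 spatially so that the convolution reduces to a matrix-vector product (the exact conditions used by the PR are not stated here), and the names `SPLIT`, `fc_reference`, and `fc_split_reduce` are hypothetical. A Python list stands in for OpenCL local memory.

```python
# Sketch only: emulates the 4-way input_channel split on the CPU.
# SPLIT, fc_reference and fc_split_reduce are illustrative names,
# not identifiers from Paddle-Lite.

SPLIT = 4  # number of cooperating work-items per output channel


def fc_reference(x, w):
    """Plain FC: out[o] = sum over c of x[c] * w[o][c]."""
    return [sum(xc * wc for xc, wc in zip(x, row)) for row in w]


def fc_split_reduce(x, w, split=SPLIT):
    """FC where each output channel is computed by `split` simulated work-items."""
    in_channels = len(x)
    chunk = (in_channels + split - 1) // split  # ceiling division
    out = []
    for row in w:
        local_mem = [0.0] * split  # stands in for OpenCL local memory
        for lid in range(split):
            # Each work-item accumulates only its own slice of input_channel.
            start = lid * chunk
            stop = min(start + chunk, in_channels)
            acc = 0.0
            for c in range(start, stop):
                acc += x[c] * row[c]
            local_mem[lid] = acc
        # On a real device a barrier would precede this step; the partial
        # sums are then added together to form the final output value.
        out.append(sum(local_mem))
    return out


# Small integer-valued inputs keep the float sums exact, so both paths agree.
x = [float(c % 7) for c in range(96)]
w = [[float((o + c) % 5) for c in range(96)] for o in range(8)]
assert fc_split_reduce(x, w) == fc_reference(x, w)
```

Each simulated work-item runs a loop roughly a quarter as long as in the single-thread version, at the cost of one extra reduction over 4 partial sums per output channel.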
Compared with the previous approach, the core differences are:
[Results]

For the MobileNetV3_small_x1_0_infer model, 19 of the conv1x1 ops can be replaced with FC. The overall model speedup and the kernel speedup are as follows:
MobileNetV3_small_x1_0_infer kernel profiling on armv7 on 845

For the MobileNetV3_large_x1_0_infer model, 17 of the conv1x1 ops can be replaced with FC. The overall model speedup is as follows:
