Single release for PaddlePaddle CPU Image #1607
Changes from 5 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| ## Runtime Check SIMD for x86 architecture | ||
|
|
||
| ### Background | ||
|
|
||
| Currently, PaddlePaddle supports AVX and SSE3 intrinsics (extensions to the x86 instruction set architecture). When PaddlePaddle is built with CMake, the build detects which SIMD instruction sets the host supports and automatically selects a supported one. Developers and users can also set the CMake option `WITH_AVX=ON/OFF` manually before compiling PaddlePaddle. This works well for local use. | ||
|
Contributor
I would like to emphasize one thing here
Contributor
Author
Exactly. |
||
|
|
||
|
|
||
| ### Problem Involved | ||
|
|
||
| Nonetheless, from a deployment perspective, there are some drawbacks: | ||
|
|
||
| 1. Online runtime environments are complex. If an older node does not support AVX or another required extension, | ||
| PaddlePaddle will crash with `illegal instruction is used`. This problem appears | ||
| frequently in cluster environments such as Kubernetes. **It must be addressed before PaddlePaddle on Cloud.** | ||
|
|
||
| 2. Each time a new version is ready, we have to release several products to users, for example `no-avx-cpu`, `avx-cpu`, `no-avx-gpu`, and `avx-gpu`. Users should not have to care about these details, so this is a poor experience. | ||
|
||
|
|
||
|
|
||
| ### How to Address it? | ||
|
|
||
| 1. We can use CPU ID information to check SIMD support at runtime. This functionality has already been merged into | ||
| the current develop branch. For full details, see [CpuId.cpp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/CpuId.cpp) and [CpuId.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/CpuId.h). | ||
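As a rough illustration of what such a runtime query can look like, here is a minimal sketch that relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. It is not the code in `CpuId.cpp`. The names `SimdFlag`, `DetectSimd`, and `HasSimd` are hypothetical, and the "all requested flags must be present" semantics is an assumption.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical flag names for illustration only; the real ones live in CpuId.h.
enum SimdFlag : std::uint32_t {
  kSse3 = 1u << 0,
  kAvx = 1u << 1,
  kAvx2 = 1u << 2,
};

// Query the host CPU and return a bitmask of the extensions it reports.
// On non-x86 targets or other compilers this sketch reports nothing.
inline std::uint32_t DetectSimd() {
  std::uint32_t flags = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse3")) flags |= kSse3;
  if (__builtin_cpu_supports("avx")) flags |= kAvx;
  if (__builtin_cpu_supports("avx2")) flags |= kAvx2;
#endif
  return flags;
}

// Assumed semantics: true only if every requested flag is supported.
inline bool HasSimd(std::uint32_t wanted) {
  assert(wanted != 0 && "ask for at least one flag");
  static const std::uint32_t supported = DetectSimd();  // detected once, then cached
  return (supported & wanted) == wanted;
}
```

Caching the detected mask in a function-local static keeps repeated checks cheap, which matters if the check sits on a hot path.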
|
|
||
| You can use `HAS_SIMD(__flags)` to check SIMD support at runtime. For instance, | ||
|
|
||
| ```c++ | ||
| if (HAS_SIMD(SIMD_AVX2 | SIMD_FMA4)) { | ||
| avx2_fm4_stub(); | ||
|
||
| } else if (HAS_SIMD(SIMD_SSE3)) { | ||
| sse3_stub(); | ||
| } | ||
| ``` | ||
|
||
|
|
||
| `avx2_fm4_stub` and `sse3_stub` could be located in different directories: | ||
|
|
||
| ```text | ||
| ------x86---naive | ||
|
||
| | | | ||
| | |---avx2 -- avx2_fm4_stub() | ||
| | | | ||
| | |---sse -- sse_stub() | ||
| | | | ||
| | |---sse3 -- sse3_stub() | ||
| | | ||
| arm--- ... | ||
| ``` | ||
|
||
|
|
||
| Here, each directory uses different compile options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at | ||
| runtime, `if (HAS_SIMD(__flags))` selects the supported branch (intrinsics) to execute. | ||
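The selection step described above could be sketched as a function pointer chosen once at startup. Everything here is hypothetical (`AddKernel`, `AddNaive`, `AddAvxStandIn`, `SelectAddKernel`), and the "AVX" kernel is a scalar stand-in. In the proposed layout it would be a real intrinsics implementation compiled with `-mavx` in its own directory.

```c++
#include <cassert>
#include <cstddef>

// Signature shared by every implementation of the same kernel.
using AddKernel = void (*)(const float* a, const float* b, float* out,
                           std::size_t n);

// Portable fallback; it would live under the naive directory.
void AddNaive(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for the AVX build of the kernel. It is scalar here so that the
// sketch compiles and runs on any host.
void AddAvxStandIn(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Pick an implementation based on what the runtime check reported.
AddKernel SelectAddKernel(bool host_has_avx) {
  return host_has_avx ? &AddAvxStandIn : &AddNaive;
}

// Callers go through the selected pointer and never name a specific build.
void Add(AddKernel kernel, const float* a, const float* b, float* out,
         std::size_t n) {
  assert(kernel != nullptr);
  kernel(a, b, out, n);
}
```

A caller would pass the result of the runtime check (for example `HAS_SIMD(SIMD_AVX)`, assuming that flag exists) to `SelectAddKernel` once and reuse the returned pointer. Code built with `-mavx` then runs only on hosts that reported AVX support.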
|
|
||
| This approach would solve both the release and the deployment problems. | ||
|
||
|
|
||
|
|
||
| ### How to implement it? | ||
|
|
||
| Since the current `cuda` directory contains heterogeneous source code, we want to refactor the `cuda` directory as follows: | ||
|
||
|
|
||
| ``` | ||
| kernels--- cpu --- inc -- x86 -- avx ----- avx_mathfun.h activation.h gru.h ... | ||
| | | | | ||
| | | |- naive --- activation.h gru.h ... | ||
| | | | ||
| | |- src -- x86 -- avx ----- activation.cc | ||
| | | |- gru.cc | ||
| | | |- ... | ||
| | | | ||
| | |- naive --- activation.cc | ||
| | | |- gru.cc | ||
| | |- ... | ||
| |- gpu -- ... | ||
| ``` | ||
|
||
|
|
||
| For simplicity, different architectures or intrinsics will live in different directories. We need to | ||
| modify the CMake files to support this solution. | ||
|
|
||
|
|
||
| ### Reference | ||
|
|
||
| AVX Cheat Sheet, TUM, https://db.in.tum.de/~finis/x86%20intrinsics%20cheat%20sheet%20v1.0.pdf | ||
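The title of this design doc could be revised. Runtime check is only one part of the work; the goal of the whole effort (or design) should be to release a single Paddle that supports all kinds of CPU environments.
As I understand it, this involves at least three pieces of work:
a. Runtime check
b. Code changes and directory restructuring (for example, moving the naive/sse/avx implementations into separate directories, as described below; issue #1116 also has related content)
c. Finally, the build-related work; for example, how to compile a Paddle that contains the naive implementation as well as the sse and avx implementations (beyond the build itself, this also involves code changes).
Although the later sections do cover some of the relevant technical details, I suggest the design doc be organized around which areas the overall work covers. That way, individual issues can be created from this design doc, for example on how to adjust the code under cuda, and the detailed discussion can happen in those issues (once a detail has been settled in an issue, it can be merged back into this design doc).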
Agreed.