[DeepEP] Fix compile error of sm90 features #74762
Merged
PR Category
Communication Library
PR Types
Bug fixes
Description
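Fixes a compile bug related to `DISABLE_SM90_FEATURES`.

The problem takes some explaining. The DeepEP we ported only supports the sm90 architecture, and our CMakeLists enforces this: DeepEP is built only when `if(WITH_GPU AND (ARCH_BIN_CONTAINS_90 GREATER -1))` holds. Normally this means DeepEP is not compiled on sm80 machines and is compiled only on sm90, which looks fine.

However, the CE team runs cmake with `-DCUDA_ARCH_BIN="80 90"`, which specifies both architectures at once. The condition above is still true in that case, because it checks "contains sm90" rather than "only sm90". As a result, DeepEP is also compiled for sm80, which lacks some of the required hardware instructions, and the build fails.

For now, the only workaround available to me is to guard the assembly in the DeepEP code that is related to `DISABLE_SM90_FEATURES` with `#if __CUDA_ARCH__ >= 900`. This still produces dead ("zombie") code for sm80. The root cause is that `ARCH_BIN_CONTAINS_90` is a containment check and does not handle the case where two architectures are specified together.

The previously ported version of DeepEP was in fact handled the same way (#71481). At the time I did not understand the purpose of those guards, so when porting the new version I overwrote the `#if` blocks, and now I am dutifully adding them back. I hope that in the future everyone explains why when fixing a bug, so that we do not fall into the same pit again.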
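The difference between a "contains sm90" check and an "only sm90" check can be sketched in plain C++. This is a minimal model of the logic only, not the CMake code from this PR; the helper names `parse_arch_list`, `contains_arch`, and `all_archs_at_least` are hypothetical, and the sketch assumes the architecture list is written as whole numbers such as `80 90`.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a CUDA_ARCH_BIN-style string such as "80 90"
// into integers. Commas and semicolons are treated like spaces.
std::vector<int> parse_arch_list(const std::string& arch_bin) {
    std::string normalized = arch_bin;
    for (char& c : normalized) {
        if (c == ',' || c == ';') c = ' ';
    }
    std::istringstream in(normalized);
    std::vector<int> archs;
    int arch = 0;
    while (in >> arch) archs.push_back(arch);
    return archs;
}

// Containment check: true as soon as `target` appears anywhere in the list.
// This mirrors the "contains sm90" behaviour described above.
bool contains_arch(const std::vector<int>& archs, int target) {
    return std::find(archs.begin(), archs.end(), target) != archs.end();
}

// Stricter check: the list is non-empty and every entry is >= `min_arch`.
// A list such as "80 90" fails this check because of the sm80 entry.
bool all_archs_at_least(const std::vector<int>& archs, int min_arch) {
    return !archs.empty() &&
           std::all_of(archs.begin(), archs.end(),
                       [min_arch](int a) { return a >= min_arch; });
}
```

With the list `80 90`, `contains_arch(..., 90)` is true while `all_archs_at_least(..., 90)` is false. That gap is why the sm90-only code also ends up in the sm80 compilation pass, where it needs the `__CUDA_ARCH__` guard.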
Pcard-85711