@lshpku lshpku commented Aug 20, 2025

PR Category

Communication Library

PR Types

Bug fixes

Description

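Fix a compile bug related to DISABLE_SM90_FEATURES.

The issue takes some explaining:

  1. Our port of DeepEP supports only the sm90 architecture, and our CMakeLists enforces this: DeepEP is compiled only when if(WITH_GPU AND (ARCH_BIN_CONTAINS_90 GREATER -1)) holds. So normally DeepEP is not compiled on sm80 machines and is compiled only on sm90, which looks fine.

  2. However, the CE team ran cmake with -DCUDA_ARCH_BIN="80 90", i.e. with both architectures specified at once. The check above still passes, because it tests whether the list contains sm90, not whether it is only sm90. As a result, a copy is also compiled for sm80, but sm80 lacks some hardware instructions, so compilation fails.

  3. For now, all I can do is wrap the assembly in the DeepEP code related to DISABLE_SM90_FEATURES in #if __CUDA_ARCH__ >= 900 as a workaround. This still compiles dead ("zombie") code for sm80. The root cause is that ARCH_BIN_CONTAINS_90 is a containment check and does not handle the case where two architectures are specified together.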
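The difference between the two checks in points 2 and 3 can be sketched in plain C++. This is only an illustration, not the Paddle build logic: parse_arch_bin, contains_sm90, only_sm90, and kSm90PathCompiled are hypothetical names, and the real check lives in CMake rather than C++. The last declaration shows the general shape of a per-architecture preprocessor guard, assuming (as nvcc does) that __CUDA_ARCH__ is defined only during device compilation passes.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a CUDA_ARCH_BIN-style string such as "80 90"
// into a list of integer architectures.
std::vector<int> parse_arch_bin(const std::string& arch_bin) {
  std::istringstream in(arch_bin);
  std::vector<int> archs;
  int arch = 0;
  while (in >> arch) archs.push_back(arch);
  return archs;
}

// Containment check, analogous in spirit to ARCH_BIN_CONTAINS_90 GREATER -1:
// true as soon as 90 appears anywhere in the list.
bool contains_sm90(const std::vector<int>& archs) {
  return std::find(archs.begin(), archs.end(), 90) != archs.end();
}

// Stricter check: true only if every requested architecture is 90.
bool only_sm90(const std::vector<int>& archs) {
  return !archs.empty() &&
         std::all_of(archs.begin(), archs.end(),
                     [](int a) { return a == 90; });
}

// Shape of the per-architecture guard used as the workaround. In a host-only
// build __CUDA_ARCH__ is undefined, so this constant is false; in a device
// pass for sm90 or newer it would be true.
constexpr bool kSm90PathCompiled =
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
    true;
#else
    false;
#endif
```

With "80 90" as input, contains_sm90 returns true while only_sm90 returns false, which is the case the containment check lets through and the #if guard then has to cover on the sm80 pass.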

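In fact, the previously migrated version of DeepEP handled this the same way (#71481). I did not understand the purpose at the time, so I overwrote those #if guards when migrating the new version, and now I am dutifully adding them back. I hope that in the future everyone explains the reason when fixing a bug, so that we avoid hitting the same pitfall twice.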

Pcard-85711

paddle-bot bot commented Aug 20, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@lshpku lshpku force-pushed the fix-deepep-compile-error branch from 9990af9 to d42f7f9 Compare August 20, 2025 04:25
@lshpku lshpku force-pushed the fix-deepep-compile-error branch from d42f7f9 to a5e9042 Compare August 21, 2025 10:15
@gongweibao gongweibao left a comment

LGTM

@gongweibao gongweibao merged commit d95e1a5 into PaddlePaddle:develop Aug 25, 2025
90 of 94 checks passed