[NPU] Add label_smooth_op #34828
Thanks for your contribution!
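Could you provide an analysis report on the impact of composing this op from small ops? We need the following information to assess how the small-op composition affects the overall model's performance and functionality:
(1) The performance of this op on large inputs compared with the CPU op (given the NPU kernel launch overhead, the op may end up slower than the CPU op).
(2) The device memory usage of this op on large inputs, which can be compared with the memory usage of huber_loss implemented as a single op (a colleague previously reported that small-op composition can make memory usage 8 times the original).
Based on these results, we can then decide whether the small-op composition approach is acceptable or whether we need to develop a TBE operator instead.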
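Tests have been added. There is now a CPU vs. NPU comparison; the results show that the NPU gains the advantage as the shape grows and that the device memory overhead is small, so the small-op composition approach performs acceptably.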
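There is a problem with this result: "With shape = [2000, 4000], peak device memory usage is 28061 MB." A single NPU card has at most 15307 MB of device memory, so how can the usage here reach 28061 MB?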
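As a rough sanity check on the figure questioned above, the size of a single tensor of that shape can be estimated by hand. This is only a back-of-envelope sketch: it assumes float32 elements (the dtype is not stated in this thread), `tensor_mib` is a hypothetical helper, and it ignores whatever the runtime's allocator reserves on top of live tensors.

```python
def tensor_mib(shape, bytes_per_elem=4):
    """Size of one dense tensor in MiB (assumes float32 unless overridden)."""
    num_elems = 1
    for dim in shape:
        num_elems *= dim
    return num_elems * bytes_per_elem / (1024 * 1024)


one_tensor = tensor_mib([2000, 4000])   # 8,000,000 elements * 4 bytes
eight_copies = 8 * one_tensor           # the "8 times" worst case mentioned earlier

print(f"one tensor:   {one_tensor:.1f} MiB")
print(f"eight copies: {eight_copies:.1f} MiB")
```

Under these assumptions even eight live copies of the tensor stay in the hundreds of MiB, which is why a reading of 28061 MB is worth double-checking.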
qili93
left a comment
LGTM
… npu_label_smooth_2
PR types
New features
PR changes
OPs
Describe
[NPU] Add label_smooth op
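For readers unfamiliar with the op, the standard label smoothing computation can be sketched in NumPy as a composition of small elementwise steps, which is the style of implementation debated in the review. This is a reference sketch only, not the NPU kernel from this PR: the function name, the default `epsilon`, and the step breakdown are assumptions for illustration.

```python
import numpy as np


def label_smooth_reference(label, epsilon=0.1, prior_dist=None):
    """Reference label smoothing: (1 - epsilon) * label + epsilon * prior.

    With no prior distribution, a uniform prior over the last axis
    (1 / num_classes) is used.
    """
    label = np.asarray(label, dtype=np.float32)
    num_classes = label.shape[-1]
    # Step 1: scale the original labels.
    scaled = label * (1.0 - epsilon)
    # Step 2: add the smoothing term (scalar for a uniform prior).
    if prior_dist is None:
        return scaled + epsilon / num_classes
    prior = np.asarray(prior_dist, dtype=np.float32)
    return scaled + epsilon * prior


one_hot = np.array([[0.0, 1.0, 0.0, 0.0]], dtype=np.float32)
smoothed = label_smooth_reference(one_hot, epsilon=0.1)
print(smoothed)
```

Each step here maps to one elementwise operation, so a device implementation built this way launches one kernel per step and may hold an intermediate buffer for `scaled`, which is the cost the reviewer asked to have measured.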