dynamicheart (Contributor) commented Dec 22, 2023

PR types

Bug fixes

PR changes

APIs

Description

PR #54674 forces the XPUAPI_DEFAULT_SIZE option of xdnn::Context to 1 by default, even when the environment variable XPUAPI_DEFAULT_SIZE is set to a different value. This triggers a lot of xpu_wait calls.

This comment describes why XPUAPI_DEFAULT_SIZE was originally set to 1: #54674 (comment)
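
The difference between forcing the value and treating 1 as a fallback can be sketched as follows. This is a minimal illustration, not the actual Paddle or xdnn code: `ParseSizeOrDefault`, `ForcedDefaultSize`, and `ResolveDefaultSize` are hypothetical names, and treating the option as a positive integer is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper, not part of Paddle or xdnn.
// Parses a raw environment value and returns `fallback` when the value is
// missing, empty, non-numeric, out of range, or not positive.
inline long ParseSizeOrDefault(const char* raw, long fallback) {
  assert(fallback > 0);
  if (raw == nullptr || *raw == '\0') return fallback;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (errno != 0 || end == raw || *end != '\0' || value <= 0) return fallback;
  return value;
}

// Pattern described as the problem: the environment is never consulted,
// so the option is always 1.
inline long ForcedDefaultSize() { return 1; }

// Pattern where 1 is only a fallback: a user-provided value takes priority.
inline long ResolveDefaultSize() {
  return ParseSizeOrDefault(std::getenv("XPUAPI_DEFAULT_SIZE"), 1);
}
```

With this shape, running with `XPUAPI_DEFAULT_SIZE` unset still yields 1, so the original default is preserved while an explicit setting is honored.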

paddle-bot bot commented Dec 22, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

houj04 (Contributor) left a comment

LGTM. I'd also suggest pinging the original author and original approver to take a look.

houj04 (Contributor) commented Dec 22, 2023

One question: the PR referenced here is from half a year ago, so why was this problem only discovered recently?

dynamicheart (Contributor Author)

> LGTM. I'd also suggest pinging the original author and original approver to take a look.

@AlbertVan @zhupengyang Could you two please take a look? Thanks.

runzhech (Contributor) left a comment

lgtm

dynamicheart (Contributor Author)

> One question: the PR referenced here is from half a year ago, so why was this problem only discovered recently?

PyTorch's XpuContext implementation is modeled on the one in Paddle. The problem was first discovered on the PyTorch side, around October 2023.

shentanyue (Contributor) left a comment

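The root problem is that multiple xpu_context instances each request their own XPUAPI_DEFAULT_SIZE allocation.
The training side can look into this further later.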
