@zhurou603
Contributor

PR types

Performance optimization

PR changes

Others

Description

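Sharding currently broadcasts parameters in a fully synchronous manner. This PR changes that: the broadcast tasks are submitted to the stream asynchronously, and a wait is then issued for all of them to complete, which achieves overlap and improves performance.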
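
As a rough illustration of the change described above, the sketch below contrasts the two call patterns: waiting after every broadcast versus launching all broadcasts first and waiting afterwards. `FakeTask` and `fake_broadcast` are hypothetical stand-ins for an asynchronous collective and the task handle it returns; they are not Paddle APIs and only record the order of host-side events.

```python
class FakeTask:
    """Hypothetical stand-in for the handle returned by an async collective."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def wait(self):
        # A real task would block the host here until the collective finishes.
        self.log.append(f"wait {self.name}")


def fake_broadcast(name, log):
    """Hypothetical async broadcast: enqueue the work and return immediately."""
    log.append(f"launch {name}")
    return FakeTask(name, log)


def broadcast_params_sync(names, log):
    # Old pattern: the host blocks after every single broadcast.
    for name in names:
        fake_broadcast(name, log).wait()


def broadcast_params_async(names, log):
    # New pattern: enqueue every broadcast first, then wait for all of them.
    tasks = [fake_broadcast(name, log) for name in names]
    for task in tasks:
        task.wait()


sync_log, async_log = [], []
broadcast_params_sync(["w0", "w1"], sync_log)
broadcast_params_async(["w0", "w1"], async_log)
print(sync_log)
print(async_log)
```

In the second pattern the host reaches its first wait only after every broadcast has been enqueued, so launch overhead and any per-wait cost are no longer paid once per parameter.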

@paddle-bot

paddle-bot bot commented Dec 6, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Dec 6, 2023
Contributor

@FeixLiu FeixLiu left a comment

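Could you share performance data? On what machine, with which suite, which model and model size, with which distributed strategy and training strategy (recompute, for example), and what did performance improve from and to? My understanding is that the gain from asynchronous communication generally comes from overlap, that is, overlapping communication with computation. But this change does not seem to have anything to overlap with. It may, however, be overlapping the CPU overhead of launching the broadcast kernels.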

@zhurou603
Contributor Author

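There is almost no gain on CUDA; the gain shows up on custom devices with the release build. The reason is that the Wait method of the custom process group includes a 10 ms wait: once the kernel has been launched, if sync=true and the broadcast itself runs slowly, there will be multiple 10 ms waits.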
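
To make the cost argument above concrete, here is a toy host-time model. It assumes that Wait behaves like a check-then-sleep polling loop with a 10 ms interval and that broadcasts run back to back on one stream. The comment above only states that Wait contains a 10 ms wait, so the loop shape, the function names, and the example durations are all assumptions made for illustration, not measurements.

```python
POLL_MS = 10  # assumed polling interval of the custom process group's Wait


def wait_until(now_ms, done_ms, poll_ms=POLL_MS):
    """Host time at which a check-then-sleep loop first observes completion."""
    while now_ms < done_ms:
        now_ms += poll_ms
    return now_ms


def sync_total(durations_ms):
    """Launch one broadcast, wait for it, then launch the next."""
    now = 0
    for duration in durations_ms:
        now = wait_until(now, now + duration)
    return now


def async_total(durations_ms):
    """Launch all broadcasts at time 0 (launch cost ignored), then wait on each."""
    now = 0
    done = 0
    for duration in durations_ms:
        done += duration  # broadcasts execute serially on the stream
        now = wait_until(now, done)
    return now


durations = [3, 3, 3, 3]  # hypothetical per-broadcast device times in ms
print(sync_total(durations), async_total(durations))
```

Under these assumptions the synchronous pattern rounds every broadcast up to a full polling interval, while the asynchronous pattern pays that rounding only on the combined duration.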

@zhurou603
Contributor Author

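Could you share performance data? On what machine, with which suite, which model and model size, with which distributed strategy and training strategy (recompute, for example), and what did performance improve from and to? My understanding is that the gain from asynchronous communication generally comes from overlap, that is, overlapping communication with computation. But this change does not seem to have anything to overlap with. It may, however, be overlapping the CPU overhead of launching the broadcast kernels.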

https://github.com/PaddlePaddle/Paddle/blob/v2.5.2/paddle/fluid/distributed/collective/process_group_custom.cc#L91

Contributor

@FeixLiu FeixLiu left a comment

LGTM

Member

@ForFishes ForFishes left a comment

LGTM

@FeixLiu FeixLiu merged commit 4ca3990 into PaddlePaddle:develop Dec 14, 2023
Xiadalei pushed a commit to Xiadalei/Paddle that referenced this pull request Apr 15, 2024