[AutoParallel]Refine ShardOptimizer #62933
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
```diff
 def _shard_parameter(self, param):
-    if param.is_dense():
+    if param.is_dense() and self._mesh is not None:
```
Review suggestion: add an `else` branch that raises an error.
When no mesh is specified, is a dense param simply not allowed? In the llama2-13B unified dynamic-static network we are currently testing, there are indeed cases where a param is dense.
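For context, here is a minimal sketch of how the guard and the suggested error branch would fit together. This is a hypothetical reconstruction, not the PR's actual code: only the `if` condition comes from the diff above, and the raising branch mirrors the "else: raise error" suggestion, which is exactly the behavior the comment above questions.

```python
# Skeleton for illustration only; the real class lives in Paddle's
# auto-parallel code. Only the `if` condition is taken from the diff;
# the raising branch mirrors the "else: raise error" suggestion.
class ShardOptimizer:
    def __init__(self, optimizer, mesh=None):
        self._inner_opt = optimizer
        self._mesh = mesh

    def _shard_parameter(self, param):
        if param.is_dense() and self._mesh is not None:
            # Shard the dense parameter over the configured process mesh
            # (actual sharding logic elided).
            pass
        elif param.is_dense():
            # Suggested behavior: fail loudly rather than silently skip a
            # dense parameter when no mesh was provided. As the comment
            # above notes, this would reject valid setups such as
            # llama2-13B's unified dynamic-static network, where dense
            # params do occur.
            raise ValueError(
                "Dense parameter found but no mesh configured; "
                "pass `mesh` to ShardOptimizer."
            )
```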
zhiqiu left a comment:
LGTM
```diff
 Args:
-    mesh(paddle.distributed.ProcessMesh): The `ProcessMesh` object describes the Cartesian topology of the used processes.
+    mesh(None|paddle.distributed.ProcessMesh): If mesh is not None, the `ProcessMesh` object describes the Cartesian topology of the processes used for dense-type parameters. Note: currently, only a single mesh configuration is supported for all dense parameters. If multiple mesh configurations are needed, configure them yourself in the upper-layer networking code.
```
Please also describe the behavior when the default value is None.
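To make the None-default behavior concrete, here is a hedged usage sketch. The `mesh` keyword is drawn from this PR's docstring diff; whether the public `paddle.distributed.shard_optimizer` wrapper exposes it in exactly this form is an assumption, not a confirmed signature.

```python
import paddle
import paddle.distributed as dist

# A 1-D process mesh over two ranks; the dim name is arbitrary.
mesh = dist.ProcessMesh([0, 1], dim_names=["dp"])

layer = paddle.nn.Linear(8, 8)
opt = paddle.optimizer.AdamW(parameters=layer.parameters())

# With mesh=None (the default), dense parameters are left unsharded, which
# is the case the reviewer asks to document. Passing a mesh tells the shard
# optimizer which Cartesian process topology to use for dense-type
# parameters. The `mesh` keyword here is assumed from the docstring diff.
opt = dist.shard_optimizer(opt, mesh=mesh)
```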
PR types
Others
PR changes
Others
Description
Refine ShardOptimizer.
Pcard-73145