Skip to content

Conversation

@bliu3650
Copy link
Contributor

@bliu3650 bliu3650 commented Aug 6, 2021

PR types

New features

PR changes

APIs

Describe

  • Background:
    Barrier operation from specific communication library (e.g. HCCL for Ascend cluster) was not stable, and backup solution with CPU barrier is required;
    CPU barrier op was developed before but not verified;
    Considering performance across thousands of machine to synchronize, calling barrier function directly is better than executing it with program;

  • Implementation:
    Reuse previous barrier op implemented for CPU and add an initialization parallel environment flow for CPU only;
    Both function-level call and op-level are supported;

  • Unittest:
    test_collective_cpu_barrier_with_gloo.py

截屏2021-08-18 下午6 10 24

@paddle-bot-old
Copy link

paddle-bot-old bot commented Aug 6, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@bliu3650 bliu3650 force-pushed the add_gloo_c_barrier_op_cpu branch from 562e071 to 0c1fdde Compare August 10, 2021 01:57
@bliu3650 bliu3650 force-pushed the add_gloo_c_barrier_op_cpu branch from 0c1fdde to defd61f Compare August 13, 2021 08:25
Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage没过

Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage Failed

@bliu3650 bliu3650 force-pushed the add_gloo_c_barrier_op_cpu branch from bd8cd78 to 268163c Compare August 14, 2021 05:39
gongweibao
gongweibao previously approved these changes Aug 14, 2021
Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

PangHua
PangHua previously approved these changes Aug 18, 2021
@bliu3650

This comment has been minimized.

lanxianghit
lanxianghit previously approved these changes Aug 19, 2021
XieYunshen
XieYunshen previously approved these changes Aug 19, 2021
PangHua
PangHua previously approved these changes Aug 19, 2021
@bliu3650 bliu3650 dismissed stale reviews from PangHua, XieYunshen, and lanxianghit via a736ebb August 19, 2021 14:04
gongweibao
gongweibao previously approved these changes Aug 20, 2021
Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bliu3650

This comment has been minimized.

PangHua
PangHua previously approved these changes Aug 20, 2021
@bliu3650 bliu3650 dismissed stale reviews from PangHua and gongweibao via a9e6665 August 20, 2021 07:05
Copy link
Contributor

@XieYunshen XieYunshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@zhiqiu zhiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for init.py

Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gongweibao gongweibao merged commit e8f146a into PaddlePaddle:develop Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants