@wangxicoding wangxicoding commented Jun 18, 2021

PR types

Performance optimization

PR changes

Others

Describe

Add c_concat and c_allreduce_sum to the fp16 gray_list to speed up tensor parallel training.
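To illustrate why gray-listing a communication op can matter, here is a minimal, self-contained sketch of white/black/gray list dtype resolution. It is a simplified model, not Paddle's actual rewrite pass: the list contents, the `resolve_dtypes` helper, and the rule that unlisted ops fall back to fp32 are assumptions made for illustration. Under those assumptions, an unlisted `c_allreduce_sum` sitting after an fp16 matmul forces the tensor back to fp32, while a gray-listed one keeps it in fp16.

```python
# Hypothetical, simplified model of AMP op lists. Not Paddle's real code.
WHITE = {"matmul"}                        # assumed: always run in fp16
BLACK = {"softmax_with_cross_entropy"}    # assumed: always run in fp32
GRAY_BEFORE = {"elementwise_add"}         # assumed: follow the input dtype
GRAY_AFTER = GRAY_BEFORE | {"c_concat", "c_allreduce_sum"}


def resolve_dtypes(ops, gray):
    """Assign a compute dtype to each op.

    ops: list of (name, op_type, input_names) in topological order.
    gray: set of op types that inherit the dtype of their producers.
    """
    dtype = {}
    for name, op_type, inputs in ops:
        if op_type in WHITE:
            dtype[name] = "fp16"
        elif op_type in gray:
            producers = [dtype[i] for i in inputs if i in dtype]
            # An fp32 producer wins; otherwise follow an fp16 producer.
            if "fp32" in producers or not producers:
                dtype[name] = "fp32"
            else:
                dtype[name] = "fp16"
        else:
            # Black-listed and unlisted ops stay in fp32 in this sketch.
            dtype[name] = "fp32"
    return dtype


# A tiny tensor-parallel fragment: matmul -> allreduce -> bias add.
graph = [
    ("x", "matmul", []),
    ("y", "c_allreduce_sum", ["x"]),
    ("z", "elementwise_add", ["y"]),
]

before = resolve_dtypes(graph, GRAY_BEFORE)
after = resolve_dtypes(graph, GRAY_AFTER)
print("before:", before)
print("after: ", after)
```

In this model the allreduce moves fp32 data before the change and fp16 data after it, which halves the bytes communicated per element and avoids the extra casts around the collective.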

Performance Test

GPT 117M, fp16, 4-card tensor parallel

| version | speed (tokens/s) | speedup |
| --- | --- | --- |
| develop | 67452 | 1.0 |
| this PR | 73398 | 1.088 |

Throughput improves by 8.8%.

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@JZ-LIANG JZ-LIANG left a comment

LGTM

@sandyhouse sandyhouse left a comment

LGTM

@wangxicoding wangxicoding merged commit 1681a2d into PaddlePaddle:develop Jun 21, 2021
@wangxicoding wangxicoding deleted the update_mp_fp16_list branch June 21, 2021 08:16