@wangxicoding (Contributor) commented Aug 3, 2021

PR types

Others

PR changes

Others

Describe

Optimize ClipGradByGlobalNorm memory usage and performance.
Major changes (a sketch follows the list):
1. Replace reduce_sum(square(x)) with squared_l2_norm(x).
2. Scale the gradients with an in-place elementwise_mul.
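
To make the two changes concrete, below is a minimal dygraph-style sketch of global-norm clipping. It is illustrative only, not the code in this PR: `_squared_l2_norm` here simply falls back to `reduce_sum(square(x))` as a stand-in for the fused op (its exact Python entry point is not shown), and the in-place multiply API name is an assumption.

```python
import paddle

def _squared_l2_norm(x):
    # Stand-in for the fused squared_l2_norm op this PR switches to. The fused
    # kernel reduces x to a scalar directly, without materializing a full-size
    # square(x) temporary; the fallback below is the old develop behaviour.
    return paddle.sum(paddle.square(x))

def clip_grads_by_global_norm(params_grads, clip_norm=1.0):
    # 1) One scalar per gradient, then a single sqrt for the global norm.
    sum_square = [_squared_l2_norm(g) for _, g in params_grads if g is not None]
    global_norm = paddle.sqrt(paddle.add_n(sum_square))

    # 2) Scale factor clip_norm / max(global_norm, clip_norm), applied to every
    #    gradient in place so no second copy of the gradients is allocated.
    clip_var = paddle.to_tensor(clip_norm, dtype=global_norm.dtype)
    scale = clip_var / paddle.maximum(global_norm, clip_var)
    for _, g in params_grads:
        if g is not None:
            g.multiply_(scale)  # in-place elementwise_mul; exact API is an assumption
    return params_grads
```

The memory saving in the results below comes from the fused norm kernel not materializing a full-size squared copy of each gradient, and from reusing each gradient's buffer when scaling instead of writing the scaled result to a new tensor.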

Test

Single card: gpt3-1.3B-en with recompute and AMP, seq_len=1024, batch_size=2.
16-card hybrid: gpt3-13B-en with mp=4, pp=4, recompute and AMP, seq_len=1024, gbs=256, micro_batch_size=2; memory is recorded on cards 0 and 15.
Ernie3.0: hidden_size=4096, num_attention_heads=128, num_hidden_layers=76, num_sharing_layers=64, mp=8, pp=2, AMP, recompute, gbs=32, micro_bs=2.

| | Memory develop (MB) | Memory PR (MB) | Memory saved (MB) | Speed develop (tokens/s) | Speed PR (tokens/s) | Speed change |
|---|---|---|---|---|---|---|
| GPT (PE, 1 card) | 25786 | 25392 | 394 | 3505 | 3605 | 2.85% |
| GPT (Executor, 1 card) | 26114 | 25720 | 394 | 3517 | 3611 | 2.67% |
| GPT (Hybrid, 16 cards) | (19592, 19530) | (19068, 17784) | (524, 1746) | 5854 | 5859 | 0.08% |
| Ernie3.0 (16 cards) | (18724, 25004) | (17276, 23340) | (1448, 1664) | 2445 | 2362 | -3.5% |

For 16-card runs, memory is reported as (card 0, card 15).

Ernie3.0 is slower because squared_l2_norm is currently slower than reduce_sum(square); the kernel still needs to be optimized.


paddle-bot-old bot commented Aug 3, 2021

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@wangxicoding changed the title from "optimize global gradient clip" to "optimize ClipGradByGlobalNorm" Aug 4, 2021
@JZ-LIANG (Contributor) left a comment


LGTM for sharding

@wangxicoding merged commit 4d6f8f2 into PaddlePaddle:develop Aug 5, 2021
@wangxicoding deleted the optimize_global_gradient_clip branch August 5, 2021 05:12