[AMP] Support pure fp16 training mode for dygraph #35521
Merged
zhiqiu merged 31 commits into PaddlePaddle:develop on Sep 17, 2021
Conversation
Thanks for your contribution!
GuoxiaWang reviewed on Sep 14, 2021
  {"box_coder", {"PriorBox", "PriorBoxVar", "TargetBox"}},
- {"momentum", {"Param", "Grad", "Velocity", "LearningRate"}},
+ {"momentum", {"Param", "Grad", "Velocity", "LearningRate", "MasterParam"}},
  {"sparse_momentum", {"Param", "Grad", "Velocity", "Index", "LearningRate"}},
Contributor:
In your next commit, please also add MasterParam to sparse_momentum, thanks.
Contributor (Author):
So far I have not found any optimizer in the framework that uses sparse_momentum, nor any place where dygraph calls sparse_momentum, so it is not added in this pure fp16 PR for now.
zhiqiu reviewed on Sep 16, 2021
Comment on lines +295 to +296:
tracer._enable_amp_l1 = original_enable_amp_l1
tracer._enable_amp_l2 = original_enable_amp_l2
Contributor:
Change tracer.enable_amp to tracer.amp_level.
lanxianghit approved these changes on Sep 17, 2021
raindrops2sea approved these changes on Sep 17, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request on Sep 29, 2021
* add pure fp16 major function in auto_cast & tracer
* support master weight in dygraph for pure fp16
* check mix dtype of fp16&fp32 for check_finite_and_unscale op
* change pure fp16 function name
* refine some bug in auto_cast
* refine auto_cast interface logic
* add param _casted_by_pure_fp16 for class Layer
* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator
* refine pure_fp16_decorator as decorator
* add unittest
* add comment
* add comment
* support recompute
* add comment for auto_cast and decorator
* support to_static_state_dict for paddle.jit.save
* unlimite models num and optimizers num
* add lookup_table in black_list
* fix momentum and layer state_dict
* fix bug in layer state_dict
* fix bug in layer state_dict_helper
* refine unittest
* refine test_momentun_op
* refine interface and some code
* refine amp_decorator interface
* refine pure fp16 interface
* refine master weight interface
PR types: New features
PR changes: Others
Describe:
1. Background:
The Paddle static graph training mode already provides a pure fp16 training mode, but dynamic graph lacks it. pure fp16 PR: Support pure fp16 training for AMP API. #29544
It was verified on the GPT-2 117M model that there is a large gap in training speed between Paddle AMP and Megatron (dygraph mode); Megatron uses fp16 training, all model parameters are fp16, and all ops execute in fp16.
To sum up, we developed the dynamic graph pure fp16 training mode.
2. Pure fp16 API (dygraph mode):
2.1. Rewrite network parameters from fp32 to fp16 by decorate:
In amp training mode, the black & white lists control the fp16 computation, so many cast OPs are inserted. In pure fp16 training mode, all OPs are executed in fp16, unless an OP does not support fp16.
Therefore, under dygraph training, it is necessary to rewrite all the network parameters to fp16, so that no cast OP needs to be inserted for data conversion during execution.
2.2. Optimizer updates the fp16 parameters of the network via decorate:
Rewriting the network parameters from fp32 to fp16 in 2.1 is not an inplace operation, so the optimizer needs to be handed the fp16 network parameters to update.
2.3. level parameter in decorate and auto_cast:
The level parameter unifies the training interface of AMP and pure fp16. level accepts the values O1 and O2:
O1 represents amp: the input data type of each operator is cast according to white_list and black_list;
O2 represents pure fp16: all OP parameters and input data are cast to fp16, except OPs in black_list, OPs without an fp16 kernel, and batchnorm parameters.
In decorate, the default value of level is O1. In amp training mode, decorate does nothing, so you do not need to call this API; but in pure fp16, you must call decorate explicitly.
In auto_cast, the default value of level is O1, to stay compatible with Paddle's original amp training mode.
2.4. master_weight in decorate:
Until now, Momentum, Adam and AdamW support float16 computation. All three of them have a multi_precision parameter, which can avoid poor accuracy or slow convergence to some degree.
In decorate, the default master_weight is None. If master_weight is None or True, multi_precision will be set to True in pure fp16; users who do not want this strategy should set it to False.
save_dtypeindecorate:In pure fp16, the model parameters will rewrite from fp32 to fp16. For Inference, it usually need to save the model of fp32 data type, so we provide an interface
save_dtype, whensave_dtypeis not None, we will register a data type conversion function hook forLayer.state_dict(). So that, all parameters instate_dictwill cast tosave_dtype. Finally, the data type of the model parameters saved throughpaddle.saveandpaddle.jit.savewill besave_dtype.save_dtypesupport fp16、fp32、fp64 or None. Ifsave_dtypeis None, we will not register data type conversion function hook.2.6.
2.6. custom_white_list and custom_black_list:
In amp, the input data type of each operator is cast according to white_list and black_list.
In pure fp16, only the black_list is effective.
3. Use example:
4. Performance testing:
4.1. Test GPT2 performance by pure fp16:
4.2. Result:
5. Documentation preview:
Chinese documentation preview:
overview: http://10.136.157.23:8090/documentation/docs/zh/api/paddle/amp/Overview_cn.html?reviewVersion=jenkins-doc-review-py3.8-252
auto_cast: http://10.136.157.23:8090/documentation/docs/zh/api/paddle/amp/auto_cast_cn.html?reviewVersion=jenkins-doc-review-py3.8-252
decorate: http://10.136.157.23:8090/documentation/docs/zh/api/paddle/amp/decorate_cn.html?reviewVersion=jenkins-doc-review-py3.8-252
English documentation preview:
auto_cast: http://10.136.157.23:8090/documentation/docs/en/api/paddle/amp/auto_cast_en.html?reviewVersion=jenkins-doc-review-py3.8-252
decorate: http://10.136.157.23:8090/documentation/docs/en/api/paddle/amp/decorate_en.html?reviewVersion=jenkins-doc-review-py3.8-252