
Error/Gradient clipping survey and plan #6510

@reyoung


Gradient Clipping

Exploding gradients can be handled by gradient clipping. Before optimizing a parameter, we can clip its gradient to stabilize the training process.

The simplest clipping is clip_by_value, which limits every value of a tensor to the range [clip_min, clip_max]. Every value larger than clip_max becomes clip_max, and every value smaller than clip_min becomes clip_min.
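A minimal sketch of clip_by_value in plain Python. It assumes, for illustration only, that a tensor is a flat list of floats; a real implementation would operate on Paddle tensors.

```python
from typing import List


def clip_by_value(tensor: List[float], clip_min: float, clip_max: float) -> List[float]:
    """Clamp every element of `tensor` into [clip_min, clip_max]."""
    if clip_min > clip_max:
        raise ValueError("clip_min must not exceed clip_max")
    # Values above clip_max become clip_max; values below clip_min become clip_min.
    return [min(max(v, clip_min), clip_max) for v in tensor]


grad = [-3.0, 0.5, 2.0, 7.5]
print(clip_by_value(grad, -1.0, 1.0))  # [-1.0, 0.5, 1.0, 1.0]
```

Note that [-3.0, 0.5, 2.0, 7.5] and [-1.0, 0.5, 1.0, 1.0] point in different directions, which motivates the norm-based clipping below.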

Clipping each value independently is not ideal because it can change the direction of the gradient. If we do not want to change the direction of a parameter's gradient, we can instead scale that gradient so that its L2 norm does not exceed a limit.
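One common way to write this per-gradient rescaling (the formulation documented for TensorFlow's tf.clip_by_norm) is shown below, with g the gradient of one parameter and c the norm limit:

```latex
\hat{g} = g \cdot \frac{c}{\max\left(\lVert g \rVert_2,\ c\right)},
\qquad
\lVert g \rVert_2 = \sqrt{\textstyle\sum_j g_j^2}
```

When the norm of g is already at most c, the factor is 1 and g is unchanged; otherwise g is divided by its own norm and multiplied by c, so its direction is kept and its norm becomes exactly c.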

If we want the overall direction of all gradients to stay unchanged, we can scale all gradients by the same factor so that their combined (global) L2 norm does not exceed a limit.
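The global variant, in the form documented for TensorFlow's tf.clip_by_global_norm, computes one norm over all gradients g_1 to g_n and applies a single shared factor with limit c:

```latex
\lVert G \rVert_{\text{global}} = \sqrt{\textstyle\sum_{i=1}^{n} \lVert g_i \rVert_2^2},
\qquad
\hat{g}_i = g_i \cdot \frac{c}{\max\left(\lVert G \rVert_{\text{global}},\ c\right)}
```

Because every g_i is multiplied by the same factor, the concatenation of all gradients keeps its direction, although an individual g_i may be scaled more than it would be under per-gradient clipping.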

So, two methods will be implemented:

  • clip_by_value
  • clip_by_l2_norm, which takes a list of gradients. There could be two higher-level APIs, clip_by_local_l2_norm and clip_by_global_l2_norm, which pass the current gradient or all gradients, respectively, to clip_by_l2_norm
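The plan above could look roughly like the following sketch. The three function names come from this issue, but the signatures, the clip_norm parameter, and the flat-list-of-floats gradient type are assumptions for illustration, not an existing Paddle API.

```python
import math
from typing import List

Tensor = List[float]  # assumption: a gradient is a flat list of floats


def clip_by_l2_norm(grads: List[Tensor], clip_norm: float) -> List[Tensor]:
    """Scale all gradients in `grads` by one common factor so that their
    combined L2 norm is at most `clip_norm`."""
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    # Combined L2 norm over every element of every gradient in the list.
    total_norm = math.sqrt(sum(v * v for g in grads for v in g))
    # The factor is 1.0 when the norm is already within the limit.
    scale = clip_norm / max(total_norm, clip_norm)
    return [[v * scale for v in g] for g in grads]


def clip_by_local_l2_norm(grad: Tensor, clip_norm: float) -> Tensor:
    # Only the current gradient is passed, so only its own norm is limited.
    return clip_by_l2_norm([grad], clip_norm)[0]


def clip_by_global_l2_norm(grads: List[Tensor], clip_norm: float) -> List[Tensor]:
    # All gradients are passed together, so they share one scaling factor.
    return clip_by_l2_norm(grads, clip_norm)


g1, g2 = [3.0, 4.0], [0.0, 12.0]  # L2 norms 5 and 12, global norm 13
print(clip_by_global_l2_norm([g1, g2], 6.5))  # [[1.5, 2.0], [0.0, 6.0]]
print(clip_by_local_l2_norm(g1, 6.5))         # [3.0, 4.0] (already within the limit)
```

Routing both higher-level APIs through one clip_by_l2_norm keeps a single implementation; the only difference between local and global clipping is which gradients share the scaling factor.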

Error Clipping

Clipping the gradients only after the backward pass cannot handle gradients that explode during the backward pass itself: gradients could already have exploded while the backward stage is being calculated.

The previous version of Paddle provided a trick called error clipping, which clips the gradients of hidden layers during the backward pass. TensorFlow does not provide this feature by default, but a user could implement it by hacking the backward method.

We should make our backward pass customizable in Python to support error clipping and other manipulations.

Maybe we can add a backward function in Python that takes a Python callback. If the user does not provide a callback, it just generates the backward operators as usual. If the user provides a callback, they can implement error clipping themselves.
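A minimal sketch of the callback idea. The backward function, its callback signature, the Scale layer, and error_clipping are all hypothetical names for illustration, not an existing Paddle API. The toy layers are element-wise scalings, so their backward needs no cached activations.

```python
from typing import Callable, List, Optional

Tensor = List[float]  # assumption: a gradient is a flat list of floats


class Scale:
    """Hypothetical toy layer y = w * x (element-wise); its backward multiplies the gradient by w."""

    def __init__(self, name: str, w: float):
        self.name, self.w = name, w

    def backward(self, grad_out: Tensor) -> Tensor:
        return [self.w * g for g in grad_out]


# Hypothetical callback signature: (layer name, gradient w.r.t. the layer's input) -> new gradient.
GradCallback = Callable[[str, Tensor], Tensor]


def backward(layers: List[Scale], loss_grad: Tensor,
             callback: Optional[GradCallback] = None) -> Tensor:
    """Propagate `loss_grad` back through `layers`, last layer first.
    Without a callback this is the ordinary backward pass; with one, the
    callback may rewrite every intermediate gradient before it propagates."""
    grad = loss_grad
    for layer in reversed(layers):
        grad = layer.backward(grad)
        if callback is not None:
            grad = callback(layer.name, grad)
    return grad


def error_clipping(limit: float) -> GradCallback:
    # Error clipping as a user callback: clamp each intermediate gradient into [-limit, limit].
    def clip(name: str, grad: Tensor) -> Tensor:
        return [min(max(g, -limit), limit) for g in grad]
    return clip


layers = [Scale("fc1", 10.0), Scale("fc2", 10.0), Scale("fc3", 10.0)]
print(backward(layers, [1.0]))                       # [1000.0]
print(backward(layers, [1.0], error_clipping(5.0)))  # [5.0]
```

Without the callback the gradient grows by a factor of 10 per layer; with error clipping it is bounded at every step of the backward pass rather than only after it, which post-hoc gradient clipping cannot do.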
