@@ -406,14 +406,14 @@ class BaseLossScaleOptimizer(metaclass=LossScaleOptimizerMetaclass):
   Args:
     inner_optimizer: The `tf.keras.optimizers.Optimizer` or
       `tf.keras.optimizers.experimental.Optimizer` instance to wrap.
-    dynamic: Bool indicating whether dynamic loss scaling is used. Defaults to
-      True. If True, the loss scale will be dynamically updated over time
-      using an algorithm that keeps the loss scale at approximately its
-      optimal value. If False, a single fixed loss scale is used and
-      `initial_scale` must be specified, which is used as the loss scale.
+    dynamic: Bool indicating whether dynamic loss scaling is used. If True,
+      the loss scale will be dynamically updated over time using an algorithm
+      that keeps the loss scale at approximately its optimal value. If False,
+      a single fixed loss scale is used and `initial_scale` must be
+      specified, which is used as the loss scale.
       Recommended to keep as True, as choosing a fixed loss scale can be
       tricky. Currently, there is a small performance overhead to dynamic loss
-      scaling compared to fixed loss scaling.
+      scaling compared to fixed loss scaling. Defaults to `True`.
     initial_scale: The initial loss scale. If `dynamic` is True, this defaults
       to `2 ** 15`. If `dynamic` is False, this must be specified and acts as
       the sole loss scale, as the loss scale does not change over time. When
@@ -422,11 +422,11 @@ class BaseLossScaleOptimizer(metaclass=LossScaleOptimizerMetaclass):
       quickly than a loss scale that is too low gets raised.
     dynamic_growth_steps: With dynamic loss scaling, every
       `dynamic_growth_steps` steps with finite gradients, the loss scale is
-      doubled. Defaults to 2000. If a nonfinite gradient is encountered, the
+      doubled. If a nonfinite gradient is encountered, the
       count is reset back to zero, gradients are skipped that step, and the
       loss scale is halved. The count can be queried with
       `LossScaleOptimizer.dynamic_counter`. This argument can only be
-      specified if `dynamic` is True.
+      specified if `dynamic` is True. Defaults to `2000`.
 
   `LossScaleOptimizer` will occasionally skip applying gradients to the
   variables, in which case the trainable variables will not change that step.
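
For reference, a minimal sketch of how these arguments are used through the public `tf.keras.mixed_precision.LossScaleOptimizer` entry point; the inner optimizer and the numeric values below are illustrative and not part of this change:

    import tensorflow as tf

    # Dynamic loss scaling (dynamic=True, the default): the scale starts at
    # 2 ** 15, doubles after every `dynamic_growth_steps` consecutive steps
    # with finite gradients, and is halved (with that step's gradients
    # skipped) whenever a nonfinite gradient is encountered.
    dynamic_opt = tf.keras.mixed_precision.LossScaleOptimizer(
        tf.keras.optimizers.SGD(learning_rate=0.01),
        dynamic=True,
        dynamic_growth_steps=2000)

    # Fixed loss scaling: dynamic=False requires `initial_scale`, which is
    # then used as the single, constant loss scale.
    fixed_opt = tf.keras.mixed_precision.LossScaleOptimizer(
        tf.keras.optimizers.SGD(learning_rate=0.01),
        dynamic=False,
        initial_scale=1024.0)

    # The current scale and the finite-gradient step counter can be inspected.
    print(dynamic_opt.loss_scale)       # 32768.0 before any training steps
    print(dynamic_opt.dynamic_counter)  # 0 until finite-gradient steps occur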