@@ -407,14 +407,14 @@ class BaseLossScaleOptimizer(metaclass=LossScaleOptimizerMetaclass):
   Args:
     inner_optimizer: The `tf.keras.optimizers.Optimizer` or
       `tf.keras.optimizers.experimental.Optimizer` instance to wrap.
-    dynamic: Bool indicating whether dynamic loss scaling is used. Defaults to
-      True. If True, the loss scale will be dynamically updated over time
-      using an algorithm that keeps the loss scale at approximately its
-      optimal value. If False, a single fixed loss scale is used and
-      `initial_scale` must be specified, which is used as the loss scale.
+    dynamic: Bool indicating whether dynamic loss scaling is used. If `True`,
+      the loss scale will be dynamically updated over time using an algorithm
+      that keeps the loss scale at approximately its optimal value. If False,
+      a single fixed loss scale is used and `initial_scale` must be
+      specified, which is used as the loss scale.
       Recommended to keep as True, as choosing a fixed loss scale can be
       tricky. Currently, there is a small performance overhead to dynamic loss
-      scaling compared to fixed loss scaling.
+      scaling compared to fixed loss scaling. Defaults to `True`.
     initial_scale: The initial loss scale. If `dynamic` is True, this defaults
       to `2 ** 15`. If `dynamic` is False, this must be specified and acts as
       the sole loss scale, as the loss scale does not change over time. When
@@ -423,11 +423,11 @@ class BaseLossScaleOptimizer(metaclass=LossScaleOptimizerMetaclass):
       quickly than a loss scale that is too low gets raised.
     dynamic_growth_steps: With dynamic loss scaling, every
       `dynamic_growth_steps` steps with finite gradients, the loss scale is
-      doubled. Defaults to 2000. If a nonfinite gradient is encountered, the
+      doubled. If a nonfinite gradient is encountered, the
       count is reset back to zero, gradients are skipped that step, and the
       loss scale is halved. The count can be queried with
       `LossScaleOptimizer.dynamic_counter`. This argument can only be
-      specified if `dynamic` is True.
+      specified if `dynamic` is True. Defaults to `2000`.

   `LossScaleOptimizer` will occasionally skip applying gradients to the
   variables, in which case the trainable variables will not change that step.
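
For context (not part of the diff above), a minimal usage sketch of the arguments documented here might look as follows; the `SGD` inner optimizer and the fixed scale of 1024 are arbitrary illustrative choices, and the public entry point is assumed to be `tf.keras.mixed_precision.LossScaleOptimizer`.

import tensorflow as tf

# Dynamic loss scaling (dynamic=True, the default): the scale starts at
# 2 ** 15 and is doubled after every `dynamic_growth_steps` consecutive
# steps with finite gradients; a nonfinite gradient halves it and resets
# the counter.
opt = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=True, dynamic_growth_steps=2000)

# Fixed loss scaling: dynamic=False requires `initial_scale`, which then
# acts as the sole, unchanging loss scale.
fixed_opt = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=False, initial_scale=1024)

# The current state can be inspected via the wrapper's properties.
print(opt.dynamic_counter)   # finite-gradient steps since the scale last changed
print(fixed_opt.loss_scale)  # stays at 1024.0 for the lifetime of fixed_opt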