
Commit ed8c436

delock authored and sfc-gh-truwase committed
allow separate learning rates "muon_lr" and "adam_lr" for the Muon optimizer (deepspeedai#7658)

This PR allows separate learning rates for the Muon and Adam parts of the Muon optimizer. Follow-up to deepspeedai#7657.

Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Luke Friedrichs <[email protected]>
1 parent ad66ab2 commit ed8c436
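
For context, here is a minimal sketch of a DeepSpeed config exercising the new keys, written as a Python dict of the kind passed to deepspeed.initialize. The optimizer type string "muon" and the train_batch_size value are assumptions for illustration; only the parameter keys (lr, muon_lr, adam_lr, momentum, betas, eps, weight_decay) come from the diff below.

# Hedged sketch: config exercising the separate learning rates.
# The "muon" type string is an assumption; the params keys match the diff.
ds_config = {
    "train_batch_size": 8,          # illustrative value
    "optimizer": {
        "type": "muon",             # assumed type name for the Muon optimizer
        "params": {
            "lr": 1e-3,             # fallback lr for both parameter groups
            "muon_lr": 2e-2,        # overrides lr for the Muon group
            "adam_lr": 3e-4,        # overrides lr for the auxiliary Adam group
            "momentum": 0.95,       # consumed by the Muon group only
            "betas": [0.9, 0.95],   # consumed by the Adam group only
            "eps": 1e-8,            # consumed by the Adam group only
            "weight_decay": 0.1,    # accepted by both groups
        },
    },
}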

File tree

1 file changed: +10 −4 lines


deepspeed/runtime/engine.py

Lines changed: 10 additions & 4 deletions
@@ -1599,15 +1599,21 @@ def _configure_basic_optimizer(self, model_parameters):
             param_groups = []
             if muon_params:
                 accepted_parameters = dict()
-                for key in ["lr", "momentum", "weight_decay"]:
+                for key in ["lr", "momentum", "weight_decay", "muon_lr"]:
                     if key in optimizer_parameters:
-                        accepted_parameters[key] = optimizer_parameters[key]
+                        if key == "muon_lr":  # muon_lr will override lr
+                            accepted_parameters['lr'] = optimizer_parameters[key]
+                        else:
+                            accepted_parameters[key] = optimizer_parameters[key]
                 param_groups.append(dict(params=muon_params, use_muon=True, **accepted_parameters))
             if non_muon_params:
                 accepted_parameters = dict()
-                for key in ["lr", "betas", "eps", "weight_decay"]:
+                for key in ["lr", "betas", "eps", "weight_decay", "adam_lr"]:
                     if key in optimizer_parameters:
-                        accepted_parameters[key] = optimizer_parameters[key]
+                        if key == "adam_lr":  # adam_lr will override lr
+                            accepted_parameters['lr'] = optimizer_parameters[key]
+                        else:
+                            accepted_parameters[key] = optimizer_parameters[key]
                 param_groups.append(dict(params=non_muon_params, use_muon=False, **accepted_parameters))
             optimizer = MuonWithAuxAdam(param_groups)
         else:
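
To make the override behavior concrete, below is a minimal standalone reproduction of the two selection loops from the diff, with placeholder lists standing in for the real Muon and non-Muon parameter tensors. It shows that muon_lr and adam_lr, when present, replace lr in their respective groups.

# Minimal sketch reproducing the key-selection logic from the diff.
optimizer_parameters = {
    "lr": 1e-3, "muon_lr": 2e-2, "adam_lr": 3e-4,
    "momentum": 0.95, "betas": [0.9, 0.95], "eps": 1e-8, "weight_decay": 0.1,
}
muon_params = ["<2D weight placeholder>"]          # stand-in for real tensors
non_muon_params = ["<embedding/bias placeholder>"]  # stand-in for real tensors

param_groups = []

accepted = dict()
for key in ["lr", "momentum", "weight_decay", "muon_lr"]:
    if key in optimizer_parameters:
        if key == "muon_lr":  # muon_lr overrides lr for this group
            accepted["lr"] = optimizer_parameters[key]
        else:
            accepted[key] = optimizer_parameters[key]
param_groups.append(dict(params=muon_params, use_muon=True, **accepted))

accepted = dict()
for key in ["lr", "betas", "eps", "weight_decay", "adam_lr"]:
    if key in optimizer_parameters:
        if key == "adam_lr":  # adam_lr overrides lr for this group
            accepted["lr"] = optimizer_parameters[key]
        else:
            accepted[key] = optimizer_parameters[key]
param_groups.append(dict(params=non_muon_params, use_muon=False, **accepted))

print(param_groups[0]["lr"])  # 0.02   -> muon_lr took effect
print(param_groups[1]["lr"])  # 0.0003 -> adam_lr took effect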
