Commit ed8c436
allow separate learning rates "muon_lr" and "adam_lr" for the Muon optimizer (deepspeedai#7658)
This PR allows separate learning rates for the Muon and Adam parts of the Muon
optimizer. Follow-up to
deepspeedai#7657
Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Luke Friedrichs <[email protected]>
1 parent ad66ab2 commit ed8c436
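The commit message describes one optimizer with two learning rates, one for the parameters handled by Muon and one for those handled by Adam. The following is a minimal sketch of that idea, not the code changed by this commit. Only the key names `muon_lr` and `adam_lr` come from the commit title. The helper `build_muon_param_groups`, the `use_muon` flag, and the fallback to a shared `lr` key are assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence


def build_muon_param_groups(
    muon_params: Sequence[Any],
    adam_params: Sequence[Any],
    optimizer_params: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build one parameter group per sub-optimizer.

    Assumption: "muon_lr" and "adam_lr" each override a shared "lr" key
    when present; otherwise both groups fall back to "lr".
    """
    base_lr = optimizer_params.get("lr")
    muon_lr = optimizer_params.get("muon_lr", base_lr)
    adam_lr = optimizer_params.get("adam_lr", base_lr)
    if muon_lr is None or adam_lr is None:
        raise ValueError("set 'lr', or both 'muon_lr' and 'adam_lr'")

    groups: List[Dict[str, Any]] = []
    if muon_params:
        # "use_muon" is an illustrative flag name, not confirmed by the commit.
        groups.append({"params": list(muon_params), "use_muon": True, "lr": muon_lr})
    if adam_params:
        groups.append({"params": list(adam_params), "use_muon": False, "lr": adam_lr})
    return groups


if __name__ == "__main__":
    # Placeholder strings stand in for real tensors.
    config = {"lr": 1e-3, "muon_lr": 2e-2}
    for group in build_muon_param_groups(["linear.weight"], ["linear.bias"], config):
        print(group)
```

With a layout like this, existing configurations that set only `lr` would keep a single rate for both groups, while setting either new key changes just one group.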
1 file changed: +10 -4 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1599 | 1599 | |
| 1600 | 1600 | |
| 1601 | 1601 | |
| 1602 | | - |
| | 1602 | + |
| 1603 | 1603 | |
| 1604 | | - |
| | 1604 | + |
| | 1605 | + |
| | 1606 | + |
| | 1607 | + |
| 1605 | 1608 | |
| 1606 | 1609 | |
| 1607 | 1610 | |
| 1608 | | - |
| | 1611 | + |
| 1609 | 1612 | |
| 1610 | | - |
| | 1613 | + |
| | 1614 | + |
| | 1615 | + |
| | 1616 | + |
| 1611 | 1617 | |
| 1612 | 1618 | |
| 1613 | 1619 | |