
AdaMuon

This is the official repository for the paper AdaMuon: Adaptive Muon Optimizer.

Introduction

AdaMuon is an adaptive optimizer built on Muon. It achieves more than a 40% improvement in training efficiency over AdamW.
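At the core of Muon-style optimizers, including AdaMuon, is an approximate orthogonalization of each 2-D momentum/update matrix via a Newton-Schulz iteration. The sketch below illustrates that step only; the quintic coefficients are the ones popularized by the original Muon implementation and are an assumption here, not taken from the AdaMuon paper (AdaMuon's additional second-moment adaptation is not shown).

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D update matrix G.

    Illustrative sketch of the Newton-Schulz step used by
    Muon-style optimizers; coefficients are assumed from the
    common Muon implementation, not from the AdaMuon paper.
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # assumed quintic coefficients
    X = G / (G.norm() + eps)  # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the "wide" orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # pushes singular values toward 1
    if transposed:
        X = X.T
    return X
```

After a few iterations the singular values of the result cluster near 1, so the update direction is preserved while its spectrum is (approximately) whitened.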

Quick Start

This repository contains two projects: the GPT-2 experiments, and a copy of the open-source Megatron-LM code, which we include to facilitate large-scale experiments.

To use AdaMuon in your own training pipeline on other architectures and datasets, follow this minimal example:

from opt_config import configure_optimizers

# Model
model = Model()

# Optimizer
optimizer = configure_optimizers(model.parameters(), weight_decay=0.1, learning_rate=6e-4)

# Training
for epoch in range(epochs):
    for X, Y in data_loader:
        # standard training step
        optimizer.zero_grad()
        logits, loss = model(X, Y)
        loss.backward()
        optimizer.step()

Performance




License

This repository is licensed under the Apache 2.0 license. See the LICENSE file for more details.

Citation

@article{si2025adamuon,
  title={AdaMuon: Adaptive Muon Optimizer},
  author={Si, Chongjie and Zhang, Debing and Shen, Wei},
  journal={arXiv preprint arXiv:2507.11005},
  year={2025}
}

Contact

If you have any questions, please raise an issue or contact us at [email protected].
