sbintuitions/sparse-upcycling-scaling-laws

Scaling Laws for Upcycling Mixture-of-Experts Language Models

This is the official repository for our ICML'25 paper, Scaling Laws for Upcycling Mixture-of-Experts Language Models. It contains the code and data needed to reproduce the analyses in the paper.

Structure

  • data: contains the data obtained from our scaling law experiments.
    • data/result_8x.txt: results for training Mixtral-like MoE from scratch.
    • data/result.txt: results for training dense LLM from scratch.
    • data/result_upcycle_8x_topk_2.txt: results for upcycling dense models into Mixtral-like MoE.
    • data/sparsity.csv: experimental data for fitting the sparsity-active parameter scaling law.
    • data/ablate*: results for various ablation studies.
  • analysis.ipynb: contains an example of fitting the joint scaling law for Mixtral-like MoE.
  • analyze_sparsity.ipynb: contains an example of fitting the sparsity-active parameter scaling law.
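
For readers new to scaling-law fitting, the sketch below shows the simplest version of the idea behind the notebooks: recovering a power law of the form loss = a * tokens^(-alpha) by linear regression in log-log space. It is an illustration only. The functional form, the helper name `fit_power_law`, and the synthetic data points are all assumptions made for this example; the joint and sparsity-active parameter laws fitted in the notebooks are richer, and the files under data/ are not read here because their format is not described in this README.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha) on (log x, log y).

    Hypothetical helper for illustration; not part of this repository.
    Returns the tuple (a, alpha).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Ordinary least squares for the line log y = intercept + slope * log x.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In log space the power law reads log y = log a - alpha * log x.
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from a known power law.
# These are NOT values taken from the data/ directory.
tokens = [1e9, 2e9, 4e9, 8e9, 16e9]
losses = [50.0 * d ** -0.12 for d in tokens]

a, alpha = fit_power_law(tokens, losses)
print(f"a={a:.3f}, alpha={alpha:.3f}")
```

A pure log-log regression only works when the law has no irreducible-loss offset. Laws with an additive constant or with interaction terms generally need a nonlinear optimizer instead.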

License

This implementation is licensed under the Apache License 2.0.

Citation

If you find this work helpful, please consider citing our paper:

@inproceedings{liew2025scaling,
  title = {Scaling Laws for Upcycling Mixture-of-Experts Language Models},
  booktitle = {Forty-Second International Conference on Machine Learning},
  author = {Liew, Seng Pei and Kato, Takuya and Takase, Sho},
  year = {2025}
}
