Code and data setup for our paper: Pretrained Models for Multilingual Federated Learning by *Orion Weller, *Marc Marone, Vladimir Braverman, Dawn Lawrie, and Benjamin Van Durme. Many thanks to the great developers at the flwr team who have prepared excellent examples.
NOTE: we use poetry, following the advice of the flwr framework.
- Install poetry: `bash enviroment_setup/install_poetry.sh`
- Activate poetry: `bash enviroment_setup/activate_poetry.sh`
- Install dependencies: `poetry install`. NOTE: this takes a few minutes.
- After deciding which data setup you would like, look for the corresponding dataset in `create_data`. For the sake of this readme, we will use the `mtnt` data.
- `cd` into the folder: `cd create_data/make_mtnt_data`
- Follow the instructions in the `readme` located in that folder. It will typically have scripts for downloading, preprocessing, splitting, and then moving the data into the final location for the model.
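As a rough illustration of the "splitting" step those per-dataset readmes perform, here is a minimal Python sketch of shuffling a parallel corpus and carving out train/dev/test portions. This is not the repo's actual preprocessing code; the function name and split fractions are made up for the example.

```python
import random


def split_corpus(pairs, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (source, target) pairs and split into train/dev/test.

    NOTE: illustrative only -- the real scripts in create_data/ may
    split differently (e.g., using official dataset splits).
    """
    rng = random.Random(seed)  # fixed seed so splits are reproducible
    pairs = list(pairs)
    rng.shuffle(pairs)
    n_dev = int(len(pairs) * dev_frac)
    n_test = int(len(pairs) * test_frac)
    dev = pairs[:n_dev]
    test = pairs[n_dev:n_dev + n_test]
    train = pairs[n_dev + n_test:]
    return train, dev, test


# toy example: 10 (source, target) sentence pairs
corpus = [(f"src {i}", f"tgt {i}") for i in range(10)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))  # 8 1 1
```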
- Make sure the environment and the data have been set up as above.
- Depending on the type of model you want to train (classification, LM, or MT), see the corresponding script in `bin/run_fl_{mt,tc,lm}.sh`. Each script contains information about how to run centralized, non-IID FL, or IID FL training, as well as random initialization and/or evaluation.
- To evaluate BLEU scores, be sure to install sacrebleu and evaluate using the format described in `bin/run_sacrebleu_eval.sh`.
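To make the non-IID vs. IID distinction concrete, here is a minimal Python sketch of the two ways a multilingual dataset can be partitioned across federated clients. This is an illustrative toy, not the repo's actual partitioning code; the function names and the language-per-client scheme are assumptions for the example.

```python
import random


def partition_iid(examples, num_clients, seed=0):
    """IID: shuffle all examples, then deal them round-robin,
    so every client sees roughly the same data distribution."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]


def partition_by_language(examples):
    """Non-IID: group examples by language, one client per language,
    so each client's local data covers only a single language."""
    clients = {}
    for lang, text in examples:
        clients.setdefault(lang, []).append((lang, text))
    return list(clients.values())


data = [("fr", "bonjour"), ("de", "hallo"), ("fr", "merci"), ("de", "danke")]
iid_shards = partition_iid(data, num_clients=2)       # mixed languages per client
non_iid_shards = partition_by_language(data)          # one language per client
```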
If you found this code or paper helpful, please consider citing:
```bibtex
@inproceedings{Weller2022PretrainedMF,
  title={Pretrained Models for Multilingual Federated Learning},
  author={Orion Weller and Marc Marone and Vladimir Braverman and Dawn J Lawrie and Benjamin Van Durme},
  booktitle={Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
  year={2022}
}
```