English | 简体中文
This document introduce how to pre-train the ERNIE-4.5-300B-A47B model, the pre-training requires at least 96 NVIDIA H800 80G GPUs.
This repository provide a demo dataset on the path ./demo_data for quick start. If other dataset or user defined dataset are needed,
please reference this document Pretrain dataset.
The CUDA driver on your machine should be ≥525.60.13, and the CUDA toolkit 12.9 image is needed. You can use ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9 for training. And mpi environment should be deployed on the cluster.
mpirun python -m pip install -r requirements.txt --force-reinstall
After the environment is ready, pre-training on 2016 GPUs can be launched by:
mpirun bash scripts/train_2016_gpus.sh,
pre-training on 96 GPUs can be launched by:
mpirun bash scripts/train_96_gpus.sh
- Note that, the
master_ipandportintrain_2016_gpus.shortrain_96_gpus.shshould be replaced according to the real environment.
The toolkit provides a high-performance implementation of ERNIE-4.5-300B-A47B pre-training, including the hybrid parallelism training strategy and FP8 mixed precision optimization. More advanced optimizations are on the way.