PPO Training num_gpus mismatch

Hi authors,

In the arXiv paper, I see that PPO was trained using 16× GH200s. However, your PPO script sets num_gpus=8.

Is there a typo here, or am I missing something? What would be the best way to replicate the results in your paper?