Can Qwen3-235B-MoE be trained successfully on 4 H100 (80GB) machines? If so, could you share a working example? I ask because the official example was trained on 4 H20 (96GB) machines, which have more memory per card.