Encountered issues during cluster computations. #32335
Do you have any issues if you use 32 processes on a single node? When distributing across four nodes, what is the "critical" number of processes per node at which you start getting an error?
I've adjusted both my runtime and compilation environments to
Hello, can you please run the diagnostic script in `moose/scripts` instead?
Hello,

This is the log after running `moose/scripts/diagnostic.sh`.

Meanwhile, I executed

I'm afraid I must trouble you further, @smpark7. Might I have a look at your cross-node parallel job scheduling script? I've set up my compiler environment manually in an offline environment rather than using

Thanks
So the problem only occurs on a compute node? Can you try running the diagnostic script from within a job submission script and see whether the diagnostics are different?
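One way to follow this suggestion is sketched below. It assumes a Slurm scheduler (the thread does not say which scheduler is in use) and a MOOSE checkout in the job's working directory; the job name and log file names are hypothetical. The script runs `moose/scripts/diagnostic.sh` on the compute node and writes a per-host log that can then be compared against a run on the login node.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=moose-diag   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Assumption: the job starts in a directory that contains the MOOSE checkout.
DIAG_SCRIPT="moose/scripts/diagnostic.sh"
# Bash sets HOSTNAME itself; the fallback only guards against an unusual shell setup.
LOG="diag_${HOSTNAME:-unknown}.log"

if [ -f "$DIAG_SCRIPT" ]; then
  # Capture both stdout and stderr so the whole report can be diffed later.
  bash "$DIAG_SCRIPT" > "$LOG" 2>&1
  echo "Wrote $LOG"
else
  echo "Cannot find $DIAG_SCRIPT from $(pwd)" >&2
fi
```

After submitting it with `sbatch`, running `bash moose/scripts/diagnostic.sh > diag_login.log 2>&1` on the login node and comparing the two files with `diff` should show whether the compute-node environment differs (for example, a different MPI or compiler on `PATH`).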
Hello,

It appears to be functioning without issue. I am wondering whether the error might stem from the fact that I set up the environment myself rather than using a tool like

Thanks



Hello,

To my surprise, the issue appears to stem from the lower communication layers that MPICH relies on. By setting `export FI_PROVIDER=verbs`, which tells libfabric to use its verbs (RDMA) provider, I seem to have resolved the matter, and the computation now runs correctly across nodes. Many thanks for your assistance and advice.
Jiahui Lv
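For reference, a minimal sketch of where this fix could go in a cross-node job script. It assumes Slurm and the layout discussed earlier in the thread (four nodes, 32 ranks per node); the application binary `your_app-opt` and input file `input.i` are hypothetical placeholders. `FI_PROVIDER` is read by libfabric, the OFI layer MPICH was using here, so it must be set before `mpiexec` starts the ranks.

```shell
#!/usr/bin/env bash
#SBATCH --nodes=4              # assumption: four nodes, as in the thread
#SBATCH --ntasks-per-node=32   # assumption: 32 ranks per node

# Select libfabric's verbs provider instead of letting it auto-select one.
export FI_PROVIDER=verbs

# Hypothetical application and input file; replace with your own.
APP=./your_app-opt
INPUT=input.i

if [ -n "${SLURM_NTASKS:-}" ]; then
  # Inside a Slurm allocation: launch one rank per allocated task.
  mpiexec -n "$SLURM_NTASKS" "$APP" -i "$INPUT"
else
  echo "Not inside a Slurm allocation; FI_PROVIDER=$FI_PROVIDER" >&2
fi
```

MPICH's `mpiexec` (Hydra) normally forwards the caller's environment to every rank, so exporting the variable once in the batch script should be enough; `mpiexec -genv FI_PROVIDER verbs ...` is an explicit alternative if the environment is not being propagated.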