Upgrading to 0.4.9 stuck multi-gpu training #2635

@Nic-Ma

🐛 Bug description

Hi @vfdev-5 ,

We recently upgraded ignite from 0.4.8 to 0.4.9 in MONAI 0.9.1 (Project-MONAI/MONAI#4605).
We then received the following report from a user:
"Something changed related to multi-GPU training between 0.9.0 and 0.9.1: monailabel multi-GPU training is not working. SupervisedTrainer gets stuck when running the inference step to compute the loss. After debugging a bit, I see that the problem is pytorch-ignite==0.4.8 vs pytorch-ignite==0.4.9. When I downgrade, everything works."

CC @wyli @SachidanandAlle

Environment

  • PyTorch version: 1.12.0
  • Ignite version: 0.4.9
  • OS: Ubuntu
  • How Ignite was installed (conda, pip, source): pip
  • Python version: 3.8
  • Any other relevant information: downgrading to 0.4.8 resolves the problem
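
For anyone trying to confirm whether they are on the same setup, here is a minimal sketch that checks the installed pytorch-ignite version against the two versions named in this report. The helper names `installed_version` and `is_affected` are hypothetical and not part of ignite or MONAI. The check is an exact match on 0.4.9 only, since this issue says nothing about other releases.

```python
from importlib import metadata
from typing import Optional

# Versions taken from this report; nothing is assumed about other releases.
AFFECTED_VERSION = "0.4.9"    # reported to hang in multi-GPU training
KNOWN_GOOD_VERSION = "0.4.8"  # reported to work after downgrading


def installed_version(dist_name: str = "pytorch-ignite") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True only when the version is exactly the one reported in this issue."""
    return version == AFFECTED_VERSION


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pytorch-ignite is not installed in this environment")
    elif is_affected(found):
        print(f"pytorch-ignite {found} matches the version reported in this issue; "
              f"{KNOWN_GOOD_VERSION} was reported to work")
    else:
        print(f"pytorch-ignite {found} is not the version reported in this issue")
```

If the script reports 0.4.9, the workaround described above is to pin the previous release, for example with `pip install pytorch-ignite==0.4.8`.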
