DDP nl fix #5332

Merged
glenn-jocher merged 1 commit into master from fix/ddp_nl on Oct 25, 2021
Conversation

@glenn-jocher (Member) commented Oct 25, 2021

Fix for #5160 (comment)

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved multi-GPU training support by ensuring model parameter scaling accounts for the wrapped model.

📊 Key Changes

  • Modified the retrieval of nl (the number of detection layers) to use the de_parallel function, so the count is read from the underlying model when it is wrapped in Distributed Data Parallel (DDP) mode.

🎯 Purpose & Impact

  • Purpose: The change ensures that when a model is trained across multiple GPUs, the detection-layer count is retrieved correctly even though the model is wrapped for parallel processing.
  • Impact: This leads to more accurate scaling of hyperparameters (hyp) during multi-GPU training, improving model performance and training stability. Users employing DDP benefit from correct hyperparameter adjustments irrespective of the number of GPUs used. 🚀
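The fix boils down to unwrapping the parallel container before reading model attributes such as nl, since DDP exposes the real model under a .module attribute. A minimal sketch of the de_parallel pattern (names mirror YOLOv5's utils.torch_utils helpers; the actual repository code may differ):

```python
import torch.nn as nn

def is_parallel(model):
    # True if the model is wrapped in DataParallel or DistributedDataParallel
    return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)

def de_parallel(model):
    # Return the underlying model, unwrapping a DP/DDP container if present
    return model.module if is_parallel(model) else model

# Reading attributes then works whether or not the model is wrapped
model = nn.Sequential(nn.Linear(4, 2))     # stand-in for a YOLOv5 model
wrapped = nn.parallel.DataParallel(model)  # DDP wraps the same way (.module)
assert de_parallel(wrapped) is model
assert de_parallel(model) is model
```

With this helper, hyperparameter scaling like hyp['box'] *= 3 / nl can use nl = de_parallel(model).model[-1].nl and get the same value in single- and multi-GPU runs.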


Development

Successfully merging this pull request may close these issues.

Docker Multi-GPU DDP training hang on destroy_process_group() with wandb option 3
