Nvidia Apex for FP16 calculations #36
Adds compatibility with NVIDIA's Apex library, which enables FP16 (half-precision) calculations. This gives a significant speedup in training. The code has been tested on a single RTX 2070. If the Apex library is not found, the code runs as normal.

To install Apex: https://github.com/NVIDIA/apex#quick-start

Known bugs:
- Does not work with the adam parameter
- Gradient overflow keeps happening at the start; however, Apex automatically reduces the loss scale to 8192, after which the notification disappears

Examples:
Loading: https://i.imgur.com/3nZROJz.png
Training: https://i.imgur.com/Q2w52m7.png
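The gradient-overflow messages mentioned above come from dynamic loss scaling: the scale is halved after each overflow until gradients fit in FP16 range. A minimal pure-Python sketch of the mechanism (the class name, defaults, and the 2^16 starting scale are illustrative assumptions, not Apex's actual implementation):

```python
class DynamicLossScaler:
    """Illustrative sketch of dynamic loss scaling (not Apex's real class)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, growth_interval=2000):
        self.scale = init_scale          # current loss-scale multiplier
        self.factor = factor             # halve/double by this factor
        self.growth_interval = growth_interval
        self._good_steps = 0             # consecutive overflow-free steps

    def update(self, grads_overflowed):
        """Adjust the scale after a step; return False if the step was skipped."""
        if grads_overflowed:
            # Inf/NaN in gradients: halve the scale and skip this update.
            self.scale = max(1.0, self.scale / self.factor)
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # Stable for a while: try a larger scale again.
            self.scale *= self.factor
            self._good_steps = 0
        return True


scaler = DynamicLossScaler()
for _ in range(3):          # three early overflows, as seen at training start
    scaler.update(True)
print(scaler.scale)         # 8192.0
```

Starting from 2^16 = 65536, three overflows halve the scale to 8192, consistent with the behavior described above: the warnings stop once the scale has dropped far enough.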
@YacobBY Thank you for the pull request!
@YacobBY
best.
@ku21fan Hello JeongHun, I understand. The Apex compatibility code has indeed added a lot of lines, and FP16 is very new, so not many people have the hardware and library to run it yet. Currently I'm trying out some new deep-learning tools such as PyTorch Lightning and NVIDIA Apex. I might be able to use the Apex FusedAdam optimizer instead of the default adam option when Apex is available. This should fix the adam bug, though it still adds a lot of extra code lines. If I can get the other Apex functionality working, I'll try to get back to you with a neater and more modular version. In any case, thanks for your open-source code! It's really helpful, and I've learned a lot from it.
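The "use FusedAdam when Apex is available, otherwise fall back" idea can be sketched as an optional-dependency pattern. This is a sketch, not the PR's actual code; the commented usage lines assume the standard Apex `amp` API and require torch plus a CUDA build of Apex:

```python
# Optional-dependency pattern: prefer Apex when installed, fall back otherwise.
try:
    from apex import amp                   # mixed-precision utilities
    from apex.optimizers import FusedAdam  # fused CUDA Adam kernel
    APEX_AVAILABLE = True
except ImportError:
    amp = None
    FusedAdam = None
    APEX_AVAILABLE = False

# Typical wiring (commented out here because it needs torch and a GPU):
# optimizer = (FusedAdam if APEX_AVAILABLE else torch.optim.Adam)(
#     model.parameters(), lr=1e-3)
# if APEX_AVAILABLE:
#     model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
# ...
# with amp.scale_loss(loss, optimizer) as scaled_loss:  # FP16-safe backward
#     scaled_loss.backward()

print(APEX_AVAILABLE)
```

With this pattern the rest of the training loop stays unchanged when Apex is missing, which matches the "runs as normal without Apex" behavior the pull request describes.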