Conversation
Hello @pruksmhc! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
You can repair most issues by installing black and running:
Comment last updated at 2020-02-19 18:50:10 UTC
@pyeres Mind taking the lead on this?
…taparallel_metric_calculation Conflicts: jiant/trainer.py jiant/utils/utils.py
…com/nyu-mll/jiant into fix_dataparallel_metric_calculation
Hey @pruksmhc, fyi, running ReCoRD like this:
It looks like there's an exception related to DataParallel:
Hm, let me debug this. Thanks for catching it!
Alright, I've now tested it on ReCoRD and STS-B as well as RTE. @pyeres, ready for another look.
If I understand these changes correctly, instead of adding special keys to the
Separate from the implementation details above: because multi-GPU support touches many code paths (and this PR addresses a bug), can you please include with your final PR a performance comparison of multi-GPU vs. single-GPU on some task(s)? For PR #990, @HaokunLiu ran a performance comparison across a set of common tasks; running these should make for a good comparison (what Haokun already reported in #990, vs. your changes with a single GPU, vs. your changes with multi-GPU). If you have trouble getting cluster time, just pass me a script and I can run some of these experiments.
Makes sense!
Hm, for some reason I'm getting an accuracy of 0.56 with 1 GPU on jiant master, but 0.527 with 1 GPU on this branch. Debugging.
…com/nyu-mll/jiant into fix_dataparallel_metric_calculation
Ready! @pyeres |
pyeres left a comment
Please see the open comment in the conversation thread for a small change request, and please let me know when you're ready for a final look and approval.
Heads up: for CCG it looks like there's an issue with the calculation of training and validation accuracy (accuracy is stuck at zero while training and validation loss is decreasing). If it would be helpful, I'm happy to debug this further; please let me know.
These changes have been successfully benchmarked on tasks from GLUE, SuperGLUE, and the benchmarks provided with PR #990, in both single-GPU and multi-GPU mode:
pyeres left a comment
The benchmark tests for this branch look good. Thanks for making this work!
Thanks for running the extensive benchmark tests!
* moving update_metrics to outside scope of dataparallel
* fixing micro_avg calculation
* undo debugging
* Fixing tests, moving update_metrics out of other tasks
* remove extraneous change
* fix multiple choice dataparallel forward
* adding update_metrics abstraction
* delete update_metrics_ notation
* spelling check
* black formatting
* fixing tests
* Adding batch_size constraints to multi-GPU setting
* adding documentation
* adding batch size test
* black correct version
* Fixing batch size assertion
* generalize batch size assertion for more than 2 GPU setting
* reducing label loops in code
* fixing span forward
* Fixing span prediction forward for multi-GPU
* fix commonsenseQA forward
* adding function documentation
* resolving nits, fixing seq_gen forward
* remove nit
* fixing batch_size assert and SpanPrediction task
* Remove debugging
* Fix batch size mismatch multi-GPU test
* Fix order of assert checking for batch size mismatch
* reverse batch size checking
* fix SpanPrediction update_metrics for single-GPU
* fix update_diagnostic_metric and ccg acc

Co-authored-by: Check your git settings! <chris@chris-laptop>
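Several of the commits above add a batch-size constraint for the multi-GPU setting. As a minimal sketch of what such a check might look like (the helper name `check_batch_size` is hypothetical, not jiant's actual code): DataParallel splits each batch across GPUs, so every replica needs at least one example per step.

```python
def check_batch_size(batch_size: int, n_gpu: int) -> None:
    """Hypothetical sketch of a multi-GPU batch-size constraint.

    nn.DataParallel scatters each batch across the available GPUs,
    so the trainer can assert that the batch is large enough to give
    every replica a non-empty chunk.
    """
    if n_gpu > 1 and batch_size < n_gpu:
        raise ValueError(
            f"batch_size ({batch_size}) must be >= number of GPUs "
            f"({n_gpu}) so each DataParallel replica gets at least one example"
        )

check_batch_size(16, 2)  # fine: 8 examples per replica
check_batch_size(8, 1)   # fine: single-GPU, no constraint
```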
Addressing #947
The problem is a race condition under data parallelism: the scorer variables are updated concurrently by the model replicas, so metrics can exceed 1.0. To fix this, this PR moves the task.update_metrics() calls into trainer.py, instead of calling them from the various forward functions in jiant/models.py.
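The shape of the fix can be sketched in plain Python (a hypothetical, jiant-free illustration; `Scorer`, `forward`, and `train_loop` are made-up stand-ins for jiant's actual classes): forward() only computes and returns predictions, and the trainer, which owns the single scorer, updates metrics once per batch after outputs have been gathered from all replicas.

```python
class Scorer:
    """Toy accuracy scorer. `correct`/`total` are the kind of shared
    state that replica forward() calls used to mutate concurrently."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        self.correct += sum(p == l for p, l in zip(preds, labels))
        self.total += len(labels)

    def accuracy(self):
        return self.correct / self.total if self.total else 0.0

def forward(batch):
    # After the fix, forward() has no metric side effects on the
    # (replicated) module; it just returns predictions.
    return [x % 2 for x in batch["inputs"]]

def train_loop(batches, scorer):
    # The trainer owns the one Scorer and updates it exactly once per
    # batch, after DataParallel would have gathered replica outputs.
    for batch in batches:
        preds = forward(batch)
        scorer.update(preds, batch["labels"])

scorer = Scorer()
batches = [
    {"inputs": [1, 2, 3], "labels": [1, 0, 1]},
    {"inputs": [4, 5], "labels": [0, 0]},
]
train_loop(batches, scorer)
print(scorer.accuracy())  # 0.8 -- 4 of 5 predictions correct
```

Because only the trainer thread touches the scorer, no concurrent replica updates can double-count examples and push accuracy above 1.0.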