Fix training validation convergence - v1.4 bug fix #16698
tensor-tang merged 1 commit into PaddlePaddle:develop
When will you send the latest convergence curve? The bug fix needs to be cherry-picked to branch release/1.4.
  bool is_persistable =
      (p_persistables->find(vi) != p_persistables->end()) ? true : false;
- if (is_test && is_persistable) {
+ if (!is_training && is_test && is_persistable) {
Aren't is_training and is_test redundant?
This can be simplified, but it will need changes elsewhere as well. I am trying to minimize changes at this late release stage, since a larger change may need more time for review and validation. I will follow up to remove the duplicates after the release.
- while (left < size && ops->at(left)->Type() == framework::kFeedOpType) {
+ while (left < size && (ops->at(left)->Type() == framework::kFeedOpType ||
+                        ops->at(left)->Type() == "read")) {
Is adding ops->at(left)->Type() == "read" enough?
We can expand this when we see other cases, but we will need to know and understand each case first. So far it handles the use cases we know of.
fix training validation test=develop (PaddlePaddle#16698)
This is a bug fix for the v1.4 release (the "overfitting" issue).
In inference, the intermediate MKL-DNN layout is saved for a performance boost. In training, this feature needs to be disabled because the weights are updated every iteration.
This fix disables saving the intermediate layout during training, so that validation is done correctly. This resolves the "overfitting" issue. The updated convergence curve will be sent for review, as I have collected more data points.
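As a rough illustration of the guard this PR adds, the sketch below isolates the condition into a standalone function. The helper name `CanCacheLayout`, the use of variable names as set keys, and the example variable names are assumptions made for illustration; they are not PaddlePaddle API, and the real check lives inline in the framework code.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of PaddlePaddle): decides whether the
// intermediate MKL-DNN layout of a variable may be saved and reused.
bool CanCacheLayout(const std::unordered_set<std::string>& persistables,
                    const std::string& var_name, bool is_training,
                    bool is_test) {
  // A variable is persistable if it appears in the persistable set.
  const bool is_persistable = persistables.count(var_name) > 0;
  // Save the layout only in a pure inference run. When validation runs
  // inside a training program, the weights change every iteration, so a
  // saved layout would hold stale values.
  return !is_training && is_test && is_persistable;
}
```

With this condition, a validation pass that runs as part of training (`is_training` and `is_test` both true) no longer reuses a saved layout, while a pure inference run still does.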
Another change in this PR: the "read" op is now included when checking inputs, because py_reader uses the read op instead of the feed op.
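The input-op check can be sketched in isolation as follows. This is a simplified stand-in, not the framework code: ops are represented by their type strings only, `SkipLeadingInputOps` is a hypothetical name, the local `kFeedOpType` constant is assumed to mirror `framework::kFeedOpType`, and the loop is assumed to advance `left` by one per input op.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed stand-in for framework::kFeedOpType.
const char kFeedOpType[] = "feed";

// Hypothetical helper: returns the index of the first op that is neither
// a feed op nor a read op, i.e. the first op after the leading input ops.
std::size_t SkipLeadingInputOps(const std::vector<std::string>& op_types) {
  std::size_t left = 0;
  const std::size_t size = op_types.size();
  // py_reader supplies data through "read" ops rather than feed ops, so
  // both op types are treated as input ops here.
  while (left < size &&
         (op_types[left] == kFeedOpType || op_types[left] == "read")) {
    ++left;
  }
  return left;
}
```

Only leading input ops are skipped; a "read" op that appears after the first non-input op is left alone, which matches the shape of the loop in the diff.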
CC. @mozga-intel @jianhang-liu