
Fix training validation convergence - v1.4 bug fix#16698

Merged
tensor-tang merged 1 commit into PaddlePaddle:develop from baojun-nervana:fix_training
Apr 9, 2019

Conversation

@baojun-nervana
Contributor

@baojun-nervana baojun-nervana commented Apr 7, 2019

This is a bug fix for the v1.4 release (the "overfitting" issue).
For inference, the intermediate MKL-DNN layout is cached for a performance boost. For training, this caching must be disabled because the weights are updated every iteration.

This fix disables saving the intermediate layout so that validation is performed correctly, which resolves the "overfitting" issue. I will send the updated convergence curve for review once I have collected more data points.

Another change in this PR is that the "read" op is now included when checking inputs, since py_reader uses a read op instead of a feed op.

CC: @mozga-intel @jianhang-liu

@tensor-tang
Contributor

When will you send the latest convergence curve?

The bug fix needs to be cherry-picked to the release/1.4 branch.

```diff
   bool is_persistable =
       (p_persistables->find(vi) != p_persistables->end()) ? true : false;
-  if (is_test && is_persistable) {
+  if (!is_training && is_test && is_persistable) {
```
Contributor

Aren't `is_training` and `is_test` redundant here?

Contributor Author

This can be simplified, but that would require changes elsewhere as well. I am trying to minimize the change at this late stage of the release, since a larger change would take more time to review and validate. I will follow up after the release to remove the duplication.


```diff
-  while (left < size && ops->at(left)->Type() == framework::kFeedOpType) {
+  while (left < size && (ops->at(left)->Type() == framework::kFeedOpType ||
+                         ops->at(left)->Type() == "read")) {
```
Contributor

Is adding `ops->at(left)->Type() == "read"` sufficient to cover all the input ops?


Contributor Author

We can expand this when we encounter other cases, but we would need to know and understand those cases first. So far this handles the use cases we know of.


Contributor

@tensor-tang tensor-tang left a comment

LGTM

The comments can be addressed in the develop branch.

@tensor-tang tensor-tang merged commit 1c8b34d into PaddlePaddle:develop Apr 9, 2019
@baojun-nervana baojun-nervana deleted the fix_training branch April 9, 2019 04:07
colourful-tree added a commit to colourful-tree/Paddle that referenced this pull request Apr 9, 2019
fix training validation test=develop (PaddlePaddle#16698)