hemantk-12 commented May 4, 2023

What changes were proposed in this pull request?

  • Added an integration test for SnapDiff when an OM leader failover happens.
  • Explicitly close snapshotDiffManager and invalidate snapshotCache in OmSnapshotManager#close.
  • Shut down snapDiffExecutor and sstDumpToolExecutor in SnapshotDiffManager#close.
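
As a rough illustration of the last bullet, shutting down two executors from a close method might look like the sketch below. It is not the code from this patch: the class `DiffManagerSketch`, the helper `shutdownQuietly`, the pool sizes, and the 5-second timeout are all assumptions made for the example. Only the two executor field names are taken from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a manager that owns two executors.
// This is NOT the real SnapshotDiffManager from Ozone.
final class DiffManagerSketch implements AutoCloseable {
  // Pool sizes are arbitrary and chosen only for the example.
  private final ExecutorService snapDiffExecutor =
      Executors.newFixedThreadPool(2);
  private final ExecutorService sstDumpToolExecutor =
      Executors.newFixedThreadPool(2);

  @Override
  public void close() {
    shutdownQuietly(snapDiffExecutor);
    shutdownQuietly(sstDumpToolExecutor);
  }

  // True once shutdown has been requested on both executors.
  boolean isShutdown() {
    return snapDiffExecutor.isShutdown() && sstDumpToolExecutor.isShutdown();
  }

  // Stop accepting new tasks, wait briefly for running ones,
  // then cancel whatever is still running (assumed 5-second timeout).
  private static void shutdownQuietly(ExecutorService executor) {
    executor.shutdown();
    try {
      if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
        executor.shutdownNow();
      }
    } catch (InterruptedException e) {
      executor.shutdownNow();
      // Restore the interrupt flag for the caller.
      Thread.currentThread().interrupt();
    }
  }
}
```

Closing the owning object in this way keeps worker threads from outliving it, which matters when a test restarts or fails over a node within the same JVM.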

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8389

How was this patch tested?

Ran newly added tests locally and verified they pass.

adoroszlai added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label May 4, 2023
umamaheswararao requested a review from smengcl May 8, 2023 16:11
hemantk-12 marked this pull request as draft May 8, 2023 23:36
hemantk-12 force-pushed the HDDS-8389 branch 4 times, most recently from 1313363 to 116cd7c May 10, 2023 02:05
hemantk-12 marked this pull request as ready for review May 11, 2023 00:58

smengcl commented May 11, 2023

@swamirishi Could you take a look at this?

smengcl left a comment

Thanks @hemantk-12. Great test additions. Nicely written.

swamirishi left a comment

Some comments


// If job was IN_PROGRESS or DONE state when OM restarted, it should be
// DONE by this time.
// If job FAILED during crash (which mostly happens in the test because
Contributor

nit: Do we have a unit test case for the FAILED state?

});
}

assertEquals(1, snapshotIds.stream().distinct().count());
Contributor

Is the snapshotId assertion enough? I guess we should also check rocks db contents.

Contributor Author

hemantk-12 May 17, 2023

Checking only the snapshotId should be enough. The test makes sure that the snapshotId is consistent across all the OM nodes. We don't care about its value.

What do you mean by "check rocks db contents"?

Contributor

The list of SST files should be the same, I believe.

Contributor Author

I don't think that should be tested as part of this test.
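
For readers skimming this thread, the consistency check under discussion can be sketched in isolation as below. This is only an illustration: the class `SnapshotIdConsistencySketch`, the method `allNodesAgree`, and the use of `UUID` for the ID type are assumptions and are not taken from the test in this patch. The idea is the same as the `distinct().count()` assertion quoted above: every node reports an ID, and the IDs must collapse to exactly one distinct value.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical helper; names and the UUID type are assumptions.
final class SnapshotIdConsistencySketch {
  private SnapshotIdConsistencySketch() {
  }

  // Returns true when every node reported the same snapshot ID.
  // An empty list yields false because there is no ID to agree on.
  static boolean allNodesAgree(List<UUID> snapshotIdPerNode) {
    return snapshotIdPerNode.stream().distinct().count() == 1;
  }
}
```

The check says nothing about the ID's value or about on-disk state such as SST file lists, which matches the scope argued for in the replies above.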

smengcl commented May 18, 2023

There are some conflicts in the test due to HDDS-8645.

hemantk-12 commented May 22, 2023

> There are some conflicts in the test due to HDDS-8645.

Resolved the conflicts and also ran the test 100 times to make sure there is no flakiness.

It passed every time (the workflow failed because of a timeout): https://github.com/hemantk-12/ozone/actions/runs/5049594952/jobs/9059656428

smengcl left a comment

lgtm

smengcl merged commit 5a8cc7b into apache:master May 23, 2023

smengcl commented May 23, 2023

Thanks @hemantk-12 for the test addition, and @swamirishi for the review.

hemantk-12 deleted the HDDS-8389 branch October 28, 2024 18:42