HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell #6557
Open
EungsopYoo wants to merge 7 commits into apache:master from EungsopYoo:HBASE-29039
+651 −34
Commits (7)
eb68bdc optimize read performance for accumulated delete markers on the same … (terence-yoo)
fbf042e handle visibilityLabelEnabled (terence-yoo)
80044c2 handle scan with filter (terence-yoo)
202ea7d remove unnecessary codes (terence-yoo)
f6739e2 remove incorrect early return of MatchCode.SKIP (terence-yoo)
74ecf71 handle failed test cases (terence-yoo)
6390115 make visibilityLabelEnabled a instance variable (terence-yoo)
@EungsopYoo have you also considered Dual File Compaction #5545?
Could you also run some perf tests comparing Dual File Compaction with this optimization? That might be really helpful.
@virajjasani
I have reviewed the Dual File Compaction PR you mentioned. This PR and Dual File Compaction have something in common, especially in how they handle delete markers, but I think there are some differences.
This PR focuses on accumulated delete markers on the same row or cell, whereas Dual File Compaction handles delete markers across different rows or columns. Also, this PR can optimize reads from both the MemStore and the StoreFiles, while Dual File Compaction can optimize reads from the StoreFiles only.
So I think they are complementary and can be used together.
Thanks @EungsopYoo, this is what I was also expecting.
On the Jira https://issues.apache.org/jira/browse/HBASE-25972, Kadir has also shown how much full-scan improvement was observed using PE (see the second comment on the Jira). Could you run the same steps to see how much improvement you observe with this PR?
@virajjasani
PE does not have a test case that does Put, Delete, and Get on the same row. Should I add a new test case and run it?
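For reference, a minimal sketch of the kind of workload being discussed — repeated Put and Delete followed by a Get on the same row — using the standard HBase client API. "TestTable", "cf", and "q" are placeholders, and this is illustrative rather than an actual PE test case:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SameRowWorkload {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("TestTable"))) {
      byte[] row = Bytes.toBytes("row1");
      byte[] cf = Bytes.toBytes("cf");
      byte[] q = Bytes.toBytes("q");
      // Accumulate delete markers on one cell: repeated put-then-delete.
      for (int i = 0; i < 1000; i++) {
        table.put(new Put(row).addColumn(cf, q, Bytes.toBytes(i)));
        table.delete(new Delete(row).addColumns(cf, q));
      }
      // The read that the accumulated markers slow down.
      Result r = table.get(new Get(row));
      System.out.println("result empty: " + r.isEmpty());
    }
  }
}
```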
It's not necessary, because the steps mentioned in the Jira will take care of adding many delete markers, so you can follow the exact same steps. Thank you!
To me, this improvement is only meaningful when the scanned data is in the memstore, assuming that the skip list is used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next one? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, don't we? So I did not understand why skipping to the next column would be a significant optimization in general.
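To illustrate the asymmetry this comment describes, here is a minimal hypothetical sketch (not HBase code) that uses java.util.concurrent.ConcurrentSkipListMap as a stand-in for the memstore's cell map, with a deliberately simplified string key layout:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// The memstore is backed by a skip list, so "jump to the next column" is an
// O(log n) ceiling lookup that hops over all versions and delete markers of
// the current column. An HFile block, by contrast, is a flat byte range that
// is walked cell by cell.
public class ColumnJumpSketch {
  public static void main(String[] args) {
    // Stand-in for memstore cells keyed as "row/column/timestamp" (simplified).
    ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
    memstore.put("row1/colA/3", "delete-marker");
    memstore.put("row1/colA/2", "delete-marker");
    memstore.put("row1/colA/1", "v1");
    memstore.put("row1/colB/3", "v2");

    // SEEK_NEXT_COL on a skip list: one ceiling lookup past everything in colA.
    String nextCol = memstore.ceilingKey("row1/colA\uffff");
    System.out.println("jumped to: " + nextCol); // prints "row1/colB/3"

    // Within an HFile block there is no per-column index, so a "seek" inside a
    // block degenerates into repeated next() calls until the target is reached.
  }
}
```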
I have run the same tests again, except that KEEP_DELETED_CELLS is set to false.
(benchmark results attached for master and for this PR)
It looks like there is some performance degradation in result 3. I will dig into it.
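For reference, a minimal sketch of how KEEP_DELETED_CELLS can be set to false on a column family through the Java admin API; "TestTable" and "cf" are placeholder names, not the PR author's actual setup:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeepDeletedCells;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class DisableKeepDeletedCells {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // "TestTable" and "cf" are placeholders for the benchmark table/family.
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
        .newBuilder(Bytes.toBytes("cf"))
        .setKeepDeletedCells(KeepDeletedCells.FALSE)
        .build();
      admin.modifyColumnFamily(TableName.valueOf("TestTable"), cf);
    }
  }
}
```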
https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description cover only the case of reading from the MemStore. So I have run new performance tests that read from StoreFiles only, with and without dual file compaction.
(results attached for four configurations: master without dual file compaction, master with dual file compaction, this PR without dual file compaction, and this PR with dual file compaction)
The results show that this PR's optimization also works for reads from StoreFiles, even without dual file compaction. What do you think about these results?
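The exact steps behind these results are not shown; as a sketch of one common way to arrange a "StoreFiles only" read path, the memstore can be flushed before the read workload. The table name here is a placeholder:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushBeforeRead {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // "TestTable" is a placeholder. Flushing empties the memstore so the
      // read benchmark that follows is served from StoreFiles.
      admin.flush(TableName.valueOf("TestTable"));
      // ... run the read workload here ...
    }
  }
}
```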
https://github.com/EungsopYoo/hbase/blob/63901155caf5c226b02564128669234c08251e8d/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java#L81-L99
The slight performance degradation comes from removing the early return of MatchCode.SKIP in the normal cases. Because of that removal, the checkDeleted() method is executed more often than before, which adds some computational overhead. I found this by removing the added code blocks one by one.
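A schematic sketch of the control-flow trade-off being described; this is not the actual NormalUserScanQueryMatcher code, and both check methods are illustrative stubs:

```java
public class MatcherSketch {
  enum MatchCode { SKIP, SEEK_NEXT_COL, INCLUDE }

  // Hypothetical stand-ins for the matcher's timestamp/version and delete checks.
  static boolean outsideTimeRange(long ts) { return ts < 100; }
  static boolean coveredByDeleteMarker(long ts) { return ts < 200; }

  // Before: cells that fail the cheap check return SKIP immediately and never
  // touch the delete tracker.
  static MatchCode matchWithEarlyReturn(long ts) {
    if (outsideTimeRange(ts)) {
      return MatchCode.SKIP;
    }
    return coveredByDeleteMarker(ts) ? MatchCode.SEEK_NEXT_COL : MatchCode.INCLUDE;
  }

  // After: the delete check (modeling checkDeleted()) runs for every cell so
  // that accumulated markers can short-circuit later cells, at the cost of
  // extra work on the common path.
  static MatchCode matchWithoutEarlyReturn(long ts) {
    boolean deleted = coveredByDeleteMarker(ts);
    if (outsideTimeRange(ts)) {
      return MatchCode.SKIP;
    }
    return deleted ? MatchCode.SEEK_NEXT_COL : MatchCode.INCLUDE;
  }

  public static void main(String[] args) {
    System.out.println(matchWithEarlyReturn(50));    // SKIP via the cheap path
    System.out.println(matchWithoutEarlyReturn(50)); // SKIP, but paid for the delete check
  }
}
```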
hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java (line 644 in 6390115)
hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java (lines 758 to 768 in 6390115)
With some more digging, I found that the actual cause of the degradation is the return value of matcher.match(). Processing a return value of SKIP is very lightweight, but processing a return value of SEEK_NEXT_COL is much heavier.
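A schematic sketch of that cost difference; this is not the real StoreScanner code, and while the method names echo HBase's (heap.next(), reseek()), the interface and bodies here are illustrative stubs:

```java
public class ScanCostSketch {
  enum MatchCode { SKIP, SEEK_NEXT_COL }

  interface CellHeap {
    void next();                 // advance one cell: a single heap sift at most
    void reseek(String seekKey); // may reposition several scanners and load new blocks
  }

  static void handle(MatchCode code, CellHeap heap, String currentColumnKey) {
    switch (code) {
      case SKIP:
        heap.next(); // lightweight: just step past the current cell
        break;
      case SEEK_NEXT_COL:
        // heavier: build a "past the current column" seek key and reseek,
        // which touches every scanner in the key-value heap
        heap.reseek(currentColumnKey + "/MAX_TS");
        break;
    }
  }

  public static void main(String[] args) {
    CellHeap heap = new CellHeap() {
      public void next() { System.out.println("next(): advance one cell"); }
      public void reseek(String key) { System.out.println("reseek() to " + key); }
    };
    handle(MatchCode.SKIP, heap, "row1/colA");
    handle(MatchCode.SEEK_NEXT_COL, heap, "row1/colA");
  }
}
```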