-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29254 StoreScanner returns incorrect row after flush due to topChanged behavior #6900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| */ | ||
| private NextState needToReturn(List<? super ExtendedCell> outResult) { | ||
| if (!outResult.isEmpty() && topChanged) { | ||
| if (topChanged) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The !outResult.isEmpty is there from the beginning. So would you mind explaining about why here we need to remove this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s assume the data stored in HBase is as follows:
(1) row0/family2:qf1/DeleteColumn
(2) row0/family2:qf1/Put/value2
(3) row1/family1:qf1/Put/value2
(4) row1/family2:qf1/Put/value2
Now, suppose a user starts scanning from row0.
In RegionScannerImpl#nextInternal, when the current cell’s row is row0, after reading entry (2) in StoreScanner, if a flush happens, a topChanged occurs (Storescanner.peek() is changed where before ...), and the value of StoreScanner’s heap.peek() becomes (4) row1/family2:qf1/Put/value2.
Since it is the next row, StoreScanner should return at that point — but it fails to recognize that it has moved to the next row because outResult is empty, and ends up including the new row in the result.
Then, in RegionScannerImpl, it sees that nextKv’s row is different from the current cell’s row, and returns (since it has moved to a different row).
As a result, even though (3) and (4) belong to the same row (row1), they are returned to the client as if they were from different rows.
(3) and (4) should be combined into a single Result, but they end up being returned as separate Result instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the root cause of the issue described in this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I understand the problem.
Let's see how to reproduce the problem in UT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing the code!
let me know if there's anything I should take care of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that there is a behavior change, but if we can still get 3 cells at once, it does not make client get partial rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that at the StoreScanner level, the behavior doesn't seem immediately problematic.
However, the issue becomes visible on the client side when multiple column families are involved — the row ends up getting split.
This happens because the result from row2 is included in the first next() call, which is supposed to return only row1.
As a result, the client receives fragmented results for row2.
To help illustrate the issue, I’ve added a reproducible test case.
If you run TestTopChanged.testTopChanged on the feature/bug-demo-HBASE-29254 branch, you can observe the behavior directly.
Here’s what you’ll see:
- With the
!outResult.isEmpty()condition inneedToReturn(current behavior):RESULT: rowkey = row2, cf = cf2 RESULT: rowkey = row2, cf = cf1 - Without the
!outResult.isEmpty()condition (modified behavior):RESULT: rowkey = row2, cf = cf1, cf2
It would be great if you could take a look — I know it's a bit of a hassle, but your feedback would be really helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are no data for row1 in cf2? What is the current way to prevert loading row2?
I mean we have delete row1 in cf1, and row2 in cf1 and cf2, how can we make sure that we do not return row2 when calling cf2.next at the first time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn’t fully understand the question.
If there is no data for cf2 in row1, loading of row2 is prevented by moreCellsInRow in RegionScannerImpl.
This is because:
- currentRowCell:
row1/cf1:cq1/1745317264666/Put/vlen=10/seqid=4 - nextKv:
row2/cf1:cq1/1745317264674/Put/vlen=10/seqid=5
By "What if there are no data for row1 in cf2? What is the current way to prevent loading row2?", are you asking about the current mechanism that prevents row2 from being included in the result while the StoreScanner is reading row1 and topChanged has not occurred?
This is handled in ScanQueryMatcher.preCheck using the rowComparator.
next, StoreScanner (org.apache.hadoop.hbase.regionserver)
match, NormalUserScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)
preCheck, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)
When the currentRow is row1/cf1:cq1/1745314001315/DeleteColumn/vlen=0/seqid=5 and the cell is row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7, the MatchCode becomes DONE.
However, when topChanged occurs (which is the problematic condition), the currentRow becomes row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7 and the cell is row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7.
In ScanQueryMatcher.setToNewRow, the currentRow is set to row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7.
next, StoreScanner (org.apache.hadoop.hbase.regionserver)
seekOrSkipToNextColumn, StoreScanner (org.apache.hadoop.hbase.regionserver)
seekAsDirection, StoreScanner (org.apache.hadoop.hbase.regionserver)
reseek, StoreScanner (org.apache.hadoop.hbase.regionserver)
reopenAfterFlush, StoreScanner (org.apache.hadoop.hbase.regionserver)
resetQueryMatcher, StoreScanner (org.apache.hadoop.hbase.regionserver)
setToNewRow, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)
In this condition, "I mean we have delete row1 in cf1, and row2 in cf1 and cf2, how can we make sure that we do not return row2 when calling cf2.next at the first time?", it is determined as a different row by ScanQueryMatcher.preCheck.
When using a currentRow that hasn't been flushed to disk yet, it may change after a flush when the scanner is reopened (topChanged).
If outResult contains elements, there is no issue. However, if outResult is empty, the return does not occur, and the loop iterates again, which causes a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the detailed review and your follow-up questions — I really appreciate it.
I feel that my earlier explanation may not have been clear enough.
To clarify: the main issue is not with partial results, but rather with a single row being split into two separate results.
This split happens because after a topChanged event, if outResult is empty, StoreScanner doesn't correctly detect that the row has changed.
(Upon reflecting on our previous discussion, I realized that my initial explanation may not have been as clear as it should have been. When I first encountered this issue, I thought it was related to partial results, which is why I initially mentioned that in thread. However, after further debugging, the core issue is actually that a single row is being split into two separate Results. I apologize if my earlier explanation was confusing, and I'm sorry for not making this clearer sooner.)
I’ve already added a reproducible test case (TestTopChanged.testTopChanged in the feature/bug-demo-HBASE-29254 branch) to demonstrate this behavior.
Please let me know if there’s anything else you would like me to work on to help move this forward.
Thanks again for your time and support!
| topChanged = false; | ||
| ScanQueryMatcher.MatchCode qcode = matcher.match(cell); | ||
|
|
||
| if (hook != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only place to reproduce the problem? I prefer we override some existing method of StoreScanner to do the special logic so the modification could be limited in test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to introduce a hook class, but I needed to trigger a flush at a specific point within the LOOP.
In HBASE-18295, the test used a MyList class to achieve this, and I wanted to use a similar approach, but I couldn’t find a way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9
At first, I couldn’t think of a way, but after giving it some more thought, I realized that we could remove the hook by taking a different approach.
I created a MyKeyValueHeap class that inherits from KeyValueHeap, and modified it so that a flush is triggered at a specific point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will verify it soon.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
6f949db to
d3869d7
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…Changed behavior (#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5) (cherry picked from commit 524825e)
…Changed behavior (#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5)
…Changed behavior (#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5)
…Changed behavior (#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5) (cherry picked from commit 524825e)
…Changed behavior (apache#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5)
…Changed behavior (apache#6900) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 99bd5b5) (cherry picked from commit 524825e)
HBASE-29254