Skip to content

Conversation

@mwkang
Copy link
Contributor

@mwkang mwkang commented Apr 11, 2025

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

*/
private NextState needToReturn(List<? super ExtendedCell> outResult) {
if (!outResult.isEmpty() && topChanged) {
if (topChanged) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The !outResult.isEmpty is there from the beginning. So would you mind explaining about why here we need to remove this check?

Copy link
Contributor Author

@mwkang mwkang Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s assume the data stored in HBase is as follows:

(1) row0/family2:qf1/DeleteColumn
(2) row0/family2:qf1/Put/value2

(3) row1/family1:qf1/Put/value2
(4) row1/family2:qf1/Put/value2

Now, suppose a user starts scanning from row0.

In RegionScannerImpl#nextInternal, when the current cell’s row is row0, after reading entry (2) in StoreScanner, if a flush happens, a topChanged occurs (Storescanner.peek() is changed where before ...), and the value of StoreScanner’s heap.peek() becomes (4) row1/family2:qf1/Put/value2.

Since it is the next row, StoreScanner should return at that point — but it fails to recognize that it has moved to the next row because outResult is empty, and ends up including the new row in the result.

Then, in RegionScannerImpl, it sees that nextKv’s row is different from the current cell’s row, and returns (since it has moved to a different row).

As a result, even though (3) and (4) belong to the same row (row1), they are returned to the client as if they were from different rows.

(3) and (4) should be combined into a single Result, but they end up being returned as separate Result instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the root cause of the issue described in this thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand the problem.

Let's see how to reproduce the problem in UT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing the code!
let me know if there's anything I should take care of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that there is a behavior change, but if we can still get 3 cells at once, it does not make client get partial rows?

Copy link
Contributor Author

@mwkang mwkang Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that at the StoreScanner level, the behavior doesn't seem immediately problematic.
However, the issue becomes visible on the client side when multiple column families are involved — the row ends up getting split.

This happens because the result from row2 is included in the first next() call, which is supposed to return only row1.
As a result, the client receives fragmented results for row2.

To help illustrate the issue, I’ve added a reproducible test case.
If you run TestTopChanged.testTopChanged on the feature/bug-demo-HBASE-29254 branch, you can observe the behavior directly.

Here’s what you’ll see:

  • With the !outResult.isEmpty() condition in needToReturn (current behavior):
    RESULT: rowkey = row2, cf = cf2  
    RESULT: rowkey = row2, cf = cf1  
    
  • Without the !outResult.isEmpty() condition (modified behavior):
    RESULT: rowkey = row2, cf = cf1, cf2  
    

It would be great if you could take a look — I know it's a bit of a hassle, but your feedback would be really helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are no data for row1 in cf2? What is the current way to prevert loading row2?

I mean we have delete row1 in cf1, and row2 in cf1 and cf2, how can we make sure that we do not return row2 when calling cf2.next at the first time?

Copy link
Contributor Author

@mwkang mwkang Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn’t fully understand the question.

If there is no data for cf2 in row1, loading of row2 is prevented by moreCellsInRow in RegionScannerImpl.
This is because:

  • currentRowCell: row1/cf1:cq1/1745317264666/Put/vlen=10/seqid=4
  • nextKv: row2/cf1:cq1/1745317264674/Put/vlen=10/seqid=5

By "What if there are no data for row1 in cf2? What is the current way to prevent loading row2?", are you asking about the current mechanism that prevents row2 from being included in the result while the StoreScanner is reading row1 and topChanged has not occurred?

This is handled in ScanQueryMatcher.preCheck using the rowComparator.

next, StoreScanner (org.apache.hadoop.hbase.regionserver)
  match, NormalUserScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)
    preCheck, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)

When the currentRow is row1/cf1:cq1/1745314001315/DeleteColumn/vlen=0/seqid=5 and the cell is row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7, the MatchCode becomes DONE.

However, when topChanged occurs (which is the problematic condition), the currentRow becomes row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7 and the cell is row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7.

In ScanQueryMatcher.setToNewRow, the currentRow is set to row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7.

next, StoreScanner (org.apache.hadoop.hbase.regionserver)
  seekOrSkipToNextColumn, StoreScanner (org.apache.hadoop.hbase.regionserver)
    seekAsDirection, StoreScanner (org.apache.hadoop.hbase.regionserver)
      reseek, StoreScanner (org.apache.hadoop.hbase.regionserver)
        reopenAfterFlush, StoreScanner (org.apache.hadoop.hbase.regionserver)
          resetQueryMatcher, StoreScanner (org.apache.hadoop.hbase.regionserver)
            setToNewRow, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher)

In this condition, "I mean we have delete row1 in cf1, and row2 in cf1 and cf2, how can we make sure that we do not return row2 when calling cf2.next at the first time?", it is determined as a different row by ScanQueryMatcher.preCheck.

When using a currentRow that hasn't been flushed to disk yet, it may change after a flush when the scanner is reopened (topChanged).
If outResult contains elements, there is no issue. However, if outResult is empty, the return does not occur, and the loop iterates again, which causes a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the detailed review and your follow-up questions — I really appreciate it.
I feel that my earlier explanation may not have been clear enough.

To clarify: the main issue is not with partial results, but rather with a single row being split into two separate results.
This split happens because after a topChanged event, if outResult is empty, StoreScanner doesn't correctly detect that the row has changed.

(Upon reflecting on our previous discussion, I realized that my initial explanation may not have been as clear as it should have been. When I first encountered this issue, I thought it was related to partial results, which is why I initially mentioned that in thread. However, after further debugging, the core issue is actually that a single row is being split into two separate Results. I apologize if my earlier explanation was confusing, and I'm sorry for not making this clearer sooner.)

I’ve already added a reproducible test case (TestTopChanged.testTopChanged in the feature/bug-demo-HBASE-29254 branch) to demonstrate this behavior.

Please let me know if there’s anything else you would like me to work on to help move this forward.

Thanks again for your time and support!

topChanged = false;
ScanQueryMatcher.MatchCode qcode = matcher.match(cell);

if (hook != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only place to reproduce the problem? I prefer we override some existing method of StoreScanner to do the special logic so the modification could be limited in test code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to introduce a hook class, but I needed to trigger a flush at a specific point within the LOOP.
In HBASE-18295, the test used a MyList class to achieve this, and I wanted to use a similar approach, but I couldn’t find a way to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9
At first, I couldn’t think of a way, but after giving it some more thought, I realized that we could remove the hook by taking a different approach.
I created a MyKeyValueHeap class that inherits from KeyValueHeap, and modified it so that a flush is triggered at a specific point in time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will verify it soon.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@mwkang mwkang force-pushed the feature/HBASE-29254 branch from 6f949db to d3869d7 Compare April 14, 2025 10:36
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 55s master passed
+1 💚 compile 3m 33s master passed
+1 💚 checkstyle 0m 46s master passed
+1 💚 spotbugs 2m 3s master passed
+1 💚 spotless 1m 1s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 43s the patch passed
+1 💚 compile 3m 47s the patch passed
+1 💚 javac 3m 47s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 43s the patch passed
+1 💚 spotbugs 2m 5s the patch passed
+1 💚 hadoopcheck 14m 48s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 47s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
50m 47s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6900/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6900
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 8e03f5bf182b 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d3869d7
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6900/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 6s master passed
+1 💚 compile 0m 57s master passed
+1 💚 javadoc 0m 29s master passed
+1 💚 shadedjars 6m 1s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 6s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 6m 2s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 217m 5s hbase-server in the patch passed.
244m 36s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6900/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6900
Optional Tests javac javadoc unit compile shadedjars
uname Linux 74808ff5adff 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d3869d7
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6900/3/testReport/
Max. process+thread count 4940 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6900/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache9 Apache9 merged commit 99bd5b5 into apache:master Apr 30, 2025
1 check passed
Apache9 pushed a commit that referenced this pull request Apr 30, 2025
…Changed behavior (#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
(cherry picked from commit 524825e)
Apache9 pushed a commit that referenced this pull request Apr 30, 2025
…Changed behavior (#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
Apache9 pushed a commit that referenced this pull request Apr 30, 2025
…Changed behavior (#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
Apache9 added a commit that referenced this pull request May 4, 2025
…Changed behavior (#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
(cherry picked from commit 524825e)
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
…Changed behavior (apache#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request Sep 26, 2025
…Changed behavior (apache#6900)

Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit 99bd5b5)
(cherry picked from commit 524825e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants