Conversation

@krconv krconv commented Jan 15, 2026

The RegionReplicaSinkWriter.append() method checks table descriptors to determine if a table has region replication enabled (to decide whether to bypass the location cache). When a table is dropped concurrently, tableDescriptors.get(tableName) returns null, and the subsequent call to getRegionReplication() throws a NullPointerException.

This race condition can occur in the following scenario:

  1. WAL entries for a table are queued for replication to region replicas
  2. The table is dropped (via disable + drop or other means)
  3. Before the dropped table is added to the disabledAndDroppedTables cache (which happens when TableNotFoundException is caught during location lookup), the code attempts to read the table descriptor
  4. tableDescriptors.get() returns null for the now-deleted table
  5. NPE crashes the replication endpoint

Since RegionReplicaReplicationEndpoint handles replica updates for all tables on a RegionServer, a single dropped table crashes the entire endpoint. This stops replica updates for all regions (including those from unrelated tables) hosted by that RegionServer until it is restarted.
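
For illustration, the failure mode and the guard can be sketched outside HBase. This is a minimal sketch under stated assumptions, not the actual patch: `ReplicaCheckSketch` and its methods are hypothetical names, and a plain map of table name to replication count stands in for the table descriptor lookup. A missing key models a table that was dropped concurrently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the descriptor lookup; this is not the HBase API. */
final class ReplicaCheckSketch {
  // Table name -> region replication count. A missing key models a dropped table.
  private final Map<String, Integer> descriptors = new ConcurrentHashMap<>();

  void createTable(String name, int regionReplication) {
    descriptors.put(name, regionReplication);
  }

  void dropTable(String name) {
    descriptors.remove(name);
  }

  /** Mirrors the unguarded check: dereferences the lookup result directly. */
  boolean hasReplicasUnsafe(String name) {
    // get() returns null for a dropped table, and unboxing null throws NullPointerException.
    return descriptors.get(name) > 1;
  }

  /** Mirrors the guarded check: a missing descriptor means "no replicas to refresh". */
  boolean hasReplicasSafe(String name) {
    Integer replication = descriptors.get(name);
    return replication != null && replication > 1;
  }
}
```

With the guard in place, a dropped table is simply treated as having no replicas, so the lookup falls through to the existing handling instead of throwing.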

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 39s branch-2 passed
+1 💚 compile 4m 55s branch-2 passed
+1 💚 checkstyle 0m 45s branch-2 passed
+1 💚 spotbugs 1m 50s branch-2 passed
+1 💚 spotless 0m 54s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 19s the patch passed
+1 💚 compile 4m 56s the patch passed
+1 💚 javac 4m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 44s the patch passed
+1 💚 spotbugs 1m 54s the patch passed
+1 💚 hadoopcheck 18m 48s Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.1.
+1 💚 spotless 0m 49s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
46m 17s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7629
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 0e171568eb51 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 9adf8a9
Default Java Eclipse Adoptium-11.0.23+9
Max. process+thread count 80 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@charlesconnell charlesconnell changed the title from "Fix for NPE in region replication" to "HBASE-29831: Fix for NPE in region replication" Jan 15, 2026
@charlesconnell charlesconnell self-requested a review January 15, 2026 12:56
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 36s branch-2 passed
+1 💚 compile 0m 52s branch-2 passed
+1 💚 javadoc 0m 28s branch-2 passed
+1 💚 shadedjars 6m 24s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 0m 52s the patch passed
+1 💚 javac 0m 52s the patch passed
+1 💚 javadoc 0m 26s the patch passed
+1 💚 shadedjars 6m 25s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 236m 43s hbase-server in the patch passed.
266m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #7629
Optional Tests javac javadoc unit compile shadedjars
uname Linux 3e00223695ee 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 9adf8a9
Default Java Eclipse Adoptium-11.0.23+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/testReport/
Max. process+thread count 3117 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 16s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 2m 21s branch-2 passed
+1 💚 compile 0m 38s branch-2 passed
+1 💚 javadoc 0m 24s branch-2 passed
+1 💚 shadedjars 4m 21s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 0s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 javadoc 0m 22s the patch passed
+1 💚 shadedjars 4m 18s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 268m 20s /patch-unit-hbase-server.txt hbase-server in the patch failed.
288m 17s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #7629
Optional Tests javac javadoc unit compile shadedjars
uname Linux 131dc59be6aa 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 9adf8a9
Default Java Temurin-1.8.0_412-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/testReport/
Max. process+thread count 3358 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 18s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 42s branch-2 passed
+1 💚 compile 1m 4s branch-2 passed
+1 💚 javadoc 0m 32s branch-2 passed
+1 💚 shadedjars 6m 53s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 21s the patch passed
+1 💚 compile 1m 4s the patch passed
+1 💚 javac 1m 4s the patch passed
+1 💚 javadoc 0m 29s the patch passed
+1 💚 shadedjars 6m 54s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 293m 27s /patch-unit-hbase-server.txt hbase-server in the patch failed.
322m 18s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7629
Optional Tests javac javadoc unit compile shadedjars
uname Linux 5a8e7d339f94 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 9adf8a9
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/testReport/
Max. process+thread count 3497 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7629/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

  if (useCache && locations.size() == 1) {
-   if (tableDescriptors.get(tableName).getRegionReplication() > 1 && retries <= 3) {
+   TableDescriptor td = tableDescriptors.get(tableName);
+   if (td != null && td.getRegionReplication() > 1 && retries <= 3) {
Member

I noticed through IntelliJ IDEA's smart suggestions that the retries <= 3 check seems to be unnecessary.

I analyzed it, and the suggestion is correct.

After removing it, there are 3 main cases, and none lead to an infinite loop:

  • case 1

First loop: useCache && locations.size() == 1 && RegionReplication > 1 is true.
Set useCache = false and continue.
Second loop: The logic will proceed and eventually return or break.

  • case 2

First loop: useCache && locations.size() == 1 is true but RegionReplication > 1 is false.
Go to subsequent logic.
If !Bytes.equals(primaryLocation.getRegionInfo().getEncodedNameAsBytes(), encodedRegionName) is false: break (loop ends).
If !Bytes.equals(primaryLocation.getRegionInfo().getEncodedNameAsBytes(), encodedRegionName) is true and useCache is true: set useCache = false and continue. Second loop will then return or break.

  • case 3

First loop: useCache && locations.size() == 1 is false.
Go to subsequent logic.
If useCache is already false: return or break.
If useCache is true: similar to case 2, it will either break or retry once (setting useCache = false), then finish.
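
The three cases above can be checked mechanically with a small model of the loop. This is a sketch based only on the control flow described in this comment, not the real method: `LookupLoopSketch`, its parameter names, and the idea of passing the cached and fresh lookup results as arguments are all assumptions made for illustration. The retries bound is deliberately left out.

```java
/** Hypothetical model of the lookup loop with the retries bound removed. */
final class LookupLoopSketch {

  /** Returns how many loop iterations run before the loop returns or breaks. */
  static int iterations(boolean useCache, int cachedSize, int freshSize,
      int regionReplication, boolean cachedMatches, boolean freshMatches) {
    int iterations = 0;
    while (true) {
      iterations++;
      // Results depend on whether this pass reads the cache or does a fresh lookup.
      int locationsSize = useCache ? cachedSize : freshSize;
      boolean regionMatches = useCache ? cachedMatches : freshMatches;

      // Case 1: the cache shows one location but the table has replicas.
      if (useCache && locationsSize == 1 && regionReplication > 1) {
        useCache = false;
        continue;
      }
      // Cases 2 and 3: the primary location does not match the expected region.
      if (!regionMatches) {
        if (useCache) {
          useCache = false;
          continue;
        }
        return iterations; // the region no longer exists; give up
      }
      break;
    }
    return iterations;
  }

  /** Tries every combination of small inputs and returns the worst case. */
  static int maxIterations() {
    boolean[] flags = {true, false};
    int max = 0;
    for (boolean useCache : flags)
      for (int cachedSize = 1; cachedSize <= 3; cachedSize++)
        for (int freshSize = 1; freshSize <= 3; freshSize++)
          for (int replication = 1; replication <= 3; replication++)
            for (boolean cachedMatches : flags)
              for (boolean freshMatches : flags)
                max = Math.max(max, iterations(useCache, cachedSize, freshSize,
                    replication, cachedMatches, freshMatches));
    return max;
  }
}
```

In this model every `continue` first flips useCache from true to false, and nothing sets it back to true, so the loop body runs at most twice regardless of the inputs.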

@guluo2016
Member

Is it possible to add a unit test for this? Thanks.
