-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29831: Fix for NPE in region replication #7629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-2
Are you sure you want to change the base?
Conversation
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
| if (useCache && locations.size() == 1) { | ||
| if (tableDescriptors.get(tableName).getRegionReplication() > 1 && retries <= 3) { | ||
| TableDescriptor td = tableDescriptors.get(tableName); | ||
| if (td != null && td.getRegionReplication() > 1 && retries <= 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed through idea's smart suggestions that retries <= 3 seems is unnecessary.
And I analyzed it and it's true
After removing it, there are 3 main cases, and none lead to an infinite loop:
- case 1
First loop: useCache && locations.size() == 1 && RegionReplication > 1 is true.
Set useCache = false and continue.
Second loop: The logic will proceed and eventually return or break.
- case2
First loop: useCache && locations.size() == 1 is true but RegionReplication > 1 is false.
Go to subsequent logic.
If !Bytes.equals(primaryLocation.getRegionInfo().getEncodedNameAsBytes(), encodedRegionName) is false: break (loop ends).
If !Bytes.equals(primaryLocation.getRegionInfo().getEncodedNameAsBytes(), encodedRegionName) is true and useCache is true: set useCache = false and continue. Second loop will then return or break.
- case3
First loop: useCache && locations.size() == 1 is false.
Go to subsequent logic.
If useCache is alread false: return or break.
If useCache is true : similar to case 2, it will either break or retry once (setting useCache=false), then finish.
|
Is it possible to add a unit test for this? Thanks |
The RegionReplicaSinkWriter.append() method checks table descriptors to determine if a table has region replication enabled (to decide whether to bypass the location cache). When a table is dropped concurrently, tableDescriptors.get(tableName) returns null, and the subsequent call to getRegionReplication() throws a NullPointerException.
This race condition can occur in the following scenario:
Since RegionReplicaReplicationEndpoint handles replica updates for all tables on a RegionServer, a single dropped table crashes the entire endpoint. This stops replica updates for all regions (including those from unrelated tables) hosted by that RegionServer until it is restarted.