
Conversation

@rmdmattingly (Contributor)

This is yet another bite of #6593

In #3729 we removed consideration of replicas' rack distribution. This was, in my opinion, a mistake: if we want to be flexible for environments that have too few racks, then we should do so by skipping this check when the rack count is less than the max replica count.
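
For illustration, here is a minimal sketch of that guard; the names (`racksByRegion`, `numRacks`, `maxReplicas`) are invented for this standalone example and are not the patch's actual identifiers:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a standalone model of the guard described above,
// not the actual balancer code from this patch.
final class RackColocationSketch {
  /**
   * Reports rack colocation as a problem only when the cluster has enough
   * racks to make separating all replicas possible in the first place.
   */
  static boolean replicasColocatedOnRack(Map<String, List<String>> racksByRegion,
    int numRacks, int maxReplicas) {
    if (numRacks < maxReplicas) {
      // Too few racks: some colocation is unavoidable, so skip the check.
      return false;
    }
    for (List<String> racks : racksByRegion.values()) {
      Set<String> distinct = new HashSet<>(racks);
      if (distinct.size() < racks.size()) {
        return true; // at least two replicas of one region share a rack
      }
    }
    return false;
  }
}
```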

@rmdmattingly requested a review from ndimiduk on January 21, 2025 at 18:56
Comment on lines +451 to +454:

```java
int numReplicas = region.getReplicaId() + 1;
if (numReplicas > maxReplicas) {
  maxReplicas = numReplicas;
}
```
@rmdmattingly (Contributor, Author):

This seemed like a very cheap way to determine the max replica count without actually seeing the table descriptors, which is a limitation of the current balancer implementation.

Someday I would like to fix this limitation and make the balancer aware of table descriptors, but I think that is its own problem.

@ndimiduk (Member):


I'm pretty sure that you have insufficient information here. I'm looking at the call hierarchy of registerRegion and I believe that there's no limit by table. So you're approximating the maximum number of region replicas used by any table in the cluster? I suspect that you need to introduce a new index from table name to the number of replicas per table.
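
For concreteness, the suggested index might look something like this. This is a sketch with invented names; `RegionInfo#getTable` and `RegionInfo#getReplicaId` are real client API, but the class and its fields are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.RegionInfo;

// Hypothetical per-table replica index; not code from this patch.
class PerTableReplicaIndex {
  private final Map<TableName, Integer> maxReplicasByTable = new HashMap<>();

  // Call once per registered region to track the highest replica id per table.
  void record(RegionInfo region) {
    int numReplicas = region.getReplicaId() + 1;
    maxReplicasByTable.merge(region.getTable(), numReplicas, Math::max);
  }

  int maxReplicasFor(TableName table) {
    return maxReplicasByTable.getOrDefault(table, 1);
  }
}
```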

@rmdmattingly (Contributor, Author):

Agreed, we could make this better with table-specific replica count tracking.

@ndimiduk (Member):

Okay, but that additional tracking is not a correctness issue for how this value is used?

@rmdmattingly (Contributor, Author):

The BalancerClusterState is only aware of the regions for tables that it is actively trying to balance. So if you're balancing by table (not the default), then I believe table-specific tracking will not make this any better. If you're balancing by cluster (the default), then table-specific tracking could maybe make things better by checking for rack replica colocation per table... though now that I think about it, the cost functions probably don't support this. So, tl;dr: with cluster-wide balancing, this change will cause us to ignore rack colocation as a balancing trigger once any single table's replica count exceeds the number of racks, and the way to get more granular decisions would be to enable byTable balancing.
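
As a reference point, byTable balancing is toggled with the hbase.master.loadbalance.bytable property, normally set in hbase-site.xml; a programmatic sketch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ByTableBalancingExample {
  public static void main(String[] args) {
    // false (the default) balances cluster-wide; true balances per table.
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.master.loadbalance.bytable", true);
    System.out.println(conf.get("hbase.master.loadbalance.bytable"));
  }
}
```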

Comment on lines -163 to -164:

```java
assertFalse(loadBalancer.needsBalance(HConstants.ENSEMBLE_TABLE_NAME,
  new BalancerClusterState(map, null, null, new ForTestRackManagerOne())));
```
@rmdmattingly (Contributor, Author):

In that PR where we made the balancer ignorant of rack colocation, we also just naively flipped this assertion to false. But the intention of this test was always for the assertion to be true.

I've also split this test up into two tests, because it was always testing two distinct cases (host and rack colocation).
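
The restored intent looks roughly like this (a sketch mirroring the removed lines above, not necessarily the exact new test code):

```java
// With replicas colocated on a rack and enough racks available, the
// balancer should once again report that balancing is needed.
assertTrue(loadBalancer.needsBalance(HConstants.ENSEMBLE_TABLE_NAME,
  new BalancerClusterState(map, null, null, new ForTestRackManagerOne())));
```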

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|-----------|---------|---------|
| +0 🆗 | reexec | 0m 28s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | _ Prechecks _ | | |
| | _ master Compile Tests _ | | |
| +1 💚 | mvninstall | 3m 7s | master passed |
| +1 💚 | compile | 0m 15s | master passed |
| +1 💚 | javadoc | 0m 13s | master passed |
| +1 💚 | shadedjars | 5m 47s | branch has no errors when building our shaded downstream artifacts. |
| | _ Patch Compile Tests _ | | |
| +1 💚 | mvninstall | 2m 54s | the patch passed |
| +1 💚 | compile | 0m 15s | the patch passed |
| +1 💚 | javac | 0m 15s | the patch passed |
| +1 💚 | javadoc | 0m 11s | the patch passed |
| +1 💚 | shadedjars | 5m 43s | patch has no errors when building our shaded downstream artifacts. |
| | _ Other Tests _ | | |
| +1 💚 | unit | 6m 28s | hbase-balancer in the patch passed. |
| | | 26m 23s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6622 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux dbcc0b183cd8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 7d07c07 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/testReport/ |
| Max. process+thread count | 256 (vs. ulimit of 30000) |
| modules | C: hbase-balancer U: hbase-balancer |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|-----------|---------|---------|
| +0 🆗 | reexec | 0m 30s | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| | _ master Compile Tests _ | | |
| +1 💚 | mvninstall | 3m 20s | master passed |
| +1 💚 | compile | 0m 24s | master passed |
| +1 💚 | checkstyle | 0m 10s | master passed |
| +1 💚 | spotbugs | 0m 24s | master passed |
| +1 💚 | spotless | 0m 45s | branch has no errors when running spotless:check. |
| | _ Patch Compile Tests _ | | |
| +1 💚 | mvninstall | 3m 1s | the patch passed |
| +1 💚 | compile | 0m 23s | the patch passed |
| +1 💚 | javac | 0m 23s | the patch passed |
| +1 💚 | blanks | 0m 0s | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 8s | the patch passed |
| +1 💚 | spotbugs | 0m 29s | the patch passed |
| +1 💚 | hadoopcheck | 11m 45s | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 43s | patch has no errors when running spotless:check. |
| | _ Other Tests _ | | |
| +1 💚 | asflicense | 0m 9s | The patch does not generate ASF License warnings. |
| | | 29m 28s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6622 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 6fbfecea5ce9 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 7d07c07 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 81 (vs. ulimit of 30000) |
| modules | C: hbase-balancer U: hbase-balancer |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@rmdmattingly requested a review from Apache9 on January 26, 2025 at 14:26
@ndimiduk (Member) left a comment:

Nice one Ray. See my question about inferring maxReplicas and how we maybe ought to track it. Otherwise, LGTM.

@ndimiduk (Member) left a comment:

Okay, LGTM.

@rmdmattingly merged commit b89c825 into apache:master on Jan 27, 2025
1 check passed
rmdmattingly added a commit that referenced this pull request Jan 27, 2025
…colocation (#6622)

Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Nick Dimiduk <[email protected]>
rmdmattingly added a commit that referenced this pull request Jan 28, 2025
…colocation (#6622) (#6639)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
rmdmattingly added a commit that referenced this pull request Jan 28, 2025
…colocation (#6622) (#6640)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jan 28, 2025
…ated ignores rack colocation (apache#6622) (apache#6640) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Jan 28, 2025
…ated ignores rack colocation (apache#6622) (apache#6640) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
rmdmattingly added a commit that referenced this pull request Jan 31, 2025
…colocation (#6622) (#6640) (#6644)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Mar 5, 2025
…ated ignores rack colocation (apache#6622) (apache#6640) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025