HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack colocation #6622

rmdmattingly · 2025-01-21T18:56:23Z

This is yet another bite of #6593

In #3729 we removed consideration of replicas' rack distribution. This was, in my opinion, a mistake: if we want to be flexible for environments that have too few racks, then we should do so by skipping this check when the rack count is less than the max replica count.
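
The gating described above can be sketched roughly as follows. This is a minimal illustration and not the code in the patch: the class `RackColocationGate` and all of its method names are hypothetical, and racks are modeled as plain strings rather than through HBase's rack manager.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed gating; not the HBase implementation. */
final class RackColocationGate {
  private RackColocationGate() {
  }

  /**
   * The rack check only makes sense when there are at least as many racks as replicas.
   * With fewer racks, some replicas must share a rack no matter what the balancer does.
   */
  static boolean shouldCheckRacks(int numRacks, int maxReplicas) {
    return numRacks >= maxReplicas;
  }

  /** Returns true if two or more replicas of one region are hosted on the same rack. */
  static boolean replicasShareRack(List<String> racksOfReplicas) {
    Set<String> seen = new HashSet<>();
    for (String rack : racksOfReplicas) {
      if (!seen.add(rack)) {
        return true;
      }
    }
    return false;
  }

  /** Rack colocation triggers balancing only when the check is applicable. */
  static boolean rackColocationNeedsBalance(int numRacks, int maxReplicas,
    List<String> racksOfReplicas) {
    return shouldCheckRacks(numRacks, maxReplicas) && replicasShareRack(racksOfReplicas);
  }
}
```

With three racks and three replicas, two replicas on the same rack would trigger balancing; with only two racks the same layout would be tolerated because it cannot be avoided.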

rmdmattingly · 2025-01-21T18:57:27Z

hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java

+    int numReplicas = region.getReplicaId() + 1;
+    if (numReplicas > maxReplicas) {
+      maxReplicas = numReplicas;
+    }


This seemed like a very cheap way to determine the max replica count without actually seeing the table descriptors. Not having access to the table descriptors is a limitation of the current balancer implementation.

At some point I would like to fix this limitation and make the balancer table-descriptor aware, but I think that is its own problem.
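
To spell out why `getReplicaId() + 1` works as an estimate: replica IDs are zero-based, with the primary at ID 0, so the highest ID seen plus one is a lower bound on the configured replica count. A standalone sketch of that running maximum, where the class `MaxReplicaTracker` and the initial value of 1 are assumptions made for illustration:

```java
/** Hypothetical standalone version of the running-max inference shown in the diff. */
final class MaxReplicaTracker {
  // Assumption: every region has at least its primary replica, so start at 1.
  private int maxReplicas = 1;

  /** Records one region replica; replicaId is zero-based (0 is the primary). */
  void registerRegion(int replicaId) {
    int numReplicas = replicaId + 1;
    if (numReplicas > maxReplicas) {
      maxReplicas = numReplicas;
    }
  }

  /** Largest replica count implied by any region replica registered so far. */
  int getMaxReplicas() {
    return maxReplicas;
  }
}
```

The estimate only reflects the replicas that were actually registered, so it would undercount if the highest-numbered replica of every region were missing from the input.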

I'm pretty sure that you have insufficient information here. I'm looking at the call hierarchy of registerRegion and I believe that there's no limit by table. So you're approximating the maximum number of region replicas used by any table in the cluster? I suspect that you need to introduce a new index by table name to the number of replicas per table.

Agreed, we could make this better with table-specific replica count tracking.

Okay, but that additional tracking is not a correctness issue for how this value is used?

The BalancerClusterState is only aware of the regions for tables that it is actively trying to balance, so if you're balancing by table (not the default) then I believe table-specific tracking will not make this any better. If you're balancing by cluster (the default) then I believe table-specific tracking could maybe make things better by checking for rack replica colocation by table... actually, now that I think about it, the cost functions probably don't support this. So, tl;dr: if you're doing cluster-wide balancing, this will cause us to ignore rack colocation as a trigger once any single table's replica count is greater than the number of racks, and the way to get more specific decisions would be to enable byTable balancing.
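
For reference, the per-table index suggested in this thread might look roughly like the sketch below. It is purely illustrative and was not part of this PR: `PerTableReplicaIndex` and its methods are hypothetical names, table names are plain strings, and, as noted above, the existing cost functions probably could not consume a per-table answer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-table replica index; not part of the patch under review. */
final class PerTableReplicaIndex {
  private final Map<String, Integer> maxReplicasByTable = new HashMap<>();

  /** Records one region replica for a table; replicaId is zero-based. */
  void registerRegion(String tableName, int replicaId) {
    maxReplicasByTable.merge(tableName, replicaId + 1, Integer::max);
  }

  /** Max replica count seen for one table, defaulting to 1 for unknown tables. */
  int maxReplicasFor(String tableName) {
    return maxReplicasByTable.getOrDefault(tableName, 1);
  }

  /** Cluster-wide maximum, equivalent to the single counter used in the patch. */
  int maxReplicasAcrossTables() {
    int max = 1;
    for (int count : maxReplicasByTable.values()) {
      max = Math.max(max, count);
    }
    return max;
  }

  /** Whether a rack check could be satisfied for this table given the rack count. */
  boolean rackCheckApplies(String tableName, int numRacks) {
    return numRacks >= maxReplicasFor(tableName);
  }
}
```

With two racks, a table with three replicas would skip the rack check while a table with two replicas would still be checked, whereas the cluster-wide maximum of three would skip the check for both.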

rmdmattingly · 2025-01-21T18:58:28Z

...st/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerRegionReplica.java

-    assertFalse(loadBalancer.needsBalance(HConstants.ENSEMBLE_TABLE_NAME,
-      new BalancerClusterState(map, null, null, new ForTestRackManagerOne())));


In the PR where we made the balancer ignorant of rack colocation, we also just naively switched this assertion to false. But the intention of this test was always for the assertion to be true.

I've also split this test up into two tests, because it was always testing two distinct cases (host and rack colocation).

Apache-HBase · 2025-01-21T19:27:07Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 28s		Docker mode activated.
-0 ⚠️	yetus	0m 2s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ master Compile Tests _
+1 💚	mvninstall	3m 7s		master passed
+1 💚	compile	0m 15s		master passed
+1 💚	javadoc	0m 13s		master passed
+1 💚	shadedjars	5m 47s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	2m 54s		the patch passed
+1 💚	compile	0m 15s		the patch passed
+1 💚	javac	0m 15s		the patch passed
+1 💚	javadoc	0m 11s		the patch passed
+1 💚	shadedjars	5m 43s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
+1 💚	unit	6m 28s		hbase-balancer in the patch passed.
		26m 23s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR	#6622
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux dbcc0b183cd8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `7d07c07`
Default Java	Eclipse Adoptium-17.0.11+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/testReport/
Max. process+thread count	256 (vs. ulimit of 30000)
modules	C: hbase-balancer U: hbase-balancer
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2025-01-21T19:30:13Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 30s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	hbaseanti	0m 0s		Patch does not have any anti-patterns.
			_ master Compile Tests _
+1 💚	mvninstall	3m 20s		master passed
+1 💚	compile	0m 24s		master passed
+1 💚	checkstyle	0m 10s		master passed
+1 💚	spotbugs	0m 24s		master passed
+1 💚	spotless	0m 45s		branch has no errors when running spotless:check.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 1s		the patch passed
+1 💚	compile	0m 23s		the patch passed
+1 💚	javac	0m 23s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 8s		the patch passed
+1 💚	spotbugs	0m 29s		the patch passed
+1 💚	hadoopcheck	11m 45s		Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚	spotless	0m 43s		patch has no errors when running spotless:check.
			_ Other Tests _
+1 💚	asflicense	0m 9s		The patch does not generate ASF License warnings.
		29m 28s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#6622
Optional Tests	dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname	Linux 6fbfecea5ce9 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `7d07c07`
Default Java	Eclipse Adoptium-17.0.11+9
Max. process+thread count	81 (vs. ulimit of 30000)
modules	C: hbase-balancer U: hbase-balancer
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6622/1/console
versions	git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

ndimiduk

Nice one, Ray. See my question about inferring maxReplicas and how we maybe ought to track it. Otherwise, LGTM.

ndimiduk · 2025-01-27T15:34:59Z

hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java

+    int numReplicas = region.getReplicaId() + 1;
+    if (numReplicas > maxReplicas) {
+      maxReplicas = numReplicas;
+    }


I'm pretty sure that you have insufficient information here. I'm looking at the call hierarchy of registerRegion and I believe that there's no limit by table. So you're approximating the maximum number of region replicas used by any table in the cluster? I suspect that you need to introduce a new index by table name to the number of replicas per table.

...e-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java

ndimiduk

Okay, LGTM.

…ated ignores rack colocation (apache#6622) (apache#6640) (will be in 2.6.3) Signed-off-by: Nick Dimiduk Co-authored-by: Ray Mattingly

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

7d07c07

…colocation

rmdmattingly requested a review from ndimiduk January 21, 2025 18:56

rmdmattingly commented Jan 21, 2025

rmdmattingly requested a review from Apache9 January 26, 2025 14:26

ndimiduk approved these changes Jan 27, 2025

rmdmattingly merged commit b89c825 into apache:master Jan 27, 2025
1 check passed

rmdmattingly added a commit that referenced this pull request Jan 27, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

9657eca

…colocation (#6622) Co-authored-by: Ray Mattingly Signed-off-by: Nick Dimiduk

rmdmattingly mentioned this pull request Jan 27, 2025

Backport "HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …" to branch-3 #6639

Merged

rmdmattingly added a commit that referenced this pull request Jan 27, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

c95945e

…colocation (#6622) Co-authored-by: Ray Mattingly Signed-off-by: Nick Dimiduk

rmdmattingly mentioned this pull request Jan 27, 2025

Backport "HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …" to branch-2 #6640

Merged

rmdmattingly added a commit that referenced this pull request Jan 28, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

bd3a20f

…colocation (#6622) (#6639) Signed-off-by: Nick Dimiduk Co-authored-by: Ray Mattingly

rmdmattingly added a commit that referenced this pull request Jan 28, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

9a6ae5a

…colocation (#6622) (#6640) Signed-off-by: Nick Dimiduk Co-authored-by: Ray Mattingly

rmdmattingly added a commit that referenced this pull request Jan 28, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

db07989

…colocation (#6622) (#6640) Signed-off-by: Nick Dimiduk Co-authored-by: Ray Mattingly

rmdmattingly mentioned this pull request Jan 28, 2025

Backport "HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …" to branch-2.6 #6644

Merged

rmdmattingly added a commit that referenced this pull request Jan 31, 2025

HBASE-29072 StochasticLoadBalancer#areReplicasColocated ignores rack …

9540d49

…colocation (#6622) (#6640) (#6644) Signed-off-by: Nick Dimiduk Co-authored-by: Ray Mattingly
