@rmdmattingly
Contributor

Jira

On top of actually making the balancer better, this PR reduces the runtime of the balancer test suite on my machine from about 30min to about 8min

cc @charlesconnell @hgromer @krconv @ksravista

Comment on lines +288 to +290
if (balancedCluster == null) {
return "null";
}
Contributor Author

A null list here likely means a test failure, but you'll probably find a clearer failure message somewhere other than here

assertFullyBalancedForReplicas);
}

protected void testWithCluster(Map<ServerName, List<RegionInfo>> serverMap,
Contributor Author

I don't think there are any tests where it actually makes sense to run without iteration

Comment on lines 72 to 82
protected void increaseMaxRunTimeOrFail() {
long current = getCurrentMaxRunTimeMs();
assertTrue(current < MAX_MAX_RUN_TIME_MS);
setMaxRunTime(Math.max(MAX_MAX_RUN_TIME_MS, current * 2));
}
Contributor Author

My goal here was just to make this test framework ambitious, but forgiving. If your cluster is too complicated to be evaluated meaningfully in the initial runtime, then we should bump it up before failing
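
The "start small, bump before failing" idea described above can be sketched roughly as follows. This is a standalone illustration, not the PR's code: `EscalatingBudgetSketch`, `nextBudget`, `runWithEscalation`, and the 60-second ceiling are hypothetical, and the sketch assumes the doubled budget should be capped at the ceiling.

```java
import java.util.function.LongPredicate;

/** Hypothetical sketch of an "ambitious, but forgiving" time budget. */
final class EscalatingBudgetSketch {
  // Assumed values for illustration only; 250 matches the test config quoted in this PR,
  // the ceiling is made up.
  static final long INITIAL_MAX_RUN_TIME_MS = 250;
  static final long MAX_MAX_RUN_TIME_MS = 60_000;

  /** Doubles the current budget, never exceeding the ceiling. */
  static long nextBudget(long currentMs, long ceilingMs) {
    return Math.min(ceilingMs, currentMs * 2);
  }

  /**
   * Calls attempt with a growing budget until it reports success.
   * Returns the budget that succeeded, or fails once the ceiling itself was not enough.
   */
  static long runWithEscalation(LongPredicate attempt, long initialMs, long ceilingMs) {
    long budget = initialMs;
    while (true) {
      if (attempt.test(budget)) {
        return budget;
      }
      if (budget >= ceilingMs) {
        throw new AssertionError("no success within " + ceilingMs + " ms");
      }
      budget = nextBudget(budget, ceilingMs);
    }
  }

  public static void main(String[] args) {
    // Stand-in for "the balancer produced a balanced cluster within the budget".
    long used = runWithEscalation(ms -> ms >= 1_000,
      INITIAL_MAX_RUN_TIME_MS, MAX_MAX_RUN_TIME_MS);
    System.out.println("succeeded with a budget of " + used + " ms");
  }
}
```

Cheap cases finish at the first, small budget, and only the hard ones pay for the longer runs.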

Comment on lines +248 to +245
StochasticLoadTestBalancer() {
super(new DummyMetricsStochasticBalancer());
}
Contributor Author

The balancer test suite actually couldn't be run in its entirety without this addition, because otherwise there are JMX conflicts. So this is just an unrelated bug fix that I forgot to separate in my planning

Member

Do we see the impact of this bug in our CI runs?

Contributor Author

Surprisingly we don't. I don't understand exactly how the CI env differs from my local env to make that the case, but presumably it has better isolation between the tests so that there's no conflict in the JMX setup when running many tests at once

Member

Maybe it's a difference between running in an IDE and running directly from the Maven CLI?
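
For context, here is one way a JMX conflict can arise when many tests share a JVM. It assumes the metrics source registers itself under a fixed `ObjectName`, which this thread does not confirm. `JmxNameConflictSketch`, `GaugeMXBean`, `Gauge`, and the object name are hypothetical; only the `javax.management` calls are real.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceAlreadyExistsException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxNameConflictSketch {
  /** Hypothetical management interface; the MXBean suffix makes it a valid MXBean. */
  public interface GaugeMXBean {
    int getValue();
  }

  /** Hypothetical stand-in for a metrics source. */
  static final class Gauge implements GaugeMXBean {
    @Override
    public int getValue() {
      return 42;
    }
  }

  /** Registers two beans under one name; returns true if the second one is rejected. */
  static boolean secondRegistrationConflicts(String name) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      ObjectName objectName = new ObjectName(name);
      server.registerMBean(new Gauge(), objectName);
      try {
        // Same name, same JVM-wide MBeanServer: the server refuses the duplicate.
        server.registerMBean(new Gauge(), objectName);
        return false;
      } catch (InstanceAlreadyExistsException e) {
        return true;
      } finally {
        server.unregisterMBean(objectName);
      }
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(secondRegistrationConflicts("sketch.example:type=Gauge"));
  }
}
```

If the test runner forks a fresh JVM per test class, each class gets its own platform `MBeanServer`, which would be consistent with the guess above about better isolation in CI.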

@rmdmattingly rmdmattingly force-pushed the HBASE-29070 branch 3 times, most recently from 90b49ea to bfac6e2 Compare January 13, 2025 01:10
@ndimiduk
Member

On top of actually making the balancer better, this PR reduces the runtime of the balancer test suite on my machine from about 30min to about 8min

!!!

Member

@ndimiduk left a comment

Just some nits. This is good cleanup and the shorter runtime is a bonus!


public static final double COST_EPSILON = 0.0001;
public static double getCostEpsilon(double cost) {
return Math.ulp(cost);
Member

TIL
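
For anyone else meeting it for the first time: `Math.ulp(x)` returns the gap between `x` and the next double of larger magnitude, so an epsilon derived from it scales with the cost being compared instead of being a fixed constant such as 0.0001. A minimal sketch follows; `UlpEpsilonSketch` and `roughlyEqual` are hypothetical helpers, and how the PR actually applies `getCostEpsilon` is not shown in this thread.

```java
final class UlpEpsilonSketch {
  /**
   * Hypothetical helper: treats two costs as equal when they differ by
   * at most one ulp of the larger magnitude.
   */
  static boolean roughlyEqual(double a, double b) {
    double epsilon = Math.ulp(Math.max(Math.abs(a), Math.abs(b)));
    return Math.abs(a - b) <= epsilon;
  }

  public static void main(String[] args) {
    // The spacing between adjacent doubles grows with magnitude.
    System.out.println("ulp(1.0)       = " + Math.ulp(1.0));
    System.out.println("ulp(1000000.0) = " + Math.ulp(1_000_000.0));

    // Rounding noise is absorbed, a real difference is not.
    System.out.println(roughlyEqual(0.1 + 0.2, 0.3));
    System.out.println(roughlyEqual(1.0, 1.0001));
  }
}
```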

conf.setClass("hbase.util.ip.to.rack.determiner", MockMapping.class, DNSToSwitchMapping.class);
conf.setFloat("hbase.master.balancer.stochastic.localityCost", 0);
conf.setBoolean("hbase.master.balancer.stochastic.runMaxSteps", true);
conf.setLong(StochasticLoadBalancer.MAX_RUNNING_TIME_KEY, 250);
Member

Why override with such a small value?

Contributor Author

Many tests really only need this much, so starting small makes the test suite much faster

loadBalancer.loadConf(conf);
}

protected void testWithClusterWithIteration(int numNodes, int numRegions, int numRegionsPerServer,
Member

I know it's not your code, but this argument list is long enough and has enough overlapping argument types that I think a builder object is warranted. It would be nice if we could introduce something like http://immutables.github.io/ but that's not this PR.
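
A hand-rolled version of that builder might look roughly like this. It is only a sketch: `ClusterTestSpecSketch` is a hypothetical name, it covers just the arguments visible in this diff, and the defaults and validation are assumptions.

```java
/** Hypothetical parameter object for the cluster test helpers. */
final class ClusterTestSpecSketch {
  final int numNodes;
  final int numRegions;
  final int numRegionsPerServer;
  final boolean assertFullyBalancedForReplicas;

  private ClusterTestSpecSketch(Builder b) {
    this.numNodes = b.numNodes;
    this.numRegions = b.numRegions;
    this.numRegionsPerServer = b.numRegionsPerServer;
    this.assertFullyBalancedForReplicas = b.assertFullyBalancedForReplicas;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Assumed defaults, chosen only so the sketch is usable on its own.
    private int numNodes = 1;
    private int numRegions = 1;
    private int numRegionsPerServer = 1;
    private boolean assertFullyBalancedForReplicas = false;

    Builder numNodes(int value) {
      this.numNodes = value;
      return this;
    }

    Builder numRegions(int value) {
      this.numRegions = value;
      return this;
    }

    Builder numRegionsPerServer(int value) {
      this.numRegionsPerServer = value;
      return this;
    }

    Builder assertFullyBalancedForReplicas(boolean value) {
      this.assertFullyBalancedForReplicas = value;
      return this;
    }

    ClusterTestSpecSketch build() {
      if (numNodes <= 0 || numRegions < 0 || numRegionsPerServer <= 0) {
        throw new IllegalArgumentException("invalid cluster dimensions");
      }
      return new ClusterTestSpecSketch(this);
    }
  }

  public static void main(String[] args) {
    ClusterTestSpecSketch spec = ClusterTestSpecSketch.builder()
      .numNodes(20)
      .numRegions(400)
      .numRegionsPerServer(40)
      .assertFullyBalancedForReplicas(true)
      .build();
    System.out.println(spec.numNodes + " nodes, " + spec.numRegions + " regions");
  }
}
```

Named setters make it much harder to swap two adjacent `int` arguments by accident, which is the main risk with the current positional signature.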

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 36s master passed
+1 💚 compile 0m 21s master passed
+1 💚 javadoc 0m 18s master passed
+1 💚 shadedjars 6m 26s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 32s the patch passed
+1 💚 compile 0m 24s the patch passed
+1 💚 javac 0m 24s the patch passed
+1 💚 javadoc 0m 16s the patch passed
+1 💚 shadedjars 6m 37s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 7m 11s hbase-balancer in the patch passed.
30m 33s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6597/6/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6597
Optional Tests javac javadoc unit compile shadedjars
uname Linux fbdf52d8ac35 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 807c1c4
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6597/6/testReport/
Max. process+thread count 251 (vs. ulimit of 30000)
modules C: hbase-balancer U: hbase-balancer
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6597/6/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 4m 1s master passed
+1 💚 compile 0m 33s master passed
+1 💚 checkstyle 0m 10s master passed
+1 💚 spotbugs 0m 28s master passed
+1 💚 spotless 0m 59s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 43s the patch passed
+1 💚 compile 0m 27s the patch passed
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 10s the patch passed
+1 💚 spotbugs 0m 38s the patch passed
+1 💚 hadoopcheck 13m 22s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 51s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 8s The patch does not generate ASF License warnings.
34m 27s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6597/6/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6597
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux a2b9bcbb2ed0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 807c1c4
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-balancer U: hbase-balancer
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6597/6/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@rmdmattingly rmdmattingly merged commit 58b742b into apache:master Jan 13, 2025
1 check passed
@rmdmattingly rmdmattingly deleted the HBASE-29070 branch January 13, 2025 14:46
rmdmattingly added a commit that referenced this pull request Jan 13, 2025
Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Nick Dimiduk <[email protected]>
rmdmattingly added a commit that referenced this pull request Jan 14, 2025
Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Nick Dimiduk <[email protected]>
rmdmattingly added a commit that referenced this pull request Jan 16, 2025
rmdmattingly added a commit that referenced this pull request Jan 17, 2025
rmdmattingly added a commit that referenced this pull request Jan 20, 2025
rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jan 28, 2025
…cise (apache#6597) (apache#6600) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Jan 28, 2025
…cise (apache#6597) (apache#6600) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Mar 5, 2025
…cise (apache#6597) (apache#6600) (will be in 2.6.3)

Signed-off-by: Nick Dimiduk <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request Sep 26, 2025
3 participants