@virajjasani
Contributor

@virajjasani virajjasani commented Apr 16, 2025

Jira: HBASE-29251

Comment on lines 159 to 162
if (updateFailForTest) {
  // test for HBASE-29251
  throw new IOException("Update failed");
}
Contributor Author

@virajjasani virajjasani Apr 16, 2025

Not a good way to test this, but since MasterRegion is a final class, extending it is not possible either.

With this flag, the test reproduces the exact issue when we don't abort the master on an IOException.

Contributor Author

I think it should be fine to remove final from the MasterRegion class so that we can extend it for testing purposes.

Contributor

Let's do that and then make this test cleaner.

Contributor

MasterRegion has a private constructor; that's why we mark it as final.

Since UpdateMasterRegion is just an interface, I think the changes are very easy to verify?

Contributor

@Apache9 Apache9 Apr 17, 2025

And inside MasterRegion we just use HRegion. For HRegion, we have a way to inject a specific implementation class. Please see HRegion.newHRegion:

  public static HRegion newHRegion(Path tableDir, WAL wal, FileSystem fs, Configuration conf,
    RegionInfo regionInfo, final TableDescriptor htd, RegionServerServices rsServices) {
    try {
      @SuppressWarnings("unchecked")
      Class<? extends HRegion> regionClass =
        (Class<? extends HRegion>) conf.getClass(HConstants.REGION_IMPL, HRegion.class);

      Constructor<? extends HRegion> c =
        regionClass.getConstructor(Path.class, WAL.class, FileSystem.class, Configuration.class,
          RegionInfo.class, TableDescriptor.class, RegionServerServices.class);

      return c.newInstance(tableDir, wal, fs, conf, regionInfo, htd, rsServices);
    } catch (Throwable e) {
      // todo: what should I throw here?
      throw new IllegalStateException("Could not instantiate a region instance.", e);
    }
  }
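
The reflection pattern in newHRegion can be tried out in isolation. The following is a minimal sketch, not HBase code: SketchRegion, SketchTestRegion, RegionFactorySketch and the key "sketch.region.impl" are hypothetical stand-ins, and a plain Map stands in for Configuration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Stand-in for HRegion; NOT the real HBase class.
class SketchRegion {
  protected final String name;

  public SketchRegion(String name) {
    this.name = name;
  }

  public String describe() {
    return "default:" + name;
  }
}

// Test-only subclass that a test could select through configuration.
class SketchTestRegion extends SketchRegion {
  public SketchTestRegion(String name) {
    super(name);
  }

  @Override
  public String describe() {
    return "test:" + name;
  }
}

class RegionFactorySketch {
  // Hypothetical key; the quoted code reads HConstants.REGION_IMPL instead.
  static final String REGION_IMPL = "sketch.region.impl";

  // Looks up the implementation class, falling back to the default,
  // and instantiates it through its public constructor.
  static SketchRegion newRegion(Map<String, Class<? extends SketchRegion>> conf, String name) {
    try {
      Class<? extends SketchRegion> regionClass =
        conf.getOrDefault(REGION_IMPL, SketchRegion.class);
      Constructor<? extends SketchRegion> c = regionClass.getConstructor(String.class);
      return c.newInstance(name);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate a region instance.", e);
    }
  }
}
```

With an empty map the factory returns the default class; after putting SketchTestRegion.class under the key, the same call returns the test subclass, which is the hook a test can use to change behaviour without touching production code.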

Contributor Author

Any custom implementation of UpdateMasterRegion also needs custom hooks in the Procedure executor classes, which makes it more complicated to test.

Contributor Author

Thanks @Apache9 @apurtell! Updated the test, now it's clean.

Contributor

You need to define what you want to test. If you just want to make sure that abort is called when there is an exception, that is very easy and does not even need a cluster to be brought up.

If you want to do something like an integration test, you can extend HRegion, and there are a bunch of ways to decide whether to throw an exception in the batchMutate method. You can make the specific HRegion implementation an inner class of the test case, and set a static field in the test class to control whether to throw an exception...
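
The inner-class-plus-static-field idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: MutableRegionSketch stands in for HRegion, its one-argument batchMutate is a simplified hypothetical signature, and FailureInjectionSketch stands in for the test class.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for HRegion with a single mutating method; NOT the real HBase class.
class MutableRegionSketch {
  int applied = 0; // number of mutations that went through

  public void batchMutate(String mutation) throws IOException {
    applied++;
  }
}

// Stand-in for the test class.
class FailureInjectionSketch {
  // The test flips this static flag to decide whether mutations fail.
  static final AtomicBoolean FAIL_UPDATES = new AtomicBoolean(false);

  // Test-only implementation, nested inside the test class.
  static class FailingRegion extends MutableRegionSketch {
    @Override
    public void batchMutate(String mutation) throws IOException {
      if (FAIL_UPDATES.get()) {
        throw new IOException("Update failed");
      }
      super.batchMutate(mutation);
    }
  }

  // Returns true if the mutation was applied, false if it threw an IOException.
  static boolean tryMutate(MutableRegionSketch region, String mutation) {
    try {
      region.batchMutate(mutation);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

Keeping the flag static means the test can toggle failures even though the region instance itself is created reflectively, out of the test's reach.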

@virajjasani virajjasani requested a review from Apache9 April 16, 2025 05:52
@virajjasani virajjasani requested a review from apurtell April 16, 2025 15:08
Contributor

@apurtell apurtell left a comment

Approved but do consider a cleaner test

  server.abort("WAL sync timeout", e);
} catch (IOException e) {
  LOG.error(HBaseMarkers.FATAL, "MasterRegion mutation is not successful. Aborting server.");
  server.abort("MasterRegion mutation is not successful", e);
Contributor

Aborting is a start.
The rest of my question here is addressed by the discussion on the JIRA.
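
For readers who want to see the abort-on-IOException behaviour without a cluster, here is a minimal sketch. AbortableSketch, UpdateActionSketch, RecordingServerSketch and AbortOnFailureSketch are hypothetical names invented for this example; only the abort message is taken from the diff above.

```java
import java.io.IOException;

// Hypothetical stand-in for the server that can be aborted.
interface AbortableSketch {
  void abort(String why, Throwable cause);
}

// Hypothetical stand-in for the update callback.
interface UpdateActionSketch {
  void update() throws IOException;
}

// Fake server that only records whether abort() was called.
class RecordingServerSketch implements AbortableSketch {
  String abortReason; // stays null until abort() is called

  @Override
  public void abort(String why, Throwable cause) {
    abortReason = why;
  }
}

class AbortOnFailureSketch {
  // Runs the update; any IOException is treated as fatal and aborts the server.
  static void update(AbortableSketch server, UpdateActionSketch action) {
    try {
      action.update();
    } catch (IOException e) {
      server.abort("MasterRegion mutation is not successful", e);
    }
  }
}
```

A test can pass a lambda that throws IOException and then check that abortReason is set, which exercises the catch branch with no cluster involved.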

} catch (WALSyncTimeoutIOException e) {
  LOG.error(HBaseMarkers.FATAL, "WAL sync timeout. Aborting server.");
  server.abort("WAL sync timeout", e);
} catch (IOException e) {
Contributor

Better to add some comments here that summarize the discussion on the Jira and point to it, so that later developers know why we abort on any IOException here.

Contributor Author

Done, thanks!

public static void setUpBeforeClass() throws Exception {
  TEST_UTIL.getConfiguration().setClass(HConstants.REGION_IMPL, TestRegion.class, HRegion.class);
  StartTestingClusterOption.Builder builder = StartTestingClusterOption.builder();
  builder.numMasters(4).numRegionServers(3);
Contributor

We need 4 masters?

Contributor Author

@virajjasani virajjasani Apr 18, 2025

3 would also work (2 are aborted), but I just kept one additional master. I can reduce it to 3 if you are not fine with this.

Contributor

@Reidddddd Reidddddd left a comment

LGTM

// RegionTooBusyException is the type of IOException for which we can retry
// for few times before aborting the active master. The master region might
// have genuine case for delayed flushes and/or some procedure bug causing
// heavy pressure on the memstore.
Contributor

If RegionTooBusyException is caught, we can trigger flusherAndCompactor.onUpdate() here.

Contributor Author

You mean for the tries == (maxRetriesForRegionUpdates - 1) condition? Otherwise, it is called anyway on every retry, as per the loop above.

Contributor

I mean like this:

 } catch (RegionTooBusyException e) {
   flusherAndCompactor.onUpdate();
   if (tries == (maxRetriesForRegionUpdates - 1)) {
     ***
   }
 }

It can happen that the time interval hasn't been reached but the changesAfterLastFlush threshold has been met, so when this exception is caught, you need to trigger flusherAndCompactor.onUpdate().

Otherwise, you need to switch the execution order:

flusherAndCompactor.onUpdate(); // first
action.update(region); // after
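
The first suggestion above (nudge the flusher whenever the region reports it is too busy, then retry) can be sketched as below. This is a simplified model, not the merged code: TooBusySketchException stands in for RegionTooBusyException, requestFlush() stands in for flusherAndCompactor.onUpdate(), and the aborted flag stands in for aborting the master.

```java
import java.io.IOException;

// Hypothetical stand-in for RegionTooBusyException.
class TooBusySketchException extends IOException {
  TooBusySketchException(String msg) {
    super(msg);
  }
}

class RetryWithFlushSketch {
  int flushRequests = 0;   // how many times the flusher was nudged
  int attempts = 0;        // how many times the update was attempted
  boolean aborted = false; // stands in for aborting the master

  interface Update {
    void run() throws IOException;
  }

  // Stand-in for flusherAndCompactor.onUpdate().
  void requestFlush() {
    flushRequests++;
  }

  void update(Update action, int maxRetries) {
    for (int tries = 0; tries < maxRetries; tries++) {
      attempts++;
      try {
        action.run();
        requestFlush(); // normal path: notify the flusher after a successful update
        return;
      } catch (TooBusySketchException e) {
        requestFlush(); // busy path: ask for a flush so a later attempt can succeed
        if (tries == maxRetries - 1) {
          aborted = true; // retries exhausted
        }
      } catch (IOException e) {
        aborted = true; // any other IOException is fatal
        return;
      }
    }
  }
}
```

The sketch omits the pause between attempts; the point is only that the flush request is issued on the busy path as well, so a full memstore does not block every retry.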

Contributor Author

Makes sense!

@Reidddddd
Contributor

Will you add a back-off feature in the next ticket?

    abortServer(e);
  }
  LOG.info("Master region {} is too busy... retry attempt: {}", region, tries);
  Threads.sleep(ConnectionUtils.getPauseTime(regionUpdateRetryPauseTime, tries));
Contributor Author

@Reidddddd exponential backoff is added here. I think I should add a comment here, because a single line is not readable enough. Let me do that.
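
To illustrate what a pause that grows with the retry count looks like, here is a minimal table-driven sketch. It does not reproduce ConnectionUtils.getPauseTime: BackoffSketch and its multiplier table are assumptions made up for this example, and the real helper may use a different table and may add jitter.

```java
class BackoffSketch {
  // Illustrative multipliers, assumed for this sketch only.
  private static final int[] MULTIPLIERS = { 1, 2, 4, 8, 16 };

  // Returns the pause for the given attempt: the base pause scaled by a
  // multiplier that grows with the retry count and is capped at the last entry.
  static long pauseMillis(long basePauseMillis, int tries) {
    int index = Math.max(0, Math.min(tries, MULTIPLIERS.length - 1));
    return basePauseMillis * MULTIPLIERS[index];
  }
}
```

Capping the index keeps the pause bounded once the retry count runs past the end of the table.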

Contributor

@Reidddddd Reidddddd left a comment

+1

@virajjasani
Contributor Author

Thanks everyone for the reviews! Awaiting final build results before merging the PR; the Jenkins build is still stuck scheduling the build to a VM.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 40s master passed
+1 💚 compile 3m 22s master passed
+1 💚 checkstyle 0m 41s master passed
+1 💚 spotbugs 1m 47s master passed
+1 💚 spotless 0m 52s branch has no errors when running spotless:check.
-0 ⚠️ patch 0m 59s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 28s the patch passed
+1 💚 compile 3m 14s the patch passed
+1 💚 javac 3m 14s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 39s the patch passed
+1 💚 spotbugs 1m 53s the patch passed
+1 💚 hadoopcheck 14m 33s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 56s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 22s The patch does not generate ASF License warnings.
44m 23s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6910/8/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6910
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 0ad8e13010a8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 47edd9a
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6910/8/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 19s master passed
+1 💚 compile 0m 56s master passed
+1 💚 javadoc 0m 28s master passed
+1 💚 shadedjars 5m 55s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ patch 6m 3s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 3s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 5m 52s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 211m 28s /patch-unit-hbase-server.txt hbase-server in the patch failed.
237m 13s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6910/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6910
Optional Tests javac javadoc unit compile shadedjars
uname Linux a54f91b8d3a0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 47edd9a
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6910/8/testReport/
Max. process+thread count 4947 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6910/8/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani virajjasani merged commit f0e069e into apache:master Apr 23, 2025
1 check failed
virajjasani added a commit that referenced this pull request Apr 23, 2025
…sisted (#6910)

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: gvprathyusha6 <[email protected]>
virajjasani added a commit that referenced this pull request Apr 23, 2025
…sisted (#6916) (#6910)

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: gvprathyusha6 <[email protected]>
virajjasani added a commit that referenced this pull request Apr 23, 2025
…sisted (#6916) (#6910)

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: gvprathyusha6 <[email protected]>
virajjasani added a commit that referenced this pull request Apr 23, 2025
…sisted (#6916) (#6910)

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: gvprathyusha6 <[email protected]>
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
…sisted (apache#6916) (apache#6910)

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: gvprathyusha6 <[email protected]>
6 participants