Skip to content

Conversation

@apurtell
Copy link
Contributor

@apurtell apurtell commented May 5, 2021

In CatalogJanitor we schedule GCRegionProcedure to clean up both filesystem and in-memory state after a split, and GCMultipleMergedRegionsProcedure to do the same for merges. Both of these procedures clean up in-memory state, but CatalogJanitor also does this redundantly just after scheduling the procedures. The cleanup should be done in only one place. Presumably we are using the procedures to do it in a principled way. This is least a nit, but probably a source of future bugs. Remove the redundancy in CatalogJanitor and fix any follow on issues, like test failures.

@apurtell
Copy link
Contributor Author

apurtell commented May 5, 2021

This change is moved here from #3230.

We know from the test report there that TestCatalogJanitorInMemoryStates will fail. Let's let it fail here too and then proceed.

java.lang.AssertionError: Parent region should have been removed from RegionStates
at org.apache.hadoop.hbase.master.janitor.TestCatalogJanitorInMemoryStates.testInMemoryParentCleanup(TestCatalogJanitorInMemoryStates.java:123)

@apurtell
Copy link
Contributor Author

apurtell commented May 5, 2021

@Apache9 added some context on #3230

Some background

https://issues.apache.org/jira/browse/HBASE-24942

I've found a bug when refactoring CatalogJanitor because of we may clean the merge qualifiers in MergeTableRegionsProcedure. Theoretically we should only do this in GCRegion related procedures(as what you proposed here), but doing this in other places could speed up later process which could be blocked by merge qualifiers.

For me I'm +1 on removing the redundant removal in CatalogJanitor, but let's wait for @saintstack 's opinon too?

Thanks.

@apurtell
Copy link
Contributor Author

apurtell commented May 5, 2021

Although the unit test will fail, when I tried this change out on a cluster in a very write heavy ingestion test, the end result was good. The ingestion test completes successfully and without weird artifacts in the logging. All split regions are GCed by procedures. In memory state aligns with filesystem state.

/hbase/logs/hbase-apurtell-master-ip-172-31-58-47.us-west-2.compute.internal.log:130972:2021-05-05 15:47:39,190 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 230 regions (0 disabled, 0 split parents) from in-memory state
/hbase/logs/hbase-apurtell-master-ip-172-31-58-47.us-west-2.compute.internal.log:130973:2021-05-05 15:47:39,190 DEBUG [master/ip-172-31-58-47:8100.Chore.1] 
 master.HbckChore: Regions by state: OPEN=230
/hbase/logs/hbase-apurtell-master-ip-172-31-58-47.us-west-2.compute.internal.log:130974:2021-05-05 15:47:39,190 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 230 regions from 5 regionservers' reports and found 0 orphan regions
/hbase/logs/hbase-apurtell-master-ip-172-31-58-47.us-west-2.compute.internal.log:130986:2021-05-05 15:47:39,200 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 3 tables 230 regions from filesyetem and found 0 orphan regions

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 40s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 46s master passed
+1 💚 compile 1m 19s master passed
+1 💚 shadedjars 8m 40s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 45s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 29s the patch passed
+1 💚 compile 1m 12s the patch passed
+1 💚 javac 1m 12s the patch passed
+1 💚 shadedjars 8m 17s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
-1 ❌ unit 10m 11s hbase-server in the patch failed.
42m 23s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3234
Optional Tests javac javadoc unit shadedjars compile
uname Linux 061f988fd6d0 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / eb9b543
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/testReport/
Max. process+thread count 762 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 57s master passed
+1 💚 compile 3m 8s master passed
+1 💚 checkstyle 1m 4s master passed
+1 💚 spotbugs 2m 4s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 41s the patch passed
+1 💚 compile 3m 11s the patch passed
+1 💚 javac 3m 11s the patch passed
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 18m 1s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 12s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
47m 28s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3234
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 29f65b080f67 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / eb9b543
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@saintstack saintstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to one-place only (in the procedure).

I've lost context on why the CJ redundant removes. I defer to the test you ran Andrew at scale.

(I just noticed the wonky split message -- have started a bit of ITBLL over here...)

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 10s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 19s master passed
+1 💚 compile 1m 3s master passed
+1 💚 shadedjars 8m 56s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 3s the patch passed
+1 💚 compile 1m 5s the patch passed
+1 💚 javac 1m 5s the patch passed
+1 💚 shadedjars 9m 15s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
-1 ❌ unit 259m 28s hbase-server in the patch failed.
292m 41s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3234
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6024d088b446 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / eb9b543
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/testReport/
Max. process+thread count 2827 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apurtell
Copy link
Contributor Author

apurtell commented May 6, 2021

(I just noticed the wonky split message -- have started a bit of ITBLL over here...)

I've committed a few changes over the past couple of days that address two issues here. One was an accumulation of SPLIT state regionStates for as long as CatalogJanitor is deferring cleanup, because daughters still have references, because compaction is backed up. The accumulation is fine but it caused concerning warnings at each balancer iteration. Another was a race condition between split handling and regionserver report handling that could cause multiple split procedures to get scheduled for the same split request. Only one could succeed. The others would cause log noise but their failures were harmless. Would be good to get a second opinion.

@apurtell
Copy link
Contributor Author

apurtell commented May 6, 2021

Ok, let me rebase this and fix the unit tests.

apurtell added 2 commits May 5, 2021 17:13
…itor

In CatalogJanitor we schedule GCRegionProcedure to clean up both
filesystem and in-memory state after a split, and
GCMultipleMergedRegionsProcedure to do the same for merges. Both of these
procedures clean up in-memory state, but CatalogJanitor also does this
redundantly just after scheduling the procedures. The cleanup should be
done in only one place. Presumably we are using the procedures to do it in
a principled way. Remove the redundancy in CatalogJanitor and fix any
follow on issues, like test failures.
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 7s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 14s master passed
+1 💚 compile 3m 13s master passed
+1 💚 checkstyle 1m 4s master passed
+1 💚 spotbugs 2m 7s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 49s the patch passed
+1 💚 compile 3m 17s the patch passed
+1 💚 javac 3m 17s the patch passed
+1 💚 checkstyle 1m 3s the patch passed
-0 ⚠️ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 18m 39s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 18s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
49m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3234
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux ca8965295a6f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cc88cf0
Default Java AdoptOpenJDK-1.8.0_282-b08
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 32s master passed
+1 💚 compile 1m 12s master passed
+1 💚 shadedjars 8m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 15s the patch passed
+1 💚 compile 1m 12s the patch passed
+1 💚 javac 1m 12s the patch passed
+1 💚 shadedjars 8m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
-1 ❌ unit 140m 32s hbase-server in the patch failed.
172m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3234
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8ed77b4bd87a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cc88cf0
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/testReport/
Max. process+thread count 4533 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apurtell
Copy link
Contributor Author

apurtell commented May 6, 2021

TestReplicationSource failure is not related.

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.258 s - in org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource
[INFO] 
[INFO] Results:
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0

Actual failing test TestCatalogJanitorInMemoryStates was fixed with e7db51f .

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 2m 42s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 32s master passed
+1 💚 compile 1m 31s master passed
+1 💚 shadedjars 10m 41s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 48s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 1s the patch passed
+1 💚 compile 1m 19s the patch passed
+1 💚 javac 1m 19s the patch passed
+1 💚 shadedjars 10m 39s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 50s the patch passed
_ Other Tests _
+1 💚 unit 223m 40s hbase-server in the patch passed.
264m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3234
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4890c5b69764 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cc88cf0
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/testReport/
Max. process+thread count 2629 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3234/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are just removing what scheduled Procs are anyways going to do right, sounds good. +1

@apurtell apurtell merged commit 6309c09 into apache:master May 6, 2021
@apurtell apurtell deleted the HBASE-25854 branch May 6, 2021 16:13
asfgit pushed a commit that referenced this pull request May 6, 2021
…itor (#3234)

In CatalogJanitor we schedule GCRegionProcedure to clean up both
filesystem and in-memory state after a split, and
GCMultipleMergedRegionsProcedure to do the same for merges. Both of these
procedures clean up in-memory state, but CatalogJanitor also does this
redundantly just after scheduling the procedures. The cleanup should be
done in only one place. Presumably we are using the procedures to do it in
a principled way. Remove the redundancy in CatalogJanitor and fix any
follow on issues, like test failures.

Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Michael Stack <[email protected]>
Signed-off-by: Viraj Jasani <[email protected]>
asfgit pushed a commit that referenced this pull request May 6, 2021
…itor (#3234)

In CatalogJanitor we schedule GCRegionProcedure to clean up both
filesystem and in-memory state after a split, and
GCMultipleMergedRegionsProcedure to do the same for merges. Both of these
procedures clean up in-memory state, but CatalogJanitor also does this
redundantly just after scheduling the procedures. The cleanup should be
done in only one place. Presumably we are using the procedures to do it in
a principled way. Remove the redundancy in CatalogJanitor and fix any
follow on issues, like test failures.

Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Michael Stack <[email protected]>
Signed-off-by: Viraj Jasani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants