Skip to content

Conversation

@virajjasani
Copy link
Contributor

Namenode heartbeat service should provide error with full stacktrace if it cannot register namenode in the state store. As of today, we only log info msg.

For zookeeper based impl, this might mean either a) curator manager is not initialized or b) if it fails to write to znode after exhausting retries. For either of these cases, reporting only INFO log might not be good enough and we might have to look for errors elsewhere.

Sample example:

2023-02-20 23:10:33,714 DEBUG [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Received service state: ACTIVE from HA namenode: {ns}-nn0:nn-0-{ns}.{cluster}:9000
2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] impl.MembershipStoreImpl - Inserting new NN registration: nn-0.namenode.{cluster}:8888->{ns}:nn0:nn-0-{ns}.{cluster}:9000-ACTIVE
2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store

If we could log full stacktrace:

2023-02-21 00:20:24,691 ERROR [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store
org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: State Store driver StateStoreZooKeeperImpl in nn-0.namenode.{cluster} is not ready.
        at org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.verifyDriverReady(StateStoreDriver.java:158)
        at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.putAll(StateStoreZooKeeperImpl.java:235)
        at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreBaseImpl.put(StateStoreBaseImpl.java:74)
        at org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl.namenodeHeartbeat(MembershipStoreImpl.java:179)
        at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:381)
        at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:317)
        at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.lambda$periodicInvoke$0(NamenodeHeartbeatService.java:244)
...
... 

@virajjasani
Copy link
Contributor Author

@goiri @slfan1989 could you please review this PR?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 25s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 41s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 47s trunk passed
+1 💚 javadoc 0m 50s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 57s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 35s trunk passed
+1 💚 shadedclient 20m 41s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 35s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 33s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 53s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 20s the patch passed
+1 💚 shadedclient 20m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 24s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
115m 23s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/1/artifact/out/Dockerfile
GITHUB PR #5470
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ebbd91d060b3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 67c4a12
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/1/testReport/
Max. process+thread count 2689 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Comment on lines 320 to 324
} catch (IOException e) {
LOG.info("Cannot register namenode in the State Store");
LOG.error("Cannot register namenode in the State Store", e);
} catch (Exception ex) {
LOG.error("Unhandled exception updating NN registration for {}",
getNamenodeDesc(), ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge the both blocks, now both are logging error, with the later just having additional getNamenodeDesc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so you mean having getNamenodeDesc in general for both is fine right? I think that's good idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 49s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 42s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 1s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 20m 42s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 33s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 52s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 18s the patch passed
+1 💚 shadedclient 20m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 30s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
115m 19s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/2/artifact/out/Dockerfile
GITHUB PR #5470
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f4ca0638ca8c 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e08544b
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/2/testReport/
Max. process+thread count 2727 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5470/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani virajjasani requested a review from ayushtkn March 12, 2023 18:47
Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jojochuang jojochuang merged commit 15935fa into apache:trunk Mar 15, 2023
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants