
Conversation

@eubnara (Contributor) commented Jan 30, 2026

Description of PR

DatanodeID.updateRegInfo() updates hostName but misses the cached hostNameBytes.

Since PBHelperClient.convert(DatanodeID) uses getHostNameBytes() for protobuf
serialization, clients end up receiving the stale hostname from before the
re-registration.
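
For context, here is a minimal sketch of the serialization path described above. The builder calls are illustrative of how PBHelperClient.convert(DatanodeID) reads the cached ByteString fields; this is not verbatim trunk code:

```java
// Protobuf conversion reads the cached ByteString, not the freshly
// updated hostName String, so the stale bytes go over the wire.
public static DatanodeIDProto convert(DatanodeID dn) {
  return DatanodeIDProto.newBuilder()
      .setIpAddrBytes(dn.getIpAddrBytes())      // kept fresh by setIpAndXferPort()
      .setHostNameBytes(dn.getHostNameBytes())  // stale after updateRegInfo(), pre-fix
      .setXferPort(dn.getXferPort())
      // ... remaining fields elided
      .build();
}
```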

This becomes a real problem when a DataNode first registers with a PQDN
(partially qualified domain name) and later re-registers with an FQDN
(fully qualified domain name). With dfs.client.use.datanode.hostname=true,
the client tries to connect using the old PQDN and fails with
UnknownHostException.

The fix is to add hostNameBytes = nodeReg.getHostNameBytes() in updateRegInfo(),
mirroring how setIpAndXferPort() already keeps ipAddr and ipAddrBytes in sync.
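
A minimal sketch of the change, with the method body abridged to the host-name handling the patch touches:

```java
// DatanodeID.updateRegInfo(DatanodeID nodeReg), abridged.
public void updateRegInfo(DatanodeID nodeReg) {
  hostName = nodeReg.getHostName();
  hostNameBytes = nodeReg.getHostNameBytes(); // the fix: refresh the cached bytes too
  // ipAddr/ipAddrBytes, the ports, etc. are updated as before;
  // setIpAndXferPort() already keeps ipAddr and ipAddrBytes in sync.
}
```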

In my environment, I use the following configuration:

dfs.client.use.datanode.hostname=true
hadoop.security.token.service.use_ip=false

I got an UnknownHostException while reproducing this issue
(hostnames, IP addresses, and usernames below are anonymized for privacy).

  • In the logs below, datanode001 appears as a PQDN; the correct FQDN is datanode001.example.com.
  • I stopped the DataNode for approximately 10 minutes and 30 seconds (2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval; see the sketch after this list) so that the NameNode would recognize it as dead. After that, I restarted the DataNode and also restarted both the active and standby NameNodes; after these steps, the issue was resolved.
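
For reference, the 10-minute-30-second figure in the bullet above follows directly from that dead-node formula, using the HDFS defaults (dfs.namenode.heartbeat.recheck-interval = 300000 ms, dfs.heartbeat.interval = 3 s):

```java
public class DeadNodeExpiry {
  public static void main(String[] args) {
    long recheckIntervalMs = 300_000L;  // dfs.namenode.heartbeat.recheck-interval default
    long heartbeatIntervalMs = 3_000L;  // dfs.heartbeat.interval default (3 s)
    long expiryMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;
    System.out.println(expiryMs + " ms"); // 630000 ms = 10 minutes 30 seconds
  }
}
```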
$ HADOOP_ROOT_LOGGER=DEBUG,console yarn logs -applicationId application_1763013060073_51480 > application_1763013060073_51480.txt

26/01/29 18:42:09 DEBUG ipc.Client: IPC Client (307400933) connection to namenode001.example.com/10.1.1.2:9020 from [email protected] sending #138 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
26/01/29 18:42:09 DEBUG ipc.Client: IPC Client (307400933) connection to namenode001.example.com/10.1.1.2:9020 from [email protected] got value #138
26/01/29 18:42:09 DEBUG ipc.ProtobufRpcEngine2: Call: getFileInfo took 1ms
26/01/29 18:42:09 DEBUG hdfs.DFSClient: Connecting to datanode datanode001:9011
26/01/29 18:42:09 DEBUG impl.BlockReaderFactory: Block read failed. Getting remote block reader using TCP
java.io.IOException: Unresolved host: datanode001:9011
        at org.apache.hadoop.hdfs.DFSUtilClient.isLocalAddress(DFSUtilClient.java:640)
        at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:152)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:472)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:755)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:685)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:626)
        at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:581)
        at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.readAggregatedLogs(LogAggregationTFileController.java:196)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:244)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:1185)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:374)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:139)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:403)
26/01/29 18:42:09 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.UnknownHostException: datanode001:9011
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
        at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3033)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:829)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:754)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:381)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:755)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:685)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:626)
        at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:581)
        at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.readAggregatedLogs(LogAggregationTFileController.java:196)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:244)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:1185)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:374)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:139)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:403)

How was this patch tested?

  • Manual testing on a private cluster.
  • Added a unit test (a sketch of the kind of assertion involved follows this list).
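
A hedged sketch of the kind of assertion such a test makes; the constructor arguments and host names here are illustrative, not copied from the patch:

```java
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hdfs.protocol.DatanodeID;

// Re-register with an FQDN and check that both the hostName String
// and the cached hostNameBytes are refreshed.
DatanodeID dn = new DatanodeID("10.1.1.3", "datanode001",
    "datanode-uuid", 9866, 9864, 9865, 9867);
DatanodeID reReg = new DatanodeID("10.1.1.3", "datanode001.example.com",
    "datanode-uuid", 9866, 9864, 9865, 9867);
dn.updateRegInfo(reReg);
assertEquals(reReg.getHostName(), dn.getHostName());           // passed before the fix
assertEquals(reReg.getHostNameBytes(), dn.getHostNameBytes()); // failed before the fix
```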

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?


@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 1m 6s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 44m 33s | | trunk passed |
| +1 💚 | compile | 1m 15s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 1m 17s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | checkstyle | 0m 54s | | trunk passed |
| +1 💚 | mvnsite | 1m 21s | | trunk passed |
| +1 💚 | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 5s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | spotbugs | 3m 51s | | trunk passed |
| +1 💚 | shadedclient | 32m 5s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 51s | | the patch passed |
| +1 💚 | compile | 0m 49s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 0m 49s | | the patch passed |
| +1 💚 | compile | 0m 47s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 0m 47s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 19s | | the patch passed |
| +1 💚 | mvnsite | 0m 51s | | the patch passed |
| +1 💚 | javadoc | 0m 37s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 37s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | spotbugs | 3m 22s | | the patch passed |
| +1 💚 | shadedclient | 30m 25s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 32s | | hadoop-hdfs-client in the patch passed. |
| +1 💚 | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 129m 42s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8217/1/artifact/out/Dockerfile |
| GITHUB PR | #8217 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 04f0bad4e1ee 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 50bdae7 |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8217/1/testReport/ |
| Max. process+thread count | 611 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8217/1/console |
| versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.
