
Conversation

@haiyang1987
Contributor

Description of PR

https://issues.apache.org/jira/browse/HDFS-17564

If a DataNode is marked as busy and contains many EC blocks, then while decommissioning that DataNode, ErasureCodingWork#addTaskToDatanode may generate no replication work for ecBlocksToBeReplicated, but the related metrics (such as DatanodeDescriptor#currApproxBlocksScheduled, pendingReconstruction and needReconstruction) are still updated.

Specific code:
BlockManager#scheduleReconstruction -> BlockManager#chooseSourceDatanodes [2628~2650]
If a DataNode is marked as busy and contains many EC blocks, it will not be added to srcNodes here.

@VisibleForTesting
DatanodeDescriptor[] chooseSourceDatanodes(BlockInfo block,
    List<DatanodeDescriptor> containingNodes,
    List<DatanodeStorageInfo> nodesContainingLiveReplicas,
    NumberReplicas numReplicas, List<Byte> liveBlockIndices,
    List<Byte> liveBusyBlockIndices, List<Byte> excludeReconstructed, int priority) {
  containingNodes.clear();
  nodesContainingLiveReplicas.clear();
  List<DatanodeDescriptor> srcNodes = new ArrayList<>();
 ...

  for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) {
    final DatanodeDescriptor node = getDatanodeDescriptorFromStorage(storage);
    final StoredReplicaState state = checkReplicaOnStorage(numReplicas, block,
        storage, corruptReplicas.getNodes(block), false);
    ...
    // For EC, we need to make sure the numReplicas state is correct here,
    // because scheduleReconstruction needs numReplicas to check whether
    // the EC internal block needs to be reconstructed.
    byte blockIndex = -1;
    if (isStriped) {
      blockIndex = ((BlockInfoStriped) block)
          .getStorageBlockIndex(storage);
      countLiveAndDecommissioningReplicas(numReplicas, state,
          liveBitSet, decommissioningBitSet, blockIndex);
    }

    if (priority != LowRedundancyBlocks.QUEUE_HIGHEST_PRIORITY
        && (!node.isDecommissionInProgress() && !node.isEnteringMaintenance())
        && node.getNumberOfBlocksToBeReplicated() +
        node.getNumberOfBlocksToBeErasureCoded() >= maxReplicationStreams) {
      if (isStriped && (state == StoredReplicaState.LIVE
          || state == StoredReplicaState.DECOMMISSIONING)) {
        liveBusyBlockIndices.add(blockIndex);
        // HDFS-16566 ExcludeReconstructed won't be reconstructed.
        excludeReconstructed.add(blockIndex);
      }
      continue; // already reached replication limit
    }

    if (node.getNumberOfBlocksToBeReplicated() +
        node.getNumberOfBlocksToBeErasureCoded() >= replicationStreamsHardLimit) {
      if (isStriped && (state == StoredReplicaState.LIVE
          || state == StoredReplicaState.DECOMMISSIONING)) {
        liveBusyBlockIndices.add(blockIndex);
        // HDFS-16566 ExcludeReconstructed won't be reconstructed.
        excludeReconstructed.add(blockIndex);
      }
      continue;
    }

    if (isStriped || srcNodes.isEmpty()) {
      srcNodes.add(node);
      if (isStriped) {
        liveBlockIndices.add(blockIndex);
      }
      continue;
    }
    ...
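To illustrate the busy-node filtering above, here is a minimal, self-contained sketch (not the actual HDFS code; the node states, limit, and class name are hypothetical stand-ins) showing how a busy node's internal block index ends up in excludeReconstructed while the node never becomes a source:

```java
import java.util.ArrayList;
import java.util.List;

public class BusySourceFilterDemo {
  static final int MAX_REPLICATION_STREAMS = 2; // hypothetical limit

  public static void main(String[] args) {
    int[] scheduledWork = { 0, 5, 1 };   // per-node pending replication + EC work
    byte[] blockIndex   = { 0, 1, 2 };   // EC internal block index held by each node
    List<Integer> srcNodes = new ArrayList<>();
    List<Byte> excludeReconstructed = new ArrayList<>();
    for (int i = 0; i < scheduledWork.length; i++) {
      if (scheduledWork[i] >= MAX_REPLICATION_STREAMS) {
        // Busy node: its live index is excluded and it never becomes a source.
        excludeReconstructed.add(blockIndex[i]);
        continue;
      }
      srcNodes.add(i);
    }
    System.out.println(srcNodes);              // nodes 0 and 2 remain as sources
    System.out.println(excludeReconstructed);  // busy node 1's index is excluded
  }
}
```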

ErasureCodingWork#addTaskToDatanode[149~157]

@Override
void addTaskToDatanode(NumberReplicas numberReplicas) {
  final DatanodeStorageInfo[] targets = getTargets();
  assert targets.length > 0;
  BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();

  ...
  } else if ((numberReplicas.decommissioning() > 0 ||
      numberReplicas.liveEnteringMaintenanceReplicas() > 0) &&
      hasAllInternalBlocks()) {
    List<Integer> leavingServiceSources = findLeavingServiceSources();
    // decommissioningSources.size() should be >= targets.length
    // if leavingServiceSources is empty, createReplicationWork will not be called
    final int num = Math.min(leavingServiceSources.size(), targets.length);
    for (int i = 0; i < num; i++) {
      createReplicationWork(leavingServiceSources.get(i), targets[i]);
    }
  ...
}

// Since the busy decommissioning DataNode is not in srcNodes, srcIndices is returned empty here.
private List<Integer> findLeavingServiceSources() {
    // Mark the block in normal node.
    BlockInfoStriped block = (BlockInfoStriped)getBlock();
    BitSet bitSet = new BitSet(block.getRealTotalBlockNum());
    for (int i = 0; i < getSrcNodes().length; i++) {
      if (getSrcNodes()[i].isInService()) {
        bitSet.set(liveBlockIndices[i]);
      }
    }
    // If the block is on the node which is decommissioning or
    // entering_maintenance, and it doesn't exist on other normal nodes,
    // we just add the node into source list.
    List<Integer> srcIndices = new ArrayList<>();
    for (int i = 0; i < getSrcNodes().length; i++) {
      if ((getSrcNodes()[i].isDecommissionInProgress() ||
          (getSrcNodes()[i].isEnteringMaintenance() &&
          getSrcNodes()[i].isAlive())) &&
          !bitSet.get(liveBlockIndices[i])) {
        srcIndices.add(i);
      }
    }
    return srcIndices;
  }

So we need to fix this logic to avoid inaccurate metrics.
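The mismatch can be seen with a minimal, JDK-only sketch (simplified stand-ins for DatanodeDescriptor state; the enum and class names are hypothetical, and this is not the actual fix): because chooseSourceDatanodes already dropped the busy decommissioning node from srcNodes, findLeavingServiceSources returns an empty list, so zero work items are created even though targets were allocated and the scheduling metrics were already bumped.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class LeavingServiceSourcesDemo {
  // Hypothetical stand-in for the DatanodeDescriptor admin state.
  enum State { IN_SERVICE, DECOMMISSIONING }

  // Mirrors the structure of findLeavingServiceSources above.
  static List<Integer> findLeavingServiceSources(
      State[] srcNodes, byte[] liveBlockIndices, int realTotalBlockNum) {
    BitSet bitSet = new BitSet(realTotalBlockNum);
    for (int i = 0; i < srcNodes.length; i++) {
      if (srcNodes[i] == State.IN_SERVICE) {
        bitSet.set(liveBlockIndices[i]);
      }
    }
    List<Integer> srcIndices = new ArrayList<>();
    for (int i = 0; i < srcNodes.length; i++) {
      if (srcNodes[i] == State.DECOMMISSIONING
          && !bitSet.get(liveBlockIndices[i])) {
        srcIndices.add(i);
      }
    }
    return srcIndices;
  }

  public static void main(String[] args) {
    // The busy decommissioning DN was already filtered out of srcNodes,
    // so only IN_SERVICE nodes remain.
    State[] srcNodes = { State.IN_SERVICE, State.IN_SERVICE, State.IN_SERVICE };
    byte[] liveBlockIndices = { 0, 1, 2 };
    List<Integer> sources = findLeavingServiceSources(srcNodes, liveBlockIndices, 9);
    int targets = 1; // a target was allocated and metrics were already updated
    int num = Math.min(sources.size(), targets);
    System.out.println("sources=" + sources.size() + ", scheduled work items=" + num);
  }
}
```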

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 1m 27s Docker failed to build run-specific yetus/hadoop:tp-7840}.
Subsystem Report/Notes
GITHUB PR #6911
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/1/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 6m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 0s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 compile 0m 40s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 checkstyle 0m 41s trunk passed
+1 💚 mvnsite 0m 48s trunk passed
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 1m 8s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 1m 56s trunk passed
+1 💚 shadedclient 22m 14s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 38s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 javac 0m 35s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 29s the patch passed
+1 💚 mvnsite 0m 38s the patch passed
+1 💚 javadoc 0m 31s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 1m 2s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 1m 42s the patch passed
+1 💚 shadedclient 22m 36s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 195m 41s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
295m 52s
Reason Tests
Failed junit tests hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
hadoop.hdfs.server.blockmanagement.TestDatanodeManager
hadoop.hdfs.server.blockmanagement.TestBlockManager
hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/2/artifact/out/Dockerfile
GITHUB PR #6911
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux cfa97df1b2c7 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 130d9ae
Default Java Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/2/testReport/
Max. process+thread count 4949 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@haiyang1987
Contributor Author

The failed unit test is not related to this PR.

@haiyang1987
Contributor Author

Hi @Hexiaoqiao @ZanderXu @zhangshuyan0, could you please help review this PR when you have free time? Thanks~

Contributor

@Hexiaoqiao Hexiaoqiao left a comment


LGTM almost. Leave one nit comment inline. FYI. Thanks.

assertEquals(decommisionNodes.size(), liveDecommissioning);

//4. wait for decommission block to replicate
Thread.sleep(3000);
Contributor


What about to use GenericTestUtils.waitFor rather than Thread.sleep?
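A JDK-only sketch of the polling pattern behind the suggestion (the real GenericTestUtils.waitFor lives in hadoop-common and takes a check supplier, a check interval, and a timeout; this stand-in class is hypothetical): the test returns as soon as the condition holds instead of always sleeping a fixed 3000 ms.

```java
import java.util.function.BooleanSupplier;

public class WaitForDemo {
  // Re-check a condition at a fixed interval instead of a fixed sleep.
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // Returns shortly after the condition first holds.
    waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 5000);
    System.out.println("condition met");
  }
}
```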

*/
@Test(timeout = 120000)
public void testBusyAfterDecommissionNode() throws Exception {
byte busyDNIndex = 0;
Contributor


Any consideration in defining a byte type for the index here? Not a blocker, just out of interest.

Contributor Author


Thanks @Hexiaoqiao for your comment.
There is no special meaning in defining the byte type; maybe we can change it to int.

@haiyang1987
Contributor Author

Updated the PR. Hi @Hexiaoqiao, please help review it again, thanks!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 14m 11s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 45m 5s trunk passed
+1 💚 compile 1m 21s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 compile 1m 18s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 checkstyle 1m 11s trunk passed
+1 💚 mvnsite 1m 25s trunk passed
+1 💚 javadoc 1m 8s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 1m 44s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 3m 20s trunk passed
+1 💚 shadedclient 35m 43s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 11s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javac 1m 14s the patch passed
+1 💚 compile 1m 5s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 javac 1m 5s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 58s the patch passed
+1 💚 mvnsite 1m 12s the patch passed
+1 💚 javadoc 0m 54s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 1m 37s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 3m 10s the patch passed
+1 💚 shadedclient 35m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 227m 36s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
380m 17s
Reason Tests
Failed junit tests hadoop.hdfs.server.blockmanagement.TestDatanodeManager
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/3/artifact/out/Dockerfile
GITHUB PR #6911
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ea2255be361b 5.15.0-113-generic #123-Ubuntu SMP Mon Jun 10 08:16:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a4aaa06
Default Java Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/3/testReport/
Max. process+thread count 4184 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6911/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@Hexiaoqiao Hexiaoqiao left a comment


LGTM. +1 from my side.

@Hexiaoqiao Hexiaoqiao changed the title HDFS-17564. Erasure Coding: Fix the issue of inaccurate metrics when decommission mark busy DN HDFS-17564. EC: Fix the issue of inaccurate metrics when decommission mark busy DN. Jul 4, 2024
@Hexiaoqiao
Contributor

Will commit if there are no other comments after waiting one workday.

@Hexiaoqiao Hexiaoqiao merged commit ae76e94 into apache:trunk Jul 5, 2024
@Hexiaoqiao
Contributor

Committed to trunk. Thanks @haiyang1987 .
The failed unit tests are not related to this PR.

@haiyang1987
Contributor Author

Thanks @Hexiaoqiao for your review and for merging it.

KeeProMise pushed a commit to KeeProMise/hadoop that referenced this pull request Sep 9, 2024
… mark busy DN. (apache#6911). Contributed by Haiyang Hu.

Signed-off-by: He Xiaoqiao <[email protected]>
Hexiaoqiao pushed a commit to Hexiaoqiao/hadoop that referenced this pull request Sep 12, 2024
… mark busy DN. (apache#6911). Contributed by Haiyang Hu.

Signed-off-by: He Xiaoqiao <[email protected]>
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Jan 6, 2026
… mark busy DN. (apache#6911). Contributed by Haiyang Hu.

Signed-off-by: He Xiaoqiao <[email protected]>