@liumihust liumihust commented Jul 25, 2017

Dear Hadoop Developers,
I'm from Alibaba, China. Recently I ran into a scenario where a user wants to migrate all the data from the old volumes to newly added volumes. Although HDFS now has the DiskBalancer tool, it doesn't meet our requirement. So we developed a new tool, DiskMigration, which can migrate all the data in the current volumes to the new volumes while keeping the data distribution balanced.
Having introduced the work I'm doing, let me get to the point, a bug in the newest version, hadoop3.0:
BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if there are no more blocks, it returns null up to DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() then checks whether the result is a valid block.
When I looked into FsDatasetSpi.isValidBlock(), I found that it doesn't check for a null pointer! In fact, we first need to check whether the block is null, or an exception will occur.
This bug is hard to find, because the DiskBalancer rarely copies all the data of one volume to others. Even if we sometimes do copy all the data of one volume to other volumes, by the time the bug occurs the copy process has already finished.
However, when we try to copy all the data of two or more volumes to other volumes in more than one step, the thread will be shut down because of the bug above.
The bug can be fixed in two ways:
1) Before the call to FsDatasetSpi.isValidBlock(), check for a null pointer
2) Check for a null pointer inside the implementation of FsDatasetSpi.isValidBlock()
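
To illustrate the two options, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: BlockSource, the String block ids, and isValidBlockNullSafe() are hypothetical stand-ins, and the validity rule is invented for the example. It only shows how an iterator that returns null when exhausted interacts with a validator that dereferences its argument. Note that returning false on null in option 2 is just one possible choice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class NullGuardSketch {

    /** Hypothetical stand-in for a block iterator: returns null once the volume is exhausted. */
    static final class BlockSource {
        private final Deque<String> blocks;

        BlockSource(List<String> ids) {
            this.blocks = new ArrayDeque<>(ids);
        }

        String nextBlock() {
            return blocks.poll(); // poll() returns null when the deque is empty
        }
    }

    /** Hypothetical validity check that dereferences its argument, so a null input throws. */
    static boolean isValidBlock(String block) {
        return block.startsWith("blk_");
    }

    /** Option 1: the caller checks for null before calling isValidBlock(). */
    static String getBlockToCopy(BlockSource source) {
        String block = source.nextBlock();
        if (block == null) {
            return null; // no more blocks in the source volume
        }
        return isValidBlock(block) ? block : null;
    }

    /** Option 2 (one possible variant): the null check lives inside the validator. */
    static boolean isValidBlockNullSafe(String block) {
        return block != null && isValidBlock(block);
    }

    public static void main(String[] args) {
        BlockSource source = new BlockSource(Arrays.asList("blk_1"));
        System.out.println(getBlockToCopy(source));     // blk_1
        System.out.println(getBlockToCopy(source));     // null (source exhausted)
        System.out.println(isValidBlockNullSafe(null)); // false
    }
}
```

With option 1 every caller has to remember the check; with option 2 the check sits in one place, but the validator's contract for null inputs has to be decided (return false, as here, or throw with a clear message).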

@liumihust liumihust changed the title Update DiskBalancer.java [ Indiscoverable bug in HDFS] FsDatasetSpi.isValidBlock() lacks null pointer check inside and none the callers Jul 25, 2017
@liumihust liumihust changed the title [ Indiscoverable bug in HDFS] FsDatasetSpi.isValidBlock() lacks null pointer check inside and none the callers [ Indiscoverable bug in HDFS] FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers Jul 25, 2017
shanthoosh added a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
…ely.

Author: Shanthoosh Venkataraman <[email protected]>

Reviewers: Navina Ramesh <[email protected]>

Closes apache#253 from shanthoosh/SAMZA-1365
Comment on lines +895 to +899

// A null pointer should not be passed to FsDatasetSpi.isValidBlock()
if (block == null) {
  return block;
}
@aajisaka aajisaka Jul 31, 2020

Comment on lines +1820 to +1822
if (b == null) {
  throw new NullPointerException("Input Block is null!");
}

I checked the call hierarchy of the method and b is always non-null.

@aajisaka
Closing this. If you disagree, please file a JIRA and feel free to reopen this.

@aajisaka aajisaka closed this Jul 31, 2020
steveloughran pushed a commit to steveloughran/hadoop that referenced this pull request Aug 5, 2025
3 participants