-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-17060. listStatus and getFileStatus behave inconsistent in the case of ViewFs implementation for isDirectory #2061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 2 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
66036d6
HADOOP-17060. listStatus and getFileStatus behave inconsistent in the…
umamaheswararao 32f0c7f
Checkstyle fixes
umamaheswararao 2bc0eb4
Revert "Checkstyle fixes"
umamaheswararao 6d4e1d1
Revert "HADOOP-17060. listStatus and getFileStatus behave inconsisten…
umamaheswararao 6960825
HADOOP-17060. listStatus and getFileStatus behave inconsistent in the…
umamaheswararao File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here in the FileStatus we have resolved the mount and giving details of the destination. Any idea where the link.getTargetLink() in the FileStatus can be or is used?
If we are resolving the mount entry, This is actually the FileStatus of the destination, if the destination itself is a symlink, in that case instead of link.getTargetLink() shouldn't status.getSymlink() be done?
Since all the other details like permissions and stuff we are having of the destination, the intent seems to resolve it as the destination directory rather than projecting this as a symlink to the end user. else if we see in linux the symlink doesn't shows the destination permission or other details it shows it's own details while listing :
In RBF also for listing on mount points, we don't denote them as symlinks, instead just pass the destination FileStatus with name resolved...
Is it possible we keep the assertions as it is and change link.getTargetLink() and specify that as per the destination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn , Thanks for the review!
Here ViewFS mount points seems to be designed to show them as symlinks. This can be figured out by looking at ViewFsBaseTest.java#testListOnInternalDirsOfMountTable (which was way old test) and also I had discussion with Sanjay on that.
So, I don't think it's reasonable to change that assumption/behavior now.
Coming to permissions bit I just realized symlink showing its own permissions instead of target in macos also. ( but we just changed the behavior in HADOOP-17029 where we display target dir's permissions and group. cc: @abhishekdas99, as per linux/macos the older behavior seems to be ok, but ViewFS perspective it seems to be good to show target permissions instead of showing some bogus permissions. I think in ViewFS case the in-memory InternalViewFSDirFS was created by fs but we can not really cary who created what mount, we will just configure in xml that link and would not have any info who added that, all should be at admins permission control only ). I assume in linux mount link dir will be created with the permission to the current user who created right? Since we don't have that link creation logic dynamically at ViewFS, I don't think we can behave exactly same as Linux here?
My another original question was why we should not have symlink on dir?
is it because HDFS does not support that? If that because of hdfs, should we move that assert under HdfsNamedFileStatus? I don't know what else will break here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intent behind that assertion is little different.
An entity can be either a directory, file or a symlink. In linux while listing, directory is denoted as d, link by l and file by -, So there are three categories (Directory/File/Link) not just two (Directory/File)
A symlink can point to both directory and file, but isn't itself a directory or file. It is link, and has its own identity.
In Hadoop, to distinguish between the three categories, the logic seems like :
isDir==true --> It is a Directory
isDir==false --> Can be a file or Symlink. So to conclude further whether a file or link
isDir==false and link==null --> it is file
isDir==false and link!=null --> it is a symlink
The assertion message "A directory can not be a symlink" also tries to depict this only. Since if isDir is true that means it is a directory so it can't be a symlink, it can be a symlink when it isn't neither directory nor file.
Now regarding changing it here in ViewFS, allowing isDir to be true and as well as having a link.
If someone is having an application code with
His code will break, Since this check won't return true post this change. Would this change incompatible?
Now regarding inconsistency between getFileStatus and listStatus, if we are considering a mount entry as link, so getFileStatus should follow the similar semantics as listStatus, A same entity can not be treated differently. If we want to treat it as link, we can think of makin getFileStatus adapt..
Just my thoughts, not much idea about the actual discussions what was the intent then..
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here nio Files APIs return true for isDirectory API. But here we cannot make that judgement with this information. I saw that 'l' part along with directory. However, native filesystems seems to be capturing the info about target filesystem and return isDir true based on that. It denotes as symlink along with permission bits.
My original thought was to change GetFileStatus see comment as you thought here:
But after verifying tests on local mac, I realized isDirectory is getting returned tru in that cases, but here we cannot make that decision. in MAC it was showing as folder icon if target is a directory and isDirectory as true.
Yes, they are incompatible changes( either the change done in listStatus/GetFileStatus)( we will mark them), and behavioral corrections in other side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well things are different at different places and TBH I don't have a strong opinion on which is the best way to do.
On a personal note changing
getFileStatus()seems to be little more safe to me, As those assertions and stuffs stays as it is and changes gets restricted toviewFSonly and no changes to links interpretations and stuffs. (my assumption, it should be safe, haven't digged in much)getFileStatus()is treating it as a link only(but with isDir true), it doesn't shows target file system permissions and times. That also need to be changed similarly to HADOOP-17029, to resolve permissions and stuff from target file system. FileStatus should be same through both API's? We can make things get in sync by doing that, and get away with inconsistencies b/w these two API's as of now..Changes in
getListing()apart from making the APIs in sync(That also we need to change in getFileStatus() as well since there 'true' is hardcoded), we seems to change symlink interpretation logics as well to get that in sync with other systems and I think that might break things for people relying on checks like this :if (isDir==false and link!=null)`, May be we can have a follow up JIRA as well, to change link interpretations with bigger audience.But in any case, I don't have oppositions to any of the approach, all up to you whichever way you want to go ahead. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From ur previous comment:
This is from permission bits. If you look at our FSPermission class we have a stickybit to denote its a file or not. Unfortunately it's just a boolean, it does not have 3rd option to represent link.
(nio) Files.isDirectory returns true for a symlink if target is dir. Where as Hadoop FileStatus class does not allow if symlink represent as dir. Behaviors are confusing :-(
That means we don't have enough info captured in Permissions or FileStatus classes to represent symlink properly.
Coming to getFileStatus question:
In your above quoted code runs on internal directory, thats not a link. Here internal directory means, if you have link src as /a/b/c, here a, b are internal dirs (here it's always a dir, so we can safely hardcode to true) and c is a link dir.
GetFileStatus run in input path, so, when input path is link, it would have resolved and run getFileStatus on targetFileSystem#getFileStatus. Where listStatus gets the FileStatus of immediate childrens. So, listStatus of /a/b , since be is an internal dir, it will go to InetrnalViewFsDir and run listStatus. and one of its child c is a link. so previously treating that children FileStatus object as symlink with isDir is false, irrespective whether target is file/dir. One safe assumption we can keep is, in ViewFS target would always be a directory only. I don't think anyone would configure a file as target link. But we can not leave that case anyway for consistency. Hope this helps. I also don't have very clear opinions here. Symlinks seems to mess around here. :-( .
But I like the idea of representing as symlink as that one of fs features. but consistent behavior is what we need to find here. If we wanted to fix getFileStatus("/a/b/c"), we need to make isDir as false, but current it will return true as it runs on targetFileSystem and also give symlink with tagetFs path. Here no way user know whether the target is file or dir, unless he directly access symlinked path and see.
Let me catchup Sanjay, if he has some thoughts around.
But in any case, I don't have oppositions to any of the approach, all up to you whichever way you want to go ahead. :-)Thank you. Let's figure out what would be the correct thing to do. I have no plans to move ahead until we find some reasonable solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn
I have verified symlinks behavior in HDFS.
It turns out that the behavior of listStatus and GetFileStatus are different. They both returning different FileStatus. Same behavior in ViewFS also.
GetFileStatus(/test), just runs on resolved path directly. So, it will not be represented as symLink.
ListStatus(/) of gets /test as children FileStatus object. But that represents as symLink.
Probably we should just clarify the behavior in user guide and API docs about the behaviors in symlinks case? Otherwise fixing this needs to be done all other places and it will be incompatible change across.
One advantage I see with the existing behavior is that, with listStatus we can know dir is symlink. If one wants to know targetFs details, then issue GetFileStatus on that path will resolved to targetFS and gets the FileStatus at targetFS.
I will also check with Sanjay about his opinions on this.