@skysiders commented Dec 24, 2022

HIVE-26887: In the constructor of class QueryResultsCache, there is the following code segment:

  private QueryResultsCache(HiveConf configuration) throws IOException {
    ......
    FileSystem fs = cacheDirPath.getFileSystem(conf);
    FsPermission fsPermission = new FsPermission("700");
    fs.mkdirs(cacheDirPath, fsPermission);
    ......
  }

As can be seen, the constructor uses mkdirs to create cacheDirPath, passing in the path variable cacheDirPath and the permission 700. But it never confirms that this permission was actually applied to the directory.

This question arises because Hadoop has two mkdirs methods, the instance method mkdirs(Path f, FsPermission permission) and the static method mkdirs(FileSystem fs, Path dir, FsPermission permission), and the first one is used here. The permission passed to the instance method is subject to the underlying umask. Although 700 will hardly ever be affected by a umask, I think that, to be rigorous, we should add a permission check and an explicit permission grant here. So I check whether the umask could have altered the requested permission, and if so, grant the correct permission via setPermission:

    if (!permission.equals(permission.applyUMask(FsPermission.getUMask(conf)))) {
        fs.setPermission(dir, permission);
    }
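To make the check concrete, here is a minimal self-contained sketch of the umask arithmetic (plain Java ints standing in for Hadoop's FsPermission, whose applyUMask performs the same bitwise masking; the class and method names are illustrative only): the requested mode survives a umask only if none of its bits are masked out.

```java
public class UmaskEffect {
    // Models FsPermission.applyUMask: effective mode = requested & ~umask.
    static int applyUmask(int requested, int umask) {
        return requested & ~umask;
    }

    public static void main(String[] args) {
        // 700 is untouched by the default umask 022 (no overlapping bits)...
        System.out.println(Integer.toOctalString(applyUmask(0700, 0022)));
        // ...but 770 would be narrowed to 750, which is exactly when the
        // proposed equals()-then-setPermission check would kick in.
        System.out.println(Integer.toOctalString(applyUmask(0770, 0022)));
    }
}
```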

I found the same issue in three other methods.
In class Context

private Path getScratchDir(String scheme, String authority,
      boolean mkdir, String scratchDir) {
          ......
          FileSystem fs = dirPath.getFileSystem(conf);
          dirPath = new Path(fs.makeQualified(dirPath).toString());
          FsPermission fsPermission = new FsPermission(scratchDirPermission);

          if (!fs.mkdirs(dirPath, fsPermission)) {
            throw new RuntimeException("Cannot make directory: "
                + dirPath.toString());
          ......
  }

In class SessionState

  static void createPath(HiveConf conf, Path path, String permission, boolean isLocal,
      boolean isCleanUp) throws IOException {
    FsPermission fsPermission = new FsPermission(permission);
    FileSystem fs;
    ......
    if (!fs.mkdirs(path, fsPermission)) {
      throw new IOException("Failed to create directory " + path + " on fs " + fs.getUri());
    }
    ......
  }

and in class TezSessionState

private Path createTezDir(String sessionId, String suffix) throws IOException {
    ......
    Path tezDir = new Path(hdfsScratchDir, TEZ_DIR);
    FileSystem fs = tezDir.getFileSystem(conf);
    FsPermission fsPermission = new FsPermission(HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIRPERMISSION));
    fs.mkdirs(tezDir, fsPermission);
    ......
  }

@cnauroth (Contributor) left a comment

Hello @skysiders . Thank you for sending a pull request.

I would not expect this patch to be necessary. As you said, the permissions will generally be 700 (governed by hive.scratch.dir.permission). The typical umask is 022 (governed by fs.permissions.umask-mode), though a few spots in the Hadoop code use something more restrictive. This umask will have no effect on these permissions. I can't think of any reasonable values of hive.scratch.dir.permission and fs.permissions.umask-mode for which we should force the permissions instead of respecting the umask. Have you seen any problems related to this in practice that needed this patch?

The proposed fix is potentially incomplete from a security perspective, because there is still a small window of time in which the directory has been created but the additional setPermission hasn't run yet. If the initial permissions are weak enough, then an unauthorized user could slip in and access the directory.

On a side note, a shorter way to code this is the single-line call to static method FileSystem.mkdirs(fs, dir, permission). However, the internal implementation is still 2 steps: 1) create directory, 2) set permissions, so it's subject to the same problems. FileSystem doesn't offer any way to circumvent umask atomically.
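The two-step create-then-chmod pattern can be sketched with plain java.nio on a local file system (an analogy only, not Hadoop's FileSystem API; the class and helper names are illustrative). Between the two calls, the directory briefly exists with umask-derived permissions, which is the window described above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class TwoStepMkdir {
    // Illustrative helper: create a directory, then tighten its permissions.
    static Set<PosixFilePermission> createThenChmod(Path dir) throws IOException {
        Files.createDirectory(dir);  // step 1: mode is 777 & ~umask, often rwxr-xr-x
        // ---- race window: another local user could enter the directory here ----
        Set<PosixFilePermission> desired = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(dir, desired);  // step 2: tighten to owner-only
        return Files.getPosixFilePermissions(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("demo").resolve("scratch");
        System.out.println(PosixFilePermissions.toString(createThenChmod(dir)));
    }
}
```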

@skysiders (Author)

Hi @cnauroth, thanks for your review!
What I want to highlight here is a programming pattern for handling special permissions: after setting a special permission, we should verify it. You are right that the 700 permission here is very unlikely to cause a defect, but I think good programming style should ensure that file permissions are granted correctly. So I first check whether the permission was applied correctly, and only if it was not do I set it explicitly. If the permission here were ever changed to 770, or to some other value that can be affected by the umask, a defect would appear. So, to be rigorous, I think an additional permission check is warranted here.

I have reported similar flaws in STORM-3862, TEZ-4412, and HBASE-26994.

Finally, regarding the FileSystem.mkdirs(fs, path, perm) you mentioned: I agree with your point, but the idea behind this kind of fix is not to unconditionally set permissions on the directory, but to check them first. As the code shows, if the permissions are already correct, we don't need to set them again.

@skysiders (Author)

Hi @cnauroth, I thought about this question again, and compared with the FileSystem.mkdirs(fs, path, perms) you mentioned, my patch may be a little faster. mkdirs(fs, path, perms) does two things: it creates the directory with default permissions and then assigns the requested permissions to it. My approach has three steps: create the directory, check whether its permissions are correct, and set them only if they are not. The check is cheaper than an unconditional setPermission; when the directory's actual permission already matches the requested one, it can be roughly four times faster. Following your suggestion, the patch would replace every mkdirs(path, perms) with mkdirs(fs, path, perms), which I believe would be a little slower than my current patch, though that style of creation is also rigorous. I'm looking forward to your reply.

@skysiders skysiders requested review from cnauroth and removed request for abstractdog January 6, 2023 14:02
@cnauroth (Contributor) left a comment

@skysiders , thank you for sharing the thought process, and you have a good point about the potential performance difference.

I still think these changes are unnecessary in any reasonable deployment model. In practice, it will only make a difference for deployments with non-zero in the first digit (owner permissions) of fs.permissions.umask-mode. I can't think of a good reason an admin would restrict the owner's access to their own directory.

You mentioned that perhaps this code could evolve someday to use 770 instead of 700, in which case the default fs.permissions.umask-mode would restrict it to 750. I think that's unlikely, as these code paths by design involve directories isolated to the individual user, not intended for sharing to the group or other levels. Even if that did happen, I'd argue that the umask is serving its intended purpose: providing additional restrictions according to the admin's security policies.

If an application really wants to make a design choice to circumvent umask (and make sure it carefully covers all of the associated risks with that), then I think it should be approached as initializing a FileSystem with a Configuration override of fs.permissions.umask-mode=000, ignoring whatever is configured in core-site.xml. Then, the application can be confident that it's going to get exactly the permissions it requested. Even better, this will be guaranteed correct with a single RPC: just mkdirs instead of mkdirs + setPermission.
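The configuration-override approach described above might look like the following sketch. This is an assumption-laden illustration, not code from the patch: it assumes Hadoop's Configuration, FileSystem, and FsPermission classes, noUmaskConf is an invented name, and conf and cacheDirPath stand for the variables in the QueryResultsCache constructor quoted earlier.

```java
// Sketch only: build a FileSystem whose effective umask is 000, so that
// mkdirs applies exactly the requested permissions in a single RPC.
Configuration noUmaskConf = new Configuration(conf);   // copy the session conf
noUmaskConf.set("fs.permissions.umask-mode", "000");   // ignore core-site.xml umask
FileSystem fs = FileSystem.newInstance(cacheDirPath.toUri(), noUmaskConf);
fs.mkdirs(cacheDirPath, new FsPermission("700"));      // no follow-up setPermission needed
```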

Thank you for sharing the links to pull requests in other projects too. I would have had the same feedback there, but I see those are already committed.

Thank you for the patch! The code looks like a correct implementation of what you described. (Have you looked into whether or not it's possible to add tests?) I just entered one more minor comment about formatting.

I am -0 for this, meaning I disagree but I won't oppose it. I'm also not a Hive committer, so my vote is non-binding. You should look for a review from a Hive committer if you want to keep pursuing this.

fs.mkdirs(cacheDirPath, fsPermission);

if (!fsPermission.equals(fsPermission.applyUMask(FsPermission.getUMask(conf)))) {
fs.setPermission(cacheDirPath, fsPermission);
@cnauroth (Contributor) commented:

Indentation is off on this line.

@cnauroth (Contributor) left a comment

@skysiders , I had some more thoughts on this today. There is some established precedent for overriding permissions exactly like this elsewhere in the ecosystem, e.g. the code we are looking at in MAPREDUCE-7375 / apache/hadoop#4237. I forgot about that while looking at the Hive code in isolation.

I still have some disagreements with the concept as described in my last comment, but I'm converting to +1 for the sake of consistency across projects. Again, that's non-binding, so you'll need to seek out a committer for review too. (Reminder: I also pointed out an indentation problem to correct.)

Thank you for the patch, and thanks for bearing with me while I reason through it.

@skysiders (Author)

Hi @cnauroth, thanks for your review; I'm glad to see your reply. In MAPREDUCE-7375 I saw your approval of that fix, as well as your earlier point about the extremely special case of a umask set to 777, and I very much agree with both. The fixes I submitted in other projects follow the same line of thinking: after creating a file, we should check its permissions and grant the intended ones if necessary. I think this is needed because an upper-level application can rarely be sure of the underlying file system's umask. For example, on Linux the umask is set by the system administrator (root), and the umask of a distributed file system is decided by the user who starts DFS, while Hive and other systems are run by application administrators; these may not be the same person. This is my most fundamental point, and the reason I keep advocating this fix.

I would like to discuss this further: do you think this fix should apply only to permissions like 770, rather than to permissions like 700 (since 770 can be affected by the file system's default umask of 022)?

Finally, I am very happy to have had such an in-depth discussion with you about file permissions. Thank you for engaging with my point of view.

Hi @abstractdog @ayushtkn, could you please have a look at this?

@sonarqubecloud bot commented Jan 8, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

No coverage information
No duplication information

@skysiders (Author) commented Jan 8, 2023

Hi @cnauroth @abstractdog @ayushtkn, I reviewed this part of the code again today and found an earlier patch, HIVE-4487, in which the developer deliberately set the permissions here; the code was then modified again in HIVE-8015. I previously raised the concept of API misuse in HBASE-26994, and I think the second patch here is such a misuse of the API.

@skysiders skysiders requested review from cnauroth and removed request for abstractdog January 8, 2023 09:00
@skysiders (Author)

Hi @zabetak, could you please have a look at this?

@cnauroth (Contributor) left a comment

+1 (non-binding)

Thank you for responding to the code review feedback, @skysiders .

@zabetak (Member) commented Jan 18, 2023

Thanks for the elaborate analysis and discussion @skysiders @cnauroth !

Looking into the changes, this seems to be something of a breaking change, since permissions will now be set differently depending on the configuration.

Moreover, the proposed changes make the code more verbose and less straightforward.

Furthermore, I am not sure we want to enforce a programming pattern of fs.mkdirs followed by fs.setPermission, since that essentially bypasses the umask, which is the expected mechanism for creating directories with the appropriate permissions (https://issues.apache.org/jira/browse/HDFS-1322?focusedCommentId=13072984&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13072984).

For the reasons above, I would prefer if we didn't merge these changes.

@skysiders (Author)

Hi @zabetak, thanks for your review.
You mentioned that this is "kind of a breaking change", but I don't agree. When setting file permissions, most of the time we can rely on the umask of the underlying file system, as with the commonly used fs.create function. In my fix, however, all the affected permissions are explicitly set, which I take to mean the developers intended the files to have exactly those permissions. If explicit permissions were not required, the underlying umask could indeed be left to constrain them; but once the permissions are explicit, the umask may make them too strict and render the files unusable. To give an admittedly extreme example: if the underlying file system's umask were 777, the resulting permissions would be 000 and the upper-level files would have no permissions at all. So for files whose permissions are explicitly set, I think we should make sure those permissions are actually granted.

Regarding the second point, the "programming pattern" you mentioned: it is in fact also possible to use Hadoop's FileSystem.create(fs, path, perm) here, and I now think such a pattern should be adopted, because it is safer and more reliable than fs.create(path, perm). This kind of fix is mainly aimed at API misuse. As I noted in HBASE-26994, the developer originally intended to set special permissions here but mistakenly assumed that fs.create(path, perm) would set the permission perm on the path; this is wrong. In my conversation with the HBase developers, I pointed out this mistake and got their approval.

Finally, I want to say that in my opinion this fix is necessary. I searched Hive and found these four instances of API misuse, which is why I am raising the problem here.

@zabetak (Member) commented Feb 7, 2023

@skysiders Apologies for not following up on this but I am still pretty busy with various things.

Regarding the second point "programming pattern" you mentioned, in fact, it is also possible to use Hadoop's underlying FileSystem.create(fs, path, perm) here.

If this pattern is preferred, then why not adopt it here? I saw your comments above about a potential perf benefit. If that is real, it would be better to propose the perf improvement in the "Hadoop HDFS" project. It would be better to have this if block (or whatever form it takes) in one place rather than in many different places (5 call sites × 5 projects).

Other than that, I don't really like the fact that FileSystem.create and fs.create have different behavior; this will always be a problem for developers trying to pick which API to use. Again, this is not something to handle here, but maybe discuss it with the HDFS folks.

@github-actions bot commented Apr 9, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Apr 9, 2023
@github-actions github-actions bot closed this Apr 17, 2023