@@ -21,6 +21,7 @@ import java.io.{File, PrintStream}
 import java.lang.{Iterable => JIterable}
 import java.util.{Locale, Map => JMap}
 import java.util.concurrent.TimeUnit._
+import javax.security.auth.login.LoginException
 
 import scala.collection.JavaConverters._
 import scala.collection.mutable
@@ -41,6 +42,7 @@ import org.apache.hadoop.hive.ql.session.SessionState
 import org.apache.hadoop.hive.serde.serdeConstants
 import org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
 import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.security.UserGroupInformation
 
 import org.apache.spark.{SparkConf, SparkException}
 import org.apache.spark.internal.Logging
@@ -220,7 +222,17 @@ private[hive] class HiveClientImpl(
     hiveConf
   }
 
-  private val userName = conf.getUser
+  private val userName = try {
+    val doAs = sys.env.get("HADOOP_USER_NAME").orNull
+    val ugi = if (doAs != null && doAs.length() > 0) {

Contributor:
This is wrong or, at the very least, backwards: the current UGI should be preferred.

What's the goal of this code? Why can't you just get the current UGI's name?

Contributor:
Uh, at least that is not a regression: conf.getUser calls into the Hive side, and Hive does this in Utils.getUGI, at least as of Hive 1.2.1.spark2.
So the code was copied and pasted in order to call ugi.getShortUserName() instead of ugi.getUserName() - @hddong left a comment earlier explaining why the code had to be copied and pasted: #23952 (comment)

Contributor:
Right, and there's a huge comment in the Hive method (at least in the branch I'm looking at) that explains why that is done. And if you read that comment you'll see that it does not apply here.

In a way, the HMS API is broken in that it lets the caller set the owner (instead of getting it from the auth info).

But we really should at least try to get the correct information through, and that comes from the UGI, not from env variables.

Contributor @HeartSaVioR (Sep 11, 2019):
Yeah, actually I had just read the decompiled code of Utils.getUGI() (I missed the comment in the source) and didn't see its intention: to let other applications pass the user name in. Yes, I agree it's not needed on the Spark side, and it would be weird for HADOOP_USER_NAME to be used only here and remain undocumented. Thanks for explaining!

Contributor Author:
Thanks to you two - I missed the comment in the source as well. Yes, the current UGI is OK here. SPARK-22846 ("Fix table owner is null when creating table through spark sql or thriftserver") fixed the null owner, but it led to the issue we have seen: the owner becomes the full principal in a kerberized cluster.
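
For context, a minimal, self-contained sketch of the proxy-user logic under discussion, assuming only Hadoop's public UserGroupInformation API; the helper name resolveUser is invented for illustration:

import org.apache.hadoop.security.UserGroupInformation

// Sketch of the copied Hive logic: if HADOOP_USER_NAME is set, build a
// proxy UGI impersonating that user on top of the login user; otherwise
// use the current UGI. As the reviewers note above, Spark neither needs
// nor documents the env-var branch, so the current UGI alone suffices.
def resolveUser(): String = {
  sys.env.get("HADOOP_USER_NAME").filter(_.nonEmpty) match {
    case Some(doAs) =>
      UserGroupInformation
        .createProxyUser(doAs, UserGroupInformation.getLoginUser)
        .getShortUserName
    case None =>
      UserGroupInformation.getCurrentUser.getShortUserName
  }
}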

+      UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser())
+    } else {
+      UserGroupInformation.getCurrentUser
+    }
+    ugi.getShortUserName

Contributor:
How do we prove this is the right fix? What if there is no UGI?

Contributor Author @hddong (Mar 25, 2019):
@cloud-fan, it will return the system user information if there is no UGI. Actually, the source code of conf.getUser is:

public String getUser() throws IOException {
  try {
    UserGroupInformation ugi = Utils.getUGI();
    return ugi.getUserName();
  } catch (LoginException var2) {
    throw new IOException(var2);
  }
}

This change just uses ugi.getShortUserName instead of ugi.getUserName().

Contributor @HeartSaVioR (Aug 23, 2019):
Looks like Utils is accessible from here, so if we just want to leverage the Hive code and call getShortUserName instead, we can do it as a one-liner:

private val userName = org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName

Here I intentionally don't catch LoginException, as we don't do anything with it but rethrow it.

Contributor Author @hddong (Aug 26, 2019):
> org.apache.hadoop.hive.shims.Utils.getUGI.getShortUserName

Yep, but shims.Utils is not compatible with hive0.* and cannot pass all the tests - that file was only added as of hive1.*. So the source code is copied here instead.

Contributor:
Ah, OK. I didn't realize it doesn't work with some versions. Makes sense to me.

Contributor:
f1eb8cf was for this - sorry, I had to track down the commits as well.
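
To make the getUserName/getShortUserName distinction concrete, a hypothetical illustration (the principal alice@EXAMPLE.COM is invented; the exact short name depends on the cluster's auth_to_local mapping rules):

import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.getCurrentUser
// On a kerberized cluster where the login principal is alice@EXAMPLE.COM:
//   ugi.getUserName       // "alice@EXAMPLE.COM" - full principal, the old table owner
//   ugi.getShortUserName  // "alice"             - mapped local name, the fix
// Without Kerberos there is still always a UGI: Hadoop falls back to the
// OS user, so both methods typically return the same plain user name.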

+  } catch {
+    case _: LoginException => throw new LoginException("Can not get login user.")

Contributor:
What's the purpose of this? Just let the original exception propagate.

Contributor:
According to the previous comment, I guess the original code was a LoginException wrapped in an IOException:
https://github.com/apache/spark/pull/23952/files#r268466632

So if we want to follow the original code, use IOException; otherwise, just remove the try/catch.

Contributor Author:
@vanzin @HeartSaVioR
In the Hive source code it throws two exceptions, LoginException and IOException. I wonder whether Exception is OK here.

Contributor @HeartSaVioR (Sep 10, 2019):
I guess things will fail with the exception being logged at the caller site, so the recommendation is to just remove the try/catch altogether.

The Hive source code has to do something (like the wrapping you've seen) because both LoginException and IOException are checked exceptions - given you're coding in Scala, you can forget about that.

Contributor Author:
> I guess things will fail with the exception being logged at the caller site, so the recommendation is to just remove the try/catch altogether.

Yes, you are right, it will fail with the exception logged. Changed it.
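
Putting the thread together, a sketch of where the discussion lands (not necessarily the exact committed code): Scala has no checked exceptions, so the LoginException that the UGI lookup may throw simply propagates, the try/catch can be dropped, and per the earlier thread the HADOOP_USER_NAME branch goes away as well.

// All failure modes propagate unwrapped; callers log the exception.
private val userName = UserGroupInformation.getCurrentUser.getShortUserName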

+  }

   override def getConf(key: String, defaultValue: String): String = {
     conf.get(key, defaultValue)