[SPARK-26929][SQL] Fix table owner: use the user instead of the principal when creating a table through spark-sql or beeline #23952
The change modifies `HiveClientImpl`:

```diff
@@ -21,6 +21,7 @@ import java.io.{File, PrintStream}
 import java.lang.{Iterable => JIterable}
 import java.util.{Locale, Map => JMap}
 import java.util.concurrent.TimeUnit._
+import javax.security.auth.login.LoginException

 import scala.collection.JavaConverters._
 import scala.collection.mutable
@@ -41,6 +42,7 @@ import org.apache.hadoop.hive.ql.session.SessionState
 import org.apache.hadoop.hive.serde.serdeConstants
 import org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
 import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.security.UserGroupInformation

 import org.apache.spark.{SparkConf, SparkException}
 import org.apache.spark.internal.Logging
@@ -220,7 +222,17 @@ private[hive] class HiveClientImpl(
     hiveConf
   }

-  private val userName = conf.getUser
+  private val userName = try {
+    val doAs = sys.env.get("HADOOP_USER_NAME").orNull
+    val ugi = if (doAs != null && doAs.length() > 0) {
+      UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser())
+    } else {
+      UserGroupInformation.getCurrentUser
+    }
+    ugi.getShortUserName
+  } catch {
+    case _: LoginException => throw new LoginException("Can not get login user.")
+  }

   override def getConf(key: String, defaultValue: String): String = {
     conf.get(key, defaultValue)
```
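To make the patched initializer easier to discuss, here is a dependency-free sketch of its owner-resolution logic. The real code uses Hadoop's `UserGroupInformation`; this stand-in is hypothetical (the helper name `resolveOwner` and the plain-string fallback are illustration only, not the Hadoop API).

```java
// Sketch of the decision logic in the patched `userName` initializer:
// prefer the HADOOP_USER_NAME environment variable (a proxy user) when set,
// otherwise fall back to the current user's short name. The real code wraps
// this in UserGroupInformation and LoginException handling.
public class OwnerResolution {
    /** Hypothetical helper mirroring the patch's control flow. */
    static String resolveOwner(String hadoopUserNameEnv, String currentUserShortName) {
        // The patch treats a non-empty HADOOP_USER_NAME as a proxy user...
        if (hadoopUserNameEnv != null && !hadoopUserNameEnv.isEmpty()) {
            return hadoopUserNameEnv;
        }
        // ...and otherwise uses the current (login) user's short name.
        return currentUserShortName;
    }

    public static void main(String[] args) {
        System.out.println(resolveOwner("alice", "spark")); // proxy user wins
        System.out.println(resolveOwner(null, "spark"));    // falls back
    }
}
```

The review discussion below turns on exactly this branch: whether consulting the environment variable at all is appropriate here.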
This is wrong or, at the very least, backwards: the current UGI should be preferred.
What's the goal of this code? Why can't you just get the current UGI's name?
Uh, at least that is not a regression: `conf.getUser` calls into the Hive side, and Hive does this in `Utils.getUGI`, at least as of Hive 1.2.1.spark2. So the code was copied and adapted to call `ugi.getShortUserName()` instead of `ugi.getUserName()`. @hddong left a comment earlier explaining why the code had to be copied - #23952 (comment)
Right, and there's a huge comment in the Hive method (at least in the branch I'm looking at) that explains why that is done. And if you read that comment you'll see that it does not apply here.
In a way, the HMS API is broken in that it lets the caller set the owner (instead of getting it from the auth info).
But we really should at least try to get the correct information through, and that comes from the UGI, not from env variables.
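The reviewer's point is that the owner should come from the current UGI, whose short name strips the Kerberos decoration from a principal. As a hypothetical illustration (Hadoop actually applies configurable `auth_to_local` rules, not this simple string split), a principal like `alice/node1@EXAMPLE.COM` should be recorded as owner `alice`:

```java
// Hypothetical stand-in for the short-name mapping that
// UserGroupInformation.getShortUserName performs: drop the realm
// ("@EXAMPLE.COM") and any host component ("/node1") from a principal.
// Hadoop's real mapping is driven by auth_to_local rules.
public class ShortName {
    static String shortUserName(String principal) {
        String noRealm = principal.split("@", 2)[0]; // strip "@REALM"
        return noRealm.split("/", 2)[0];             // strip "/host"
    }

    public static void main(String[] args) {
        System.out.println(shortUserName("alice/node1@EXAMPLE.COM")); // alice
        System.out.println(shortUserName("bob@EXAMPLE.COM"));         // bob
        System.out.println(shortUserName("carol"));                   // carol
    }
}
```

This is why the fix targets the "owner is the principal in a kerberized cluster" symptom: the short name, not the full principal, is what should be stored as the table owner.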
Yeah, actually I had only read the decompiled code of `Utils.getUGI()` (I missed the comment in the source), so I didn't see its intention: letting other applications pass the user in. Yes, I agree it's not needed on the Spark side, and it would be weird for `HADOOP_USER_NAME` to be used only here while remaining undocumented. Thanks for explaining!
Thanks to you both; I missed the comment in the source too. Yes, the current UGI is fine here. SPARK-22846 fixed the table owner being null when creating a table through Spark SQL or the Thrift server, but it introduced the issue seen here: the owner becomes the Kerberos principal in a kerberized cluster.