SPARK-1569 Spark on Yarn, authentication broken by pr299 #649
Changes from 2 commits
```diff
@@ -21,7 +21,7 @@ import java.io.File
 import java.net.URI
 
 import scala.collection.JavaConversions._
-import scala.collection.mutable.HashMap
+import scala.collection.mutable.{HashMap, ListBuffer}
 
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.yarn.api._
@@ -44,9 +44,9 @@ trait ExecutorRunnableUtil extends Logging {
       hostname: String,
       executorMemory: Int,
       executorCores: Int,
-      localResources: HashMap[String, LocalResource]) = {
+      localResources: HashMap[String, LocalResource]): List[String] = {
     // Extra options for the JVM
-    var JAVA_OPTS = ""
+    val JAVA_OPTS = ListBuffer[String]()
     // Set the JVM memory
     val executorMemoryString = executorMemory + "m"
     JAVA_OPTS += "-Xms" + executorMemoryString + " -Xmx" + executorMemoryString + " "
@@ -56,10 +56,16 @@ trait ExecutorRunnableUtil extends Logging {
       JAVA_OPTS += opts
     }
 
-    JAVA_OPTS += " -Djava.io.tmpdir=" +
-      new Path(Environment.PWD.$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR) + " "
+    JAVA_OPTS += "-Djava.io.tmpdir=" +
+      new Path(Environment.PWD.$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
+    JAVA_OPTS += ClientBase.getLog4jConfiguration(localResources)
+
+    // This is needed for the authentication configs because the Executor has to
+    // know whether to use authentication before it registers with the Scheduler.
+    for ((k, v) <- sparkConf.getAll.filter{case (k, v) => k.startsWith("spark.auth")}) {
+      JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
```
Contributor: We can actually do a simple check here to filter the configs that contain "auth", right?

Contributor: Use interpolation for slightly better readability? `JAVA_OPTS += s"-D${k}=\"${v}\""`

Contributor (author): That line doesn't work; I assume the \" is being picked up as a regular ".

    scala> JAVA_OPTS += s"-D${k}=\"${v}\""

Contributor: Hm, weird. Scala bug? Anyway, no big deal.

Contributor: Or, if you really want to: `s"""-D$k="$v""""` (That's a lot of quotes.)

Contributor: Yeah, it seems that Scala can't handle both interpolation and backslashes in the same string very well. I think at that point the original is more readable.
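As an aside, here is a small runnable sketch of the two quoting variants from this thread that do compile, using a stand-in key/value pair (the object name and values are hypothetical, not from the patch):

```scala
import scala.collection.mutable.ListBuffer

object QuotingVariants extends App {
  val JAVA_OPTS = ListBuffer[String]()
  val (k, v) = ("spark.authenticate", "true") // stand-in config pair

  // Variant used by the patch: plain concatenation. The Scala literal "\\\""
  // denotes the two characters \" so the flag reaches the container launch
  // script as -Dspark.authenticate=\"true\".
  JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""

  // Triple-quoted interpolation suggested in review. Four closing quotes:
  // one literal " followed by the """ terminator. Note this emits bare quotes
  // (-Dspark.authenticate="true"), not backslash-escaped ones, so it is not a
  // byte-for-byte replacement for the variant above.
  JAVA_OPTS += s"""-D$k="$v""""

  JAVA_OPTS.foreach(println)
}
```

My reading (not stated in the thread) is that the first suggestion, `s"-D${k}=\"${v}\""`, failed because 2.10-era Scala did not process escape sequences inside interpolated strings (SI-6476, fixed only in much later releases), which would explain the REPL failure reported above.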
```diff
+    }
 
     // Commenting it out for now - so that people can refer to the properties if required. Remove
     // it once cpuset version is pushed out.
     // The context is, default gc for server class machines end up using all cores to do gc - hence
@@ -85,25 +91,25 @@ trait ExecutorRunnableUtil extends Logging {
     }
     */
 
-    val commands = List[String](
-      Environment.JAVA_HOME.$() + "/bin/java" +
-      " -server " +
+    val commands = Seq(Environment.JAVA_HOME.$() + "/bin/java",
+      "-server",
       // Kill if OOM is raised - leverage yarn's failure handling to cause rescheduling.
       // Not killing the task leaves various aspects of the executor and (to some extent) the jvm in
       // an inconsistent state.
       // TODO: If the OOM is not recoverable by rescheduling it on different node, then do
       // 'something' to fail job ... akin to blacklisting trackers in mapred ?
-      " -XX:OnOutOfMemoryError='kill %p' " +
-      JAVA_OPTS +
-      " org.apache.spark.executor.CoarseGrainedExecutorBackend " +
-      masterAddress + " " +
-      slaveId + " " +
-      hostname + " " +
-      executorCores +
-      " 1> " + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
-      " 2> " + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")
+      "-XX:OnOutOfMemoryError='kill %p'") ++
+      JAVA_OPTS ++
+      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
+        masterAddress.toString,
+        slaveId.toString,
+        hostname.toString,
+        executorCores.toString,
+        "1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
+        "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")
 
-    commands
+    // TODO: it would be nicer to just make sure there are no null commands here
+    commands.map(s => if (s == null) "null" else s).toList
```
Contributor: Does it make sense to exclude null commands altogether? Maybe something like

Contributor (author): This is copy and paste from ClientBase, to make the command building be the same. I'd prefer to take any changes to it to a separate jira, as I assume there was a reason they didn't do it initially and put a comment.

Contributor (author): Note, I can take the changes to make it a list out if you would like and just make that a separate jira; I just ran into the space issue, so I thought I would make it the same. Really we should perhaps make a utility function for it, but we don't have any util classes shared across client/app master right now.

Contributor: Yes, I think we should keep the list, since you don't have to worry about spaces here and there. For the null thing, I'm trying to understand what it does. If I have a command Seq("java", "null", "-cp", "some.jar", "SomeClass"), doesn't this get compiled to "java null -cp some.jar SomeClass"? It seems that the consequences are undefined if we leave arbitrary "null" strings in there. Also, I didn't realize it was copy and pasted from ClientBase, which makes this all the stranger.

Contributor (author): I agree, it's a bit odd; I just didn't have the time to investigate at this point, and figured someone put the comment there perhaps thinking an argument that was "null" was ok. One thing I thought of: if one of the args is null, like the hostname or masterAddress, and you just strip it out, then the usage is wrong and it might not be obvious why it's failing. Hopefully it would complain somewhere on startup that a null was passed, and it would be easier to debug.

Contributor: I believe that --jar is required, but can be "null" (e.g. when using spark-shell or pyspark).

Contributor: From YarnClientSchedulerBackend.scala::start(): Maybe that explains the need?
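To make the trade-off in this thread concrete, here is a sketch of my own (not the patch's code; the values are hypothetical) contrasting the substitute-with-"null" behavior against dropping nulls outright:

```scala
object NullCommandDemo extends App {
  // Hypothetical values: masterAddress is set, hostname was left null by a bug.
  val masterAddress = "akka.tcp://spark@driver:7077"
  val hostname: String = null

  val parts = Seq("java", "-server",
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    masterAddress, hostname, "4")

  // What the copied ClientBase code does: keep the slot, print "null".
  // The command is wrong either way, but the hole stays visible at the
  // right position: java -server ... akka.tcp://spark@driver:7077 null 4
  println(parts.map(s => if (s == null) "null" else s).mkString(" "))

  // The alternative floated in review: drop nulls. Every later positional
  // argument silently shifts left, which is harder to debug:
  // java -server ... akka.tcp://spark@driver:7077 4
  println(parts.filter(_ != null).mkString(" "))
}
```

Note this only concerns actual null references; the --jar case mentioned above involves the literal string "null", which neither version touches.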
```diff
   }
 
   private def setupDistributedCache(
```
Contributor: What about spark.core.connection.auth.wait.timeout? Do we also need to ship this?

Contributor: Also, I personally prefer (not a huge deal if we don't change this)

Contributor (author): spark.core.connection.auth.wait.timeout isn't needed, as it's for the ConnectionManager and isn't created until after the initial registration and transfer of the SparkConf. Any of the akka settings are needed, though. Let me make another pass through the code to see if I see any others.
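For context, a minimal sketch (assuming a Spark build on the classpath; the object name and values are hypothetical) of why shipping these settings as -D flags works at all: a SparkConf constructed with loadDefaults = true, which is the default, copies every JVM system property starting with "spark." into the conf, so the executor sees the auth settings before it registers with the scheduler.

```scala
import org.apache.spark.SparkConf

object AuthPickupDemo extends App {
  // Stand-ins for what -Dspark.authenticate=... on the container command
  // line would set; the secret value here is obviously fake.
  System.setProperty("spark.authenticate", "true")
  System.setProperty("spark.authenticate.secret", "not-a-real-secret")

  // new SparkConf() loads "spark.*" system properties by default.
  val conf = new SparkConf()
  println(conf.getBoolean("spark.authenticate", defaultValue = false)) // true
  println(conf.get("spark.authenticate.secret"))                       // not-a-real-secret
}
```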