-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from … #9812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,8 +19,13 @@ package org.apache.spark.repl | |
|
|
||
| import java.io.File | ||
| import java.net.{URL, URLClassLoader} | ||
| import java.nio.charset.StandardCharsets | ||
| import java.util | ||
|
|
||
| import com.google.common.io.Files | ||
|
|
||
| import scala.concurrent.duration._ | ||
| import scala.io.Source | ||
| import scala.language.implicitConversions | ||
| import scala.language.postfixOps | ||
|
|
||
|
|
@@ -41,6 +46,7 @@ class ExecutorClassLoaderSuite | |
|
|
||
| val childClassNames = List("ReplFakeClass1", "ReplFakeClass2") | ||
| val parentClassNames = List("ReplFakeClass1", "ReplFakeClass2", "ReplFakeClass3") | ||
| val parentResourceNames = List("fake-resource.txt") | ||
| var tempDir1: File = _ | ||
| var tempDir2: File = _ | ||
| var url1: String = _ | ||
|
|
@@ -54,6 +60,8 @@ class ExecutorClassLoaderSuite | |
| url1 = "file://" + tempDir1 | ||
| urls2 = List(tempDir2.toURI.toURL).toArray | ||
| childClassNames.foreach(TestUtils.createCompiledClass(_, tempDir1, "1")) | ||
| parentResourceNames.foreach(x => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: |
||
| Files.write("resource".getBytes(StandardCharsets.UTF_8), new File(tempDir2, x))) | ||
| parentClassNames.foreach(TestUtils.createCompiledClass(_, tempDir2, "2")) | ||
| } | ||
|
|
||
|
|
@@ -99,6 +107,32 @@ class ExecutorClassLoaderSuite | |
| } | ||
| } | ||
|
|
||
| test("resource from parent") { | ||
| val parentLoader = new URLClassLoader(urls2, null) | ||
| val classLoader = new ExecutorClassLoader(new SparkConf(), url1, parentLoader, true) | ||
| val resourceName: String = "fake-resource.txt" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better for this to be a field in the class since you're also using it above in |
||
| Option(classLoader.getResource(resourceName)) match { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd simplify this a bit. |
||
| case Some(url) => { | ||
| val fileReader = Source.fromInputStream(url.openStream()).bufferedReader() | ||
| assert(fileReader.readLine().contains("resource"), "File doesn't contain 'resource'") | ||
| } | ||
| case None => fail(s"Resource $resourceName not found") | ||
| } | ||
| } | ||
|
|
||
| test("resources from parent") { | ||
| val parentLoader = new URLClassLoader(urls2, null) | ||
| val classLoader = new ExecutorClassLoader(new SparkConf(), url1, parentLoader, true) | ||
| val resourceName: String = "fake-resource.txt" | ||
| val resources: util.Enumeration[URL] = classLoader.getResources(resourceName) | ||
| if (!resources.hasMoreElements) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| fail(s"Resource $resourceName not found") | ||
| } else { | ||
| val fileReader = Source.fromInputStream(resources.nextElement().openStream()).bufferedReader() | ||
| assert(fileReader.readLine().contains("resource"), "File doesn't contain 'resource'") | ||
| } | ||
| } | ||
|
|
||
| test("failing to fetch classes from HTTP server should not leak resources (SPARK-6209)") { | ||
| // This is a regression test for SPARK-6209, a bug where each failed attempt to load a class | ||
| // from the driver's class server would leak a HTTP connection, causing the class server's | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holdenk maybe you have some thoughts on this?
Also summoning @vanzin
Doesn't this need to check
userClassPathFirst?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen
It doesn't need to check
userClassPathFirstsince this implementation implies that REPL never provides resources dynamically so there's no need to lookup resource from ExecutorClassLoader itself.Btw, could precondition be broken? I couldn't imagine REPL generating resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be surprised if the repl generated resources, but at the same time, what if someone tries to load the generated class file as a resource? It's an unusual but valid use case.
The code to support that is not complicated; it already exists in
MutableURLClassLoader.scala(classChildFirstURLClassLoaderfor, example). If you're up for it you could even create a helper trait or a utility method somewhere for that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin
I didn't see use case you mentioned but it could make sense.
In order to achieve, we have to implement findResource() and findResources() for ExecutorClassLoader since ExecutorClassLoader cannot rely on superclass (ClassLoader) to find resource.
It is easy to provide resource URL which points to origin scheme (http, https, ftp, hdfs), but since I'm new to classloader, so I'm wondering it is safe to return URL from findResource() and findResources() which doesn't point to local file.
If it is not safe to return non local file as URL, what's recommended way to do?
I can only think about downloading files to local temp directory per every call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be perfectly fine. That's how URLClassLoader works, after all. The only potential odd thing would be
getResourceAsStream, which returns anInputStream, and my guess is if the JDK'sURLclass doesn't support the protocol, would cause an error unless you overrode it here.So perhaps it's too much to worry about and we can just assume that no one will do that, and fix it if someone ever needs that feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin
To clarify about "feature", do you want me to change implementation of findResource() and findResources() for pointing origin scheme, and forget about potential odd? Or forget about finding resources from REPL uri and leave as this PR is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the PR as is should be fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin
OK, Thanks for clarification! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all seams reasonable to me (sorry for the slow reply I'm in Tokyo right now).