Changes from 2 commits
4 changes: 2 additions & 2 deletions core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli

// This is a hack to enforce loading hdfs-site.xml.
// See SPARK-11227 for details.
-    FileSystem.get(new URI(path), hadoopConfiguration)
+    FileSystem.get(Utils.resolveURI(path), hadoopConfiguration)

// A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
@@ -1081,7 +1081,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli

// This is a hack to enforce loading hdfs-site.xml.
// See SPARK-11227 for details.
-    FileSystem.get(new URI(path), hadoopConfiguration)
+    FileSystem.get(Utils.resolveURI(path), hadoopConfiguration)

// The call to NewHadoopJob automatically adds security credentials to conf,
// so we don't need to explicitly add them ourselves
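For context on why the call sites above switch from `new URI(path)` to `Utils.resolveURI(path)`: the single-argument `java.net.URI` constructor rejects raw local Windows paths outright. A minimal sketch using only the JDK (the path shown is made up for illustration):

```scala
import java.net.URI

// A raw Windows path is not a valid URI: the backslashes and the
// space make the single-argument URI constructor throw.
val threw =
  try { new URI("C:\\Users\\spark user\\data.txt"); false }
  catch { case _: java.net.URISyntaxException => true }
assert(threw)
println(s"URISyntaxException thrown: $threw")
```

A resolving helper such as `Utils.resolveURI` can normalize the path into a well-formed `file:` URI before it ever reaches `FileSystem.get`, which is the behavior this change relies on.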
15 changes: 14 additions & 1 deletion core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1900,7 +1900,20 @@ private[spark] object Utils extends Logging {
*/
def resolveURI(path: String): URI = {
try {
-    val uri = new URI(path)
+    val osSafePath = if (Path.isWindowsAbsolutePath(path, false)) {
+      // Make sure C:/ part becomes /C/.
+      val windowsUri = new URI(path)
+      val driveLetter = windowsUri.getScheme
+      s"/$driveLetter/${windowsUri.getSchemeSpecificPart()}"
+    } else if (Path.isWindowsAbsolutePath(path, true)) {
+      // Make sure /C:/ part becomes /C/.
+      val windowsUri = new URI(path.substring(1))
+      val driveLetter = windowsUri.getScheme
+      s"/$driveLetter/${windowsUri.getSchemeSpecificPart()}"
Member Author:

BTW, this logic is adapted from the Hadoop codebase. Please let me know if there are equivalent helper functions.

Member:

Is this change needed? I think resolveURI already handles Windows-style paths:
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala#L398

Or did you find another case that this method cannot handle properly?

Member Author (@HyukjinKwon), Sep 5, 2016:

Hm, yes. Initially I did not add this logic and ran the tests with this diff (HyukjinKwon/spark@master...HyukjinKwon:a136206dad6011d6ad112f33417790ad3c6a9912).

It produced the output here:

https://gist.github.com/HyukjinKwon/f3f9a36dde88028ca09fd417b6ce5c68

(several test failures were already removed at that point).

Then I corrected this (without realizing we already test that case), with the diff here (HyukjinKwon/spark@master...HyukjinKwon:b648e4f2d5aae072748a97550cfcf832c57d9315).

It produced the output here:

https://gist.github.com/HyukjinKwon/0c42b2c208e06c59525d91087252d9b0

(all test failures except two were removed).

So this change removes roughly ten test failures, which is why I thought it was a legitimate change.

Member Author:

I will double-check and look into this more deeply. I hadn't noticed we have that test. Thanks for pointing it out.

Contributor:

The other option is, of course, to just use the Hadoop Path class and do something like new Path(path).toUri; I think it handles C:/ correctly. I don't know whether this affects other functionality (like SPARK-11227), though, so we should check with @sarutak.

Member Author (@HyukjinKwon), Sep 6, 2016:

Yes. In the meantime, I will try that approach and run the tests. Thanks for the quick feedback.

Member:

The approach @shivaram mentioned works well and doesn't affect SPARK-11227.
@HyukjinKwon, you can fix this problem that way, but if you do fix resolveURI, adding new test cases to UtilsSuite is desirable.

Member Author:

If it seems okay, let me just use new Path(...).toUri directly to deal with this.

I tried changing resolveURI to use new Path(path).toUri instead of new URI(path), but found that it breaks existing tests for resolveURI. It seems to parse special characters differently, for example #:

"hdfs:/root/spark.jar[%23]app.jar" did not equal "hdfs:/root/spark.jar[#]app.jar"
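The mismatch above can be sketched with plain `java.net.URI` (no Hadoop needed): the URI constructor treats `#` as a fragment delimiter and keeps it unescaped, whereas the failing assertion shows Hadoop's Path percent-encoding it as `%23`.

```scala
import java.net.URI

// java.net.URI treats '#' as the start of a fragment and renders it
// back unescaped, so the string round-trips unchanged.
val uri = new URI("hdfs:/root/spark.jar#app.jar")
assert(uri.getFragment == "app.jar")
assert(uri.toString == "hdfs:/root/spark.jar#app.jar")
println(uri)
```

This is why swapping `new URI(path)` for `new Path(path).toUri` inside resolveURI changes observable behavior for paths containing `#`.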

Member:

OK, let's fix that other resolveURI problem in another PR.

+    } else {
+      path
+    }
+    val uri = new URI(osSafePath)
     if (uri.getScheme() != null) {
       return uri
     }
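The motivation for the drive-letter rewrite in the diff above can be sketched with only `java.net.URI` (the example path is made up):

```scala
import java.net.URI

// On a "C:/..." path, java.net.URI takes the drive letter as the
// URI scheme, so the scheme check in resolveURI would wrongly treat
// the path as already fully qualified.
val windowsUri = new URI("C:/Users/spark/data.txt")
assert(windowsUri.getScheme == "C")

// After rewriting to the "/C/..." form there is no scheme, so
// resolveURI's scheme check falls through and it can attach a
// proper file scheme itself.
val rewritten = new URI("/C/Users/spark/data.txt")
assert(rewritten.getScheme == null)
assert(rewritten.getPath == "/C/Users/spark/data.txt")
```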