-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-22594][CORE] Handling spark-submit and master version mismatch #19802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6c48392 to
2d2e996
Compare
|
Can you please explain more, and how to reproduce this issue? Spark's RPC is not designed for version compatible. |
Sure,
cat ./python/pyspark/version.py
__version__='2.2.0'./bin/spark-class org.apache.spark.deploy.worker.Worker spark://`hostname -I | cut -d' ' -f1`:7077 -c 1 -m 1G &> /dev/null &
./bin/spark-class org.apache.spark.deploy.master.Master
cat ./python/pyspark/version.py
...
__version__ = "2.3.0.dev0"./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://`hostname -I | cut -d' ' -f1`:7077 \
--executor-memory 512M \
$PWD/examples/target/scala-2.11/jars/spark-examples_2.11-2.3.0-SNAPSHOT.jar \
10result: in the Spark Master log there is the following exception: but on the
I hear you, on the other hand PR doesn't even try to make it compatible, all it does is to translate a cryptic error to more understandable one. I think, it may be quite common to run older spark-submit against the updated spark master or at least I've hit the issue couple of times. And I had to google the exception and then on stackoverflow or elsewhere figure out that it actually means that there was a version discrepancy. If this PR is merged the "client" (the invoker of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that this won't catch all or even most errors resulting from version incompatibility. Spark has never supported or contemplated mis-matching versions internally. I don't think we should try to handle this, because it's fundamentally piecemeal and hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I should have picked different name for the PR than Handling spark-submit and master version mismatch. It doesn't try to solve the issue in a complex way that two different version could be able to talk to each other, all it does is saying the user. Hey, you have probably different version than spark master. I agree, it's little bit hacky, on the other hand I see no other option than to catch the InvalidClassException, if the version is not part of the message. Perhaps some initial handshake in which the version is sent would be cleaner.
What about re-throwing the exception. This way it wouldn't change the semantics of the code, but the client would be informed. wdyt?
|
To explain better the intentions, it doesn't try to solve of somehow provide a compatibility layer between old and new versions of Spark, all it does is slightly improving the UX, because people are hitting the issue all the time (including me) couple of instances of this issue:
btw. it's not only issue with |
|
I think there are some values in having a better experience with mismatch client/server versions. We discussed it might be even more common when the client was a R package (or Python perhaps down the line) @shivaram I agree though focusing on one particular line is likely not sufficient. Perhaps we should design this out? Have this version check formally be part of the protocol handshake? |
2d2e996 to
1b0fd3a
Compare
|
I've also added tests and removed "WIP" from the label? Do you guys want the change or would you like to have something more elaborated, like an initial handshake? One note about the "handshake approach". If there is new initial message that would be sent from client to server in which the client sends its version and server decides if it's backward compatible or not with this particular client version, this approach itself wouldn't solve the issue when old client (that doesn't know anything about it) tries to talk with new server. This PR solves this scenario. Merry Christmas 🎄 |
|
@srowen is there anything I can improve in the PR? |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably OK if you also re-throw that exception for now. You're just giving more info then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are JVM asserts, not JUnit asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
ah, long time no Java 🤦♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: put static members first and separated by whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space, not 4 space indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after casts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to assert about the text because it could change
868e650 to
ff0b7fb
Compare
…om different versions
Hmm, In the original code the exception wasn't re-thrown. Do you really think it is a good idea? The this is the original 'catch-them-all' code :) } catch (Exception e) {
logger.error("Error while invoking RpcHandler#receive() for one-way message.", e);
}I think, I addressed all your inline other comments, thanks for the review. |
|
I see, yeah that's a fine point. Don't rethrow then. |
|
@srowen was Jenkins ok with the change? I can't see the results. |
|
Test build #4086 has finished for PR 19802 at commit
|
|
@vanzin do you have thoughts on this one? |
|
I think this is the wrong place for the fix, if one is desired. The transport library doesn't know anything about serialization, that's all handled by the code using it. In this case, the 'RpcEnv' stuff in core. So I don't think it's correct to handle those errors in the transport library. |
|
I get this when I start the master and worker, without running spark-submit on both 2.2.0 and 2.2.1 |
|
Can one of the admins verify this patch? |
|
@Jiri-Kremser if you're not planning to address the feedback you should probably close this PR. |
What changes were proposed in this pull request?
If the
InvalidClassExceptionexception is thrown in theTransportRequestHandler.processOneWayMessage(as a consequence of the differentserialVersionUIDs) let's send the connected client a message that there is probably a version mismatch between his spark-submit and the remote spark master she is trying to contact.How was this patch tested?
TODO:
( I've added test in 1b0fd3a )