[SPARK-3415] [PySpark] removes SerializingAdapter code #2287
|
Hi @wardviaene, Do you have an example program that reproduces this bug? We should probably add it as a regression test (see (For other reviewers: you can browse SerializingAdapter's code at http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/) It looks like this code is designed to handle the pickling of file() objects. The Dill developers have recently been discussing how to pickle file handles: uqfoundation/dill#57 It looks like |
|
Hi @JoshRosen I added a test script in this pull request. The sys.stderr in a class triggers the bug. |
python/pyspark/tests.py
This is capitalized (SetUp) so it won't be called by unittest. Also, we end up inheriting the proper setup and teardown methods from PySparkTestCase, so you don't need these methods.
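The point about the capitalized hook name can be checked with a small sketch. This is not the PR's test code: `BaseCase` is a hypothetical stand-in for a shared base class such as PySparkTestCase, and `run_suite` is a helper introduced only for this illustration.

```python
import io
import unittest


class BaseCase(unittest.TestCase):
    """Hypothetical stand-in for a shared base class such as PySparkTestCase."""

    def setUp(self):
        # unittest calls this hook before every test method.
        self.resource = "ready"


class MisnamedSetupTests(BaseCase):
    def SetUp(self):
        # Never called by unittest: the hook name is case-sensitive.
        self.resource = "overridden"

    def test_resource(self):
        # The inherited setUp ran; the capitalized SetUp did not.
        self.assertEqual(self.resource, "ready")


def run_suite(case):
    """Run one test class quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


if __name__ == "__main__":
    result = run_suite(MisnamedSetupTests)
    print(result.wasSuccessful())
```

The test passes only because the inherited `setUp` supplies the value, which is also why the extra methods in the subclass are unnecessary.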
|
Can one of the admins verify this patch? |
|
Jenkins, this is ok to test. |
|
Hi @JoshRosen The bug would only be triggered in a class, which is why I initially wrote it like that. My last commit removes the references to SerializingAdapter and adds a test script that calls the save function directly. The test fails before the patch, but succeeds after the patch. |
|
QA tests have started for PR 2287 at commit
|
|
QA tests have finished for PR 2287 at commit
|
|
QA tests have started for PR 2287 at commit
|
|
QA tests have finished for PR 2287 at commit
|
|
LGTM, thanks! |
python/pyspark/tests.py
One final naming nit: so far, the classes named *TestCase are used for factoring out common setup / teardown code (such as unittest.TestCase, PySparkTestCase, etc.), while the classes with the actual tests have been called [ComponentName]Tests or Test[ComponentName]. Therefore, I'd prefer to call this CloudPickleTests.
Also, since this is just testing CloudPickle without using any PySpark features, it should extend unittest.TestCase instead of PySparkTestCase so that we don't run a setup / teardown method for a SparkContext that we never use.
|
This looks fine to me, too, although I have two minor naming nits. Sorry to be so nitpicky on the names and test code, but I'd like this code to be really clean so that it serves as an example for future CloudPickle tests. |
…ad using StringIO
|
We have SerializationTestCase, so it's better to put it there.
|
python/pyspark/tests.py
The pickled output isn't guaranteed to be the same across Python versions, so this test will be brittle. It's probably fine to pickle it using dumps, load it with loads, and verify that what you get back is reasonable.
There is no load in cloudpickle, so I used the default one in Python. The test dumps it, loads it, and compares it against the original. I also moved the code under SerializationTestCase.
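The round-trip approach discussed here can be sketched as follows. This is an illustration only: it uses the standard library's `pickle` for both directions as a stand-in (in the PR the dumps side would come from cloudpickle), and `round_trip` and the sample data are hypothetical.

```python
import pickle


def round_trip(obj, protocol):
    """Serialize obj with the given protocol, then load it back."""
    return pickle.loads(pickle.dumps(obj, protocol=protocol))


original = {"name": "example", "values": [1, 2, 3], "nested": ("a", None)}

# The raw bytes depend on the protocol (and can vary across Python versions),
# so asserting on them directly makes a test brittle.
bytes_differ = pickle.dumps(original, protocol=0) != pickle.dumps(original, protocol=2)

# The round-tripped value is what stays stable, so assert on that instead.
restored_values = [round_trip(original, p) for p in (0, 2)]
```

Comparing `restored_values` against `original` checks the behavior that matters without pinning the test to one exact byte string.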
|
QA tests have started for PR 2287 at commit
|
|
QA tests have finished for PR 2287 at commit
|
|
Woah, looks like this is a test failure due to some unrelated MiMa binary compatibility failures in GraphX... strange, since other PRs are building fine. |
|
Jenkins, retest this please. |
|
The MiMa failures were an unintended consequence of bumping the version number without changing the MiMa configurations (see #2315). This looks good to me, so I'm going to merge it into master. Thanks! |
This code removes the SerializingAdapter code that was copied from PiCloud