[SPARK-3415] [PySpark] removes SerializingAdapter code #2287

wardviaene · 2014-09-05T13:14:58Z

This patch removes the SerializingAdapter code that was copied from PiCloud.

JoshRosen · 2014-09-05T14:59:57Z

Do you have an example program that reproduces this bug? We should probably add it as a regression test (see python/pyspark/tests.py for examples of how to do this).

(For other reviewers: you can browse SerializingAdapter's code at http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/) It looks like this code is designed to handle the pickling of file() objects. The Dill developers have recently been discussing how to pickle file handles: uqfoundation/dill#57

It looks like SerializingAdapter.max_transmit_data acts as an upper limit on the size of closures that PiCloud would send to their service. Unlike PiCloud, we don't have limits on closure sizes (there are warnings, but these are detected / enforced inside the JVM). Therefore, I wonder if we should just remove this limit and allow the whole file to be read rather than adding an obscure configuration option.

wardviaene · 2014-09-05T18:18:15Z

Hi @JoshRosen

I added a test script in this pull request. The sys.stderr in a class triggers the bug.

JoshRosen · 2014-09-05T18:19:44Z

python/pyspark/tests.py

This is capitalized (SetUp) so it won't be called by unittest. Also, we end up inheriting the proper setup and teardown methods from PySparkTestCase, so you don't need these methods.
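To illustrate the point about hook names, here is a minimal sketch using only the standard library. The class names (`BaseFixtureCase`, `MisnamedHookTests`) and the helper `run_case` are hypothetical and are not taken from python/pyspark/tests.py. It shows that unittest only calls the hook spelled `setUp`, so a method named `SetUp` is silently ignored while the inherited `setUp` still runs.

```python
import unittest


class BaseFixtureCase(unittest.TestCase):
    """Hypothetical stand-in for a shared fixture base class."""

    def setUp(self):
        # unittest looks up this exact name before every test method.
        self.calls = ["base setUp"]


class MisnamedHookTests(BaseFixtureCase):
    def SetUp(self):
        # Wrong capitalization: unittest never calls this method.
        self.calls.append("SetUp")

    def test_only_inherited_hook_ran(self):
        self.assertEqual(self.calls, ["base setUp"])


def run_case(case):
    """Run every test in a TestCase class and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_case(MisnamedHookTests)` runs one test, and it passes only because the misnamed `SetUp` was never invoked.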

SparkQA · 2014-09-05T23:40:59Z

Can one of the admins verify this patch?

JoshRosen · 2014-09-05T23:42:51Z

Jenkins, this is ok to test.

wardviaene · 2014-09-06T09:11:43Z

Hi @JoshRosen

The bug is only triggered inside a class, which is why I initially wrote it like that. My last commit removes the references to SerializingAdapter and adds a test script that calls the save function directly. The test fails before the patch and succeeds after it.

SparkQA · 2014-09-06T09:43:21Z

QA tests have started for PR 2287 at commit aaf10b7.

This patch merges cleanly.

SparkQA · 2014-09-06T09:44:21Z

QA tests have finished for PR 2287 at commit aaf10b7.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2014-09-06T13:43:43Z

QA tests have started for PR 2287 at commit afc4a9a.

This patch merges cleanly.

SparkQA · 2014-09-06T14:47:26Z

QA tests have finished for PR 2287 at commit afc4a9a.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2014-09-06T18:00:34Z

LGTM, thanks!

JoshRosen · 2014-09-06T19:56:00Z

python/pyspark/tests.py

One final naming nit: so far, the classes named *TestCase are used for factoring out common setup / teardown code (such as unittest.TestCase, PySparkTestCase, etc.), while the classes with the actual tests have been called [ComponentName]Tests or Test[ComponentName]. Therefore, I'd prefer to call this CloudPickleTests.

Also, since this is just testing CloudPickle without using any PySpark features, it should extend unittest.TestCase instead of PySparkTestCase so that we don't run a setup / teardown method for a SparkContext that we never use.
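A small sketch of the convention described above, using only the standard library. All names here (`FixtureTestCase`, `FixtureUserTests`, `StandaloneTests`, `run_tests`) are hypothetical; `FixtureTestCase` merely stands in for a fixture base such as PySparkTestCase and does not create a SparkContext. It shows that a test class extending `unittest.TestCase` directly never pays for a fixture it does not use.

```python
import unittest


class FixtureTestCase(unittest.TestCase):
    """*TestCase: factors out shared setup / teardown, holds no tests."""

    setups = 0  # counts how often the (pretend) expensive fixture is built

    def setUp(self):
        FixtureTestCase.setups += 1
        self.resource = object()  # placeholder for an expensive resource

    def tearDown(self):
        self.resource = None


class FixtureUserTests(FixtureTestCase):
    """*Tests: needs the fixture, so it extends the fixture base."""

    def test_uses_resource(self):
        self.assertIsNotNone(self.resource)


class StandaloneTests(unittest.TestCase):
    """*Tests: pure logic, so it extends unittest.TestCase directly."""

    def test_pure_logic(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])


def run_tests(case):
    """Run every test in a TestCase class and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running `StandaloneTests` leaves `FixtureTestCase.setups` unchanged, while each test in `FixtureUserTests` increments it once.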

JoshRosen · 2014-09-06T19:59:36Z

This looks fine to me, too, although I have two minor naming nits. Sorry to be so nitpicky on the names and test code, but I'd like this code to be really clean so that it serves as an example for future CloudPickle tests.

davies · 2014-09-06T21:43:57Z

We have SerializationTestCase, so it's better to put it there.

JoshRosen · 2014-09-06T21:44:28Z

python/pyspark/tests.py

This isn't guaranteed to be the same across Python versions, so this test will be brittle. It's probably fine to pickle it using dumps, load it back with loads, and verify that what you get back is reasonable.

There is no load in cloudpickle, so I used the default one in Python. The test dumps the object, loads it, and compares the result against the original. I also moved the code under SerializationTestCase.
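The dump-then-load approach can be sketched with the standard library alone. This is not the test from this pull request and it does not use PySpark's cloudpickle: `StdStreamPickler` and this `dumps` helper are hypothetical, and the sketch assumes Python 3.8+ for `Pickler.reducer_override`. It pickles `sys.stdout` / `sys.stderr` by reference (as a lookup on the `sys` module) instead of by content, then checks the round trip by comparing the loaded object with the original rather than comparing raw pickle bytes, which may differ between Python versions.

```python
import importlib
import io
import pickle
import sys


class StdStreamPickler(pickle.Pickler):
    """Hypothetical pickler that saves the standard streams by reference."""

    def reducer_override(self, obj):
        if obj is sys:
            # Modules are not picklable by default; re-import by name on load.
            return importlib.import_module, ("sys",)
        if obj is sys.stdout:
            return getattr, (sys, "stdout")
        if obj is sys.stderr:
            return getattr, (sys, "stderr")
        # Everything else falls back to the normal pickling machinery.
        return NotImplemented


def dumps(obj):
    """Serialize obj with StdStreamPickler and return the bytes."""
    buf = io.BytesIO()
    StdStreamPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()


def round_trip(obj):
    """Dump with the custom pickler, load with the stock unpickler."""
    return pickle.loads(dumps(obj))
```

For example, `round_trip({"stream": sys.stderr})["stream"] is sys.stderr` is the kind of check that stays stable across Python versions, because it never inspects the serialized bytes.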

SparkQA · 2014-09-06T22:44:39Z

QA tests have started for PR 2287 at commit 5f0d426.

This patch merges cleanly.

SparkQA · 2014-09-06T23:45:04Z

QA tests have finished for PR 2287 at commit 5f0d426.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2014-09-06T23:47:04Z

Woah, looks like this is a test failure due to some unrelated MiMa binary compatibility failures in GraphX... strange, since other PRs are building fine.

JoshRosen · 2014-09-07T06:17:35Z

Jenkins, retest this please.

JoshRosen · 2014-09-08T01:54:21Z

The MiMa failures were an unintended consequence of bumping the version number without changing the MiMa configuration (see #2315). This looks good to me, so I'm going to merge it into master. Thanks!

SPARK-3415: removes legacy SerializingAdapter code

e263bf5

Ward Viaene added 2 commits September 5, 2014 19:07

SPARK-3415: test script

a958866

removed duplicate test

65ffeff

JoshRosen reviewed Sep 5, 2014

SPARK-3415: removed references to SerializingAdapter and rewrote test

aaf10b7

SPARK-3415: added newlines to pass lint

afc4a9a

JoshRosen reviewed Sep 6, 2014

SPARK-3415: modified test class name and call cloudpickle.dumps instead using StringIO

5f5d559

JoshRosen reviewed Sep 6, 2014

SPARK-3415: modified test class to do dump and load

5f0d426

asfgit closed this in ecfa76c Sep 8, 2014
