Skip to content

Conversation

@wardviaene
Copy link

This code removes the SerializingAdapter code that was copied from PiCloud

@JoshRosen
Copy link
Contributor

Hi @wardviaene,

Do you have an example program that reproduces this bug? We should probably add it as a regression test (see python/pyspark/tests.py for examples of how to do this).

(For other reviewers: you can browse SerializingAdapter's code at http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/) It looks like this code is designed to handle the pickling of file() objects. The Dill developers have recently been discussing how to pickle file handles: uqfoundation/dill#57

It looks like SerializingAdapter.max_transmit_data acts as an upper-limit on the sizes of closures that PiCloud would send to their service. Unlike PiCloud, we don't have limits on closure sizes (there are warnings, but these are detected / enforced inside the JVM). Therefore, I wonder if we should just remove this limit and allow the whole file to be read rather than adding an obscure configuration option.

@wardviaene
Copy link
Author

Hi @JoshRosen

I added a test script in this pull request. The sys.stderr in a class triggers the bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is capitalized (SetUp) so it won't be called by unittest. Also, we end up inheriting the proper setup and teardown methods from PySparkTestCase, so you don't need these methods.

@SparkQA
Copy link

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@wardviaene
Copy link
Author

Hi @JoshRosen

The bug would only be triggered in a Class, that's why I initially wrote it like that. My last commit removes the references to SerializingAdapter and adds a test script that calls the save function directly. The test fails before the patch, but succeeds after the patch.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have started for PR 2287 at commit aaf10b7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have finished for PR 2287 at commit aaf10b7.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have started for PR 2287 at commit afc4a9a.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have finished for PR 2287 at commit afc4a9a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Sep 6, 2014

LGTM, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final naming nit: so far, the classes named *TestCase are used for factoring out common setup / teardown code (such as unittest.TestCase, PySparkTestCase, etc.), while the classes with the actual tests have been called [ComponentName]Tests or Test[ComponentName]. Therefore, I'd prefer to call this CloudPickleTests.

Also, since this is just testing CloudPickle without using any PySpark features, it should extend unittest.TestCase instead of PySparkTestCase so that we don't run a setup / teardown method for a SparkContext that we never use.

@JoshRosen
Copy link
Contributor

This looks fine to me, too, although I have two minor naming nits. Sorry to be so nitpicky on the names and test code, but I'd like this code to be really clean so that it serves as an example for future CloudPickle tests.

@davies
Copy link
Contributor

davies commented Sep 6, 2014

We have SerializationTestCase, so it's better to put it there.

On Sat, Sep 6, 2014 at 12:59 PM, Josh Rosen [email protected]
wrote:

This looks fine to me, too, although I have two minor naming nits. Sorry
to be so nitpicky on the names and test code, but I'd like this code to be
really clean so that it serves as an example for future CloudPickle tests.

Reply to this email directly or view it on GitHub
#2287 (comment).

  • Davies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't guaranteed to be the same across Python versions, so this test will be brittle. It's probably fine to pickle it using dumps, load it with dumps, and verify that what you get back is reasonable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no load in cloudpickle, so I used the default one in python. Test dumps it, loads it, and compares it against original. Also moved the code under SerializationTestCase

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have started for PR 2287 at commit 5f0d426.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have finished for PR 2287 at commit 5f0d426.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

Woah, looks like this is a test failure due to some unrelated MiMa binary compatibility failures in GraphX... strange, since other PRs are building fine.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@JoshRosen
Copy link
Contributor

The Mima failures were an unintended consequence of bumping the version number without changing Mima configurations (see #2315). This looks good to me, so I'm going to merge it into master. Thanks!

@asfgit asfgit closed this in ecfa76c Sep 8, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants