Skip to content

Conversation

@pierreglaser
Copy link
Member

@pierreglaser pierreglaser commented Jan 29, 2019

This PR changes the way globals collisions are handled:

  • before, already existing globals would override globals shipped in a function's pickle string
  • now, it is the opposite: any global shipped with a function will have the priority over already existing globals in the depickling environment.

The tests are accordingly updated. Note that #216 did not add any new tests, but simply parametrized existing ones.

EDIT: For the record, the previous behavior (silently override globals shipped with the functions) introduced regressions reported by users of downstream projects. Here is the list of issues related to it:

@ogrisel it is a good PR to trigger the new downstream integration builds!

@ogrisel
Copy link
Contributor

ogrisel commented Jan 30, 2019

A bit of background: this PR restores the behavior of cloudpickle from before 0.6.0.

In 0.6.0 and 0.6.1 we changed the dynamic module globals handling behavior as we thought it was a bug (lack of consistency with how globals are handled in non-dynamic modules) but user feedback including #214 revealed that the new behavior is surprising/annoying for most users.

In #216 we explored the possibility of introducing of a switch on the CloudPickler class to control the overriding behavior but in retrospect we felt that there wasn't any realistic use cases that would mandate the override_globals=False case.

So to keep things simple, this PR hardcodes the override_globals=True case to restore a behavior equivalent to what cloudpickle 0.5 and earlier had.

@ogrisel
Copy link
Contributor

ogrisel commented Jan 30, 2019

@ogrisel it is a good PR to trigger the new downstream integration builds!

I agree, please feel free to push an empty commit with the [ci downstream] flag to do so.

@pierreglaser
Copy link
Member Author

pierreglaser commented Jan 30, 2019

The failure comes from test_workdir_simple, which at first glance seems unrelated to cloudpickle.
I'll add a [ci distributed] commit to see if this was a random failure.

Edit: the next build passed, so indeed it looks like a random failure.

@codecov-io
Copy link

codecov-io commented Jan 30, 2019

Codecov Report

Merging #240 into master will decrease coverage by 0.48%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #240      +/-   ##
==========================================
- Coverage   85.39%   84.91%   -0.49%     
==========================================
  Files           1        1              
  Lines         589      570      -19     
  Branches      118      112       -6     
==========================================
- Hits          503      484      -19     
  Misses         63       63              
  Partials       23       23
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 84.91% <66.66%> (-0.49%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b31ced...c207afc. Read the comment docs.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a few comments:

@mitar
Copy link

mitar commented Jan 30, 2019

@suquark, can you check this?

@suquark
Copy link
Contributor

suquark commented Jan 31, 2019

@mitar I think this PR is nice and it will solve some previous problems. And I will help testing Ray with the latest cloudpickle (#238).

@pcmoritz
Copy link
Contributor

pcmoritz commented Feb 6, 2019

This doesn't seem to fix the issue fully for us. Not sure what is going on, do you have an idea @suquark?

@pierreglaser
Copy link
Member Author

pierreglaser commented Feb 6, 2019

@pcmoritz sorry, I still had a couple fixes to do that I thought I already pushed. This PR now reverts cloudpickle's globals handling behavior to 0.5.3. Back then this behavior was something we did not fully understand (and tested). Hopefully, this area will get steadier ☺️ .

@pcmoritz
Copy link
Contributor

pcmoritz commented Feb 6, 2019

@pierreglaser Great, I re-ran the Ray test suite on top of your latest commits and it seems to be working now. Thanks a lot for the fix :)

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass at reviewing the new version of this PR.

@pierreglaser pierreglaser force-pushed the force-existing-globals-overriding branch from 13afdd2 to f4f15e3 Compare February 7, 2019 11:15
@pierreglaser
Copy link
Member Author

pierreglaser commented Feb 7, 2019

Here is a gist that loads "old" (e.g created using cloudpickle 0.5, 0.6, 0.7) pickle strings of functions of dynamically created modules using the reversion introduced in this PR. So far so good.

PS: sorry about the ugly force push, i forgot I resolved some conflicts in the browser...

@ogrisel
Copy link
Contributor

ogrisel commented Feb 7, 2019

@pierreglaser please launch downstream CI on this PR to check that we have not broken anything.

CHANGES.md Outdated
variables when loaded in their new environment. This restores the (previously
untested) behavior of cloudpickle prior to changes done in 0.5.4 for
functions defined in the `__main__` module, and 0.6.0/1 for other dynamic
functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "override" anymore because we create a new empty namespace each time. May I suggest the following:

- Global variables referenced by functions pickled  by cloudpickle are now
  unpickled in a new and isolated namespace scoped by the CloudPickler instance.  
  This restores the (previously  untested) behavior of cloudpickle prior to changes
  done in 0.5.4 for functions defined in the `__main__` module,  and 0.6.0/1 for
  other dynamic functions.

Copy link
Member Author

@pierreglaser pierreglaser Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, this was stale. I'll let the downstream integration test finish before pushing a new commit.

@ogrisel ogrisel changed the title Force existing globals overriding Restore CloudPickler-scoped globals namespace isolation for pickled functions Feb 7, 2019
@ogrisel
Copy link
Contributor

ogrisel commented Feb 7, 2019

You can trigger downstream CI again, I merged your PR to fix the broken loky test.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for merge once the downstream CI has completed and the changelog is updated.

@ogrisel ogrisel merged commit 9184a67 into cloudpipe:master Feb 7, 2019
@ogrisel
Copy link
Contributor

ogrisel commented Feb 7, 2019

Merged. Thanks for your work @pierreglaser! Thanks for your feedback @pcmoritz!

HyukjinKwon added a commit to apache/spark that referenced this pull request Feb 27, 2019
## What changes were proposed in this pull request?

After upgrading cloudpickle to 0.6.1 at #20691, one regression was found. Cloudpickle had a critical cloudpipe/cloudpickle#240 for that.

Basically, it currently looks existing globals would override globals shipped in a function's, meaning:

**Before:**

```python
>>> def hey():
...     return "Hi"
...
>>> spark.range(1).rdd.map(lambda _: hey()).collect()
['Hi']
>>> def hey():
...     return "Yeah"
...
>>> spark.range(1).rdd.map(lambda _: hey()).collect()
['Hi']
```

**After:**

```python
>>> def hey():
...     return "Hi"
...
>>> spark.range(1).rdd.map(lambda _: hey()).collect()
['Hi']
>>>
>>> def hey():
...     return "Yeah"
...
>>> spark.range(1).rdd.map(lambda _: hey()).collect()
['Yeah']
```

Therefore, this PR upgrades cloudpickle to 0.8.0.

Note that cloudpickle's release cycle is quite short.

Between 0.6.1 and 0.7.0, it contains minor bug fixes. I don't see notable changes to double check and/or avoid.

There is virtually only this fix between 0.7.0 and 0.8.1 - other fixes are about testing.

## How was this patch tested?

Manually tested, tests were added. Verified unit tests were added in cloudpickle.

Closes #23904 from HyukjinKwon/SPARK-27000.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants