Skip to content

Conversation

@pierreglaser
Copy link
Member

Not sure globals_ref is still needed, now that base_globals is simply the function's module name.

@codecov-io
Copy link

codecov-io commented Jan 16, 2019

Codecov Report

Merging #229 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
- Coverage   84.92%   84.81%   -0.11%     
==========================================
  Files           1        1              
  Lines         577      573       -4     
  Branches      114      113       -1     
==========================================
- Hits          490      486       -4     
  Misses         63       63              
  Partials       24       24
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 84.81% <100%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fd15c2...c684319. Read the comment docs.

@ogrisel
Copy link
Contributor

ogrisel commented Jan 17, 2019

Hum, this is fishy. I wonder if we did not break something that was not tested in the past when doing this series of changes in the handling of base globals.

Maybe this is related to @suquark's comment #216 (comment) but I am not sure I understand.

@pierreglaser
Copy link
Member Author

pierreglaser commented Jan 17, 2019

In summary:

  • prior to the series of changes, it has a clear, but somewhat limited (if not broken) usage
  • now, the only situation in which it has an influence is if modifying a function's __module__ attribute. But with a even more unstable behaviour.

Prior to the series of changes (i.e before commit 9a52ba3)

globals_ref allowed two functions sharing the same __globals__ attribute in a parent process to share it in the child process, but only if pickled in the same pickle string. Indeed, in this case, globals_ref would tell the CloudPickler to use its memo when pickling __globals__ instead of pickling each time a new dict.

My opinion is that globals_ref global variable sharing was not strong, since the memo is an attribute of a CloudPickler. Several calls to cloudpickle.dumps would have created several Cloudpickler, hence functions sharing the same global environment would have had separate global environment when depickled.

I made two tests that I ran using a version of CloudPickle, with HEAD at a249c44

    def test_global_variable_sharing_same_cloudpickler(self):
        code_template = """
        from cloudpickle import loads, dumps
        A_SHARED_GLOBAL = 1

        def f():
            global A_SHARED_GLOBAL
            A_SHARED_GLOBAL += 1
            return A_SHARED_GLOBAL

        def g():
            global A_SHARED_GLOBAL
            return A_SHARED_GLOBAL

        # the depickled functions share a global variable, as the the globals
        # dict will be shared in the cloudpickler's memo
        cloned_f, cloned_g = loads(dumps([f, g]))

        del A_SHARED_GLOBAL

        UNPICKLED_SHARED_GLOBAL = cloned_f()
        assert UNPICKLED_SHARED_GLOBAL == 2, UNPICKLED_SHARED_GLOBAL
        assert cloned_g() == UNPICKLED_SHARED_GLOBAL, cloned_g()
        """
        assert_run_python_script(textwrap.dedent(code_template))

    def test_global_variable_sharing_different_cloudpickler(self):
        code_template = """
        from cloudpickle import loads, dumps
        A_SHARED_GLOBAL = 1

        def f():
            global A_SHARED_GLOBAL
            A_SHARED_GLOBAL += 1
            return A_SHARED_GLOBAL

        def g():
            global A_SHARED_GLOBAL
            return A_SHARED_GLOBAL


        # here, the two functions do not share the global variables, as they
        # are picked in different pickle strings, so the memo is not use to
        # pickle the globals dictionnary
        cloned_f = loads(dumps(f))
        cloned_g = loads(dumps(g))

        del A_SHARED_GLOBAL

        UNPICKLED_SHARED_GLOBAL = cloned_f()
        assert UNPICKLED_SHARED_GLOBAL == 2, UNPICKLED_SHARED_GLOBAL
        # this fails, so the global variable is not shared
        assert cloned_g() == UNPICKLED_SHARED_GLOBAL, cloned_g()
        """
        assert_run_python_script(textwrap.dedent(code_template))

the first one pass, the second one fails.

If modifying the code by removing globals_ref:

@@ -260,7 +260,7 @@ class CloudPickler(Pickler):
         # set of modules to unpickle
         self.modules = set()
         # map ids to dictionary. used to ensure that functions can share global env
-        self.globals_ref = {}
+        # self.globals_ref = {}
 
     def dump(self, obj):
         self.inject_addons()
@@ -632,8 +632,9 @@ class CloudPickler(Pickler):
         # save the dict
         dct = func.__dict__
 
-        base_globals = self.globals_ref.get(id(func.__globals__), {})
-        self.globals_ref[id(func.__globals__)] = base_globals
+        # base_globals = self.globals_ref.get(id(func.__globals__), {})
+        # self.globals_ref[id(func.__globals__)] = base_globals
+        base_globals = {}
 
         return (code, f_globals, defaults, closure, dct, base_globals)

the two tests fail.

After the series of changes (i.e on master)

The situation has changed: globals_ref acts as a proxy before looking at a the function's __module__. There are edge cases where the presence or the absence of globals_ref therefore changes the globals handling behavior, but only if the function's __module__ attribute get mutated. But this looks like undefined behaviour anyway to me.

I think one of the key takeaways of my endeavours is that globals_ref's influence is limited to a Cloudpickler, i.e one call of cloudpickle.dumps. I am not sure we want this.

@TheButlah
Copy link

Perhaps this is related to my issues at #230?

@pierreglaser
Copy link
Member Author

pierreglaser commented Jan 17, 2019

@TheButlah so far, all the code I posted here was ran on a Linux machine, so I cannot see a clear relation between this and #230. I did not investigate your issue though, and I am sadly not familiar with windows specificities.

@ogrisel
Copy link
Contributor

ogrisel commented Feb 7, 2019

I believe we actually want to preserve this. This is now being correctly tested as part of: #240.

Let's close this issue.

@ogrisel ogrisel closed this Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants