-
Notifications
You must be signed in to change notification settings - Fork 184
Handle dynamic modules globals modification #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
f022c41
2a0c3cb
780dee7
7561346
faab1f5
c82008b
21f05db
f1f4c31
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -78,6 +78,23 @@ | |
| PY3 = True | ||
|
|
||
|
|
||
| # caches dynamic modules that are not referenced in sys.modules | ||
| _dynamic_modules_globals = weakref.WeakValueDictionary() | ||
|
|
||
| # _dynamic_modules_globals will store _base_globals variables. | ||
| # Or, _base_globals end up being dicts, and built-in dict instances | ||
| # cannot be weakly referenced. Therefore, we create a mirror of the | ||
| # dict type, and at the mutation of _base globals from string to dict, | ||
| # we use the DynamicDict constructor instead of the builtin dict one. | ||
|
|
||
|
|
||
| class DynamicDict(dict): | ||
| """class used to instanciate base_globals, in order to be weakly refrenced | ||
| into _dynamic_modules_globals | ||
| """ | ||
| pass | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why don't you keep a reference to the module object itself as I suggested in #201 (comment) ? That would make the code much more homogeneous, no?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this particular case, a module was never properly created in the previous code, simply a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No my point would be to register the dynamic modules themselves in
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the case where we pickle only a function from a dynamic module, and not the dynamic module itself, it is not possible, because only By using a dictionary instead of a module, things worked because So in this particular use case, it seems hard to keep a reference to the dynamic module where an unpickled function lived.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, that's a good point. |
||
|
|
||
|
|
||
| def _make_cell_set_template_code(): | ||
| """Get the Python compiler to emit LOAD_FAST(arg); STORE_DEREF | ||
|
|
||
|
|
@@ -1090,12 +1107,17 @@ def _make_skel_func(code, cell_count, base_globals=None): | |
| if base_globals is None: | ||
| base_globals = {} | ||
| elif isinstance(base_globals, str): | ||
| base_globals_name = base_globals | ||
| if sys.modules.get(base_globals, None) is not None: | ||
| # this checks if we can import the previous environment the object | ||
| # This checks if we can import the previous environment the object | ||
| # lived in | ||
| base_globals = vars(sys.modules[base_globals]) | ||
| else: | ||
| base_globals = {} | ||
| base_globals = _dynamic_modules_globals.get( | ||
| base_globals, DynamicDict()) | ||
|
||
| # base_globals is not a string anymore, using base_globals_name | ||
| # instead | ||
|
||
| _dynamic_modules_globals[base_globals_name] = base_globals | ||
|
|
||
| base_globals['__builtins__'] = __builtins__ | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ | |
| import collections | ||
| import base64 | ||
| import functools | ||
| import gc | ||
| import imp | ||
| from io import BytesIO | ||
| import itertools | ||
|
|
@@ -42,7 +43,8 @@ | |
| tornado = None | ||
|
|
||
| import cloudpickle | ||
| from cloudpickle.cloudpickle import _find_module, _make_empty_cell, cell_set | ||
| from cloudpickle.cloudpickle import _find_module, _make_empty_cell, cell_set,\ | ||
| _dynamic_modules_globals | ||
|
||
|
|
||
| from .testutils import subprocess_pickle_echo | ||
| from .testutils import assert_run_python_script | ||
|
|
@@ -440,6 +442,56 @@ def method(self, x): | |
| mod1, mod2 = pickle_depickle([mod, mod]) | ||
| self.assertEqual(id(mod1), id(mod2)) | ||
|
|
||
| def test_dynamic_modules_globals(self): | ||
| # _dynamic_modules_globals is a WeakValueDictionary, so if this dynamic | ||
| # module has no other reference than in this # dict (whether on the | ||
|
||
| # child process or the parent process), this module will be garbage | ||
| # collected. | ||
|
||
|
|
||
| # We first create a module | ||
| mod = imp.new_module('mod') | ||
| code = ''' | ||
| x = 1 | ||
| def func(): | ||
| return | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 on this one
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or maybe
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is. I can merge the two test, but unsure if it is better. |
||
| ''' | ||
| exec(textwrap.dedent(code), mod.__dict__) | ||
|
|
||
| pickled_module_path = 'mod.pkl' | ||
| print(mod.func.__module__) | ||
|
||
|
|
||
| with open(pickled_module_path, 'wb') as f: | ||
| cloudpickle.dump(mod.func, f) | ||
|
|
||
| # _dynamic_modules_globals will have mod appended to its values | ||
| # in the child process only | ||
| child_process_script = ''' | ||
| import pickle | ||
| from cloudpickle.cloudpickle import _dynamic_modules_globals | ||
| import gc | ||
| with open("{pickled_module_path}", 'rb') as f: | ||
| func = pickle.load(f) | ||
|
|
||
| # A dictionnary storing the globals of the newly unpickled function | ||
| # should have been created | ||
| assert list(_dynamic_modules_globals.keys())==['mod'] | ||
|
||
|
|
||
| del func | ||
| gc.collect() | ||
|
|
||
| # There is no reference to the globals of func since func has been | ||
| # deleted and _dynamic_modules_globals is a WeakValueDictionary, | ||
| # so _dynamic_modules_globals should now be empty | ||
| assert list(_dynamic_modules_globals.keys())==[] | ||
|
||
| ''' | ||
| child_process_script = child_process_script.format( | ||
| pickled_module_path=pickled_module_path) | ||
|
|
||
| assert_run_python_script(textwrap.dedent(child_process_script)) | ||
|
|
||
| # We remove the other references to the object and trigger a | ||
| # gc event | ||
|
|
||
| def test_load_dynamic_module_in_grandchild_process(self): | ||
| # Make sure that when loaded, a dynamic module preserves its dynamic | ||
| # property. Otherwise, this will lead to an ImportError if pickled in | ||
|
|
@@ -993,6 +1045,95 @@ def f1(): | |
| finally: | ||
| _TEST_GLOBAL_VARIABLE = orig_value | ||
|
|
||
| def test_function_from_dynamic_module_with_globals_modifications(self): | ||
| """ | ||
| this test verifies that: | ||
| - any modification in the global variables of a dynamic | ||
| module living in a child process won't get overridden | ||
| when new objects are unpickled in the child's interpreter | ||
|
|
||
| - vice versa, e.g a modification in the parent process does not | ||
| override the value of the variables in the child process | ||
|
|
||
| The two cases are equivalent, and here, the second case is tested. | ||
| """ | ||
|
|
||
| # first, we create a dynamic module in the parent process | ||
| mod = imp.new_module('mod') | ||
| code = ''' | ||
| x = 1 | ||
| def func_that_relies_on_dynamic_module(v=None): | ||
| global x | ||
| if v is not None: | ||
| x = v | ||
| return x | ||
| ''' | ||
| exec(textwrap.dedent(code), mod.__dict__) | ||
|
|
||
| try: | ||
| # simple sanity check on the function's output | ||
| assert mod.func_that_relies_on_dynamic_module() == 1 | ||
|
|
||
| # the function of mod is pickled two times, with two different | ||
| # values for the global variable x. | ||
|
|
||
| # a child process that sequentially unpickles the | ||
| # two functions is then launched | ||
|
|
||
| # once the _first_ function gets unpickled, mod is created and | ||
| # tracked in the child environment. Whatever the global variable | ||
| # x's value in the second function, it will be overriden by the | ||
| # initial value of x in the child environment | ||
|
|
||
| with open('function_with_initial_globals.pk', 'wb') as fid: | ||
| cloudpickle.dump(mod.func_that_relies_on_dynamic_module, fid) | ||
|
|
||
| # change the mod's global variable x | ||
| mod.x = 2 | ||
|
|
||
| # at this point, mod.func_that_relies_on_dynamic_module() | ||
| # returns 2 | ||
| assert mod.func_that_relies_on_dynamic_module() == 2 | ||
| with open('function_with_modified_globals.pk', 'wb') as fid: | ||
| cloudpickle.dump(mod.func_that_relies_on_dynamic_module, fid) | ||
|
|
||
| child_process_code = """ | ||
| import pickle | ||
|
|
||
| with open('function_with_initial_globals.pk','rb') as fid: | ||
| function_with_initial_globals = pickle.load(fid) | ||
|
|
||
| # at this point, a module called 'mod' should exist in | ||
| # _dynamic_modules_globals. further function loading | ||
| # will use the globals living in mod | ||
|
|
||
| assert function_with_initial_globals() == 1 | ||
|
|
||
| # load a function with initial global variable x set to 2 | ||
| with open('function_with_modified_globals.pk','rb') as fid: | ||
| function_with_modified_globals = pickle.load(fid) | ||
|
|
||
| # assert the initial global got overridden by | ||
| # _dynamic_modules_globals | ||
| assert function_with_modified_globals()==1 | ||
|
|
||
| # both function's global x should point to the | ||
| # same variable. | ||
| # calling function_with_initial_globals('test_value') | ||
| # will change this variable: function_with_modified_globals() | ||
| # should return the changed variable | ||
| assert function_with_initial_globals('test_value') == 'test_value' | ||
| assert function_with_modified_globals() == 'test_value' | ||
| """ | ||
|
|
||
| # finally, we execute the code | ||
| assert_run_python_script(textwrap.dedent(child_process_code)) | ||
|
|
||
| finally: | ||
| # remove the created files | ||
| os.unlink('function_with_initial_globals.pk') | ||
| os.unlink('function_with_modified_globals.pk') | ||
|
|
||
| @pytest.mark.skipif(sys.version_info >= (3, 0), | ||
| reason="hardcoded pickle bytes for 2.7") | ||
| def test_function_pickle_compat_0_4_0(self): | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: please follow PEP 257 for docstring: oneliner short summary, then optional blank line + paragraph with more details that could not fit in the summary. Following PEP 257 is helpful to make code editors documentation tooltips work better (as well as sphinx API doc for public function / classes).
Also I think we should make the this class private explicitly with a leading underscore.