libpython: Avoid race condition when reading region in use_temp_region() #638
When g.region is called without the -u flag, it always adjusts the computational region and writes it back. When a parallel process reads the region at the same time, it hits an empty (or, rarely, corrupted) file. In the provided test, this happens when reading the WIND file. At this point, the results from the test must be inspected manually.
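The failure mode can be reproduced without GRASS. The following is a minimal sketch, assuming the writer truncates the file before writing the new content (as opening a file with mode "w" does in Python). The file content and the `demo` function are made up for illustration, and this is not a claim about how g.region writes the file internally.

```python
import os
import tempfile

# Hypothetical stand-in for the WIND file; the content is illustrative only.
CONTENT = "north: 100\nsouth: 0\n"


def read_file(path):
    """Read the whole file, as a parallel reader process would."""
    with open(path) as f:
        return f.read()


def demo():
    """Return what a reader sees during and after a rewrite of the file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "WIND")
        with open(path, "w") as f:
            f.write(CONTENT)

        # Rewriting starts by truncating the existing file.
        writer = open(path, "w")
        try:
            # A reader arriving at this moment gets an empty file.
            during = read_file(path)
            writer.write(CONTENT)
        finally:
            writer.close()

        # Once the writer is done, the content is complete again.
        after = read_file(path)
    return during, after


if __name__ == "__main__":
    during, after = demo()
    print("during rewrite: %r" % during)
    print("after rewrite: %r" % after)
```

A reader that only ever needs the current values is safe as long as nobody rewrites the file, which is why a save that does not write the region back avoids the problem.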
Return codes of background processes are not propagated, so something is needed to track the processes. The test uses maps created at the end of each Python script to do that. Parametrize the Bash script to make it reusable also for a potential nesting region correctness test. Also be more robust when the Python script is not available (zero is returned on some platforms).
I have completed the test. It checks whether the parallel scripts are running by looking at the raster maps they create, which adds further complexity and region access, although the resulting maps are checked only for existence. (Correct passing of the region is not the topic of this PR, but the scripts can be used to test that.) The test should not show up in the list of failing tests in CI, and the number of failing tests should be at least roughly the same as on master (39 on Ubuntu 18.04).
You can run it locally in one of these three ways:
A failing test (when I remove the -u fix from this patch) looks like this:
HuidaeCho left a comment
It looks good to me. Just a couple typos and a comment about g.region's behavior.
      """
      name = "tmp.%s.%d" % (os.path.basename(sys.argv[0]), os.getpid())
    - run_command("g.region", save=name, overwrite=True)
    + run_command("g.region", flags="u", save=name, overwrite=True)
I think we should fix g.region so that it updates WIND only when needed. For example, in this case, it just saves the current region (no changes to the region) then why rewrite WIND? It should be a read-only operation. That's for a new PR though.
I agree, but as a heads-up, the code there looks pretty intentional.
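For readers comparing the changed call with the command line, here is a rough sketch of what it corresponds to. The name pattern is taken from the diff above; `temp_region_name` and `save_region_argv` are hypothetical helpers written for this illustration and are not part of the GRASS Python library. The sketch only builds the argument list and does not run g.region.

```python
import os
import sys


def temp_region_name(script=None, pid=None):
    """Build the temporary region name using the pattern from the diff."""
    if script is None:
        script = sys.argv[0]
    if pid is None:
        pid = os.getpid()
    return "tmp.%s.%d" % (os.path.basename(script), pid)


def save_region_argv(name, update_current=False):
    """Hypothetical helper: argument list for saving the region as `name`.

    With update_current=False, -u is passed so that g.region does not
    write the current region back.
    """
    argv = ["g.region"]
    if not update_current:
        argv.append("-u")
    argv.extend(["save=%s" % name, "--overwrite"])
    return argv


if __name__ == "__main__":
    print(" ".join(save_region_argv(temp_region_name())))
```

The only difference between the old and the new call is the presence of `-u` in the resulting arguments.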
Given what is written in Trac 2230 ('g.region -p' writes new WIND file, causes race condition for parallel jobs), this is actually pretty clear. I'm merging. We can have the discussion about g.region elsewhere.
The test introduced in OSGeo#638 uses raster maps to determine whether each subprocess was executed successfully. However, in GitHub Actions this fails randomly, with a random number of rasters missing. This PR makes the test wait for a couple of seconds before counting the rasters in case the file system needs to catch up. If there is an actual error, waiting won't fix it, so the test will still do its job. If running and rerunning the tests here is successful, it suggests that this is a good fix, at least in the sense of accommodating the given environment.
An alternative fix for the test would be to replace Bash with Python, which would allow leaving out the raster creation check (which replaces a return code check here) and provide more control.
On the one hand, we want to make the test run; on the other, we want the actual code to be robust. But maybe writing 50 rasters in parallel in a small VM in GitHub Actions is not a use case we need to cover in code, and thus modifying the test is enough.
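The Python alternative mentioned above could look roughly like the following. This is a minimal sketch, not the test from this PR: `run_parallel` and `failed` are hypothetical names, and the child commands are placeholders that stand in for the actual test scripts.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start all commands at once and return their exit codes in order."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until the given process ends and returns its exit code.
    return [proc.wait() for proc in processes]


def failed(commands, return_codes):
    """Return the commands which ended with a non-zero exit code."""
    return [cmd for cmd, code in zip(commands, return_codes) if code != 0]


if __name__ == "__main__":
    # Placeholder commands; a real test would launch the test scripts here.
    commands = [
        [sys.executable, "-c", "raise SystemExit(0)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]
    codes = run_parallel(commands)
    print("failed:", failed(commands, codes))
```

Because each exit code is collected directly, there is no need to infer success from the existence of output rasters, and no delay is needed for the file system to catch up.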