Fix #10842 - When using pyenergyplus to run (via run_energyplus), PythonPlugin initializations errors lead to hang #10844
Conversation
There was a problem hiding this comment.
Review for the Test part.
On 0c73d29, before fix (you can also use the MCVE on https://github.com/NREL/EnergyPlusDevSupport/commit/4a6027ab293f4d697b60a71e106aa7a83e4a89d3 )
$ ctest -R 'TestRuntimeReleasesTheGIL' -VV
22: Test command: /home/julien/.pyenv/versions/3.12.2/bin/python3.12 "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" "-d" "/home/julien/Software/Others/EnergyPlus-build/tst/api/TestRuntimeReleasesTheGIL" "-w" "/home/julien/Software/Others/EnergyPlus/weather/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.epw" "-D" "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf"
22: Working Directory: /home/julien/Software/Others/EnergyPlus-build/src/EnergyPlus
22: Environment variables:
22: PYTHONPATH=/home/julien/Software/Others/EnergyPlus-build/Products
22: Test timeout computed to be: 10
22: EnergyPlus Starting
22: EnergyPlus, Version 25.1.0-a119feb883, YMD=2024.12.03 12:50
22: **FATAL:Python import error causes program termination
22: EnergyPlus Run Time=00hr 00min 2.65sec
22: Program terminated: EnergyPlus Terminated--Error(s) Detected.
1/1 Test #22: API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL ...***Timeout 10.06 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 10.16 sec
The following tests FAILED:
22 - API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL (Timeout)
Ok: gcc on decent ci also timeouts: https://raw.githubusercontent.com/Myoldmopar/EnergyPlusBuildResults/eb8127d16465353e3137023c1a6d27a405cfe69d/_posts/10842-PH/2024-12-03-EnergyPlus-0c73d2989ba393ab72d8ead77c2651f4ad3e941f-x86_64-Linux-Ubuntu-24.04-gcc-13.2-results.html
| # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
| # POSSIBILITY OF SUCH DAMAGE. | ||
|
|
||
| from this_does_not_exist.hello_world import garbage |
There was a problem hiding this comment.
Broken Python Plugin import
| PythonPlugin:Instance, | ||
| Heating Setpoint Override, !- Name | ||
| Yes, !- Run During Warmup Days | ||
| mcve_gil, !- Python Module Name | ||
| HeatingSetPoint; !- Plugin Class Name |
There was a problem hiding this comment.
MCVE file that has nothing but this plugin, the required objects (Building, GlobalGeometry Rules) and design days.
| api = EnergyPlusAPI() | ||
| state = api.state_manager.new_state() | ||
| exit_code = api.runtime.run_energyplus(state, sys.argv[1:]) | ||
|
|
||
| if exit_code == 0: | ||
| print("Expected EnergyPlus to return an error to the broken python plugin. Task Succeeded Unsuccessfully!") | ||
| sys.exit(1) |
There was a problem hiding this comment.
We run that file idf file with python plugin from API.
In the current state, the exit_code is never returned anyways because it just hangs. But testing that it did return an error is what we eventually want
| set(TEST_DIR "${PROJECT_BINARY_DIR}/tst/api/TestRuntimeReleasesTheGIL") | ||
| file(MAKE_DIRECTORY ${TEST_DIR}) | ||
| add_test(NAME "API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL" | ||
| COMMAND "${Python_EXECUTABLE}" "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" -d "${TEST_DIR}" -w "${EPW_FILE}" -D "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf" | ||
| ) | ||
| set_tests_properties("API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL" | ||
| PROPERTIES | ||
| ENVIRONMENT PYTHONPATH=${DIR_WITH_PY_ENERGYPLUS} | ||
| TIMEOUT 10 # This used to timeout! and we expect it NOT to | ||
| ) |
There was a problem hiding this comment.
Register the ctest. NOTE THE TIMEOUT 10 (seconds), we do expect it to timeout to demonstrate #10842
src/EnergyPlus/PluginManager.cc
Outdated
| // GilGrabber is an RAII helper that will ensure we release the GIL (including if we end up throwing) | ||
| struct GilGrabber | ||
| { | ||
| GilGrabber() | ||
| { | ||
| gil = PyGILState_Ensure(); | ||
| } | ||
| ~GilGrabber() | ||
| { | ||
| PyGILState_Release(gil); | ||
| } | ||
|
|
||
| PyGILState_STATE gil; | ||
| }; | ||
|
|
||
| // Take control of the global interpreter lock | ||
| GilGrabber gil_grabber; |
There was a problem hiding this comment.
Ok, so the issue was that we were NOT releasing the Python Global Interpreter Lock after acquiring it. (I tested a C version of the same test, which was correctly throwing and exiting as expected, so I figured it was the GIL)
Initially in 552f5dd I put a try / catch around plugin.setup()
for (auto &plugin : state.dataPluginManager->plugins) {
+ try {
plugin.setup(state);
+ } catch (const FatalError &e) {
+ PyGILState_Release(gil);
+ throw e;
+ }
}But I realized it was probably a better idea to just use RAII, so that there's no chance we forget to release the Python GIL
| // Release the global interpreter lock is done via RAII | ||
|
|
There was a problem hiding this comment.
No need to release the GIL manually anymore in the case it works fine either
|
@jmarrec This is my test suite of osws that fail in each place that energyplus contacts a pythonplugin. Hopefully this can be useful in helping diagnose this issue. |
Myoldmopar
left a comment
There was a problem hiding this comment.
This is really nice. A sleek fix, RAII working great, excellent test, all good.
|
And confirmed it's happy with develop pulled in. Merging, thanks @jmarrec |
Pull request overview
Pull Request Author
Add to this list or remove from it as applicable. This is a simple templated set of guidelines.
Reviewer
This will not be exhaustively relevant to every PR.