[pytest] Test the running status of Monit service#2309
wangxin merged 11 commits into sonic-net:master
Signed-off-by: Yong Zhao <[email protected]>
This is a good basic, simple test to ensure Monit is running. I think we also should consider adding at least one more test in the future to check the output of |
'dead' state if it is stopped. Signed-off-by: Yong Zhao <[email protected]>
Yes, I will do this in a follow-up PR.
is not running. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
…t running. Signed-off-by: Yong Zhao <[email protected]>
…is not running. Signed-off-by: Yong Zhao <[email protected]>
    @summary: Test the running status of Monit service by analyzing the command
              output of "sudo systemctl status monit.service | grep Active".
    """
    monit_service_status_info = duthost.shell("sudo systemctl status monit.service | grep Active")
We should just run sudo monit status and then check whether the return value is zero.
Yes. Checking that the return value of sudo monit status is zero ensures not only that Monit is running, but also that it is configured correctly. I was thinking about adding separate tests for the config, but this makes for a better one-step liveness/basic health check.
Great suggestion! Fixed.
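To illustrate why the exit-code check is the stronger of the two, here is a minimal sketch that contrasts them. The helper names and the sample "Active:" line are hypothetical and exist only for this example; the exit-code semantics are the ones described in this conversation.

```python
def systemd_reports_running(active_line):
    """Return True if a systemctl 'Active:' line reports 'active (running)'."""
    return "active (running)" in active_line


def monit_reports_healthy(exit_code):
    """Return True only when `sudo monit status` exited with 0."""
    return exit_code == 0


# Hypothetical sample: the Monit process is up, but the loaded Monit
# configuration is broken, so `sudo monit status` exits non-zero.
active_line = "   Active: active (running)"
monit_exit_code = 1

print(systemd_reports_running(active_line))    # systemd's view of the service
print(monit_reports_healthy(monit_exit_code))  # Monit's own view of its health
```

In this scenario the grep-based check would pass even though Monit is misconfigured, which is the gap the exit-code check closes.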
Signed-off-by: Yong Zhao <[email protected]>
determine whether Monit is running or not. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
    if exit_code == 0:
        logger.info("Monit service is running.")
    else:
        pytest.fail("Monit service is not running: '{}'".format(status_line))
        pytest.fail("Monit service is not running.")
A return code of 0 from sudo monit status means Monit is running and healthy. A non-zero return code does not necessarily mean Monit is not running. It could be running with a bad config.
I think it would be better to use the wrapper. pytest_fail?
There is no pytest_fail wrapper. I believe you're thinking of the pytest_assert wrapper. pytest.fail is fine, or he could use pytest_assert(exit_code == 0, <message>)
I don't think we gain much from logging that Monit is running. I think this if/else should be replaced with:
pytest_assert(exit_code == 0, "Monit is either not running or not configured correctly")
A return code of 0 from sudo monit status means Monit is running and healthy. A non-zero return code does not necessarily mean Monit is not running. It could be running with a bad config.
@jleveque You mentioned that if the return code is non-zero, Monit could be running with a bad config. Here, "bad config" means that someone modified the Monit configuration files and introduced syntax errors or other errors, but did not run sudo systemctl restart monit.service. In that case Monit is still running, yet the return code of sudo monit status will be non-zero, right?
If Monit is running and the currently loaded Monit configuration contains syntax errors or similar (like duplicate program names, as we encountered previously), sudo monit status will return a non-zero value. Thus a non-zero exit code could mean that Monit is not running or that it's not configured correctly.
That's what I thought and tested.
]


def test_monit_service_status(duthost):
Suggest shortening the name of this function and the file to test_monit_status
    @summary: Test the running status of Monit service by analyzing the command
              output of "sudo systemctl status monit.service | grep Active".
    """
    monit_service_status_info = duthost.shell("sudo monit status", module_ignore_errors=True)
Suggest renaming var to monit_status_result
    @summary: Test the running status of Monit service by analyzing the command
              output of "sudo systemctl status monit.service | grep Active".
This summary is now out of date and references the wrong command. Suggest removing the command from it altogether, as the command is two lines below and the function is simple.
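Taken together, the review suggestions in this conversation (a docstring without the stale command, the monit_status_result variable name, and a single pytest_assert on the exit code) might look roughly like the sketch below. It is not the merged code. FakeDutHost and the local pytest_assert are hypothetical stand-ins so that the example runs on its own, and the "rc" key of the shell result is an assumption about the shape of the result. The function is named check_monit_status here only so that it can be called directly with the fake host; in the real test file it would follow the suggested test_monit_status name and receive the duthost fixture.

```python
class FakeDutHost:
    """Hypothetical stand-in for the duthost fixture used by the real test."""

    def __init__(self, rc):
        self.rc = rc

    def shell(self, cmd, module_ignore_errors=False):
        # Assumed result shape: a dict carrying the command's exit code under "rc".
        return {"cmd": cmd, "rc": self.rc, "stdout": ""}


def pytest_assert(condition, message=None):
    """Local stand-in for the pytest_assert wrapper mentioned in the review."""
    if not condition:
        raise AssertionError(message)


def check_monit_status(duthost):
    """
    @summary: Test that Monit is running and configured correctly.
    """
    monit_status_result = duthost.shell("sudo monit status", module_ignore_errors=True)
    exit_code = monit_status_result["rc"]
    pytest_assert(exit_code == 0, "Monit is either not running or not configured correctly")


# A healthy device passes silently; a non-zero exit code raises AssertionError.
check_monit_status(FakeDutHost(rc=0))
```

Passing module_ignore_errors=True, as in the diff above, lets the test inspect a non-zero exit code itself and report the more precise "not running or not configured correctly" message.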
function. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
Retest this please
Approach
What is the motivation for this PR?
This PR adds a test that checks whether the Monit service is running correctly on physical testbeds/DUTs.
How did you do it?
How did you verify/test it?
I tested this PR on a virtual testbed.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation