Fix LogAnalyzer to force load regular expressions from common files#5193
Fix LogAnalyzer to force load regular expressions from common files#5193wangxin merged 1 commit intosonic-net:masterfrom
Conversation
There was a problem hiding this comment.
It is better to run "load_common_config" in lines 55-56 while creating analyzer?
And line 33-35 should be removed.
There was a problem hiding this comment.
Hi, @wangxin If add "load_common_config" into line 55-56 - self.match_regex will be empty after line 57.
"load_common_config" - should be called after "parallel_run"
There was a problem hiding this comment.
@ppikh I tested below patch, it works from my side. Can you try that?
diff --git a/tests/common/plugins/loganalyzer/__init__.py b/tests/common/plugins/loganalyzer/__init__.py
index 32015d16f7..9f80fb3901 100644
--- a/tests/common/plugins/loganalyzer/__init__.py
+++ b/tests/common/plugins/loganalyzer/__init__.py
@@ -30,18 +30,13 @@ def analyzer_add_marker(analyzers, node=None, results=None):
loganalyzer = analyzers[node.hostname]
logging.info("Add start marker into DUT syslog for host {}".format(node.hostname))
marker = loganalyzer.init()
- logging.info("Load config and analyze log for host {}".format(node.hostname))
- # Read existed common regular expressions located with legacy loganalyzer module
- loganalyzer.load_common_config()
results[node.hostname] = marker
-
@reset_ansible_local_tmp
def analyze_logs(analyzers, markers, node=None, results=None):
dut_analyzer = analyzers[node.hostname]
dut_analyzer.analyze(markers[node.hostname])
-
@pytest.fixture(autouse=True)
def loganalyzer(duthosts, request):
if request.config.getoption("--disable_loganalyzer") or "disable_loganalyzer" in request.keywords:
@@ -53,7 +48,10 @@ def loganalyzer(duthosts, request):
analyzers = {}
parallel_run(analyzer_logrotate, [], {}, duthosts, timeout=120)
for duthost in duthosts:
- analyzers[duthost.hostname] = LogAnalyzer(ansible_host=duthost, marker_prefix=request.node.name)
+ analyzer = LogAnalyzer(ansible_host=duthost, marker_prefix=request.node.name)
+ analyzer.load_common_config()
+ analyzers[duthost.hostname] = analyzer
+
markers = parallel_run(analyzer_add_marker, [analyzers], {}, duthosts, timeout=120)
yield analyzers
@@ -63,4 +61,3 @@ def loganalyzer(duthosts, request):
return
logging.info("Starting to analyse on all DUTs")
parallel_run(analyze_logs, [analyzers, markers], {}, duthosts, timeout=120)
There was a problem hiding this comment.
Hi @wangxin
I did fixes, it works.
Thanks
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Azure.sonic-mgmt |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Force loading regular expression from the common files
Signed-off-by: Petro Pikh <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Hi @wangxin @roysr-nv @liat-grozovik Can someone merge this commit? |
|
Because loganalyzer was broken for a long time, many error logs sneaked in. We should have this PR merged anyway because PR test failures were caused by error logs sneaked in recently. Extra efforts are needed to address those error logs. We should not blame loganalyzer for the surfaced new test errors after this PR is merged. |
|
@ppikh, seems this PR break some UT, here is one of my test PR: https://github.com/Azure/sonic-buildimage/pull/10329/checks?check_run_id=5656011837 Issue seems related with change in tests/common/plugins/loganalyzer/init.py |
|
Hi @liuh-80 |
|
Hi I met similiar issue. https://dev.azure.com/mssonic/build/_build/results?buildId=83555&view=logs&j=7ecf62ce-878c-5860-615d-9d0f5ab291f1&t=50051680-b4c8-5a0e-b7f9-3520ce72ee4a&l=1860
|
Thanks, I check the UT source code and found the UT failed by design, https://github.com/Azure/sonic-mgmt/blob/2333bbcc25ef419580e744fe2c5d79463f5c9afc/tests/pc/test_po_update.py Seems the UT not failed before, because the log analyzer not catch the message before this change. I'm not author of this UT and there seems other UT also failed. I will wait for the UT fixed. |
I did some research and found that the GCU will first sort the json_change to make sure it follows yang model rules. When some sorting doesn't comfort to the yang model rules, it will stop current sorting and just try other sorting method. |
|
Hi @liat-grozovik , for GCU test it is by design. As it will discard incorrect ordering and finally provide a correct ordering for JsonPatch. It is the previous incorrect ordering fail the log analyzer check. However, it doesn't fail because it will find a correct ordering and apply it. |
…5193) What is the motivation for this PR? Looks like commit #3235 caused issue that LA has empty list of match errors expressions. How did you do it? Fix LogAnalyzer to force load regular expressions from common files How did you verify/test it? Executed tests which using LA, checked that LA failed in case when we have errors in logs Signed-off-by: Petro Pikh <[email protected]>
… files (#5193)" (#5433) This reverts commit 03cccf7. Reverts #5193 After this fix was merged, PR test keeps failing because of errors in syslog. We spent some effort trying to temporarily ignore the errors. However, the list seems endless. Please refer to: [loganalyzer]add log patterns to the common ignore #5411 Add loganalyzer ignore regex for GCU #5391 We need a way to temporarily unblock PR testing. Let's revert this fix for now. Then I'll submit another PR to fix the loganalyzer issue together with a complete ignore list.
… files (#5193)" (#5433) This reverts commit 03cccf7. Reverts #5193 After this fix was merged, PR test keeps failing because of errors in syslog. We spent some effort trying to temporarily ignore the errors. However, the list seems endless. Please refer to: [loganalyzer]add log patterns to the common ignore #5411 Add loganalyzer ignore regex for GCU #5391 We need a way to temporarily unblock PR testing. Let's revert this fix for now. Then I'll submit another PR to fix the loganalyzer issue together with a complete ignore list.
Loganalyzer was broken in PR sonic-net#3235. The issue is that common config was loaded in subprocess for adding marks to syslog. After the subprocess exited, the common config is lost. PR sonic-net#5193 tried to fix this issue. However, because of many new error logs sneaked in when log analyzer was not working, PR testing started to fail by these error logs after PR sonic-net#5193 was merged. PR sonic-net#5191 and sonic-net#5411 tried to workaround the PR testing failure to unblock PR testing. PR sonic-net#5191 is to address the GCU related error logs and was merged. PR sonic-net#5411 tried to add other error logs to the common ignore list. But the effort took too long because the ignore list seemed endless. To unblock PR testing as soon as possible, the original fix sonic-net#5193 was reverted in sonic-net#5433. This PR tries to complete the work left over from sonic-net#5411 and sonic-net#5433. Changes: 1. Fix the log analyzer common config not loaded issue. 2. Temporarily add error logs to the common ignore list. 3. Improve the logging of log analyzer and parallel_run 4. PR testing t0_part2 takes much more time than t0_part1 after the GCU test scripts are added. This change re-balanced t0 part1&part2 testing by moving some of the tests from part2 to part1. 5. Sorted the PR testing scripts in alphabetic order. Signed-off-by: Xin Wang <[email protected]>
What is the motivation for this PR? Loganalyzer was broken in PR #3235. The issue is that common config was loaded in subprocess for adding marks to syslog. After the subprocess exited, the common config is lost. PR #5193 tried to fix this issue. However, because of many new error logs sneaked in when log analyzer was not working, PR testing started to fail by these error logs after PR #5193 was merged. PR #5391 and #5411 tried to work around the PR testing failure to unblock PR testing. PR #5391 is to address the GCU related error logs and was merged. PR #5411 tried to add other error logs to the common ignore list. But the effort took too long because the ignore list seemed endless. To unblock PR testing as soon as possible, the original fix #5193 was reverted in #5433. This PR tries to complete the work left over from #5411 and #5433. How did you do it? Changes: * Fix the log analyzer common config not loaded issue. * Temporarily add error logs to the common ignore list. * Improve the logging of log analyzer and parallel_run * PR testing t0_part2 takes much more time than t0_part1 after the GCU test scripts are added. This change re-balanced t0 part1&part2 testing by moving some of the tests from part2 to part1. * Sorted the PR testing scripts in alphabetic order. How did you verify/test it? Tested run a few test scripts with log analyzer enabled on KVM testbed. Signed-off-by: Xin Wang <[email protected]>
What is the motivation for this PR? Loganalyzer was broken in PR #3235. The issue is that common config was loaded in subprocess for adding marks to syslog. After the subprocess exited, the common config is lost. PR #5193 tried to fix this issue. However, because of many new error logs sneaked in when log analyzer was not working, PR testing started to fail by these error logs after PR #5193 was merged. PR #5391 and #5411 tried to work around the PR testing failure to unblock PR testing. PR #5391 is to address the GCU related error logs and was merged. PR #5411 tried to add other error logs to the common ignore list. But the effort took too long because the ignore list seemed endless. To unblock PR testing as soon as possible, the original fix #5193 was reverted in #5433. This PR tries to complete the work left over from #5411 and #5433. How did you do it? Changes: * Fix the log analyzer common config not loaded issue. * Temporarily add error logs to the common ignore list. * Improve the logging of log analyzer and parallel_run * PR testing t0_part2 takes much more time than t0_part1 after the GCU test scripts are added. This change re-balanced t0 part1&part2 testing by moving some of the tests from part2 to part1. * Sorted the PR testing scripts in alphabetic order. How did you verify/test it? Tested run a few test scripts with log analyzer enabled on KVM testbed. Signed-off-by: Xin Wang <[email protected]>
…onic-net#5193) What is the motivation for this PR? Looks like commit sonic-net#3235 caused issue that LA has empty list of match errors expressions. How did you do it? Fix LogAnalyzer to force load regular expressions from common files How did you verify/test it? Executed tests which using LA, checked that LA failed in case when we have errors in logs Signed-off-by: Petro Pikh <[email protected]>
… files (sonic-net#5193)" (sonic-net#5433) This reverts commit 03cccf7. Reverts sonic-net#5193 After this fix was merged, PR test keeps failing because of errors in syslog. We spent some effort trying to temporarily ignore the errors. However, the list seems endless. Please refer to: [loganalyzer]add log patterns to the common ignore sonic-net#5411 Add loganalyzer ignore regex for GCU sonic-net#5391 We need a way to temporarily unblock PR testing. Let's revert this fix for now. Then I'll submit another PR to fix the loganalyzer issue together with a complete ignore list.
Signed-off-by: Petro Pikh [email protected]
Description of PR
Fix LogAnalyzer to force load regular expressions from common files
Looks like commit #3235 caused issue that LA has empty list of match errors expressions.
Summary: Fix LogAnalyzer to force load regular expressions from common files
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
LogAnalyzer did not work, this PR is fix for LA
How did you do it?
See code
How did you verify/test it?
Executed tests which using LA, checked that LA failed in case when we have errors in logs
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation