[202311] improve qos log readability (#12526) #12899
Merged
yxieca merged 1 commit into sonic-net:202311 on May 20, 2024
Log message wrapper:
All messages are written to the PTF log by default; a flag can be set to also send a specific message to stderr on the PTF console. This avoids flooding the console and the "test summary" with messages, which makes failures easier to identify during triage. The PTF log still contains every message and can be checked when root-causing a failure.
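A minimal sketch of this idea, not the code from this PR: the helper name `qos_log`, its parameters, and the logger name "qos" are hypothetical and chosen only for illustration.

```python
import logging
import sys


def qos_log(message, to_stderr=False, logger=None, stream=None):
    """Hypothetical helper: always log the message, echo to the console only on request."""
    if logger is None:
        # Assumed logger name; it must be configured to accept INFO records.
        logger = logging.getLogger("qos")
    # Every message lands in the log, which keeps the full history for root-causing.
    logger.info(message)
    if to_stderr:
        # Only flagged messages reach the console, so the console stays short.
        print(message, file=stream if stream is not None else sys.stderr)
```

With a helper like this, routine step messages call `qos_log("...")`, and only the few lines that matter for triage pass `to_stderr=True`.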
CounterCollector Class
Provides a general interface for collecting, comparing, and displaying counters.
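As a rough illustration of a collect/compare/display interface, the sketch below uses made-up method names and a `reader` callable; the real CounterCollector in this PR may be structured differently.

```python
class CounterCollector:
    """Hypothetical sketch: snapshot counters per step and diff two steps."""

    def __init__(self, name, reader):
        self.name = name
        self.reader = reader    # assumed: callable returning {counter_name: int}
        self.snapshots = {}     # step label -> counter snapshot

    def collect(self, step):
        # Copy the values so later reads do not change an earlier snapshot.
        self.snapshots[step] = dict(self.reader())

    def compare(self, first, last):
        before, after = self.snapshots[first], self.snapshots[last]
        return {key: after[key] - before.get(key, 0) for key in after}

    def display(self, first, last):
        lines = ["{} ({} -> {})".format(self.name, first, last)]
        for key, delta in sorted(self.compare(first, last).items()):
            lines.append("  {:<12}{:+d}".format(key, delta))
        return "\n".join(lines)
```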
Diagnostic Counter Wrapper
So far, 8 kinds of counters can be read:
port_counter, queue_counter_counter, queue_share_wm_counter, pg_share_wm_counter, pg_headroom_wm_counter, pg_counter_couner, pg_drop_counter and ptf_tx_rx_counter
Although CounterCollector provides a common API to collect, compare, and display these counters, using CounterCollector directly would still clutter the test case code, since each counter needs at least one line of code. If more counter types are queried later, even more code unrelated to the test steps would be exposed in the test case.
Therefore, the diag counter wrapper bundles all counter activities, so that the test case code reflects the test steps and logic rather than diagnostic code.
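The wrapper idea can be sketched as follows. The class name `DiagCounters`, its methods, and the reader callables are assumptions made for this example and are not taken from the PR.

```python
class DiagCounters:
    """Hypothetical wrapper: one call per test step snapshots every counter type."""

    def __init__(self, readers):
        self.readers = readers  # assumed: {counter_type: callable returning {name: int}}
        self.steps = {}         # step label -> {counter_type: snapshot}

    def capture(self, step):
        # A single line in the test case replaces one line per counter type.
        self.steps[step] = {kind: dict(read()) for kind, read in self.readers.items()}

    def diff(self, first, last):
        result = {}
        for kind, after in self.steps[last].items():
            before = self.steps[first][kind]
            result[kind] = {k: v - before.get(k, 0) for k, v in after.items()}
        return result
```

Adding a new counter type then means registering one more reader, while the test case keeps calling `capture("step name")` between its steps.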
assert wrapper
By default, the counter difference between the first and last step of the case is displayed on both normal and abnormal exits. However, Python's built-in assert statement makes it difficult to show the counter diff, so an assert wrapper is implemented to show the counter diff when an assertion exception occurs.
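One possible shape for such a wrapper is sketched below. The name `qos_assert` and the `show_diff` callback are hypothetical; the PR's actual wrapper may take different arguments.

```python
def qos_assert(condition, message, show_diff):
    """Hypothetical assert wrapper: report the counter diff before failing."""
    if not condition:
        # A bare `assert` would raise immediately; calling the callback first
        # gets the diagnostic output printed before the exception propagates.
        show_diff()
        raise AssertionError(message)
```

A test case would call `qos_assert(received == expected, "unexpected drop", print_counter_diff)` in place of a plain `assert`, where `print_counter_diff` stands for whatever routine prints the diff.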
TextTable Class
This is not a newly added class. Previously, it helped output counters in table format, like the well-known Python library prettytable. This PR adds a new static method "merge_table()" to merge two tables whose difference needs to be shown.
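The exact behavior of "merge_table()" is not described here, so the following is only a guess at the general idea: a standalone, hypothetical function that stacks two counter rows sharing one header and appends their difference.

```python
def merge_tables(header, before, after):
    """Hypothetical merge: stack two counter rows and append their difference."""
    diff = [b - a for a, b in zip(before, after)]
    rows = [
        ["step"] + list(header),
        ["before"] + [str(v) for v in before],
        ["after"] + [str(v) for v in after],
        ["diff"] + [str(v) for v in diff],
    ]
    # Pad each column to its widest cell so the rows line up in plain text.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

Calling `merge_tables(["tx", "rx"], [10, 5], [14, 5])` yields a four-line table with the header, both snapshots, and the per-column difference.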
example case:
This feature is not yet applied to all qos test cases.
The changes above are applied only to the xoff, xon, and lossyqueue cases as a first example. Feedback will be collected over a longer monitoring period before further enhancement.
already cover various sku/topo
See the test record table below.
skip chassis device
Since the tests have not covered chassis devices yet, chassis device support is skipped for now.
How did you verify/test it?
Passed verification on a lab testbed.
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
Manually cherry-pick PR #12526 to resolve a conflict.
How did you do it?
Master PR #11000 has not been cherry-picked to the 202311 branch yet, which caused a conflict.
The conflict was fixed manually.
How did you verify/test it?
Ran multi-platform tests; no regression caused by this change was found.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation