[202311] improve qos log readability (#12526) by XuChen-MSFT · Pull Request #12899 · sonic-net/sonic-mgmt

XuChen-MSFT · 2024-05-20T13:39:22Z

Log message wrapper:
All messages are written to the PTF log by default; a flag can be set to also send a specific message to stderr on the PTF console. This avoids flooding the console and the "test summary" with messages, which makes failures easier to identify during triage, while the PTF log still holds every message for root-causing a failure.
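The behavior described above could look roughly like the following. This is only a minimal sketch: the function name `qos_log`, the `to_stderr` flag, and the `logger` parameter are hypothetical and are not taken from the PR's code.

```python
import logging
import sys


def qos_log(message, to_stderr=False, logger=None):
    """Always record the message in the log; echo to stderr only on request.

    Hypothetical helper for illustration; the wrapper in the PR may differ.
    """
    if logger is None:
        logger = logging.getLogger("qos")  # assumed logger name
    logger.info(message)
    if to_stderr:
        # Only flagged messages reach the console.
        print(message, file=sys.stderr)
    return message
```

With this shape, a test step would call `qos_log("sent 100 packets")` for routine detail and `qos_log("pause frames missing", to_stderr=True)` for the few messages that should be visible on the console.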

CounterCollector Class
Provides a general interface for collecting, comparing, and displaying counters.
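As an illustration of a collect/compare/display interface, here is a minimal sketch. The class name `CounterSnapshotter`, its methods, and the `reader` callable are assumptions made for this example and do not reproduce the real CounterCollector API.

```python
class CounterSnapshotter:
    """Hypothetical stand-in for a collect/compare/display interface."""

    def __init__(self, name, reader):
        self.name = name
        self.reader = reader   # callable returning {counter_name: int}
        self.snapshots = {}    # step label -> counter values at that step

    def collect(self, step):
        # Copy the values so later reads do not alter this snapshot.
        self.snapshots[step] = dict(self.reader())

    def compare(self, first, last):
        # Return only the counters whose value changed between two steps.
        before, after = self.snapshots[first], self.snapshots[last]
        return {
            key: after[key] - before.get(key, 0)
            for key in after
            if after[key] != before.get(key, 0)
        }

    def display(self, first, last):
        # One line per changed counter, with a signed delta.
        diff = self.compare(first, last)
        return "\n".join(
            f"{self.name}.{key}: {delta:+d}" for key, delta in sorted(diff.items())
        )
```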

Diagnostic Counter Wrapper
So far, 8 kinds of counters can be read:
port_counter, queue_counter_counter, queue_share_wm_counter, pg_share_wm_counter, pg_headroom_wm_counter, pg_counter_couner, pg_drop_counter and ptf_tx_rx_counter

Although CounterCollector provides a common API to collect, compare, and display these counters, using CounterCollector directly still clutters the test case code, because each counter needs at least one line of code. If more counter types are queried later, even more code unrelated to the test steps ends up in the test case.

Therefore, the diag counter wrapper covers all types of counter activity, so that the test case code reflects the test steps and logic rather than this diagnostic code.
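The idea of hiding every counter type behind one call per test step can be sketched as follows. The class `DiagCounters` and its methods are hypothetical names for illustration; the wrapper in the PR may be structured differently (for example as a decorator).

```python
class DiagCounters:
    """Hypothetical facade: one call per test step covers every counter type."""

    def __init__(self, readers):
        # readers maps a counter kind (e.g. "port_counter") to a callable
        # returning {counter_name: int}; the callables are assumptions.
        self.readers = readers
        self.steps = []  # list of (label, {kind: values})

    def step(self, label):
        # A single line in the test case snapshots all counter kinds.
        snapshot = {kind: dict(read()) for kind, read in self.readers.items()}
        self.steps.append((label, snapshot))

    def diff_first_last(self):
        # Difference between the first and the last recorded step.
        (_, first), (_, last) = self.steps[0], self.steps[-1]
        return {
            kind: {
                key: last[kind][key] - first[kind].get(key, 0)
                for key in last[kind]
            }
            for kind in last
        }
```

The test case then contains one `diag.step("...")` line per step, regardless of how many counter types are added later.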

assert wrapper
By default, the counter difference between the first and last step of the case is displayed on both normal and abnormal exits. However, Python's built-in assert statement makes it difficult to show the counter diff, so we implement an assert wrapper that shows the counter diff when an assertion exception occurs.
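A minimal sketch of such an assert wrapper is shown below. The name `qos_assert` and the `on_failure` callback are assumptions for this example; the PR's wrapper may hook into the counter display differently.

```python
def qos_assert(condition, message, on_failure):
    """Run a diagnostic callback before raising, unlike a bare assert.

    Hypothetical helper: on_failure is any callable that prints the
    counter diff (or other diagnostics) before the test aborts.
    """
    if not condition:
        on_failure()
        raise AssertionError(message)
```

A bare `assert` raises immediately, so there is no place to emit the diff; routing the check through a function gives the failure path a hook.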

TextTable Class
This is not a newly added class; it already helped output counters in table format, like the well-known Python library prettytable. This PR adds a new static method "merge_table()" that merges two tables whose difference needs to be shown.
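To illustrate what merging two counter tables for comparison might produce, here is a small standalone sketch. It does not use the TextTable class, and the function `merge_tables`, its arguments, and the before/after/diff row layout are all assumptions rather than the behavior of the real "merge_table()".

```python
def merge_tables(headers, before, after):
    """Render before/after counter rows plus a diff row as one text table.

    Hypothetical layout for illustration; before and after are lists of
    ints aligned with headers.
    """
    diff = [a - b for a, b in zip(after, before)]
    rows = [["before"] + before, ["after"] + after, ["diff"] + diff]
    table = [[""] + headers] + [[str(cell) for cell in row] for row in rows]
    # Pad every column to its widest cell so the columns line up.
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in table
    )
```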

example case:
This feature is not yet applied to all qos test cases.
The changes above are applied only to the xoff, xon, and lossyqueue cases as a first example. We will monitor them over a long period to collect feedback, and then enhance.

already covers various sku/topo
See the test record table below.

skip chassis device
Since testing has not covered chassis yet, chassis devices are skipped for now.

How did you verify/test it?
Passed verification on lab testbeds.

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

Approach

What is the motivation for this PR?

Manually cherry-pick PR #12526 to resolve a conflict.

How did you do it?

Master PR #11000 hasn't been cherry-picked to the 202311 branch yet, which caused a conflict; fixed it manually.

How did you verify/test it?

Ran multi-platform tests and found no regression caused by this change.

testbed	testplan id
vms11-t0-7050qx-acs-4	664a3497bb0aadaaaca62e8b
vms18-t1-7050qx-acs-03	664a349441ead59ac1b7bffb
testbed-bjw-can-7050qx-3	664a349348fb649086787a55
vms28-t0-7050-12	664a349348fb649086787a55
tbtk5-t0-7260-7	664a34713bbcc36b5b02821d
vms21-dual-t0-7050-3	664a346ef4e2923d16563707
vms24-dual-t0-7050-2	664a346d52c4f0206499e616
vms3-t0-s6100	664a3452bb0aadaaaca62e81
vms64-t0-s6100-1	664a345148fb649086787a4c
vms1-t1-2700	664a30313bbcc36b5b02820e
vms64-t1-4700-1	664a302f52c4f0206499e60d

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation


XuChen-MSFT requested a review from StormLiangMS May 20, 2024 13:39

XuChen-MSFT changed the title ~~imporve qos log readability (#12526)~~ [202311] imporve qos log readability (#12526) May 20, 2024

XuChen-MSFT changed the title ~~[202311] imporve qos log readability (#12526)~~ [202311] improve qos log readability (#12526) May 20, 2024

yxieca merged commit c212c99 into sonic-net:202311 May 20, 2024

yxieca merged 1 commit into sonic-net:202311 from XuChen-MSFT:xuchen3/202311/qos-ptf-diag-utils