Test case 3 of PFC watchdog against warm-reboot: random storming #837

Merged
wendani merged 20 commits into sonic-net:master from wendani:pfcwd_wb_2_master on Oct 14, 2019

wendani (Contributor) commented Mar 22, 2019

Test case 3:
PFC storms start asynchronously at the fanout, each at a random time and lasting a random period
Warm-reboot is issued
Wait for all the PFC storms to finish
Verify that PFC storm detection and restoration are functional
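A minimal Python sketch of how a random storm schedule like the one in test case 3 could be modeled, and how the "wait for all storms" point follows from it. `StormWindow`, `random_storm_schedule`, `all_storms_done_at`, the `max_defer`/`max_duration` bounds and the queue numbers are hypothetical illustrations, not taken from the PR, which implements the test in Ansible playbooks.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class StormWindow:
    """One PFC storm on one queue, as offsets in seconds from test start (hypothetical model)."""
    queue: int
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def random_storm_schedule(queues: Sequence[int], max_defer: float, max_duration: float,
                          rng: Optional[random.Random] = None) -> List[StormWindow]:
    """Give each queue an independent random start offset and storm duration."""
    rng = rng or random.Random()
    return [
        StormWindow(q, start=rng.uniform(0, max_defer), duration=rng.uniform(1, max_duration))
        for q in queues
    ]


def all_storms_done_at(schedule: Sequence[StormWindow]) -> float:
    """Offset after which every storm has stopped: the 'wait for all storms' step."""
    return max(w.end for w in schedule)


if __name__ == "__main__":
    schedule = random_storm_schedule([3, 4], max_defer=60, max_duration=120)
    for w in schedule:
        print(f"queue {w.queue}: storm from +{w.start:.1f}s to +{w.end:.1f}s")
    print(f"verify detection/restoration after +{all_storms_done_at(schedule):.1f}s")
```

Because every window is independent, a storm may start before, during, or after the warm-reboot; waiting until the latest end offset is what makes the final verification deterministic.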

Tested on the regular PFC watchdog test without breaking it.

Infrastructure change:
Add the flexibility to defer the start and stop of the PFC storm at the Arista fanout
TOFIX: Mlnx fanout
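The deferred start and stop can be pictured with this Python sketch, which arms one timer that starts a storm after `start_delay` seconds and another that stops it `duration` seconds later, without blocking the caller. `defer_storm` and the callbacks are hypothetical stand-ins; the mechanism the PR actually uses on the Arista fanout is not reproduced here.

```python
import threading
import time


def defer_storm(start_storm, stop_storm, start_delay, duration):
    """Run start_storm after start_delay seconds and stop_storm `duration` seconds later.

    Both timers are armed immediately, so the caller is not blocked;
    the returned timers can be joined (wait) or cancelled (abort).
    """
    start_timer = threading.Timer(start_delay, start_storm)
    stop_timer = threading.Timer(start_delay + duration, stop_storm)
    start_timer.start()
    stop_timer.start()
    return start_timer, stop_timer


if __name__ == "__main__":
    t0 = time.monotonic()
    timers = defer_storm(
        lambda: print(f"storm start at +{time.monotonic() - t0:.2f}s"),
        lambda: print(f"storm stop at +{time.monotonic() - t0:.2f}s"),
        start_delay=0.5,
        duration=1.0,
    )
    for timer in timers:
        timer.join()
```

Arming both timers up front is what lets the test move on (for example, to issue the warm-reboot) while the storm is still pending.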

Incremental commits on top of #834

wendani added 17 commits March 19, 2019 18:19
wendani merged commit 639905f into sonic-net:master on Oct 14, 2019
yxieca pushed a commit that referenced this pull request on Oct 15, 2019
* First test case of PFC watchdog against warm-reboot

Signed-off-by: Wenda Ni <[email protected]>

* Add more comments for code readability

Signed-off-by: Wenda Ni <[email protected]>

* Modify output message

Signed-off-by: Wenda Ni <[email protected]>

* Allow log analyzer to take a specified start marker

Signed-off-by: Wenda Ni <[email protected]>

* Use lookup('pipe', 'date +%H:%M:%S') in place of ansible_date_time.time,
which returns a cached time for a certain period (see ansible/ansible#22561)

Signed-off-by: Wenda Ni <[email protected]>

* Add the flexibility to not start the storm at the fanout link partner when running
functional_test_storm.yml

Signed-off-by: Wenda Ni <[email protected]>

* Dump only the current result and summary files for debugging and troubleshooting purposes

Signed-off-by: Wenda Ni <[email protected]>

* Add the capability to check whether the number of exact matches is equal to
the target number

Signed-off-by: Wenda Ni <[email protected]>

* Split the actual storm and restore tests into
functional_test_storm_perq.yml and functional_test_restore_perq.yml,
respectively

Add the capability to storm multiple queues of a port

Signed-off-by: Wenda Ni <[email protected]>

* Add test case 2 of PFC watchdog against warm-reboot:

PFC storm is started and detected before the warm-reboot
The storm is ongoing when the warm-reboot is issued, and lasts past the end of the warm-reboot
PFC storm is stopped and restored after the warm-reboot

Signed-off-by: Wenda Ni <[email protected]>

* Ignore trivial syncd ERR messages during the warm-reboot, e.g.,

Mar 20 00:40:33.599212 str-a7050-acs-1 ERR syncd#syncd: _brcm_sai_cosq_stat_get:1146 cosq stat get failed with error Invalid parameter (0xfffffffc).
Mar 20 00:40:33.599212 str-a7050-acs-1 DEBUG syncd#syncd: brcm_sai_get_queue_stats:724 cosq stat get failed with error -5 for port 1 qid 10
Mar 20 00:40:33.599212 str-a7050-acs-1 NOTICE syncd#syncd: :- setQueueCounterList: Queue oid:0x102150000000b does not has supported counters

Signed-off-by: Wenda Ni <[email protected]>

* Run apswitch action asynchronously

Using include asynchronously with with_items is not supported

From ansible/ansible#22716

* Add the flexibility to defer storm start and stop at fanout

Signed-off-by: Wenda Ni <[email protected]>

* Randomly generate deferred time

Signed-off-by: Wenda Ni <[email protected]>

* Move actual storming ops to per queue

Signed-off-by: Wenda Ni <[email protected]>

* Clean debugging symbols

Signed-off-by: Wenda Ni <[email protected]>

* Test case 3 of PFC watchdog against warm-reboot

PFC storms start asynchronously at the fanout, each at a random time and lasting a random period
Warm-reboot is issued
Wait for all the PFC storms to finish
Verify that PFC storm detection and restoration are functional

Signed-off-by: Wenda Ni <[email protected]>

* Specify reboot type to be 'warm-reboot'

Signed-off-by: Wenda Ni <[email protected]>
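The "Allow log analyzer to take a specified start marker" change listed above restricts analysis to log output written after a known point. A minimal Python sketch of that idea follows; `lines_after_marker` and the marker text are hypothetical, and anchoring on the *last* occurrence is a choice of this sketch, not necessarily the log analyzer's behavior.

```python
from typing import List, Sequence


def lines_after_marker(log_lines: Sequence[str], start_marker: str) -> List[str]:
    """Return only the log lines written after the last occurrence of start_marker.

    Using the last occurrence keeps output from earlier runs that printed
    the same marker out of the analysis window.
    """
    last = None
    for i, line in enumerate(log_lines):
        if start_marker in line:
            last = i
    if last is None:
        raise ValueError(f"start marker {start_marker!r} not found in log")
    return list(log_lines[last + 1:])


if __name__ == "__main__":
    syslog = ["old noise", "MARKER-pfcwd-wb", "storm detected", "MARKER-pfcwd-wb", "storm restored"]
    print(lines_after_marker(syslog, "MARKER-pfcwd-wb"))  # ['storm restored']
```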
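The "check whether the number of exact matches is equal to the target number" change listed above tightens an "at least one match" check into an exact count. A hedged Python sketch, with hypothetical function names and log text:

```python
import re
from typing import Sequence


def count_matching_lines(log_lines: Sequence[str], pattern: str) -> int:
    """Number of log lines in which the regex pattern occurs."""
    regex = re.compile(pattern)
    return sum(1 for line in log_lines if regex.search(line))


def expect_exact_matches(log_lines: Sequence[str], pattern: str, expected: int) -> int:
    """Fail unless the pattern matches exactly `expected` lines.

    An 'at least one match' check would also pass if, for example, an event
    were logged twice; an exact count catches such duplicates.
    """
    found = count_matching_lines(log_lines, pattern)
    if found != expected:
        raise AssertionError(f"expected {expected} line(s) matching {pattern!r}, found {found}")
    return found
```

With several queues stormed on one port, the expected count would plausibly be one detection per stormed queue.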
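Test case 2 listed above is defined by an ordering of events: storm detected before the warm-reboot, storm still running after it finishes, then stop and restoration. The Python sketch below only encodes that ordering as a check over event timestamps; the event names and `check_case2_timeline` are hypothetical, and the PR itself verifies the behavior through its playbooks and log analysis.

```python
from typing import Dict

# Expected order of events in test case 2 (event names are hypothetical labels).
CASE2_ORDER = [
    "storm_start",     # fanout starts the PFC storm
    "storm_detected",  # DUT PFC watchdog reports the storm, before warm-reboot
    "reboot_start",    # warm-reboot is issued while the storm is ongoing
    "reboot_end",      # warm-reboot completes; the storm is still running
    "storm_stop",      # fanout stops the storm after warm-reboot
    "storm_restored",  # DUT PFC watchdog reports restoration
]


def check_case2_timeline(events: Dict[str, float]) -> bool:
    """Check that every event was seen and that they occurred in CASE2_ORDER."""
    missing = [name for name in CASE2_ORDER if name not in events]
    if missing:
        raise AssertionError(f"missing events: {missing}")
    for earlier, later in zip(CASE2_ORDER, CASE2_ORDER[1:]):
        if not events[earlier] < events[later]:
            raise AssertionError(f"{earlier} must happen before {later}")
    return True
```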
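The "Ignore trivial syncd ERR messages" change listed above amounts to an ignore list applied before error lines are reported. A minimal Python sketch, assuming a simple "ERR token plus ignore regexes" model; the pattern below is a hypothetical one written against the quoted log line, not the pattern added by the PR.

```python
import re
from typing import List, Sequence

# Hypothetical ignore list: one pattern for the benign warm-reboot syncd error quoted above.
IGNORE_PATTERNS = [
    re.compile(r"ERR syncd#syncd: _brcm_sai_cosq_stat_get:\d+ cosq stat get failed"),
]
# Treat any line carrying the ERR severity token as an error candidate.
ERROR_PATTERN = re.compile(r"\bERR\b")


def unexpected_errors(log_lines: Sequence[str]) -> List[str]:
    """Return ERR lines that no ignore pattern covers; an empty list means 'pass'."""
    return [
        line
        for line in log_lines
        if ERROR_PATTERN.search(line)
        and not any(p.search(line) for p in IGNORE_PATTERNS)
    ]
```

The DEBUG and NOTICE lines in the same excerpt never reach the ignore list, because they do not carry the ERR severity token.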
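The "Run apswitch action asynchronously" change listed above works around the limitation that an include cannot run asynchronously over with_items (ansible/ansible#22716): each per-item action is fired without waiting, and results are collected afterwards. As a rough Python analogy only, not Ansible code, with hypothetical names:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_all_async(job, items, timeout):
    """Start job(item) for every item at once, then wait for all of them.

    Rough analogue of firing one async task per item and collecting the
    results afterwards, instead of looping over an include.
    """
    items = list(items)
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        futures = {pool.submit(job, item): item for item in items}
        done, not_done = wait(futures, timeout=timeout)
        if not_done:
            # Note: leaving the with-block still waits for the stragglers to finish.
            raise TimeoutError(f"{len(not_done)} job(s) still running after {timeout}s")
        return {futures[f]: f.result() for f in done}


if __name__ == "__main__":
    def storm_queue(queue):
        time.sleep(0.5)  # stand-in for a per-queue fanout action
        return f"queue {queue} done"

    start = time.monotonic()
    print(run_all_async(storm_queue, [3, 4], timeout=5))
    print(f"elapsed {time.monotonic() - start:.1f}s")  # roughly one sleep, since the jobs overlap
```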