Enhancing core_dump_and_config_check to be multi-asic aware by sanmalho-git · Pull Request #6527 · sonic-net/sonic-mgmt

sanmalho-git · 2022-10-12T19:49:36Z

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

201911
202012
202205

Approach

What is the motivation for this PR?

In our pipeline runs, autorestart tests against a multi-asic DUT were failing with the error:

Feature 'x' auto-restart is not consistent across namespaces

The reason is that pretest disables autorestart on all the containers. The ACL tests then change the config_db's and, to restore them during cleanup, run config load_minigraph. This sets the autorestart state of the containers back to the default of 'enabled'.

Now, when check_dut_health_status kicks in, it takes a snapshot of config_db.json before the ACL suite, which has the autorestart state as 'disabled'. After the ACL suite, it detects that the autorestart state has changed to 'enabled', so it tries to restore it.

However, when it restores, it restores only config_db.json, and not the config_db's of the other asics.

As a result, the autorestart state is not consistent across namespaces: global has autorestart 'disabled', while the namespace has autorestart 'enabled'.

How did you do it?

The fix is to enhance check_dut_health_status to be multi-asic aware.

It compares not just config_db.json, but also the config_db's of all the asics.
If it finds that the config has changed, it restores not just config_db.json, but also the config_db's of all the asics before rebooting the DUT.
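
The two steps above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper names `config_db_files`, `changed_contexts` and `files_to_restore` are hypothetical, the per-asic file naming (`/etc/sonic/config_db<N>.json`) is an assumption based on the usual multi-asic layout, and the global (host) context is assumed to be keyed by `None`.

```python
def config_db_files(num_asics):
    """Map each config context to its config_db file (assumed file layout)."""
    files = {None: "/etc/sonic/config_db.json"}  # None = global/host context
    if num_asics > 1:
        for asic_id in range(num_asics):
            key = "asic{}".format(asic_id)
            files[key] = "/etc/sonic/config_db{}.json".format(asic_id)
    return files


def changed_contexts(pre_configs, cur_configs):
    """Return the contexts whose config differs between the two snapshots."""
    return [ctx for ctx in pre_configs if pre_configs[ctx] != cur_configs.get(ctx)]


def files_to_restore(pre_configs, cur_configs, num_asics):
    """If any context changed, restore every config_db, not only the global one."""
    if not changed_contexts(pre_configs, cur_configs):
        return []
    return list(config_db_files(num_asics).values())
```

Restoring every file whenever any context differs is what keeps the global and per-asic namespaces from drifting apart, which is the inconsistency described in the motivation.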

How did you verify/test it?

Ran check_dut_health_status with changes to the config_db's and validated that they are restored to what they were before the suite was run.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

azure-pipelines · 2022-10-13T17:00:49Z

The pre-commit check detected issues in the files touched by this pull request.
The detected issues may be old or new. For new issues, please try to fix them.

For old issues, it is not mandatory to fix them because they were not caused by this change. It is unfair to blame
the author of this pull request. But if you can make the extra effort to fix the old issues as well, that would be great!

Detailed pre-commit check results:
$(results)

To run the pre-commit checks locally, follow the steps below:

1. Ensure that the default python is python3. In the sonic-mgmt docker container, the default python is python2. You can run the check by activating the python3 virtual environment in the sonic-mgmt docker container, or outside of the sonic-mgmt docker container.
2. Ensure that the pre-commit package is installed:

sudo pip install pre-commit

3. Go to the repository root folder.
4. Install the pre-commit hooks:

pre-commit install

5. Use pre-commit to check staged files:

pre-commit

Alternatively, you can check committed files using:

pre-commit run --from-ref <commit_id> --to-ref <commit_id>

sanmalho-git · 2022-10-27T17:40:42Z

@SuvarnaMeenakshi - I will fix the merge conflict. Can you please review?

azure-pipelines · 2022-10-27T21:06:54Z

The pre-commit check detected issues in the files touched by this pull request.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/conftest.py:16:1: F401 'tests.common.fixtures.conn_graph_facts.conn_graph_facts' imported but unused
tests/conftest.py:28:1: F401 'tests.common.fixtures.duthost_utils.backup_and_restore_config_db_session' imported but unused
tests/conftest.py:29:1: F401 'tests.common.fixtures.ptfhost_utils.ptf_portmap_file' imported but unused
tests/conftest.py:30:1: F401 'tests.common.fixtures.ptfhost_utils.run_icmp_responder_session' imported but unused
tests/conftest.py:58:26: E261 at least two spaces before inline comment
tests/conftest.py:86:121: E501 line too long (123 > 120 characters)
tests/conftest.py:89:121: E501 line too long (122 > 120 characters)
tests/conftest.py:90:121: E501 line too long (139 > 120 characters)
tests/conftest.py:93:121: E501 line too long (176 > 120 characters)
tests/conftest.py:97:21: E128 continuation line under-indented for visual indent
tests/conftest.py:112:21: E128 continuation line under-indented for visual indent
... [truncated extra lines, please run pre-commit locally to view full check results]

sanmalho-git · 2022-10-28T15:45:13Z

@SuvarnaMeenakshi @judyjoseph are you expecting PR owners to fix the pre-commit errors even when they are in lines of code that the PR owner didn't touch?

SuvarnaMeenakshi · 2022-10-31T18:30:58Z

@SuvarnaMeenakshi @judyjoseph are you expecting PR owners to fix the pre-commit errors even when they are in lines of code that the PR owner didn't touch?

@sanmalho-git The log mentions that new issues must be fixed.
Fixing old issues is not mandatory.

tests/conftest.py

SuvarnaMeenakshi

To ensure changes are uniform, changes should be done here as well, where golden config db is created and removed:
https://github.com/sonic-net/sonic-mgmt/blob/master/ansible/config_sonic_basedon_testbed.yml#L584

https://github.com/sonic-net/sonic-mgmt/blob/master/tests/test_pretest.py#L291

azure-pipelines · 2022-11-08T21:52:43Z

The pre-commit check detected issues in the files touched by this pull request.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing ansible/config_sonic_basedon_testbed.yml
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/test_pretest.py
check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
... [truncated extra lines, please run pre-commit locally to view full check results]

sanmalho-git · 2022-11-08T21:53:42Z

To ensure changes are uniform, changes should be done here as well, where golden config db is created and removed: https://github.com/sonic-net/sonic-mgmt/blob/master/ansible/config_sonic_basedon_testbed.yml#L584

https://github.com/sonic-net/sonic-mgmt/blob/master/tests/test_pretest.py#L291

Thanks for pointing this out.
I have addressed this in the latest commit.

tests/conftest.py

tests/test_pretest.py

SuvarnaMeenakshi · 2022-11-14T22:27:07Z

@anamehra fyi

SuvarnaMeenakshi · 2022-11-16T00:11:45Z

@sanmalho-git - can you resolve the conflict and fix the minor comments?

This will make sure that critical services and ports are up when proceeding to the next suite. The default wait time of 120 for pizza box and 240 for modular chassis is sometimes not sufficient - especially with all 400G ports having SFPs
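
One way to avoid depending on a fixed wait is to poll until the DUT reports ready, up to a timeout. The sketch below is an illustration only, not the code from this PR: `wait_until_ready` and the `is_ready` callable are hypothetical names, and the check for critical services and ports is left to the caller.

```python
import time


def wait_until_ready(is_ready, timeout, interval=10):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    is_ready is a caller-supplied callable (hypothetical here) that would
    check that critical services and ports are up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With polling, a generous upper bound costs nothing on a fast-booting DUT, since the call returns as soon as the check passes.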

azure-pipelines · 2022-11-17T21:06:52Z

The pre-commit check detected issues in the files touched by this pull request.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing ansible/config_sonic_basedon_testbed.yml
Fixing tests/conftest.py
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/test_pretest.py
check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
... [truncated extra lines, please run pre-commit locally to view full check results]

sanmalho-git · 2022-11-17T21:09:20Z

@SuvarnaMeenakshi - have addressed the comments and rebased.

SuvarnaMeenakshi · 2022-11-18T20:30:22Z

@sanmalho-git Thank you for addressing the comments.

PR checks are failing.
Checking one of the logs, I see this error:
dhcp_relay/test_dhcp_relay.py::test_dhcp_relay_random_sport[dual] ERROR [100%]

==================================== ERRORS ====================================
___________ ERROR at teardown of test_dhcp_relay_random_sport[dual] ____________
...
pre_running_config = duts_data[duthost.hostname]["pre_running_config"][cfg_context]
cur_running_config = duts_data[duthost.hostname]["cur_running_config"][cfg_context]
E KeyError: None

tests/conftest.py

- Forgot to replace 'host' with None in one of the spots
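
A small reproduction of how this kind of mismatch yields `KeyError: None` is sketched below. It is one plausible reading of the failure, not the actual fixture code: the `duts_data` layout mirrors the traceback above, while the hostname `dut1`, the helper `lookup_configs`, and which snapshot kept the stale 'host' key are assumptions.

```python
def lookup_configs(duts_data, hostname, cfg_context):
    """Fetch the pre/cur snapshots for one context, as in the traceback."""
    pre = duts_data[hostname]["pre_running_config"][cfg_context]
    cur = duts_data[hostname]["cur_running_config"][cfg_context]
    return pre, cur


# One snapshot was stored under the old 'host' key, the other under None.
duts_data = {
    "dut1": {
        "pre_running_config": {None: {"FEATURE": {}}},
        "cur_running_config": {"host": {"FEATURE": {}}},  # stale key
    }
}

try:
    lookup_configs(duts_data, "dut1", None)
    error = None
except KeyError as exc:
    error = exc  # the missing key is None, matching "KeyError: None"
```

Storing and reading every snapshot through the same context key removes the mismatch.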

azure-pipelines · 2022-11-18T21:19:53Z

The pre-commit check detected issues in the files touched by this pull request.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing ansible/config_sonic_basedon_testbed.yml
Fixing tests/conftest.py
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/test_pretest.py
check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
... [truncated extra lines, please run pre-commit locally to view full check results]

tests/conftest.py

azure-pipelines · 2022-11-21T15:58:25Z

The pre-commit check detected issues in the files touched by this pull request.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing ansible/config_sonic_basedon_testbed.yml
Fixing tests/conftest.py
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/test_pretest.py
check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
... [truncated extra lines, please run pre-commit locally to view full check results]

SuvarnaMeenakshi

lgtm

…nfig (#6918)

What is the motivation for this PR?
Some test cases failed at teardown when comparing pre config and current config:

for key in cur_config_extra_keys:
> cur_only_config[duthost.hostname].update({key: cur_running_config[key]})
E KeyError: u'MUX_LINKMGR'

The issue is introduced by #6527.

How did you do it?
Get previous and current config keys after removing exclusive keys. Also add [cfg_context] for cur_only_config and pre_only_config.

How did you verify/test it?
Run a dualtor case, such as dualtor/test_tor_ecn.py::test_dscp_to_queue_during_encap_on_standby

Signed-off-by: Zhaohui Sun <[email protected]>
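
The idea of computing the key sets only after dropping the excluded keys can be sketched as below. This is a hedged illustration rather than the code from #6918: `diff_config_keys` and the `exclude_keys` parameter are hypothetical names, and the sample tables other than MUX_LINKMGR are made up for the example.

```python
def diff_config_keys(pre_running_config, cur_running_config, exclude_keys):
    """Compare top-level keys of two configs, ignoring excluded keys.

    Both key sets are built after the exclusion, so a key that is ignored
    can never appear in the 'extra keys' lists and be looked up later.
    """
    excluded = set(exclude_keys)
    pre_keys = set(pre_running_config) - excluded
    cur_keys = set(cur_running_config) - excluded
    return {
        "pre_only": sorted(pre_keys - cur_keys),
        "cur_only": sorted(cur_keys - pre_keys),
        "common": sorted(pre_keys & cur_keys),
    }


pre = {"FEATURE": {}, "PORT": {}}
cur = {"FEATURE": {}, "ACL_TABLE": {}, "MUX_LINKMGR": {}}
result = diff_config_keys(pre, cur, exclude_keys=["MUX_LINKMGR"])
```

Here `result["cur_only"]` contains only `ACL_TABLE`, so every key in it can be safely indexed in `cur`.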

…ic. (#8884)

What is the motivation for this PR?
PR #6527 enhanced the function core_dump_and_config_check to be multi-asic aware. But in the single-asic scenario, it simply set the key to None, which does not make sense. This PR sets the key to "asic0" in the single-asic scenario to keep it consistent with the key value in the multi-asic scenario.

How did you do it?
Change the key in the single-asic scenario from None to asic0.
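
The benefit of a consistent key can be illustrated with a short sketch. It is not taken from #8884: `asic_context_keys` and `snapshots_by_asic` are hypothetical helpers, and the sketch only assumes that per-asic snapshots are keyed "asic0", "asic1", and so on.

```python
def asic_context_keys(num_asics):
    """Per-asic context keys; a single-asic DUT is treated as 'asic0'."""
    return ["asic{}".format(i) for i in range(max(num_asics, 1))]


def snapshots_by_asic(configs, num_asics):
    """Select the per-asic snapshots using the same key scheme everywhere."""
    return {key: configs[key] for key in asic_context_keys(num_asics)}


# With the "asic0" key, single-asic and multi-asic DUTs share one code path.
single = snapshots_by_asic({"asic0": {"FEATURE": {}}}, num_asics=1)
multi = snapshots_by_asic({"asic0": {}, "asic1": {}}, num_asics=2)
```

A snapshot stored under `None` would raise `KeyError` in this loop, which is the kind of special case the consistent key avoids.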

…ic-mgmt into internal-202205

Fix merge conflicts.

- Fix verify_no_packet_any call in fib_test (sonic-net#6461)
- Fix the test case test_TSA failure when check the routes on the eos host (sonic-net#6483)
- Use conditional mark to skip testcase instead of required_mocked_dualtor (sonic-net#6766)
- [tagged_arp] fix issue 'fixture ports_list not found' (sonic-net#6773)
- [QoS] fixes after moving to python3 (sonic-net#6786)
- update parse funciton for image url (sonic-net#6848)
- Fix typo in get_queue_counter (sonic-net#6852)
- Revert "Fix loganalyzer.py UnicodeDecodeError (sonic-net#6524)" (sonic-net#6858)
- Enhancing core_dump_and_config_check to be multi-asic aware (sonic-net#6527)
- Adding support for calculating balancing in multi-lc/multi-asic case (Test_fib.py) (sonic-net#6391)
- Support different RC in case of pre or post sanity check failed (sonic-net#6860)
- Update getbuild.py to support pass an empty access_token
- [202205] Fixing auto_techsupport (sonic-net#6882)
- Merge branch 'azure-202205' into dev/yaqiangzhu/202205_manually_merge

sanmalho-git force-pushed the check_dut_health branch from e98fb4a to 09b82e5 Compare October 13, 2022 16:59

Blueve added the Request for 202205 branch label Oct 27, 2022

tjchadaga assigned SuvarnaMeenakshi Oct 27, 2022

sanmalho-git force-pushed the check_dut_health branch from 09b82e5 to 4b244b3 Compare October 27, 2022 21:06

SuvarnaMeenakshi reviewed Nov 7, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi requested changes Nov 7, 2022

sanmalho-git force-pushed the check_dut_health branch from 4b244b3 to 93d2f77 Compare November 8, 2022 21:51

sanmalho-git changed the title ~~Enhancing check_dut_health_status to be multi-asic aware~~ Enhancing core_dump_and_config_check to be multi-asic aware Nov 8, 2022

SuvarnaMeenakshi reviewed Nov 9, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi reviewed Nov 9, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi reviewed Nov 9, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi reviewed Nov 9, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi reviewed Nov 9, 2022

tests/test_pretest.py Outdated

sanmalho-git added 4 commits November 17, 2022 16:05

Changes to support golden_running_config as part of core_dump_and_con…

9c679e2

…fig_check

Review comments on PR#6257 - core_dump_and_config_check fixture

4704b61

sanmalho-git force-pushed the check_dut_health branch from 93d2f77 to 4704b61 Compare November 17, 2022 21:05

SuvarnaMeenakshi reviewed Nov 18, 2022

tests/conftest.py

Fix for failing test cases

cdcb0aa

- Forgot to replace 'host' with None in one of the spots

SuvarnaMeenakshi reviewed Nov 18, 2022

tests/conftest.py Outdated

SuvarnaMeenakshi reviewed Nov 19, 2022

tests/conftest.py Outdated

Fixing flake8 issues

9eb2766

SuvarnaMeenakshi approved these changes Nov 21, 2022

SuvarnaMeenakshi merged commit 543910e into sonic-net:master Nov 21, 2022

wangxin added the Included in 202205 branch label Nov 23, 2022

ZhaohuiS mentioned this pull request Nov 29, 2022

Fix KeyError: u'MUX_LINKMGR' when comparing pre config and current config #6918

Merged

5 tasks

yutongzhang-microsoft mentioned this pull request Jul 10, 2023

Set key "asic0" in single-asic scenerio to keep consistent with multi-asic in function core_dump_and_config_check. #8884

Merged

6 tasks

mssonicbld mentioned this pull request Jul 10, 2023

[action] [PR:8884] Set key "asic0" in single-asic scenerio to keep consistent with multi-asic in function core_dump_and_config_check. #8885

Merged

6 tasks

4 participants