
Enhancing core_dump_and_config_check to be multi-asic aware#6527

Merged
SuvarnaMeenakshi merged 6 commits intosonic-net:masterfrom
sanmalho-git:check_dut_health
Nov 21, 2022

@sanmalho-git
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205

Approach

What is the motivation for this PR?

In our pipeline runs, autorestart tests against a multi-asic DUT were failing with the error:

Feature 'x' auto-restart is not consistent across namespaces

The reason was that pretest disables autorestart on all the containers. The ACL tests then change the config_db's and, to restore them during cleanup, run config load_minigraph. This sets the autorestart state of the containers back to the default of 'enabled'.

Now, when check_dut_health_status kicks in, it takes a snapshot of config_db.json before the ACL suite, in which the autorestart state is 'disabled'. After the ACL suite, it detects that the autorestart state has changed to 'enabled', so it tries to restore it.

However, when it restores, it only restores config_db.json, not the config_db's of the other ASICs.

As a result, the autorestart state is not consistent across namespaces: the global config has autorestart 'disabled', while the namespace has autorestart 'enabled'.
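The failing check can be pictured as comparing each feature's autorestart value across the global config and every namespace. The sketch below is a minimal illustration of that comparison, assuming the autorestart values have already been read into plain dicts keyed by namespace (None for the global config). The function name and data layout are hypothetical and are not the sonic-mgmt implementation.

```python
from typing import Dict, Optional

# Namespace -> {feature name -> autorestart value}; None is the global config.
Snapshot = Dict[Optional[str], Dict[str, str]]


def inconsistent_features(snapshot: Snapshot) -> Dict[str, Dict[Optional[str], Optional[str]]]:
    """Return each feature whose autorestart value differs between namespaces."""
    features = set()
    for states in snapshot.values():
        features.update(states)

    result = {}
    for feature in sorted(features):
        # A feature missing from a namespace shows up as None in the report.
        per_ns = {ns: states.get(feature) for ns, states in snapshot.items()}
        if len(set(per_ns.values())) > 1:
            result[feature] = per_ns
    return result


# The state described above: global config restored, namespace left at the default.
example = {
    None: {"swss": "disabled", "bgp": "disabled"},
    "asic0": {"swss": "enabled", "bgp": "disabled"},
}
print(inconsistent_features(example))
# {'swss': {None: 'disabled', 'asic0': 'enabled'}}
```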

How did you do it?

The fix is to enhance check_dut_health_status to be multi-asic aware:

  • It compares not just config_db.json, but also the config_db's of all the asics.
  • If it finds that the config has changed, it restores not just config_db.json, but also the config_db's of all the asics before rebooting the DUT.
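The compare-and-restore rule above can be sketched in isolation. This assumes the per-ASIC files follow a /etc/sonic/config_db<N>.json naming next to the global /etc/sonic/config_db.json, and that each file's content has already been loaded into a dict. All helper names are hypothetical; the sketch only shows the decision logic, not how sonic-mgmt reads or copies the files.

```python
import copy
from typing import Dict, List, Optional


def config_db_path(asic_index: Optional[int] = None) -> str:
    """Path of the global config (asic_index=None) or of one ASIC's config."""
    # Assumed naming: /etc/sonic/config_db.json plus config_db<N>.json per ASIC.
    if asic_index is None:
        return "/etc/sonic/config_db.json"
    return "/etc/sonic/config_db{}.json".format(asic_index)


def all_config_db_paths(num_asics: int) -> List[str]:
    """The global file, plus one file per ASIC on a multi-asic DUT."""
    paths = [config_db_path()]
    if num_asics > 1:
        paths.extend(config_db_path(i) for i in range(num_asics))
    return paths


def files_to_restore(pre: Dict[str, dict], cur: Dict[str, dict]) -> List[str]:
    """Return every snapshot file if any of them changed, else an empty list."""
    changed = any(pre[path] != cur.get(path) for path in pre)
    # Restoring the whole set, not only the changed files, keeps the global
    # and per-ASIC configs in step before the DUT is rebooted.
    return list(pre) if changed else []


paths = all_config_db_paths(num_asics=2)
pre = {p: {"FEATURE": {"swss": {"auto_restart": "disabled"}}} for p in paths}
cur = copy.deepcopy(pre)
cur[paths[1]]["FEATURE"]["swss"]["auto_restart"] = "enabled"  # only asic0 drifted
print(files_to_restore(pre, cur))  # all three paths
```

Restoring the full set on any change, as the bullet list describes, avoids leaving the global and namespace configs out of step with each other.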

How did you verify/test it?

Ran check_dut_health_status with changes to the config_db's and validated that they are restored to the state they were in before the suite was run.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@azure-pipelines

The pre-commit check detected issues in the files touched by this pull request.
The detected issues may be old or new. For new issues, please try to fix them.

For old issues, it is not mandatory to fix them because they were not caused by this change. It is unfair to blame
the author of this pull request for them. But if you can take the extra effort to fix the old issues as well, that would be great!

Detailed pre-commit check results:
$(results)

To run the pre-commit checks locally, follow the steps below:

  1. Ensure that the default python is python3. In the sonic-mgmt docker container, the default python is python2. You can run
    the check by activating the python3 virtual environment in the sonic-mgmt docker container or outside of the sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
     sudo pip install pre-commit
  3. Go to the repository root folder.
  4. Install the pre-commit hooks:
     pre-commit install
  5. Use pre-commit to check staged files:
     pre-commit
  6. Alternatively, you can check committed files using:
     pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@sanmalho-git
Contributor Author

@SuvarnaMeenakshi - I will fix the merge conflict. Can you please review?

@azure-pipelines
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/conftest.py:16:1: F401 'tests.common.fixtures.conn_graph_facts.conn_graph_facts' imported but unused
tests/conftest.py:28:1: F401 'tests.common.fixtures.duthost_utils.backup_and_restore_config_db_session' imported but unused
tests/conftest.py:29:1: F401 'tests.common.fixtures.ptfhost_utils.ptf_portmap_file' imported but unused
tests/conftest.py:30:1: F401 'tests.common.fixtures.ptfhost_utils.run_icmp_responder_session' imported but unused
tests/conftest.py:58:26: E261 at least two spaces before inline comment
tests/conftest.py:86:121: E501 line too long (123 > 120 characters)
tests/conftest.py:89:121: E501 line too long (122 > 120 characters)
tests/conftest.py:90:121: E501 line too long (139 > 120 characters)
tests/conftest.py:93:121: E501 line too long (176 > 120 characters)
tests/conftest.py:97:21: E128 continuation line under-indented for visual indent
tests/conftest.py:112:21: E128 continuation line under-indented for visual indent
...
[truncated extra lines, please run pre-commit locally to view full check results]


@sanmalho-git
Contributor Author

@SuvarnaMeenakshi @judyjoseph are you expecting the PR owners to fix the pre-commit errors, even though they are in lines of code that the PR owner didn't touch?

@SuvarnaMeenakshi
Contributor

@SuvarnaMeenakshi @judyjoseph are you expecting the PR owners to fix the pre-commit errors, even though they are in lines of code that the PR owner didn't touch?

@sanmalho-git The log mentions that new issues must be fixed.
Fixing old issues is not mandatory.

Contributor

@SuvarnaMeenakshi left a comment


@azure-pipelines
Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing ansible/config_sonic_basedon_testbed.yml

fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/test_pretest.py

check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
...
[truncated extra lines, please run pre-commit locally to view full check results]


@sanmalho-git
Contributor Author

To ensure changes are uniform, changes should be done here as well, where golden config db is created and removed: https://github.com/sonic-net/sonic-mgmt/blob/master/ansible/config_sonic_basedon_testbed.yml#L584

https://github.com/sonic-net/sonic-mgmt/blob/master/tests/test_pretest.py#L291

Thanks for pointing this out.
I have addressed this in the latest commit.

@sanmalho-git changed the title from "Enhancing check_dut_health_status to be multi-asic aware" to "Enhancing core_dump_and_config_check to be multi-asic aware" on Nov 8, 2022
@SuvarnaMeenakshi
Contributor

@anamehra fyi

@SuvarnaMeenakshi
Contributor

@sanmalho-git - can you resolve the conflicts and address the minor comments?

In our pipeline runs, autorestart tests against a multi-asic DUT were failing with the error:
    Feature 'x' auto-restart is not consistent across namespaces

The reason was that pretest disables autorestart on all the containers. The ACL
tests then change the config_db's and, to restore them during cleanup, run config load_minigraph.
This sets the autorestart state of the containers back to the default of 'enabled'.

Now, when check_dut_health_status kicks in, it takes a snapshot of config_db.json before the ACL
suite, in which the autorestart state is 'disabled'. After the ACL suite, it detects that the
autorestart state has changed to 'enabled', so it tries to restore it.

However, when it restores, it only restores config_db.json, not the config_db's of the other ASICs.

As a result, the autorestart state is not consistent across namespaces: the global config
has autorestart 'disabled', while the namespace has autorestart 'enabled'.

The fix is to enhance check_dut_health_status to be multi-asic aware:
  - It compares not just config_db.json, but also the config_db's of all the asics.
  - If it finds that the config has changed, it restores not just config_db.json, but also the config_db's of all the asics before rebooting the DUT.

This makes sure that critical services and ports are up before proceeding
to the next suite. The default wait time of 120 for a pizza box and 240 for a
modular chassis is sometimes not sufficient, especially when all 400G ports have SFPs.

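Instead of relying on a fixed delay, readiness can be polled until a deadline, which lets a slow chassis take longer without slowing down a fast one. This is a standalone sketch: the helper name and the stand-in probe are hypothetical, and the timeout values (in seconds) are illustrative rather than the suite's defaults.

```python
import time
from typing import Callable


def wait_until_true(timeout: float, interval: float, condition: Callable[[], bool]) -> bool:
    """Poll condition() every `interval` seconds until it holds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness probe that succeeds on its third call.
calls = {"n": 0}


def services_and_ports_up() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_true(timeout=5, interval=0.01, condition=services_and_ports_up))  # True
```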
@azure-pipelines
Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing ansible/config_sonic_basedon_testbed.yml
Fixing tests/conftest.py

fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/test_pretest.py

check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
...
[truncated extra lines, please run pre-commit locally to view full check results]


@sanmalho-git
Contributor Author

@SuvarnaMeenakshi - have addressed the comments and rebased.

@SuvarnaMeenakshi
Contributor

@sanmalho-git Thank you for addressing the comments.

PR checks are failing.
Checking one of the logs, I see this error:
dhcp_relay/test_dhcp_relay.py::test_dhcp_relay_random_sport[dual] ERROR [100%]

==================================== ERRORS ====================================
___________ ERROR at teardown of test_dhcp_relay_random_sport[dual] ____________
...
pre_running_config = duts_data[duthost.hostname]["pre_running_config"][cfg_context]
cur_running_config = duts_data[duthost.hostname]["cur_running_config"][cfg_context]
E       KeyError: None

- Forgot to replace 'host' with None in one of the spots
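The KeyError: None above arises when one code path stores a snapshot under one context key ('host') while another reads it back with a different one (None). A minimal sketch of deriving every key from a single helper, assuming None denotes the global config and asic<N> names the namespaces; the helper names and the HOST_CONTEXT constant are hypothetical.

```python
from typing import Dict, List, Optional

HOST_CONTEXT: Optional[str] = None  # hypothetical: None denotes the global config


def cfg_contexts(num_asics: int) -> List[Optional[str]]:
    """Every config context: the global one plus one per ASIC namespace."""
    contexts: List[Optional[str]] = [HOST_CONTEXT]
    if num_asics > 1:
        contexts.extend("asic{}".format(i) for i in range(num_asics))
    return contexts


def changed_contexts(dut_data: Dict[str, dict], contexts: List[Optional[str]]) -> List[Optional[str]]:
    """Contexts whose running config differs from the pre-test snapshot."""
    changed = []
    for cfg_context in contexts:
        pre = dut_data["pre_running_config"][cfg_context]
        cur = dut_data["cur_running_config"][cfg_context]
        if pre != cur:
            changed.append(cfg_context)
    return changed


contexts = cfg_contexts(num_asics=1)
good = {
    "pre_running_config": {c: {"PORT": {}} for c in contexts},
    "cur_running_config": {c: {"PORT": {}} for c in contexts},
}
print(changed_contexts(good, contexts))  # []

# A writer that still uses the old 'host' key reproduces the error above.
bad = {"pre_running_config": {"host": {}}, "cur_running_config": {"host": {}}}
try:
    changed_contexts(bad, contexts)
except KeyError as err:
    print("KeyError:", err)  # KeyError: None
```

Because both the snapshot writer and the comparison take their keys from cfg_contexts(), a leftover literal such as 'host' cannot slip into only one of them.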
@azure-pipelines
Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing ansible/config_sonic_basedon_testbed.yml
Fixing tests/conftest.py

fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/test_pretest.py

check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
...
[truncated extra lines, please run pre-commit locally to view full check results]


Contributor

@SuvarnaMeenakshi left a comment


lgtm

@SuvarnaMeenakshi merged commit 543910e into sonic-net:master Nov 21, 2022
wangxin pushed a commit that referenced this pull request Nov 23, 2022
What is the motivation for this PR?
In our pipeline runs, autorestart tests against a multi-asic DUT were failing with the error:
Feature 'x' auto-restart is not consistent across namespaces
The reason was that pretest disables autorestart on all the containers. The ACL tests then change the config_db's and, to restore them during cleanup, run config load_minigraph. This sets the autorestart state of the containers back to the default of 'enabled'.
Now, when check_dut_health_status kicks in, it takes a snapshot of config_db.json before the ACL suite, in which the autorestart state is 'disabled'. After the ACL suite, it detects that the autorestart state has changed to 'enabled', so it tries to restore it.
However, when it restores, it only restores config_db.json, not the config_db's of the other ASICs.
As a result, the autorestart state is not consistent across namespaces: the global config has autorestart 'disabled', while the namespace has autorestart 'enabled'.
How did you do it?
The fix is to enhance check_dut_health_status to be multi-asic aware.
It compares not just config_db.json, but also the config_db's of all the asics.
If it finds that the config has changed, it restores not just config_db.json, but also the config_db's of all the asics before rebooting the DUT.
How did you verify/test it?
Ran check_dut_health_status with changes to the config_db's and validated that they are restored to the state they were in before the suite was run.
ZhaohuiS added a commit that referenced this pull request Dec 1, 2022
…nfig (#6918)

What is the motivation for this PR?
Some test cases failed at teardown when comparing pre config and current config:

    for key in cur_config_extra_keys:
>       cur_only_config[duthost.hostname].update({key: cur_running_config[key]})
E       KeyError: u'MUX_LINKMGR'
The issue is introduced by #6527.

How did you do it?
Get previous and current config keys after removing exclusive keys.
Also add [cfg_context] for cur_only_config and pre_only_config

How did you verify/test it?
Ran dualtor cases, such as dualtor/test_tor_ecn.py::test_dscp_to_queue_during_encap_on_standby

Signed-off-by: Zhaohui Sun <[email protected]>
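The ordering fix described in this commit can be sketched in isolation: drop the excluded tables first, then compute the tables unique to each side, so every key that is looked up is guaranteed to exist. The function name, table contents, and exclusion list below are illustrative assumptions; the per-[cfg_context] nesting of the real results is omitted, and this is not the actual conftest code.

```python
from typing import Dict, Iterable, Tuple


def config_only_keys(pre: Dict[str, dict], cur: Dict[str, dict],
                     exclude: Iterable[str]) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Return (pre_only, cur_only) tables after dropping excluded tables."""
    excluded = set(exclude)
    # Filter first, so the key sets below are taken from the same dicts that
    # are indexed afterwards; every lookup is then guaranteed to succeed.
    pre_f = {k: v for k, v in pre.items() if k not in excluded}
    cur_f = {k: v for k, v in cur.items() if k not in excluded}
    pre_only = {k: pre_f[k] for k in sorted(pre_f.keys() - cur_f.keys())}
    cur_only = {k: cur_f[k] for k in sorted(cur_f.keys() - pre_f.keys())}
    return pre_only, cur_only


pre = {"PORT": {"Ethernet0": {}}, "ACL_TABLE": {"DATAACL": {}}}
cur = {"PORT": {"Ethernet0": {}}, "MUX_LINKMGR": {"example": {}}, "VLAN": {"Vlan1000": {}}}
pre_only, cur_only = config_only_keys(pre, cur, exclude=["MUX_LINKMGR"])
print(pre_only)  # {'ACL_TABLE': {'DATAACL': {}}}
print(cur_only)  # {'VLAN': {'Vlan1000': {}}}
```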
yxieca pushed a commit that referenced this pull request Dec 1, 2022
…nfig (#6918)

wangxin pushed a commit that referenced this pull request Jul 10, 2023
…ic. (#8884)

What is the motivation for this PR?
PR #6527 enhanced the function core_dump_and_config_check to be multi-asic aware. However, in the single-asic scenario it simply set the key to None, which does not make sense. This PR sets the key to "asic0" in the single-asic scenario, to be consistent with the keys used in the multi-asic scenario.

How did you do it?
Change the key in the single-asic scenario from None to asic0.
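A minimal sketch of the key choice described above, assuming ASIC namespaces are keyed as asic<N>. The function names are hypothetical, and the global/host context of a multi-asic DUT is deliberately left out because the commit message does not describe how it is keyed.

```python
from typing import Dict


def snapshot_key(is_multi_asic: bool, asic_index: int = 0) -> str:
    """Context key under which one ASIC's config snapshot is stored."""
    # A single-asic DUT has exactly one config; filing it under "asic0"
    # (rather than None) gives both DUT types the same key format.
    if not is_multi_asic:
        return "asic0"
    return "asic{}".format(asic_index)


def index_snapshots(is_multi_asic: bool, configs: Dict[int, dict]) -> Dict[str, dict]:
    """Key per-ASIC configs, given as {asic_index: config}, by context name."""
    return {snapshot_key(is_multi_asic, idx): cfg for idx, cfg in configs.items()}


print(index_snapshots(False, {0: {"PORT": {}}}))  # {'asic0': {'PORT': {}}}
print(list(index_snapshots(True, {0: {}, 1: {}})))  # ['asic0', 'asic1']
```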
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jul 10, 2023
…ic. (sonic-net#8884)

mssonicbld pushed a commit that referenced this pull request Jul 10, 2023
…ic. (#8884)

bingwang-ms pushed a commit to bingwang-ms/sonic-mgmt that referenced this pull request Jul 27, 2023
…ic-mgmt into internal-202205

Fix merge conflicts.

- Fix verify_no_packet_any call in fib_test (sonic-net#6461)
- Fix the test case test_TSA failure when check the routes on the eos host (sonic-net#6483)
- Use conditional mark to skip testcase instead of required_mocked_dualtor (sonic-net#6766)
- [tagged_arp] fix issue 'fixture ports_list not found' (sonic-net#6773)
- [QoS] fixes after moving to python3 (sonic-net#6786)
- update parse function for image url (sonic-net#6848)
- Fix typo in get_queue_counter (sonic-net#6852)
- Revert "Fix loganalyzer.py UnicodeDecodeError (sonic-net#6524)" (sonic-net#6858)
- Enhancing core_dump_and_config_check to be multi-asic aware (sonic-net#6527)
- Adding support for calculating balancing in multi-lc/multi-asic case (Test_fib.py) (sonic-net#6391)
- Support different RC in case of pre or post sanity check failed (sonic-net#6860)
- Update getbuild.py to support pass an empty access_token
- [202205] Fixing auto_techsupport (sonic-net#6882)
- Merge branch 'azure-202205' into dev/yaqiangzhu/202205_manually_merge
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Jan 25, 2024
…ic. (sonic-net#8884)


4 participants