Skip to content

Add a check for ensuring mirror session ACLs are programmed to ASIC#3333

Merged
vaibhavhd merged 5 commits intosonic-net:masterfrom
ryanzhu706:master
Jun 3, 2024
Merged

Add a check for ensuring mirror session ACLs are programmed to ASIC#3333
vaibhavhd merged 5 commits intosonic-net:masterfrom
ryanzhu706:master

Conversation

@ryanzhu706
Copy link
Copy Markdown
Contributor

@ryanzhu706 ryanzhu706 commented May 21, 2024

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested:

### arp/test_wr_arp.py result:
=========================================================================== warnings summary ===========================================================================
../../../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
  /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
    "class": algorithms.Blowfish,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------------------------------------------- generated xml file: /var/src/sonic-mgmt-int/tests/logs/arp/test_wr_arp.xml ----------------------------------------------
------------------------------------------------------------------------ live log sessionfinish ------------------------------------------------------------------------
19:08:59 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
=============================================================== 2 passed, 1 warning in 858.02s (0:14:18) ===============================================================
INFO:root:Can not get Allure report URL. Please check logs

### manually warm-reboot result:
Fri May 31 08:38:28 PM UTC 2024 Starting warm-reboot
Fri May 31 08:38:31 PM UTC 2024 Saving counters folder before warmboot...
Fri May 31 08:38:42 PM UTC 2024 Loading kernel without secure boot
Fri May 31 08:38:44 PM UTC 2024 Setting up control plane assistant: 10.1.2.3 ...
Fri May 31 08:38:48 PM UTC 2024 Checking if mirror session ACLs (arp, nd) programmed to ASIC successfully
Fri May 31 08:38:48 PM UTC 2024 Mirror session ACLs (arp, nd) programmed to ASIC successfully

ERROR: There are port channels/peer devices that failed the probe: ['PortChannel102', 'PortChannel101', 'PortChannel103', 'PortChannel104']
Fri May 31 08:38:53 PM UTC 2024 Warning: Retry count feature support unknown for one or more neighbor devices; assuming that it's not available
Fri May 31 08:38:53 PM UTC 2024 Pausing orchagent ...

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented May 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@vaibhavhd vaibhavhd marked this pull request as ready for review May 30, 2024 00:00
@ryanzhu706 ryanzhu706 requested review from saiarcot895 and yxieca May 30, 2024 18:06
vaibhavhd
vaibhavhd previously approved these changes May 31, 2024
Copy link
Copy Markdown
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the change and update the description before PR merge

@saiarcot895
Copy link
Copy Markdown
Contributor

You'll also need to fix up the email addresses in the commits (or squash the commits) for EasyCLA to pass.

@vaibhavhd vaibhavhd merged commit 676ebe4 into sonic-net:master Jun 3, 2024
arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…onic-net#3333)

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested on physical testbed running 202311 and master
@ayurkiv-nvda
Copy link
Copy Markdown
Contributor

ayurkiv-nvda commented Sep 17, 2024

Hello @vaibhavhd
We found that in while loop we will fail after 1 iteration if ACL entries were not found in redis
I suggest continuing retry cycle.
Please take a look at suggested solution:

#3548

Thanks

nmoray pushed a commit to nmoray/sonic-utilities that referenced this pull request Jun 25, 2025
…onic-net#3333)

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested on physical testbed running 202311 and master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants