Skip to content

Fix Issue with Checking for Active ACL Rules by aclshow -a#21947

Merged
bingwang-ms merged 66 commits intosonic-net:masterfrom
AndoniSanguesa:dev/asanguesa-fix-broken-everflow-v6-test
Jan 22, 2026
Merged

Fix Issue with Checking for Active ACL Rules by aclshow -a#21947
bingwang-ms merged 66 commits intosonic-net:masterfrom
AndoniSanguesa:dev/asanguesa-fix-broken-everflow-v6-test

Conversation

@AndoniSanguesa
Copy link
Copy Markdown
Contributor

Description of PR

Some platforms appear to treat everflow ACLs differently from typical dataplane ACLs, in that counters in showacl -a will always appear as N/A before and after traffic even if the rules are correctly formed. The previous version of the IPv6 Everflow test used the results of showacl -a to determine whether ACLs were configured and ready, but in the case of some Arista devices, the N/A behaviour caused the test to fail at that check. This PR simply changes the logic to check that all rules are 'active' in `show acl rule'.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Approach

What is the motivation for this PR?

We saw unexpected test failures after the deployment of the previous test

How did you do it?

Updated the failing check to use a source of truth that was reliable across platforms

How did you verify/test it?

Ran it against the known working platforms (sn4600) and verified it against platforms known to have the N/A issue (Arista 7060x6 and 7050cx3).

Documentation

acl_counters_fix_7060x6.txt
acl_counters_fix_sn2700.txt

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines failed to run 1 pipeline(s).

@AndoniSanguesa AndoniSanguesa force-pushed the dev/asanguesa-fix-broken-everflow-v6-test branch from 44ef981 to 2168b67 Compare January 15, 2026 21:53
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

"""
res = duthost.shell("aclshow -a")['stdout_lines']
if len(res) <= 2 or [line for line in res if 'N/A' in line]:
res = duthost.shell("show acl rule")['stdout_lines']
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI show acl rule returns all ACL rules, meaning it may not get the status of the rule we are trying to check.
Maybe we can read the status of rule from state_db directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This verifies that all rules are in active state. If any become inactive due to loading a new ACL rule that should be something we flag.

Even before the check verified that all counters were not N/A. But if we want to alter the workflow to only check the rules we added, we can definitely do that

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense.
But we also need to exclude control plane ACL. For control plane ACL, the status is always N/A.
The suggestion is to only check ACL rule status from one specific table with CLI show acl rule TABLE_NAME

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this check to take the table as an input. It now only checks rules for the table provided

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

bingwang-ms
bingwang-ms previously approved these changes Jan 21, 2026
@bingwang-ms bingwang-ms enabled auto-merge (squash) January 21, 2026 23:33
auto-merge was automatically disabled January 21, 2026 23:35

Head branch was pushed to by a user without write access

@AndoniSanguesa AndoniSanguesa force-pushed the dev/asanguesa-fix-broken-everflow-v6-test branch from c251d4d to 8b51228 Compare January 21, 2026 23:35
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@github-actions github-actions bot requested review from sdszhang and wangxin January 21, 2026 23:35
PriyanshTratiya and others added 7 commits January 21, 2026 15:36
This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Andoni Sanguesa <[email protected]>
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <[email protected]>
Signed-off-by: Andoni Sanguesa <[email protected]>
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms bingwang-ms merged commit d054282 into sonic-net:master Jan 22, 2026
18 checks passed
@bingwang-ms bingwang-ms added Request for msft-202412 Branch Request for 202511 branch Request to backport a change to 202511 branch labels Jan 22, 2026
justin-oliver pushed a commit to justin-oliver/sonic-mgmt that referenced this pull request Jan 26, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
@r12f
Copy link
Copy Markdown
Collaborator

r12f commented Feb 1, 2026

hi @AndoniSanguesa , do you mind to help bring this change to 202412?

ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: Yael Tzur <[email protected]>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 3, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202511: #22226

vmittal-msft pushed a commit that referenced this pull request Feb 3, 2026
…#22226)

* Feature/route programming data (#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR #21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Co-authored-by: Andoni Sanguesa <[email protected]>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Feb 6, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Anirudh-nokia pushed a commit to Anirudh-nokia/sonic-mgmt-fork that referenced this pull request Feb 6, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: ayya <[email protected]>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
…et#21947) (sonic-net#22226)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Co-authored-by: Andoni Sanguesa <[email protected]>
Signed-off-by: Lakshmi Yarramaneni <[email protected]>
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Feb 12, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: nnelluri-cisco <[email protected]>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 13, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: Raghavendran Ramanathan <[email protected]>
anilal-amd pushed a commit to anilal-amd/anilal-forked-sonic-mgmt that referenced this pull request Feb 19, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: Zhuohui Tan <[email protected]>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: Abhishek <[email protected]>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
…et#21947)

* Feature/route programming data (sonic-net#21523)

This PR adds test cases and supporting utilities to measure route programming time under high‑scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and verifying that the measured RP times stay within expected limits and similar to the convergence time.

Signed-off-by: Andoni Sanguesa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.