Fix Issue with Checking for Active ACL Rules by aclshow -a (#21947)
Conversation
/azp run

Azure Pipelines failed to run 1 pipeline(s).

Force-pushed 44ef981 to 2168b67.

/azp run

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
```python
res = duthost.shell("aclshow -a")['stdout_lines']
if len(res) <= 2 or [line for line in res if 'N/A' in line]:
    res = duthost.shell("show acl rule")['stdout_lines']
```
The CLI `show acl rule` returns all ACL rules, so it may not reflect the status of the specific rule we are trying to check.
Maybe we can read the rule status from STATE_DB directly?
This verifies that all rules are in the active state. If any become inactive due to loading a new ACL rule, that is something we should flag.
Even before this change, the check verified that all counters were not N/A. But if we want to alter the workflow to only check the rules we added, we can definitely do that.
I see. That makes sense.
But we also need to exclude control-plane ACLs; for control-plane ACLs, the status is always N/A.
The suggestion is to only check ACL rule status for one specific table, with the CLI `show acl rule TABLE_NAME`.
Updated this check to take the table as an input. It now only checks rules for the provided table.
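The revised per-table check could look roughly like the sketch below. This is a hypothetical reconstruction, not the PR's actual code: the helper name `all_rules_active`, the sample output, and the assumption that `show acl rule <table>` prints two header lines before the rule rows are all illustrative.

```python
def all_rules_active(stdout_lines):
    """Return True iff every rule row in `show acl rule <table>` output
    reports Active status. Assumes two header lines precede the rules
    and that the status is the last column."""
    rules = [line for line in stdout_lines[2:] if line.strip()]
    return bool(rules) and all(line.split()[-1] == 'Active' for line in rules)

# Illustrative output shape (column layout is an assumption):
sample = [
    "Table     Rule    Priority  Action  Match             Status",
    "--------  ------  --------  ------  ----------------  ------",
    "EVERFLOW  RULE_1  9999      MIRROR  SRC_IP: 10.0.0.1  Active",
    "EVERFLOW  RULE_2  9998      MIRROR  DST_IP: 10.0.0.2  Active",
]
print(all_rules_active(sample))  # True
```

Comparing the last whitespace-separated token to the exact string `Active` (rather than a substring test) avoids a false positive on a hypothetical `Inactive` status, and an empty table is treated as a failure.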
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Head branch was pushed to by a user without write access.

Force-pushed c251d4d to 8b51228.

/azp run

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
This PR adds test cases and supporting utilities to measure route programming time under high-scale BGP IPv6 scenarios, building on the refactoring in PR sonic-net#21335. It focuses on quantifying how long it takes for routes to be fully programmed after BGP/connection events (e.g., convergence, admin flaps), and on verifying that the measured route-programming times stay within expected limits and close to the convergence time.

Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Andoni Sanguesa <[email protected]>
Signed-off-by: vikumarks <[email protected]>
Summary: Health check sometimes loads the wrong inventory admin/password.

Fixes # (issue) 36307349

From investigating, I can see that this issue happens intermittently. Diving deeper, it is heavily dependent on how Ansible processes and uses memory internally. It only happens when there are two fanout hosts, one using SONiC and one using non-SONiC.

In a happy scenario, comparing the fanouthost.vm.extra_vars of the two fanouts, we can see that they have different memory addresses:
memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE
2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}
memory id 140619740737472 host YYYY <----- DIFFERENT ID HERE
2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if Ansible decides to re-use the memory address when initialising its VariableManager, the issue occurs:
memory id 139728659566400 host XXXX <---- SAME ID HERE
2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
memory id 139728659566400 host YYYY <---- SAME ID HERE
2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}
Since we're overwriting ansible_ssh_user and ansible_ssh_password in the extra_vars:

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})

if the two extra_vars objects are the same dict, this overwrites ansible_ssh_user and ansible_ssh_password for the other fanout as well. Everything in extra_vars takes top priority over inventory-defined variables, so the wrong username and password end up being used.
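The failure mode described above is ordinary Python dict aliasing. A minimal sketch of the pitfall, with hypothetical names (`FakeHost`, `shared`); this is not the actual testbed code:

```python
# If two hosts are handed the same extra_vars dict object, updating
# one host's credentials silently mutates the other host's as well.
class FakeHost:
    def __init__(self, extra_vars):
        self.extra_vars = extra_vars

shared = {"ansible_ssh_user": "inventory_user"}
sonic_fanout = FakeHost(shared)
eos_fanout = FakeHost(shared)   # same object, as when memory is re-used

sonic_fanout.extra_vars.update({"ansible_ssh_user": "sonic_user",
                                "ansible_ssh_password": "sonic_pass"})

# The non-SONiC fanout now sees the SONiC credentials too,
# because both names alias the one dict.
print(eos_fanout.extra_vars["ansible_ssh_user"])  # sonic_user
```

A defensive fix is to give each host an independent copy before overwriting anything, e.g. `fanouthost.vm.extra_vars = dict(fanouthost.vm.extra_vars)` (or `copy.deepcopy` if the values are nested).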
Signed-off-by: Austin Pham <[email protected]>
Signed-off-by: Andoni Sanguesa <[email protected]>
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
hi @AndoniSanguesa, do you mind helping to bring this change to 202412?
Cherry-pick PR to 202511: #22226
Description of PR
Some platforms appear to treat everflow ACLs differently from typical dataplane ACLs, in that counters in `aclshow -a` will always appear as N/A before and after traffic, even if the rules are correctly formed. The previous version of the IPv6 Everflow test used the results of `aclshow -a` to determine whether ACLs were configured and ready, but on some Arista devices the N/A behaviour caused the test to fail at that check. This PR simply changes the logic to check that all rules are 'active' in `show acl rule`.

Type of change
Approach
What is the motivation for this PR?
We saw unexpected test failures after the previous version of the test was deployed.
How did you do it?
Updated the failing check to use a source of truth that is reliable across platforms.
How did you verify/test it?
Ran it against a known working platform (sn4600) and verified it against platforms known to have the N/A issue (Arista 7060x6 and 7050cx3).
Documentation
acl_counters_fix_7060x6.txt
acl_counters_fix_sn2700.txt