
[Scale|CRM] Optimize test runtime with adaptive polling #22132

Merged
yxieca merged 1 commit into sonic-net:master from AntonHryshchuk:crm_runtime_adapt
Feb 7, 2026

Conversation

@AntonHryshchuk
Contributor

Description of PR

Summary:
Replace fixed sleeps with polling and reduce wait times:

  • Add polling helpers: wait_for_crm_counter_update(), wait_for_resource_stabilization()
  • Replace 50s resource waits with adaptive polling
  • Reduce config waits from 10s to 5s (CONFIG_UPDATE_TIME)
  • Reduce cleanup wait from 50s to 20s (SONIC_RES_CLEANUP_UPDATE_TIME)
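
The diff itself is not part of this conversation, so the bodies of the two helpers are not shown here. As a minimal sketch of the idea only, the block below replaces a fixed sleep with a bounded poll. The names `poll_until` and `wait_for_stable`, their parameters and defaults, and the caller-supplied `read_counter` callable are all hypothetical; they are not the signatures of `wait_for_crm_counter_update()` or `wait_for_resource_stabilization()`.

```python
import time


def poll_until(read_counter, predicate, timeout=50.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll read_counter() until predicate(value) holds or timeout expires.

    Hypothetical helper for illustration, not the PR's code.
    Returns (ok, last_value) so the caller can log what was last seen.
    """
    deadline = clock() + timeout
    while True:
        value = read_counter()
        if predicate(value):
            return True, value       # condition met, stop waiting early
        if clock() >= deadline:
            return False, value      # gave up; caller decides how to fail
        sleep(interval)


def wait_for_stable(read_counter, stable_reads=3, timeout=50.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Return True once read_counter() gives the same value stable_reads times in a row."""
    deadline = clock() + timeout
    last, streak = None, 0
    while True:
        value = read_counter()
        # Extend the streak on a repeated value, otherwise start a new one.
        streak = streak + 1 if value == last else 1
        last = value
        if streak >= stable_reads:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape the old fixed wait becomes the upper bound (`timeout`) rather than the cost of every run: a call such as `poll_until(read_counter, lambda v: v >= expected)` returns as soon as the counter reaches the expected value and only spends the full timeout when the update never arrives.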

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

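Runtime improvement: the CRM tests should wait only as long as the counters take to update instead of sleeping for a fixed interval.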

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AntonHryshchuk
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AharonMalkin requested a review from r12f January 29, 2026
@r12f added the "Request for 202511 branch" label Jan 29, 2026
@yutongzhang-microsoft
Contributor

Hi @AntonHryshchuk, did you test this change on physical testbeds?

@yxieca merged commit 19e4eb7 into sonic-net:master Feb 7, 2026
19 checks passed
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Feb 12, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 12, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #22384

mssonicbld pushed a commit that referenced this pull request Feb 12, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
anilal-amd pushed a commit to anilal-amd/anilal-forked-sonic-mgmt that referenced this pull request Feb 19, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
arista-setu added a commit to arista-setu/sonic-mgmt that referenced this pull request Mar 12, 2026
Issue:
PR sonic-net#22132 introduced a polling check after route deletion with an incorrect threshold: `crm_stats_route_used - total_routes`. This subtracts the number of routes the test added (`total_routes`) from the initial baseline `crm_stats_route_used`. Because the test deletes only the routes it previously added, the counter should return to the baseline, not drop below it. The bug is masked when total_routes = 1 (most platforms), but the check fails on broadcom-dnx devices, where total_routes = 64.

Fix:
Change the threshold to `crm_stats_route_used + CRM_COUNTER_TOLERANCE` so the polling correctly waits for the counter to return to approximately the initial baseline value, consistent with the final assertion.

Signed-off-by: setu <setu@arista.com>
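
To make the threshold arithmetic concrete, here is a small worked sketch for the total_routes = 64 case. The baseline of 1000 and the tolerance of 2 are made-up numbers (the real value of `CRM_COUNTER_TOLERANCE` is not stated in this thread), and the `used <= threshold` comparison is an assumption about how the polling condition is written. The sketch does not model why the problem stays hidden when total_routes = 1.

```python
CRM_COUNTER_TOLERANCE = 2  # assumed value, for illustration only


def thresholds(crm_stats_route_used, total_routes):
    """Return (buggy, fixed) thresholds for the check that runs after route deletion."""
    buggy = crm_stats_route_used - total_routes            # threshold from PR #22132
    fixed = crm_stats_route_used + CRM_COUNTER_TOLERANCE   # threshold from the follow-up fix
    return buggy, fixed


def reached(counter_after_delete, threshold):
    """Assumed polling condition: the used counter is at or below the threshold."""
    return counter_after_delete <= threshold


baseline = 1000          # made-up 'used' counter before the test adds any routes
total_routes = 64        # broadcom-dnx case described in the commit message
buggy, fixed = thresholds(baseline, total_routes)

# The test removes only what it added, so the counter settles back at the baseline.
after_delete = baseline
print("buggy threshold reached:", reached(after_delete, buggy))
print("fixed threshold reached:", reached(after_delete, fixed))
```

Under these assumptions the buggy threshold is 936, which a counter resting at 1000 never reaches, so the poll can only time out. The fixed threshold is 1002, which the counter satisfies as soon as it is back near the baseline.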
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 12, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: Ravali Yeluri (WIPRO LIMITED) <v-ryeluri@microsoft.com>
arista-setu added a commit to arista-setu/sonic-mgmt that referenced this pull request Mar 16, 2026
Signed-off-by: setu <setu@arista.com>
arlakshm pushed a commit that referenced this pull request Mar 17, 2026
Signed-off-by: setu <setu@arista.com>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 17, 2026
…-net#23004)

Signed-off-by: setu <setu@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
mssonicbld pushed a commit that referenced this pull request Mar 17, 2026
Signed-off-by: setu <setu@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
Signed-off-by: Abhishek <abhishek@nexthop.ai>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
…-net#23004)

Signed-off-by: setu <setu@arista.com>
Signed-off-by: Abhishek <abhishek@nexthop.ai>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 19, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
vrajeshe pushed a commit to vrajeshe/sonic-mgmt that referenced this pull request Mar 23, 2026
…-net#23004)

Signed-off-by: setu <setu@arista.com>
Signed-off-by: Venkata Gouri Rajesh Etla <vrajeshe@cisco.com>
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 27, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 27, 2026
…-net#23004)

Signed-off-by: setu <setu@arista.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
Signed-off-by: AntonHryshchuk <antonh@nvidia.com>

6 participants