Skip to content

Fix test issue for test_ipv6_bgp_scale.py#22676

Merged
roy-sror merged 1 commit intosonic-net:masterfrom
weiguo-nvidia:fix_bgp_test_issue
Mar 18, 2026
Merged

Fix test issue for test_ipv6_bgp_scale.py#22676
roy-sror merged 1 commit intosonic-net:masterfrom
weiguo-nvidia:fix_bgp_test_issue

Conversation

@weiguo-nvidia
Copy link
Copy Markdown
Contributor

Summary: Fix test issue for test_ipv6_bgp_scale.py
Fixes # (issue)

  1. Fix test issue in calculate_downtime() function. Please see the detail info as below

  1. Print actual downtime in the logs
  • Before change, when case fail, we do not know the actual downtime value, it is inconvenient
Failed: Dataplane downtime is too high, threshold is 2.0 seconds
  • After change, we can see the actual downtime value in the error message
Failed: Dataplane downtime is too high: actual 5.5822 seconds, threshold is 2.0 seconds

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

The calculate_downtime() function in test_ipv6_bgp_scale.py uses [:-1] on the mask_rx_cnt dict values to exclude the backplane port from the RX total:

rx_total = sum(list(ptf_dp.mask_rx_cnt[masked_exp_pkt].values())[:-1])  # Exclude the backplane

This is incorrect because [:-1] removes the last entry by dict insertion order, not the backplane port. The dict insertion order depends on which port first receives a matching packet, which is non-deterministic and varies between runs.

In topologies without a backplane (e.g., t1-isolated-*), mask_rx_cnt contains only legitimate data port entries and no backplane entry at all. The [:-1] unconditionally removes a valid egress port's counter, causing false packet loss to be reported.

For example, in a test_bgp_admin_flap[1] failure:

  • mask_rx_cnt had exactly 32 entries, all corresponding to BGP neighbor ports (no backplane)
  • [:-1] excluded port (0, 176) (Ethernet176 / ARISTA113T1) which received 1019 packets
  • This caused missing_pkt_cnt = 1019 and downtime = 0.4185s, exceeding the 0.2s threshold
  • In reality, all 31,990 sent packets were received (TX total == full RX sum), meaning zero actual packet loss

The bug also makes the test flaky — depending on which port happens to be last in the dict, the "missing" count changes, causing random pass/fail results.

How did you do it?

Replaced the fragile [:-1] approach with explicit backplane port identification using ptf.config['port_map'].

When PTF initializes, backplane ports are registered in ptf.config['port_map'] with interface name "backplane" (see ptfadapter/__init__.py get_ifaces_map()). The new helper _get_backplane_ports() queries this config to find backplane port keys, and calculate_downtime() excludes only those specific ports from the RX sum.

  • In topologies with backplane (e.g., ciscovs-7nodes): backplane ports are correctly identified and excluded
  • In topologies without backplane: _get_backplane_ports() returns an empty set, no ports are excluded

How did you verify/test it?

Run regression test, pass

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@nhe-NV nhe-NV added the Request for 202511 branch Request to backport a change to 202511 branch label Mar 1, 2026
PriyanshTratiya
PriyanshTratiya previously approved these changes Mar 2, 2026
Copy link
Copy Markdown
Contributor

@PriyanshTratiya PriyanshTratiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks Wei!

@weiguo-nvidia
Copy link
Copy Markdown
Contributor Author

/azpw run

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

roy-sror
roy-sror previously approved these changes Mar 8, 2026
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Change-Id: I776836c6d5806c8af5f6f50ce2401a01d1f8539a
Signed-off-by: Wei Guo <[email protected]>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Collaborator

@yxieca yxieca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI agent on behalf of Ying. Reviewed; no issues found.

@roy-sror roy-sror merged commit 332b276 into sonic-net:master Mar 18, 2026
17 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 18, 2026
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202511: #23070

duthost.facts['router_mac'],
pdp.get_mac(pdp.port_to_device(injection_port), injection_port),
icmp_type
global_icmp_type
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @weiguo-nvidia , any particular reason why we use the global_icmp_type here? It does not seem to change as the test progresses, whereas the icmp_type variable uses a cycle within a safe range and will distinguish packets across tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a conflict resolution/rebase regression, Will update this if no particular reason to use global_icmp_type

@weiguo-nvidia weiguo-nvidia deleted the fix_bgp_test_issue branch March 19, 2026 03:55
vrajeshe pushed a commit to vrajeshe/sonic-mgmt that referenced this pull request Mar 23, 2026
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants