
[action] [PR:22676] Fix test issue for test_ipv6_bgp_scale.py #23070

Merged
mssonicbld merged 1 commit into sonic-net:202511 from mssonicbld:cherry/202511/22676 on Mar 18, 2026


@mssonicbld
Collaborator

Summary: Fix test issue for test_ipv6_bgp_scale.py
Fixes # (issue)

  1. Fix a test issue in the calculate_downtime() function. See the details below.

  2. Report the actual downtime in the failure message
  • Before the change, a failing case did not report the actual downtime value, which made failures hard to diagnose:
Failed: Dataplane downtime is too high, threshold is 2.0 seconds
  • After the change, the actual downtime value is included in the error message:
Failed: Dataplane downtime is too high: actual 5.5822 seconds, threshold is 2.0 seconds
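A minimal sketch of a threshold check that produces the new message format. The helper name check_downtime, the strict `>` comparison, and the four-decimal formatting are assumptions chosen to reproduce the example message; the real check inside calculate_downtime() may be worded or structured differently.

```python
def check_downtime(downtime, threshold):
    """Fail if downtime exceeds the threshold, reporting both values.

    Hypothetical helper: only the message text mirrors the PR example.
    """
    if downtime > threshold:
        # Include the measured value so a failure is diagnosable from the log alone.
        raise AssertionError(
            "Dataplane downtime is too high: actual {:.4f} seconds, "
            "threshold is {} seconds".format(downtime, threshold)
        )
```

Calling check_downtime(5.5822, 2.0) raises an AssertionError whose text matches the "after" message above, while check_downtime(0.1, 2.0) returns silently.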

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

The calculate_downtime() function in test_ipv6_bgp_scale.py uses [:-1] on the mask_rx_cnt dict values to exclude the backplane port from the RX total:

rx_total = sum(list(ptf_dp.mask_rx_cnt[masked_exp_pkt].values())[:-1]) # Exclude the backplane

This is incorrect because [:-1] removes the last entry by dict insertion order, not the backplane port. The dict insertion order depends on which port first receives a matching packet, which is non-deterministic and varies between runs.
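The order dependence can be shown with plain dicts (Python 3.7+ preserves insertion order). In this illustrative sketch, port (0, 99) stands in for a backplane port and all counter values are made up. The two dicts hold identical data and compare equal, yet the old expression produces different totals.

```python
def rx_total_old(rx_counts):
    # Mirrors the original expression: drop whichever entry was inserted last.
    return sum(list(rx_counts.values())[:-1])

# Run A: the (hypothetical) backplane port (0, 99) happened to see a packet last.
run_a = {}
run_a[(0, 1)] = 500
run_a[(0, 2)] = 480
run_a[(0, 99)] = 3

# Run B: same counters, but the backplane port saw a packet first.
run_b = {}
run_b[(0, 99)] = 3
run_b[(0, 1)] = 500
run_b[(0, 2)] = 480

print(run_a == run_b)        # True: same ports, same counts
print(rx_total_old(run_a))   # 980: backplane excluded, as intended
print(rx_total_old(run_b))   # 503: data port (0, 2) dropped instead
```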

In topologies without a backplane (e.g., t1-isolated-*), mask_rx_cnt contains only legitimate data port entries and no backplane entry at all. The [:-1] unconditionally removes a valid egress port's counter, causing false packet loss to be reported.

For example, in a test_bgp_admin_flap[1] failure:

  • mask_rx_cnt had exactly 32 entries, all corresponding to BGP neighbor ports (no backplane)
  • [:-1] excluded port (0, 176) (Ethernet176 / ARISTA113T1) which received 1019 packets
  • This caused missing_pkt_cnt = 1019 and downtime = 0.4185s, exceeding the 0.2s threshold
  • In reality, all 31,990 sent packets were received (TX total == full RX sum), meaning zero actual packet loss
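The failure can be reproduced arithmetically. Only the 32-entry count, port (0, 176) with its 1019 packets, and the 31,990 total come from the failure above; the split across the other 31 ports is made up so that the counters sum to 31,990. With zero real loss, the "missing" count under [:-1] is exactly the counter of whichever port was inserted last.

```python
TX_TOTAL = 31990  # packets sent in the failing run

# Hypothetical per-port RX counters (made-up split, no backplane entry).
rx_counts = {}
for i in range(30):
    rx_counts[(0, i * 4)] = 999   # ports (0, 0) .. (0, 116)
rx_counts[(0, 120)] = 1001
rx_counts[(0, 176)] = 1019        # Ethernet176 / ARISTA113T1, inserted last

# Old logic: drop the last dict entry, assumed to be the backplane.
missing_old = TX_TOTAL - sum(list(rx_counts.values())[:-1])

# Summing every entry shows there was no actual loss.
missing_full = TX_TOTAL - sum(rx_counts.values())

print(len(rx_counts), missing_old, missing_full)  # 32 1019 0
```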

The bug also makes the test flaky: depending on which port happens to be last in the dict, the "missing" count changes, producing random pass/fail results.

How did you do it?

Replaced the fragile [:-1] approach with explicit backplane port identification using ptf.config['port_map'].

When PTF initializes, backplane ports are registered in ptf.config['port_map'] with interface name "backplane" (see ptfadapter/__init__.py get_ifaces_map()). The new helper _get_backplane_ports() queries this config to find backplane port keys, and calculate_downtime() excludes only those specific ports from the RX sum.

  • In topologies with backplane (e.g., ciscovs-7nodes): backplane ports are correctly identified and excluded
  • In topologies without backplane: _get_backplane_ports() returns an empty set, no ports are excluded

How did you verify/test it?

Ran the regression test; it passed.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator Author

Original PR: #22676

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 7e8cd92 into sonic-net:202511 Mar 18, 2026
16 checks passed