
BGP [T2] - Update neighbor admin down only for specific asic under test #14124

Merged
yejianquan merged 2 commits into sonic-net:master from sanjair-git:bgp-peer-shut
Aug 29, 2024

Conversation

@sanjair-git

Description of PR

Summary:
Fixes # (issue)

  • This PR fixes a failure in 'test_bgp_peer_shutdown.py' when the test runs on a T2 multi-asic DUT.
  • The whole test operates on one asic. However, during BGP session teardown, while removing the newly added neighbor config, the test runs the sonic-db-cli command on both asics of the DUT, which creates a dummy neighbor entry on the other asic.
  • As a result, the test fails with the following error:
                with capture_bgp_packages_to_file(duthost, "any", bgp_pcap, n0.namespace):
                    n0.teardown_session()
                    if not wait_until(
                        WAIT_TIMEOUT,
                        5,
                        20,
                        lambda: is_neighbor_session_down(duthost, n0),
                    ):
>                       pytest.fail("Could not tear down bgp session")
E                       Failed: Could not tear down bgp session
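
For readers unfamiliar with the polling call in the trace above, here is a minimal sketch of a wait_until-style helper. It assumes the three positional numbers are the overall timeout, the polling interval, and an initial delay, in that order. The name `poll_until` and the injectable `sleep` and `clock` parameters are hypothetical and exist only to make the sketch testable; this is not the sonic-mgmt implementation.

```python
import time


def poll_until(timeout, interval, delay, condition,
               sleep=time.sleep, clock=time.monotonic):
    """Return True once condition() is truthy, False after timeout seconds.

    Hypothetical stand-in for a wait_until-style helper. The argument
    order (timeout, interval, delay) is an assumption made for illustration.
    """
    sleep(delay)  # initial settle time before the first check
    deadline = clock() + timeout
    while clock() < deadline:
        if condition():
            return True
        sleep(interval)  # wait before polling again
    return False
```

In the failing run, the condition passed to the helper never becomes true within the timeout, so the call returns False and the test reaches pytest.fail.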

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

  • The test creates a dummy BGP neighbor entry on the other asic, where the test is not operating, and this causes the test to fail.

For example, the dummy entry 20.0.0.1 on the other asic is shown below.

 show ip bgp sum 

IPv4 Unicast Summary (VRF default):
BGP router identifier 8.0.0.2, local AS number 65100 vrf-id 0
BGP table version 4468
RIB entries 3245, using 710 KiB of memory
Peers 6, using 4348 KiB of memory
Peer groups 4, using 256 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
10.0.0.7        4      65200     29995     30093        0    0    0 1d00h57m         1057     1611 ARISTA04T3
10.0.0.11       4      65200     29994     30094        0    0    0 1d00h57m         1058     1611 ARISTA06T3
20.0.0.1        4          0         0         0        0    0    0    never      Connect        0 N/A
3.3.3.0         4      65100     29964     29991        0    0    0 1d00h50m         2057     2119 ASIC0
3.3.3.42        4      65100     29882     29988        0    0    0 1d00h50m          133     2119 ixre-egl-board182-AS
3.3.3.44        4      65100     29912     29990        0    0    0 1d00h50m          134     2119 ixre-egl-board182-AS
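
A stray entry like 20.0.0.1 can also be spotted programmatically. The sketch below parses the neighbor table from the summary above and reports rows whose State/PfxRcd column holds a state name instead of a prefix count. The function name `find_unestablished_neighbors` is hypothetical, and the column positions are assumed from the output shown here.

```python
SAMPLE_SUMMARY = """\
Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
10.0.0.7        4      65200     29995     30093        0    0    0 1d00h57m         1057     1611 ARISTA04T3
10.0.0.11       4      65200     29994     30094        0    0    0 1d00h57m         1058     1611 ARISTA06T3
20.0.0.1        4          0         0         0        0    0    0    never      Connect        0 N/A
3.3.3.0         4      65100     29964     29991        0    0    0 1d00h50m         2057     2119 ASIC0
3.3.3.42        4      65100     29882     29988        0    0    0 1d00h50m          133     2119 ixre-egl-board182-AS
3.3.3.44        4      65100     29912     29990        0    0    0 1d00h50m          134     2119 ixre-egl-board182-AS
"""


def find_unestablished_neighbors(summary_text):
    """Return (neighbor, state) pairs for sessions that are not established.

    An established session shows a numeric prefix count in the
    State/PfxRcd column (index 9); anything else is a state name.
    """
    results = []
    in_table = False
    for line in summary_text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the neighbor table
            continue
        if fields[0] == "Neighbor":
            in_table = True  # header row; data rows follow
            continue
        if in_table and len(fields) >= 10 and not fields[9].isdigit():
            results.append((fields[0], fields[9]))
    return results


print(find_unestablished_neighbors(SAMPLE_SUMMARY))
```

Run against the sample, only the 20.0.0.1 row is reported, with state Connect.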

How did you do it?

  • While removing the new BGP neighbor config, make sure it is removed only on the specific asic under test.
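
The idea can be sketched as follows. The exact command the test issues is not shown in this conversation, so the HSET form, the `BGP_NEIGHBOR|<ip>` key, the `-n <namespace>` option of sonic-db-cli, and the namespace names asic0 and asic1 are all assumptions made for illustration. The helper `build_admin_down_commands` is hypothetical. One relevant fact: a Redis HSET on a key that does not exist creates that key, which would explain how a neighbor entry can appear on an asic that never had one.

```python
def build_admin_down_commands(asic_namespaces, neighbor_ip, target_namespace=None):
    """Build sonic-db-cli commands that set admin_status to down for a neighbor.

    With target_namespace=None a command is built for every asic, which
    mirrors the problematic behavior. With a target namespace, only the
    asic under test is touched. The command format is an assumption.
    """
    commands = []
    for namespace in asic_namespaces:
        if target_namespace is not None and namespace != target_namespace:
            continue  # skip asics the test is not operating on
        commands.append(
            "sonic-db-cli -n {} CONFIG_DB HSET 'BGP_NEIGHBOR|{}' admin_status down".format(
                namespace, neighbor_ip
            )
        )
    return commands


# Unscoped: one command per asic, including the asic without the neighbor.
print(build_admin_down_commands(["asic0", "asic1"], "20.0.0.1"))
# Scoped: only the asic under test.
print(build_admin_down_commands(["asic0", "asic1"], "20.0.0.1", target_namespace="asic0"))
```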

How did you verify/test it?

  • Ran the above-mentioned test case on a T2 multi-asic chassis and confirmed that it passed with the expected behavior.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation


@rlhui rlhui requested a review from cyw233 August 21, 2024 17:50
@cyw233

cyw233 commented Aug 22, 2024

Great catch @sanjair-git! I'm wondering why I didn't see this behavior on Cisco 8000. Maybe it's because different platforms have different mechanisms when updating bgp neighbor status? May I ask what platforms you are using please?


Can we move the above logging.debug(...) under this if statement as well, please? Maybe add the asic namespace to this log too, e.g. "update CONFIG_DB admin_status to down on {namespace}". Thanks!


Hi @cyw233, updated the logging as suggested. Please take a look.

@sanjair-git


Hi @cyw233, Thanks for reviewing the code. We are using 'Nokia-IXR7250E-36x400G' T2 Chassis.

@cyw233

cyw233 commented Aug 28, 2024


Hi @sanjair-git, thanks for the info! PR looks good, will approve once the PR checks succeed


@yejianquan yejianquan left a comment


LGTM

@yejianquan yejianquan merged commit 8f2749f into sonic-net:master Aug 29, 2024
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Aug 29, 2024
…st (sonic-net#14124)

@mssonicbld

Cherry-pick PR to 202405: #14315

mssonicbld pushed a commit that referenced this pull request Sep 2, 2024
…st (#14124)

hdwhdw pushed a commit to hdwhdw/sonic-mgmt that referenced this pull request Sep 20, 2024
…st (sonic-net#14124)

arista-hpandya pushed a commit to arista-hpandya/sonic-mgmt that referenced this pull request Oct 2, 2024
…st (sonic-net#14124)

vikshaw-Nokia pushed a commit to vikshaw-Nokia/sonic-mgmt that referenced this pull request Oct 23, 2024
…st (sonic-net#14124)
