[libteam][warm-reboot] fix issue in teamd warm-reboot that teamd starts #45

Closed
stepanblyschak wants to merge 4 commits into master from fix-teamd-warm-reboot-bug

Conversation

stepanblyschak commented Jul 16, 2021

with state of tdport from previous warm-reboot.

If the LAG was down before the reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we increment nr_of_tdports and add
the tdport to lacp->wr.state. If lacp->wr.state already contains this
tdport, we do not update its state but append a new item to
lacp->wr.state. As a result, if a warm-reboot is performed while a
PortChannel member is down and the member comes up after the reboot,
the next warm-reboot will initialize teamd with that member in the down state.

Example of the PortChannel0002 dump file, with the single member Ethernet24,
when this issue is reproduced:

admin@sonic:~$ sudo cat /host/warmboot/teamd/PortChannel0002
1
4
Ethernet24
0
Ethernet24
1
Ethernet24
1
Ethernet24
1
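
To make the duplication in the dump above easier to see, here is a minimal C sketch that parses such a dump and counts how many entries share a port name. The layout it assumes (a leading integer, then an entry count, then name/state pairs) is inferred from the example only and is not confirmed by the libteam patch. The names `dump_entry`, `parse_dump`, and `count_entries_for` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DUMP_NAME_LEN 32

/* One name/state pair from the dump (hypothetical type, not from libteam). */
struct dump_entry {
    char name[DUMP_NAME_LEN];
    int enabled;
};

/*
 * Parse a dump held in memory.
 * Assumed layout: <flag> <count> then <count> pairs of <name> <0|1>.
 * Returns the number of entries read, or -1 if the text does not match.
 */
static int parse_dump(const char *text, int *flag, struct dump_entry *out, int max)
{
    int count = 0, used = 0;

    assert(text != NULL && flag != NULL && out != NULL);
    if (sscanf(text, "%d %d%n", flag, &count, &used) != 2)
        return -1;
    if (count < 0 || count > max)
        return -1;
    text += used;
    for (int i = 0; i < count; i++) {
        /* %31s leaves room for the terminating NUL in name[32]. */
        if (sscanf(text, "%31s %d%n", out[i].name, &out[i].enabled, &used) != 2)
            return -1;
        text += used;
    }
    return count;
}

/* Count how many parsed entries refer to the given port name. */
static int count_entries_for(const struct dump_entry *entries, int n, const char *name)
{
    int hits = 0;

    for (int i = 0; i < n; i++)
        if (strcmp(entries[i].name, name) == 0)
            hits++;
    return hits;
}
```

With the dump shown above as input, a port that appears more than once in the result indicates the accumulated state described in this PR.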

Fix this issue by calling stop_wr_mode() when the LAG was down. This was probably intended but missed.
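
The effect of clearing the saved state can be sketched as follows. This is not the body of stop_wr_mode(), which the PR page does not show. `wr_state`, `wr_flush_port`, and `wr_reset` are hypothetical stand-ins that only model the append-only behaviour described above and a reset of the restored state.

```c
#include <assert.h>
#include <string.h>

#define WR_MAX_PORTS 16
#define WR_NAME_LEN 32

/* Hypothetical model of the saved warm-reboot state (lacp->wr in the text). */
struct wr_port_state {
    char name[WR_NAME_LEN];
    int enabled;
};

struct wr_state {
    int nr_of_tdports;
    struct wr_port_state state[WR_MAX_PORTS];
};

/* Append-only flush, modelling the behaviour the description reports. */
static int wr_flush_port(struct wr_state *wr, const char *name, int enabled)
{
    assert(wr != NULL && name != NULL);
    if (wr->nr_of_tdports >= WR_MAX_PORTS)
        return -1;
    struct wr_port_state *slot = &wr->state[wr->nr_of_tdports++];
    strncpy(slot->name, name, WR_NAME_LEN - 1);
    slot->name[WR_NAME_LEN - 1] = '\0';
    slot->enabled = enabled;
    return 0;
}

/* Drop everything restored from the previous warm-reboot (assumed effect). */
static void wr_reset(struct wr_state *wr)
{
    assert(wr != NULL);
    memset(wr, 0, sizeof(*wr));
}
```

In this model, flushing the same port twice without a reset leaves two entries, while a reset between the two flushes leaves only the latest one.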

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

Why I did it

To fix an issue seen in warm-reboot-sad test cases.

How I did it

I fixed it in the SONiC libteam patch that adds warm-reboot support. Details are in the commit description.

How to verify it

Run the warm-reboot-sad test on the t0-56 topology.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

with state of tdport from previous warm-reboot.

In lacp_event_watch_port_flush_data we increment nr_of_tdports and add
the tdport to lacp->wr.state. If lacp->wr.state already contains this
tdport, we do not update its state but append a new item to
lacp->wr.state. As a result, if a warm-reboot is performed while a
PortChannel member is down and the member comes up after the reboot,
the next warm-reboot will initialize teamd with that member in the down state.

Example of the PortChannel0002 dump file, with the single member Ethernet24,
when this issue is reproduced:

```
admin@sonic:~$ sudo cat /host/warmboot/teamd/PortChannel0002
0
4
Ethernet24
0
Ethernet24
1
Ethernet24
1
Ethernet24
1
```

Fix this issue by searching for the existing tdport in lacp->wr.state and
setting its enabled flag, or appending a new entry if the tdport is not found.
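
A minimal sketch of this find-or-append approach, assuming a fixed-size table keyed by port name. The types and the helpers `wr_find_port` and `wr_set_port_enabled` are hypothetical and do not reproduce the actual libteam patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WR_MAX_PORTS 16
#define WR_NAME_LEN 32

/* Hypothetical model of lacp->wr; field names follow the commit text. */
struct wr_port_state {
    char name[WR_NAME_LEN];
    int enabled;
};

struct wr_state {
    int nr_of_tdports;
    struct wr_port_state state[WR_MAX_PORTS];
};

/* Return the saved entry for this port name, or NULL if there is none. */
static struct wr_port_state *wr_find_port(struct wr_state *wr, const char *name)
{
    assert(wr != NULL && name != NULL);
    for (int i = 0; i < wr->nr_of_tdports; i++)
        if (strcmp(wr->state[i].name, name) == 0)
            return &wr->state[i];
    return NULL;
}

/* Update the existing entry, or append one only when the port is new. */
static int wr_set_port_enabled(struct wr_state *wr, const char *name, int enabled)
{
    struct wr_port_state *entry = wr_find_port(wr, name);

    if (entry == NULL) {
        if (wr->nr_of_tdports >= WR_MAX_PORTS)
            return -1; /* table full */
        entry = &wr->state[wr->nr_of_tdports++];
        strncpy(entry->name, name, WR_NAME_LEN - 1);
        entry->name[WR_NAME_LEN - 1] = '\0';
    }
    entry->enabled = enabled;
    return 0;
}
```

Saving Ethernet24 first as down and later as up leaves a single entry whose enabled flag is 1, so nr_of_tdports no longer grows across warm-reboots in this model.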

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
stepanblyschak deleted the fix-teamd-warm-reboot-bug branch September 23, 2022 13:33
stepanblyschak pushed a commit that referenced this pull request Oct 20, 2022
#### Why I did it

Submodule update for sonic-dbsyncd with following change:
```
0d67faf 2022-07-28 | Replace pyswsssdk with sonic-py-common (#45) [Hua Liu]
265c833 2022-01-11 | Updated the Azure pipeline for Code Coverage (#44) [abdosi]
6548116 2021-04-04 | [ci]: add proper azp [Guohan Lu]
43b9dab 2021-04-04 | [pytest]: add pytest.ini [Guohan Lu]
```

#### How I did it

#### How to verify it

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106

#### Description for the changelog
Submodule update for sonic-dbsyncd with following change:
```
0d67faf 2022-07-28 | Replace pyswsssdk with sonic-py-common (#45) [Hua Liu]
265c833 2022-01-11 | Updated the Azure pipeline for Code Coverage (#44) [abdosi]
6548116 2021-04-04 | [ci]: add proper azp [Guohan Lu]
43b9dab 2021-04-04 | [pytest]: add pytest.ini [Guohan Lu]
```

#### A picture of a cute animal (not mandatory but encouraged)

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
stepanblyschak pushed a commit that referenced this pull request Oct 20, 2022
… URL support "not to use cac (sonic-net#12394)

he" (#45)
* 4f45e3a Update gnmi_cli (#5) (#44)
stepanblyschak pushed a commit that referenced this pull request Dec 21, 2023
5ae186f Yaqiang Zhu Tue Dec 19 12:05:15 2023 -0500 [counter] Clear counter table when init (#45)
stepanblyschak pushed a commit that referenced this pull request Jan 4, 2024
…ly (sonic-net#17572)

#### Why I did it
src/dhcprelay
```
* 5ae186f - (HEAD -> master, origin/master, origin/HEAD) [counter] Clear counter table when init (#45) (10 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request Jul 2, 2024
sonic-dhcp-relay
5ae186f Yaqiang Zhu Tue Dec 19 12:05:15 2023 -0500 [counter] Clear counter table when init (#45)
40c6877 Jing Zhang Fri Nov 10 12:41:23 2023 -0800 [CodeQL] fix unmet dependency for build-swss-common (#44)

sonic-dhcpmon
7c55e50 StormLiangMS Thu Sep 14 09:57:06 2023 +0800 Merge pull request #13 from jcaiMR/dev/jcai_master_interface_counter
085a087 jcaiMR Mon Sep 11 09:17:03 2023 +0000 refine counting logic
stepanblyschak pushed a commit that referenced this pull request May 20, 2025
… automatically (sonic-net#760)

#### Why I did it
src/sonic-platform-common
```
* 047e12b - (HEAD -> 202412, origin/202412) [code sync] Merge code from sonic-net/sonic-platform-common:202411 to 202412 (#45) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request May 20, 2025
…D automatically (sonic-net#1016)

#### Why I did it
src/sonic-sairedis
```
* 86d1413 - (HEAD -> 202412, origin/HEAD, origin/202412) Merge pull request #45 from r12f/code-sync-202412 (31 minutes ago) [Riff]
* 0fcc968 - Merge remote-tracking branch 'base/202411' into code-sync-202412 (13 hours ago) [r12f]
* 4048483 - Revert "Optimize counter polling interval by making it more accurate (sonic-net#1457) …" (sonic-net#1570) (2 weeks ago) [Kumaresh Perumal]
* 420d92f - Update build_and_install_module.sh to match newer Linux kernel version (sonic-net#1561) (4 weeks ago) [mssonicbld]
* e2d2ca6 - [vslib] SAI_KEY_VS_OPER_SPEED_IS_CONFIGURED_SPEED, SAI_PORT_ATTR_HOST_TX_READY_STATUS support (sonic-net#1553) (5 weeks ago) [mssonicbld]
* 8c17d4b - Revert "Do not enter vendor SAI critical section for counter polling/clearing operations (sonic-net#1450)" (sonic-net#1541) (7 weeks ago) [mssonicbld]
* 3df03e1 - Optimize counter polling interval by making it more accurate (sonic-net#1457) (sonic-net#1534) (7 weeks ago) [Stephen Sun]
* d884ff9 - [syncd] Move logSet logGet under mutex to prevent race condition (sonic-net#1520) (sonic-net#1538) (8 weeks ago) [Kamil Cudnik]
* ec8b3c3 - Fix pipeline errors related to rsyslogd and libswsscommon installation (sonic-net#1535) (8 weeks ago) [mssonicbld]
* 6b263b8 - [FC] Support Policer Counter (sonic-net#1533) (8 weeks ago) [mssonicbld]
* e53489e - [syncd] Update log level for bulk api (sonic-net#1532) (8 weeks ago) [Jianyue Wu]
* 7ae00e5 - Define bulk chunk size and bulk chunk size per counter ID (sonic-net#1528) (9 weeks ago) [mssonicbld]
* f35e743 - [nvidia] Skip SAI discovery on ports (sonic-net#1524) (2 months ago) [mssonicbld]
* bf049ed - Use sonictest pool instead of sonic-common and fix arm64 issue. (sonic-net#1516) (2 months ago) [mssonicbld]
* ffe371d - [syncd] Support bulk set in INIT_VIEW mode (sonic-net#1517) (2 months ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request May 20, 2025
…D automatically (sonic-net#1025)

#### Why I did it
src/sonic-sairedis
```
* 08c1e34 - (HEAD -> 202412, origin/HEAD, origin/202412) Merge pull request #46 from r12f/user/riffjiang/fix-merge (31 minutes ago) [Riff]
* a49942e - Revert "Merge pull request #45 from r12f/code-sync-202412" (38 minutes ago) [r12f]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request Sep 24, 2025
…sonic-net#23654)

#### Why I did it
src/dhcpmon
```
* 1cb6ced - (HEAD -> master, origin/master, origin/HEAD) Update clear_counter_timeout to fix clear counter issue (#45) (19 hours ago) [Yaqiang Zhu]
* 848304e - [build] Update to use libyang3 (#46) (4 days ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog