[teamd] prevent re-entrance of port priv change handler#2723
Merged
yxieca merged 1 commit intosonic-net:masterfrom Apr 1, 2019
Merged
[teamd] prevent re-entrance of port priv change handler#2723yxieca merged 1 commit intosonic-net:masterfrom
yxieca merged 1 commit intosonic-net:masterfrom
Conversation
When adding a lag member dynamically after system boots up, teamd port priv change handler could re-entrant itself and causing adding operation to fail. While handling PORT_CHANGE event, teamd_per_port.c port priv change handler was called, it will then call runner_lacp to add port to lag, the later causes IFINFO_CHANGE to be notified and calls the priv change handler again, this re-entrance would cause runner_lacp port_added to be called again and messes up with the previous adding sequence. Then fails the lag member adding operation. Prevent per port priv change handler re-entrance solves the problem. Signed-off-by: Ying Xie <[email protected]>
Collaborator
|
I feel that we need to drill down the root cause of that device name missing issue. Patching with all the workaround doesn't seem to be a long term solution. |
Contributor
Author
|
@jipanyang I agree with you. That might take longer time. If we identify the root cause and proper fix. I don't mind revert this change and #2699. Frankly speaking, I thought about this. I think the other way to fix the issue is to delay creating LAG devices after warm reboot. But I don't know a good indication of 'readiness' to do so. |
pavel-shirshov
approved these changes
Apr 1, 2019
yxieca
added a commit
that referenced
this pull request
Apr 1, 2019
When adding a lag member dynamically after system boots up, teamd port priv change handler could re-entrant itself and causing adding operation to fail. While handling PORT_CHANGE event, teamd_per_port.c port priv change handler was called, it will then call runner_lacp to add port to lag, the later causes IFINFO_CHANGE to be notified and calls the priv change handler again, this re-entrance would cause runner_lacp port_added to be called again and messes up with the previous adding sequence. Then fails the lag member adding operation. Prevent per port priv change handler re-entrance solves the problem. Signed-off-by: Ying Xie <[email protected]>
tiantianlv
pushed a commit
to SONIC-DEV/sonic-buildimage
that referenced
this pull request
Apr 10, 2019
When adding a lag member dynamically after system boots up, teamd port priv change handler could re-entrant itself and causing adding operation to fail. While handling PORT_CHANGE event, teamd_per_port.c port priv change handler was called, it will then call runner_lacp to add port to lag, the later causes IFINFO_CHANGE to be notified and calls the priv change handler again, this re-entrance would cause runner_lacp port_added to be called again and messes up with the previous adding sequence. Then fails the lag member adding operation. Prevent per port priv change handler re-entrance solves the problem. Signed-off-by: Ying Xie <[email protected]>
tiantianlv
pushed a commit
to SONIC-DEV/sonic-buildimage
that referenced
this pull request
Apr 10, 2019
When adding a lag member dynamically after system boots up, teamd port priv change handler could re-entrant itself and causing adding operation to fail. While handling PORT_CHANGE event, teamd_per_port.c port priv change handler was called, it will then call runner_lacp to add port to lag, the later causes IFINFO_CHANGE to be notified and calls the priv change handler again, this re-entrance would cause runner_lacp port_added to be called again and messes up with the previous adding sequence. Then fails the lag member adding operation. Prevent per port priv change handler re-entrance solves the problem. Signed-off-by: Ying Xie <[email protected]>
tiantianlv
pushed a commit
to SONIC-DEV/sonic-buildimage
that referenced
this pull request
Apr 10, 2019
When adding a lag member dynamically after system boots up, teamd port priv change handler could re-entrant itself and causing adding operation to fail. While handling PORT_CHANGE event, teamd_per_port.c port priv change handler was called, it will then call runner_lacp to add port to lag, the later causes IFINFO_CHANGE to be notified and calls the priv change handler again, this re-entrance would cause runner_lacp port_added to be called again and messes up with the previous adding sequence. Then fails the lag member adding operation. Prevent per port priv change handler re-entrance solves the problem. Signed-off-by: Ying Xie <[email protected]>
yxieca
pushed a commit
that referenced
this pull request
Apr 20, 2023
…lly (#14750) src/sonic-swss * 96407be - (HEAD -> 202205, origin/202205) sonic-swss: Sync and program remote lag member status. (#2730) (11 hours ago) [Sambath Kumar Balasubramanian] * 6051d33 - Fix race condition in sub-interface handling (#2723) (11 hours ago) [Stephen Sun] * 7b95ec5 - [muxorch] handling multiple mux nexthops for route (#2656) (11 hours ago) [Nikola Dancejic]
mihirpat1
pushed a commit
to mihirpat1/sonic-buildimage
that referenced
this pull request
Jun 14, 2023
* Fix race condition in sub-interface handling A sub-interface's becomes admin up in the following scenarios: 1. A port becomes admin up in STATE_DB.LAG_TABLE when there are sub-interfaces created based on it 2. A sub-interfaces is created when the parent port's admin_status is up in APPL_DB.LAG_TABLE However, it's possible that - the sub-interfaces have not been created while STATE_DB.LAG_TABLE is updated - the LAGs, as the parent ports, are still admin-down in APPL_DB.LAG_TABLE when sub-interfaces are created. In this scenario, sub-interfaces will keep admin-down for ever. Fix: In (2), check LAG's admin status in STATE_DB instead of APPL_DB
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
- What I did
This change addresses an issue introduced by #2699
When adding a lag member dynamically after system boots up, teamd
port priv change handler could re-entrant itself and causing adding
operation to fail.
While handling PORT_CHANGE event, teamd_per_port.c port priv change
handler was called, it will then call runner_lacp to add port to lag,
the later causes IFINFO_CHANGE to be notified and calls the priv change
handler again, this re-entrance would cause runner_lacp port_added to
be called again and messes up with the previous adding sequence. Then
fails the lag member adding operation.
Prevent per port priv change handler re-entrance solves the problem.
Signed-off-by: Ying Xie [email protected]
- How to verify it
Passed incremental config nightly test over the weekend.
Manually tested add a member port to a lag.
897 iterations of continuous warm reboot: 338 with [teamd] retry creating team_port after interface info changed #2699 only. And 559 with [teamd] retry creating team_port after interface info changed #2699 plus this change (test still running):
yinxi@acs-trusty8:/warmboot/cont_wb$ grep Rebooting wb-test-20190328-0631.log | grep 20190329 | wc -l
559
yinxi@acs-trusty8:/warmboot/cont_wb$ grep Rebooting wb-test-20190328-0631.log | grep -v 20190329 | wc -l
338