[202012] [teammgrd]: Improve LAGs cleanup on shutdown#1831
[202012] [teammgrd]: Improve LAGs cleanup on shutdown#1831nazariig wants to merge 1 commit intosonic-net:202012from
Conversation
|
@nazariig can you please add the code change and the approach used in the PR description under "What I did". I beleive that what is currently written is the 'Why I did' |
| { | ||
| std::stringstream cmd; | ||
| cmd << "cat " << shellquote("/var/run/teamd/" + alias + ".pid"); | ||
| EXEC_WITH_ERROR_THROW(cmd.str(), res); |
There was a problem hiding this comment.
Although the performance is sufficient, we can probably gain more by replacing this part with FILE I/O API (in scope of 1024 LAGs)
| { | ||
| std::stringstream cmd; | ||
| cmd << "kill -TERM " << pid; | ||
| EXEC_WITH_ERROR_THROW(cmd.str(), res); |
There was a problem hiding this comment.
Although the performance is sufficient, we can probably gain more by replacing this part with signals API (in scope of 1024 LAGs)
The original bug which that PR was trying to solve is related to PID alive status check. |
@liat-grozovik done |
| EXEC_WITH_ERROR_THROW(cmd.str(), res); | ||
|
|
||
| pid = static_cast<pid_t>(std::stoul(res, nullptr, 10)); | ||
| aliasPidMap[alias] = pid; |
There was a problem hiding this comment.
PIDs can be updated during LAG SET/REMOVE operations to gain more performance on shutdown (in scope of 1024 LAGs)
|
|
||
| SWSS_LOG_NOTICE("Waiting for port channel %s to stop...", alias.c_str()); | ||
|
|
||
| cmd << "tail -f --pid=" << pid << " /dev/null"; |
There was a problem hiding this comment.
Although the performance is sufficient, we can probably gain more by replacing this part with signals API (in scope of 1024 LAGs)
|
@judyjoseph can you please review the suggested fix? i understand it also take care of the concern you raised with previous fix done which was reverted. As this is a blocker issue for 202012 appreciate if you can review and provide your feedback. |
…o PID. Signed-off-by: Nazarii Hnydyn <[email protected]>
be59c5e to
a6c55da
Compare
|
@nazariig, please don't close and reopen PRs. Until /azpw is available, someone can help retrigger. |
|
@judyjoseph please have a look |
|
how is this related to : |
|
Should we have a PR on master branch first and cherry-pick to 202012? |
@qiluo-msft done: #1841 |
@abdosi because the current implementation uses bool TeamMgr::removeLag(const string &alias)
{
SWSS_LOG_ENTER();
stringstream cmd;
string res;
cmd << TEAMD_CMD << " -k -t " << shellquote(alias);
EXEC_WITH_ERROR_THROW(cmd.str(), res);
SWSS_LOG_NOTICE("Stop port channel %s", alias.c_str());
return true;
}and there is a degradation since |
|
Close and in favor of #1841 |
sonic-net#1831) #### What I did Fixing issue sonic-net#1830 #### How I did it Problem we consume the function `loadData` from sonic-yang-mgmt pkg that always crops tables without YANG models from configdb json object. It does it as a side-effect and is not an expected outcome of the function. The fix here is to crop the current/target tables before doing any sorting, this way gurantee we avoid this bug. I think the better soln is to fix sonic-yang-mgmt pkg. Will look into this more next week. #### How to verify it Run the command in the issue, it will result in the expected outcome #### Previous command output (if the output of a command-line utility has changed) #### New command output (if the output of a command-line utility has changed)
Signed-off-by: Nazarii Hnydyn [email protected]
This PR is intended to fix LAGs cleanup degradation caused by python2.7 -> python3 migration.
The approach is to replace
teamd -k -tcall with the rawSIGTERMand add PID alive check.This will make sure the
teammgrdis stopped only after all managed processes are being killed.resolves: sonic-net/sonic-buildimage#8071
What I did
teamd -k -tcall with rawSIGTERMWhy I did it
How I verified it
Details if related