[chassis][syncd][sai] Adjusting response timeout during syncd init #2159
Merged
prsunny merged 1 commit into sonic-net:master on Mar 1, 2022
In a VOQ-based chassis where syncd uses the VOQ SAI, SAI takes more than 1 minute to complete switch create initialization when there is a large number of front panel ports. As a result, the switch create request sent by orchagent does not receive a response within the default response wait time of 1 minute, so orchagent declares a switch create failure and crashes.

The number of ports that SAI must initialize depends on the number of ports per ASIC and on the total number of system ports configured in the system. The total number of system ports in turn depends on the number of line cards supported, the number of ASICs per line card, and the number of ports supported by each ASIC. A fully populated system, which is a commonly expected scenario, will therefore hit this crash.

To fix this, orchagent sets the syncd response timeout to 5 minutes for a line (voq) card and 10 minutes for a supervisor (fabric) card before sending the switch create request, and resets it to the default wait time after the switch create completes.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
Contributor commented: /azp run
Azure Pipelines successfully started running 1 pipeline(s).
judyjoseph approved these changes on Mar 1, 2022
prsunny approved these changes on Mar 1, 2022
judyjoseph pushed a commit that referenced this pull request on Mar 1, 2022
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request on Aug 6, 2022
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request on Nov 10, 2025
What I did
Fix for the syncd response timeout on the switch create request sent by orchagent.
Why I did it
In a VOQ-based chassis where syncd uses the VOQ SAI, SAI takes more than 1 minute to complete switch create initialization when there is a large number of front panel ports. As a result, the switch create request sent by orchagent does not receive a response within the default response wait time of 1 minute, so orchagent declares a switch create failure and crashes.
The number of ports that SAI must initialize depends on the number of ports per ASIC and on the total number of system ports configured in the system. The total number of system ports in turn depends on the number of line cards supported, the number of ASICs per line card, and the number of ports supported by each ASIC. A fully populated system, which is a commonly expected scenario, will therefore hit this crash.
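To make the scaling concrete, here is a minimal sketch. It assumes the system port count is simply the product of the three factors named above, with every line card and every ASIC identical. The helper name and the example numbers are hypothetical and do not come from this PR.

```cpp
#include <cstdint>

// Hypothetical helper: system ports in a fully populated chassis,
// assuming identical line cards and identical ASICs.
constexpr std::uint64_t totalSystemPorts(std::uint64_t lineCards,
                                         std::uint64_t asicsPerLineCard,
                                         std::uint64_t portsPerAsic)
{
    return lineCards * asicsPerLineCard * portsPerAsic;
}

// Illustrative numbers only: 8 line cards, 2 ASICs each, 36 ports per ASIC.
static_assert(totalSystemPorts(8, 2, 36) == 576, "8 * 2 * 36 system ports");
```

Under these assumptions, adding line cards multiplies the number of ports each ASIC has to set up during switch create, which is why a fully populated chassis is the case that exceeds the default wait.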
To fix this, orchagent sets the syncd response timeout to 5 minutes for a line (voq) card and 10 minutes for a supervisor (fabric) card before sending the switch create request, and resets it to the default wait time after the switch create completes.
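The set-then-restore pattern described above can be sketched as a small scope guard. This is an illustration only, not the code merged in this PR: `switchCreateTimeout`, `ResponseTimeoutGuard`, `kDefaultResponseTimeout`, and the setter callback are hypothetical names, and the callback stands in for whatever call actually updates the syncd response timeout.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

using std::chrono::milliseconds;
using std::chrono::minutes;

// Assumed default, matching the 1-minute wait described in this PR.
constexpr milliseconds kDefaultResponseTimeout = minutes(1);

// Picks the wait time for switch create from the card's switch type.
milliseconds switchCreateTimeout(const std::string& switchType)
{
    if (switchType == "voq")    return minutes(5);   // line card
    if (switchType == "fabric") return minutes(10);  // supervisor card
    return kDefaultResponseTimeout;                  // any other platform
}

// Applies a timeout on construction and restores the default on destruction,
// so the longer wait covers only the guarded scope.
class ResponseTimeoutGuard
{
public:
    using Setter = std::function<void(milliseconds)>;

    ResponseTimeoutGuard(Setter setter, milliseconds timeout)
        : m_setter(std::move(setter))
    {
        m_setter(timeout);
    }

    ~ResponseTimeoutGuard() { m_setter(kDefaultResponseTimeout); }

    ResponseTimeoutGuard(const ResponseTimeoutGuard&) = delete;
    ResponseTimeoutGuard& operator=(const ResponseTimeoutGuard&) = delete;

private:
    Setter m_setter;  // hypothetical hook that pushes the timeout to syncd
};
```

A caller would construct the guard with `switchCreateTimeout(switchType)` just before issuing switch create and let it go out of scope afterwards. Tying the restore to scope exit is a design choice of this sketch; it keeps later requests on the default wait even if the create path returns early.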