[mux]: Implement rollback for failed mux switchovers#2714

Merged
theasianpianist merged 15 commits into sonic-net:master from theasianpianist:mux-rollback
May 2, 2023

@theasianpianist
Contributor

@theasianpianist theasianpianist commented Mar 24, 2023

Depends on sonic-net/sonic-sairedis#1224 being merged first

What I did

  • Make all SAI API operations needed for switchover idempotent
  • Implement rollback when a switchover fails
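
The rollback idea can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `MuxCableSketch`, its `apply` callback, and the three-state enum are hypothetical stand-ins, and the sketch assumes every programming step is idempotent so that re-applying the previous state is safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, reduced set of states for illustration only.
enum class MuxState { Init, Active, Standby };

// Hypothetical stand-in for a mux cable. `apply` models the hardware
// programming for a target state and returns false on failure.
class MuxCableSketch
{
public:
    explicit MuxCableSketch(std::function<bool(MuxState)> apply)
        : apply_(std::move(apply))
    {
        assert(apply_ && "an apply callback is required");
    }

    // Try to move to `target`. On failure, re-apply the previous state.
    // Returns true only if the switchover itself succeeded.
    bool setState(MuxState target)
    {
        if (target == state_)
        {
            return true; // nothing to do
        }

        const MuxState prev = state_;
        if (apply_(target))
        {
            state_ = target;
            return true;
        }

        // The switchover failed part-way. Since each step is assumed to be
        // idempotent, re-applying the previous state is safe even if some
        // of its objects were never removed.
        if (prev != MuxState::Init)
        {
            rollback_ok_ = apply_(prev);
        }
        return false;
    }

    MuxState state() const { return state_; }
    bool lastRollbackOk() const { return rollback_ok_; }

private:
    std::function<bool(MuxState)> apply_;
    MuxState state_ = MuxState::Init;
    bool rollback_ok_ = true;
};
```

The stored state is left unchanged on failure in this sketch, so a caller sees the cable in its previous state after a failed switchover.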

Why I did it

How I verified it

  • Run new mux_rollback mock_tests.

Details if related

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Make all switchover-related SAI operations idempotent (create and remove for ACL, next hop, neighbor, and route)
- Implement rollback when a switchover fails

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
// Check at the start of the function means we will never reach here
SWSS_LOG_ERROR("[%s] Rollback to %s not supported", mux_name_.c_str(),
muxStateValToString.at(prev_state_).c_str());
return;
Collaborator

This will leave st_chg_in_progress_ as true, right?

Contributor Author

Yes, but we will never get here, since the check on the first line of this function returns early for the FAILED and PENDING states.
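
The shape of that argument can be sketched like this. The struct, the enum, and the `rollbacks` counter are hypothetical and only mirror the names used in the thread; the real function does more work in each branch.

```cpp
#include <cassert>

// Hypothetical, reduced set of states for illustration only.
enum class MuxState { Active, Standby, Pending, Failed };

struct RollbackSketch
{
    MuxState prev_state = MuxState::Active;
    bool st_chg_in_progress = false;
    int rollbacks = 0; // counts completed rollbacks, for illustration

    void rollback()
    {
        assert(!st_chg_in_progress && "no state change may already be running");

        // Guard first: states with no rollback path return before the
        // in-progress flag is ever set.
        if (prev_state == MuxState::Failed || prev_state == MuxState::Pending)
        {
            return;
        }

        st_chg_in_progress = true;
        switch (prev_state)
        {
        case MuxState::Active:
        case MuxState::Standby:
            ++rollbacks; // placeholder for re-programming the previous state
            break;
        default:
            // Unreachable because of the guard above. A return here would
            // leave the flag set, which is the reviewer's concern.
            return;
        }
        st_chg_in_progress = false;
    }
};
```

With the guard in place, the flag is cleared on every path that sets it, and the `default` branch is only a defensive log point.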

void bindAllPorts(AclTable &acl_table);

// class shared dict: ACL table name -> ACL table
static std::map<std::string, AclTable> acl_table_;
Collaborator

Why is this removed? I think it's changing the overall logic.

Contributor Author

Based on my understanding, this map was used to cache whether the ACL table had already been created in hardware. I changed the behavior to delegate this check to AclOrch instead, since the static scope of the map was preventing the mock_tests from passing. MuxAclHandler now always asks AclOrch whether the table already exists, instead of storing that information itself.
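
A rough sketch of that delegation, using made-up names: `AclOrchSketch`, `hasTable`, and `addTable` are hypothetical and do not reflect the real AclOrch interface. The point is only that the handler keeps no static state of its own, so each test can start from a fresh owner object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the orchestrator that owns ACL tables.
class AclOrchSketch
{
public:
    bool hasTable(const std::string &name) const
    {
        return tables_.count(name) != 0;
    }

    void addTable(const std::string &name)
    {
        assert(!hasTable(name) && "caller must check for an existing table");
        tables_[name] = ++next_id_;
        ++creates_;
    }

    int creates() const { return creates_; }

private:
    std::map<std::string, int> tables_; // table name -> fake object id
    int next_id_ = 0;
    int creates_ = 0;
};

// Hypothetical handler with no static cache: it asks the owner every time.
class MuxAclHandlerSketch
{
public:
    MuxAclHandlerSketch(AclOrchSketch &orch, const std::string &table)
    {
        if (!orch.hasTable(table))
        {
            orch.addTable(table);
        }
    }
};
```

Two handlers built against the same owner create the table once, while a newly constructed owner starts empty, which is the property a static map cannot give to a test suite.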

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@theasianpianist theasianpianist requested a review from prsunny March 31, 2023 19:32
@prsunny prsunny requested a review from Ndancejic April 3, 2023 20:15
status = sai_acl_api->create_acl_entry(&m_ruleOid, gSwitchId, (uint32_t)rule_attrs.size(), rule_attrs.data());
if (status != SAI_STATUS_SUCCESS)
{
if (status == SAI_STATUS_ITEM_ALREADY_EXISTS)
Collaborator

@theasianpianist, is this change required for mux idempotency/rollback? @bingwang-ms , can you please review?

Contributor Author

If the ACL entry already exists, the SAI API returns this status. We might hit this scenario during a rollback, so we want to continue normally if the entry already exists.
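
As a minimal sketch of that pattern, the wrappers below treat "already exists" on create and "not found" on remove as success. `FakeBackend`, `Status`, `ensureCreated`, and `ensureRemoved` are hypothetical names for an in-memory stand-in; they are not SAI types or functions.

```cpp
#include <cassert>
#include <set>

// Hypothetical status codes, loosely modeled on the ones named in the thread.
enum class Status { Success, ItemAlreadyExists, ItemNotFound, Failure };

// Hypothetical in-memory backend standing in for the hardware API.
class FakeBackend
{
public:
    Status create(int key)
    {
        return entries_.insert(key).second ? Status::Success
                                           : Status::ItemAlreadyExists;
    }

    Status remove(int key)
    {
        return entries_.erase(key) > 0 ? Status::Success
                                       : Status::ItemNotFound;
    }

    bool contains(int key) const { return entries_.count(key) != 0; }

private:
    std::set<int> entries_;
};

// Idempotent create: an existing entry counts as success.
bool ensureCreated(FakeBackend &backend, int key)
{
    const Status status = backend.create(key);
    const bool ok = status == Status::Success ||
                    status == Status::ItemAlreadyExists;
    assert(!ok || backend.contains(key)); // on success the entry is present
    return ok;
}

// Idempotent remove: a missing entry counts as success.
bool ensureRemoved(FakeBackend &backend, int key)
{
    const Status status = backend.remove(key);
    const bool ok = status == Status::Success ||
                    status == Status::ItemNotFound;
    assert(!ok || !backend.contains(key)); // on success the entry is absent
    return ok;
}
```

Calling either wrapper twice in a row gives the same result as calling it once, which is what lets a rollback replay steps without knowing how far the failed switchover got.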

{
/* When next hop is not found, we continue to remove neighbor entry. */
-if (status == SAI_STATUS_ITEM_NOT_FOUND)
+if (status == SAI_STATUS_ITEM_NOT_FOUND || status == SAI_STATUS_INVALID_PARAMETER)
Collaborator

This seems to be changing existing flow. Are we getting SAI_STATUS_INVALID_PARAMETER here for non-existent entries?

Contributor Author

auto status = sai_acl_api->remove_acl_entry(m_ruleOid);
if (status != SAI_STATUS_SUCCESS)
{
if (status == SAI_STATUS_ITEM_NOT_FOUND || status == SAI_STATUS_INVALID_PARAMETER)
Collaborator

Let's get the sairedis merged and revisit this section

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@prsunny
Collaborator

prsunny commented Apr 24, 2023

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny
Collaborator

prsunny commented Apr 26, 2023

Need to fix build failure

@prabhataravind prabhataravind requested a review from prsunny April 29, 2023 04:03
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Collaborator

@prsunny prsunny left a comment

lgtm

@theasianpianist theasianpianist merged commit bc4062b into sonic-net:master May 2, 2023
@yxieca
Contributor

yxieca commented May 4, 2023

@theasianpianist can you create separate PR for 202205 branch?

theasianpianist added a commit to theasianpianist/sonic-swss that referenced this pull request May 4, 2023
- Make all SAI API operations needed for switchover idempotent
- Implement rollback when a switchover fails

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@theasianpianist
Contributor Author

@theasianpianist can you create separate PR for 202205 branch?

PR here: #2761

yxieca pushed a commit that referenced this pull request May 11, 2023
- Make all SAI API operations needed for switchover idempotent
- Implement rollback when a switchover fails

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
yxieca pushed a commit that referenced this pull request Aug 29, 2023
- Make all SAI API operations needed for switchover idempotent
- Implement rollback when a switchover fails

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
- Make all SAI API operations needed for switchover idempotent
- Implement rollback when a switchover fails

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
3 participants