AMD-Pensando HA Proposal document #206

Merged
SanjayTh merged 1 commit into sonic-net:main from SanjayTh:ha_design on Aug 31, 2022
@SanjayTh
Collaborator

No description provided.

ghost commented Aug 31, 2022

CLA assistant check
All CLA requirements met.

SanjayTh merged commit 191bb85 into sonic-net:main on Aug 31, 2022

### Interaction between Bulk Sync and Datapath Sync

Given the DASH scale requirements, the flow table handled during bulk sync can be very large, so the bulk sync process can take a long time to complete. It is not possible to halt all traffic that would create new flows during this time, so the sync mechanism has to handle the creation of new flows during bulk sync. Policy changes that affect existing flows may also occur during this window. The perfect sync mechanism marks flows created after the start of bulk sync with a different “color”. The flow table is walked and all flows that do not have the current color are synchronized to the peer. Any flows created during the bulk sync phase are synchronized inline via the datapath synchronization path. Other challenges include

Contributor

It is also possible that there might be changes in the policy that might affect existing flows. The perfect sync mechanism calls for marking different “color”s to flows that are created after the start of bulk sync.

What are the different states (colors) that can be marked on a flow in hardware? Can we get the specific states listed?

Collaborator Author

@SanjayTh SanjayTh Aug 31, 2022

At any given time during bulk sync, there are flows that are in the bulk sync snapshot and flows that were created after bulk sync started. These two classes of flows can be differentiated by the "color". In addition, among the flows in the bulk sync snapshot, some have already been processed and some are yet to be processed. It is okay to rely on DP sync (for flow updates) for the former but not for the latter. The implementation may need to track this in the flow state and handle it accordingly. Will add this to the description.
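
A minimal sketch of the coloring scheme discussed in this thread. It assumes a 1-bit color and a walk that re-stamps each processed flow with the current color, which is one possible way to track processed versus pending snapshot flows and is not taken from the proposal. All type and function names (`flow_entry_t`, `flow_table_t`, `bulk_sync_step`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow entry; field names are illustrative only. */
typedef struct {
    bool    in_use;  /* slot holds a live flow */
    uint8_t color;   /* color stamped at creation or when the walk sent it */
} flow_entry_t;

/* Hypothetical flow table with bulk sync bookkeeping. */
typedef struct {
    flow_entry_t *flows;
    size_t        capacity;
    uint8_t       current_color; /* stamped on newly created flows */
    bool          bulk_active;   /* true while the table walk is running */
    size_t        cursor;        /* next slot the walk will visit */
} flow_table_t;

/* Create a flow in a free slot and stamp it with the current color. */
static void flow_create(flow_table_t *t, size_t slot)
{
    assert(slot < t->capacity && !t->flows[slot].in_use);
    t->flows[slot].in_use = true;
    t->flows[slot].color = t->current_color;
}

/* Start bulk sync: flip the color so flows created from now on can be
 * told apart from flows that belong to the snapshot. */
static void bulk_sync_start(flow_table_t *t)
{
    t->current_color = (uint8_t)(t->current_color ^ 1u);
    t->cursor = 0;
    t->bulk_active = true;
}

/* Walk up to `budget` slots. A flow whose color differs from the current
 * color belongs to the snapshot and is still pending: it is counted as sent
 * and re-stamped. Returns the number of flows sent in this step. */
static size_t bulk_sync_step(flow_table_t *t, size_t budget)
{
    size_t sent = 0;
    if (!t->bulk_active)
        return 0;
    for (size_t i = 0; i < budget && t->cursor < t->capacity; i++, t->cursor++) {
        flow_entry_t *f = &t->flows[t->cursor];
        if (f->in_use && f->color != t->current_color) {
            /* A real implementation would send the flow to the peer here. */
            f->color = t->current_color;
            sent++;
        }
    }
    if (t->cursor == t->capacity)
        t->bulk_active = false; /* walk complete */
    return sent;
}

/* True when an update to this flow can rely on datapath sync: the flow is
 * either new since the snapshot or already sent by the walk. Pending
 * snapshot flows return false; the walk will carry their latest state. */
static bool flow_update_uses_datapath_sync(const flow_table_t *t, size_t slot)
{
    return t->flows[slot].in_use && t->flows[slot].color == t->current_color;
}
```

Because the walk re-stamps every flow it sends, all live flows carry the current color once the walk completes, so the next bulk sync can flip the same bit again. An implementation with more than two states would need a wider color field.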


## State Synchronization

State synchronization between the two DPUs uses the CNIP IP. All state synchronization is performed at the granularity of the DP-VIP and flows from the primary of the DP-VIP towards the secondary. State synchronization happens in 2 stages
Contributor

Is the state synchronization happening in stages or in parallel? It would be great to clarify this.

Collaborator Author

Yes, both are happening in parallel. Will state this explicitly here.

Collaborator

Opened PR209 to address this.

sai_process_flow_sync_message_fn process_flow_sync_message;
sai_oper_role_status_fn oper_role_status;
sai_cp_control_message_fn cp_control_message;
} sai_dash_ha_api_t;
Contributor

Can we have sample code showing how to call these APIs?
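
As a rough illustration of the calling pattern only: the diff excerpt shows the member names of `sai_dash_ha_api_t` but not the signatures behind `sai_process_flow_sync_message_fn`, `sai_oper_role_status_fn`, or `sai_cp_control_message_fn`. The sketch below therefore uses stand-in `demo_` types with assumed signatures and stub implementations. The assumed semantics (query the operational role for a DP-VIP, then hand a received flow sync message to the implementation only on the secondary) are a guess based on the statement that synchronization flows from primary to secondary; they are not the proposal's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types: the real SAI typedefs are NOT shown in this excerpt,
 * so every signature below is an assumption made for illustration. */
typedef int demo_status_t;
#define DEMO_STATUS_SUCCESS 0
#define DEMO_STATUS_INVALID (-1)

typedef enum { DEMO_ROLE_PRIMARY, DEMO_ROLE_SECONDARY } demo_role_t;

typedef demo_status_t (*demo_process_flow_sync_message_fn)(const uint8_t *msg, size_t len);
typedef demo_status_t (*demo_oper_role_status_fn)(uint32_t dp_vip_id, demo_role_t *role);
typedef demo_status_t (*demo_cp_control_message_fn)(const uint8_t *msg, size_t len);

/* Mirrors only the member names visible in the diff. */
typedef struct {
    demo_process_flow_sync_message_fn process_flow_sync_message;
    demo_oper_role_status_fn          oper_role_status;
    demo_cp_control_message_fn        cp_control_message;
} demo_dash_ha_api_t;

/* Stub implementations standing in for a vendor library. */
static size_t g_flow_msgs; /* number of flow sync messages accepted */

static demo_status_t stub_process_flow_sync_message(const uint8_t *msg, size_t len)
{
    if (msg == NULL || len == 0)
        return DEMO_STATUS_INVALID;
    g_flow_msgs++;
    return DEMO_STATUS_SUCCESS;
}

static demo_status_t stub_oper_role_status(uint32_t dp_vip_id, demo_role_t *role)
{
    if (role == NULL)
        return DEMO_STATUS_INVALID;
    /* Arbitrary rule for the demo: odd DP-VIP ids are secondary here. */
    *role = (dp_vip_id % 2u) ? DEMO_ROLE_SECONDARY : DEMO_ROLE_PRIMARY;
    return DEMO_STATUS_SUCCESS;
}

static demo_status_t stub_cp_control_message(const uint8_t *msg, size_t len)
{
    return (msg != NULL && len > 0) ? DEMO_STATUS_SUCCESS : DEMO_STATUS_INVALID;
}

static const demo_dash_ha_api_t demo_api = {
    stub_process_flow_sync_message,
    stub_oper_role_status,
    stub_cp_control_message,
};

/* Caller side: check the role for the DP-VIP, then deliver a received
 * flow sync message only when this DPU is the secondary. */
static demo_status_t deliver_flow_sync(const demo_dash_ha_api_t *api,
                                       uint32_t dp_vip_id,
                                       const uint8_t *msg, size_t len)
{
    demo_role_t role;
    demo_status_t rc;

    assert(api != NULL);
    rc = api->oper_role_status(dp_vip_id, &role);
    if (rc != DEMO_STATUS_SUCCESS)
        return rc;
    if (role != DEMO_ROLE_SECONDARY)
        return DEMO_STATUS_INVALID;
    return api->process_flow_sync_message(msg, len);
}
```

The function-pointer table keeps the caller independent of the vendor implementation, which is the pattern the `sai_dash_ha_api_t` fragment suggests. Real code would use the actual SAI typedefs and obtain the table from the SAI library rather than a static stub.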
