Distributed forwarding in a VOQ architecture HLD #622
Merged
Commits:
- e607a9f Distributed forwarding in a VOQ architecture HLD (kartik-arista)
- 8459987 Updates after review meeting dated Jun-18 2020 (kartik-arista)
- ddf468a Second set of edits from review at VOQ working group. (kartik-arista)
- 88a798a Changes for Rev 2.0 of the document (kartik-arista)
- 90c1289 More edits after community review (kartik-arista)
- fcf7e0c Moving VOQ arch documents to voq directory instead of chassis as (kartik-arista)
- 2190c9a Renaming Global DB to chassis db and misc cleanups (kartik-arista)
- e716907 Complete all the renaming of global to chassis (kartik-arista)

# Distributed Forwarding in a Virtual Output Queue (VOQ) Architecture

# High Level Design Document
#### Rev 0.2

# Table of Contents
* [List of Tables](#list-of-tables)
* [List of Figures](#list-of-figures)
* [Revision](#revision)
* [About this Manual](#about-this-manual)
* [Scope](#scope)
* [Definitions/Abbreviations](#definitionsabbreviations)
* [1 Requirements](#1-requirements)
* [2 Design](#2-design)
* [3 Testing](#3-testing)
* [4 Future Work](#4-future-work)

# List of Tables
* [Table 1: Abbreviations](#definitionsabbreviations)

# List of Figures
* [Figure 1: VoQ Distributed Forwarding Architecture](#23-state-sharing)

###### Revision
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|--------------------|
| 0.1 | May-19 2020 | Kartik Chandran (Arista Networks) | Initial Version |
| 0.2 | June-22 2020 | Kartik Chandran (Arista Networks) | First set of review comments from public review |

# About this Manual

This document provides an overview of the implementation of SONiC support for distributed packet forwarding across a set of devices that have a VOQ (Virtual Output Queue) architecture and are interconnected by an internal fabric.

# Scope

Support for distributed forwarding encompasses the following aspects:
- Physical interfaces and VOQs
- Logical interfaces such as link aggregation groups (LAGs)
- The internal interconnection fabric
- The packet forwarding data plane
- The control plane, both internal (within the devices in the system) and with external devices

This document covers:
- The basic SONiC architecture enhancements to support distributed VOQ-based forwarding
- Representation and management of physical ports across the system

The other aspects listed above are expected to be covered in separate, self-contained design proposals that build on this architecture.

The initial target for this work is a VOQ chassis system in which linecards running SONiC are interconnected over a fabric, and the overall system is controlled by a supervisor module that also runs SONiC.

However, this architecture makes no hard assumptions about operating within a chassis and can be extended to other form factors with the same VOQ architecture.

# Definitions/Abbreviations

| Abbreviation | Expansion | Description |
|------|--------------------|--------------------------------|
| FSI | Forwarding SONiC Instance | SONiC instance on a packet forwarding module such as a linecard. |
| SSI | Supervisor SONiC Instance | SONiC instance on a central supervisor module that controls a cluster of forwarding instances and the interconnection fabric. |
| Forwarding Device | | A unit of hardware that runs SONiC and is responsible for packet forwarding. |
| ASIC | Application Specific Integrated Circuit | The forwarding engine on a device that is responsible for packet forwarding. Also referred to as NPU. |

# 1 Requirements

## 1.1 Functional Requirements

### 1.1.1 Distributed Operation

* Each forwarding device must run an independent SONiC instance (called the Forwarding SONiC Instance or FSI) that controls the operation of one or more ASICs on the device, including the front panel and internal fabric ports connected to those ASICs.
* A forwarding device must act as a fully functional router that can run routing protocols and other networking services, just like single-box SONiC devices.
* The system of forwarding devices should be managed by a single central Supervisor SONiC Instance (SSI) that also manages the internal fabric that interconnects the forwarding devices.

### 1.1.2 Intra-System Control Plane

* Each FSI should be able to connect to other FSIs over the internal fabric in order to run protocols like BGP within the system.

* This connection must be fate-shared with the data path, so that a loss of connectivity in the internal fabric is also reflected as a loss of internal control plane connectivity.

### 1.1.3 Intra-System Management Plane

Every FSI must have a management interface over which it can reach the supervisor and the rest of the network outside the system. This management network must be completely separate from the internal control plane network described above.

## 1.2 Configuration and Management Requirements

* Each SONiC instance must be independently configurable and manageable through standard SONiC management interfaces.

* The physical configuration of the entire system is fixed at startup. This includes:
  * The hardware SKU that is used for each forwarding device
  * The physical port organization of the entire system

* Live replacement of forwarding devices or pluggable modules like transceivers must be supported as long as the replacement part is of an identical SKU.

# 2 Design

## 2.1 Design Assumptions

For the system to function correctly, some state that provides a global view of the system to all the FSIs is necessary. This state is stored in the SSI, and all FSIs connect to the SSI over the internal management network to access it.

## 2.2 SAI Support

Support for VOQ-based forwarding in SONiC depends on the [SAI VOQ API](https://github.com/opencomputeproject/SAI/blob/master/doc/VoQ/SAI-Proposal-VoQ-Switch.md).

## 2.3 State Sharing

*Figure 1: VoQ Distributed Forwarding Architecture*

All state of global interest to the entire system is stored in the SSI in a new Redis instance with a database called "Chassis DB". This instance is accessible over the internal management network.

FSIs connect to this instance, in addition to their own local Redis instance, to access and act on this global state.

### 2.3.1 Chassis DB Organization

The Chassis DB runs as a separate Redis instance in a new container known as `docker-database-chassis`. This isolates the chassis state from the rest of the databases in the instance, and allows the container to be started conditionally, only on the SSI.

### 2.3.2 Config DB Additions

```
{
    "DEVICE_METADATA": {
        "localhost": {
            …
            "chassis_db_address" : "10.8.1.200",
            "connect_chassis_db" : "1",
            …
        }
    }
}
```
Two new attributes are added to the DEVICE_METADATA object in Config DB. `chassis_db_address` carries the address of the Chassis DB instance on the SSI, and `connect_chassis_db` indicates whether the SONiC instance should connect to it. Together, they convey to an FSI that a Chassis DB exists in the system.

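The following Python sketch shows how an FSI might interpret these two attributes. It takes the DEVICE_METADATA table as an already-loaded dictionary and returns the Chassis DB address to connect to, or `None` when no connection is requested. The helper name `chassis_db_address` and its error handling are assumptions for illustration; they are not taken from the SONiC code.

```python
import ipaddress
from typing import Mapping, Optional


def chassis_db_address(device_metadata: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the Chassis DB address an FSI should connect to, or None.

    Hypothetical helper: it only interprets the two DEVICE_METADATA
    attributes described above and is not part of any SONiC library.
    """
    localhost = device_metadata.get("localhost", {})
    # Config DB values are strings; only "1" requests a Chassis DB connection.
    if localhost.get("connect_chassis_db") != "1":
        return None
    address = localhost.get("chassis_db_address")
    if not address:
        raise ValueError("connect_chassis_db is set but chassis_db_address is missing")
    # Raises ValueError if the value is not a valid IPv4 or IPv6 address.
    ipaddress.ip_address(address)
    return address


metadata = {"localhost": {"chassis_db_address": "10.8.1.200", "connect_chassis_db": "1"}}
print(chassis_db_address(metadata))  # 10.8.1.200
```

This sketch treats any value other than `"1"` as "do not connect". As a result, devices that never set the attribute, such as single-box systems, keep their existing behavior.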
## 2.4 Chip Management
There are two kinds of chips that are of interest:

### 2.4.1 Forwarding ASIC

The ASIC (also referred to as the NPU in SONiC terminology) performs all the packet reception, forwarding, queueing, and transmission functions in the system.

### 2.4.2 Fabric Chip

The internal fabric that interconnects forwarding engines is made up of multiple fabric chips that are responsible for moving packets from the source forwarding engine to the destination forwarding engine.

ASICs are connected to fabric chips over internal links that terminate on fabric ports at each end.

Fabric chips do not play any role in packet forwarding decisions and do not need to be explicitly configured once initialized. All subsequent interactions with fabric chips are for monitoring only.

All chips are managed based on the existing multi-ASIC paradigm in SONiC, in which there is one instance of syncd (and SAI), swss, and other related agents per chip.

### 2.4.3 Switch Numbering

* Each chip in the system (ASIC and fabric chip) is given a global ID called a Switch ID.
* Each chip consumes C consecutive switch IDs, where C is the number of switching cores on the chip.
* Each core on a chip has a core ID between 0 and C-1.

See the [SAI VoQ proposal](https://github.com/opencomputeproject/SAI/blob/master/doc/VoQ/SAI-Proposal-VoQ-Switch.md) for more detailed examples.

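The numbering rules above amount to a running allocation. The Python sketch below numbers chips in the order they are listed, starting from 0, and uses hypothetical chip names and core counts. Both the ordering and the starting value are assumptions of the sketch; the authoritative scheme is the one in the SAI VoQ proposal.

```python
from typing import Dict, List, Sequence, Tuple


def assign_switch_ids(chips: Sequence[Tuple[str, int]], first_id: int = 0) -> Dict[str, List[int]]:
    """Allocate C consecutive switch IDs to each chip, C being its core count.

    The first ID in each list is the chip's Switch ID; core k of the chip
    (0 <= k <= C-1) corresponds to Switch ID + k. Chips are numbered in the
    order given, which is an assumption of this sketch.
    """
    allocation: Dict[str, List[int]] = {}
    next_id = first_id
    for name, num_cores in chips:
        if name in allocation:
            raise ValueError(f"{name}: duplicate chip name")
        if num_cores < 1:
            raise ValueError(f"{name}: a chip must have at least one core")
        allocation[name] = list(range(next_id, next_id + num_cores))
        next_id += num_cores
    return allocation


ids = assign_switch_ids([("Linecard-1.0", 2), ("Linecard-1.1", 2), ("Linecard-2.0", 1)])
print(ids)  # {'Linecard-1.0': [0, 1], 'Linecard-1.1': [2, 3], 'Linecard-2.0': [4]}
```

Fabric chips would be listed alongside ASICs, because every chip in the system consumes switch IDs.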
## 2.5 SONiC Instance and ASIC Naming

### Forwarding SONiC Instance

Each FSI has a globally unique name that represents that SONiC instance. In a modular chassis, the name would conventionally be "Linecard-N", where N is the slot in which the linecard is inserted.

### ASIC Name

In addition, each ASIC has a globally unique name, which would conventionally be "Linecard-N.K", where K is a slot-local identifier of the ASIC.

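These names are conventions rather than fixed formats, so the following Python helpers only sketch how tooling might build and parse them. The function names and the strict `Linecard-N.K` pattern are assumptions.

```python
import re
from typing import Tuple

# Hypothetical pattern for the conventional "Linecard-N.K" ASIC name.
_ASIC_NAME_RE = re.compile(r"Linecard-(\d+)\.(\d+)")


def fsi_name(slot: int) -> str:
    """Conventional FSI name for the linecard in the given slot."""
    return f"Linecard-{slot}"


def asic_name(slot: int, asic_index: int) -> str:
    """Conventional name for ASIC K on the linecard in slot N."""
    return f"{fsi_name(slot)}.{asic_index}"


def parse_asic_name(name: str) -> Tuple[int, int]:
    """Split "Linecard-N.K" back into (N, K)."""
    match = _ASIC_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a conventional ASIC name: {name!r}")
    return int(match.group(1)), int(match.group(2))


print(asic_name(3, 1))                  # Linecard-3.1
print(parse_asic_name("Linecard-3.1"))  # (3, 1)
```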
## 2.6 Port Management

There are four types of ports that need to be managed:

### 2.6.1 Local Ports

These are front panel interfaces that are directly attached to each FSI. They are modeled and represented in SONiC exactly as they are on existing fixed-configuration devices.

### 2.6.2 System Ports

Every port in the system requires a global representation in addition to its existing local representation. This is known as a System Port (AKA sysport). Every system port is assigned a globally unique identifier called a system_port_id. In addition, every port is assigned a local port ID within a core, called a "Core Port ID". The scope of the Core Port ID is _within_ a core of a forwarding engine.

System Ports are modeled in Chassis DB in the SYSTEM_PORT table.

```
;Layer 2 port representation across a distributed VoQ system
;asic_name is the globally unique name of the ASIC on which the port is present
key            = SYSTEM_PORT|asic_name|ifname
speed          = 1*6DIGIT   ; port line speed in Mbps
system_port_id = 1*6DIGIT   ; globally unique port ID
switch_id      = 1*2DIGIT   ; global switch ID
core_id        = 1*2DIGIT   ; core ID within switch_id
core_port_id   = 1*6DIGIT   ; chip-specific port ID within the core
```

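To make the schema concrete, the Python sketch below models one SYSTEM_PORT entry, renders its key and fields, and enforces the digit bounds from the schema. The class name, the `Ethernet0` interface name, and the rule that key components must not contain `|` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Maximum digit counts from the SYSTEM_PORT schema (1*6DIGIT / 1*2DIGIT).
MAX_DIGITS = {"speed": 6, "system_port_id": 6, "switch_id": 2, "core_id": 2, "core_port_id": 6}


@dataclass(frozen=True)
class SystemPort:
    """One SYSTEM_PORT entry; a hypothetical model, not a SONiC class."""

    asic_name: str
    ifname: str
    speed: int  # port line speed in Mbps
    system_port_id: int
    switch_id: int
    core_id: int
    core_port_id: int

    def key(self) -> str:
        # "|" separates key components, so it cannot appear inside them.
        if "|" in self.asic_name or "|" in self.ifname:
            raise ValueError("asic_name and ifname must not contain '|'")
        return f"SYSTEM_PORT|{self.asic_name}|{self.ifname}"

    def fields(self) -> Dict[str, str]:
        """Return the field/value pairs as strings, checking digit limits."""
        out: Dict[str, str] = {}
        for name, limit in MAX_DIGITS.items():
            text = str(getattr(self, name))
            # isdigit() rejects negative numbers; len() enforces the 1*nDIGIT bound.
            if not text.isdigit() or len(text) > limit:
                raise ValueError(f"{name}={text} does not match 1*{limit}DIGIT")
            out[name] = text
        return out


port = SystemPort("Linecard-1.0", "Ethernet0", 400000, 1, 0, 0, 1)
print(port.key())  # SYSTEM_PORT|Linecard-1.0|Ethernet0
print(port.fields())
```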
The globally unique key in the SYSTEM_PORT table consists of the name of the ASIC instance and the front panel interface name. The ASIC name is part of the key so that agents like syncd can use it as a filtering criterion to select the subset of entities they need to operate on to manage a specific ASIC.

The only restriction on selecting the ifname is that the key `SYSTEM_PORT|asic_name|ifname` must uniquely identify a port in the system.

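The Python sketch below illustrates both points. It filters the table by the ASIC name embedded in the key, and it checks that system_port_id values are globally unique. Because the table is modeled as a dictionary keyed by the SYSTEM_PORT key, key uniqueness is guaranteed by construction. The table contents and function names are hypothetical.

```python
from typing import Dict, Mapping

Table = Mapping[str, Mapping[str, str]]


def entries_for_asic(table: Table, asic_name: str) -> Dict[str, Mapping[str, str]]:
    """Select the SYSTEM_PORT entries whose key names the given ASIC."""
    # The trailing "|" keeps "Linecard-1.1" from also matching "Linecard-1.10".
    prefix = f"SYSTEM_PORT|{asic_name}|"
    return {key: fields for key, fields in table.items() if key.startswith(prefix)}


def check_unique_system_port_ids(table: Table) -> None:
    """Raise ValueError if two entries share a system_port_id."""
    owner: Dict[str, str] = {}
    for key, fields in table.items():
        port_id = fields["system_port_id"]
        if port_id in owner:
            raise ValueError(f"system_port_id {port_id} used by {owner[port_id]} and {key}")
        owner[port_id] = key


table = {
    "SYSTEM_PORT|Linecard-1.1|Ethernet0": {"system_port_id": "2"},
    "SYSTEM_PORT|Linecard-1.10|Ethernet0": {"system_port_id": "3"},
}
check_unique_system_port_ids(table)
print(list(entries_for_asic(table, "Linecard-1.1")))  # ['SYSTEM_PORT|Linecard-1.1|Ethernet0']
```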
### 2.6.3 Inband Ports

Inband ports are required to provide control plane connectivity between forwarding engines. They are connected to the local CPU of the forwarding device on one side and to the internal fabric on the other.

Like front panel ports, every inband port is assigned a System Port ID that is known to all the forwarding devices. Thus, every inband port is reachable from every forwarding engine.

### 2.6.4 Fabric Ports

The provisioning and management of fabric ports is outside the scope of this document and will be covered in a separate proposal.

## 2.7 Orchestration Agent

### 2.7.1 System Port Handling

System port configuration is expected to be completely static and known at system startup.

Based on the value of `connect_chassis_db` in DEVICE_METADATA in Config DB, Orchagent connects to Chassis DB and subscribes to the SYSTEM_PORT table. It uses the list of system ports from the SYSTEM_PORT table to construct the switch attributes needed by the `create_switch` SAI API. In a later phase, system ports could be created directly by making sairedis calls from orchagent.

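The Python sketch below outlines the translation step. It turns SYSTEM_PORT entries into per-port records whose field names are modeled on SAI's `sai_system_port_config_t`, which is consumed through the `SAI_SWITCH_ATTR_SYSTEM_PORT_CONFIG_LIST` switch attribute in the SAI VoQ proposal. The SYSTEM_PORT schema carries no VOQ count, so `num_voq` is a hypothetical parameter here. Sorting by system port ID is also an assumption. Check both against the SAI headers and the orchagent implementation in use.

```python
from dataclasses import dataclass
from typing import List, Mapping

# Hypothetical default: the SYSTEM_PORT schema does not carry a VOQ count.
DEFAULT_NUM_VOQ = 8


@dataclass(frozen=True)
class SystemPortConfig:
    """Field names mirror SAI's sai_system_port_config_t."""

    port_id: int
    attached_switch_id: int
    attached_core_index: int
    attached_core_port_index: int
    speed: int
    num_voq: int


def build_system_port_config_list(
    table: Mapping[str, Mapping[str, str]], num_voq: int = DEFAULT_NUM_VOQ
) -> List[SystemPortConfig]:
    """Convert every SYSTEM_PORT entry into a config record."""
    configs = [
        SystemPortConfig(
            port_id=int(fields["system_port_id"]),
            attached_switch_id=int(fields["switch_id"]),
            attached_core_index=int(fields["core_id"]),
            attached_core_port_index=int(fields["core_port_id"]),
            speed=int(fields["speed"]),
            num_voq=num_voq,
        )
        for fields in table.values()
    ]
    # Sorted by system port ID so the resulting list is deterministic.
    return sorted(configs, key=lambda config: config.port_id)
```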
A system port can be used in lieu of a physical port in several SAI API calls, where relevant. For example, a system port can be added as a VLAN member or as a LAG member. To account for these cases, portsorch will be updated to support sysports.

Portsyncd does not have to support sysports because sysports do not have any associated kernel devices.

# 3 Testing

Test coverage for the distributed VoQ architecture is achieved by extending the existing virtual switch (VS) based SWSS pytest infrastructure.

The distributed switching architecture is represented as multiple VS instances connected to each other, called a Virtual Chassis, in which one instance plays the role of the SSI and the remaining instances play the role of FSIs.

## 3.1 Regression Testing

Existing SWSS pytests can be executed against any of the instances in the virtual chassis to ensure that existing SONiC functionality is not affected while operating in a distributed environment.

## 3.2 Distributed VoQ Functionality

Additional tests that specifically validate distributed VoQ forwarding functionality run only in the virtual chassis environment.

### 3.2.1 test_virtual_chassis.py

This is the top-level test driver that executes test cases against the virtual chassis.

Sysport handling is tested by `test_chassis_sysport`, which validates that:
* System ports can be populated in CHASSIS_DB
* All FSIs can connect to CHASSIS_DB and access sysport state
* Orchagent programs the correct SAI Redis ASIC_DB state to represent the configured sysports

# 4 Future Work

## Dynamic System Ports

Dynamic system port support is required for the following forwarding scenarios:

* Addition of a new forwarding device to an existing running system
* Replacement of a forwarding device with another device of a different hardware SKU (such as replacing a linecard in a chassis slot with a new linecard of a different SKU)

Both of these scenarios can be supported smoothly as long as the global system port numbering scheme is maintained and system ports can be modified without impacting the System Port IDs of the running system.

Support for dynamic system ports requires SAI support for the `create_port` and `remove_port` calls.

### Overview of PortsOrch changes

* Subscribe to the SYSTEM_PORT table in PortsOrch
* React to changes in the SYSTEM_PORT table
* Make the appropriate `create_port` and `remove_port` calls

In addition, forwarding features that depend on system ports need to react to these changes and reprogram the related forwarding plane state, such as routing next hops and LAG membership.

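The Python sketch below covers the first two steps, assuming PortsOrch can compare an old and a new snapshot of the SYSTEM_PORT table. It computes which keys would need a `create_port` call and which a `remove_port` call. It also rejects any update that would change the System Port ID of a port that stays in the running system, or that would leave two ports with the same ID. The snapshot model and the function name are assumptions; a real PortsOrch change would more likely consume incremental table notifications. This sketch only handles additions and removals.

```python
from typing import List, Mapping, Tuple

Table = Mapping[str, Mapping[str, str]]


def plan_system_port_changes(old: Table, new: Table) -> Tuple[List[str], List[str]]:
    """Return (keys_to_create, keys_to_remove) for a SYSTEM_PORT update."""
    # Ports present in both snapshots must keep their System Port ID.
    for key in old.keys() & new.keys():
        if old[key]["system_port_id"] != new[key]["system_port_id"]:
            raise ValueError(f"{key}: system_port_id of a running port must not change")
    # System Port IDs must remain globally unique after the update.
    ids = [fields["system_port_id"] for fields in new.values()]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate system_port_id in the updated SYSTEM_PORT table")
    to_create = sorted(new.keys() - old.keys())
    to_remove = sorted(old.keys() - new.keys())
    return to_create, to_remove
```

Each key in the first list would map to a `create_port` call, and each key in the second list would map to a `remove_port` call. Dependent state such as next hops and LAG membership would then be reprogrammed as described above.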