[QoS] Optimize QoS operations of buffer setting and queue information fetching (#2752)
What I did
Optimize QoS operations:
Cache queue information to avoid fetching it from SAI every time
The cache is created when a queue's information is fetched for the first time
Avoid calling SAI API to fetch queue information if it exists in the cache
Cache will be cleared for the queues of a certain port when the port is removed
Apply buffer items (table: BUFFER_QUEUE, BUFFER_PG, BUFFER_PORT_INGRESS_PROFILE_LIST, BUFFER_PORT_EGRESS_PROFILE_LIST) only if they are updated
Each item in these tables has only one attribute, profile or profile_list, and its value is stored in BufferOrch::m_buffer_type_maps. We can therefore compare the new value with the one stored in the mapping and apply it to SAI only if it differs.
For the BUFFER_QUEUE table, an item may need to be retried when a PFC storm is detected on the queue. A new set, m_partiallyAppliedQueues, is introduced to handle this case.
In any case, if the SAI API call fails, we do not call it again when the buffer table is set with the same attribute value, because it is the user's responsibility to correct the configuration.
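The apply-only-if-changed rule above can be illustrated with a minimal Python sketch. It is not the orchagent code: `BufferApplier`, `sai_set`, and `storm_active` are hypothetical names, `applied` only plays the role of `BufferOrch::m_buffer_type_maps`, and `partially_applied` only plays the role of `m_partiallyAppliedQueues`. The exact retry behavior during a PFC storm is an assumption made for illustration.

```python
class BufferApplier:
    """Hypothetical stand-in for the buffer item handling described above."""

    def __init__(self, sai_set, storm_active):
        self.sai_set = sai_set            # callable(key, profile) -> bool; stands in for the SAI set call
        self.storm_active = storm_active  # callable(key) -> bool; stands in for PFC storm detection
        self.applied = {}                 # plays the role of m_buffer_type_maps
        self.partially_applied = set()    # plays the role of m_partiallyAppliedQueues

    def set_profile(self, key, profile):
        """Return 'skipped', 'retry', 'applied' or 'failed'."""
        unchanged = self.applied.get(key) == profile
        if unchanged and key not in self.partially_applied:
            return "skipped"              # same value already handled: no SAI call
        # Record the value even if the SAI call fails later, so that setting
        # the same value again does not repeat the call.
        self.applied[key] = profile
        if self.storm_active(key):
            self.partially_applied.add(key)   # remember that a retry is still needed
            return "retry"
        self.partially_applied.discard(key)
        return "applied" if self.sai_set(key, profile) else "failed"
```

In this sketch, a key that was deferred because of a storm is applied on the next attempt even though its value has not changed, while a key whose SAI call failed is skipped until a different value is configured.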
Signed-off-by: Stephen Sun [email protected]
Why I did it
In theory, both operations should be fast. However, a mutex in sairedis enforces a critical section for all SAI APIs. If another SAI API call is in progress, e.g. fetching a counter, the new call has to wait for it to finish, which can take additional milliseconds. This occurs frequently when a large number of buffer PG or queue items are being set, and the accumulated time is significant. In this scenario, two threads run in parallel and compete for the critical section:
Syncd main thread in which the buffer PG, queue setting API, or queue info getting API runs,
FlexCounter thread in which the counter is fetched.
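The contention between these two threads can be reproduced in miniature. The following Python sketch is only an analogy, not sairedis code: `sai_lock` stands in for the sairedis mutex, `counter_poll` for the FlexCounter thread, and `timed_set_call` for a buffer set call on the main thread. All names and the 50 ms hold time are assumptions.

```python
import threading
import time

sai_lock = threading.Lock()   # stands in for the mutex guarding all SAI APIs
holding = threading.Event()   # set once the poller is inside the critical section


def counter_poll(hold_seconds):
    """Simulated counter fetch: holds the lock for hold_seconds."""
    with sai_lock:
        holding.set()
        time.sleep(hold_seconds)


def timed_set_call():
    """Simulated set call: returns how long it waited to enter the critical section."""
    holding.wait()                # make sure the poller already owns the lock
    start = time.monotonic()
    with sai_lock:
        pass                      # the real work would happen here
    return time.monotonic() - start


def measure_wait(hold_seconds=0.05):
    holding.clear()
    poller = threading.Thread(target=counter_poll, args=(hold_seconds,))
    poller.start()
    waited = timed_set_call()
    poller.join()
    return waited
```

Calling `measure_wait()` returns a wait close to the poller's hold time, even though the set call itself does no work. Repeated over many buffer PG or queue items, such waits add up, which is why reducing the number of SAI calls helps.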
How I verified it
Mock test
Regression test
Details if related
An example of queue information fetching. For each queue, the information is fetched 5 times, which consumes ~0.25 seconds. With the caching logic, it is fetched from SAI only once.
2023-04-20.18:01:00.634562|a|INIT_VIEW
2023-04-20.18:01:00.635586|A|SAI_STATUS_SUCCESS
--
2023-04-20.18:01:43.290205|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:01:43.331625|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:01:46.420931|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=0
2023-04-20.18:01:46.422113|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:01:56.825879|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=24
2023-04-20.18:01:56.866720|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:02:37.248679|a|APPLY_VIEW
2023-04-20.18:02:37.249435|A|SAI_STATUS_SUCCESS
--
2023-04-20.18:02:54.824194|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:02:54.866955|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:02:54.932174|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:02:54.965082|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
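The caching behavior that collapses the repeated lookups in this trace into a single SAI call can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual implementation: `QueueInfoCache`, `fetch`, and the per-port dictionary layout are hypothetical.

```python
class QueueInfoCache:
    """Hypothetical per-port cache of queue (type, index) information."""

    def __init__(self, fetch):
        self.fetch = fetch    # callable(queue_id) -> (type, index); stands in for the SAI get call
        self.by_port = {}     # port name -> {queue_id: (type, index)}

    def get(self, port, queue_id):
        queues = self.by_port.setdefault(port, {})
        if queue_id not in queues:
            # First request for this queue: fetch once and remember the result.
            queues[queue_id] = self.fetch(queue_id)
        return queues[queue_id]

    def remove_port(self, port):
        # Drop every cached queue entry that belongs to the removed port.
        self.by_port.pop(port, None)


sai_calls = []

def fake_sai_get(queue_id):
    # Pretend SAI lookup that records each call and returns fixed data.
    sai_calls.append(queue_id)
    return ("SAI_QUEUE_TYPE_UNICAST", 4)

cache = QueueInfoCache(fake_sai_get)
for _ in range(5):
    info = cache.get("Ethernet0", "oid:0x15000000000549")
```

After the loop, `sai_calls` contains a single entry, so the five requests cost one simulated SAI call. Calling `cache.remove_port("Ethernet0")` would force a fresh fetch the next time the queue is requested.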