[watermarkorch] add watermarkorch, extend queue and pg counters with wat… by mykolaf · Pull Request #629 · sonic-net/sonic-swss

mykolaf · 2018-09-26T15:15:52Z

…ermark stats

What I did

Add a new orch: watermarkorch
Add new flex counter groups: QUEUE_WATERMARK, PG_WATERMARK
Add new queue and PG flex plugins: watermark_queue.lua, watermark_pg.lua

Why I did it

This is new code for the watermarks feature.
High Level Design Document

How I verified it

Manually verified on an mlnx switch.

Details if related

lguohan · 2018-10-03T16:29:56Z

retest this please

mykolaf · 2018-10-16T16:03:44Z

Depends on sonic-utilities sonic-net/sonic-utilities#327

wendani · 2018-10-16T16:59:46Z

orchagent/watermarkorch.cpp

+                        m_telemetryInterval = to_uint<uint32_t>(i.second.c_str());
+                    }
+                }
+            }


Should there be an else case for an unsupported key?
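A minimal sketch of what such an else branch could look like. Only `m_telemetryInterval` comes from the diff above; the `"interval"` field name, the `WatermarkConfig` struct, and `applyFields` are hypothetical names used for illustration, and a real orch would more likely log a warning than collect the rejected keys.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the orch state; only the member name
// m_telemetryInterval is taken from the diff under review.
struct WatermarkConfig
{
    uint32_t m_telemetryInterval = 0;
    std::vector<std::string> m_unsupported;  // keys that were rejected
};

// Apply field/value pairs. Anything other than the assumed "interval"
// field reaches the else branch instead of being silently dropped.
inline void applyFields(WatermarkConfig &cfg,
                        const std::vector<std::pair<std::string, std::string>> &fields)
{
    for (const auto &i : fields)
    {
        if (i.first == "interval")  // assumed field name
        {
            cfg.m_telemetryInterval = static_cast<uint32_t>(std::stoul(i.second));
        }
        else
        {
            // A real orch would probably emit a warning log here.
            cfg.m_unsupported.push_back(i.first);
        }
    }
}
```

Calling `applyFields` with one known and one unknown field sets the interval and records the unknown key, so a typo in the configuration becomes visible rather than being ignored.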

wendani · 2018-10-16T22:48:33Z

orchagent/portsorch.cpp

+        {
+            counters_stream << delimiter << sai_serialize_ingress_priority_group_stat(it);
+            delimiter = comma;
+        }


If we add the PG watermark stats here, they will be enabled whenever the corresponding "FLEX_COUNTER_TABLE|FLEX_COUNTER_STATUS" is "enabled" in CONFIG_DB. We could take an approach similar to PFCWD and listen for SET_COMMAND and DEL_COMMAND, so that polling of watermark stats can be enabled and disabled at run time. The same applies to the queue watermark stats above.

syncd already has the logic for listening to FC STATUS in place:
https://github.com/Azure/sonic-sairedis/blob/master/syncd/syncd.cpp#L2767
Watermark polling can also be enabled and disabled via the counterpoll utility:
https://github.com/Azure/sonic-utilities/blob/614dbeec6a0bb99c0020e276f95cfe60bc6a1a60/counterpoll/main.py#L99
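The run-time toggle discussed in this thread can be sketched as a small state machine. This is an illustration only: `WatermarkPollSwitch` is a hypothetical class, and `kSetCommand` / `kDelCommand` are local stand-ins for the SET_COMMAND and DEL_COMMAND names mentioned above, not the definitions from the swss headers.

```cpp
#include <string>

// Local stand-ins for the command names mentioned in the review; the
// string values here are assumptions made for this sketch.
const std::string kSetCommand = "SET";
const std::string kDelCommand = "DEL";

// Hypothetical toggle tracking whether watermark polling is active.
class WatermarkPollSwitch
{
public:
    // Returns true when the operation actually changed the polling state.
    bool handle(const std::string &op)
    {
        bool wanted = m_enabled;
        if (op == kSetCommand)
        {
            wanted = true;   // entry created: start polling
        }
        else if (op == kDelCommand)
        {
            wanted = false;  // entry removed: stop polling
        }
        // Any other operation leaves the state untouched.
        const bool changed = (wanted != m_enabled);
        m_enabled = wanted;
        return changed;
    }

    bool enabled() const { return m_enabled; }

private:
    bool m_enabled = false;
};
```

Returning whether the state changed lets a caller skip redundant work when the same command arrives twice.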

prsunny · 2018-10-16T23:02:40Z

orchagent/watermarkorch.h

+#include "timer.h"
+
+extern "C" {
+#include "sai.h"


orch.h already includes sai.h.

prsunny · 2018-10-16T23:04:08Z

orchagent/watermarkorch.h

+    void doTask(NotificationConsumer& consumer);
+    void doTask(SelectableTimer &timer);
+
+    void init_pg_ids();


Can we follow the same format for all member functions?

wendani · 2018-10-17T02:35:35Z

orchagent/pfcwdorch.cpp

        if (!c_queueStatIds.empty())
        {
            string str = counterIdsToStr(c_queueStatIds, sai_serialize_queue_stat);
+            queueFieldValues.emplace_back(STATS_MODE_FIELD, STATS_MODE_READ);


For PORT_STATS, QUEUE_STATS, PG_WATERMARK_STATS, and QUEUE_WATERMARK_STATS, STATS_MODE is set in the corresponding GROUP_TABLE. For PFCWD, it is set in FLEX_COUNTER_TABLE. Why the difference?

wendani · 2018-10-17T21:49:58Z

orchagent/portsorch.cpp

+
+        vector<FieldValueTuple> fieldValues;
+        fieldValues.emplace_back(QUEUE_PLUGIN_FIELD, queueWmSha);
+        fieldValues.emplace_back(POLL_INTERVAL_FIELD, QUEUE_WATERMARK_FLEX_STAT_COUNTER_POLL_MSECS);


I will look into the flex counter thread in syncd. Until I am better acquainted with it, one general question: each flex counter table has its own polling interval. How does the flex counter thread in syncd make sure each table gets polled and serviced on time when its interval expires?

I checked the code in syncd. There is one flex counter thread per flex counter table, so it makes sense to me now.
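To illustrate why one polling loop per group answers the timing question, here is a simplified model that uses a simulated clock instead of real threads. `GroupPoller` and the intervals in the usage note are hypothetical; the sketch only shows that each group's schedule depends on its own interval and nothing else.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical model of one polling loop; one instance per counter group.
struct GroupPoller
{
    std::string group;
    uint32_t intervalMs;
    uint64_t nextDueMs;
    uint64_t polls = 0;

    GroupPoller(std::string name, uint32_t interval)
        : group(std::move(name)), intervalMs(interval), nextDueMs(interval) {}

    // Advance this group's loop to simulated time nowMs, polling once per
    // elapsed interval. Other groups are never consulted.
    void runUntil(uint64_t nowMs)
    {
        while (nextDueMs <= nowMs)
        {
            ++polls;
            nextDueMs += intervalMs;
        }
    }
};
```

With made-up intervals of 10000 ms for one group and 200 ms for another, running both to 20000 ms of simulated time yields 2 polls and 100 polls respectively, and neither loop delays the other.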

wendani · 2018-10-17T21:59:23Z

orchagent/portsorch.cpp

+
    m_flex_db = shared_ptr<DBConnector>(new DBConnector(FLEX_COUNTER_DB, DBConnector::DEFAULT_UNIXSOCKET, 0));
    m_flexCounterTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_TABLE));
    m_flexCounterGroupTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_GROUP_TABLE));


A general question: why do the flex counter table and the flex counter group table need to be ProducerTable? Qi mentioned that ProducerTable is a FIFO queue, so it is used when we care about the order in which redis requests are queued.

@mykolaf We should use ProducerTable for the flex counter table and the flex counter group table. This is what is used in pfcwd. I am just curious about the reason behind it.

@wenda Flex counter uses ConsumerTable, so we use the matching producer. There is no specific reason for using it; it was just implemented that way.

qiluo-msft · 2018-10-19T21:20:16Z

retest this please

lguohan · 2018-10-26T23:18:45Z

retest this please

mykolaf · 2018-10-29T15:33:18Z

VS test dependencies:

"set counter" infra (sonic-sairedis): Add VS support for setting stats via redis DB channel sonic-sairedis#366
watermark CLI: [watermarks] add watermarkstat, watermarkcfg and aliases sonic-utilities#327

@wendani could you take a look?

(The sonic-utilities pointer probably also needs to be updated in order to support the WM CLI in docker-vs.)

lguohan · 2018-10-31T22:33:23Z

retest this please

mykolaf · 2018-11-02T08:51:40Z

@lguohan all 3 of the watermark tests passed, but some other recently introduced tests failed.

lguohan · 2018-11-02T16:55:13Z

retest this please

lguohan · 2018-11-02T18:45:41Z

@mykolaf , can you sign the cla?

mykolaf · 2018-11-02T19:52:16Z

@lguohan That was some bug; I already signed it 8 months ago.
Anyway, it now shows as signed.

wendani · 2018-11-11T02:10:00Z

orchagent/portsorch.cpp

+    vector<FieldValueTuple> pgPortVector;
+    vector<FieldValueTuple> pgIndexVector;
+
+    for (size_t pgIndex = 0; pgIndex < port.m_priority_group_ids.size(); ++pgIndex)


A lossy PG does not have SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES. So will the query always return 0?

I guess it depends on the SAI implementation. If the SAI returns a status other than success, the flex counter will mark the counter as unsupported and stop polling it. In such cases we will show the value as 'N/A' in the output.
Having run a small test, though, I see that I get 0 for a lossy PG. Is this different on some other platform?
From my point of view, it seems logical that even a lossy PG has a counter for the headroom watermark, even with a lossy profile applied. It looks like the structure of a PG's counters does not depend on whether the PG is lossy or lossless.
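The behaviour described in the reply (a failed read marks the counter unsupported, polling stops, and the output shows 'N/A') can be sketched as follows. `ReadResult` and `CounterView` are hypothetical names; this is not the flex counter code from syncd, only a model of the described outcome.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical result of one read attempt: ok == false models any
// non-success status returned for a counter.
struct ReadResult
{
    bool ok;
    uint64_t value;
};

class CounterView
{
public:
    void record(const std::string &counter, const ReadResult &result)
    {
        if (m_unsupported.count(counter) != 0)
        {
            return;  // polling already stopped for this counter
        }
        if (!result.ok)
        {
            m_unsupported.insert(counter);
            m_values.erase(counter);
            return;
        }
        m_values[counter] = result.value;
    }

    // A counter without a stored value (unsupported or never read) is
    // rendered as "N/A"; a genuine zero is rendered as "0".
    std::string display(const std::string &counter) const
    {
        const auto it = m_values.find(counter);
        if (it == m_values.end())
        {
            return "N/A";
        }
        return std::to_string(it->second);
    }

private:
    std::map<std::string, uint64_t> m_values;
    std::set<std::string> m_unsupported;
};
```

The distinction matters for the lossy PG case above: a platform that returns success with a value of 0 shows "0", while a platform that rejects the query shows "N/A".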

lguohan · 2018-11-21T01:55:50Z

retest this please

lguohan · 2018-11-21T06:11:52Z

retest this please

lguohan · 2018-11-23T03:08:07Z

retest this please

wendani · 2019-04-20T19:44:40Z

orchagent/watermarkorch.cpp

+            m_appDb.get(),
+            "WATERMARK_CLEAR_REQUEST");
+    auto clearNotifier = new Notifier(m_clearNotificationConsumer, this, "WM_CLEAR_NOTIFIER");
+    Orch::addExecutor(clearNotifier);


For a user-issued clear, why not clear directly through the CLI, instead of signaling through an APPL_DB event channel and delegating to WatermarkOrch? What is the benefit here?

This is the design we agreed on.
https://github.com/Azure/SONiC/blob/gh-pages/doc/watermarks_HLD.md

This was needed in the draft design of the feature; in the current implementation it doesn't give any advantage.
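For readers unfamiliar with the pattern being questioned, the sketch below models it with an in-process queue: the CLI side only enqueues a request, and the owning component applies it later. `ClearRequest`, `WatermarkOwner`, and the table and stat strings in the usage note are all hypothetical and are not taken from the PR or the HLD.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Hypothetical request: which table to clear and which statistic.
struct ClearRequest
{
    std::string table;
    std::string stat;
};

class WatermarkOwner
{
public:
    void store(const std::string &table, const std::string &stat, uint64_t value)
    {
        m_values[std::make_pair(table, stat)] = value;
    }

    uint64_t read(const std::string &table, const std::string &stat) const
    {
        const auto it = m_values.find(std::make_pair(table, stat));
        return it == m_values.end() ? 0 : it->second;
    }

    // CLI side: only enqueue the request, never touch the stored values.
    void notify(const ClearRequest &req) { m_pending.push_back(req); }

    // Owner side: drain the channel and apply each clear; returns the count.
    std::size_t processPending()
    {
        std::size_t handled = 0;
        while (!m_pending.empty())
        {
            const ClearRequest req = m_pending.front();
            m_pending.pop_front();
            m_values[std::make_pair(req.table, req.stat)] = 0;
            ++handled;
        }
        return handled;
    }

private:
    std::map<std::pair<std::string, std::string>, uint64_t> m_values;
    std::deque<ClearRequest> m_pending;
};
```

After `notify({"USER", "PG_SHARED"})` the stored value is unchanged until `processPending()` runs, which is the indirection the reviewer is asking about: a single writer owns the data, at the cost of an extra hop.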

wendani · 2019-04-21T06:57:09Z

orchagent/watermark_pg.lua

+    if (pg_shared_wm) then
+        redis.call('HSET', periodic_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES',  periodic_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(periodic_shared_wm)) or pg_shared_wm)
+        redis.call('HSET', persistent_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', persistent_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(persistent_shared_wm)) or pg_shared_wm)
+        redis.call('HSET', user_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', user_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(user_shared_wm)) or pg_shared_wm)


The use of the ternary operator is fancy, but we do not need to issue a redis HSET command if the new value is no greater than the historic high; HSET is an expensive operation in the context of a syncd flex counter thread.
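The suggestion can be sketched in C++ with an in-memory stand-in for the hash, so the number of writes is observable. `FakeHash` and `updateHighWatermark` are hypothetical names; the actual plugin is a Lua script running inside redis, so this only illustrates the compare-before-write logic, not the plugin itself.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for one redis hash; writes are counted
// so the saving is visible.
struct FakeHash
{
    std::unordered_map<std::string, uint64_t> fields;
    unsigned writes = 0;

    void hset(const std::string &field, uint64_t value)
    {
        fields[field] = value;
        ++writes;
    }
};

// Store `current` only when the field is missing or holds a smaller value.
// Returns true when a write was issued.
inline bool updateHighWatermark(FakeHash &h, const std::string &field, uint64_t current)
{
    const auto it = h.fields.find(field);
    if (it != h.fields.end() && it->second >= current)
    {
        return false;  // historic high still stands; skip the write
    }
    h.hset(field, current);
    return true;
}
```

Feeding the samples 100, 80, 100, 150 into one field issues two writes instead of four, and the stored value ends at 150. The same guard would apply to each of the periodic, persistent, and user tables independently.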

…wat… (sonic-net#629) * [watermarks] add watermarkorch, extend queue and pg counters with watermark stats Signed-off-by: Mykola Faryma <[email protected]> * [watermark] add VS tests Signed-off-by: Mykola Faryma <[email protected]> * [watermark] resolve conflict Signed-off-by: Mykola Faryma <[email protected]> * address review comments Signed-off-by: Mykola Faryma <[email protected]> * update VS test to use set counters and cover the flex counter flow Signed-off-by: Mykola Faryma <[email protected]>

mykolaf force-pushed the wm branch 4 times, most recently from 489ab81 to e2df6cf Compare September 27, 2018 18:40

mykolaf force-pushed the wm branch 2 times, most recently from adf20ae to ef7a81f Compare October 12, 2018 09:18

Mykola Faryma added 2 commits October 12, 2018 09:30

[watermarks] add watermarkorch, extend queue and pg counters with wat…

d188009

…ermark stats Signed-off-by: Mykola Faryma <[email protected]>

[watermark] add VS tests

7209b8b

Signed-off-by: Mykola Faryma <[email protected]>

mykolaf force-pushed the wm branch 3 times, most recently from b7a9a41 to de001b4 Compare October 16, 2018 11:37

[watermark] resolve conflict

441c0db

Signed-off-by: Mykola Faryma <[email protected]>

mykolaf force-pushed the wm branch from de001b4 to 441c0db Compare October 16, 2018 12:26

wendani self-requested a review October 16, 2018 16:59

wendani reviewed Oct 16, 2018

prsunny reviewed Oct 16, 2018

wendani reviewed Oct 17, 2018

address review comments

3881120

Signed-off-by: Mykola Faryma <[email protected]>

mykolaf force-pushed the wm branch from 221aa9e to 3881120 Compare October 17, 2018 11:08

wendani reviewed Oct 17, 2018

mykolaf force-pushed the wm branch 2 times, most recently from cc3f266 to 3881120 Compare October 19, 2018 16:20

wendani approved these changes Oct 20, 2018

mykolaf force-pushed the wm branch from 0335bce to 9003f06 Compare October 29, 2018 16:15

mykolaf force-pushed the wm branch from 9003f06 to fb14c66 Compare October 31, 2018 16:51

update VS test to use set counters and cover the flex counter flow

ec53952

Signed-off-by: Mykola Faryma <[email protected]>

mykolaf force-pushed the wm branch from fb14c66 to ec53952 Compare November 2, 2018 19:50

stcheng added the Enhancement ➕ label Nov 3, 2018

wendani reviewed Nov 11, 2018

lguohan merged commit b750a4b into sonic-net:master Nov 23, 2018

mykolaf deleted the wm branch February 21, 2019 11:59

wendani reviewed Apr 20, 2019

wendani reviewed Apr 21, 2019

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022

[neighbor_advertiser]: Change the ICMPv6 type to 135 (sonic-net#629)

6e2b1bf

Need to mirror the neighbor solicitation packets - type 135. Signed-off-by: Shu0T1an ChenG <[email protected]>
