[watermarkorch] add watermarkorch, extend queue and pg counters with wat…#629

Merged
lguohan merged 5 commits into sonic-net:master from mykolaf:wm on Nov 23, 2018

@mykolaf
Collaborator

@mykolaf mykolaf commented Sep 26, 2018

…ermark stats

What I did

  • Add new orch - watermarkorch
  • Add new flex counter groups - QUEUE_WATERMARK, PG_WATERMARK
  • Add new queue & pg flex plugins - watermark_queue.lua, watermark_pg.lua

Why I did it
This is new code for the watermarks feature.
High Level Design Document
How I verified it
Manually verified on a Mellanox switch.
Details if related

@mykolaf mykolaf force-pushed the wm branch 4 times, most recently from 489ab81 to e2df6cf Compare September 27, 2018 18:40
@lguohan
Contributor

lguohan commented Oct 3, 2018

retest this please

@mykolaf mykolaf force-pushed the wm branch 2 times, most recently from adf20ae to ef7a81f Compare October 12, 2018 09:18
Mykola Faryma added 2 commits October 12, 2018 09:30
@mykolaf mykolaf force-pushed the wm branch 3 times, most recently from b7a9a41 to de001b4 Compare October 16, 2018 11:37
Signed-off-by: Mykola Faryma <[email protected]>
@mykolaf
Collaborator Author

mykolaf commented Oct 16, 2018

Depends on sonic-utilities sonic-net/sonic-utilities#327

@wendani wendani self-requested a review October 16, 2018 16:59
m_telemetryInterval = to_uint<uint32_t>(i.second.c_str());
}
}
}
Contributor

Should there be an else branch for unsupported keys?
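A minimal sketch of the else branch the reviewer asks about, assuming the config arrives as (field, value) pairs and that "interval" is the only supported key. FieldValue, TelemetryConfig and applyTelemetryConfig are hypothetical names, not code from this PR, and std::stoul stands in for swss's to_uint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for swss::FieldValueTuple: (field, value).
using FieldValue = std::pair<std::string, std::string>;

struct TelemetryConfig
{
    uint32_t intervalSec = 120; // illustrative default, not taken from the PR
};

// Applies CONFIG_DB-style fields to cfg. Keys other than the assumed
// "interval" key are logged and counted instead of being silently dropped.
// Returns the number of unsupported keys encountered.
std::size_t applyTelemetryConfig(const std::vector<FieldValue> &fields, TelemetryConfig &cfg)
{
    std::size_t unsupported = 0;
    for (const auto &fv : fields)
    {
        if (fv.first == "interval")
        {
            // std::stoul throws std::invalid_argument on non-numeric input.
            unsigned long value = std::stoul(fv.second);
            if (value > std::numeric_limits<uint32_t>::max())
            {
                throw std::out_of_range("telemetry interval does not fit in uint32_t");
            }
            cfg.intervalSec = static_cast<uint32_t>(value);
        }
        else
        {
            // The else branch asked about in the review.
            std::cerr << "Unsupported telemetry config key: " << fv.first << '\n';
            ++unsupported;
        }
    }
    return unsupported;
}
```

Logging and counting, rather than throwing, keeps one misspelled key from blocking the rest of the entry.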

{
counters_stream << delimiter << sai_serialize_ingress_priority_group_stat(it);
delimiter = comma;
}
Contributor

If we add the PG watermark stats here, they will be polled whenever the corresponding "FLEX_COUNTER_TABLE|FLEX_COUNTER_STATUS" is "enabled" in CONFIG_DB. We could take a similar approach to PFCWD and listen for SET_COMMAND and DEL_COMMAND, so that polling of watermark stats can be enabled and disabled at run time. The same applies to the queue watermark stats above.
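A sketch of the PFCWD-style approach suggested above: polling for a watermark group follows every SET/DEL at run time instead of being fixed when the counters are added. Op, WatermarkPollingControl and the "enable"/"disable" status strings are assumptions; in real code the enable/disable step would presumably write the group's status to FLEX_COUNTER_GROUP_TABLE rather than update an in-memory set.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for swss SET_COMMAND / DEL_COMMAND.
enum class Op { Set, Del };

// Tracks which watermark flex counter groups are currently polled, updating
// on every SET/DEL at run time instead of deciding once at start-up.
class WatermarkPollingControl
{
public:
    // status is the FLEX_COUNTER_STATUS value; "enable"/"disable" are assumed spellings.
    void handle(Op op, const std::string &group, const std::string &status = "")
    {
        if (op == Op::Del || status == "disable")
        {
            m_enabled.erase(group);   // stop polling this group
        }
        else if (status == "enable")
        {
            m_enabled.insert(group);  // start polling this group
        }
        // Any other status value is ignored in this sketch.
    }

    bool isPolling(const std::string &group) const
    {
        return m_enabled.count(group) != 0;
    }

private:
    std::set<std::string> m_enabled;
};
```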

#include "timer.h"

extern "C" {
#include "sai.h"
Collaborator

orch.h already includes sai.h.

void doTask(NotificationConsumer& consumer);
void doTask(SelectableTimer &timer);

void init_pg_ids();
Collaborator

Can we follow the same naming format for all member functions?

Collaborator Author

Sure

if (!c_queueStatIds.empty())
{
string str = counterIdsToStr(c_queueStatIds, sai_serialize_queue_stat);
queueFieldValues.emplace_back(STATS_MODE_FIELD, STATS_MODE_READ);
Contributor

For PORT_STATS, QUEUE_STATS, PG_WATERMARK_STATS, and QUEUE_WATERMARK_STATS, STATS_MODE is set in the corresponding group table, while for PFCWD it is set in FLEX_COUNTER_TABLE. Why the difference?

Signed-off-by: Mykola Faryma <[email protected]>

vector<FieldValueTuple> fieldValues;
fieldValues.emplace_back(QUEUE_PLUGIN_FIELD, queueWmSha);
fieldValues.emplace_back(POLL_INTERVAL_FIELD, QUEUE_WATERMARK_FLEX_STAT_COUNTER_POLL_MSECS);
Contributor

@wendani wendani Oct 17, 2018

I will look into the flex counter thread in syncd. Before I get more familiar with it, a general question: each flex counter table has its own polling interval, so how does the flex counter thread in syncd make sure each one is polled on time when its interval expires?

Contributor

I checked the code in syncd. There is one flex counter thread per flex counter table, so it makes sense to me now.
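The per-table thread model described here can be sketched as one worker thread per counter group, each waiting on its own interval. GroupPoller is a hypothetical name and this is not syncd's actual code; it also measures the interval from the end of each poll, which is a simplification.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One worker thread per counter group, each sleeping for its own interval.
class GroupPoller
{
public:
    GroupPoller(std::chrono::milliseconds interval, std::function<void()> pollOnce)
        : m_interval(interval), m_pollOnce(std::move(pollOnce)), m_worker([this] { run(); })
    {
    }

    GroupPoller(const GroupPoller &) = delete;
    GroupPoller &operator=(const GroupPoller &) = delete;

    ~GroupPoller()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();  // wake the worker early instead of waiting out the interval
        m_worker.join();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (!m_stop)
        {
            lock.unlock();
            m_pollOnce();  // read this group's counters
            lock.lock();
            // Wait for this group's interval, or until asked to stop.
            m_cv.wait_for(lock, m_interval, [this] { return m_stop; });
        }
    }

    // Declaration order matters: m_worker must be last so every member it
    // uses is initialized before the thread starts.
    std::chrono::milliseconds m_interval;
    std::function<void()> m_pollOnce;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
    std::thread m_worker;
};
```

Because each group owns its thread, a long interval on one group never delays polling of another.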


m_flex_db = shared_ptr<DBConnector>(new DBConnector(FLEX_COUNTER_DB, DBConnector::DEFAULT_UNIXSOCKET, 0));
m_flexCounterTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_TABLE));
m_flexCounterGroupTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_GROUP_TABLE));
Contributor

A general question: why do the flex counter table and the flex counter group table need to be ProducerTables? Qi mentioned that a ProducerTable is a FIFO queue, so it is used when we care about the order in which Redis requests are queued.

Contributor

@mykolaf We should use ProducerTable for the flex counter table and the flex counter group table; this is what PFCWD uses. I'm just curious about the reason behind it.

Collaborator Author

@wenda The flex counter uses a ConsumerTable, so we use the matching producer. There is no specific reason beyond that; it was simply implemented that way.
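To illustrate the ordering point, here is a toy model (not the swss-common classes) contrasting a hash-style table, where later writes to a key overwrite earlier ones, with a producer queue whose consumer sees every SET and DEL in the order produced. KeyOp, ToyTable and ToyProducerQueue are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Toy models only; these are not the swss-common classes.
struct KeyOp
{
    std::string op;  // "SET" or "DEL"
    std::string key;
    std::string value;
};

// Hash-like table: a later write to the same key overwrites the earlier one,
// so a reader only sees the final state.
class ToyTable
{
public:
    void set(const std::string &key, const std::string &value) { m_data[key] = value; }
    void del(const std::string &key) { m_data.erase(key); }
    const std::map<std::string, std::string> &data() const { return m_data; }

private:
    std::map<std::string, std::string> m_data;
};

// Queue-like producer: every operation is kept, and the consumer pops them
// in the order they were produced.
class ToyProducerQueue
{
public:
    void set(const std::string &key, const std::string &value) { m_ops.push_back({"SET", key, value}); }
    void del(const std::string &key) { m_ops.push_back({"DEL", key, ""}); }

    bool pop(KeyOp &out)
    {
        if (m_ops.empty())
        {
            return false;
        }
        out = m_ops.front();
        m_ops.pop_front();
        return true;
    }

private:
    std::deque<KeyOp> m_ops;
};
```

With the queue, a DEL followed by a SET of the same counter key reaches the consumer as two distinct operations instead of collapsing into one final value.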

@mykolaf mykolaf force-pushed the wm branch 2 times, most recently from cc3f266 to 3881120 Compare October 19, 2018 16:20
@qiluo-msft
Contributor

retest this please

@lguohan
Contributor

lguohan commented Oct 26, 2018

retest this please

@mykolaf
Collaborator Author

mykolaf commented Oct 29, 2018

VS test dependencies:

  1. "set counter" infra (sonic-sairedis):
    Add VS support for setting stats via redis DB channel sonic-sairedis#366
  2. watermark CLI:
    [watermarks] add watermarkstat, watermarkcfg and aliases sonic-utilities#327

@wendani could you take a look?

The sonic-utilities pointer probably also needs to be updated to support the WM CLI in docker-vs.

@lguohan
Contributor

lguohan commented Oct 31, 2018

retest this please

@mykolaf
Collaborator Author

mykolaf commented Nov 2, 2018

@lguohan All 3 watermark tests passed, but some other recently introduced tests failed.

@lguohan
Contributor

lguohan commented Nov 2, 2018

retest this please

@lguohan
Contributor

lguohan commented Nov 2, 2018

@mykolaf, can you sign the CLA?

@mykolaf
Collaborator Author

mykolaf commented Nov 2, 2018

@lguohan That was a bug; I already signed it 8 months ago.
Anyway, it now shows as signed.

vector<FieldValueTuple> pgPortVector;
vector<FieldValueTuple> pgIndexVector;

for (size_t pgIndex = 0; pgIndex < port.m_priority_group_ids.size(); ++pgIndex)
Contributor

A lossy PG does not have SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES, so will the query always return 0?

Collaborator Author

I guess it depends on the SAI implementation. If SAI returns a status other than success, the flex counter marks the counter as unsupported and stops polling it; in that case the value is shown as 'N/A' in the output.
But in a quick test, I get 0 for a lossy PG. Is this different on other platforms?
From my point of view, it seems logical that even a lossy PG has a headroom watermark counter, even with a lossy profile applied. It looks like the structure of the PG counters doesn't depend on whether the PG is lossy or lossless.
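The behaviour described above (stop polling a stat once SAI reports a failure, then print 'N/A') can be sketched as follows. Status stands in for sai_status_t, and WatermarkCounterCache and StatReader are hypothetical names rather than flex counter code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for sai_status_t.
enum class Status { Success, NotSupported, Failure };

// Reads one stat; returns a status and fills value on success.
using StatReader = std::function<Status(const std::string &stat, uint64_t &value)>;

class WatermarkCounterCache
{
public:
    void poll(const std::string &stat, const StatReader &read)
    {
        if (m_unsupported.count(stat))
        {
            return;  // already marked unsupported: stop polling it
        }
        uint64_t value = 0;
        if (read(stat, value) != Status::Success)
        {
            m_unsupported.insert(stat);
            m_values.erase(stat);
            return;
        }
        m_values[stat] = value;
    }

    // What a CLI would print: the last value, or "N/A" if none is available.
    std::string display(const std::string &stat) const
    {
        auto it = m_values.find(stat);
        return it == m_values.end() ? "N/A" : std::to_string(it->second);
    }

private:
    std::set<std::string> m_unsupported;
    std::map<std::string, uint64_t> m_values;
};
```

Note that a platform which returns success with 0, as in the lossy PG test above, would show "0" rather than "N/A" under this logic.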

@lguohan
Contributor

lguohan commented Nov 21, 2018

retest this please

2 similar comments
@lguohan
Contributor

lguohan commented Nov 21, 2018

retest this please

@lguohan
Contributor

lguohan commented Nov 23, 2018

retest this please

@lguohan lguohan merged commit b750a4b into sonic-net:master Nov 23, 2018
@mykolaf mykolaf deleted the wm branch February 21, 2019 11:59
m_appDb.get(),
"WATERMARK_CLEAR_REQUEST");
auto clearNotifier = new Notifier(m_clearNotificationConsumer, this, "WM_CLEAR_NOTIFIER");
Orch::addExecutor(clearNotifier);
Contributor

For a user-issued clear, why not clear directly from the CLI instead of signaling through an APPL_DB event channel and delegating the clear to WatermarkOrch? What is the benefit here?

Collaborator Author

This was needed in the draft design of the feature; in the current implementation it doesn't give any advantage.
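For reference, the orch side of the clear-through-a-channel design might look like the sketch below. The op strings "PERSISTENT" and "USER" and all names here are assumptions; the PR's actual message format is not shown in this thread. The periodic view is left alone on the assumption that the orch's timer resets it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using WmTable = std::map<std::string, uint64_t>;  // stat name -> bytes

// The three watermark views maintained by the Lua plugins in this PR.
struct WatermarkViews
{
    WmTable periodic;
    WmTable persistent;
    WmTable user;
};

// Handles a clear request received on a notification channel such as
// "WATERMARK_CLEAR_REQUEST". The op strings are hypothetical.
bool handleClearRequest(WatermarkViews &views, const std::string &op)
{
    if (op == "PERSISTENT")
    {
        views.persistent.clear();
    }
    else if (op == "USER")
    {
        views.user.clear();
    }
    else
    {
        return false;  // unknown request: leave all views untouched
    }
    return true;
}
```

Whether this runs in the orch or directly in the CLI, the effect on the stored views is the same, which matches the author's point that the indirection no longer adds anything.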

if (pg_shared_wm) then
redis.call('HSET', periodic_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', periodic_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(periodic_shared_wm)) or pg_shared_wm)
redis.call('HSET', persistent_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', persistent_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(persistent_shared_wm)) or pg_shared_wm)
redis.call('HSET', user_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', user_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(user_shared_wm)) or pg_shared_wm)
Contributor

The ternary-style idiom is neat, but we do not need to issue a Redis HSET command if the new value is no greater than the historical high. HSET is an expensive operation in the context of a syncd flex counter thread.

Collaborator Author

I agree
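The reviewer's suggestion, sketched in C++ rather than Lua for consistency with the other examples: compare the polled value against each stored high and write only when it is strictly greater. updateIfHigher and applySample are hypothetical names; a Lua version would wrap each redis.call('HSET', ...) in the same comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using WmTable = std::map<std::string, uint64_t>;  // field -> historical high

// Records `current` as the new high for `field` only if it exceeds the stored
// value (or none is stored). Returns true when a write (an HSET in the Lua
// plugin) is actually needed, false when it can be skipped.
bool updateIfHigher(WmTable &table, const std::string &field, uint64_t current)
{
    auto it = table.find(field);
    if (it != table.end() && current <= it->second)
    {
        return false;  // not a new high: skip the write
    }
    table[field] = current;
    return true;
}

// Applies one polled value to the periodic, persistent and user views and
// returns how many writes were issued.
int applySample(WmTable &periodic, WmTable &persistent, WmTable &user,
                const std::string &field, uint64_t current)
{
    int writes = 0;
    writes += updateIfHigher(periodic, field, current);
    writes += updateIfHigher(persistent, field, current);
    writes += updateIfHigher(user, field, current);
    return writes;
}
```

In steady state most samples are below the stored highs, so most polls issue no writes at all.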

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Need to mirror the neighbor solicitation packets - type 135.

Signed-off-by: Shu0T1an ChenG <[email protected]>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
…onic-net#629)

* add a README to tests directory to describe how to run 'make check'

* small spelling and grammar fix

Co-authored-by: Syd Logan <[email protected]>
yejianquan added a commit that referenced this pull request Sep 3, 2025
…for-port-cntrs

Temporarily disable bulk init requests for PORT counters

Add temporary fix for https://github.com/aristanetworks/sonic-qual.msft/issues/655

This forces each port to be processed individually, avoiding capability mismatch between different ports in bulk requests

What I did
Temporarily disable bulk init requests for PORT counters.

Why I did it
When swss requests bulk initialization of PORT counters, the corresponding component in sonic-sairedis assumes all the requested ports support the same attributes. This is not the case for the SFP/mgmt ports of Arista switches, and it was causing these ports to be skipped completely. This is supposed to be fixed by Azure/sonic-sairedis.msft#73, but that change needs rework because it breaks non-Broadcom platforms.

So, we're temporarily disabling this flow.
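A hypothetical model of the failure and the workaround described above, assuming the bulk path applies one counter list to every port and drops any port that lacks one of those counters, and that per-port init registers whichever requested counters the port supports. bulkInit, perPortInit and the port and counter names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

using CounterSet = std::set<std::string>;

// Bulk model: one counter list for all ports; a port missing any counter in
// that list is dropped entirely.
std::map<std::string, CounterSet> bulkInit(const std::map<std::string, CounterSet> &supported,
                                           const CounterSet &requested)
{
    std::map<std::string, CounterSet> registered;
    for (const auto &port : supported)
    {
        bool hasAll = std::includes(port.second.begin(), port.second.end(),
                                    requested.begin(), requested.end());
        if (hasAll)
        {
            registered[port.first] = requested;
        }
    }
    return registered;
}

// Workaround model: each port is initialized on its own with the requested
// counters it actually supports.
std::map<std::string, CounterSet> perPortInit(const std::map<std::string, CounterSet> &supported,
                                              const CounterSet &requested)
{
    std::map<std::string, CounterSet> registered;
    for (const auto &port : supported)
    {
        CounterSet usable;
        std::set_intersection(port.second.begin(), port.second.end(),
                              requested.begin(), requested.end(),
                              std::inserter(usable, usable.end()));
        registered[port.first] = usable;
    }
    return registered;
}
```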

How I verified it
Verified that countersDB now has all the supported counters for SFP ports.

Details if related
relevant threads: #558, #629, 655

signed-off-by: [email protected]
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
…wat… (sonic-net#629)

* [watermarks] add watermarkorch, extend queue and pg counters with watermark stats

Signed-off-by: Mykola Faryma <[email protected]>

* [watermark] add VS tests

Signed-off-by: Mykola Faryma <[email protected]>

* [watermark] resolve conflict

Signed-off-by: Mykola Faryma <[email protected]>

* address review comments

Signed-off-by: Mykola Faryma <[email protected]>

* update VS test to use set counters and cover the flex counter flow

Signed-off-by: Mykola Faryma <[email protected]>

6 participants