Skip to content

Do not always crash syncd with event process error, provide SAI API f…#315

Closed
jipanyang wants to merge 8 commits intosonic-net:masterfrom
jipanyang:sai_feed_back
Closed

Do not always crash syncd with event process error, provide SAI API f…#315
jipanyang wants to merge 8 commits intosonic-net:masterfrom
jipanyang:sai_feed_back

Conversation

@jipanyang
Copy link
Copy Markdown
Contributor

@jipanyang jipanyang commented Apr 26, 2018

…eedback for create/set/remove operations

Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com

The change is to provide feedback path to orchagent for libsai redis API create/set/remove operations.

SAI_OBJECT_TYPE_ROUTE_ENTRY has been excluded from the feedback set, one of the reason is the impact on operation latency when there is large amount of route entries.

NotificationProducer/NotificationConsumer channel has been used for feedback, which has lower overhead than Producer/Consumer channel ( about 50%).

create example:

2018-04-25.23:08:40.962598|c|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000978|SAI_HOSTIF_ATTR_TYPE=SAI_HOSTIF_TYPE_NETDEV|SAI_HOSTIF_ATTR_OBJ_ID=oid:0x1000
2018-04-25.23:08:40.965889|C|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000978

set example:

2018-04-25.23:08:40.892350|s|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000977|SAI_HOSTIF_ATTR_VLAN_TAG=SAI_HOSTIF_VLAN_TAG_STRIP
2018-04-25.23:08:40.893075|S|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000977

remove example:

2018-04-26.00:07:40.480202|r|SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x270000000000c5
2018-04-26.00:07:40.481170|R|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x270000000000c5

No change to SAI_OBJECT_TYPE_ROUTE_ENTRY:

2018-04-26.00:15:25.165330|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.42.100/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x
2018-04-26.00:15:25.166478|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.55.20/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4
2018-04-26.00:15:25.167817|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.201.217/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0
2018-04-26.00:15:25.172021|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.175.25/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x
2018-04-26.00:15:25.173406|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.20.169/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x

…eedback for create/set/remove operations

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Apr 26, 2018

we need to understand the impact of fast reboot because the change mainly impact the orchagent init, so do you have time on how much longer this impact the fast reboot test?

@lguohan lguohan requested a review from pavel-shirshov April 26, 2018 05:36
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Copy link
Copy Markdown
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments.

P.S.: I feel this feature will slow orchagent down. Have you measured SONiC start time with and without this feature?

// there is something wrong and we should fail
#define CSR_RESPONSE_TIMEOUT (5*60*1000)

extern sai_status_t redis_get_response(std::string &key, std::string op, sai_object_type_t object_type);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use const and references for the string parameters?
const std::string &key,
const std::string &op

_In_ sai_bulk_op_error_mode_t mode,
_Out_ sai_status_t *object_statuses);

#define sai_return_obj_op_status(object_type) ({ \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to make a function.
You could make the function inline.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sai_return_obj_op_status() is to be used by both syncd and orchagent. sairedis.h is the common file used by both of them.

If it is function (inline or not), since sairedis.h will be included in some cpp files which don't have call to sai_return_obj_op_status(), there will be "function defined but not used" compilation error with current build setting.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't declare the inline as static which I assume is what you did that triggered "function defined but not used".

// wait for response
sai_status_t redis_get_response(
_In_ std::string &key,
_In_ std::string op,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const and references as above

syncd/syncd.cpp Outdated

std::string str_status = sai_serialize_status(status);

SWSS_LOG_INFO("sending response for %s on %s with status: %s", op.c_str(), key.c_str(), str_status.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably DEBUG would be better here?

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@jipanyang
Copy link
Copy Markdown
Contributor Author

Will collect SONiC start time with and without this feature and post them here.

@jipanyang
Copy link
Copy Markdown
Contributor Author

jipanyang commented Apr 27, 2018

The exact time taken might be affected by many factors. I did systemctl restart swss with the same configuration and check sairedis.rec for certain mile stones.

Without feedback
from INIT_VIEW to last SAI_OBJECT_TYPE_ROUTER_INTERFACE (10 in total) created.
02:01:32.042732 ---- 02:01:46.824758 Time taken: 14.782 seconds.

With feedback
from INIT_VIEW to last SAI_OBJECT_TYPE_ROUTER_INTERFACE created.
02:13:06.480494 ---- 02:13:21.561555 Time taken: 15.081 seconds

There were 167 |C| create feedback, 258 |S| set feedback and 123 |R| remove feedback.

without_feedback.txt
with_feedback.txt

Copy link
Copy Markdown
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I built the sairedis with this PR and found that SONiC didn't work as expected. orchagent fails on the first API call ->create_switch(). The call doesn't see any data from syncd for timeout time.

swss::Select s;

sai_status_t status;
std::string request;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How request is used here? I see it's used for output only. I don't see any assignment here.

jipanyang added 2 commits May 2, 2018 16:44
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…e wait time to 30 seconds

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@jipanyang
Copy link
Copy Markdown
Contributor Author

Link to the issue: #239

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Jul 16, 2019

@jipanyang , can we close this PR when #476 is merged?

@jipanyang
Copy link
Copy Markdown
Contributor Author

Closing in favor of #476

@jipanyang jipanyang closed this Jul 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants