Do not always crash syncd with event process error, provide SAI API f…#315
Do not always crash syncd with event process error, provide SAI API f…#315jipanyang wants to merge 8 commits intosonic-net:masterfrom
Conversation
…eedback for create/set/remove operations Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
we need to understand the impact of fast reboot because the change mainly impact the orchagent init, so do you have time on how much longer this impact the fast reboot test? |
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
pavel-shirshov
left a comment
There was a problem hiding this comment.
As comments.
P.S.: I feel this feature will slow orchagent down. Have you measured SONiC start time with and without this feature?
lib/inc/sai_redis.h
Outdated
| // there is something wrong and we should fail | ||
| #define CSR_RESPONSE_TIMEOUT (5*60*1000) | ||
|
|
||
| extern sai_status_t redis_get_response(std::string &key, std::string op, sai_object_type_t object_type); |
There was a problem hiding this comment.
Can you please use const and references for the string parameters?
const std::string &key,
const std::string &op
| _In_ sai_bulk_op_error_mode_t mode, | ||
| _Out_ sai_status_t *object_statuses); | ||
|
|
||
| #define sai_return_obj_op_status(object_type) ({ \ |
There was a problem hiding this comment.
I think it's better to make a function.
You could make the function inline.
There was a problem hiding this comment.
sai_return_obj_op_status() is to be used by both syncd and orchagent. sairedis.h is the common file used by both of them.
If it is function (inline or not), since sairedis.h will be included in some cpp files which don't have call to sai_return_obj_op_status(), there will be "function defined but not used" compilation error with current build setting.
There was a problem hiding this comment.
You shouldn't declare the inline as static which I assume is what you did that triggered "function defined but not used".
lib/src/sai_redis_generic_create.cpp
Outdated
| // wait for response | ||
| sai_status_t redis_get_response( | ||
| _In_ std::string &key, | ||
| _In_ std::string op, |
There was a problem hiding this comment.
const and references as above
syncd/syncd.cpp
Outdated
|
|
||
| std::string str_status = sai_serialize_status(status); | ||
|
|
||
| SWSS_LOG_INFO("sending response for %s on %s with status: %s", op.c_str(), key.c_str(), str_status.c_str()); |
There was a problem hiding this comment.
probably DEBUG would be better here?
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
Will collect SONiC start time with and without this feature and post them here. |
|
The exact time taken might be affected by many factors. I did systemctl restart swss with the same configuration and check sairedis.rec for certain mile stones. Without feedback With feedback There were 167 |C| create feedback, 258 |S| set feedback and 123 |R| remove feedback. |
pavel-shirshov
left a comment
There was a problem hiding this comment.
I built the sairedis with this PR and found that SONiC didn't work as expected. orchagent fails on the first API call ->create_switch(). The call doesn't see any data from syncd for timeout time.
lib/src/sai_redis_generic_create.cpp
Outdated
| swss::Select s; | ||
|
|
||
| sai_status_t status; | ||
| std::string request; |
There was a problem hiding this comment.
How request is used here? I see it's used for output only. I don't see any assignment here.
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…e wait time to 30 seconds Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
Link to the issue: #239 |
|
@jipanyang , can we close this PR when #476 is merged? |
|
Closing in favor of #476 |
…eedback for create/set/remove operations
Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com
The change is to provide feedback path to orchagent for libsai redis API create/set/remove operations.
SAI_OBJECT_TYPE_ROUTE_ENTRY has been excluded from the feedback set, one of the reason is the impact on operation latency when there is large amount of route entries.
NotificationProducer/NotificationConsumer channel has been used for feedback, which has lower overhead than Producer/Consumer channel ( about 50%).
create example:
2018-04-25.23:08:40.962598|c|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000978|SAI_HOSTIF_ATTR_TYPE=SAI_HOSTIF_TYPE_NETDEV|SAI_HOSTIF_ATTR_OBJ_ID=oid:0x1000
2018-04-25.23:08:40.965889|C|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000978
set example:
2018-04-25.23:08:40.892350|s|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000977|SAI_HOSTIF_ATTR_VLAN_TAG=SAI_HOSTIF_VLAN_TAG_STRIP
2018-04-25.23:08:40.893075|S|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000977
remove example:
2018-04-26.00:07:40.480202|r|SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x270000000000c5
2018-04-26.00:07:40.481170|R|SAI_STATUS_SUCCESS|SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x270000000000c5
No change to SAI_OBJECT_TYPE_ROUTE_ENTRY:
2018-04-26.00:15:25.165330|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.42.100/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x
2018-04-26.00:15:25.166478|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.55.20/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4
2018-04-26.00:15:25.167817|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.201.217/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0
2018-04-26.00:15:25.172021|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.175.25/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x
2018-04-26.00:15:25.173406|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.2.20.169/32","switch_id":"oid:0x21000000000000","vr":"oid:0x300000000003a"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x