Warm reboot: store FDB entries and warm start#615
Conversation
|
where do you use refreshFdbEntries? #Resolved |
jipanyang
left a comment
There was a problem hiding this comment.
How are the FDB entries retstored? It looks FDB entries are saved in stateDB.
Could you separate code restructuring with the function implementation in general?
There are quite some changes not related to FDB restore.
orchagent/fdborch.cpp
Outdated
| } | ||
|
|
||
| void FdbOrch::update(sai_fdb_event_t type, const sai_fdb_entry_t* entry, sai_object_id_t bridge_port_id) | ||
| void FdbOrch::refreshFdbEntries() |
There was a problem hiding this comment.
If it is for FDB reconciliation, separate PR should be prepared.
FDB reconciliation requires stale entry removal, which may be done in syncd implicitly or explicitly in orchagent. #Resolved
There was a problem hiding this comment.
we plan to this reconciliation by disabling hw learn during warm-reboot on the asic. #Resolved
|
To answer @jipanyang's question
The function refreshFdbEntries() will restore FDB from StateDB. The real actions are actually deserializing and update observing orchs. |
|
@qiluo-msft I didn't find the place where refreshFdbEntries() is called. Mixing code refactoring with new feature implementation makes it pretty hard to do code review. @lguohan "we plan to this reconciliation by disabling hw learn during warm-reboot on the asic", in this case, unplanned warm restart won't work. swss won't be able to signal asic for unplanned restart. #Resolved |
|
@jipanyang , unplanned restart is not a goal here. It is very difficult to recover the system start from a crashed scenario. #Resolved |
|
@qiluo-msft , please indicate us to do the code review when you use the refreshfdbentries, both jipan and me asked the same question. not much to review for now. #Resolved |
4e6dd87 to
1be9cfc
Compare
|
In latest iteration, all warm start logic is moved into bake(), which called by orchdaemon. In reply to: 420544357 [](ancestors = 420544357) |
|
In latest iteration, all warm start logic is moved into bake(), which called by orchdaemon. Refactoring removed. In reply to: 421503216 [](ancestors = 421503216) |
|
Please help review and meanwhile I will fix the unit test. #Closed |
| } | ||
| WarmStart::setWarmStartState("orchagent", WarmStart::RESTORED); | ||
| return true; | ||
| return ts.empty(); |
There was a problem hiding this comment.
true is 1. "return !ts.empty();" or do further update? #Resolved
There was a problem hiding this comment.
There was a problem hiding this comment.
Got it, thanks #Resolved
| std::vector<FieldValueTuple> fvs; | ||
| fvs.push_back(FieldValueTuple("port", portName)); | ||
| fvs.push_back(FieldValueTuple("type", "dynamic")); | ||
| m_fdbStateTable.set(key, fvs); |
There was a problem hiding this comment.
If FDB entries are saved to stateDB. During system warm reboot, will stateDB be restored? #Resolved
There was a problem hiding this comment.
Do you mean will stateDB in redis persistent during warm reboot? Yes, it should be.
Or do you mean if fdborch will restore from stateDB? Yes, restored in bake().
In reply to: 218964895 [](ancestors = 218964895)
There was a problem hiding this comment.
stateDB will be cleared during state warm reboot. it is a problem. #Resolved
There was a problem hiding this comment.
I will save the fdb entries in state db in the warm-reboot script and recover it in the going up path. @jipanyang , thanks for the catch. #Pending
There was a problem hiding this comment.
Would it be easier if the data is put in a separate table in appDB so no special handling is needed for stateDB? #Resolved
There was a problem hiding this comment.
I agree easier because of the implementation detail. The design of stateDB is for SWSS state, and applDB is very old which is originally designed for inter-process communication between SWSS applications. So I made the decision to store in stateDB.
BTW, there is already a FDB_TABLE (ProviderStateTable) in applDB for user custom FDB.
In reply to: 218993140 [](ancestors = 218993140)
| [], | ||
| [("SAI_FDB_ENTRY_ATTR_TYPE", "SAI_FDB_ENTRY_TYPE_DYNAMIC"), | ||
| ("SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID", iface_2_bridge_port_id["Ethernet64"]), | ||
| ] |
There was a problem hiding this comment.
Was the entry removed during orchagent restart? If not, probably other check like CRM counter value might be better. #Resolved
There was a problem hiding this comment.
If you are asking entry in ASIC DB, no. orchagent warm start will create dynamic fdb, and syncd apply view will match it with old view, and keep it.
I will add CRM test.
In reply to: 219020615 [](ancestors = 219020615)
|
|
||
| finally: | ||
| # disable warm restart | ||
| # TODO: use cfg command to config it |
There was a problem hiding this comment.
"config warm_restart enable swss" is available. #Pending
There was a problem hiding this comment.
I am following tests in test_warm_reboot.py. I will update later if those tests updated.
In reply to: 219020740 [](ancestors = 219020740)
tests/test_fdb.py
Outdated
|
|
||
| try: | ||
| # restart orchagent | ||
| dvs.runcmd(['sh', '-c', 'supervisorctl restart orchagent']) |
There was a problem hiding this comment.
Would it be better to do swss start/stop? If it is for warm restart testing, test_warm_reboot.py might be a good place for all related test cases.
# start processes in SWSS
def start_swss(dvs):
dvs.runcmd(['sh', '-c', 'supervisorctl start orchagent; supervisorctl start portsyncd; supervisorctl start intfsyncd; \
supervisorctl start neighsyncd; supervisorctl start intfmgrd; supervisorctl start vlanmgrd; \
supervisorctl start buffermgrd; supervisorctl start arp_update'])
# stop processes in SWSS
def stop_swss(dvs):
dvs.runcmd(['sh', '-c', 'supervisorctl stop orchagent; supervisorctl stop portsyncd; supervisorctl stop intfsyncd; \
supervisorctl stop neighsyncd; supervisorctl stop intfmgrd; supervisorctl stop vlanmgrd; \
supervisorctl stop buffermgrd; supervisorctl stop arp_update'])
``` #Resolved
There was a problem hiding this comment.
I change it to restart swss.
It is also for fdb test, so I keep it here.
In reply to: 219020965 [](ancestors = 219020965)
| string portName = port.m_alias; | ||
|
|
||
| // ref: https://github.com/Azure/sonic-swss/blob/master/doc/swss-schema.md#fdb_table | ||
| string key = "Vlan" + to_string(vlan_id) + ":" + mac.to_string(); |
There was a problem hiding this comment.
: [](start = 48, length = 1)
should this be '|' #Resolved
There was a problem hiding this comment.
Currently there is a trick. All entries stored in StateDB FDB_TABLE will be fed into ApplDB during warm starting. So I keep them the same pattern.
In reply to: 219357862 [](ancestors = 219357862)
There was a problem hiding this comment.
I still feel it is more convenient to save the data in appDB using a new table like ASIC_FDB_TABLE, so to eliminate all the exception handling (stateDB save/restore during system warm reboot, and the special ":" separator in stateDB).
|
|
||
| size_t refilled = consumer->refillToSync(&m_fdbStateTable); | ||
| SWSS_LOG_NOTICE("Add warm input FDB State: %s, %zd", APP_FDB_TABLE_NAME, refilled); | ||
| return true; |
There was a problem hiding this comment.
does this call fdb notification? here the consumer is the app db? how can the fdb entries in state db got translated into fdb notificastions? #Resolved
There was a problem hiding this comment.
- Are you asking syncd receive this fdb entry and send orchagent notification? No.
- Yes, the consumer is the app db
- Never translated.
In reply to: 219421390 [](ancestors = 219421390)
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
…sumer Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
5953a91 to
9ffca45
Compare
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Qi Luo <[email protected]>
Signed-off-by: Guohan Lu <[email protected]>
* Refactor Signed-off-by: Qi Luo <[email protected]> * Refactor Signed-off-by: Qi Luo <[email protected]> * Refactor Signed-off-by: Qi Luo <[email protected]> * Refactor Signed-off-by: Qi Luo <[email protected]> * Store fdb notification immediately after popping from NotificationConsumer Signed-off-by: Qi Luo <[email protected]> * Add syncUpFdb() Signed-off-by: Qi Luo <[email protected]> * Refactor Signed-off-by: Qi Luo <[email protected]> * Add vlan ping test Signed-off-by: Qi Luo <[email protected]> * Refine test: verify AsicDB and StateDB on fdb entries Signed-off-by: Qi Luo <[email protected]> * Refactor test * Add FDB state sync up and test Signed-off-by: Qi Luo <[email protected]> * Revert many refactoring, and store FDB notification with port/vlan info Signed-off-by: Qi Luo <[email protected]> * Fix test Signed-off-by: Qi Luo <[email protected]> * Refine macro name Signed-off-by: Qi Luo <[email protected]> * Add test on CRM counters Signed-off-by: Qi Luo <[email protected]> * Restart swss in test Signed-off-by: Qi Luo <[email protected]> * Fix bugs: increase iterator before erase, no double dec CRM counter Signed-off-by: Qi Luo <[email protected]> * Fix test Signed-off-by: Qi Luo <[email protected]> * Split fdb tests into 2 files Signed-off-by: Qi Luo <[email protected]>
Inspired by #558
The motivation of this PR:
Tested: