Warm reboot: Orchagent state restore#554
Conversation
4601adc to
2080ddb
Compare
Signed-off-by: Jipan Yang <[email protected]>
Signed-off-by: Jipan Yang <[email protected]>
b18b510 to
17398e8
Compare
|
need new vs test to test warm boot scenario |
orchagent/aclorch.cpp
Outdated
| auto interv = timespec { .tv_sec = COUNTERS_READ_INTERVAL, .tv_nsec = 0 }; | ||
| auto timer = new SelectableTimer(interv); | ||
| auto executor = new ExecutableTimer(timer, this); | ||
| executor->setName("ACL_POLL_TIMER"); |
There was a problem hiding this comment.
why not just change the constructor to set the name.
not sure if it is useful to introduce setName method, since we do not (probably should not) allow change the name.
orchagent/orch.cpp
Outdated
| + "|" + kfvOp(tuple); | ||
| for (auto i = kfvFieldsValues(tuple).begin(); i != kfvFieldsValues(tuple).end(); i++) | ||
| { | ||
| s += "|" + fvField(*i) + ":" + fvValue(*i); |
There was a problem hiding this comment.
do not hardcode the separator.
orchagent/orchdaemon.cpp
Outdated
| getTaskToSync(ts); | ||
| if (ts.size() != 0) | ||
| { | ||
| // TODO: change it to fatal once staged ProducerStateTable/ConsumerStateTable change and |
There was a problem hiding this comment.
For planned warm restart, there is one PR https://github.com/Azure/sonic-swss/pull/562/files for enforcing orchagent to be in certain state before stopping orchagent/swss docker. After orchagent is restored, it should gain back the same state.
But there is open issue about the exact criteria of warm restart readiness. Currently we want to make sure that there is no un-met object dependency in orchagent which means the "SyncMap m_toSync;" of each Consumer is empty. @zhenggen-xu ever expressed concern over this check.
There was a problem hiding this comment.
This check is not necessary reach the desired result. The problem is if some dependencies were never met (e,g, the nexthop is not resolved for some reason for a particular route), which was the case for pre-reboot, but may kill the whole warm reboot process. Looks like we may want to handle those entries differently or we simply do a loose check (print warning etc) and move forward.
There was a problem hiding this comment.
Do we want to allow this scenario (there is unfulfilled dependency) for planned warm restart?
The possible alternative is to add a knob to have two levels of pre-reboot check:
- Allow unfulfilled dependency.
- disallow unfulfilled dependency.
After warm restore, we do the check accordingly.
With no intention to make it a hot topic here and block everything else, I'll change the log level to NOTICE and continue for now.
Please go to #562 for further discussion.
|
|
||
| m_orchList.push_back(gAclOrch); | ||
| m_orchList.push_back(gFdbOrch); | ||
| m_orchList.push_back(mirror_orch); |
There was a problem hiding this comment.
why move mirror orch and acl orch after fdb orch?
There was a problem hiding this comment.
This is to reduce the possible turbulence caused by the dependency relaxation processing during state restore.
mirror orch may use fdb mac/vlan info.
if (!m_portsOrch->getPort(session.neighborInfo.neighbor.alias, session.neighborInfo.port))
and acl orch may check mirror session
"if (!m_pMirrorOrch->getSessionState(m_sessionName, state))"
orchagent/orchdaemon.cpp
Outdated
| syncd_apply_view(); | ||
|
|
||
| // TODO: should be set after port/fdb/arp sync up | ||
| WarmStart::setWarmStartState("orchagent", WarmStart::RECONCILED); |
There was a problem hiding this comment.
how do we sync the port status and fdb notifications?
There was a problem hiding this comment.
Port state sync: #557
FDB state sync: #558
I think @zhenggen-xu is to propose PR for arp sync up.
There was a problem hiding this comment.
Portstate sync and FDB sync functions should be called in this PR, given this PR is dependent on the other two?
neighbor syncd is isolated in a different process, so you don't need call sync function. But I guess we may need check the warmstartstate of neighsyncd before we call syncd_apply_view()?
There was a problem hiding this comment.
I feel we should take a different approach to solve port oper status and fdb sync in orchagent.
Since we have information in asic db, we should just send syncd a command and ask syncd to enumerate all state in asic db and regenerate all the events back to orchagent.
The ground truth is in asic_db.
There was a problem hiding this comment.
We may have more in-depth discussion of port state sync up and fdb sync up in the corresponding PRs. Here is some quick recap:
<1> Current SONiC use NotificationProducer to inform port state change and fdb, which means orchagent will miss the signal if it happens between orchagent shutdown and startup.
Syncd doens't know whether the signal has been lost or not. Also that notification event is triggered from libsai/SDK.
<2> In current port sync up implementation, we do a get SAI_PORT_ATTR_OPER_STATUS via sai API to have the latest oper status, then update hostif and db accordingly.
<3> For FDB sync up, since syncd does put the latest FDB entry into ASIC DB, yes, we are retrieving the latest FDB from ASIC.
Surely there is some other alternative, like change the Notification mechanism to some db data persistent channel. Under current framework, the above approaches have proven to be working and stable. I should have put more comments to the port state and fdb sync up implementation.
syncd_apply_view() is called after state restore, I guess that could make the life of syncd view comparison a little easier, but it is up to how you want view comparison to work. The libsai redis idempotent API solution doesn't have dependency on syncd_apply_view().
There was a problem hiding this comment.
we can ask syncd to pause sending notification when we are upgrading.
it is better to ask syncd to re-send the previous view to us. we do not orchagent to scan the asic view because the format might change and it can break orchagent code. orchagent should not look at the asic db.
Can you modify you fdb sync and port sync to syncd?
There was a problem hiding this comment.
For port state sync, I don't think there is need to make the solution so complex.
FDB format could be a valid concern, but we should alway be careful when making format change.
My understanding of SONiC design about docker is to decouple the services and make them more independent from each other. But I'm seeing the intent to couple swss and syncd tightly together, what is the real purpose behind this? Why were syncd and orchagent put into different dockers at the first place?
Signed-off-by: Jipan Yang <[email protected]>
…mpToSyncTasks() Signed-off-by: Jipan Yang <[email protected]>
orchagent/orchdaemon.cpp
Outdated
| o->doTask(); | ||
| } | ||
|
|
||
| warmRestoreValidation(); |
There was a problem hiding this comment.
Do we need check the return value of this function and do things needful?
There was a problem hiding this comment.
Yes, once we reach agreement what should be checked before reboot, and enforce the validation after restore.
orchagent/orchdaemon.cpp
Outdated
| syncd_apply_view(); | ||
|
|
||
| // TODO: should be set after port/fdb/arp sync up | ||
| WarmStart::setWarmStartState("orchagent", WarmStart::RECONCILED); |
There was a problem hiding this comment.
Portstate sync and FDB sync functions should be called in this PR, given this PR is dependent on the other two?
neighbor syncd is isolated in a different process, so you don't need call sync function. But I guess we may need check the warmstartstate of neighsyncd before we call syncd_apply_view()?
orchagent/orchdaemon.cpp
Outdated
| getTaskToSync(ts); | ||
| if (ts.size() != 0) | ||
| { | ||
| // TODO: change it to fatal once staged ProducerStateTable/ConsumerStateTable change and |
There was a problem hiding this comment.
This check is not necessary reach the desired result. The problem is if some dependencies were never met (e,g, the nexthop is not resolved for some reason for a particular route), which was the case for pre-reboot, but may kill the whole warm reboot process. Looks like we may want to handle those entries differently or we simply do a loose check (print warning etc) and move forward.
…or now. Signed-off-by: Jipan Yang <[email protected]>
Signed-off-by: Jipan Yang <[email protected]>
orchagent/orchdaemon.cpp
Outdated
| auto *c = (Executor *)s; | ||
| c->execute(); | ||
|
|
||
| if (restored) |
There was a problem hiding this comment.
if [](start = 8, length = 2)
select() is used for normal processing, and it is not for warm start, because all the data is already added during bake() -> addExistingData().
You can extract the warm start logic and move before the normal processing while-loop. #Closed
There was a problem hiding this comment.
Right. Previously configDb data was retrieved here, it is not needed any more. pre-shutdown warm restart check may need to be processed in select(), and possibly arp sync up event too, but that could be done in separate PRs.
Signed-off-by: Jipan Yang <[email protected]>
952fb48 to
a4f5742
Compare
Signed-off-by: Jipan Yang <[email protected]>
a423e7f to
8f1700e
Compare
| Executor(Selectable *selectable, Orch *orch, const string &name) | ||
| : m_selectable(selectable) | ||
| , m_orch(orch) | ||
| , m_name(name) |
There was a problem hiding this comment.
m_name [](start = 10, length = 6)
I believe the new name field is not necessary, and you could always get it from m_consumerMap key. Could you help explorer the possibility?
There was a problem hiding this comment.
Isn't it a good practice to give executor a fixed unique name? For now it is mainly for debugging, later it may also be used for state validation log.
I noticed that there are even cases where empty string given to to map as key in current code, all those should be changed.
There was a problem hiding this comment.
I agree on "even cases where empty string given to to map as key in current code, all those should be changed." and I will fix them. #Resolved
There was a problem hiding this comment.
I confirmed current name confliction is not a bug, such as multiple named as empty string, because the names are unique in a Orch, not globally.
In reply to: 209830967 [](ancestors = 209830967,209784061)
There was a problem hiding this comment.
I didn't say there was bug there, at least it is working. But it is a bad design putting empty string as key.
The purpose of name related change in this PR is to facilitate debugging and logging via a meaningful name for each excecutor including those timers and Notifiers. I found it especially useful during the warm restart testing since very often you need to trace specific executor whenever problem happened.
orchagent/orch.h
Outdated
|
|
||
| string dumpTuple(KeyOpFieldsValuesTuple &tuple); | ||
| void dumpToSyncTasks(vector<string> &ts); | ||
| void execute(); |
There was a problem hiding this comment.
void execute(); [](start = 4, length = 15)
No need to move this line. #Closed
| string Consumer::dumpTuple(KeyOpFieldsValuesTuple &tuple) | ||
| { | ||
| string s = getTableName() + getConsumerTable()->getTableNameSeparator() + kfvKey(tuple) | ||
| + "|" + kfvOp(tuple); |
There was a problem hiding this comment.
| [](start = 18, length = 1)
getTableNameSeparator() could be '|', do you want to change this '|'? #Closed
There was a problem hiding this comment.
This a good question.
We had it fixed to ":" previously. Changed it to getConsumerTable()->getTableNameSeparator() upon comment. I think one of he benefits of doing this is we'll have the same key format as that in redis DB. Any other idea?
orchagent/orch.cpp
Outdated
|
|
||
| void Consumer::dumpToSyncTasks(vector<string> &ts) | ||
| { | ||
| for (auto &tm :m_toSync) |
There was a problem hiding this comment.
Add one more blank after ':'
orchagent/orch.cpp
Outdated
| return s; | ||
| } | ||
|
|
||
| void Consumer::dumpToSyncTasks(vector<string> &ts) |
There was a problem hiding this comment.
dumpToSyncTasks [](start = 15, length = 15)
dumpTuple() and dumpPendingTasks() are not well designed interfaces.
I agree the observability is important.
There was a problem hiding this comment.
dumpTuple() is existing function name. It is refactored so the common function may be used in multiple places. Any specific issue/suggestion?
There was a problem hiding this comment.
orchagent/main.cpp
Outdated
| #include "saihelper.h" | ||
| #include "notifications.h" | ||
| #include <signal.h> | ||
| #include <warm_restart.h> |
There was a problem hiding this comment.
#include "warm_restart.h" #Closed
orchagent/intfsorch.cpp
Outdated
| if (alias == "lo") | ||
| { | ||
| addIp2MeRoute(ip_prefix); | ||
| // set request for lo may come after warm start restore |
There was a problem hiding this comment.
Is it related to warm start? If not, let's do it in another PR. #Closed
There was a problem hiding this comment.
It is related to warm start. It is noticed that sometime IP data for lo comes after warm start.
There was a problem hiding this comment.
The logic is also applied to cold start. Are you fixing a bug also exists in cold start? if yes, we prefer a separate PR just for it.
In reply to: 209787586 [](ancestors = 209787586)
orchagent/intfsorch.cpp
Outdated
| { | ||
| if (alias == "lo") | ||
| { | ||
| m_syncdIntfses.erase(alias); |
orchagent/fdborch.cpp
Outdated
| } | ||
|
|
||
| // we already have such entries | ||
| if (m_entries.count(update.entry) != 0) |
There was a problem hiding this comment.
count [](start = 22, length = 5)
find(). Actually the if-else below is checking duplication. #Closed
| auto executor = new ExecutableTimer(m_timer.get(), this); | ||
| Orch::addExecutor("CRM_COUNTERS_POLL", executor); | ||
| // The CRM stats needs to be populated again | ||
| m_countersCrmTable->del(CRM_COUNTERS_TABLE_KEY); |
There was a problem hiding this comment.
m_countersCrmTable->del(CRM_COUNTERS_TABLE_KEY); [](start = 4, length = 48)
If it is applied to cold start, let's do it in another PR. #Closed
There was a problem hiding this comment.
For cold start, no data exists for CRM stats table, so it has no effect.
Signed-off-by: Jipan Yang <[email protected]>
|
test vs fails, can you fix? |
|
retest this please |
* Orchagent state restore Signed-off-by: Jipan Yang <[email protected]> * Adpat to the new warm reboot schema Signed-off-by: Jipan Yang <[email protected]> * Set Executor name at constructor phase Signed-off-by: Jipan Yang <[email protected]> * Use common Consumer::dumpTuple() for Orch::recordTuple() and Orch::dumpToSyncTasks() Signed-off-by: Jipan Yang <[email protected]> * Change the validation of pending task after restore to NOTICE level for now. Signed-off-by: Jipan Yang <[email protected]> * Remove unused function call, and update comment format Signed-off-by: Jipan Yang <[email protected]> * Move orchagent state restore and sync up out of main select loop. Signed-off-by: Jipan Yang <[email protected]> * Remove the extra round of data draining processing which is not needed. Signed-off-by: Jipan Yang <[email protected]> * Address review comments Signed-off-by: Jipan Yang <[email protected]>
Signed-off-by: Jipan Yang [email protected]
What I did
State restore for Orchagent warm start
It includes basic after restore orchagent state validation: No un-met object dependency in orchagent.
Why I did it
How I verified it
VS testing. All existing vs test cases pass.
Details if related
Has dependency on:
sonic-net/sonic-swss-common#211
sonic-net/sonic-swss-common#214
#546
#547
#551
Follow up PRs planned:
<1> Orchagent pre-warm-restart validation.
This is to have a deterministic known state before warm restart, otherwise state restore will be a best effort processing.
<2> state sync up
dynamic data like port state, fdb, arp may change during the restart window.
<3> Warm restart phase state data.
This is to store basic warm restart state data as in #553
And more debug information should be provided like exact list of postponed tasks/executor during restore.