Skip to content

Warm reboot: Orchagent state restore#554

Merged
lguohan merged 12 commits intosonic-net:masterfrom
jipanyang:warm_reboot_collab_4_orchagent
Aug 16, 2018
Merged

Warm reboot: Orchagent state restore#554
lguohan merged 12 commits intosonic-net:masterfrom
jipanyang:warm_reboot_collab_4_orchagent

Conversation

@jipanyang
Copy link
Contributor

Signed-off-by: Jipan Yang [email protected]

What I did
State restore for Orchagent warm start

It includes basic after restore orchagent state validation: No un-met object dependency in orchagent.

Why I did it

How I verified it
VS testing. All existing vs test cases pass.

Details if related

Has dependency on:
sonic-net/sonic-swss-common#211
sonic-net/sonic-swss-common#214
#546
#547
#551

Follow up PRs planned:
<1> Orchagent pre-warm-restart validation.
This is to have a deterministic known state before warm restart, otherwise state restore will be a best effort processing.

<2> state sync up
dynamic data like port state, fdb, arp may change during the restart window.

<3> Warm restart phase state data.
This is to store basic warm restart state data as in #553
And more debug information should be provided like exact list of postponed tasks/executor during restore.

@jipanyang jipanyang force-pushed the warm_reboot_collab_4_orchagent branch from 4601adc to 2080ddb Compare July 30, 2018 07:12
@jipanyang jipanyang changed the title Orchagent state restore Warm reboot: Orchagent state restore Aug 1, 2018
@jipanyang jipanyang force-pushed the warm_reboot_collab_4_orchagent branch from b18b510 to 17398e8 Compare August 9, 2018 01:02
@lguohan
Copy link
Contributor

lguohan commented Aug 9, 2018

need new vs test to test warm boot scenario

auto interv = timespec { .tv_sec = COUNTERS_READ_INTERVAL, .tv_nsec = 0 };
auto timer = new SelectableTimer(interv);
auto executor = new ExecutableTimer(timer, this);
executor->setName("ACL_POLL_TIMER");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just change the constructor to set the name.

not sure if it is useful to introduce setName method, since we do not (probably should not) allow change the name.

+ "|" + kfvOp(tuple);
for (auto i = kfvFieldsValues(tuple).begin(); i != kfvFieldsValues(tuple).end(); i++)
{
s += "|" + fvField(*i) + ":" + fvValue(*i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not hardcode the separator.

getTaskToSync(ts);
if (ts.size() != 0)
{
// TODO: change it to fatal once staged ProducerStateTable/ConsumerStateTable change and
Copy link
Contributor

@lguohan lguohan Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For planned warm restart, there is one PR https://github.com/Azure/sonic-swss/pull/562/files for enforcing orchagent to be in certain state before stopping orchagent/swss docker. After orchagent is restored, it should gain back the same state.

But there is open issue about the exact criteria of warm restart readiness. Currently we want to make sure that there is no un-met object dependency in orchagent which means the "SyncMap m_toSync;" of each Consumer is empty. @zhenggen-xu ever expressed concern over this check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not necessary reach the desired result. The problem is if some dependencies were never met (e,g, the nexthop is not resolved for some reason for a particular route), which was the case for pre-reboot, but may kill the whole warm reboot process. Looks like we may want to handle those entries differently or we simply do a loose check (print warning etc) and move forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow this scenario (there is unfulfilled dependency) for planned warm restart?

The possible alternative is to add a knob to have two levels of pre-reboot check:

  1. Allow unfulfilled dependency.
  2. disallow unfulfilled dependency.
    After warm restore, we do the check accordingly.

With no intention to make it a hot topic here and block everything else, I'll change the log level to NOTICE and continue for now.

Please go to #562 for further discussion.


m_orchList.push_back(gAclOrch);
m_orchList.push_back(gFdbOrch);
m_orchList.push_back(mirror_orch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move mirror orch and acl orch after fdb orch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to reduce the possible turbulence caused by the dependency relaxation processing during state restore.
mirror orch may use fdb mac/vlan info.
if (!m_portsOrch->getPort(session.neighborInfo.neighbor.alias, session.neighborInfo.port))
and acl orch may check mirror session
"if (!m_pMirrorOrch->getSessionState(m_sessionName, state))"

syncd_apply_view();

// TODO: should be set after port/fdb/arp sync up
WarmStart::setWarmStartState("orchagent", WarmStart::RECONCILED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we sync the port status and fdb notifications?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Port state sync: #557
FDB state sync: #558
I think @zhenggen-xu is to propose PR for arp sync up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Portstate sync and FDB sync functions should be called in this PR, given this PR is dependent on the other two?

neighbor syncd is isolated in a different process, so you don't need call sync function. But I guess we may need check the warmstartstate of neighsyncd before we call syncd_apply_view()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should take a different approach to solve port oper status and fdb sync in orchagent.

Since we have information in asic db, we should just send syncd a command and ask syncd to enumerate all state in asic db and regenerate all the events back to orchagent.

The ground truth is in asic_db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have more in-depth discussion of port state sync up and fdb sync up in the corresponding PRs. Here is some quick recap:

<1> Current SONiC use NotificationProducer to inform port state change and fdb, which means orchagent will miss the signal if it happens between orchagent shutdown and startup.

Syncd doens't know whether the signal has been lost or not. Also that notification event is triggered from libsai/SDK.

<2> In current port sync up implementation, we do a get SAI_PORT_ATTR_OPER_STATUS via sai API to have the latest oper status, then update hostif and db accordingly.

<3> For FDB sync up, since syncd does put the latest FDB entry into ASIC DB, yes, we are retrieving the latest FDB from ASIC.

Surely there is some other alternative, like change the Notification mechanism to some db data persistent channel. Under current framework, the above approaches have proven to be working and stable. I should have put more comments to the port state and fdb sync up implementation.

syncd_apply_view() is called after state restore, I guess that could make the life of syncd view comparison a little easier, but it is up to how you want view comparison to work. The libsai redis idempotent API solution doesn't have dependency on syncd_apply_view().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can ask syncd to pause sending notification when we are upgrading.

it is better to ask syncd to re-send the previous view to us. we do not orchagent to scan the asic view because the format might change and it can break orchagent code. orchagent should not look at the asic db.

Can you modify you fdb sync and port sync to syncd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For port state sync, I don't think there is need to make the solution so complex.
FDB format could be a valid concern, but we should alway be careful when making format change.

My understanding of SONiC design about docker is to decouple the services and make them more independent from each other. But I'm seeing the intent to couple swss and syncd tightly together, what is the real purpose behind this? Why were syncd and orchagent put into different dockers at the first place?

o->doTask();
}

warmRestoreValidation();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need check the return value of this function and do things needful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once we reach agreement what should be checked before reboot, and enforce the validation after restore.

syncd_apply_view();

// TODO: should be set after port/fdb/arp sync up
WarmStart::setWarmStartState("orchagent", WarmStart::RECONCILED);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Portstate sync and FDB sync functions should be called in this PR, given this PR is dependent on the other two?

neighbor syncd is isolated in a different process, so you don't need call sync function. But I guess we may need check the warmstartstate of neighsyncd before we call syncd_apply_view()?

getTaskToSync(ts);
if (ts.size() != 0)
{
// TODO: change it to fatal once staged ProducerStateTable/ConsumerStateTable change and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not necessary reach the desired result. The problem is if some dependencies were never met (e,g, the nexthop is not resolved for some reason for a particular route), which was the case for pre-reboot, but may kill the whole warm reboot process. Looks like we may want to handle those entries differently or we simply do a loose check (print warning etc) and move forward.

auto *c = (Executor *)s;
c->execute();

if (restored)
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [](start = 8, length = 2)

select() is used for normal processing, and it is not for warm start, because all the data is already added during bake() -> addExistingData().

You can extract the warm start logic and move before the normal processing while-loop. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Previously configDb data was retrieved here, it is not needed any more. pre-shutdown warm restart check may need to be processed in select(), and possibly arp sync up event too, but that could be done in separate PRs.

@jipanyang jipanyang force-pushed the warm_reboot_collab_4_orchagent branch from 952fb48 to a4f5742 Compare August 13, 2018 03:33
@jipanyang jipanyang force-pushed the warm_reboot_collab_4_orchagent branch from a423e7f to 8f1700e Compare August 13, 2018 03:39
Executor(Selectable *selectable, Orch *orch, const string &name)
: m_selectable(selectable)
, m_orch(orch)
, m_name(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_name [](start = 10, length = 6)

I believe the new name field is not necessary, and you could always get it from m_consumerMap key. Could you help explorer the possibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it a good practice to give executor a fixed unique name? For now it is mainly for debugging, later it may also be used for state validation log.
I noticed that there are even cases where empty string given to to map as key in current code, all those should be changed.

Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on "even cases where empty string given to to map as key in current code, all those should be changed." and I will fix them. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by #575


In reply to: 209784061 [](ancestors = 209784061)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed current name confliction is not a bug, such as multiple named as empty string, because the names are unique in a Orch, not globally.


In reply to: 209830967 [](ancestors = 209830967,209784061)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't say there was bug there, at least it is working. But it is a bad design putting empty string as key.
The purpose of name related change in this PR is to facilitate debugging and logging via a meaningful name for each excecutor including those timers and Notifiers. I found it especially useful during the warm restart testing since very often you need to trace specific executor whenever problem happened.

orchagent/orch.h Outdated

string dumpTuple(KeyOpFieldsValuesTuple &tuple);
void dumpToSyncTasks(vector<string> &ts);
void execute();
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void execute(); [](start = 4, length = 15)

No need to move this line. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

string Consumer::dumpTuple(KeyOpFieldsValuesTuple &tuple)
{
string s = getTableName() + getConsumerTable()->getTableNameSeparator() + kfvKey(tuple)
+ "|" + kfvOp(tuple);
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| [](start = 18, length = 1)

getTableNameSeparator() could be '|', do you want to change this '|'? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a good question.
We had it fixed to ":" previously. Changed it to getConsumerTable()->getTableNameSeparator() upon comment. I think one of he benefits of doing this is we'll have the same key format as that in redis DB. Any other idea?


void Consumer::dumpToSyncTasks(vector<string> &ts)
{
for (auto &tm :m_toSync)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one more blank after ':'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return s;
}

void Consumer::dumpToSyncTasks(vector<string> &ts)
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumpToSyncTasks [](start = 15, length = 15)

dumpTuple() and dumpPendingTasks() are not well designed interfaces.

I agree the observability is important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumpTuple() is existing function name. It is refactored so the common function may be used in multiple places. Any specific issue/suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will investigate and hopefully submit a PR


In reply to: 209787965 [](ancestors = 209787965)

#include "saihelper.h"
#include "notifications.h"
#include <signal.h>
#include <warm_restart.h>
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include "warm_restart.h" #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

if (alias == "lo")
{
addIp2MeRoute(ip_prefix);
// set request for lo may come after warm start restore
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it related to warm start? If not, let's do it in another PR. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is related to warm start. It is noticed that sometime IP data for lo comes after warm start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is also applied to cold start. Are you fixing a bug also exists in cold start? if yes, we prefer a separate PR just for it.


In reply to: 209787586 [](ancestors = 209787586)

{
if (alias == "lo")
{
m_syncdIntfses.erase(alias);
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same. #Closed

}

// we already have such entries
if (m_entries.count(update.entry) != 0)
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count [](start = 22, length = 5)

find(). Actually the if-else below is checking duplication. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

auto executor = new ExecutableTimer(m_timer.get(), this);
Orch::addExecutor("CRM_COUNTERS_POLL", executor);
// The CRM stats needs to be populated again
m_countersCrmTable->del(CRM_COUNTERS_TABLE_KEY);
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_countersCrmTable->del(CRM_COUNTERS_TABLE_KEY); [](start = 4, length = 48)

If it is applied to cold start, let's do it in another PR. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cold start, no data exists for CRM stats table, so it has no effect.

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

@lguohan
Copy link
Contributor

lguohan commented Aug 15, 2018

test vs fails, can you fix?

@lguohan
Copy link
Contributor

lguohan commented Aug 16, 2018

retest this please

@lguohan lguohan merged commit 8bfdea0 into sonic-net:master Aug 16, 2018
@jipanyang jipanyang deleted the warm_reboot_collab_4_orchagent branch February 9, 2019 02:32
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
* Orchagent state restore

Signed-off-by: Jipan Yang <[email protected]>

* Adpat to the new warm reboot schema

Signed-off-by: Jipan Yang <[email protected]>

* Set Executor name at constructor phase

Signed-off-by: Jipan Yang <[email protected]>

* Use common Consumer::dumpTuple() for Orch::recordTuple() and Orch::dumpToSyncTasks()

Signed-off-by: Jipan Yang <[email protected]>

* Change the validation of pending task after restore to NOTICE level for now.

Signed-off-by: Jipan Yang <[email protected]>

* Remove unused function call, and update comment format

Signed-off-by: Jipan Yang <[email protected]>

* Move orchagent state restore and sync up out of main select loop.

Signed-off-by: Jipan Yang <[email protected]>

* Remove the extra round of data draining processing which is not needed.

Signed-off-by: Jipan Yang <[email protected]>

* Address review comments

Signed-off-by: Jipan Yang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants