Skip to content

Warm reboot: Add basic schema for warm start in configDB and stateDB.#553

Merged
lguohan merged 5 commits intosonic-net:masterfrom
jipanyang:warm_reboot_schema
Aug 8, 2018
Merged

Warm reboot: Add basic schema for warm start in configDB and stateDB.#553
lguohan merged 5 commits intosonic-net:masterfrom
jipanyang:warm_reboot_schema

Conversation

@jipanyang
Copy link
Copy Markdown
Contributor

Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com

What I did
Basic schema for warm start
Configuration: Mainly to decide whether to flush the DB data and some timer setting for arp/route sync.

AppDB: Store basic warm restore and sync up state and restart count for each process.

Why I did it

How I verified it

Details if related

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@jipanyang jipanyang changed the title Add basic schema for warm start schema in configDB and application DB. Warm reboot: Add basic schema for warm start schema in configDB and application DB. Aug 1, 2018
key = WARM_RESTART_TABLE:process_name ; process_name is unique process identifier
restart_count = 1*10DIGIT ; a number between 0 and 2147483647,
; count of warm start times.
state = "init" / "restored" / "synced"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add definition for each phase?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored: restore the previous
synced-> reconciled-> reconcile the previous view and new view

;Status: work in progress

key = WARM_RESTART:name ; name is the name of SONiC docker or "system" for global configuration
enable = "true" / "false" ; default value as false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timer to determine when to do the reconciliation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had agreed to remove this docker-specific knob to turn on/off this feature, and make use of a system-wide knob instead, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the agreement is to keep both docker-specific knob and system-wide knob.
The docker-specific knob will mainly be used by developer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to have both knobs which give convenience . Can we be more specific in the document how system wide knob work on conjunction with docker knob?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I like the idea of keeping a docker-specific knob for tshooting purposes.

Copy link
Copy Markdown
Contributor Author

@jipanyang jipanyang Aug 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected behavior:
<1> If system wide knob is enabled, then docker level knob will be ignored.
There is no point doing system warm reboot while specific docker warm restart is disabled.
<2> If system wide knob is disabled, docker level knob will be checked.
The use case is that we may want to docker level upgrade for dockers like bgp and swss.
Will put more info in the schema comment section.


### Configuration files
---------------------------------------------
### WARM\_RESTART\_TABLE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be put into state db.

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@jipanyang
Copy link
Copy Markdown
Contributor Author

@zhenggen-xu @rodnymolina Please help review it.

@jipanyang jipanyang changed the title Warm reboot: Add basic schema for warm start schema in configDB and application DB. Warm reboot: Add basic schema for warm start in configDB and stateDB. Aug 1, 2018

key = WARM_RESTART:name ; name is the name of SONiC docker or "system" for global configuration
enable = "true" / "false" ; default value as false
timer = 1*4DIGIT ; timer to determine when reconciliation is done, in seconds.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment that "not all docker will need the reconciliation timer, for example if BGP docker use GR then as soon as the EOR message is received, the reconciliation is finished."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of points here...

  • The bgp-restartability solution that i've been working on does not rely on EOR functionality -- neither FRR nor Quagga are capable today of generating this message from BGP down to Zebra.

  • Even if we had an EOR message coming to fpmSyncd, we would still need a timer to make sure that the reconciliation process is always executed after a reasonable amount of time has elapsed -- the EOR message could get lost or not being generated due to a bug or diverse reasons.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, as discussed, i'm fine with this docker/feature-specific timer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neighborsyncd will need the timer as well, this timer is not necessarily docker specific, but more process specific.

BTW: Should we keep the table key/fields definition in some file, so we know what exactly the tables are supported?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the timer is more process specific, some update here is necessary since the key here is either "system" or docker name. For now, we don't want to have support for process level warm restart knob(?) .

I guess this swss-schema.md file is a good place to keep the key/fields definitions for all the DBs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a little bit flexibility will help:

timer_name            = 1*255VCHAR           ; timer_name is the name of warm start timer for the whole system or the specific docker, 
                                             ; Ex. "neighbor_timer", "bgp_timer".
                                             ; There could be more than one timer in a docker and the system.
timer_duration        = 1*4DIGIT             ; timer duration, in seconds.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I think we need put the table key/fields definitions in the code (e,g, .h file) as well, so it is synced by all code. This can be done later.

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

enable = "true" / "false" ; Default value as false.
; If "system" warm start knob is true, docker level knob will be ignored.
; If "system" warm start knob is false, docker level knob takes effect.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker knob always override system level knob, that was the use cases we dicussed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky point is the default value is false. If docker knob always override system level knob, when warm restart is enabled at system level, each docker has to be enabled too.

What is the use case of having warm restart enabled at system level, while disabled for certain docker?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if docker knob does not have this attribute, then use system level.

the use case was rodney's, snmp docker snmp warm reboot is available but experimental, so in production he wants to explicitly disable it although system level is enabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodnymolina for snmp case, probably it should alway do cold boot as other dockers like lldp, dhcp_relay until its warm reboot support is stable enough.

If the existence of certain attribute is treated as one condition, then it becomes tri-state variable: true/false/not-exist, to me, it seems error prone. We don't have such case in sonic yet(?).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, more specific knobs take precedence over global configuration. I think we can use this principle here as well.
If we don't like the not-exist case, we need this knob for every docker. For the time-being, we can define the knobs for dockers which have code to support warm reboot, for the ones do not support warm reboot, they do not look at the knobs anyways.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If docker knob always takes precedence over system knob, the not-exist case is asked implicitly, otherwise why system knob is needed if it gets preempted all the time. That means the default value for enable is not-exist.

The knobs are also used to control whether to flush DB.
Ex. based on the current schema, if system knob or one docker knob enabled, we keep the appDB and some other DBs data.

But if docker knob always takes precedence, and it is disabled, should the DB be flushed or not. If DB is not flushed, the specific docker will have to handle the scenarios of DB with existing data though it is supposed to do a cold start. If DB is flushed, how about other dockers?

Better not to support the case of system knob enabled, while docker knob disabled and takes effect. That is one of the reasons for If "system" warm start knob is true, docker level knob will be ignored.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should just simplify it by putting it into DEVICE_METADATA:localhost

warm_restart@: none, system, or a list of individual docker names

for most of use cases, it is system or none. if a user/developer wants to customized, he will specify a list of docker names.

warm_restart_timer_bgp: time_duration
warm_restart_timer_neighbor: time_duration

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few un-documented configurations have been put under DEVICE_METADATA.
If the same set of warm restart configurations are used, squashing them into DEVICE_METADATA:localhost doesn't seem to bring any benefit other than to make the configuration more obfuscated. Also more parameters might be needed for warm restart.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jipanyang As talked, this "system" knob was meant for whole system warm reboot, not a global setting of docker restart, that would lead to a different discussion.

Could you please help to provide (the current) work flow of using these knobs during the shutdown and bringing up of the dockers/system? We can then think whether the whole work flow make sense and keeps everybody on the same page.


enable = "true" / "false" ; Default value as false.
; If "system" warm start knob is true, docker level knob will be ignored.
; If "system" warm start knob is false, docker level knob takes effect.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @zhenggen-xu , a little more clarification about the use cases for system knob and docker knobs.
<1> Upon system reboot, the system enable knob will be checked. If enabled, database data will be preserved, if not, database will be flushed.
No need to check docker level knobs in this case since the whole system is being rebooted .

<2> Upon docker service start, first to check system knob, if enabled, docker warm start should be performed, otherwise system warm reboot will be ruined. If system knob disabled, while docker knob enabled, this is likely an individual docker warm restart request.

Within each application, when the system knob or docker knob enabled, we do further check on the actual warm start state ( restart_count), if no warm start state data available, the database has been flushed, do cold start. Otherwise warm start.

The exact logic has been implemented in WarmStart::checkWarmStart():
https://github.com/Azure/sonic-swss/pull/547/files

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this logic.

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@lguohan lguohan merged commit 674a5e6 into sonic-net:master Aug 8, 2018
@jipanyang jipanyang deleted the warm_reboot_schema branch February 9, 2019 02:32
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
…sonic-net#553)

* Add basic schema for warm start schema in configDB and application DB.

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Move warm start table for process stats to state DB

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Add reconciliation timer entry in configDB warm restart table.

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Update warm restart timer schema

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* There might be more than one timer at system level or individual docker

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants