Skip to content

Added code to start a new service to configure TSA as soon as Linecard comes up, start a timer to configure TSB and configure TSB when the timer expires#206

Merged
gechiang merged 3 commits intoAzure:202205from
saksarav-nokia:saksarav-nokia-tsa-tsb
Apr 5, 2024
Merged

Added code to start a new service to configure TSA as soon as Linecard comes up, start a timer to configure TSB and configure TSB when the timer expires#206
gechiang merged 3 commits intoAzure:202205from
saksarav-nokia:saksarav-nokia-tsa-tsb

Conversation

@saksarav-nokia
Copy link
Contributor

@saksarav-nokia saksarav-nokia commented Jan 4, 2024

Why I did it

The traffic loss is seen when the chassis or Linecard is rebooted

How I did it

  1. Start a service startup_tsa_tsb.service when the IMM comes up
  2. This service configures TSA and starts a timer (currently hardcoded to 900secs in device//startup-tsa-tsb.conf
  3. When the timer expires, TSB is configured
  4. Enhanced "sudo TSC" command output to indicate TSB pending with the remaining timer value and service name if the service is running
  5. If the operator issues "sudo TSA" explicitly while the startup_tsa_tsb.service is running, service will be stopped.

How to verify it

  1. Verified that the service is started only in IMM
  2. Verified that the service runs only if device//startup-tsa-tsb.conf file exists
  3. Verified that the TSA is configured and routes are not advertised to eBGP neighbors till TSB timer expires and TSB is configured.
  4. "sudo TSC" shows the TSB Pending with remaining timer interval
  5. Service is stopped when the operator issues "sudo TSA"

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

Copy link
Contributor

@arlakshm arlakshm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

from threading import Timer
import os.path

def getPlatform():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reused the existing function

platform = (subprocess.check_output(['sonic-cfggen', '-d', '-v', platform_key.replace('"',"'")]).strip()).decode()
return platform

def getNumAsics():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reused the existing function

return line.split('=')[1].strip()
return 0

def getTsbTimerInterval():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please use snake case for functions names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the function names to use snake case

def getTsaConfig(asic_ns):
tsa_config = 'BGP_DEVICE_GLOBAL.STATE.tsa_enabled'
tsa_ena = (getSonicConfig(asic_ns, tsa_config)).decode()
print('{}: {} - CONFIG_DB.{} : {}'.format(__file__, asic_ns, tsa_config, tsa_ena))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please use logger iso print statements so that the message appear in syslog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the print to logger.log_info

def config_tsa():
tsa_ena = get_tsa_status()
if tsa_ena == True:
print("{}: Configuring TSA".format(__file__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please check the indentation in the file. It does not seem consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the indentation

return line.split('=')[1].strip()
return 0

def getSonicConfig(ns, config_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this work for single asic linecards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the functions to work for both single asic and multi asic

ConditionPathExists=!/etc/sonic/chassisdb.conf

[Service]
Environment="STARTED_BY_TSA_TSB_SERVICE=1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the service is stopped do we need to reset STARTED_BY_TSA_TSB_SERVICE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset the env variable

@saksarav-nokia
Copy link
Contributor Author

@arlakshm , addressed all your comments. Please check

@gechiang
Copy link
Contributor

gechiang commented Jan 10, 2024

@saksarav-nokia Why is this PR raised against this MSFT repo instead of on the public repo at: "https://github.com/sonic-net/sonic-buildimage"?
Is this a new feature that is not meant for public community?

@saksarav-nokia
Copy link
Contributor Author

@saksarav-nokia Why is this PR raised against this MSFT repo instead of on the public repo at: "https://github.com/sonic-net/sonic-buildimage"? Is this a new feature that is not meant for public community?

@gechiang , It was suggested by MSFT in the meeting .

Description= STARTUP TSA-TSB SERVICE
Requires=updategraph.service database.service
After=updategraph.service database.service
ConditionPathExists=!/etc/sonic/chassisdb.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is need for this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updategraph dependency is to start the tsa-tsb service after config is loaded and chassisdb.conf file check is to start the service only in IMM.

[Unit]
Description= STARTUP TSA-TSB SERVICE
Requires=updategraph.service database.service
After=updategraph.service database.service
Copy link
Contributor

@abdosi abdosi Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sure we are running this service before swss/syncd/teamd/bgp services comes up ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no dependency on these services right now. Ensure this service starts before bgp should be sufficient right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "Before=bgp.service"

Environment="STARTED_BY_TSA_TSB_SERVICE=1"
ExecStart=/usr/bin/python3 -u /usr/local/bin/startup_tsa_tsb.py start
ExecStop=/usr/bin/python3 -u /usr/local/bin/startup_tsa_tsb.py stop
RemainAfterExit=yes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the behavior of this services in case of config reload and if the swss docker restart ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config reload or swss restart doesn't restart the tsa service

@gechiang
Copy link
Contributor

@saksarav-nokia Why is this PR raised against this MSFT repo instead of on the public repo at: "https://github.com/sonic-net/sonic-buildimage"? Is this a new feature that is not meant for public community?

@gechiang , It was suggested by MSFT in the meeting .

As discussed in the Chassis meeting today. It seems that this PR is supposed to be raised in this MSFT repo for the 202205 branch. Once it is approved and merged and well tested, this "feature" is also required in public master at which time please port it to "https://github.com/sonic-net/sonic-buildimage" master branch.
So the immediate action required for this PR is to move it to 202205 branch of this same msft repo.
Thanks!

@saksarav-nokia saksarav-nokia changed the base branch from master to 202205 January 10, 2024 19:15
@saksarav-nokia saksarav-nokia requested review from a team, qiluo-msft and xumia as code owners January 10, 2024 19:15
…es up, start a timer to configure TSB and configure TSB when the timer expires

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
 1) Reused the existing Python functions
 2) Modified the function names to use snake case
 3) Changed the print to logger.log_info
 4) Fixed the indentation
 5) Modified it to work for both single and multi asic
 6) Reset the env variable
 7) When user issues TSB when the service is running, stop the service

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
@saksarav-nokia
Copy link
Contributor Author

@gechiang , i moved the PR to 202205

@rlhui
Copy link
Contributor

rlhui commented Jan 10, 2024

please replace IMM with LC in the PR title so it's more generic

@saksarav-nokia saksarav-nokia changed the title Added code to start a new service to configure TSA as soon as IMM comes up, start a timer to configure TSB and configure TSB when the timer expires Added code to start a new service to configure TSA as soon as Linecard comes up, start a timer to configure TSB and configure TSB when the timer expires Jan 11, 2024
@saksarav-nokia
Copy link
Contributor Author

please replace IMM with LC in the PR title so it's more generic

Changed the title

return tsa_ena

def config_tsb():
logger.log_info("startup_tsa_tsb: Configuring TSB")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging at line 70 and 80 could be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

return

def stop_tsa_tsb():
reset_env_variables()
Copy link
Contributor

@judyjoseph judyjoseph Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the user entered TSA/TSB command manually, shouldn't we explicitly stop/cancel timer thread which we started earlier in start_tsb_timer() ? Or does the systemd cleans it up during exit

Here we just reset the env variables only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will set the daemon flag for the Timer thread so that the thread will exit when the main thread is killed by the service stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judyjoseph , i tested with prctl and the threads started by the main thread are stopped when the service is stopped and main python script is killed with SIGTERM. So we don't need to explicitly stop the timer.

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
@abdosi
Copy link
Contributor

abdosi commented Jan 22, 2024

cc @anamehra for viz.

@gechiang gechiang merged commit 51a6a31 into Azure:202205 Apr 5, 2024
@saksarav-nokia saksarav-nokia mentioned this pull request Apr 18, 2024
10 tasks
prabhataravind pushed a commit that referenced this pull request Jul 7, 2025
…utomatically (#21565)

#### Why I did it
src/sonic-host-services
```
* 69788c2 - (HEAD -> master, origin/master, origin/HEAD) Fix: Set default values for kdump enhancement (#217) (3 days ago) [Muhammad Ali Hussnain]
* d7e8021 - [chassis] make multi_asic_ns_to_host_fwd False for EXTERNAL_CLIENT acl service (#218) (4 days ago) [Arvindsrinivasan Lakshmi Narasimhan]
* d925457 - Fix sudo command failed because root user password expired by password hardening feature issue (#215) (4 days ago) [Hua Liu]
* 7e1b280 - reverted ssh_key to ssh_string (#216) (6 days ago) [Muhammad Ali Hussnain]
* 741a9df - Handle NotImplementedError in determine-reboot-cause that will be thrown on VS Chassis platform (#211) (7 days ago) [Changrong Wu]
* 47f6feb - check sonic-installer message and return proper error code for when image doesn't exist. (#210) (7 days ago) [Dawei Huang]
* 1fe9a76 - Updated Key and Path (#209) (7 days ago) [Muhammad Ali Hussnain]
* bb0a31c - Adding support for persistent storage and retrieval of DPU reboot-cause (#169) (8 days ago) [rameshraghupathy]
* 5e08927 - [hostcfgd] Fix the state machine during eth0 default route check failure (#196) (2 weeks ago) [Vivek]
* d2cc1a8 - Add ImageService.set_next_boot for GNOI Activate OS. (#207) (3 weeks ago) [Dawei Huang]
* 9c49913 - register image_service and docker_service. (#208) (3 weeks ago) [Dawei Huang]
* d88d8d0 - Implementation for ImageService.List (#206) (4 weeks ago) [Dawei Huang]
* d7e4df5 - Enabled configuring the default number of kdumps in Linux. (#202) (4 weeks ago) [Mridul Bajpai]
* ca9d329 - kdump-remote feature in hostcfgd (#166) (4 weeks ago) [Muhammad Ali Hussnain]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Jul 24, 2025
…AD automatically (#1396)

#### Why I did it
src/sonic-utilities
```
* 7941e530 - (HEAD -> 202503, origin/202503) Merge pull request #206 from mssonicbld/sonicbld/202503-merge (23 hours ago) [mssonicbld]
* 7e808a81 - Merge branch '202412' of https://github.com/Azure/sonic-utilities.msft into 202503 (23 hours ago) [Sonic Automation]
* 992831ac - (origin/202412) fix show bgp cli on multiple asic device (#205) (24 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Feb 14, 2026
…tomatically (#1983)

#### Why I did it
src/sonic-swss
```
* b7678f23 - (HEAD -> 202412, origin/202412) [action] [PR:4193] [countersyncd] fix otel actor log level (#204) (22 hours ago) [mssonicbld]
* 803d2d78 - [action] [PR:4197] [countersyncd]: Modify the exit behavior of the main function (#206) (22 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants