[hostcfgd.service] start after interfaces-config.service#63

Closed
stepanblyschak wants to merge 1 commit into master from fix-hostcfgd-networking-race

@stepanblyschak
Owner

This change fixes a race condition. The first ConfigDBConnector.connect() call succeeds, but because the network stack is reloaded in parallel, a later database connection can fail. KdumpCfg calls the 'sonic-kdump-config' command, which invokes 'sonic-installer', which in turn tries to connect to the database and may fail with this traceback:

Traceback (most recent call last):
  File "/usr/local/bin/sonic-installer", line 5, in <module>
    from sonic_installer.main import sonic_installer
  File "/usr/local/lib/python3.9/dist-packages/sonic_installer/main.py", line 7, in <module>
    import utilities_common.cli as clicommon
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 189, in <module>
    iface_alias_converter = InterfaceAliasConverter()
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 126, in __init__
    self.port_dict = multi_asic.get_port_table()
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 301, in get_port_table
    ports = get_port_table_for_asic(ns)
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 315, in get_port_table_for_asic
    config_db = connect_config_db_for_ns(namespace)
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 47, in connect_config_db_for_ns
    config_db.connect()
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1829, in connect
    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)
RuntimeError: Unable to connect to redis: Cannot assign requested address

The following errors also appear in the log during config reload:

Jan 25 22:28:03.209985 r-moose-simx-161 ERR hostcfgd: sonic-kdump-config --disable - failed: return code - 1, output:#012None
Jan 25 22:28:03.528764 r-moose-simx-161 ERR hostcfgd: sonic-kdump-config --memory 0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M - failed: return code - 1, output:#012None

Why I did it

To fix the errors in the log seen during config reload:

Jan 25 22:28:03.209985 r-moose-simx-161 ERR hostcfgd: sonic-kdump-config --disable - failed: return code - 1, output:#012None
Jan 25 22:28:03.528764 r-moose-simx-161 ERR hostcfgd: sonic-kdump-config --memory 0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M - failed: return code - 1, output:#012None

How I did it

Make hostcfgd.service start after interfaces-config.service.
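
In systemd, this kind of ordering is normally expressed with the `After=` directive in the `[Unit]` section. The following is a minimal sketch of how one might check for it, not the actual diff from this PR. The unit text in `SAMPLE_UNIT` and the helper name `starts_after` are hypothetical and used only for illustration.

```python
import configparser

# Hypothetical unit contents for illustration; the real hostcfgd.service may differ.
SAMPLE_UNIT = """\
[Unit]
Description=Example host configuration daemon
After=interfaces-config.service

[Service]
ExecStart=/usr/local/bin/hostcfgd
"""


def starts_after(unit_text: str, dependency: str) -> bool:
    """Return True if the unit's After= list names the given dependency."""
    # strict=False tolerates repeated keys, but only the last After= line is kept,
    # so this sketch assumes a single After= line.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd key names are case-sensitive
    parser.read_string(unit_text)
    after = parser.get("Unit", "After", fallback="")
    return dependency in after.split()


print(starts_after(SAMPLE_UNIT, "interfaces-config.service"))
```

Note that `After=` only controls start order. It does not by itself pull the other unit in as a dependency.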

How to verify it

Run config reload and verify that hostcfgd logs no errors.
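
The check can be scripted. This is a minimal sketch, assuming the log lines have the same shape as the ones quoted in this PR (`... ERR hostcfgd: ...`). The helper name `hostcfgd_errors` and the informational sample line are hypothetical, and where the log lives on a given system (for example a syslog file) is also an assumption.

```python
import re

# Matches the severity and process tag seen in the log excerpts in this PR.
HOSTCFGD_ERR = re.compile(r"\bERR hostcfgd:")


def hostcfgd_errors(log_lines):
    """Return the log lines that hostcfgd reported at ERR severity."""
    return [line.rstrip("\n") for line in log_lines if HOSTCFGD_ERR.search(line)]


sample = [
    "Jan 25 22:28:03.209985 r-moose-simx-161 ERR hostcfgd: sonic-kdump-config --disable - failed: return code - 1, output:#012None\n",
    # Hypothetical non-error line, added only to show that it is filtered out.
    "Jan 25 22:28:04.000000 r-moose-simx-161 INFO hostcfgd: example informational message\n",
]

errors = hostcfgd_errors(sample)
print(len(errors))
```

After a config reload with the fix applied, the list returned for the new log lines would be expected to be empty.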

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

stepanblyschak pushed a commit that referenced this pull request May 17, 2022
[master][sonic-linkmgrd] submodule updates

df51322 Longxiang Lyu   Fri May 6 10:01:46 2022 +0800   Add `ActiveActiveStateMachine` implementation (#64)
e721ceb Jing Zhang      Wed May 4 10:07:14 2022 -0700   Add doc for default route related changes  (#63)
7bb06fb Jing Zhang      Tue May 3 09:48:28 2022 -0700   Add Cli support to enable or disable default route related feature (#68)
e4b02cb Jing Zhang      Mon May 2 13:27:54 2022 -0700   Reset WaitActiveUp count before switching to active (#70)
212d960 Jing Zhang      Wed Apr 27 10:35:05 2022 -0700  lower log level to warning (#69)
48abc9e Jing Zhang      Thu Apr 14 16:50:04 2022 -0700  Add support to enable switchover time measurement (with link prober interval decreased to 10ms) feature  (#61)
c4858a6 Jing Zhang      Thu Apr 14 11:27:55 2022 -0700  Avoid proactively switching to `active` if default route is missing  (#62)

sign-off: Jing Zhang zhangjing@microsoft.com
stepanblyschak pushed a commit that referenced this pull request May 26, 2022
[sonic-linkmgrd][202012] submodule update
3d13ff2 Jing Zhang      Wed May 4 10:07:14 2022 -0700   Add doc for default route related changes  (#63)
c703be4 Jing Zhang      Mon May 2 13:27:54 2022 -0700   Reset WaitActiveUp count before switching to active (#70)
86eb727 Jing Zhang      Wed Apr 27 10:35:05 2022 -0700  lower log level to warning (#69)
e22c736 Jing Zhang      Mon May 2 13:33:24 2022 -0700   [202012] Avoid proactively switching to active if default route is missing (#67)
d4f282b Jing Zhang      Thu Apr 28 18:35:11 2022 -0700  [202012] Add support to enable switchover time measurement (with link prober interval decreased to 10ms) feature (#66)

sign-off: Jing Zhang zhangjing@microsoft.com
stepanblyschak pushed a commit that referenced this pull request Nov 21, 2023
…ically (sonic-net#17207)

#### Why I did it
src/sonic-dbsyncd
```
* e294eb0 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#63) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request May 9, 2025
…ly (sonic-net#22416)

#### Why I did it
src/sonic-stp
```
* a80676a - (HEAD -> master, origin/master, origin/HEAD) MSTP utility APIs (#62) (2 days ago) [Divya Kumaran Chandralekha]
* 2c3eccc - MSTP extern API declaration (#63) (2 days ago) [Divya Kumaran Chandralekha]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request May 20, 2025
…lly (sonic-net#917)

#### Why I did it
src/sonic-swss
```
* c1cbd2f - (HEAD -> 202412, origin/202412) Merge pull request #63 from mssonicbld/sonicbld/202412-merge (11 hours ago) [mssonicbld]
* 984aef3 - Merge branch '202411' of https://github.com/sonic-net/sonic-swss into 202412 (16 hours ago) [Sonic Automation]
* 19ad393 - Initialize Port oper error status map only once (sonic-net#3545) (13 days ago) [mssonicbld]
* 5c36279 - [copp]: Use non-zero trap priority for default trap group (sonic-net#3544) (2 weeks ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stepanblyschak pushed a commit that referenced this pull request Jul 22, 2025
…ically (sonic-net#23173)

#### Why I did it
src/sonic-dash-ha
```
* 92d6243 - (HEAD -> master, origin/master, origin/HEAD) Initialize hamgrd from config_db and connect to swbusd (#63) (12 hours ago) [yue-fred-gao]
* d42013b - Dynamic actor creation and deletion (#64) (21 hours ago) [yue-fred-gao]
```
#### How I did it
#### How to verify it
#### Description for the changelog