T2-VOQ-Chassis: VS support #18512

Merged
rlhui merged 8 commits into sonic-net:master from deepak-singhal0408:deepsinghal/voq_chassis_vs_support
Apr 19, 2024

@deepak-singhal0408
Contributor

@deepak-singhal0408 deepak-singhal0408 commented Mar 29, 2024

Why I did it

Changes to ensure sonic_vs.img has everything required to be emulated as a T2-VOQ-Chassis Supervisor or Linecard.
With this change:

  1. The image can be used to emulate a Supervisor or a Linecard.
  2. The image can be used to emulate different Linecard HWSKUs.

Work item tracking
  • Microsoft ADO (number only): 27402561

How I did it

The following changes are made as part of this PR:

  1. On the VS image, database containers are recreated upon reboot. This helps change a single-asic VS image to multi-asic (docker_image_ctl.j2).
  2. Copy all Supervisor and Linecard HWSKU directories under the kvm_platform directory. Create lanemap.ini, coreportindexmap.ini, and fabriclanemap.ini under each hwsku/Asic directory. Copy the asic.conf file from each platform directory to its child HWSKU directories under the kvm platform directory. This helps emulate different HWSKUs with their respective num_asic asics (sonic-device-data/Makefile).
  3. New sonic-platform package for VS platforms. This is needed:
    3.1: To ensure that database container bring-up succeeds. The database containers on linecards fetch the supervisor slot_num, current_slot num, etc.
    PS: This package will also be available on pizza-box VS platforms. However, it is a no-op there, because the package expects a metadata file that is only available on chassis VS platforms.
  4. Changes to generate a unique MAC address on VS platforms. This is done by using the device_hostname string (which will always be unique). Also provide a unique MAC address per asic on multi-asic VOQ VS platforms (sonic-cfggen, device_info.py, minigraph.py).
  5. Make the topology.service file depend on sonic.target so that this service is invoked as part of config load_minigraph/config reload.
  6. topology.sh changes to move ports to their respective namespaces (only applicable on multi-asic platforms).
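
Item 4 derives a unique MAC address from the device hostname. A minimal sketch of that idea follows. It is not the code in sonic-cfggen or device_info.py: the function name `mac_from_hostname`, the `asic_index` parameter, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def mac_from_hostname(hostname: str, asic_index: int = 0) -> str:
    """Derive a deterministic, locally administered unicast MAC address.

    Hypothetical helper: hashes the hostname together with an ASIC index so
    that each ASIC on the same device gets a different address.
    """
    digest = hashlib.sha256(f"{hostname}:{asic_index}".encode()).digest()
    octets = bytearray(digest[:6])
    # Set the locally-administered bit (0x02) and clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)
```

Because the address is a pure function of the hostname and ASIC index, the same device gets the same MAC across reboots, while distinct hostnames or ASIC indices yield distinct addresses with overwhelming probability.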

How to verify it

  1. Verified that sonic-vs.img.gz builds successfully and that a single-asic VS DUT comes up fine (deployed the vms-kvm-t0 topology). All containers come up and the show commands work as expected.
  2. Verified bring-up on a VOQ chassis with different flavors of linecard emulation.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Contributor

@arlakshm arlakshm left a comment

as comments.

qiluo-msft pushed a commit to sonic-net/sonic-utilities that referenced this pull request Apr 3, 2024
…age support for VS (#3250)

### What I did
For T2-Chassis VS support, we are adding a new sonic_platform package for VS platforms. Please refer to sonic-net/sonic-buildimage#18512 for more details.
Because of this new platform package, the exception handling needs to be modified: the module will now be found, but the metadata file will not be found on pizza-box VS platforms.

#### How I did it
Modified the exception handling logic.
MSFT ADO: 27414904

#### How to verify it
Bring up the vms-kvm-t0 topology and run show interface status. The output is as expected.

PS: The main PR (sonic-net/sonic-buildimage#18512) depends on this PR being merged first.
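
The handling described in this commit message might look like the following sketch. It is not the sonic-utilities code, which is not shown here: the loader name `load_chassis` and the two stub factories are hypothetical.

```python
def load_chassis(factory):
    """Call a chassis factory; return None when no usable chassis exists."""
    try:
        return factory()
    except ImportError:
        # No sonic_platform package is installed for this platform.
        return None
    except FileNotFoundError:
        # The package imported, but its metadata file is missing
        # (the pizza-box VS case).
        return None


def missing_module():
    """Stub that simulates a platform without the sonic_platform package."""
    raise ModuleNotFoundError("No module named 'sonic_platform'")


def missing_metadata():
    """Stub that simulates a package whose metadata file is absent."""
    raise FileNotFoundError("Metadata file not found")
```

Both failure modes lead to the same fallback, so callers only need to check for `None`.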
@saiarcot895
Contributor

Looks like there's a new loganalyzer error message on t0:

Apr  8 07:23:49.040082 vlab-01 ERR sfputil: Failed to instantiate Chassis due to FileNotFoundError('Metadata file /etc/sonic/vs_chassis_metadata.json not found')

Is this expected? Does it need to be added to the ignore list?

@deepak-singhal0408
Contributor Author

> Looks like there's a new loganalyzer error message on t0:
>
> Apr  8 07:23:49.040082 vlab-01 ERR sfputil: Failed to instantiate Chassis due to FileNotFoundError('Metadata file /etc/sonic/vs_chassis_metadata.json not found')
>
> Is this expected? Does it need to be added to the ignore list?

Thanks @saiarcot895. Added the above to the loganalyzer_ignore file as discussed. PR: sonic-net/sonic-mgmt#12345
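
For illustration, a pattern that matches the message above could be written as follows. This is a sketch that assumes ignore entries are regular expressions; the entry actually added in the sonic-mgmt PR is not shown here, and `IGNORE_PATTERN` and `is_ignored` are hypothetical names.

```python
import re

# Hypothetical ignore pattern for the sfputil message quoted above.
IGNORE_PATTERN = re.compile(
    r"ERR sfputil: Failed to instantiate Chassis due to "
    r"FileNotFoundError\('Metadata file \S*vs_chassis_metadata\.json not found'\)"
)


def is_ignored(log_line: str) -> bool:
    """Return True if the log line matches the ignore pattern."""
    return IGNORE_PATTERN.search(log_line) is not None
```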

@deepak-singhal0408
Contributor Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 18512 in repo sonic-net/sonic-buildimage

@judyjoseph
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@deepak-singhal0408
Contributor Author

> Commenter does not have sufficient privileges for PR 18512 in repo sonic-net/sonic-buildimage

Hi @rlhui @yxieca, it seems I don't have permission to rerun the pipeline. Could you please check and let me know whether this is expected, or whether I am missing something here?

@saiarcot895
Contributor

@deepak-singhal0408 use /azpw instead of /azp.

@judyjoseph
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@deepak-singhal0408
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@prsunny
Contributor

prsunny commented Apr 24, 2024

@deepak-singhal0408 , seems like swss PR checks are failing with this docker change. Could you confirm all swss tests were tested with this docker vs build?

@deepak-singhal0408
Contributor Author

> @deepak-singhal0408 , seems like swss PR checks are failing with this docker change. Could you confirm all swss tests were tested with this docker vs build?

@prsunny, this PR was merged into sonic-buildimage on April 19th, and a PR was merged into sonic-swss on April 22nd (sonic-net/sonic-swss#3118). The PR checker for the April 22nd change would already have picked up my changes, right? May I know why you think this PR caused the issue?

mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this pull request May 9, 2024
…age support for VS (sonic-net#3250)

mssonicbld pushed a commit to sonic-net/sonic-utilities that referenced this pull request May 9, 2024
…age support for VS (#3250)

@ishidawataru
Collaborator

@deepak-singhal0408 Where can I find vs_chassis_metadata.json? I built the multi-asic vs image from this repo but there is no vs_chassis_metadata.json under /etc/sonic.

@yutongzhang-microsoft
Contributor

Hi, @deepak-singhal0408 , I got the same error

Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:   File "/usr/lib/python3/dist-packages/sonic_platform/chassis.py", line 26, in __init__
Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:     self.metadata = self._read_metadata()
Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:                     ^^^^^^^^^^^^^^^^^^^^^
Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:   File "/usr/lib/python3/dist-packages/sonic_platform/chassis.py", line 34, in _read_metadata
Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:     raise FileNotFoundError("Metadata file {} not found".format(self.metadata_file))
Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]: FileNotFoundError: Metadata file /etc/sonic/vs_chassis_metadata.json not found

@deepak-singhal0408
Contributor Author

> Hi, @deepak-singhal0408 , I got the same error
>
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:   File "/usr/lib/python3/dist-packages/sonic_platform/chassis.py", line 26, in __init__
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:     self.metadata = self._read_metadata()
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:                     ^^^^^^^^^^^^^^^^^^^^^
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:   File "/usr/lib/python3/dist-packages/sonic_platform/chassis.py", line 34, in _read_metadata
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]:     raise FileNotFoundError("Metadata file {} not found".format(self.metadata_file))
> Jun 21 03:01:42 vlab-01 determine-reboot-cause[15110]: FileNotFoundError: Metadata file /etc/sonic/vs_chassis_metadata.json not found

Hi @yutongzhang-microsoft, please try this fix: sonic-net/sonic-host-services#133
JFYI, this file is not expected to be present on pizza-box VS platforms.
Even on a chassis-based VS image, it is not mandatory for this file to be present.

@ishidawataru
Collaborator

ishidawataru commented Jun 23, 2024

> Even on chassis based vs image, this is not mandatory to be present..

If the metadata file is not mandatory, how about not raising the exception at all, instead of catching FileNotFoundError everywhere it can be ignored?

https://github.com/sonic-net/sonic-buildimage/pull/18512/files#diff-1837abe4216c07096db3b47b74a8ce47b0097622773caf92e9dc9f3c42636d06R34
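
A rough sketch of that suggestion, assuming the metadata is a JSON file and that an empty dictionary is an acceptable fallback for callers. The standalone function `read_metadata` is hypothetical and only mirrors the `_read_metadata` method seen in the traceback; it is not the code in chassis.py.

```python
import json

DEFAULT_METADATA_FILE = "/etc/sonic/vs_chassis_metadata.json"


def read_metadata(metadata_file: str = DEFAULT_METADATA_FILE) -> dict:
    """Return parsed chassis metadata, or an empty dict if the file is absent."""
    try:
        with open(metadata_file) as f:
            return json.load(f)
    except FileNotFoundError:
        # The file is optional, so treat its absence as "no metadata"
        # instead of propagating the error to every caller.
        return {}
```

With this shape, callers such as sfputil or determine-reboot-cause would not need their own FileNotFoundError handling.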

@yutongzhang-microsoft
Contributor

> > Even on chassis based vs image, this is not mandatory to be present..
>
> How about stopping raising an exception instead of catching FileNotFoundError everywhere the exception can be ignored if the metadata file is not mandatory?
>
> https://github.com/sonic-net/sonic-buildimage/pull/18512/files#diff-1837abe4216c07096db3b47b74a8ce47b0097622773caf92e9dc9f3c42636d06R34

I agree. Some other commands also fail because of this error.
@deepak-singhal0408 Can you fix this as suggested?

rlhui pushed a commit that referenced this pull request Dec 23, 2024
…se containers on VS platform (#21089)

Why I did it
To support emulation of a VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR, #18512, introduced a change to docker_image_ctl.j2 that forces VS images to recreate the database containers every time the OS is cold started, while the behavior of the other containers (swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers are recreated while the other services reuse their existing containers. This can cause the swss/bgp/syncd containers to become invalid if the database containers are recreated with a different container ID, because the swss/bgp/syncd containers are configured to use the database containers as their underlying networking stack.

On further investigation, we found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform the HwSKU transformation, so this logic should be removed.
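
The failure mode described above can be illustrated with a small model. This is only a sketch: the dictionaries are a hypothetical, simplified stand-in for container records, and `find_stale_dependents` is not a Docker or SONiC API.

```python
def find_stale_dependents(containers):
    """Return names of containers whose shared-network target is gone."""
    live_ids = {c["id"] for c in containers}
    stale = []
    for c in containers:
        mode = c.get("network_mode", "")
        if mode.startswith("container:"):
            target_id = mode.split(":", 1)[1]
            if target_id not in live_ids:
                stale.append(c["name"])
    return stale


# Before reboot: swss shares the network stack of database (id "aaa").
before = [
    {"name": "database", "id": "aaa"},
    {"name": "swss", "id": "bbb", "network_mode": "container:aaa"},
]
# After reboot: database was recreated with a new id, swss was reused as-is.
after = [
    {"name": "database", "id": "ccc"},
    {"name": "swss", "id": "bbb", "network_mode": "container:aaa"},
]
```

In this model, `find_stale_dependents(after)` reports swss because it still points at the old database container ID, which matches the invalid-container situation the commit message describes.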
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 24, 2024
…se containers on VS platform (sonic-net#21089)

mssonicbld pushed a commit that referenced this pull request Dec 25, 2024
…se containers on VS platform (#21089)

VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025
…se containers on VS platform (sonic-net#21089)

nmoray pushed a commit to nmoray/sonic-utilities that referenced this pull request Jun 25, 2025
…age support for VS (sonic-net#3250)

mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this pull request Oct 8, 2025
…se containers on VS platform

#### Why I did it

To support emulation of a VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR, [#18512](sonic-net/sonic-buildimage#18512), introduced a change to docker_image_ctl.j2 that forces VS images to recreate the database containers every time the OS is cold started, while the behavior of the other containers (swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers are recreated while the other services reuse their existing containers. This can cause the swss/bgp/syncd containers to become invalid if the database containers are recreated with a different container ID, because the swss/bgp/syncd containers are configured to use the database containers as their underlying networking stack.

On further investigation, we found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform the HwSKU transformation, so this logic should be removed.

##### Work item tracking
- Microsoft ADO **(number only)**: 30454307

#### How I did it
Remove the code that removes existing database containers in /usr/bin/database.sh

#### How to verify it

* Build VS image and install it on a KVM sonic-vs
* Reboot the sonic-vs multiple times and check if all the containers are started successfully every time.

#### Which release branch to backport (provide reason below if selected)

The original change was introduced into 202205 branch so we need to backport this fix.

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [X] 202205
- [ ] 202211
- [ ] 202305
- [x] 202405

#### Tested branch (Please provide the tested image version)

- [X] 20220532.72

#### Description for the changelog

docker_image_ctl.j2: change to not remove existing database containers on VS platform

mssonicbld added a commit to Azure/sonic-buildimage-msft that referenced this pull request Oct 9, 2025
…hat recreates database containers on VS platform (#1703)

@chahibi

chahibi commented Nov 14, 2025

@deepak-singhal0408 I am still seeing the same error on the latest image from today. Do you have any update?


Labels

  • Chassis for 202205 branch (PRs needed for 202205 branch in msft repo)
  • Cherry Pick Conflict_202305
  • Included in Chassis for 202205 Branch (Indicate PR is already in MSFT repo 202205 branch)
  • Request for 202305 Branch
