Skip to content

Third party container management using the Sonic Application Framework#1286

Merged
adyeung merged 1 commit intosonic-net:masterfrom
sg893052:master
Jan 18, 2024
Merged

Third party container management using the Sonic Application Framework#1286
adyeung merged 1 commit intosonic-net:masterfrom
sg893052:master

Conversation

@sg893052
Copy link
Copy Markdown
Contributor

@sg893052 sg893052 commented Mar 15, 2023

Repo PR title State
sonic-utilities TPCM support in Sonic Package Manager
sonic-buildimage Additional changes for TPCM support

@sg893052 sg893052 force-pushed the master branch 3 times, most recently from 681d1be to e8caba9 Compare May 9, 2023 10:15
@zhangyanzhao
Copy link
Copy Markdown
Collaborator

@zhangyanzhao
Copy link
Copy Markdown
Collaborator

@zhangyanzhao
Copy link
Copy Markdown
Collaborator

Microsoft and Nvidia will be the reviewer for this feature.

@zhangyanzhao zhangyanzhao requested review from lguohan and stephenxs May 9, 2023 15:58
@prvattem
Copy link
Copy Markdown
Collaborator

prvattem commented May 9, 2023

@zhangyanzhao Please add me to the review.
Thanks
Phaniraj

@stephenxs
Copy link
Copy Markdown
Collaborator

@zhangyanzhao can you add @stepanblyschak as a reviewer? it should not be me :)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have the capability to run the script inside the TPC? e.g We may need to mount the specific config file changes in the host and use it inside TPC or for some dependencies I may need to execute some script before starting services inside the container, do we have an option to do that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ,capability to run the script inside TPC is supported

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the connectivity for the TPC to the outside world? hope we dont expose the entire host networking to the TPC. What if my requirement is to run the TPC on the mgmt VRF?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPC with mgmt VRF support can be provided later but currently we don't support it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not captured in the HLD, please add it in the restrictions/limitations section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included in the Limitations section

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should not differentiate like this, wherever possible we should design generically, in this case, App ext can be extended to support docker image without manifest file as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we need this tpcm flag? why cant have the generic design?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can have a flag as "type" instead of "tpcm"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cant we skip type field altogether?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped as suggested

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we make sure that TPC is not hijacking any of the control package meant for processing by SONiC system? do we have any validation in place?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPC docker wont run in "privileged" mode and runs with the default security settings and won't have escalated privileges.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can skip '--tpcm' option in the command, would be good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped as suggested

Comment on lines 86 to 88
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think various system resource limits for container shall be part of the manifest. The manifest, once associated with a docker container, will be the same for all devices, however, some devices have more powerful hardware and these limits may not be relevant for them. Instead, resources limits configuration is up to the user using a particular device. This configuration can be done e.g. through FEATURE table and for all containers, not just TPC.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPC overall memory limit can be restricted to 20% of system memory ( say, 3087MB from 15G system ram).
By default, if individual TPC memory limit is not specified, 20% of TPC overall memory limit can be assigned ( say, 617MB from 3087MB in a 15G system ram).
Idea is that summation of the memory limits configured for all TPCs must not go beyond the overall TPC memory limit set.
This way the resource limits gets adjusted as per the underlying hardware.

Agreed that configuration could be placed in config db.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Could you please update the document to make memory/cpu limits configuration part of config db?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of resource limits for TPC within the CONFIG DB is a distinct feature that can also be extended to SONiC containers. Therefore, the resource limits aspect will be addressed in a subsequent phase.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is not necesary to pre define these docker applications in SONiC build,

This is not true for regular application extensions. An application extension is not required to be defined at SONiC build.
FYI, dhpc-relay and macsec are not the only application extensions we have. There are some which are not even in SONiC source tree.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extended version of sonic extension is the TPCM where it need not comply with sonic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 So I just suggest to change the wording:

So it is not necesary to pre define these docker applications in SONiC build

to

So it is not necesary to for these docker applications to be SONiC compliant

since the first one is not true for regular (non-TPC) dockers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a manifest CLI tree? You could just pass in a JSON file to install command and then updating manifest is just an upgrade with a different manifest file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating manifest file for each TPC docker is tedious task and customizing parameter is also difficult for a user.
With the manifest cli tree, it can be easily created, viewed. End user need not worry about TPC container creation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it can be set to true for TPC. support-rate-limit requires having supervisor managed rsyslogd inside container. This is very specific to SONiC containers. It should be set to false

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Ok, could you please update the doc to set it to false?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the command? How do you translate it for docker create command? Is it --entrypoint? If so, I suggest to rename this field to entrypoint.

Copy link
Copy Markdown
Contributor Author

@sg893052 sg893052 May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"command" attribute refers to the container startup arguments. As suggested, the attribute name could be updated accordingly.

Copy link
Copy Markdown
Contributor

@stepanblyschak stepanblyschak Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand how the command to run inside container mounts the root filesystem to container. Please clarify this example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just an example, by specifying "command":"--path.rootfs=/host", you would configure the container to use the host filesystem as the root filesystem path. This adjustment can impact the way the container interacts with and accesses files and directories during its runtime.
As agreed, this field will be updated to "entrypoint"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Got it, just an example

Comment on lines 272 to 282
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not deducing transport protocol from URL and having just --from-tarball? E.g curl can work with http(s),scp,sfpt deduced from input URL

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Could you please update the doc?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that for non TPC containers, users cannot modify manifest. I don't see why it might be useful for the user. Container configuration like resource limits needs to be outside the manifest as I suggested in previous comment. Manifest is an immutable property of an image, chaning it might break the extension or the system and it would be hard to create any validation mechanism for that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will ensure this is allowed only for TPC containers.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could have an option to indicate whether manifest file update allowed or not during image creation, by default, we cant modify the manifest file but when we create the image for TPC container, we could set it to true.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Please update the doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of resource limits for TPC within the CONFIG DB is a distinct feature that can also be extended to SONiC containers. Therefore, the resource limits aspect will be addressed in a subsequent phase.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion online, could you please summarize what are the benefits of preparing docker image on the box instead of preparing image outside the box?

Copy link
Copy Markdown
Contributor Author

@sg893052 sg893052 May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPCM is the extended version of AEF(application extension framework) which will create manifest file and associate with TPC and manipulate the container resources through manifest file.

To summarize the benefits of preparing docker image on the box instead of preparing image outside the box.

  1. Leveraging existing TPC Docker images that lack a manifest file.
  2. Eliminating the need for manual steps to modify Docker images, such as unpacking, appending the manifest file, and repackaging.
  3. Facilitating seamless upgrade scenarios, bypassing the necessity of repackaging Docker images with the manifest file. Moreover, the updated manifest file, with resource adjustments, can be used for impending Docker image upgrades.
  4. At a per-device level, permitting users to supply manifest files for TPCs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 Thanks, I see one advantage is that the user does not need to have a registry to place his re-built standard docker images and can pull them from a well known existing docker registries.

@sg893052
Copy link
Copy Markdown
Contributor Author

@stepanblyschak @lguohan @venkatmahalingam @stephenxs @adyeung @Kalimuthu-Velappan @babukr Could you please help review the answers for the comments provided. Please let us know if we could have a meeting to go over.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, resource limits are good options which need not be restricted only for TPC.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sg893052 @venkatmahalingam Agree, shouldn't be restricted for TPC. In fact, resource limits were discussed to be added to regular app.ext when it first was introduced but this functionality was delayed since we had no use for configuring limits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of resource limits for TPC within the CONFIG DB is a distinct feature that can also be extended to SONiC containers. Therefore, the resource limits aspect will be addressed in a subsequent phase.

Copy link
Copy Markdown
Collaborator

@venkatmahalingam venkatmahalingam Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, please avoid tpcm name usage in the description.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SONiC-2-SONiC upgrade considered? What will happen with TPC extension if we upgrade base SONiC image to new version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the TC

@stepanblyschak
Copy link
Copy Markdown
Contributor

@adyeung
Copy link
Copy Markdown
Collaborator

adyeung commented Dec 7, 2023

@stepanblyschak pls review and signoff if you are satisfied with the change

@zhangyanzhao zhangyanzhao requested review from prvattem and stepanblyschak and removed request for stephenxs December 12, 2023 16:39
@zhangyanzhao
Copy link
Copy Markdown
Collaborator

moved to 202405 release

@adyeung
Copy link
Copy Markdown
Collaborator

adyeung commented Jan 18, 2024

All prior review comments addressed. If there are no further comments, this will go merge by end of the week.

@adyeung adyeung merged commit 50a62e0 into sonic-net:master Jan 18, 2024
@adyeung adyeung self-assigned this Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

7 participants