
Conversation

@sadasu
Contributor

@sadasu sadasu commented Oct 24, 2019

BareMetalPlatformStatus CR is being augmented to support
configuration required for successful Baremetal IPI deployments.
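
For orientation, the additions under discussion amount to a handful of provisioning-related fields on the BareMetalPlatformStatus type in openshift/api. A minimal Go sketch follows, with field names collected from the proposal excerpts quoted in the review threads below; the json tags, types, and comments are illustrative assumptions, not the merged API:

```go
package v1

// Sketch only: field names are taken from this proposal's excerpts; the
// tags, types, and optionality are illustrative assumptions.
type BareMetalPlatformStatus struct {
	// Existing fields (APIServerInternalIP, IngressIP, NodeDNSIP, ...)
	// are omitted for brevity.

	// ProvisioningInterface is the NIC name on the underlying baremetal
	// servers that is physically connected to the provisioning network,
	// e.g. "eno1" or "ens8".
	ProvisioningInterface string `json:"provisioningInterface,omitempty"`

	// ProvisioningIP is the IP address brought up on that NIC on whichever
	// host runs the metal3 / baremetal-operator pod.
	ProvisioningIP string `json:"provisioningIP,omitempty"`

	// ProvisioningNetworkCIDR is the provisioning subnet, e.g. "10.1.0.0/24".
	ProvisioningNetworkCIDR string `json:"provisioningNetworkCIDR,omitempty"`

	// ProvisioningDHCPRange is a contiguous range used to DHCP/PXE boot
	// hosts, e.g. "10.1.0.10, 10.1.0.100".
	ProvisioningDHCPRange string `json:"provisioningDHCPRange,omitempty"`

	// CacheURL is an optional, previously populated bootimage cache,
	// e.g. "http://111.111.111.1/images".
	CacheURL string `json:"cacheURL,omitempty"`

	// RHCOSImageURL is the location of the RHCOS bootimage used to
	// provision workers (its removal is discussed later in this thread).
	RHCOSImageURL string `json:"rhcosImageURL,omitempty"`
}
```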

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 24, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sadasu
To complete the pull request process, please assign kbsingh
You can assign the PR to them by writing /assign @kbsingh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sadasu added a commit to sadasu/openshift-api that referenced this pull request Oct 24, 2019
Adding config items required for a Baremetal IPI deployment to
the BareMetalPlatformStatus CR. Based on enhancement request:
openshift/enhancements#90
@sadasu
Contributor Author

sadasu commented Oct 25, 2019

/cc @deads2k
/cc @abhinavdahiya

@deads2k
Contributor

deads2k commented Oct 25, 2019

please add a discussion about how the information is stored today and how you'll migrate that data. If you don't support upgrade, please indicate that, link to the docs, and ideally where in the code you actually stop that upgrade.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 28, 2019
sadasu added a commit to sadasu/openshift-api that referenced this pull request Oct 28, 2019
Adding config items required for a Baremetal IPI deployment to
the BareMetalPlatformStatus CR. Based on enhancement request:
openshift/enhancements#90
### Upgrade / Downgrade Strategy

This enhancement is applicable only to the Baremetal platform type. Since
this platform type is not officially supported in 4.2, we feel that an
Contributor

Please link to where in the docs we say this isn't supported and to where in the code you're preventing upgrades from happening.

Contributor Author

As far as I know, we are not preventing upgrades from happening in the code. The current upgrade strategy comes less from the code and more from business logic (for lack of a better word in my vocabulary). @russellb should be able to clarify this further.

Contributor

@deads2k this is a net-new integration for a currently unsupported/experimental platform so we don't expect there to be any users to upgrade, we'll only support new deployments in the first instance.

Perhaps you can expand a bit on the sort of strategy/docs you would expect in this case, clearly when this becomes fully supported we'll need to ensure we have a strategy to deal with any changes in this schema, but right now I don't think it's a concern, we're still trying to bootstrap this new baremetal IPI platform integration (and this API change is one of the final pieces to enable that)

Contributor

@deads2k this is a net-new integration for a currently unsupported/experimental platform so we don't expect there to be any users to upgrade, we'll only support new deployments in the first instance.

Perhaps you can expand a bit on the sort of strategy/docs you would expect in this case, clearly when this becomes fully supported we'll need to ensure we have a strategy to deal with any changes in this schema, but right now I don't think it's a concern, we're still trying to bootstrap this new baremetal IPI platform integration (and this API change is one of the final pieces to enable that)

I just want to be sure that customers cannot possibly attempt to upgrade from 4.2 to 4.3. If they can, the chances that they will go from a working cluster to a completely broken one are fairly high. Simply saying, "this isn't supported" doesn't really satisfy, but Derek told me that you can't actually get this to install in 4.2, so I'm no longer worried about it.

Contributor

Yeah, currently if anyone tries to deploy using the baremetal platform, the baremetal operator pod will simply crashloop on start because of the missing configmap. We've worked around it for developer testing by manually adding it, but I think for end-users it's extremely unlikely they would do such an upgrade, and also this platform is marked experimental in the installer.

FWIW that's also why I think this is a bug and not really an enhancement - the current MAO integration is broken without passing this data from the installer.

Contributor Author

please add a discussion about how the information is stored today and how you'll migrate that data. If you don't support upgrade, please indicate that, link to the docs, and ideally where in the code you actually stop that upgrade.

This information is not currently stored anywhere. For development testing purposes, we have scripts that create a ConfigMap that contains this config. MAO code in 4.2 refers to this ConfigMap, which has to be provided either by these scripts (or manually?) when the platform type is "Baremetal".
So, during the 4.3 cycle, we first attempted to have the installer add the ConfigMap, but that effort was abandoned in favor of adding these config items to the BaremetalPlatformStatus CR. Hopefully, the enhancement request and this discussion provide the necessary context for this effort.

@sadasu
Contributor Author

sadasu commented Oct 28, 2019

Updated (hopefully improved) section on upgrade strategy.


No config value other than RHCOSImageURL is expected to change after a
metal3 cluster is up. Handling changes to that value after install time has
not been explored and is a non-goal at this time.
Contributor

We could mention that the update lifecycle for bootimage is an unsolved problem for all platforms ref openshift/os#381

Contributor Author

Done.

Contributor

@imcleod assembled some notes on this issue from a discussion we had a couple months ago as well: https://docs.google.com/document/d/1W_N3pcOuo6E_DcM5GQ731cTTN2VJ1KUa2A49_Zi3IRU/edit

Contributor Author

@ericavonb & @imcleod is it OK for me to add a link to this doc in my enhancement request?

### User Stories [optional]

This enhancement will allow users to customize their Baremetal IPI deployments
to their specific hardware and network requirements.
Contributor

It also allows IPI baremetal deployments without manual workarounds such as creating the required ConfigMap object manually (which is what we're currently doing for testing).

Contributor

The user story doesn't actually name the components that will be using these fields.

Since this proposal is adding these new fields to the global platform configs, I would like to make sure this is actually infrastructure-level information, rather than something that belongs to a specific operator.

Contributor Author

Augmented the User Stories section to address the above two comments.


Baremetal Platform type will be available for customers to use for the first
time in Openshift 4.3. And, when it is installed, it will always start as
a fresh baremetal installation atleast in 4.3. There is no use case where a 4.2
Contributor

typo s/atleast/at least/

installation would be upgraded to a 4.3 installation with Baremetal Platform
support enabled.

For the above reason, in this particulat case, we would not have to make
Contributor

typo s/particulat/particular/g

Contributor

@hardys hardys left a comment

Looks good, couple of small comments/typos

@sadasu sadasu force-pushed the baremetal-config branch 2 times, most recently from 015f169 to 3f1f92e on October 29, 2019 03:09
BareMetalPlatformStatus CR is being augmented to support
configuration required for successful Baremetal IPI deployments.

This proposal aims to add the following to BareMetalPlatformStatus:

1. ProvisioningInterface : This is the interface name on the underlying
Contributor

Is this something like talk to this interface if you want to talk to the API, like QEMU URL/amazonaws.com

Also this seems like BareMetalHost Custom Resource should have this information, because i would assume users might want to have different interface on each baremetal-server that is used to talk to provisioning network??

Contributor

This is the physical NIC used for the dedicated network on the nodes where the baremetal operator and Ironic run, e.g. eno1, eth3 or whatever.

This shouldn't be part of the BareMetalHost CRD because that describes the nodes to be provisioned, this is an implementation detail of how the provisioning service runs on the cluster.

Contributor

@abhinavdahiya perhaps a bit more context would help here:

AFAICS the changes proposed by @sadasu are aligned with the guidance previously provided by @wking, but perhaps you can provide more detailed guidance as to how we can refine this approach to make progress.

We've been trying to add this small amount of additional configuration data for nearly 3 months now, and it seems like conflicting feedback on the various PRs is part of the reason we're making such slow progress - perhaps you and @wking could chat and provide some very specific guidance on how to proceed? Thanks!

Contributor Author

Is this something like talk to this interface if you want to talk to the API, like QEMU URL/amazonaws.com

Also this seems like BareMetalHost Custom Resource should have this information, because i would assume users might want to have different interface on each baremetal-server that is used to talk to provisioning network??

@abhinavdahiya That is a good point. But, currently we are operating under the assumption that in a given installation, all baremetal servers will use an interface of the same name to be physically connected to the provisioning network.
Secondly, the BareMetalHost Custom Resource currently contains information about baremetal hosts known to the BareMetal Operator. We also have an additional set of configs that are used by Ironic, which is a service that provisions baremetal servers and gets them ready for use as hosts and then as nodes. This config cannot be added to the BareMetalHost Custom Resource.
So, instead of creating a brand new Custom Resource, we are augmenting the BareMetalPlatformStatus Custom Resource. Agreed that adding this makes the BareMetalPlatformStatus CR look a lot different from similar CRs used by other clouds. But, in the Baremetal case, we are also responsible for provisioning the underlying physical servers.

Contributor

An example value for this might be "eth1" or "ens8" or the name of a bridge (we use "provisioning" in the VM-based development environment for this work).

Contributor

So the provision network information is for the baremetal-controller so that it knows how to reach the provisioning network on the host it's running to do the provisioning of a new node?

Member

So the provision network information is for the baremetal-controller so that it knows how to reach the provisioning network on the host it's running to do the provisioning of a new node?

Wait, why can't this just be a routing config on the controller node? Controller: "I'm trying to reach the provisioning API at https://whatever.example.com", DNS: "whatever.example.com is 1.2.3.4", kernel: "take eth1 to reach 1.2.3.4". Doesn't seem bare-metal-specific to me.

Contributor Author

A config similar to that might be the IngressIP and it is present in the BareMetalPlatformStatus CR. https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L176.

Contributor

So the provision network information is for the baremetal-controller so that it knows how to reach the provisioning network on the host it's running to do the provisioning of a new node?

Not quite, it's so that the baremetal-operator pod can configure a dedicated interface on the node where that pod runs; this happens via https://github.com/openshift/ironic-static-ip-manager which is a bit hacky but works in the absence of an external DHCP server or any other external dependencies.

So to reply to @wking: routing doesn't help us here, we have to bring up a NIC on the node so we can serve DHCP/TFTP etc. for PXE booting the nodes.

In future this may become optional, when we support use of external DHCP servers on the provisioning network, but right now it's required to make pxe boot work.
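
To make the mechanism above concrete, here is a rough sketch of what keeping a provisioning address alive on the selected NIC could look like; it is a simplification of what openshift/ironic-static-ip-manager is described as doing in this thread, and the interface name, address, lifetime, and refresh interval are all made-up example values:

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// Keep the provisioning IP configured on the provisioning interface of the
// host running the metal3 pod, using a short address lifetime so the IP
// ages out if the pod is rescheduled elsewhere. Example values only; this
// is not the actual ironic-static-ip-manager implementation.
func main() {
	const iface = "eno1"       // ProvisioningInterface (example)
	const addr = "10.1.0.3/24" // ProvisioningIP within ProvisioningNetworkCIDR (example)
	for {
		cmd := exec.Command("ip", "addr", "replace", addr, "dev", iface,
			"valid_lft", "60", "preferred_lft", "60")
		if out, err := cmd.CombinedOutput(); err != nil {
			log.Printf("refreshing %s on %s failed: %v: %s", addr, iface, err, out)
		}
		time.Sleep(30 * time.Second)
	}
}
```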

Contributor

@hardys hardys Nov 13, 2019

Controller: "I'm trying to reach the provisioning API at https://whatever.example.com"

FWIW I think this is a fundamental misunderstanding here - there is no existing provisioning API we have to reach - the metal3 pod deployed by the MAO runs that API, and most of the data under discussion here is just the configuration needed to make that self-hosted API work.

This is very different to the typical "cloud provider" API interaction, which I think is why some of this discussion is difficult without prior context :)

cluster. This is an optional parameter and is expected to be provided
in the format : http://111.111.111.1/images

6. RHCOSImageURL : This config is expected to be the location of the
Contributor

This most definitely is not a cluster global configuration.

Contributor

How would you suggest the MAO gets this data instead?

Contributor

To clarify why this is needed, https://github.com/openshift/ironic-rhcos-downloader is one of the containers we start as part of the ironic deployment.

This downloads the required RHCOS image, and the URL for that is set in the installer via the data/data/rhcos.json - openshift/os#381 discusses updating bootimages in the future, but for now we need some way to download the installer-specified version of the image to enable deployment of worker nodes via the BMO.

The installer has that data, so how would you prefer that we pass the URL to the downloader container, which is started via the MAO?

Contributor

One option I guess would be to use the URL from the machine providerspec for this, but currently that points at the cached URL:

openshift/installer@ac89074

One problem is that at the point where the MAO starts the BMO pod, we don't necessarily have any machines defined.

I guess we could block metal3 deployment on the machine existence, with logic that checks if the cache already contains a specific image, but then somewhere we'd have to rewrite the image used by the deployment to point at the cache (to avoid non-local download for potentially $many machine deployments).

Contributor

As Steve points out, we do have an image URL in the machine provider spec but we don't want to have to download that image from the internet for every deployment. This setting would allow us to populate a cache, and then have the machine provider spec refer to the cached location.

Contributor

I think the machine objects should have the link to the RHCOS image as known in rhcos.json
The provisioning controllers can see that the image already exists in their cache and then configure the server to use the cache URL (hopefully the controller knows that because it created it?) to download the image.

The controller can do the caching when it sees the first machine object.

And as for allowing caching across deployments, users should be free to override the RHCOS URL chosen by the installer and set in the machine object, such that the controller gets the image from a link hosted on the local network.

Contributor

Thanks @abhinavdahiya I think we're in agreement re the RHCOSImageURL as per my reply to @sadasu below.

One question/clarification though:

The provisioning controllers can see that the image already exists in their cache and then configure the server to use the cache URL

So should we add to the MAO the logic to check the machineconfig URL and compare it to images available in the cache (for example by doing a HEAD on the image, or downloading the md5sum we expose there), or do we need to bake that logic into the baremetal-operator?

The latter may be cleaner from an interface perspective, but I'm not sure if it aligns with the upstream vision for the BMO interfaces - @dhellmann do you think it would be reasonable to add logic to the BMO that checks the cache for an image, and rewrites the deployment image if it's found?
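
As a concrete illustration of the cache check being debated here — asking the cache webserver whether it already serves the bootimage named in the machine providerSpec — a minimal sketch using a plain HTTP HEAD; the helper name, the cache base URL, and the assumption that images are mirrored under their original file name are all hypothetical:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// imageCached asks the cache webserver (via HEAD) whether the image referenced
// by the machine providerSpec has already been mirrored. Hypothetical helper
// for illustration; not MAO or baremetal-operator code.
func imageCached(cacheBaseURL, providerSpecImageURL string) (bool, error) {
	src, err := url.Parse(providerSpecImageURL)
	if err != nil {
		return false, err
	}
	// Assume the cache mirrors images under their original file name.
	resp, err := http.Head(fmt.Sprintf("%s/%s", cacheBaseURL, path.Base(src.Path)))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	ok, err := imageCached("http://111.111.111.1/images", "https://example.com/rhcos-43.qcow2")
	fmt.Println(ok, err)
}
```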

a contiguous range of IPs that are available to be allocated. It is expected
to be provided in the format : 10.1.0.10, 10.1.0.100.

5. CacheURL : This is a location where image required to bring up baremetal
Contributor

So the machine object should have this set; how is this a global configuration?

Contributor

This is unrelated to the machine object, it's an implementation detail of the baremetal provisioning service.

Member

might be good to expand on "image" (maybe "bootimage"? Something else), so folks don't read this and assume "container image" like I did ;)

Contributor

Ah yeah that's a good point, it is where we'll cache the qcow2 bootimage indeed

workarounds like creating a ConfigMap for the config (which is the
current approach we are using to test baremetal deployments).

The new config items would be set by the installer and will be used by
Contributor

Since the only use-case is some machine-api-operator component, I am inclined towards making these fields available to just that component and not add these to the global configuration.

Contributor

The additions are specific to the baremetal platform in openshift/api#480 - and this data is required for all deployments when the baremetal platform is selected.

Can you suggest how we'd make this available only to a component? I'm not entirely clear what you are proposing, and how it differs from using the BareMetalPlatformStatus?

Member

Can you suggest how we'd make this available only to a component?

E.g. ingress-specific configuration or the registry-specific config or other operator.openshift.io stuff.

... how it differs from using the BareMetalPlatformStatus?

Mostly in scoping. It's easier to reason about the impact of config changes if you can scope read access down more tightly. If the MAO is the only component with creds to read a metal config object, you don't have to worry about other operators or cluster components sneaking in to borrow your settings and give you later trouble when you need to migrate to a different schema. But if multiple components need access, a central location like Infrastructure is DRYer for cluster admins.

be within this network. The value for this config is expected to be
provided in the format : 10.1.0.0/24.

4. ProvisioningDHCPRange : This is a range of IP addresses from which
Contributor

Is there a reason we shouldn't just use the MachineCIDR?

Contributor

This is a completely different network to the MachineCIDR, it's used for baremetal deployment to pxe boot the servers and access their BMCs via IPMI.

In the installer there is a validation to ensure this network does not overlap with MachineCIDR ref openshift/installer#2449
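
For readers unfamiliar with that installer validation, the overlap check it describes boils down to something like the following sketch (not the installer's actual code):

```go
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two CIDR blocks share any addresses. Because
// CIDR blocks can only overlap by one containing the other, checking each
// network address against the other block is sufficient.
func cidrsOverlap(a, b string) (bool, error) {
	_, na, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	_, nb, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return na.Contains(nb.IP) || nb.Contains(na.IP), nil
}

func main() {
	// Example values: a provisioning CIDR versus a machine network CIDR.
	overlap, err := cidrsOverlap("10.1.0.0/24", "192.168.111.0/24")
	fmt.Println(overlap, err) // false <nil> — the networks are disjoint
}
```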

Contributor

Right, for baremetal we assume at least 2 networks. The MachineCIDR describes the usual network where control plane and application traffic go. This new ProvisioningCIDR describes the network where provisioning traffic goes. That traffic needs to be isolated from the control plane and applications because it involves protocols like DHCP and PXE, and you don't want to mix those with publicly facing traffic, so they are typically on different L2 network segments.

Contributor

Gotcha, Fun.


@hardys
Contributor

hardys commented Nov 1, 2019

@abhinavdahiya and @deads2k Hi please can you take another look at this when you get a moment ref my comments above - this is a blocker to openshift/api#480 which blocks openshift/machine-api-operator#402 and openshift/installer#2449

Those PRs are also the first step to fixing openshift/installer#2091 which is a prerequisite for ipv6 support, and that's something we're being asked to prioritize so any help unblocking this series would be much appreciated, thanks!

type is Baremetal.

MAO reads this configuration and uses these to derive additional configuration
required to bring up a metal3 cluster. E2e testing should make sure that MAO


What type of negative testing do you plan on doing? Malformed strings, IP addresses/ranges overlapping, not in the expected ranges, poorly formed IP addresses, images not where they are specified, slow or interrupted downloads. Is this going to be tested and within scope of this enhancement testing?

Contributor

What type of negative testing do you plan on doing? Malformed strings, IP addresses/ranges overlapping, not in the expected ranges, poorly formed IP addresses, images not where they are specified, slow or interrupted downloads. Is this going to be tested and within scope of this enhancement testing?

All of these values will be provided by the installer ref openshift/installer#2449 - that does have unit tests for the installer level validation of those values, and I'd assume we'll have similar validation and unit tests in the MAO @sadasu ?

For e2e testing we've got a 3rd party metal3 CI job which can be used to validate these changes, but it is not yet fully integrated into openshift e2e CI for this new platform - @derekhiggins is working on that.
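
To give a flavour of the kind of field-level negative testing asked about above, here is a small sketch of validating the DHCP range format used in this proposal ("start, end", both inside the provisioning CIDR); this is illustrative only and not the installer's or MAO's validation code:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateDHCPRange checks the "start, end" format used in this proposal
// (e.g. "10.1.0.10, 10.1.0.100") and that both ends fall inside the
// provisioning CIDR. Illustrative sketch only.
func validateDHCPRange(dhcpRange, provisioningCIDR string) error {
	_, network, err := net.ParseCIDR(provisioningCIDR)
	if err != nil {
		return fmt.Errorf("invalid provisioning CIDR %q: %v", provisioningCIDR, err)
	}
	parts := strings.Split(dhcpRange, ",")
	if len(parts) != 2 {
		return fmt.Errorf("expected %q format, got %q", "start, end", dhcpRange)
	}
	for _, p := range parts {
		ip := net.ParseIP(strings.TrimSpace(p))
		if ip == nil {
			return fmt.Errorf("%q is not a valid IP address", strings.TrimSpace(p))
		}
		if !network.Contains(ip) {
			return fmt.Errorf("%s is outside %s", ip, provisioningCIDR)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateDHCPRange("10.1.0.10, 10.1.0.100", "10.1.0.0/24")) // <nil>
	fmt.Println(validateDHCPRange("10.2.0.10, 10.1.0.100", "10.1.0.0/24")) // error: outside the CIDR
}
```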

Contributor

Also note that the only way to use this platform at present is to manually create a ConfigMap resource, e.g a completely unconstrained interface, so IMO having the validation at the installer layer will be a big improvement.

that is specific to a Baremetal deployment.

This proposal aims to add the following to BareMetalPlatformStatus:

Contributor

Right. The host CR is the inventory that the installer uses to fulfill a Machine. These settings tell the provisioning system how to do that work, by telling it which network it can use and what IPs it should use on that network.

baremetal server which would be physically connected to the provisioning
network.

2. ProvisioningIP : This is the IP address used to bring up a NIC on
Contributor

This is the IP address that the interface above is configured with - we've got a mechanism that statically configures ProvisioningInterface on whatever master node has the metal3 baremetal-operator pod running, but this is a short lease such that if/when the pod gets rescheduled the IP goes down on the old host, and is then brought back up on the new host.

@sadasu this would be good info to include in the body of the doc. Maybe link to any metal3 docs about it, too?



a contiguous range of IPs that are available to be allocated. It is expected
to be provided in the format : 10.1.0.10, 10.1.0.100.

5. CacheURL : This is a location where image required to bring up baremetal
Contributor

Suggested change
5. CacheURL : This is a location where image required to bring up baremetal
5. CacheURL : This is a location where the image required to bring up baremetal

to be provided in the format : 10.1.0.10, 10.1.0.100.

5. CacheURL : This is a location where image required to bring up baremetal
hosts have been previously downloaded for faster image downloads within the
Contributor

Suggested change
hosts have been previously downloaded for faster image downloads within the
hosts has been previously downloaded for faster image downloads within the


@imain

imain commented Nov 5, 2019

All of the configuration options we are proposing control how the Bare Metal Operator works.

In a baremetal deployment we have multiple network interfaces, one of which is connected to the provisioning network, which is used only for provisioning new hosts. The baremetal pod contains various services and will run DHCP and PXE to provision new nodes on this second network. So, while some of these configuration options sound similar to ones that already exist in OpenShift, they are actually a new set of options with values that are expected to be different.

The provisioning interface value specifies the network interface on the host (which could be any host in the cluster) to be used for provisioning. This interface will be assigned a static IP as specified in the configuration. It will then be used to PXE boot new hosts using this network and assign an IP to them via DHCP for the provisioning step.

The RHCOS image URL is used to download the image used in the provisioning step. This will be downloaded during pod initialization and then served to the host being provisioned using a local URL. This is done to avoid downloading it from the internet every time a host is provisioned.

We could certainly move this to its own CRD and out of platformStatus. Is this the reasonable next step here?

@hardys
Contributor

hardys commented Nov 6, 2019

@imain IMHO the only interface here that we can reasonably remove is the RHCOSImageURL - instead of another CRD I think we could have the MAO look at the machinesets and retrieve the image URL from the providerSpec, then we'd restart the downloader container if the target bootimage ever changes.

If that is the main objection here, my proposal is to remove RHCOSImageURL and merge this so we can make progress? @abhinavdahiya and/or @deads2k what are your thoughts?

@deads2k
Contributor

deads2k commented Nov 6, 2019

If that is the main objection here, my proposal is to remove RHCOSImageURL and merge this so we can make progress? @abhinavdahiya and/or @deads2k what are your thoughts?

I will defer to @abhinavdahiya. The discussion with @abhinavdahiya and @squeed is what I wanted out of this. Once they're convinced, make the API doc match (more tersely if necessary) and we can get it in.

@sadasu
Contributor Author

sadasu commented Nov 13, 2019

1. I have created an "enhancement" PR to build context around the BareMetal IPI deployment solution: Providing background for Baremetal IPI based enhancements #102

2. I see 2 main themes evolving here:

   a. The BareMetalPlatformStatus CR is not a good place for baremetal provisioning config since it is a global CR. Since the machine-api-operator is the only component that is currently consuming this config, we should put it in a different CR that only the Installer and MAO have access to. Since no existing CR in the baremetal case fits this criterion, I would have to create a new CR.

   b. RHCOSImageURL should be in the Machine CR and not in the BareMetalPlatformStatus CR.

We are currently blocked on 2a.

@hardys
Contributor

hardys commented Nov 13, 2019

We are currently blocked on 2a.

I think we could remove the RHCOSImageURL from whatever CR is created by the installer, and instead have the MAO look at the machinesets that exist, then restart the ironic-rhcos-downloader container if the URL found is not accessible (e.g via HEAD) from the cache webserver.

Personally I don't see this as a blocker, since the information we need can be made available via the installer, the main blocker IMHO is agreeing where the remainder of the data should reside.

@abhinavdahiya
Contributor

a. The BareMetalPlatformStatus CR is not a good place for baremetal provisioning config since it is a global CR. Since the machine-api-operator is the only component that is currently consuming this

So I think the best way to move forward on this will be to create a new Custom Resource for the baremetal-operator that includes the Provisioning* details.

similar to other operator level configuration objects we have here https://github.com/openshift/api/tree/master/operator/v1
(doesn't have to be in that directory/group but would be nice to have it there)

and then the installer can configure the operator in baremetal cases to include the correct information.
we already do it for ingress-operator https://github.com/openshift/installer/blob/9f62449aecce77dff5eff8f8f8dda98386fb2df9/pkg/asset/manifests/ingress.go#L97 and I don't think doing it for baremetal platform would be problematic.
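
For comparison with the BareMetalPlatformStatus sketch near the top of this thread, an operator-scoped singleton along the lines suggested here might look roughly like the following; the package, kind, and field names are illustrative assumptions, not the CRD that was ultimately pursued in #119:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Provisioning is a sketch of a cluster-scoped, operator-level config object
// that the installer could render as a manifest (as it already does for the
// ingress config) and that only the machine-api components would consume.
type Provisioning struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ProvisioningSpec `json:"spec,omitempty"`
}

// ProvisioningSpec carries the Provisioning* details discussed in this PR.
type ProvisioningSpec struct {
	ProvisioningInterface   string `json:"provisioningInterface,omitempty"`
	ProvisioningIP          string `json:"provisioningIP,omitempty"`
	ProvisioningNetworkCIDR string `json:"provisioningNetworkCIDR,omitempty"`
	ProvisioningDHCPRange   string `json:"provisioningDHCPRange,omitempty"`
}
```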

@dhellmann
Contributor

a. The BareMetalPlatformStatus CR is not a good place for baremetal provisioning config since it is a global CR. Since the machine-api-operator is the only component that is currently consuming this

So I think the best way to move forward on this will be to create a new Custom Resource for the baremetal-operator that includes the Provisioning* details.

similar to other operator level configuration objects we have here https://github.com/openshift/api/tree/master/operator/v1
(doesn't have to be in that directory/group but would be nice to have it there)

and then the installer can configure the operator in baremetal cases to include the correct information.
we already do it for ingress-operator https://github.com/openshift/installer/blob/9f62449aecce77dff5eff8f8f8dda98386fb2df9/pkg/asset/manifests/ingress.go#L97 and I don't think doing it for baremetal platform would be problematic.

The operator that configures the bare metal components is the machine-api-operator. Other platform-specific config for the machine-api-operator is in the infrastructure resource in https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L102. Can you help me understand why you see the bare metal configuration as needing to be handled differently from all of the other platforms?

@enxebre
Member

enxebre commented Nov 29, 2019

@dhellmann

Other platform-specific config for the machine-api-operator is in the infrastructure resource in

The MAO does not use any platform-specific config from the infra resource. It only fetches the platform type to choose the provider controller image to run, because the CVO does not support manifest discrimination per platform (or per anything else).

@abhinavdahiya

and then the installer can configure the operator in baremetal cases to include the correct information.
we already do it for ingress-operator

Wouldn’t this be different in the sense that https://github.com/openshift/installer/blob/9f62449aecce77dff5eff8f8f8dda98386fb2df9/pkg/asset/manifests/ingress.go#L97 is instantiating a resource object that is orthogonal to the bootstrapping process (an ingress controller instance), whereas here we are discussing instantiating the config that blocks the operator? That means the creation of the CRD definition would block the instantiation of the object from the installer, which would block the operator. Unless the creation of the CRD definition is done by the installer/CVO (which seems overcomplicated)? So wouldn’t https://github.com/openshift/cluster-version-operator/blob/master/vendor/github.com/openshift/api/config/v1/0000_10_config-operator_01_ingress.crd.yaml be a better analogy?

This is the config that enables bareMetal as an underlying "platform" for consumers so having it in BareMetalPlatformStatus does not sound that bad to me.

@enxebre
Member

enxebre commented Dec 5, 2019

I actually think this fits well for the infrastructure object and BareMetalPlatformStatus. See the description here, "Configuration details allowing the cluster to interact with its cloud provider": https://github.com/openshift/openshift-docs/blob/c5b7bdc145d8f79a60062934375117f5a22fa5ca/installing/install_config/customizations.adoc#informational-resources. This should be cluster-wide config which never changes.

I'm good with any of the items below and with whatever @abhinavdahiya prefers here but I think my order of preference would be:
instance name: cluster, resource name: infrastructure.config.openshift.io
instance name: cluster, resource name: baremetal.config.openshift.io
instance name: cluster, resource name: baremetal.operator.openshift.io / baremetalconfig.metal3.io

@abhinavdahiya who is installing *.config.openshift.io CRDs today?
If the MAO would need to install baremetal.operator.openshift.io / baremetalconfig.metal3.io, that'd temporarily block instantiation of the object from the installer, and that would temporarily block the deployment of the controllers, which I guess is fine.

Update:
This config might be read by multiple operators, e.g. the MCO #119 (comment), which makes me feel more strongly about going with:
infrastructure.config.openshift.io or baremetal.config.openshift.io

@hardys
Contributor

hardys commented Dec 5, 2019

that'd temporarily block instantiation of the object from the installer, and that would temporarily block the deployment of the controllers, which I guess is fine.

That is true, but we do already have that problem since the metal3 BareMetalHost CRD is installed via the MAO, so the installer has to wait for that to instantiate the BareMetalHost objects - to deploy workers we need both those objects and this configuration data (either via this interface or the one in #119)

https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_08_baremetalhost.crd.yaml

@hardys
Contributor

hardys commented Dec 5, 2019

@abhinavdahiya @enxebre

A good question is raised by @enxebre on #119 which is what will install the new CRD - if we want to move the ironic pod[1] from the installer to the MCO, what will install that CRD such that we can use the data for templating a staticpod definition like in [2] ?

My initial assumption was that the MAO would install it, but that won't work for the bootstrap case?

The other option is probably to pass all this data into the installer ignition config templating [3] such that we can fix openshift/installer#2091 but then the only way we can move from the podman/systemd stuff to a static pod ref openshift/installer#2251 would be rendering the staticpod definition in the installer, not the MCO?

[1] https://github.com/openshift/installer/blob/master/data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh.template#L12
[2] https://github.com/openshift/machine-config-operator/blob/master/manifests/baremetal/coredns.yaml#L32
[3] https://github.com/openshift/installer/blob/master/pkg/asset/ignition/bootstrap/bootstrap.go#L51

@abhinavdahiya
Contributor

abhinavdahiya commented Dec 5, 2019

@abhinavdahiya @enxebre

I actually think this fits well for the infrastructure object and BareMetalPlatformStatus. See the description here, "Configuration details allowing the cluster to interact with its cloud provider": https://github.com/openshift/openshift-docs/blob/c5b7bdc145d8f79a60062934375117f5a22fa5ca/installing/install_config/customizations.adoc#informational-resources. This should be cluster-wide config which never changes.

Looking at the comments, it doesn't look like it's equivalent to the cloud-provider configuration that is universally used by operators. This information is only used for one service, i.e. the provisioning network for PXEing machines.

And therefore I think it best belongs in a specific operator config.

I'm good with any of the items below and with whatever @abhinavdahiya prefers here but I think my order of preference would be:
instance name: cluster, resource name: infrastructure.config.openshift.io
instance name: cluster, resource name: baremetal.config.openshift.io
instance name: cluster, resource name: baremetal.operator.openshift.io / baremetalconfig.metal3.io

@abhinavdahiya who is installing *.config.openshift.io CRDs today?
If the MAO would need to install baremetal.operator.openshift.io / baremetalconfig.metal3.io, that'd temporarily block instantiation of the object from the installer, and that would temporarily block the deployment of the controllers, which I guess is fine.

Update:
This config might be read by multiple operators, e.g. the MCO #119 (comment), which makes me feel more strongly about going with:
infrastructure.config.openshift.io or baremetal.config.openshift.io

The object will be read by the MAO, and perhaps in future the MCO, to influence the configuration of the provisioning containers in the metal3 pod (started by the MAO), and perhaps the keepalived configuration created by the MCO cc @celebdor

The current use is to set up the provisioning network, and if the team tries to expand it to ANY baremetal configuration, that would be ill-advised.

if we want to move the ironic pod[1] from the installer to the MCO, what will install that CRD such that we can use the data for templating a staticpod definition like in [2] ?

if the ironic is running in baremetalcontroller in machine-api it shouldn't be run as static pod using machine-config-operator on bootstrap-host. MCO is not a generic static pod generator :(

The baremetalcontroller will have to take responsibility to do it on bootstrap-host. and the configuration will be available to the controller/operator from disk.. it doesn't need the CRD installed..

@hardys
Contributor

hardys commented Dec 5, 2019

if the ironic is running in baremetalcontroller in machine-api it shouldn't be run as static pod using machine-config-operator on bootstrap-host. MCO is not a generic static pod generator :(
The baremetalcontroller will have to take responsibility to do it on bootstrap-host. and the configuration will be available to the controller/operator from disk.. it doesn't need the CRD installed..

The MAO and other machine-api components don't run on the bootstrap VM yet? So I don't think we can use the same approach to launch ironic on the bootstrap VM (that's the reason there are two implementations atm).

It sounds like you'd prefer to just render the config directly in the installer (as we do now), then at some future point we can switch to the MAO starting it, when the full machine API stack is used to deploy masters via the bootstrap VM. We could still switch to a staticpod rendered by the installer I guess.

@hardys
Contributor

hardys commented Dec 5, 2019

/cc @stbenjam it seems from feedback above that @abhinavdahiya would prefer to keep rendering of the bootstrap Ironic things inside the installer, not move to the MCO, so we may want to go ahead and break out the install-config changes from openshift/installer#2449 such that we can unblock rendering the config appropriately for ipv6 and external-DHCP use-cases.

@abhinavdahiya
Contributor

It sounds like you'd prefer to just render the config directly in the installer (as we do now), then at some future point we can switch to the MAO starting it, when the full machine API stack is used to deploy masters via the bootstrap VM. We could still switch to a staticpod rendered by the installer I guess.

it seems from feedback above that @abhinavdahiya would prefer to keep rendering of the bootstrap Ironic things inside the installer, not move to the MCO, so we may want to go ahead and break out the install-config changes from openshift/installer#2449 such that we can unblock rendering the config appropriately for ipv6 and external-DHCP use-cases.

My preference is to have the controller/operator that manages the operand in the cluster to manage the service on the bootstrap-host.

@hardys
Contributor

hardys commented Dec 5, 2019

My preference is to have the controller/operator that manages the operand in the cluster to manage the service on the bootstrap-host.

That is the machine-api-operator, it starts the metal3 pod which contains the ironic components used for deploying workers. IIUC the machine-api-operator doesn't run on the bootstrap VM currently, so how would you envisage this working?

@hardys
Contributor

hardys commented Dec 5, 2019

Ok I discussed this with @abhinavdahiya and explained that it's currently the machine-api-operator which starts the cluster-hosted provisioning services, including the baremetal-operator controller.

There was a misunderstanding that the BMO started the ironic services, which currently is not the case, so I don't think that will work for us in the bootstrap case.

To make progress I think we should focus on #119 to enable configuration of the metal3 pod started by the MAO, and the bootstrap VM provisioning services can remain something that's handled inside the installer.

If @sadasu and @abhinavdahiya are in agreement I'd suggest we abandon this PR, focus on landing #119, then rework or replace the API/Installer/MAO PRs already posted. This will fix configuration of the cluster-hosted provisioning services.

The gaps around configuration of the bootstrap-hosted provisioning services will be tracked via the existing installer bug openshift/installer#2091

@sadasu
Contributor Author

sadasu commented Dec 9, 2019

@enxebre Thank you so much for your comments regarding why the BareMetalPlatformStatus would be a good place to add the config. We had pursued this direction during the 4.3 time frame and gave up in favor of creating a new CR.
@abhinavdahiya I am ready to close this PR.

@sdodson
Member

sdodson commented Dec 9, 2019

/close
Any additional feedback should be taken to #119

@openshift-ci-robot

@sdodson: Closed this PR.

In response to this:

/close
Any additional feedback should be taken to #119

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

