---
title: Adding Baremetal Installer Provisioned Infrastructure (IPI) to OpenShift
authors:
- "@sadasu"
reviewers:
- "@smarterclayton"
- "@abhinavdahiya"
- "@enxebre"
- "@deads2k"
approvers:
- "@abhinavdahiya"
- "@smarterclayton"
- "@enxebre"
- "@deads2k"
creation-date: 2019-11-06
last-updated: 2019-11-06
status: implemented
---

# Adding Baremetal IPI capabilities to OpenShift

This enhancement serves to provide context for the many features and
enhancements that will follow to make Baremetal IPI deployments via
OpenShift a reality.

At the time of this writing, code for some of these enhancements has already
merged, some is in progress, and the rest is yet to be implemented.
References to these features in their different stages of development are
provided below.

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift/docs]

## Summary

Baremetal IPI deployments enable OpenShift to enroll baremetal servers to
become Nodes that can run Kubernetes workloads. The Baremetal Operator [1],
along with other provisioning services (Ironic and its dependencies), runs in
its own pod called "metal3". This pod is deployed by the Machine API Operator
when the Platform type is `BareMetal`. The OpenShift Installer is responsible
for providing all of the configuration required for a successful deployment.
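
For orientation, the platform type that drives this behavior is exposed by
the cluster-scoped Infrastructure resource. A minimal sketch (illustrative
only; the exact status fields have evolved across releases):

```yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  # When the platform reports BareMetal, the Machine API Operator
  # deploys the "metal3" pod described in this document.
  platform: BareMetal
```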

## Motivation

The motivation for this enhancement request is to provide background for all
of the subsequent enhancement requests for Baremetal IPI deployments.

### Goals

The goal of this enhancement request is to provide context for all the changes
that have already been merged towards making Baremetal IPI deployments a
reality. All future Baremetal enhancement requests will refer back to this
one to provide context.

### Non-Goals

Raising development PRs as a result of this enhancement request.

## Proposal

Every OpenShift based Baremetal IPI deployment will run a "metal3" pod on one
Master Node. A "metal3" pod includes a container running the BareMetal
Operator (BMO) and several other supporting containers that work together.

The BMO and the other supporting containers together are able to discover a
baremetal server on a pre-determined provisioning network, learn the hardware
attributes of the server, and eventually boot it to make it available as a
Machine within a MachineSet.

The Machine API Operator (MAO) currently deploys the "metal3" pod only when
the Platform type is `BareMetal`, but the BareMetalHost CRD is exposed by the
MAO as part of the release payload, which is managed by the cluster version
operator. The MAO is responsible for starting the BMO and the containers
running the Ironic services, and for providing these containers with their
necessary configuration via environment variables.
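
As an illustrative sketch only — the container names, images, and environment
variables below are hypothetical stand-ins, not the MAO's actual manifests —
the shape of this wiring is roughly:

```yaml
# Hypothetical sketch of the "metal3" pod; the real container names and
# variables live in the Machine API Operator's baremetal manifests.
apiVersion: v1
kind: Pod
metadata:
  name: metal3
  namespace: openshift-machine-api
spec:
  containers:
    - name: baremetal-operator
      image: quay.io/example/baremetal-operator:latest   # placeholder image
    - name: ironic
      image: quay.io/example/ironic:latest               # placeholder image
      env:
        - name: PROVISIONING_INTERFACE   # hypothetical variable name
          value: enp1s0
        - name: DHCP_RANGE               # hypothetical variable name
          value: 172.22.0.10,172.22.0.100
```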

The installer is responsible for kicking off a Baremetal IPI deployment with
the right configuration.
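
For example, this configuration is described to the installer through the
`platform.baremetal` section of `install-config.yaml`. The sketch below is
based on the installer documentation in [7]; field names and values are
illustrative and should be checked against the installer release in use:

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  baremetal:
    # NIC on the provisioning host attached to the provisioning network.
    provisioningNetworkInterface: enp1s0
    # Host inventory, including BMC credentials, handed to Ironic.
    hosts:
      - name: master-0
        role: master
        bootMACAddress: "52:54:00:00:00:01"
        bmc:
          address: ipmi://192.168.111.1
          username: admin
          password: password
```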

### User Stories

With the addition of the features described in this and the other
enhancements in this directory, OpenShift can be used to bring up a
functioning cluster starting from a set of baremetal servers. As mentioned
earlier, these enhancements rely on the Baremetal Operator (BMO) [1] running
within the "metal3" pod to manage baremetal hosts. The BMO in turn relies on
the Ironic service [3] to manage and provision baremetal servers.

1. Enable the user to deploy a control plane with 3 master nodes.
2. Enable the user to grow the cluster by dynamically adding worker nodes.
3. Enable the user to scale down the cluster by removing worker nodes (see
the sketch after this list).
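
As a sketch of user stories 2 and 3 (the MachineSet name and labels below are
hypothetical), growing or shrinking the cluster amounts to changing the
replica count of a worker MachineSet whose Machines are fulfilled from the
BareMetalHost inventory:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-worker-0            # hypothetical name
  namespace: openshift-machine-api
spec:
  # Raise to add worker nodes, lower to remove them.
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: mycluster-worker-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: mycluster-worker-0
    spec:
      providerSpec: {}                # baremetal provider details elided
```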

### Implementation Details/Notes/Constraints

Baremetal IPI is integrated with OpenShift through the metal3.io [8] project.
Metal3.io is a set of Kubernetes controllers that wrap the OpenStack Ironic
project to provide Kubernetes-native APIs for managing the deployment and
monitoring of physical hosts.

The installer support for Baremetal IPI deployments is described in more
detail in [7]. The installer runs on a dedicated "provisioning host" that
needs to be connected to both a "provisioning network" and an "external
network". The provisioning network is a dedicated network used solely for
configuring baremetal servers to become part of the cluster. Traffic on the
provisioning network needs to be isolated from traffic on the external
network (hence the two separate networks). The external network is used to
carry cluster traffic, which includes cluster control plane traffic, application
and data traffic.

#### Control Plane Deployment

1. A minimum Baremetal IPI deployment consists of 4 hosts: one to be used
first as the provisioning host and later potentially re-purposed as a worker.
The other 3 make up the control plane. These 4 hosts need to be connected to
both the provisioning and external networks.

2. Installation can be kicked off by downloading and running
"openshift-baremetal-install". This binary differs from the "openshift-install"
binary only because libvirt needs to be linked for the baremetal install; the
separate binary is built from the same installer repository and distributed
as part of the baremetal-installer image. Removing the bootstrap node would
remove the dependency on libvirt, and baremetal IPI installs could then be
part of the normal OpenShift installer. This is on the roadmap for this work
and is being investigated.

3. The installer starts a bootstrap VM on the provisioning host. With other
platform types supported by OpenShift, a cloud already exists and the
installer runs the bootstrap VM on the control plane of this existing cloud.
In the case of the baremetal platform type, this cloud does not already
exist, so the installer starts the bootstrap VM using libvirt.

4. The bootstrap VM needs to be connected to the provisioning network, so the
network interface on the provisioning host that is connected to the
provisioning network needs to be provided to the installer.

5. The bootstrap VM must be configured with a special well-known IP within the
provisioning network; this IP needs to be provided as input to the installer.

6. The installer uses Ironic in the bootstrap VM to provision each host that
makes up the control plane. The installer uses Terraform to invoke the Ironic
API, which configures each host to boot over the provisioning network using
DHCP and PXE.

7. The bootstrap VM runs a DHCP server and responds with network information and
PXE instructions when Ironic powers on a host. The host boots the Ironic
Agent image, which is hosted on the httpd instance also running on the
bootstrap VM.

8. After the Ironic Agent on the host boots and runs from its ramdisk image,
it looks for the Ironic service, either using a URL passed in as a kernel
command line argument in the PXE response or by using mDNS to search for
Ironic on the local L2 network.

9. Ironic on the bootstrap VM then copies the RHCOS image hosted on the httpd
instance to the local disk of the host and also writes the necessary ignition
files so that the host can start creating the control plane when it runs the
local image.

10. After Ironic writes the image and ignition configs to the local disk of
the host, Ironic power cycles the host, causing it to reboot. The boot order
on the host is set to boot from the image on the local drive instead of PXE
booting.

11. After the control plane hosts have an OS, the normal bootstrapping
process continues with the help of the bootstrap VM. The bootstrap VM runs a
temporary API service to talk to the etcd cluster on the control plane hosts.

12. The manifests constructed by the installer are pushed into the new
cluster. The operators launched in the new cluster bring up the remaining
services and reconcile cluster state and configuration.

13. The Machine API Operator (MAO) running on the control plane detects the
platform type as "baremetal" and launches the "metal3" pod and the
cluster-api-provider-baremetal (CAPBM) controller. The metal3 pod runs
several Ironic services in containers in addition to the baremetal-operator
(BMO). After the control plane is completely up, the bootstrap VM is
destroyed.

14. The baremetal-operator that is part of the metal3 pod starts monitoring
hosts using the Ironic service that is also part of metal3. The
baremetal-operator uses the BareMetalHost CRD to get information about the
on-board controllers on the servers. As mentioned previously in this
document, this CRD exists on non-baremetal platform types too, but does not
represent any usable information for other platforms.

#### Worker Deployment

Unlike the control plane deployment, the worker deployment is managed by
metal3. Not all aspects of worker deployment are completely implemented yet.

1. All worker nodes need to be attached to both the provisioning and external
networks and configured to PXE boot over the provisioning network. A
temporary provisioning IP address in the provisioning network is assigned to
each of these hosts.

2. The user adds hosts to the available inventory for their cluster by
creating BareMetalHost CRs (see the sketch after this list). For more
information about the 3 CRs that already exist for a host transitioning from
a baremetal host to a Node, please refer to [9].

3. The cluster-api-provider-baremetal (CAPBM) controller finds an
unassigned/free BareMetalHost and uses it to fulfill a Machine resource. It
then sets the configuration on the host to start provisioning with the RHCOS
image (using the RHCOS image URL present in the Machine provider spec) and
the worker ignition config for the cluster.

4. The baremetal-operator uses the Ironic service to provision the worker
nodes in a process that is very similar to the provisioning of the control
plane, with some key differences. The DHCP server is now running within the
metal3 pod instead of in the bootstrap VM.

5. The provisioning IP used to bring up worker nodes remains the same as in
the control plane case, and the provisioning network also remains the same.
The installer also provides a DHCP range within the same network from which
the workers are assigned IP addresses.

6. The ignition configs for the worker nodes are passed as user data in the
config drive. Just as with the control plane hosts, Ironic power cycles the
hosts, which then boot using the RHCOS image now on their local disk. The
host then joins the cluster as a worker.
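
For illustration, a minimal sketch of the BareMetalHost CR referenced in
step 2 above (the values are hypothetical; the authoritative schema lives in
the baremetal-operator repository [1]):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  # Make the host available for provisioning.
  online: true
  # MAC address of the NIC attached to the provisioning network.
  bootMACAddress: "52:54:00:00:00:0a"
  bmc:
    # Address of the host's on-board management controller.
    address: ipmi://192.168.111.10
    # Secret containing the BMC username and password.
    credentialsName: worker-0-bmc-secret
```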

Currently, there is no way to pass the provisioning configuration known to
the installer to metal3, which is responsible for provisioning the workers;
enhancement [2] addresses this gap.

### Risks and Mitigations

Risks and mitigations will be specified in the follow-up enhancement requests
mentioned above.

## Design Details

### Test Plan

True e2e and integration testing can happen only after the implementation for
enhancement [2] lands. Until then, e2e testing is being performed with the
help of some developer scripts.

Unit tests have been added to the MAO and the Installer to cover the
additions made for the Baremetal IPI case.

### Graduation Criteria

Metal3 integration is in tech preview in 4.2 and is targeted for GA in 4.6.

Metal3 integration is currently missing an important piece: information about
the baremetal servers and their provisioning environment. Without this, true
end-to-end testing cannot be performed in order to graduate to GA.

### Upgrade / Downgrade Strategy

Metal3 integration is in tech preview in 4.2 and is missing key pieces that allow
a user to specify the baremetal server details and their provisioning setup.
It is not really usable in this state without the help of external scripts
that provide the above information in the form of a ConfigMap.
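
A hypothetical sketch of such a ConfigMap (the name, namespace, and keys
below are stand-ins for illustration; enhancement [2] defines the real
provisioning configuration interface):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metal3-config               # hypothetical name
  namespace: openshift-machine-api
data:
  # Hypothetical keys describing the provisioning environment
  # that the external scripts supply today.
  provisioning_interface: "enp1s0"
  provisioning_ip: "172.22.0.3/24"
  dhcp_range: "172.22.0.10,172.22.0.100"
```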

Once all of the installer features land, the Metal3 integration will be fully
functional within OpenShift. Until then, an upgrade strategy is not
necessary.

### Version Skew Strategy

This enhancement serves as background for the rest of the enhancements. We
will discuss the version skew strategy for each enhancement individually in
its respective request.

## Implementation History

Implementation to deploy a Metal3 cluster from the MAO was added via [4].

## Infrastructure Needed

The Baremetal IPI solution depends on the Baremetal Operator and the
baremetal Machine actuator, both of which can be found at [5]. The OpenShift
integration can be found at [6]. Implementation is complete on the metal3-io
side and the relevant bits have been added to the OpenShift repos.

[1]: https://github.com/metal3-io/baremetal-operator
[2]: https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/baremetal-provisioning-config.md
[3]: https://github.com/openstack/ironic
[4]: https://github.com/openshift/machine-api-operator/commit/43dd52d5d2dfea1559504a01970df31925501e35
[5]: https://github.com/metal3-io
[6]: https://github.com/openshift-metal3
[7]: https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[8]: https://metal3.io/
[9]: https://github.com/metal3-io/metal3-docs/blob/master/design/nodes-machines-and-hosts.md