Commit 63167f7

Providing background for Baremetal IPI based enhancements

This enhancement request is intended to provide context for all the work that is in progress for BareMetal IPI deployments and a backdrop for all the future enhancement requests in this area.
---
title: Adding Baremetal Installer Provisioned Infrastructure (IPI) to OpenShift
authors:
- "@sadasu"
reviewers:
- "@smarterclayton"
- "@abhinavdahiya"
- "@enxebre"
- "@deads2k"
approvers:
- "@abhinavdahiya"
- "@smarterclayton"
- "@enxebre"
- "@deads2k"
creation-date: 2019-11-06
last-updated: 2019-11-06
status: implemented
---

# Adding Baremetal IPI capabilities to OpenShift

This enhancement provides context for the slew of features and enhancements
that will follow to make Baremetal IPI deployments via OpenShift a reality.

At the time of this writing, code for some of these enhancements has already
merged, some is in progress, and the rest is yet to be implemented. References
to all of these features in their different stages of development are provided
below.

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift/docs]

## Summary

Baremetal IPI deployments enable OpenShift to enroll baremetal servers to
become Nodes that can run Kubernetes workloads.
The Baremetal Operator [1], along with other provisioning services (Ironic and
its dependencies), runs in its own pod called "metal3". This pod is deployed
by the Machine API Operator when the platform type is `BareMetal`. The
OpenShift Installer is responsible for providing all the necessary configs
required for a successful deployment.

## Motivation

The motivation for this enhancement request is to provide background for all
the subsequent enhancement requests for Baremetal IPI deployments.

### Goals

The goal of this enhancement request is to provide context for all the changes
that have already been merged towards making Baremetal IPI deployments a
reality. All future Baremetal enhancement requests will refer back to this one
to provide context.

### Non-Goals

Raising development PRs as a result of this enhancement request.

## Proposal

Every OpenShift based Baremetal IPI deployment runs a "metal3" pod on
one master Node. The "metal3" pod includes a container running the BareMetal
Operator (BMO) and several other supporting containers that work together.

Together, the BMO and the supporting containers are able to discover a
baremetal server on a pre-determined provisioning network, learn the
hardware attributes of the server, and eventually boot it to make it available
as a Machine within a MachineSet.

The Machine API Operator (MAO) currently deploys the "metal3" pod only
when the platform type is `BareMetal`, but the BareMetalHost CRD is exposed
by the MAO as part of the release payload, which is managed by the Cluster
Version Operator. The MAO is responsible for starting the BMO and the
containers running the Ironic services, and for providing these containers
with their necessary configuration via environment variables.
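
For illustration only, a fragment of what such a pod spec might look like; the
container names, image references, and environment variable names below are
hypothetical stand-ins, not the exact ones set by the MAO:

```yaml
# Hypothetical fragment of the "metal3" pod spec. The real container and
# environment variable names are defined by the Machine API Operator.
containers:
- name: ironic
  image: quay.io/example/ironic:latest    # placeholder image reference
  env:
  - name: PROVISIONING_IP                 # IP used on the provisioning network
    value: "172.22.0.3"
  - name: DHCP_RANGE                      # addresses handed out for PXE booting
    value: "172.22.0.10,172.22.0.100"
- name: baremetal-operator
  image: quay.io/example/baremetal-operator:latest
```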

The installer is responsible for kicking off a Baremetal IPI deployment
with the right configuration.

### User Stories

With the addition of the features described in this and the other enhancements
detailed in this directory, OpenShift can be used to bring up
a functioning cluster starting from a set of baremetal servers. As
mentioned earlier, these enhancements rely on the Baremetal Operator (BMO)
[1] running within the "metal3" pod to manage baremetal hosts. The BMO in
turn relies on the Ironic service [3] to manage and provision baremetal
servers.

1. Will enable the user to deploy a control plane with 3 master nodes.
2. Will enable the user to grow the cluster by dynamically adding worker
nodes.
3. Will enable the user to scale down the cluster by removing worker nodes
(see the scaling sketch below).
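
As a sketch of the scaling workflow (the MachineSet name here is a
hypothetical placeholder), growing or shrinking the cluster comes down to
adjusting `spec.replicas` on the worker MachineSet:

```yaml
# Hypothetical worker MachineSet; lowering replicas deprovisions a worker
# and returns its BareMetalHost to the available pool.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker-0          # placeholder MachineSet name
  namespace: openshift-machine-api
spec:
  replicas: 2                     # lowered from 3 to remove one worker node
```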

### Implementation Details/Notes/Constraints

Baremetal IPI is integrated with OpenShift through the metal3.io [8] project.
Metal3.io is a set of Kubernetes controllers that wrap the OpenStack Ironic
project to provide Kubernetes-native APIs for managing deployment and
monitoring of physical hosts.

The installer support for Baremetal IPI deployments is described in more detail
in [7]. The installer runs on a special "provisioning host" that needs to be
connected to both a "provisioning network" and an "external network". The
provisioning network is a dedicated network used just for the purpose of
configuring baremetal servers to be part of the cluster. The traffic on the
provisioning network needs to be isolated from the traffic on the external
network (hence the 2 separate networks). The external network is used to carry
cluster traffic, which includes cluster control plane traffic, application
traffic, and data traffic.
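
For illustration, a minimal `install-config.yaml` for the baremetal platform
might look like the sketch below. The field names follow the installer
documentation [7], but all values (domain, VIPs, interface name, BMC
credentials, MAC address) are placeholders:

```yaml
apiVersion: v1
baseDomain: example.com                  # placeholder domain
metadata:
  name: example-cluster                  # placeholder cluster name
platform:
  baremetal:
    apiVIP: 192.168.111.5                # VIP for the API, on the external network
    ingressVIP: 192.168.111.4            # VIP for ingress traffic
    provisioningNetworkInterface: enp1s0 # NIC attached to the provisioning network
    hosts:
    - name: master-0
      role: master
      bmc:
        address: ipmi://192.168.111.10   # BMC endpoint used by Ironic
        username: admin
        password: password
      bootMACAddress: 00:11:22:33:44:55  # MAC that PXE boots on the provisioning network
pullSecret: '...'
sshKey: '...'
```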

Control Plane Deployment

1. A minimal Baremetal IPI deployment consists of 4 hosts, one to be used
first as a provisioning host and later potentially re-purposed as a worker.
The other 3 make up the control plane. These 4 hosts need to be connected
to both the provisioning and external networks.

2. Installation can be kicked off by downloading and running
"openshift-baremetal-install". This binary differs from the "openshift-install"
binary only because libvirt always needs to be linked in for the baremetal
install. Removing the bootstrap node would remove the dependency on libvirt,
and then baremetal IPI installs can be part of the normal OpenShift installer.
This is on the roadmap for this work and is being investigated.

3. The installer starts a bootstrap VM on the provisioning host. With the other
platform types supported by OpenShift, a cloud already exists and the installer
runs the bootstrap VM on the control plane of this existing cloud. In the case
of the baremetal platform type, this cloud does not already exist, so the
installer starts the bootstrap VM using libvirt.

4. The bootstrap VM needs to be connected to the provisioning network, so the
network interface on the provisioning host that is connected to the
provisioning network needs to be provided to the installer.

5. The bootstrap VM must be configured with a special well-known IP within the
provisioning network that needs to be provided as input to the installer.

6. The installer uses Ironic in the bootstrap VM to provision each host that
makes up the control plane. The installer uses Terraform to invoke the Ironic
API, which configures each host to boot over the provisioning network using
DHCP and PXE.

7. The bootstrap VM runs a DHCP server and responds with network information
and PXE instructions when Ironic powers on a host. The host boots the Ironic
Agent image, which is hosted on the httpd instance also running on the
bootstrap VM.

8. After the Ironic Agent on the host boots and runs from its ramdisk image, it
looks for the Ironic service either using a URL passed in as a kernel command
line argument in the PXE response or by using mDNS to search for Ironic on the
local L2 network.

9. Ironic on the bootstrap VM then copies the RHCOS image hosted on the httpd
instance to the local disk of the host and also writes the necessary ignition
files so that the host can start creating the control plane when it runs the
local image.

10. After Ironic writes the image and ignition configs to the local disk of
the host, Ironic power cycles the host, causing it to reboot. The boot order
on the host is set to boot from the image on the local drive instead of PXE
booting.

11. After the control plane hosts have an OS, the normal bootstrapping process
continues with the help of the bootstrap VM. The bootstrap VM runs a temporary
API service to talk to the etcd cluster on the control plane hosts.

12. The manifests constructed by the installer are pushed into the new cluster.
The operators launched in the new cluster bring up other services and reconcile
cluster state and configuration.

13. The Machine API Operator (MAO) running on the control plane cluster detects
the platform type as being `BareMetal` and launches the "metal3" pod and the
cluster-api-provider-baremetal (CAPBM) controller. The metal3 pod runs several
Ironic services in containers in addition to the baremetal-operator (BMO).
After the control plane is completely up, the bootstrap VM is destroyed.

14. The baremetal-operator that is part of the metal3 pod starts monitoring
hosts using the Ironic service, which is also part of metal3. The
baremetal-operator uses the BareMetalHost CRD to get information about the
on-board controllers on the servers. As mentioned previously in this document,
this CRD exists for non-baremetal platform types too but does not represent
any usable information for other platforms.

Worker Deployment

Unlike the control plane deployment, the worker deployment is managed by
metal3. Not all aspects of worker deployment are implemented completely.

1. All worker nodes need to be attached to both the provisioning and external
networks and configured to PXE boot over the provisioning network. A temporary
provisioning IP address in the provisioning network is assigned to each of
these hosts.

2. The user adds hosts to the available inventory for their cluster by
creating BareMetalHost CRs, as sketched below. For more information about the
3 CRs that already exist for a host transitioning from a baremetal host to a
Node, please refer to [9].
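
A minimal sketch of such a BareMetalHost CR, together with the Secret holding
its BMC credentials; all names and addresses are placeholders, and the
authoritative schema lives in the baremetal-operator repository [1]:

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-1-bmc-secret          # placeholder name
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: admin                    # BMC credentials consumed by Ironic
  password: password
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-1                     # placeholder host name
  namespace: openshift-machine-api
spec:
  online: true                       # power the host on once enrolled
  bootMACAddress: 00:11:22:33:44:66  # MAC on the provisioning network
  bmc:
    address: ipmi://192.168.111.11   # placeholder BMC endpoint
    credentialsName: worker-1-bmc-secret
```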

3. The cluster-api-provider-baremetal (CAPBM) controller finds an
unassigned/free BareMetalHost and uses it to fulfill a Machine resource. It
then sets the configuration on the host to start provisioning with the RHCOS
image (using the RHCOS image URL present in the Machine provider spec) and the
worker ignition config for the cluster.
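
For illustration, the relevant part of such a Machine provider spec might look
like the following sketch; the field names are based on the CAPBM provider
spec at [5], and the URLs, checksum, and Secret name are placeholders:

```yaml
# Hypothetical excerpt of a Machine providerSpec for the baremetal provider.
apiVersion: baremetal.cluster.k8s.io/v1alpha1
kind: BareMetalMachineProviderSpec
image:
  # RHCOS image served over the provisioning network
  url: http://172.22.0.3/images/rhcos-latest.qcow2
  checksum: http://172.22.0.3/images/rhcos-latest.qcow2.md5sum
userData:
  name: worker-user-data   # Secret holding the worker ignition config
```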

4. The baremetal-operator uses the Ironic service to provision the worker
nodes in a process that is very similar to the provisioning of the control
plane, except for some key differences. The DHCP server now runs within the
metal3 pod instead of in the bootstrap VM.

5. The provisioning IP used to bring up worker nodes remains the same as in
the control plane case, and the provisioning network also remains the same.
The installer also provides a DHCP range within the same network from which
the workers are assigned IP addresses.

6. The ignition configs for the worker nodes are passed as user data in the
config drive. Just as with the control plane hosts, Ironic power cycles the
hosts, which then boot using the RHCOS image now on their local disk. The host
then joins the cluster as a worker.

Currently, there is no way to pass the provisioning config known to the
installer to metal3, which is responsible for provisioning the workers.

### Risks and Mitigations

Will be specified in follow-up enhancement requests mentioned above.

## Design Details

### Test Plan

True e2e and integration testing can happen only after the implementation for
enhancement [2] lands. Until then, e2e testing is being performed with the
help of some developer scripts.

Unit tests have been added to the MAO and the Installer to test the additions
made for the Baremetal IPI case.

### Graduation Criteria

Metal3 integration is in tech preview in 4.2 and is targeted for GA in 4.4.

Metal3 integration is currently missing an important piece: information on
the baremetal servers and their provisioning environment. Without this, true
end-to-end testing cannot be performed in order to graduate to GA.

### Upgrade / Downgrade Strategy

Metal3 integration is in tech preview in 4.2 and is missing key pieces that
allow a user to specify the baremetal server details and the provisioning
setup. It is not really usable in this state without the help of external
scripts that provide the above information in the form of a ConfigMap, along
the lines of the sketch below.
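
A rough sketch of such a ConfigMap; the name and keys here are hypothetical,
and enhancement [2] defines the actual mechanism:

```yaml
# Hypothetical ConfigMap supplying provisioning details to the metal3 pod;
# the real name and keys are specified by enhancement [2].
apiVersion: v1
kind: ConfigMap
metadata:
  name: metal3-config               # placeholder name
  namespace: openshift-machine-api
data:
  provisioning_interface: "enp1s0"  # NIC on the provisioning network
  provisioning_ip: "172.22.0.3/24"  # well-known IP for the provisioning services
  dhcp_range: "172.22.0.10,172.22.0.100"
  deploy_kernel_url: "http://172.22.0.3:6180/images/agent.kernel"
  deploy_ramdisk_url: "http://172.22.0.3:6180/images/agent.initramfs"
```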

In 4.4, when all the installer features land, the Metal3 integration will be
fully functional within OpenShift. For those reasons, an upgrade strategy is
not necessary at this point.

### Version Skew Strategy

This enhancement serves as a background for the rest of the enhancements. We
will discuss the version skew strategy for each enhancement individually in
their respective requests.

## Implementation History

Implementation to deploy a Metal3 cluster from the MAO was added via [4].

## Infrastructure Needed

The Baremetal IPI solution depends on the Baremetal Operator and the baremetal
Machine actuator, both of which can be found at [5].
OpenShift integration can be found here: [6].
Implementation is complete on the metal3-io side and the relevant bits have
been added to the OpenShift repos.

[1] - https://github.com/metal3-io/baremetal-operator
[2] - https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/baremetal-provisioning-config.md
[3] - https://github.com/openstack/ironic
[4] - https://github.com/openshift/machine-api-operator/commit/43dd52d5d2dfea1559504a01970df31925501e35
[5] - https://github.com/metal3-io
[6] - https://github.com/openshift-metal3
[7] - https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[8] - https://metal3.io/
[9] - https://github.com/metal3-io/metal3-docs/blob/master/design/nodes-machines-and-hosts.md
