---
title: Adding Baremetal Installer Provisioned Infrastructure (IPI) to OpenShift
authors:
  - "@sadasu"
reviewers:
  - "@smarterclayton"
  - "@abhinavdahiya"
  - "@enxebre"
  - "@deads2k"
approvers:
  - "@abhinavdahiya"
  - "@smarterclayton"
  - "@enxebre"
  - "@deads2k"
creation-date: 2019-11-06
last-updated: 2019-11-06
status: implemented
---

# Adding Baremetal IPI capabilities to OpenShift

This enhancement provides context for the many features and enhancements
that will follow to make Baremetal IPI deployments via OpenShift a reality.

At the time of this writing, code for some of these enhancements has already
merged, some is in progress, and the rest is yet to be implemented. References
to all of these features in their different stages of development are provided
below.

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift/docs]

## Summary

Baremetal IPI deployments enable OpenShift to enroll baremetal servers to become
Nodes that can run Kubernetes workloads.
The Baremetal Operator [1], along with other provisioning services (Ironic and
its dependencies), runs in its own pod called "metal3". This pod is deployed by
the Machine API Operator when the Platform type is `BareMetal`. The OpenShift
Installer is responsible for providing all the necessary configs required for
a successful deployment.

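For illustration, a minimal baremetal platform section of `install-config.yaml`
might look like the following hedged sketch. All values here are hypothetical,
and the field set shown is only a subset; the installer documentation [7] is
the authoritative reference for the schema.

```yaml
platform:
  baremetal:
    # Interface on the masters connected to the provisioning network.
    provisioningNetworkInterface: enp1s0
    # Well-known IP used by the bootstrap VM on the provisioning network.
    bootstrapProvisioningIP: 172.22.0.2
    apiVIP: 192.168.111.5
    ingressVIP: 192.168.111.4
    hosts:
      - name: master-0
        role: master
        bmc:
          # Management controller address and credentials (hypothetical).
          address: ipmi://192.168.111.101
          username: admin
          password: password
        # MAC of the NIC attached to the provisioning network.
        bootMACAddress: 52:54:00:00:00:01
```
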
## Motivation

The motivation for this enhancement request is to provide a background for all
the subsequent enhancement requests for Baremetal IPI deployments.

### Goals

The goal of this enhancement request is to provide context for all the changes
that have already been merged towards making Baremetal IPI deployments a
reality. All future Baremetal enhancement requests will refer back to this one
to provide context.

### Non-Goals

Raising development PRs as a result of this enhancement request.

## Proposal

Every OpenShift based Baremetal IPI deployment will run a "metal3" pod on
one Master Node. A "metal3" pod includes a container running the Baremetal
Operator (BMO) and several other supporting containers that work together.

The BMO and the other supporting containers together are able to discover a
baremetal server on a pre-determined provisioning network, learn the
hardware attributes of the server, and eventually boot it to make it available
as a Machine within a MachineSet.

The Machine API Operator (MAO) currently deploys the "metal3" pod only
when the Platform type is `BareMetal`, but the BareMetalHost CRD is exposed
by the MAO as part of the release payload, which is managed by the Cluster
Version Operator. The MAO is responsible for starting the BMO and the
containers running the Ironic services and for providing these containers
with their necessary configurations via env vars.

The installer is responsible for kicking off a Baremetal IPI deployment
with the right configuration.

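As a rough sketch only, the "metal3" pod can be pictured as below. Container
names, image references, and env var names here are illustrative; the real pod
spec is generated by the MAO and contains more containers and settings.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: metal3
  namespace: openshift-machine-api
spec:
  containers:
    - name: metal3-baremetal-operator
      image: baremetal-operator          # image refs are placeholders
    - name: metal3-ironic-conductor
      image: ironic
      env:
        # Provisioning details are handed to the Ironic containers as env vars.
        - name: PROVISIONING_INTERFACE
          value: "enp1s0"
        - name: DHCP_RANGE
          value: "172.22.0.10,172.22.0.100"
    - name: metal3-dnsmasq                # DHCP/PXE for worker provisioning
      image: ironic
    - name: metal3-httpd                  # serves images to hosts
      image: ironic
```
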
### User Stories

With the addition of the features described in this and the other enhancements
in this directory, OpenShift can be used to bring up a functioning cluster
starting with a set of baremetal servers. As mentioned earlier, these
enhancements rely on the Baremetal Operator (BMO) [1] running within the
"metal3" pod to manage baremetal hosts. The BMO in turn relies on the Ironic
service [3] to manage and provision baremetal servers.

These enhancements:

1. Will enable the user to deploy a control plane with 3 master nodes.
2. Will enable the user to grow the cluster by dynamically adding worker
   nodes (see the MachineSet sketch after this list).
3. Will enable the user to scale down the cluster by removing worker nodes.

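Growing and shrinking the worker pool is expected to go through the standard
Machine API. A hedged sketch, assuming a hypothetical MachineSet named
`ostest-worker-0` (selector and template omitted for brevity):

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: ostest-worker-0
  namespace: openshift-machine-api
spec:
  # Scaling this count up enrolls free BareMetalHosts as workers;
  # scaling it down releases hosts back to the available inventory.
  replicas: 3
```
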
### Implementation Details/Notes/Constraints

Baremetal IPI is integrated with OpenShift through the metal3.io [8] project.
Metal3.io is a set of Kubernetes controllers that wrap the OpenStack Ironic
project to provide Kubernetes-native APIs for managing deployment and
monitoring of physical hosts.

The installer support for Baremetal IPI deployments is described in more detail
in [7]. The installer runs on a special "provisioning host" that needs to be
connected to both a "provisioning network" and an "external network". The
provisioning network is a dedicated network used only for configuring
baremetal servers to be part of the cluster. The traffic on the provisioning
network needs to be isolated from the traffic on the external network (hence
the 2 separate networks). The external network carries cluster traffic, which
includes control plane, application, and data traffic.

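The provisioning network details are expressed in the baremetal platform
section of `install-config.yaml`. A hedged sketch (the values are hypothetical;
see [7] for the authoritative field list):

```yaml
platform:
  baremetal:
    # Dedicated provisioning network, isolated from the external network.
    provisioningNetworkCIDR: 172.22.0.0/24
    # Range on the provisioning network from which hosts get temporary
    # addresses while they are being provisioned.
    provisioningDHCPRange: 172.22.0.10,172.22.0.100
```
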
#### Control Plane Deployment

1. A minimum Baremetal IPI deployment consists of 4 hosts: one to be used
   first as a provisioning host and later potentially re-purposed as a worker,
   and the other 3 making up the control plane. These 4 hosts need to be
   connected to both the provisioning and external networks.

2. Installation can be kicked off by downloading and running
   "openshift-baremetal-install". This binary differs from the
   "openshift-install" binary only because libvirt always needs to be linked
   for the baremetal install. Removing the bootstrap node would remove the
   dependency on libvirt, and then baremetal IPI installs could be part of the
   normal OpenShift installer. This is on the roadmap for this work and is
   being investigated.

3. The installer starts a bootstrap VM on the provisioning host. With other
   platform types supported by OpenShift, a cloud already exists and the
   installer runs the bootstrap VM on the control plane of this existing cloud.
   In the case of the baremetal platform type, this cloud does not already
   exist, so the installer starts the bootstrap VM using libvirt.

4. The bootstrap VM needs to be connected to the provisioning network, so the
   network interface on the provisioning host that is connected to the
   provisioning network needs to be provided to the installer.

5. The bootstrap VM must be configured with a special well-known IP within the
   provisioning network that needs to be provided as input to the installer.

6. The installer uses Ironic in the bootstrap VM to provision each host that
   makes up the control plane. The installer uses terraform to invoke the
   Ironic API, which configures each host to boot over the provisioning network
   using DHCP and PXE.

7. The bootstrap VM runs a DHCP server and responds with network information
   and PXE instructions when Ironic powers on a host. The host boots the Ironic
   Agent image, which is hosted on the httpd instance also running on the
   bootstrap VM.

8. After the Ironic Agent on the host boots and runs from its ramdisk image, it
   looks for the Ironic service either using a URL passed in as a kernel
   command line argument in the PXE response or by using mDNS to search for
   Ironic on the local L2 network.

9. Ironic on the bootstrap VM then copies the RHCOS image hosted on the httpd
   instance to the local disk of the host and also writes the necessary
   ignition files so that the host can start creating the control plane when it
   runs the local image.

10. After Ironic writes the image and ignition configs to the local disk of the
    host, Ironic power cycles the host, causing it to reboot. The boot order on
    the host is set to boot from the image on the local drive instead of PXE
    booting.

11. After the control plane hosts have an OS, the normal bootstrapping process
    continues with the help of the bootstrap VM. The bootstrap VM runs a
    temporary API service to talk to the etcd cluster on the control plane
    hosts.

12. The manifests constructed by the installer are pushed into the new cluster.
    The operators launched in the new cluster bring up other services and
    reconcile cluster state and configuration.

13. The Machine API Operator (MAO) running on the control plane cluster detects
    the platform type as being "baremetal" and launches the "metal3" pod and
    the cluster-api-provider-baremetal (CAPBM) controller. The metal3 pod runs
    several Ironic services in containers in addition to the baremetal-operator
    (BMO). After the control plane is completely up, the bootstrap VM is
    destroyed.

14. The baremetal-operator that is part of the metal3 service starts monitoring
    hosts using the Ironic service, which is also part of metal3. The
    baremetal-operator uses the BareMetalHost CRD to get information about the
    on-board controllers on the servers. As mentioned previously in this
    document, this CRD exists on non-baremetal platform types too, but does not
    represent any usable information for other platforms.

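The inspection results gathered through Ironic surface in the BareMetalHost
status. A hedged sketch of the hardware section (all values are hypothetical;
the CRD in [1] is the authoritative schema):

```yaml
status:
  hardware:
    cpu:
      count: 40
    ramMebibytes: 131072
    nics:
      - name: eth0
        mac: 52:54:00:00:00:10
        ip: 172.22.0.21
    storage:
      - name: /dev/sda
        sizeBytes: 480103981056
```
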
#### Worker Deployment

Unlike the control plane deployment, the worker deployment is managed by
metal3. Not all aspects of worker deployment are implemented completely.

1. All worker nodes need to be attached to both the provisioning and external
   networks and configured to PXE boot over the provisioning network. A
   temporary provisioning IP address in the provisioning network is assigned to
   each of these hosts.

2. The user adds hosts to the available inventory for their cluster by creating
   BareMetalHost CRs, as sketched below. For more information about the 3 CRs
   that already exist for a host transitioning from a baremetal host to a Node,
   please refer to [9].

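A hedged sketch of such a BareMetalHost CR (the name, addresses, and credential
secret are hypothetical; the CRD is defined in [1]):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  online: true
  # MAC of the NIC attached to the provisioning network, used for PXE booting.
  bootMACAddress: 52:54:00:00:00:20
  bmc:
    # Address of the host's on-board management controller, plus a reference
    # to a Secret holding its username and password.
    address: ipmi://192.168.111.201
    credentialsName: worker-0-bmc-secret
```
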
3. The cluster-api-provider-baremetal (CAPBM) controller finds an
   unassigned/free BareMetalHost and uses it to fulfill a Machine resource. It
   then sets the configuration on the host to start provisioning with the RHCOS
   image (using the RHCOS image URL present in the Machine provider spec) and
   the worker ignition config for the cluster.

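The relevant part of a worker Machine's provider spec might look like the
following hedged sketch (the URLs and secret name are hypothetical; the CAPBM
API in [5] has the authoritative fields):

```yaml
providerSpec:
  value:
    apiVersion: baremetal.cluster.k8s.io/v1alpha1
    kind: BareMetalMachineProviderSpec
    image:
      # RHCOS image that Ironic writes to the host's local disk.
      url: http://172.22.0.3:6180/images/rhcos-ootpa-latest.qcow2
      checksum: http://172.22.0.3:6180/images/rhcos-ootpa-latest.qcow2.md5sum
    userData:
      # Secret holding the worker ignition config, passed as user data.
      name: worker-user-data
```
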
4. The baremetal-operator uses the Ironic service to provision the worker nodes
   in a process that is very similar to the provisioning of the control plane,
   except for some key differences. The DHCP server is now running within the
   metal3 pod instead of in the bootstrap VM.

5. The provisioning IP used to bring up worker nodes remains the same as in the
   control plane case, and the provisioning network also remains the same. The
   installer also provides a DHCP range within the same network from which the
   workers are assigned IP addresses.

6. The ignition configs for the worker nodes are passed as user data in the
   config drive. Just as with the control plane hosts, Ironic power cycles the
   hosts, which boot using the RHCOS image now on their local disk. The host
   then joins the cluster as a worker.

Currently, there is no way to pass the provisioning config known to the
installer to metal3, which is responsible for provisioning the workers.

### Risks and Mitigations

Risks and mitigations will be specified in the follow-up enhancement requests
mentioned above.

## Design Details

### Test Plan

True e2e and integration testing can happen only after the implementation for
enhancement [2] lands. Until then, e2e testing is being performed with the
help of developer scripts.

Unit tests have been added to the MAO and the Installer to cover the additions
made for the Baremetal IPI case.

### Graduation Criteria

Metal3 integration is in tech preview in 4.2 and is targeted for GA in 4.4.

Metal3 integration is currently missing an important piece of information about
the baremetal servers and their provisioning environment. Without this, true
end-to-end testing cannot be performed in order to graduate to GA.

### Upgrade / Downgrade Strategy

Metal3 integration is in tech preview in 4.2 and is missing key pieces that
allow a user to specify the baremetal server details and their provisioning
setup. It is not usable in this state without the help of external scripts
that provide the above information in the form of a ConfigMap.

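As a hedged illustration of what those external scripts supply, the ConfigMap
might look like the following. The name, keys, and values here are
hypothetical, chosen to mirror the provisioning inputs discussed above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metal3-config
  namespace: openshift-machine-api
data:
  # Interface, IP, and DHCP range on the provisioning network.
  provisioning_interface: "enp1s0"
  provisioning_ip: "172.22.0.3/24"
  dhcp_range: "172.22.0.10,172.22.0.100"
  # RHCOS image that Ironic writes to each host's local disk.
  rhcos_image_url: "http://172.22.0.1/images/rhcos-ootpa-latest.qcow2"
```
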
In 4.4, when all the installer features land, the Metal3 integration will be
fully functional within OpenShift. For those reasons, an upgrade strategy is
not necessary at this point.

### Version Skew Strategy

This enhancement serves as a background for the rest of the enhancements. We
will discuss the version skew strategy for each enhancement individually in
their respective requests.

## Implementation History

The implementation to deploy a Metal3 cluster from the MAO was added via [4].

## Infrastructure Needed

The Baremetal IPI solution depends on the Baremetal Operator and the baremetal
Machine actuator, both of which can be found at [5].
The OpenShift integration can be found at [6].
Implementation is complete on the metal3-io side and the relevant bits have
been added to the OpenShift repos.

[1] - https://github.com/metal3-io/baremetal-operator
[2] - https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/baremetal-provisioning-config.md
[3] - https://github.com/openstack/ironic
[4] - https://github.com/openshift/machine-api-operator/commit/43dd52d5d2dfea1559504a01970df31925501e35
[5] - https://github.com/metal3-io
[6] - https://github.com/openshift-metal3
[7] - https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[8] - https://metal3.io/
[9] - https://github.com/metal3-io/metal3-docs/blob/master/design/nodes-machines-and-hosts.md
