---
title: Adding Baremetal Installer Provisioned Infrastructure (IPI) to OpenShift
authors:
  - "@sadasu"
reviewers:
  - "@smarterclayton"
  - "@abhinavdahiya"
  - "@enxebre"
  - "@deads2k"
approvers:
  - "@abhinavdahiya"
  - "@smarterclayton"
  - "@enxebre"
  - "@deads2k"
creation-date: 2019-11-06
last-updated: 2019-11-06
status: implemented
---

# Adding Baremetal IPI capabilities to OpenShift

This enhancement serves to provide context for the whole slew of features
and enhancements that will follow to make Baremetal IPI deployments via
OpenShift a reality.

At the time of this writing, code for some of these enhancements has already
merged, some is in progress, and other work is yet to be implemented. References
to all of these features in their different stages of development are provided
below.

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [x] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift/docs]

## Summary

Baremetal IPI deployments enable OpenShift to enroll baremetal servers to become
Nodes that can run Kubernetes workloads.
The Baremetal Operator [1], along with other provisioning services (Ironic and
its dependencies), runs in its own pod called "metal3". This pod is deployed by the
Machine API Operator when the Platform type is `BareMetal`. The OpenShift
Installer is responsible for providing all the necessary configs required for
a successful deployment.

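As a purely illustrative sketch of that last point, an `install-config.yaml` for
this platform might look roughly like the following; the exact field names under
`platform.baremetal` were still evolving at the time of writing, so treat them as
assumptions rather than the final interface:

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  baremetal:                               # selects the `BareMetal` platform type
    provisioningNetworkInterface: enp1s0   # NIC attached to the provisioning network
    bootstrapProvisioningIP: 172.22.0.2    # well-known IP for the bootstrap VM
    hosts:                                 # inventory of baremetal servers
      - name: master-0
        role: master
        bootMACAddress: 52:54:00:aa:bb:01  # NIC that PXE boots on the provisioning network
        bmc:
          address: ipmi://192.168.111.1    # on-board management controller address
          username: admin
          password: password
pullSecret: '...'
sshKey: '...'
```

The `hosts` entries carry the BMC details the installer needs to drive Ironic,
and the provisioning fields describe the dedicated provisioning network discussed
under Implementation Details below.
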
## Motivation

The motivation for this enhancement request is to provide a background for all
the subsequent enhancement requests for Baremetal IPI deployments.

### Goals

The goal of this enhancement request is to provide context for all the changes
that have already been merged towards making Baremetal IPI deployments a reality.
All future Baremetal enhancement requests will refer back to this one to provide
context.

### Non-Goals

Raising development PRs as a result of this enhancement request.

## Proposal

Every OpenShift-based Baremetal IPI deployment will run a "metal3" pod on
one Master Node. A "metal3" pod includes a container running the BareMetal
Operator (BMO) and several other supporting containers that work together.

The BMO and the other supporting containers together are able to discover a
baremetal server on a pre-determined provisioning network, learn the
hardware attributes of the server, and eventually boot it to make it available
as a Machine within a MachineSet.

The Machine API Operator (MAO) currently deploys the "metal3" pod only
when the Platform type is `BareMetal`, but the BareMetalHost CRD is exposed
by the MAO as part of the release payload, which is managed by the Cluster
Version Operator. The MAO is responsible for starting the BMO and the
containers running the Ironic services, and for providing these containers
with their necessary configurations via env vars.

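To make that env-var hand-off concrete, here is a minimal sketch of how the
metal3 pod spec might wire configuration into one of the Ironic containers; the
image reference and variable names are illustrative assumptions, not the exact
set the MAO uses:

```yaml
containers:
  - name: metal3-ironic
    image: quay.io/metal3-io/ironic        # illustrative image reference
    env:
      - name: PROVISIONING_INTERFACE       # NIC on the provisioning network
        value: enp1s0
      - name: PROVISIONING_IP              # address the provisioning services listen on
        value: 172.22.0.3/24
      - name: DHCP_RANGE                   # pool handed out to PXE-booting hosts
        value: 172.22.0.10,172.22.0.100
```
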
The installer is responsible for kicking off a Baremetal IPI deployment
with the right configuration.

### User Stories

With the addition of the features described in this and the other enhancements
in this directory, OpenShift can be used to bring up a functioning cluster
starting with a set of baremetal servers. As mentioned earlier, these
enhancements rely on the Baremetal Operator (BMO) [1] running within the
"metal3" pod to manage baremetal hosts. The BMO in turn relies on the Ironic
service [3] to manage and provision baremetal servers.

1. Will enable the user to deploy a control plane with 3 master nodes.
2. Will enable the user to grow the cluster by dynamically adding worker
nodes (see the sketch after this list).
3. Will enable the user to scale down the cluster by removing worker nodes.

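As a backdrop for stories 2 and 3, growing or shrinking the cluster maps to
changing the replica count of a worker MachineSet. A minimal sketch, with
illustrative names and the per-Machine template elided:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-worker
  namespace: openshift-machine-api
spec:
  replicas: 3              # raise to add workers, lower to remove them
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: mycluster-worker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: mycluster-worker
    spec: {}               # Machine template elided; see the Machine sketch under Worker Deployment
```
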
### Implementation Details/Notes/Constraints

Baremetal IPI is integrated with OpenShift through the metal3.io [8] project.
Metal3.io is a set of Kubernetes controllers that wrap the OpenStack Ironic
project to provide Kubernetes-native APIs for managing the deployment and
monitoring of physical hosts.

The installer support for Baremetal IPI deployments is described in more detail
in [7]. The installer runs on a special "provisioning host" that needs to be
connected to both a "provisioning network" and an "external network". The
provisioning network is a dedicated network used just for the purpose of
configuring baremetal servers to be part of the cluster. The traffic on the
provisioning network needs to be isolated from the traffic on the external
network (hence the two separate networks). The external network carries
cluster traffic, which includes cluster control plane traffic as well as
application and data traffic.

#### Control Plane Deployment

1. A minimum Baremetal IPI deployment consists of 4 hosts: one to be used
first as a provisioning host and later potentially re-purposed as a worker,
while the other 3 make up the control plane. These 4 hosts need to be connected
to both the provisioning and external networks.

2. Installation can be kicked off by downloading and running
"openshift-baremetal-install". This binary differs from the "openshift-install"
binary only in that libvirt always needs to be linked for the baremetal
install.

3. The installer starts a bootstrap VM on the provisioning host. With other
platform types supported by OpenShift, a cloud already exists and the installer
runs the bootstrap VM on the control plane of this existing cloud. In the case
of the baremetal platform type, this cloud does not already exist, so the
installer starts the bootstrap VM using libvirt.

4. The bootstrap VM needs to be connected to the provisioning network, so the
network interface on the provisioning host that is connected to the
provisioning network needs to be provided to the installer.

5. The bootstrap VM must be configured with a special well-known IP within the
provisioning network, which needs to be provided as input to the installer.

6. The installer uses Ironic in the bootstrap VM to provision each host that
makes up the control plane. The installer uses terraform to invoke the Ironic API,
which configures each host to boot over the provisioning network using DHCP
and PXE.

7. The bootstrap VM runs a DHCP server and responds with network information and
PXE instructions when Ironic powers on a host. The host boots the Ironic Agent
image, which is hosted on the httpd instance also running on the bootstrap VM.

8. After the Ironic Agent on the host boots and runs from its ramdisk image, it
looks for the Ironic service either using a URL passed in as a kernel command line
argument in the PXE response or by using mDNS to search for Ironic in the local L2
network.

9. Ironic on the bootstrap VM then copies the RHCOS image hosted on the httpd
instance to the local disk of the host and also writes the necessary ignition files
so that the host can start creating the control plane when it runs the local image.

10. After Ironic writes the image and ignition configs to the local disk of the host,
Ironic power cycles the host, causing it to reboot. The boot order on the host is set
to boot from the image on the local drive instead of PXE booting.

11. After the control plane hosts have an OS, the normal bootstrapping process continues
with the help of the bootstrap VM. The bootstrap VM runs a temporary API service to talk
to the etcd cluster on the control plane hosts.

12. The manifests constructed by the installer are pushed into the new cluster. The
operators launched in the new cluster bring up other services and reconcile cluster
state and configuration.

13. The Machine API Operator (MAO) running on the control plane cluster detects the
platform type as being "baremetal" and launches the "metal3" pod and the cluster-api-
provider-baremetal (CAPBM) controller. The metal3 pod runs several Ironic services in
containers in addition to the baremetal-operator (BMO). After the control plane is
completely up, the bootstrap VM is destroyed.

14. The baremetal-operator that is part of the metal3 service starts monitoring hosts
using the Ironic service, which is also part of metal3. The baremetal-operator uses the
BareMetalHost CRD to get information about the on-board controllers on the servers (a
sketch of such a resource follows this list). As mentioned previously in this document,
this CRD exists for non-baremetal platform types too, but does not represent any usable
information for other platforms.

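The BareMetalHost resource referenced in step 14 (and created by the user in step
2 of the worker deployment below) looks roughly like the following sketch; the
host details and credential values are placeholders:

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 52:54:00:aa:bb:11       # NIC that PXE boots on the provisioning network
  bmc:
    address: ipmi://192.168.111.11        # on-board controller reachable by Ironic
    credentialsName: worker-0-bmc-secret  # Secret holding the BMC credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-bmc-secret
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: admin
  password: password
```
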
#### Worker Deployment

Unlike the control plane deployment, the worker deployment is managed by metal3. Not
all aspects of worker deployment are implemented completely.

1. All worker nodes need to be attached to both the provisioning and external networks
and configured to PXE boot over the provisioning network. A temporary provisioning IP
address in the provisioning network is assigned to each of these hosts.

2. The user adds hosts to the available inventory for their cluster by creating
BareMetalHost CRs, like the sketch shown after the control plane steps above. For more
information about the 3 CRs that already exist for a host transitioning from a baremetal
host to a Node, please refer to [9].

3. The cluster-api-provider-baremetal (CAPBM) controller finds an unassigned/free
BareMetalHost and uses it to fulfill a Machine resource. It then sets the configuration
on the host to start provisioning with the RHCOS image (using the RHCOS image URL present
in the Machine provider spec) and the worker ignition config for the cluster. A sketch
of such a Machine follows this list.

4. The Baremetal Operator uses the Ironic service to provision the worker nodes in a
process that is very similar to the provisioning of the control plane, except for
some key differences. The DHCP server is now running within the metal3 pod instead
of in the bootstrap VM.

5. The provisioning IP used to bring up worker nodes remains the same as in the control
plane case, and the provisioning network also remains the same. The installer also
provides a DHCP range within the same network from which the workers are assigned IP
addresses.

6. The ignition configs for the worker nodes are passed as user data in the config
drive. Just as with the control plane hosts, Ironic power cycles the hosts, which then
boot using the RHCOS image now on their local disk. The host then joins the cluster as a
worker.

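To illustrate step 3, here is a sketch of the kind of Machine the CAPBM controller
fulfills with a free BareMetalHost; the field names under `providerSpec` are
assumptions for illustration only:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: mycluster-worker-0
  namespace: openshift-machine-api
spec:
  providerSpec:
    value:
      image:
        url: http://172.22.0.3/images/rhcos.qcow2             # RHCOS image written to the host's disk
        checksum: http://172.22.0.3/images/rhcos.qcow2.md5sum
      userData:
        name: worker-user-data                                # Secret carrying the worker ignition config
```
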
Currently, there is no way to pass the provisioning config known to the
installer to metal3, which is responsible for provisioning the workers.

### Risks and Mitigations

These will be specified in the follow-up enhancement requests mentioned above.

## Design Details

### Test Plan

True e2e and integration testing can happen only after the implementation for
enhancement [2] lands. Until then, e2e testing is being performed with the
help of some developer scripts.

Unit tests have been added to the MAO and the Installer to test the additions
made for the Baremetal IPI case.

### Graduation Criteria

Metal3 integration is in tech preview in 4.2 and is targeted for GA in 4.4.

Metal3 integration is currently missing an important piece: a way to pass
information about the baremetal servers and their provisioning environment.
Without this, true end-to-end testing cannot be performed in order to graduate
to GA.

### Upgrade / Downgrade Strategy

Metal3 integration is in tech preview in 4.2 and is missing key pieces that allow
a user to specify the baremetal server details and their provisioning setup. It
is really not usable in this state without the help of external scripts that
provide the above information in the form of a ConfigMap.

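For concreteness, the ConfigMap those external scripts supply looks roughly like
the following; the key names here are an assumption modeled on the developer
scripts, not a stable interface:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metal3-config
  namespace: openshift-machine-api
data:
  provisioning_interface: enp1s0         # NIC on the provisioning network
  provisioning_ip: 172.22.0.3/24         # address for the metal3 provisioning services
  dhcp_range: 172.22.0.10,172.22.0.100   # pool for PXE-booting worker hosts
  rhcos_image_url: http://172.22.0.1/images/rhcos.qcow2
```
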
In 4.4, when all the installer features land, the Metal3 integration will be
fully functional within OpenShift. For those reasons, an upgrade strategy is
not necessary at this point.

### Version Skew Strategy

This enhancement serves as a background for the rest of the enhancements. We will
discuss the version skew strategy for each enhancement individually in their
respective requests.

## Implementation History

Implementation to deploy a Metal3 cluster from the MAO was added via [4].

## Infrastructure Needed

The Baremetal IPI solution depends on the Baremetal Operator and the baremetal
Machine actuator, both of which can be found at [5].
OpenShift integration can be found here: [6].
Implementation is complete on the metal3-io side and the relevant bits have been
added to the OpenShift repo.

[1] - https://github.com/metal3-io/baremetal-operator
[2] - https://github.com/openshift/enhancements/pull/90
[3] - https://github.com/openstack/ironic
[4] - https://github.com/openshift/machine-api-operator/commit/43dd52d5d2dfea1559504a01970df31925501e35
[5] - https://github.com/metal3-io
[6] - https://github.com/openshift-metal3
[7] - https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[8] - https://metal3.io/
[9] - https://github.com/metal3-io/metal3-docs/blob/master/design/nodes-machines-and-hosts.md
