Commit 987cbb8

Amendments & improvements

1 parent 881ce59 commit 987cbb8

6 files changed: +48, -206 lines changed

docs/CAPI-mgmt/01-prerequisites.md

Lines changed: 10 additions & 6 deletions
````diff
@@ -23,28 +23,32 @@ which directly interact with the OpenStack APIs.
 
 This management cluster has two main requirements in order to operate:
 
+<!-- markdownlint-disable MD007 -->
 - Firstly, it must be capable of reaching the public OpenStack APIs.
 - Secondly, the management cluster must be reachable from the control
   plane nodes on which the Magnum containers are running.
-    - This is so that the Magnum containers may reach the management
-      cluster’s IP listed in the `kubeconfig`.
+    - This is so that the Magnum conductor(s) may reach the management
+      cluster’s API server address listed in the `kubeconfig`.
+<!-- markdownlint-enable MD007 -->
 
 ### OpenStack project quotas
 
 A standard high-availability (HA) deployment with a seed node, 3 control plane nodes and
 3 worker nodes, requires the following resources:
 
+<!-- markdownlint-disable MD007 -->
 - 1 x network, 1 x subnet, 1 x router
 - 1 x seed node (4 vCPU, 8 GB)
 - 4 x control plane nodes (4 vCPU, 8 GB)
-    - 3 x during normal operation, 4 x during rolling upgrade
+    - 3 x during normal operation, 4 x during rolling upgrade
 - 4 x worker nodes (8 vCPU, 16 GB)
-    - 3 x during normal operation, 4 x during rolling upgrade
+    - 3 x during normal operation, 4 x during rolling upgrade
 - 3 x load-balancers
 - 500GB Cinder storage
 - 2 x floating IPs
-    - One for accessing the seed node
-    - One for the ingress controller for accessing HTTP services
+    - One for accessing the seed node
+    - One for the ingress controller for accessing HTTP services
+<!-- markdownlint-enable MD007 -->
 
 <!-- prettier-ignore-start -->
 !!! tip
````
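The quota requirements listed above can be checked against the target project before deploying. A minimal sketch using the standard OpenStack CLI and `kubectl`; the flavor filter and the kubeconfig path are illustrative assumptions, not part of the documented procedure:

```sh
# Absolute limits (cores, RAM, instances, floating IPs, ...) for the target project
openstack limits show --absolute

# Per-project quotas, for comparison against the HA resource list above
openstack quota show

# Flavors large enough for the control plane and worker nodes (>= 8 GB RAM)
openstack flavor list --min-ram 8192

# From a control plane node running the Magnum conductor, confirm the management
# cluster's API server listed in the kubeconfig is reachable (path is illustrative)
kubectl --kubeconfig /etc/magnum/capi-mgmt-kubeconfig get --raw /healthz
```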

docs/CAPI-mgmt/02-kubernetes-config.md

Lines changed: 16 additions & 76 deletions
````diff
@@ -1,95 +1,35 @@
 # Kubernetes configuration
 
-The concepts in this section apply to any Kubernetes clusters created using Cluster API,
-i.e. the HA cluster in a HA deployment and tenant clusters.
+The concepts in this section apply to the Cluster API management cluster, and not to
+tenant clusters; configuration concerning tenant clusters is set via cluster labels.
+
+<!-- prettier-ignore-start -->
+!!! note "Tenant cluster labels"
+    For an outline of the tenant cluster configuration and its variables, please visit the example
+    file [here](https://github.com/azimuth-cloud/azimuth-config/blob/2025.10.0/environments/capi-mgmt-example/inventory/group_vars/all/variables.yml).
+<!-- prettier-ignore-end -->
 
 The variables used to configure HA deployments are the same as those for Azimuth and so
 only a surface level of detail will be covered below. For further details visit the
 [Azimuth Kubernetes configuration documentation](../configuration/03-kubernetes-config.md).
 
-## Cluster configuration
-
-The shape and form which the cluster can take is possible to customise through defining
-variables which can control the image, Kubernetes version, node flavors, and cluster scaling.
-
-Below is the list of variables that can be used to customise a CAPI management cluster
-deployment. These variables have defaults which are described above each of the variables.
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-#### Configuration for the HA cluster ####
-
-# The ID of the image that will be used for the nodes of the HA cluster.
-# By default, a suitable image is uploaded to the target project.
-capi_cluster_machine_image_id: "<image id>"
-
-# The Kubernetes version that will be used for the HA cluster.
-# This should match the image specified above.
-capi_cluster_kubernetes_version: 1.31.10
-
-# The name of the flavor to use for control plane nodes.
-# At least 2 vCPUs and 8GB RAM is required.
-# By default, the first flavor matching these requirements will be used.
-capi_cluster_control_plane_flavor: "<flavor name>"
-
-# The name of the flavor to use for worker nodes.
-# At least 2 vCPUs and 8GB RAM is required.
-# By default, the first flavor matching these requirements will be used.
-capi_cluster_worker_flavor: "<flavor name>"
-
-# The number of worker nodes to deploy in the cluster.
-# Defaults to 3.
-capi_cluster_worker_count: 3
-```
-
-<!-- prettier-ignore-start -->
-!!! tip
-    Ensure that the Kubernetes version selected corresponds to an available
-    image built and tested for that version. See the [Images](#images) section
-    below for further details.
-
-!!! note
-    The specified flavors must exist in the target OpenStack project and have
-    enough sufficient resources to host the desired number of nodes.
-<!-- prettier-ignore-end -->
-
 ## Images
 
-The clusters deployed by the Cluster API (CAPI) driver make use of the Ubuntu Kubernetes images
-built from the [azimuth-images repository](https://github.com/azimuth-cloud/azimuth-images), alongside
-[capi-helm-charts](https://github.com/azimuth-cloud/capi-helm-charts) in order to provide the Helm charts
-which define these clusters based on the image.
+The clusters deployed by the Cluster API (CAPI) driver will, of course, require
+access to an Ubuntu Kubernetes image, as well as a cluster template.
 
-These two repositories have CI jobs regularly building and testing the images and Helm charts
-for the latest Kubernetes versions. Therefore, it is important to update the cluster templates
-on each cloud regularly.
-
-<!-- prettier-ignore-start -->
-!!! note
-    These templates are tested as sets against specific CAPI management cluster versions. As such,
-    it is vitally important to update the CAPI management cluster to the latest release before
-    updating to the latest templates.
-
-!!! note
-    Information on community images and how they are built can be found [here](../configuration/09-community-images.md).
-<!-- prettier-ignore-end -->
-
-If required, it is possible to reference an image's IDs using the `community_images_image_ids`
-variable. This, for example, could be used to create [custom Kubernetes templates](./10-kubernetes-clusters.md#custom-cluster-templates).
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-kube_1_25_image_id: "{{ community_images_image_ids.kube_1_25 }}"
-kube_1_26_image_id: "{{ community_images_image_ids.kube_1_26 }}"
-kube_1_27_image_id: "{{ community_images_image_ids.kube_1_27 }}"
-```
+The way these user-facing images are managed differs from that of
+[Azimuth](../configuration/03-kubernetes-config.md#images); instead, the images
+and Magnum cluster templates are managed by tools found in the openstack-config
+[repository](https://github.com/stackhpc/openstack-config#magnum-cluster-templates).
 
 ## Docker Hub rate limits
 
 <!-- prettier-ignore-start -->
 !!! warning
     Docker Hub [imposes rate limits](https://docs.docker.com/docker-hub/download-rate-limit/)
-    on image downloads, which can cause issues for both the HA cluster and, in particular,
-    tenant clusters. This can be worked around by mirroring the images to a local registry.
-
+    on image downloads, which may cause issues for both the HA cluster and, in particular,
+    tenant clusters.
 
 !!! warning
     For more information please see [here](../configuration/03-kubernetes-config.md#docker-hub-mirror).
````
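Because tenant cluster behaviour is now controlled through Magnum cluster labels rather than azimuth-config variables, label overrides are applied when templates or clusters are created. A hedged sketch using the standard `openstack coe` commands; the template name and label key/value are placeholders:

```sh
# List the Magnum cluster templates managed via openstack-config
openstack coe cluster template list

# Inspect the labels baked into a template (template name is a placeholder)
openstack coe cluster template show kube-1-31 -c labels

# Create a cluster, merging an extra label on top of the template defaults
# (the label key/value here are purely illustrative)
openstack coe cluster create my-cluster \
  --cluster-template kube-1-31 \
  --merge-labels \
  --labels my_label=my_value
```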

docs/CAPI-mgmt/03-monitoring.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -16,8 +16,10 @@ Apart from aforementioned monitoring services, there are also log aggregate serv
 
 ## Accessing web interfaces
 
-The monitoring and alerting web dashboards are exposed via the ingress controller IP address or even as
-subdomains under the `ingress_base_domain`, which if configured are:
+The monitoring and alerting web dashboards are currently exposed via this
+port-forwarding [script](https://github.com/azimuth-cloud/azimuth-config/blob/devel/bin/port-forward).
+Once run, the various services will be available on the CAPI management cluster's floating
+IP under the service subdomains. The following services are exposed:
 
 - `grafana` for the Grafana dashboards
 - `prometheus` for the Prometheus web interface
````
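A typical session might therefore look like the sketch below; it assumes an activated azimuth-config environment and that the script takes a service name and a local port. Check the script itself for its exact arguments, as this invocation is illustrative only:

```sh
# From the root of the azimuth-config checkout, activate the site environment
source ./bin/activate my-site

# Forward the Grafana dashboard to a local port (arguments are assumed)
./bin/port-forward grafana 3000

# Then browse to the forwarded service, e.g. http://localhost:3000
```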

docs/CAPI-mgmt/04-disaster-recovery.md

Lines changed: 11 additions & 120 deletions
````diff
@@ -9,127 +9,18 @@ store and has a plugin-based system to enable snapshotting of a cluster's persis
 Backup and restore is only available for production-grade HA installations of clusters.
 <!-- prettier-ignore-end -->
 
-The playbooks install Velero on the HA management cluster and the Velero command-line-tool on the seed node. Once configured with the appropriate credentials, the installation process will create a [Schedule](https://velero.io/docs/latest/api-types/schedule/) on the HA cluster, which triggers a daily backup at midnight and cleans up backups older which are more than 1 week old.
-
-The
-[AWS Velero plugin](https://github.com/vmware-tanzu/velero-plugin-for-aws) is used for S3 support
-and the
-[CSI plugin](https://github.com/vmware-tanzu/velero-plugin-for-csi) for volume snapshots.
-The CSI plugin uses Kubernetes generic support for
-[Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/), which is
-implemented for OpenStack by the
-[Cinder CSI plugin](https://github.com/kubernetes/cloud-provider-openstack).
-
-## Configuration
-
-To enable backup and restore functionality, the following variables must be set in your environment:
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-# Enable Velero
-velero_enabled: true
-
-# The URL of the S3 storage endpoint
-velero_s3_url: <object-store-endpoint-url>
-
-# The name of the bucket to use for backups
-velero_bucket_name: <bucket-name>
-```
-
-<!-- prettier-ignore-start -->
-!!! warning "Bucket must already exist"
-    The specified bucket must already exist - neither azimuth-ops nor Velero will create it.
-<!-- prettier-ignore-end -->
-
-You will also need to consult the documentation for your S3 provider to obtain S3 credentials for
-the bucket, and add the access key ID and secret to the following variables:
-
-```yaml title="environments/my-site/inventory/group_vars/all/secrets.yml"
-# Access key ID and secret for accessing the S3 bucket
-velero_aws_access_key_id: <s3-access-key-id>
-velero_aws_secret_access_key: <s3-secret-value>
-```
-
-<!-- prettier-ignore-start -->
-!!! tip "Generating credentials for Keystone-integrated Ceph Object Gateway"
-    If the S3 target is Ceph Object Gateway integrated with Keystone, a common configuration with OpenStack clouds, S3 credentials can be generated using the following:
-    ```sh
-    openstack ec2 credentials create
-    ```
-    See [Ceph Object Gateway integrated with Keystone](https://docs.ceph.com/en/latest/radosgw/keystone/).
-
-!!! danger
-    The S3 credentials should be kept secret. If you want to keep them in Git - which is recommended - then they must be encrypted.
-    See [secrets](../repository/secrets.md) for instructions on how to do this.
-<!-- prettier-ignore-end -->
-
-## Velero CLI
-
-The Velero installation process also installs the Velero CLI on the seed node, which can be
-used to inspect the state of the backups:
-
-```sh title="On the seed node, with the kubeconfig for the HA cluster exported"
-# List the configured backup locations
-velero backup-location get
-
-# List the backups and their statuses
-velero backup get
-```
-
-See `velero -h` for other useful commands.
-
-## Restoring from a backup
-
-To restore from a backup, you must first know the name of the target backup. This can be inferred
-from the object names in S3 if the Velero CLI is no longer available.
-
-Once you have the name of the backup to restore, run the following command with your environment
-activated (similar to a provision):
-
-```bash
-ansible-playbook azimuth_cloud.azimuth_ops.restore \
-    -e velero_restore_backup_name=<backup name>
-```
-
-This will provision a new HA cluster, restore the backup onto it and then bring the installation
-up-to-date with your configuration.
-
-## Performing ad-hoc backups
-
-In order to perform ad-hoc backups using the same config parameters as the installed backup schedule,
-run the following Velero CLI command from the seed node:
-
-```sh title="On the seed node, with the kubeconfig for the HA cluster exported"
-velero backup create --from-schedule default
-```
-
-This will begin the backup process in the background. The status of this backup (and others) can be
-viewed with the `velero backup get` command shown above.
-
-<!-- prettier-ignore-start -->
-!!! tip
-    Ad-hoc backups will have the same time-to-live as the configured schedule backups (default = 7 days).
-    To change this, pass the `--ttl <hours>` option to the `velero backup create` command.
-<!-- prettier-ignore-end -->
-
-## Modifying the backup schedule
-
-The following config options are available for modifying the regular backup schedule:
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-# Whether or not to perform scheduled backups
-velero_backup_schedule_enabled: true
-# Name for backup schedule kubernetes resource
-velero_backup_schedule_name: default
-# Schedule to use for backups (defaults to every day at midnight)
-# See https://en.wikipedia.org/wiki/Cron for format options
-velero_backup_schedule: "0 0 * * *"
-# Time-to-live for existing backups (defaults to 1 week)
-# See https://pkg.go.dev/time#ParseDuration for duration format options
-velero_backup_ttl: "168h"
-```
+The playbooks install Velero on the HA management cluster and the Velero command-line-tool on the seed node.
+Once configured with the appropriate credentials, the installation process will create a
+[Schedule](https://velero.io/docs/latest/api-types/schedule/) on the HA cluster, which triggers a daily
+backup at midnight and cleans up backups which are more than 1 week old.
 
 <!-- prettier-ignore-start -->
 !!! note
-    Setting `velero_backup_schedule_enabled: false` does not prevent the backup schedule from being installed - instead it sets the schedule state to `paused`.
-    This allows for ad-hoc backups to still be run on demand using the configured backup parameters.
+    - The [AWS Velero plugin](https://github.com/vmware-tanzu/velero-plugin-for-aws) is used for S3 support.
+    - The [CSI plugin](https://github.com/vmware-tanzu/velero-plugin-for-csi) is used for volume snapshots.
+    - The CSI plugin uses Kubernetes generic support for [Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/).
+    - This is implemented for OpenStack by the [Cinder CSI plugin](https://github.com/kubernetes/cloud-provider-openstack).
 <!-- prettier-ignore-end -->
+
+Information on how to configure and use disaster recovery can be found
+[here](../configuration/15-disaster-recovery.md#configuration).
````
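Once Velero has been configured as described in the linked page, the schedule and backups can still be inspected from the seed node with the Velero CLI, for example:

```sh
# On the seed node, with the kubeconfig for the HA cluster exported

# Show the installed daily backup schedule
velero schedule get

# List backups and their statuses
velero backup get

# Trigger an ad-hoc backup using the schedule's parameters
velero backup create --from-schedule default
```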

docs/CAPI-mgmt/index.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -10,8 +10,7 @@ standalone CAPI management cluster, using Magnum as the chosen COE.
 
 <!-- prettier-ignore-start -->
 !!! note
-    Make sure you have some understanding of [stackhpc-kayobe-config](https://github.com/stackhpc/stackhpc-kayobe-config),
-    as well as satisfying the [deployment prerequisites](https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-2025.1/configuration/magnum-capi.html#deployment-prerequisites).
+    This deployment of a standalone Cluster API management cluster is, as the name suggests, able to work without the backing of another cloud infrastructure. However, if you're using [stackhpc-kayobe-config](https://github.com/stackhpc/stackhpc-kayobe-config), or some other OpenStack deployment tool, these documents are complemented by the following [documentation](https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-2025.1/configuration/magnum-capi.html).
 
 !!! note
     It is assumed that you have already followed the steps in setting up a configuration repository, and so have an environment for your site that is ready to be configured.
````

mkdocs.yml

Lines changed: 6 additions & 0 deletions
````diff
@@ -52,6 +52,12 @@ nav:
       - debugging/caas.md
   - Developing:
       - developing/index.md
+  - CAPI management cluster:
+      - CAPI-mgmt/index.md
+      - CAPI-mgmt/01-prerequisites.md
+      - CAPI-mgmt/02-kubernetes-config.md
+      - CAPI-mgmt/03-monitoring.md
+      - CAPI-mgmt/04-disaster-recovery.md
 
 theme:
   name: material
````
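The new navigation entries can be previewed locally before publishing; a minimal sketch, assuming MkDocs and the Material theme are already installed:

```sh
# Serve the docs locally and confirm the "CAPI management cluster" section
# appears in the navigation
mkdocs serve
```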
