Commit 987cbb8

Amendments & improvements

1 parent 881ce59 commit 987cbb8

6 files changed: +48, -206 lines changed

docs/CAPI-mgmt/01-prerequisites.md

Lines changed: 10 additions & 6 deletions
````diff
@@ -23,28 +23,32 @@ which directly interact with the OpenStack APIs.
 
 This management cluster has two main requirements in order to operate:
 
+<!-- markdownlint-disable MD007 -->
 - Firstly, it must be capable of reaching the public OpenStack APIs.
 - Secondly, the management cluster must be reachable from the control
   plane nodes on which the Magnum containers are running.
-    - This is so that the Magnum containers may reach the management
-      cluster’s IP listed in the `kubeconfig`.
+    - This is so that the Magnum conductor(s) may reach the management
+      cluster’s API server address listed in the `kubeconfig`.
+<!-- markdownlint-enable MD007 -->
 
 ### OpenStack project quotas
 
 A standard high-availability (HA) deployment with a seed node, 3 control plane nodes and
 3 worker nodes, requires the following resources:
 
+<!-- markdownlint-disable MD007 -->
 - 1 x network, 1 x subnet, 1 x router
 - 1 x seed node (4 vCPU, 8 GB)
 - 4 x control plane nodes (4 vCPU, 8 GB)
-    - 3 x during normal operation, 4 x during rolling upgrade
+    - 3 x during normal operation, 4 x during rolling upgrade
 - 4 x worker nodes (8 vCPU, 16 GB)
-    - 3 x during normal operation, 4 x during rolling upgrade
+    - 3 x during normal operation, 4 x during rolling upgrade
 - 3 x load-balancers
 - 500GB Cinder storage
 - 2 x floating IPs
-    - One for accessing the seed node
-    - One for the ingress controller for accessing HTTP services
+    - One for accessing the seed node
+    - One for the ingress controller for accessing HTTP services
+<!-- markdownlint-enable MD007 -->
 
 <!-- prettier-ignore-start -->
 !!! tip
````
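The quota requirements listed above can be checked against the target project before deploying. A minimal sketch using the standard OpenStack CLI and `kubectl`; the flavor filter and the kubeconfig path are illustrative assumptions, not part of the documented procedure:

```sh
# Absolute limits (cores, RAM, instances, floating IPs, ...) for the target project
openstack limits show --absolute

# Per-project quotas, for comparison against the HA resource list above
openstack quota show

# Flavors large enough for the control plane and worker nodes (>= 8 GB RAM)
openstack flavor list --min-ram 8192

# From a control plane node running the Magnum conductor, confirm the management
# cluster's API server listed in the kubeconfig is reachable (path is illustrative)
kubectl --kubeconfig /etc/magnum/capi-mgmt-kubeconfig get --raw /healthz
```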

docs/CAPI-mgmt/02-kubernetes-config.md

Lines changed: 16 additions & 76 deletions
````diff
@@ -1,95 +1,35 @@
 # Kubernetes configuration
 
-The concepts in this section apply to any Kubernetes clusters created using Cluster API,
-i.e. the HA cluster in a HA deployment and tenant clusters.
+The concepts in this section apply to the Cluster API management cluster, and not to
+tenant clusters; configuration concerning tenant clusters is set via cluster labels.
+
+<!-- prettier-ignore-start -->
+!!! note "Tenant cluster labels"
+    For an outline of the tenant cluster configuration and its variables, please visit the example
+    file [here](https://github.com/azimuth-cloud/azimuth-config/blob/2025.10.0/environments/capi-mgmt-example/inventory/group_vars/all/variables.yml).
+<!-- prettier-ignore-end -->
 
 The variables used to configure HA deployments are the same as those for Azimuth and so
 only a surface level of detail will be covered below. For further details visit the
 [Azimuth Kubernetes configuration documentation](../configuration/03-kubernetes-config.md).
 
-## Cluster configuration
-
-The shape and form which the cluster can take is possible to customise through defining
-variables which can control the image, Kubernetes version, node flavors, and cluster scaling.
-
-Below is the list of variables that can be used to customise a CAPI management cluster
-deployment. These variables have defaults which are described above each of the variables.
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-#### Configuration for the HA cluster ####
-
-# The ID of the image that will be used for the nodes of the HA cluster.
-# By default, a suitable image is uploaded to the target project.
-capi_cluster_machine_image_id: "<image id>"
-
-# The Kubernetes version that will be used for the HA cluster.
-# This should match the image specified above.
-capi_cluster_kubernetes_version: 1.31.10
-
-# The name of the flavor to use for control plane nodes.
-# At least 2 vCPUs and 8GB RAM is required.
-# By default, the first flavor matching these requirements will be used.
-capi_cluster_control_plane_flavor: "<flavor name>"
-
-# The name of the flavor to use for worker nodes.
-# At least 2 vCPUs and 8GB RAM is required.
-# By default, the first flavor matching these requirements will be used.
-capi_cluster_worker_flavor: "<flavor name>"
-
-# The number of worker nodes to deploy in the cluster.
-# Defaults to 3.
-capi_cluster_worker_count: 3
-```
-
-<!-- prettier-ignore-start -->
-!!! tip
-    Ensure that the Kubernetes version selected corresponds to an available
-    image built and tested for that version. See the [Images](#images) section
-    below for further details.
-
-!!! note
-    The specified flavors must exist in the target OpenStack project and have
-    enough sufficient resources to host the desired number of nodes.
-<!-- prettier-ignore-end -->
-
 ## Images
 
-The clusters deployed by the Cluster API (CAPI) driver make use of the Ubuntu Kubernetes images
-built from the [azimuth-images repository](https://github.com/azimuth-cloud/azimuth-images), alongside
-[capi-helm-charts](https://github.com/azimuth-cloud/capi-helm-charts) in order to provide the Helm charts
-which define these clusters based on the image.
+The clusters deployed by the Cluster API (CAPI) driver will, of course, require
+access to an Ubuntu Kubernetes image, as well as a cluster template.
 
-These two repositories have CI jobs regularly building and testing the images and Helm charts
-for the latest Kubernetes versions. Therefore, it is important to update the cluster templates
-on each cloud regularly.
-
-<!-- prettier-ignore-start -->
-!!! note
-    These templates are tested as sets against specific CAPI management cluster versions. As such,
-    it is vitally important to update the CAPI management cluster to the latest release before
-    updating to the latest templates.
-
-!!! note
-    Information on community images and how they are built can be found [here](../configuration/09-community-images.md).
-<!-- prettier-ignore-end -->
-
-If required, it is possible to reference an image's IDs using the `community_images_image_ids`
-variable. This, for example, could be used to create [custom Kubernetes templates](./10-kubernetes-clusters.md#custom-cluster-templates).
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-kube_1_25_image_id: "{{ community_images_image_ids.kube_1_25 }}"
-kube_1_26_image_id: "{{ community_images_image_ids.kube_1_26 }}"
-kube_1_27_image_id: "{{ community_images_image_ids.kube_1_27 }}"
-```
+The way these user-facing images are managed differs from that of
+[Azimuth](../configuration/03-kubernetes-config.md#images); instead, the images
+and Magnum cluster templates are managed by tools found in the openstack-config
+[repository](https://github.com/stackhpc/openstack-config#magnum-cluster-templates).
 
 ## Docker Hub rate limits
 
 <!-- prettier-ignore-start -->
 !!! warning
     Docker Hub [imposes rate limits](https://docs.docker.com/docker-hub/download-rate-limit/)
-    on image downloads, which can cause issues for both the HA cluster and, in particular,
-    tenant clusters. This can be worked around by mirroring the images to a local registry.
-
+    on image downloads, which may cause issues for both the HA cluster and, in particular,
+    tenant clusters.
 
 !!! warning
     For more information please see [here](../configuration/03-kubernetes-config.md#docker-hub-mirror).
````
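Because tenant cluster behaviour is now controlled through Magnum cluster labels rather than azimuth-config variables, label overrides are applied when templates or clusters are created. A hedged sketch using the standard `openstack coe` commands; the template name and label key/value are placeholders:

```sh
# List the Magnum cluster templates managed via openstack-config
openstack coe cluster template list

# Inspect the labels baked into a template (template name is a placeholder)
openstack coe cluster template show kube-1-31 -c labels

# Create a cluster, merging an extra label on top of the template defaults
# (the label key/value here are purely illustrative)
openstack coe cluster create my-cluster \
  --cluster-template kube-1-31 \
  --merge-labels \
  --labels my_label=my_value
```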

docs/CAPI-mgmt/03-monitoring.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -16,8 +16,10 @@ Apart from aforementioned monitoring services, there are also log aggregate serv
 
 ## Accessing web interfaces
 
-The monitoring and alerting web dashboards are exposed via the ingress controller IP address or even as
-subdomains under the `ingress_base_domain`, which if configured are:
+The monitoring and alerting web dashboards are currently exposed via this
+port-forwarding [script](https://github.com/azimuth-cloud/azimuth-config/blob/devel/bin/port-forward).
+Once run, the various services will be available on the CAPI management cluster's floating
+IP under the service subdomains. The following services are exposed:
 
 - `grafana` for the Grafana dashboards
 - `prometheus` for the Prometheus web interface
````
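A typical session might therefore look like the sketch below; it assumes an activated azimuth-config environment and that the script takes a service name and a local port. Check the script itself for its exact arguments, as this invocation is illustrative only:

```sh
# From the root of the azimuth-config checkout, activate the site environment
source ./bin/activate my-site

# Forward the Grafana dashboard to a local port (arguments are assumed)
./bin/port-forward grafana 3000

# Then browse to the forwarded service, e.g. http://localhost:3000
```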

docs/CAPI-mgmt/04-disaster-recovery.md

Lines changed: 11 additions & 120 deletions
````diff
@@ -9,127 +9,18 @@ store and has a plugin-based system to enable snapshotting of a cluster's persis
 Backup and restore is only available for production-grade HA installations of clusters.
 <!-- prettier-ignore-end -->
 
-The playbooks install Velero on the HA management cluster and the Velero command-line-tool on the seed node. Once configured with the appropriate credentials, the installation process will create a [Schedule](https://velero.io/docs/latest/api-types/schedule/) on the HA cluster, which triggers a daily backup at midnight and cleans up backups older which are more than 1 week old.
-
-The
-[AWS Velero plugin](https://github.com/vmware-tanzu/velero-plugin-for-aws) is used for S3 support
-and the
-[CSI plugin](https://github.com/vmware-tanzu/velero-plugin-for-csi) for volume snapshots.
-The CSI plugin uses Kubernetes generic support for
-[Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/), which is
-implemented for OpenStack by the
-[Cinder CSI plugin](https://github.com/kubernetes/cloud-provider-openstack).
-
-## Configuration
-
-To enable backup and restore functionality, the following variables must be set in your environment:
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-# Enable Velero
-velero_enabled: true
-
-# The URL of the S3 storage endpoint
-velero_s3_url: <object-store-endpoint-url>
-
-# The name of the bucket to use for backups
-velero_bucket_name: <bucket-name>
-```
-
-<!-- prettier-ignore-start -->
-!!! warning "Bucket must already exist"
-    The specified bucket must already exist - neither azimuth-ops nor Velero will create it.
-<!-- prettier-ignore-end -->
-
-You will also need to consult the documentation for your S3 provider to obtain S3 credentials for
-the bucket, and add the access key ID and secret to the following variables:
-
-```yaml title="environments/my-site/inventory/group_vars/all/secrets.yml"
-# Access key ID and secret for accessing the S3 bucket
-velero_aws_access_key_id: <s3-access-key-id>
-velero_aws_secret_access_key: <s3-secret-value>
-```
-
-<!-- prettier-ignore-start -->
-!!! tip "Generating credentials for Keystone-integrated Ceph Object Gateway"
-    If the S3 target is Ceph Object Gateway integrated with Keystone, a common configuration with OpenStack clouds, S3 credentials can be generated using the following:
-    ```sh
-    openstack ec2 credentials create
-    ```
-    See [Ceph Object Gateway integrated with Keystone](https://docs.ceph.com/en/latest/radosgw/keystone/).
-
-!!! danger
-    The S3 credentials should be kept secret. If you want to keep them in Git - which is recommended - then they must be encrypted.
-    See [secrets](../repository/secrets.md) for instructions on how to do this.
-<!-- prettier-ignore-end -->
-
-## Velero CLI
-
-The Velero installation process also installs the Velero CLI on the seed node, which can be
-used to inspect the state of the backups:
-
-```sh title="On the seed node, with the kubeconfig for the HA cluster exported"
-# List the configured backup locations
-velero backup-location get
-
-# List the backups and their statuses
-velero backup get
-```
-
-See `velero -h` for other useful commands.
-
-## Restoring from a backup
-
-To restore from a backup, you must first know the name of the target backup. This can be inferred
-from the object names in S3 if the Velero CLI is no longer available.
-
-Once you have the name of the backup to restore, run the following command with your environment
-activated (similar to a provision):
-
-```bash
-ansible-playbook azimuth_cloud.azimuth_ops.restore \
-    -e velero_restore_backup_name=<backup name>
-```
-
-This will provision a new HA cluster, restore the backup onto it and then bring the installation
-up-to-date with your configuration.
-
-## Performing ad-hoc backups
-
-In order to perform ad-hoc backups using the same config parameters as the installed backup schedule,
-run the following Velero CLI command from the seed node:
-
-```sh title="On the seed node, with the kubeconfig for the HA cluster exported"
-velero backup create --from-schedule default
-```
-
-This will begin the backup process in the background. The status of this backup (and others) can be
-viewed with the `velero backup get` command shown above.
-
-<!-- prettier-ignore-start -->
-!!! tip
-    Ad-hoc backups will have the same time-to-live as the configured schedule backups (default = 7 days).
-    To change this, pass the `--ttl <hours>` option to the `velero backup create` command.
-<!-- prettier-ignore-end -->
-
-## Modifying the backup schedule
-
-The following config options are available for modifying the regular backup schedule:
-
-```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-# Whether or not to perform scheduled backups
-velero_backup_schedule_enabled: true
-# Name for backup schedule kubernetes resource
-velero_backup_schedule_name: default
-# Schedule to use for backups (defaults to every day at midnight)
-# See https://en.wikipedia.org/wiki/Cron for format options
-velero_backup_schedule: "0 0 * * *"
-# Time-to-live for existing backups (defaults to 1 week)
-# See https://pkg.go.dev/time#ParseDuration for duration format options
-velero_backup_ttl: "168h"
-```
+The playbooks install Velero on the HA management cluster and the Velero command-line-tool on the seed node.
+Once configured with the appropriate credentials, the installation process will create a
+[Schedule](https://velero.io/docs/latest/api-types/schedule/) on the HA cluster, which triggers a daily
+backup at midnight and cleans up backups which are more than 1 week old.
 
 <!-- prettier-ignore-start -->
 !!! note
-    Setting `velero_backup_schedule_enabled: false` does not prevent the backup schedule from being installed - instead it sets the schedule state to `paused`.
-    This allows for ad-hoc backups to still be run on demand using the configured backup parameters.
+    - The [AWS Velero plugin](https://github.com/vmware-tanzu/velero-plugin-for-aws) is used for S3 support.
+    - The [CSI plugin](https://github.com/vmware-tanzu/velero-plugin-for-csi) is used for volume snapshots.
+    - The CSI plugin uses Kubernetes generic support for [Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/).
+    - This is implemented for OpenStack by the [Cinder CSI plugin](https://github.com/kubernetes/cloud-provider-openstack).
 <!-- prettier-ignore-end -->
+
+Information on how to configure and use disaster recovery can be found
+[here](../configuration/15-disaster-recovery.md#configuration).
````
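Once Velero has been configured as described in the linked page, the schedule and backups can still be inspected from the seed node with the Velero CLI, for example:

```sh
# On the seed node, with the kubeconfig for the HA cluster exported

# Show the installed daily backup schedule
velero schedule get

# List backups and their statuses
velero backup get

# Trigger an ad-hoc backup using the schedule's parameters
velero backup create --from-schedule default
```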

docs/CAPI-mgmt/index.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -10,8 +10,7 @@ standalone CAPI management cluster, using Magnum as the chosen COE.
 
 <!-- prettier-ignore-start -->
 !!! note
-    Make sure you have some understanding of [stackhpc-kayobe-config](https://github.com/stackhpc/stackhpc-kayobe-config),
-    as well as satisfying the [deployment prerequisites](https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-2025.1/configuration/magnum-capi.html#deployment-prerequisites).
+    This deployment of a standalone Cluster API management cluster is, as the name suggests, able to work without the backing of another cloud infrastructure. However, if you're using [stackhpc-kayobe-config](https://github.com/stackhpc/stackhpc-kayobe-config), or some other OpenStack deployment tool, these documents are complemented by the following [documentation](https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-2025.1/configuration/magnum-capi.html).
 
 !!! note
     It is assumed that you have already followed the steps in setting up a configuration repository, and so have an environment for your site that is ready to be configured.
````

mkdocs.yml

Lines changed: 6 additions & 0 deletions
````diff
@@ -52,6 +52,12 @@ nav:
       - debugging/caas.md
   - Developing:
       - developing/index.md
+  - CAPI management cluster:
+      - CAPI-mgmt/index.md
+      - CAPI-mgmt/01-prerequisites.md
+      - CAPI-mgmt/02-kubernetes-config.md
+      - CAPI-mgmt/03-monitoring.md
+      - CAPI-mgmt/04-disaster-recovery.md
 
 theme:
   name: material
````
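The new navigation entries can be previewed locally before publishing; a minimal sketch, assuming MkDocs and the Material theme are already installed:

```sh
# Serve the docs locally and confirm the "CAPI management cluster" section
# appears in the navigation
mkdocs serve
```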
