This documentation provides a step-by-step guide on configuring AKS auto-scaling for GitHub self-hosted runners by means of the actions-runner-controller.

This guide covers the following self-hosted runner specification:
- Optimized for Azure Kubernetes Service
- Compatible with GitHub Server and Cloud
- Organization-level runners
- Ephemeral runners
- Auto-scaling with workflow_job webhooks
- Webhook secret
- Ingress TLS termination
- Auto-provisioning Let's Encrypt SSL certificate
- GitHub App API authentication
## Table of contents

- Prerequisites
- Setup AKS Cluster
- Setup Helm client
- Add cert-manager and NGINX ingress repositories
- Install cert-manager
- Apply Let's Encrypt ClusterIssuer config for cert-manager
- Install NGINX ingress controller
- Setup domain A record
- Create a GitHub App and configure GitHub App authentication
- Prepare Actions Runner Controller configuration
- Install Actions Runner Controller
- Deploy runner manifest
- Verify deployment of all cluster services
- Verify status of runners and pods
- Resources
## Prerequisites

- An Azure subscription
- GitHub Enterprise Server 3.3 or GitHub Enterprise Cloud
- A top-level domain name (in this guide, the example subdomain webhook.tld.com is used)
## Setup AKS Cluster

```shell
# Install Azure CLI - https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
az login

# Install kubectl - https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az_aks_install_cli
az aks install-cli

# Create resource group
az group create -n <your-resource-group> --location <your-location>

# Create AKS cluster
az aks create -n <your-cluster-name> -g <your-resource-group> --node-resource-group <your-node-resource-group-name> --enable-managed-identity

# Get AKS access credentials
az aks get-credentials -n <your-cluster-name> -g <your-resource-group>
```

## Setup Helm client

```shell
# Install Helm - https://helm.sh/docs/intro/install/
brew install helm                 # macOS
choco install kubernetes-helm     # Windows
sudo snap install helm --classic  # Debian/Ubuntu
```

## Add cert-manager and NGINX ingress repositories

```shell
# Add repositories
helm repo add jetstack https://charts.jetstack.io
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

# Update repositories
helm repo update
```

## Install cert-manager

```shell
# Install cert-manager - https://cert-manager.io/docs/installation/helm/
helm install --wait --create-namespace --namespace cert-manager cert-manager jetstack/cert-manager --version v1.6.1 --set installCRDs=true
```

## Apply Let's Encrypt ClusterIssuer config for cert-manager

Set the following value in `clusterissuer.yaml` to your own details before applying:

- `email`: the email address used for ACME registration

```shell
kubectl apply -f clusterissuer.yaml
```
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [email protected]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            class: nginx
```

## Install NGINX ingress controller

```shell
# Install NGINX ingress controller
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace actions-runner-system --create-namespace

# Retrieve public load balancer IP from ingress controller
kubectl -n actions-runner-system get svc
```

## Setup domain A record

Navigate to your domain registrar and create a new A record that points a subdomain of your TLD (e.g. webhook.tld.com) to the ingress load balancer IP retrieved above.
## Create a GitHub App and configure GitHub App authentication

- Activate the GitHub App webhook feature and add the domain A record created earlier as the Webhook URL
- Navigate to Permissions & events and enable webhook workflow job events

Prepare a webhook secret for use as `github_webhook_secret_token` in the `values.yaml` file, and configure the same webhook secret in the created GitHub App:
```shell
# Generate random webhook secret
ruby -rsecurerandom -e 'puts SecureRandom.hex(20)'
```

## Prepare Actions Runner Controller configuration

Modify the default `values.yaml` with your custom values as specified below:

```shell
# Configure values.yaml
vim values.yaml
```

Values to configure:

- `githubEnterpriseServerURL`: only needed when using GHES
- `authSecret`
- `githubWebhookServer`
  - `ingress`
  - `github_webhook_secret_token`
```yaml
# The URL of your GitHub Enterprise server, if you're using one.
githubEnterpriseServerURL: https://github.example.com

# Only 1 authentication method can be deployed at a time
# Uncomment the configuration you are applying and fill in the details
authSecret:
  create: true
  name: "controller-manager"
  annotations: {}
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  github_app_id: "3"
  github_app_installation_id: "1"
  github_app_private_key: |-
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA2zl6z+uMcS4D+D9f1ENLJY2w/9lLPajs/wA2gnt74/7bcB1f
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    2x/9kVAWKQ2UJGxqupGqV14vLaNpmA2uILBxc5jKXHu1nNkgUwU=
    -----END RSA PRIVATE KEY-----
  ### GitHub PAT Configuration
  #github_token: ""

githubWebhookServer:
  enabled: true
  replicaCount: 1
  syncPeriod: 10m
  secret:
    create: false
    name: "github-webhook-server"
    ### GitHub Webhook Configuration
    github_webhook_secret_token: ""
  imagePullSecrets: []
  nameOverride: ""
  fullnameOverride: ""
  serviceAccount:
    # Specifies whether a service account should be created
    create: true
    # Annotations to add to the service account
    annotations: {}
    # The name of the service account to use.
    # If not set and create is true, a name is generated using the fullname template
    name: ""
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  # fsGroup: 2000
  securityContext: {}
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  service:
    type: ClusterIP
    annotations: {}
    ports:
      - port: 80
        targetPort: http
        protocol: TCP
        name: http
        #nodePort: someFixedPortForUseWithTerraformCdkCfnEtc
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: webhook.tld.com
        paths:
          - path: /
    tls:
      - secretName: letsencrypt-prod
        hosts:
          - webhook.tld.com
```
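Because `secret.create` is set to `false` under `githubWebhookServer`, the chart expects the `github-webhook-server` secret to already exist in the controller namespace. A hedged sketch of one way to create it, assuming the namespace and secret name used in this guide (the token generation is equivalent to the ruby one-liner above):

```shell
# Generate the webhook secret token (equivalent to the ruby one-liner above)
TOKEN="$(openssl rand -hex 20)"

# Create the secret referenced by githubWebhookServer.secret.name
# (namespace and secret name assume this guide's values)
kubectl create secret generic github-webhook-server \
  --namespace actions-runner-system \
  --from-literal=github_webhook_secret_token="$TOKEN"
```

Remember to configure the same token as the webhook secret in the GitHub App.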
## Install Actions Runner Controller

```shell
# Add the actions-runner-controller repository
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller

# Install actions-runner-controller
helm upgrade --install -f values.yaml --wait --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller
```

## Verify deployment of all cluster services

```shell
# View all namespace resources
kubectl --namespace actions-runner-system get all

# Verify certificaterequest status
kubectl get certificaterequest --namespace actions-runner-system

# Verify certificate status
kubectl describe certificate letsencrypt-prod --namespace actions-runner-system

# Verify that the SSL certificate is working properly
curl -v https://webhook.tld.com
```
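To exercise the full webhook path end to end, you can hand-craft a signed delivery with curl. GitHub signs each delivery with an HMAC-SHA256 of the raw payload in the `X-Hub-Signature-256` header, so the secret below must match the one configured in the GitHub App; the secret and payload shown are illustrative placeholders, not values from this guide:

```shell
# Simulate a signed webhook delivery to the ingress endpoint
# (secret and payload are illustrative; replace with your own values)
SECRET='your-webhook-secret'
PAYLOAD='{"zen":"Keep it logically awesome."}'
SIG="sha256=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')"
curl -s -o /dev/null -w '%{http_code}\n' https://webhook.tld.com \
  -X POST \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: ping' \
  -H "X-Hub-Signature-256: $SIG" \
  -d "$PAYLOAD"
```

A 2xx status suggests the delivery reached the webhook server through the TLS-terminated ingress.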
## Deploy runner manifest

```shell
# Create a new namespace
kubectl create namespace self-hosted-runners

# Edit runnerdeployment yaml
vim runnerdeployment.yaml

# Apply runnerdeployment manifest
kubectl apply -f runnerdeployment.yaml
```

The below manifest deploys organization-level auto-scaling ephemeral runners with a minimal keep-alive configuration of 1 runner. Runners scale up to 5 active replicas based on incoming workflow_job webhook events, and scale back down to 1 runner after an idle timeout of 5 minutes.
Set `organization` to your GitHub organization name:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  template:
    metadata:
      labels:
        app: org-runner
    spec:
      organization: your-github-organization
      labels:
        - self-hosted
      ephemeral: true
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  scaleTargetRef:
    name: org-runner
  scaleUpTriggers:
    - githubEvent: {}
      amount: 1
      duration: "5m"
  minReplicas: 1
  maxReplicas: 5
```
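The `githubEvent: {}` trigger above scales up on incoming workflow_job webhook deliveries whose runner labels match the deployment. A sketch of such a payload, abridged to the fields relevant for scaling (a real delivery carries many more fields):

```shell
# Write an abridged workflow_job payload and inspect the action field
# (a real delivery carries many more fields)
cat <<'EOF' > workflow_job.json
{
  "action": "queued",
  "workflow_job": {
    "labels": ["self-hosted"]
  }
}
EOF

# A "queued" action with matching runner labels triggers a scale-up
grep -o '"action": "queued"' workflow_job.json
```

A delivery with `"action": "completed"` correspondingly lets the autoscaler shrink capacity again once the scale-up duration expires.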
## Verify status of runners and pods

```shell
# List running pods
kubectl get pods -n self-hosted-runners

# List active runners
kubectl get runners -n self-hosted-runners

# List all resources across all namespaces
kubectl get all -A
```