Merged
Changes from 11 commits
156 changes: 156 additions & 0 deletions ansible/doc/README.testbed.k8s.Setup.md
@@ -0,0 +1,156 @@
# SONiC Kubernetes Design

This document describes the design to test Kubernetes features in SONiC.

## Background

Each SONiC DUT is a worker node managed by a High Availability Kubernetes master. The High Availability Kubernetes master is composed of three master node machines and one load balancer machine.

By connecting each SONiC DUT to the HA Kubernetes master, containers running in SONiC can be managed by the Kubernetes master. SONiC containers managed by the Kubernetes master are said to be running in "Kubernetes mode," as opposed to the original "Local mode."

In Kubernetes mode, SONiC container properties are based on specifications defined in the associated Kubernetes manifest. A Kubernetes manifest is a file on the Kubernetes master that defines the Kubernetes object and container configurations. In our case, we use Kubernetes DaemonSet objects. A DaemonSet object ensures that each worker node runs exactly one container of the image specified in the DaemonSet manifest file.

For example, to run the SNMP and Telemetry containers in Kubernetes mode, we must have two manifests defining two Kubernetes DaemonSet objects, one for each container running in "Kubernetes mode."

The following is a snippet of the Telemetry DaemonSet manifest file that specifies the Kubernetes object type and container image:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telemetry-ds
spec:
  template:
    metadata:
      labels:
        name: telemetry
    spec:
      hostname: sonic
      hostNetwork: true
      containers:
      - name: telemetry
        image: sonicanalytics.azurecr.io/sonic-dockers/any/docker-sonic-telemetry:20200531
        tty: true
        ...
```


## Topology Overview

In order to connect each SONiC DUT to a High Availability Kubernetes master, we need to set up the following topology:
![alt text](https://github.com/isabelmsft/k8s-ha-master-starlab/blob/master/k8s-testbed-linux.png)
- Each high availability master setup requires 4 new Linux KVMs running on a Testbed Server via bridged networking.
- 3 Linux KVMs to serve as 3-node high availability Kubernetes master
- 1 Linux KVM to serve as HAProxy Load Balancer node
- Each KVM has one management interface assigned an IP address reachable from the SONiC DUT.
- The HAProxy Load Balancer proxies requests to the 3 backend Kubernetes master nodes.

Our setup meets the Kubernetes minimum requirements for a highly available cluster:
- 2 GB or more of RAM per machine
- 2 CPUs or more per machine
- Full network connectivity between all machines in the cluster (public or private network)
- sudo privileges on all machines
- SSH access from one device to all nodes in the system
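The RAM and CPU minimums above can be sanity-checked on each candidate VM with a short script (a sketch; the thresholds mirror the list above):

```shell
# Check CPU count and total memory against the Kubernetes minimums.
cpus=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)

if [ "$cpus" -ge 2 ] && [ "$mem_kb" -ge $((2 * 1024 * 1024)) ]; then
    echo "VM meets the 2-CPU / 2-GB minimums"
else
    echo "VM is below the Kubernetes minimums"
fi
```

Run this on each of the 4 KVMs before joining them to the master set.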

## How to Set Up a High Availability Kubernetes Master

1. Prepare the testbed server, then build and run the `docker-sonic-mgmt` container as described [here](https://github.com/Azure/sonic-mgmt/blob/master/ansible/doc/README.testbed.Setup.md)
2. Allocate 4 available IP addresses reachable from the SONiC DUT.
3. Update [`ansible/k8s-ubuntu`](../k8s-ubuntu) to include your 4 newly allocated IP addresses for the HA Kubernetes master and the IP address of the testbed server.

- We will walk through an example of setting up HA Kubernetes master set 1 on server 19 (STR-ACS-SERV-19). The following snippets are the relevant portions from [`ansible/k8s-ubuntu`](../k8s-ubuntu).

```yaml
k8s_vm_host19:
  hosts:
    STR-ACS-SERV-19:
      ansible_host: 10.251.0.101
```
- Replace `ansible_host` value above with the IP address of the testbed server.

```yaml
k8s_vms1_19:
  hosts:
    kvm19-1m1:
      ansible_host: 10.250.0.2
      master: true
      master_leader: true
    kvm19-1m2:
      ansible_host: 10.250.0.3
      master: true
      master_member: true
    kvm19-1m3:
      ansible_host: 10.250.0.4
      master: true
      master_member: true
    kvm19-1ha:
      ansible_host: 10.250.0.5
      haproxy: true
```

- Replace each `ansible_host` value with an IP address allocated in step 2.

- Take note of the group name `k8s_vms1_19`. At the bottom of [`ansible/k8s-ubuntu`](../k8s-ubuntu), make sure that `k8s_server_19` has its `host_var_file` and two `children` properly set:

```yaml
k8s_server_19:
  vars:
    host_var_file: host_vars/STR-ACS-SERV-19.yml
  children:
    k8s_vm_host19:
    k8s_vms1_19:
```

4. Update the server network configuration for the Kubernetes VM management interfaces in [`ansible/host_vars/STR-ACS-SERV-19.yml`](../host_vars/STR-ACS-SERV-19.yml).
- `mgmt_gw_k8s`: IP address of the gateway for the VM management interfaces
- `mgmt_prefixlen_k8s`: prefix length for the management interfaces
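For example, with the VM management interfaces on a hypothetical 10.250.0.0/24 network, the host_vars entries would look like (illustrative values):

```yaml
mgmt_bridge_k8s: br1
mgmt_prefixlen_k8s: 24
mgmt_gw_k8s: 10.250.0.1
```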
5. If necessary, set the proxy in [`ansible/group_vars/all/env.yml`](../group_vars/all/env.yml)
6. Update the testbed server credentials in [`ansible/group_vars/k8s_vm_host/creds.yml`](../group_vars/k8s_vm_host/creds.yml).
7. If using Azure Storage to source the Ubuntu 18.04 KVM image, set `k8s_vmimage_saskey` in [`ansible/vars/azure_storage.yml`](../vars/azure_storage.yml). Alternatively, manually download the Ubuntu 18.04 qcow2 image and store it at `/home/azure/ubuntu-vm/images/bionic-server-cloudimg-amd64.qcow2` on your testbed server.
8. From `docker-sonic-mgmt` container, `cd` into `sonic-mgmt/ansible` directory and run `./testbed-cli.sh -m k8s-ubuntu [additional OPTIONS] create-master <k8s-server-name> ~/.password`
- `k8s-server-name` corresponds to the group name used to describe the testbed server in the [`ansible/k8s-ubuntu`](../k8s-ubuntu) inventory file, of the form `k8s_server_{unit}`.
- Please note: `~/.password` is the Ansible vault password file. Ansible allows users to encrypt password files with ansible-vault, and by default this shell script requires a password file. If you are not using ansible-vault, create an empty file and pass its path on the command line. The file name and location are created and maintained by the user.
- For HA Kubernetes master set 1 running on server 19 shown above, the proper command would be:
`./testbed-cli.sh -m k8s-ubuntu create-master k8s_server_19 ~/.password`
- OPTIONAL: Multiple master sets can run on one server.
- Each master set is one HA Kubernetes master composed of 4 Linux KVMs.
- Should an additional HA master set be necessary on an occupied server, add the option `-s <msetnumber>`, where `msetnumber` would be 2 if this is the 2nd master set running on `<k8s-server-name>`. Make sure that [`ansible/k8s-ubuntu`](../k8s-ubuntu) is updated accordingly. `msetnumber` is 1 by default.
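- Putting these pieces together for a hypothetical second master set on server 19, the invocation would be assembled as follows (shown here without executing `testbed-cli.sh`; `-s 2` selects master set 2):

```shell
# Create an empty vault password file if you are not using ansible-vault.
touch ~/.password

# Assemble the create-master invocation for a second master set on server 19.
cmd="./testbed-cli.sh -m k8s-ubuntu -s 2 create-master k8s_server_19 ~/.password"
echo "$cmd"
```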


9. Join the Kubernetes-enabled SONiC DUT to the cluster (the `kube_join` function is yet to be written).


#### To remove an HA Kubernetes master:
- Run `./testbed-cli.sh -m k8s-ubuntu [additional OPTIONS] destroy-master <k8s-server-name> ~/.password`
- For HA Kubernetes master set 1 running on server 19 shown above, the proper command would be:
`./testbed-cli.sh -m k8s-ubuntu destroy-master k8s_server_19 ~/.password`

## Testing Scope

This setup allows us to test the following:
- Successful deployment of SONiC containers via manifests defined on the master
- Expected container behavior after a container is intentionally or unintentionally stopped
- Switching between Local and Kubernetes management modes for a given container
- Addition and removal of SONiC DUT labels
- Changing the image version in the middle of a DaemonSet deployment

Each of these scenarios is tested in each of the following states:
- When all master servers are up and running
- When one master server is down
- When two master servers are down
- When all master servers are down

Here, "down" means shut off, disconnected, or in the middle of a reboot.


In this setup, we do not evaluate load balancer performance. For Kubernetes feature testing purposes, HAProxy is configured to perform plain round-robin load balancing across the available master nodes.
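Round-robin here simply cycles API-server requests through the healthy masters in order; conceptually (a sketch using the illustrative master IPs from the inventory example above):

```shell
# Simulate six API requests distributed round-robin over three masters.
masters="10.250.0.2 10.250.0.3 10.250.0.4"
picks=""
requests=6
sent=0
while [ "$sent" -lt "$requests" ]; do
    # one full pass over the backends, stopping once all requests are placed
    for m in $masters; do
        [ "$sent" -lt "$requests" ] || break
        picks="$picks $m"
        sent=$(( sent + 1 ))
    done
done
echo "picks:$picks"
```

Each master receives every third request; when one master is down, HAProxy's health checks simply drop it from the rotation.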


## How to Create Tests
Each manifest is a YAML file. Test tooling consists of:

- a CLI to make changes to manifest files
- pytests to apply manifest changes and check the resulting status
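As a concrete sketch of the manifest-editing step, the CLI could rewrite the image tag in a manifest copy (the file path and the new tag below are illustrative, not the real manifest location):

```shell
# Write a scratch copy of a manifest snippet with the original image tag.
mkdir -p /tmp/k8s-manifests
cat > /tmp/k8s-manifests/telemetry-ds.yaml <<'EOF'
        image: sonicanalytics.azurecr.io/sonic-dockers/any/docker-sonic-telemetry:20200531
EOF

# Rewrite the image tag into an updated copy of the manifest.
sed 's/:20200531/:20200601/' /tmp/k8s-manifests/telemetry-ds.yaml \
    > /tmp/k8s-manifests/telemetry-ds.updated.yaml
grep 'image:' /tmp/k8s-manifests/telemetry-ds.updated.yaml
```

A pytest would then apply the updated manifest on the master and assert that each DUT converges to the new image version.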
6 changes: 6 additions & 0 deletions ansible/group_vars/k8s_ubu/creds.yml
@@ -0,0 +1,6 @@
---
ansible_ssh_user: ubuntu
ansible_ssh_pass: admin
ansible_become: True
become_method: sudo
ansible_become_password: admin
4 changes: 4 additions & 0 deletions ansible/group_vars/k8s_vm_host/creds.yml
@@ -0,0 +1,4 @@
---
ansible_user: use_own_value
ansible_password: use_own_value
ansible_become_password: use_own_value
4 changes: 4 additions & 0 deletions ansible/group_vars/k8s_vm_host/main.yml
@@ -0,0 +1,4 @@
k8s_root_path: /home/azure/ubuntu-vm
k8s_vm_images_url: https://acsbe.blob.core.windows.net/vmimages
k8s_hdd_image_filename: bionic-server-cloudimg-amd64.qcow2
k8s_skip_image_downloading: false
3 changes: 3 additions & 0 deletions ansible/host_vars/STR-ACS-SERV-19.yml
@@ -0,0 +1,3 @@
mgmt_bridge_k8s: br1
mgmt_prefixlen_k8s: use_own_value
mgmt_gw_k8s: use_own_value
3 changes: 3 additions & 0 deletions ansible/host_vars/STR-ACS-SERV-20.yml
@@ -0,0 +1,3 @@
mgmt_bridge_k8s: br1
mgmt_prefixlen_k8s: use_own_value
mgmt_gw_k8s: use_own_value
115 changes: 115 additions & 0 deletions ansible/k8s-ubuntu
@@ -0,0 +1,115 @@
all:
  children:
    k8s_vm_host:
      children:
        k8s_vm_host19:
        k8s_vm_host20:
    k8s_ubu:
      children:
        k8s_vms1_19:
        k8s_vms2_19:
        k8s_vms1_20:
        k8s_vms2_20:
    k8s_servers:
      children:
        k8s_server_19:
        k8s_server_20:


k8s_vm_host19:
  hosts:
    STR-ACS-SERV-19:
      ansible_host: 10.251.0.101

k8s_vm_host20:
  hosts:
    STR-ACS-SERV-20:
      ansible_host: 10.251.0.102

k8s_vms1_19:
  hosts:
    kvm19-1m1:
      ansible_host: 10.251.0.103
      master: true
      master_leader: true
    kvm19-1m2:
      ansible_host: 10.251.0.104
      master: true
      master_member: true
    kvm19-1m3:
      ansible_host: 10.251.0.105
      master: true
      master_member: true
    kvm19-1ha:
      ansible_host: 10.251.0.106
      haproxy: true

k8s_vms2_19:
  hosts:
    kvm19-2m1:
      ansible_host: 10.251.0.107
      master: true
      master_leader: true
    kvm19-2m2:
      ansible_host: 10.251.0.108
      master: true
      master_member: true
    kvm19-2m3:
      ansible_host: 10.251.0.109
      master: true
      master_member: true
    kvm19-2ha:
      ansible_host: 10.251.0.110
      haproxy: true

k8s_vms1_20:
  hosts:
    kvm20-1m1:
      ansible_host: 10.251.0.111
      master: true
      master_leader: true
    kvm20-1m2:
      ansible_host: 10.251.0.112
      master: true
      master_member: true
    kvm20-1m3:
      ansible_host: 10.251.0.113
      master: true
      master_member: true
    kvm20-1ha:
      ansible_host: 10.251.0.114
      haproxy: true

k8s_vms2_20:
  hosts:
    kvm20-2m1:
      ansible_host: 10.251.0.115
      master: true
      master_leader: true
    kvm20-2m2:
      ansible_host: 10.251.0.116
      master: true
      master_member: true
    kvm20-2m3:
      ansible_host: 10.251.0.117
      master: true
      master_member: true
    kvm20-2ha:
      ansible_host: 10.251.0.118
      haproxy: true

# The groups below are helpers that limit playbook runs to specific server(s) only
k8s_server_19:
  vars:
    host_var_file: host_vars/STR-ACS-SERV-19.yml
  children:
    k8s_vm_host19:
    k8s_vms1_19:

k8s_server_20:
  vars:
    host_var_file: host_vars/STR-ACS-SERV-20.yml
  children:
    k8s_vm_host20:
    k8s_vms1_20:

22 changes: 22 additions & 0 deletions ansible/roles/k8s_haproxy/tasks/main.yml
@@ -0,0 +1,22 @@
- name: update apt cache
  apt: update_cache=yes cache_valid_time=3600
  environment: "{{ proxy_env | default({}) }}"

- name: Install haproxy
  apt: name=haproxy state=present
  environment: "{{ proxy_env | default({}) }}"

- name: Enable init script
  replace:
    dest: /etc/default/haproxy
    regexp: 'ENABLED=0'
    replace: 'ENABLED=1'

- name: Setup haproxy config file
  template:
    src: haproxy.j2
    dest: /etc/haproxy/haproxy.cfg
    backup: yes

- name: Restart HAProxy
  become: yes
  service: name=haproxy state=restarted
73 changes: 73 additions & 0 deletions ansible/roles/k8s_haproxy/templates/haproxy.j2
@@ -0,0 +1,73 @@
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.*    /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend k8s-api
    bind 0.0.0.0:80
    mode tcp
    option tcplog
    default_backend k8s-api

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend k8s-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

{% for host in groups['k8s_vms' + msetnumber + '_' + servernumber] %}
{% if hostvars[host].master is defined %}
    server {{ hostvars[host].inventory_hostname }} {{ hostvars[host].ansible_host }}:6443 check
{% endif %}
{% endfor %}