In this post, we will look at some of the advanced options kubespray provides for production-grade cluster creation. Most guides on the internet show only the basic instructions for creating a simple (Hello World) Kubernetes cluster from a bunch of virtual machines. However, kubespray offers much more than that. For example:
1. You can install several add-ons required in the daily operations of the cluster.
2. Kubespray can expose the control plane nodes via pre-existing load balancers.
We will explore some of the critical configurations; however, there is always more in the official documentation.
You might want to see this page if you intend to create a simple cluster that works fine for development or learning.
If you are unfamiliar with load-balancing, consider reading the basics here to make more sense of the information provided in this post.
The entire procedure described here is automated and available; check this page for details.
Use case:
The following are the objectives we will achieve in this post. These actions make the cluster resilient to a single point of failure and provide several add-ons ready to use right after cluster instantiation.
1. Create a highly available cluster, using multiple (two) Kube-API servers and one worker node (you can have more).
2. Create two frontend external load balancer nodes.
3. Configure the external load balancers and expose the Kube-API server via a virtual IP.
4. Using kubespray add-ons, install Helm, the Kubernetes dashboard, the metrics server, and a few more.
The Setup: I have already created the following virtual machines:
Name      MAC address          Protocol   Address                Id    Name                       State
-------------------------------------------------------------------------------------------------------
vnet20    52:54:00:97:cb:dd    ipv4       192.168.122.105/24     21    prod-kube-controller-1     running
vnet21    52:54:00:19:d9:6e    ipv4       192.168.122.137/24     22    prod-kube-controller-2     running
vnet22    52:54:00:90:40:35    ipv4       192.168.122.253/24     23    prod-kube-worker-1         running
vnet23    52:54:00:ca:4c:d1    ipv4       192.168.122.225/24     24    prod-kube-loadbalancer-1   running
vnet24    52:54:00:5d:d7:13    ipv4       192.168.122.43/24      25    prod-kube-loadbalancer-2   running
# I will use 192.168.122.211 as the front-end virtual IP of the cluster. You can choose any spare IP in your network.
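Before committing to a VIP, it is worth confirming that nothing on the network already answers on that address. A minimal check, assuming your workstation sits on the same 192.168.122.0/24 network:
# if something replies, the address is already taken; pick a different one
ping -c 2 -W 1 192.168.122.211 && echo "192.168.122.211 is in use, choose another IP" || echo "no reply, 192.168.122.211 looks free"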
Start of the Procedure:
Step-1: Install git and pip3
sudo apt-get update && sudo apt-get install -y python3-pip git
Step-2: Clone The Git Repo
git clone https://github.com/kubernetes-sigs/kubespray.git
Step-3: Install the requirements
cd kubespray/
sudo pip3 install -r requirements.txt
Step-4: Copy the blank inventory template
cp -rfp inventory/sample inventory/mycluster
Step-5: Make a list/array of hostnames and IPs for all controller and worker nodes
#format: Hostname,IP Hostname,IP Hostname,IP... for all the controller and worker nodes
declare -a IPS=(controller-vm-hostname-1,IP-1 controller-vm-hostname-2,IP-2........)
#Example
declare -a IPS=(prod-kube-controller-1,192.168.122.105 prod-kube-controller-2,192.168.122.137 prod-kube-worker-1,192.168.122.253)
Step-6: Build the inventory
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py "${IPS[@]}"
DEBUG: Adding group all
DEBUG: Adding group kube_control_plane
DEBUG: Adding group kube_node
DEBUG: Adding group etcd
DEBUG: Adding group k8s_cluster
DEBUG: Adding group calico_rr
DEBUG: adding host prod-kube-controller-1 to group all
DEBUG: adding host prod-kube-controller-2 to group all
DEBUG: adding host prod-kube-worker-1 to group all
DEBUG: adding host prod-kube-controller-1 to group etcd
DEBUG: adding host prod-kube-controller-2 to group etcd
DEBUG: adding host prod-kube-worker-1 to group etcd
DEBUG: adding host prod-kube-controller-1 to group kube_control_plane
DEBUG: adding host prod-kube-controller-2 to group kube_control_plane
DEBUG: adding host prod-kube-controller-1 to group kube_node
DEBUG: adding host prod-kube-controller-2 to group kube_node
DEBUG: adding host prod-kube-worker-1 to group kube_node
Step-7: Understand, validate, and visualize the generated inventory file
The generated inventory file contains three host groups of particular interest.
1. kube_control_plane: the nodes under this host group will be configured as controller nodes (the API server will run here).
2. kube_node: the nodes under this host group will be configured as worker nodes (payload nodes).
3. etcd: the nodes under this host group will run the etcd database for the cluster. The number of nodes in this group must be odd so that etcd can maintain quorum (a cluster of 2n+1 members tolerates n failures).
cat inventory/mycluster/hosts.yaml
all:
  hosts:
    prod-kube-controller-1:
      ansible_host: 192.168.122.105
      ip: 192.168.122.105
      access_ip: 192.168.122.105
    prod-kube-controller-2:
      ansible_host: 192.168.122.137
      ip: 192.168.122.137
      access_ip: 192.168.122.137
    prod-kube-worker-1:
      ansible_host: 192.168.122.253
      ip: 192.168.122.253
      access_ip: 192.168.122.253
  children:
    kube_control_plane:
      hosts:
        prod-kube-controller-1:
        prod-kube-controller-2:
    kube_node:
      hosts:
        prod-kube-controller-1:
        prod-kube-controller-2:
        prod-kube-worker-1:
    etcd:
      hosts:
        prod-kube-controller-1:
        prod-kube-controller-2:
        prod-kube-worker-1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
Now that the inventory file is ready, we can work on the more advanced configuration options. Most of these options are independent of each other, so skipping any one of them will likely not cause trouble.
File-1: Tweaking the cluster configuration
The first file we would want to tweak is inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml. These are the bare minimum configurations I would like to highlight. Feel free to explore the rest of the file and tweak it if required.
The last variable, supplementary_addresses_in_ssl_keys, is critical for the load balancers to work; with it, we tell kubeadm to add the VIP when creating the API server certificates.
## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.24.3
## Container runtime
## docker for docker, crio for cri-o and containerd for containerd.
## Default: containerd
container_manager: containerd
# DNS configuration.
# Kubernetes cluster name, also will be used as DNS domain
cluster_name: prod.local
# Choose network plugin (cilium, calico, kube-ovn, weave or flannel. Use cni for generic cni plugin)
# Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
kube_network_plugin: calico
# K8s image pull policy (imagePullPolicy)
k8s_image_pull_policy: IfNotPresent
# audit log for kubernetes
kubernetes_audit: true
## Encrypting Secret Data at Rest
kube_encrypt_secret_data: true
# Make a copy of kubeconfig on the host that runs Ansible in {{ inventory_dir }}/artifacts
kubeconfig_localhost: true
# Download kubectl onto the host that runs Ansible in {{ bin_dir }}
kubectl_localhost: true
## Supplementary addresses that can be added in kubernetes ssl keys.
## That can be useful for example to setup a keepalived virtual IP
## IMPORTANT FOR LOADBALANCERS TO WORK
supplementary_addresses_in_ssl_keys: [10.0.0.1, 10.0.0.2, 10.0.0.3, 192.168.122.211]
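Once the cluster is up (and the load balancers from the later steps are running), you can verify that the VIP really made it into the API server certificate. A quick check, assuming openssl is available on your workstation:
# the Subject Alternative Name list should include 192.168.122.211
echo | openssl s_client -connect 192.168.122.211:8443 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"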
File-2: Adding the external load balancer VIP
In this section, we provide the external load balancer virtual IP and port in inventory/mycluster/group_vars/all/all.yml.
loadbalancer_apiserver:
  address: "192.168.122.211"
  port: "8443"
File-3: Manage the add-ons
I am limiting this section to the dashboard, registry, Helm, the metrics server, and an ingress controller. However, many more add-ons are present in the inventory/mycluster/group_vars/k8s_cluster/addons.yml file, for example, MetalLB, cert-manager, etc.
Consider going through the file and choosing what you need.
# Kubernetes dashboard
# RBAC required. see docs/getting-started.md for access details.
dashboard_enabled: true
# Helm deployment
helm_enabled: true
# Registry deployment
registry_enabled: true
registry_namespace: kube-system
registry_storage_class: ""
registry_disk_size: "10Gi"
# Metrics Server deployment
metrics_server_enabled: true
metrics_server_container_port: 4443
metrics_server_kubelet_insecure_tls: true
metrics_server_metric_resolution: 15s
metrics_server_kubelet_preferred_address_types: "InternalIP,ExternalIP,Hostname"
# Nginx ingress controller deployment
ingress_nginx_enabled: true
ingress_nginx_host_network: true
ingress_publish_status_address: ""
ingress_nginx_nodeselector:
  kubernetes.io/os: "linux"
ingress_nginx_tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Equal"
    value: ""
    effect: "NoSchedule"
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Equal"
    value: ""
    effect: "NoSchedule"
ingress_nginx_namespace: "ingress-nginx"
ingress_nginx_insecure_port: 80
ingress_nginx_secure_port: 443
ingress_nginx_class: nginx
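If you want a quick overview of everything that will be installed, most of the on/off switches in addons.yml follow the same *_enabled naming convention, so a simple grep gives a summary:
# list the add-on toggles and their current values
grep -nE '^[a-z_]+_enabled:' inventory/mycluster/group_vars/k8s_cluster/addons.yml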
Now that the desired configuration is done and all add-ons are set up, we will configure the load balancers. Kubespray does not cover this part of the process. We will use keepalived and haproxy for the configuration. In this post, we will only go through the installation superficially; you can read about load balancers in great detail HERE. If you already have load balancers configured, you may skip all of the following steps (LB-Step-xx).
LB-step-1: Install keepalived and haproxy by running the following commands on all load balancers
#install haproxy
sudo apt-get update
sudo apt-get upgrade
sudo apt install haproxy -y
#install keepalived; right after installation the service will be in the loaded state, not the running state, until we add a configuration
sudo apt install keepalived -y
LB-Step-2: Create the keepalived config file on all load balancer nodes, with the minor changes described below
Add the following text to /etc/keepalived/keepalived.conf on all load balancer nodes. Note that I have added the virtual IP here. Copy the same content to all the load balancer nodes, except change the state to BACKUP and decrease the priority to 254 on the backup node.
IMPORTANT NOTE: My load balancer nodes run Ubuntu 22.04, which has 'enp1s0' as its default interface. If your system uses a different interface, replace it with the right one.
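If you are unsure which interface name to use, the interface carrying the default route is usually the right one. A quick way to print it (run on each load balancer node; the awk field position assumes the usual 'default via <gw> dev <iface> ...' output of iproute2):
# prints the interface holding the default route, e.g. enp1s0
ip -o -4 route show to default | awk '{print $5}'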
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface enp1s0
    virtual_router_id 51
    priority 255
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
        192.168.122.211/24
    }
    track_script {
        check_apiserver
    }
    notify_master "/etc/keepalived/status_capture.sh MASTER"
    notify_backup "/etc/keepalived/status_capture.sh BACKUP"
    notify_fault "/etc/keepalived/status_capture.sh FAULT"
}
LB-Step-3: Create an identical keepalived check script on all the load balancer nodes
Add the following content to /etc/keepalived/check_apiserver.sh
# vi /etc/keepalived/check_apiserver.sh
#!/bin/sh
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}
curl --silent --max-time 2 --insecure https://localhost:8443/ -o /dev/null || errorExit "Error GET https://localhost:8443/"
if ip addr | grep -q 192.168.122.211; then
    curl --silent --max-time 2 --insecure https://192.168.122.211:8443/ -o /dev/null || errorExit "Error GET https://192.168.122.211:8443/"
fi
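You can run the check script by hand to see how keepalived will judge this node. Until haproxy is running and an API server answers on port 8443, it is expected to fail:
# manual run of the health check; a non-zero exit means keepalived would lower this node's priority
sudo sh /etc/keepalived/check_apiserver.sh && echo "check passed" || echo "check failed (expected until haproxy and the API servers are up)"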
LB-Step-4: Create an identical keepalived notify script on both load balancer nodes
Create a notify script at /etc/keepalived/status_capture.sh; this is optional but very helpful in real-world scenarios to trigger actions based on state changes.
#!/bin/bash
echo "$(date): The loadbalancer instance running on $(hostname) is currently marked $1" | tee /tmp/load-balancer-status
chmod 755 /tmp/load-balancer-status || true
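The notify script can be exercised manually to confirm it writes the status file; the TEST argument below is just a made-up state for illustration:
# simulate a state change and inspect the captured status
sudo bash /etc/keepalived/status_capture.sh TEST
cat /tmp/load-balancer-status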
LB-Step-5: Create an identical haproxy config file on both load balancer nodes
Create the haproxy config file at /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 10s
    timeout client 30s
    timeout server 30s

frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    log 127.0.0.1 local0
    default_backend apiserver

backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    server kube-controller-1 192.168.122.105:6443 check
    server kube-controller-2 192.168.122.137:6443 check
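Before starting the service, you can let haproxy validate the configuration file so that any syntax error shows up immediately rather than at start time:
# -c only checks the configuration and exits without starting haproxy
sudo haproxy -c -f /etc/haproxy/haproxy.cfg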
LB-Step-6: Make the keepalived check and notify scripts executable on both load balancer nodes
sudo chmod u+x /etc/keepalived/check_apiserver.sh
sudo chmod u+x /etc/keepalived/status_capture.sh
LB-Step-7: Enable non-local binding on both load balancer nodes
echo 'net.ipv4.ip_nonlocal_bind=1'|sudo tee -a /etc/sysctl.conf
sudo sysctl -p
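To confirm the setting is active, read it back:
# should print: net.ipv4.ip_nonlocal_bind = 1
sysctl net.ipv4.ip_nonlocal_bind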
LB-Step-8: Start keepalived and haproxy on both load balancer nodes
#at this step, keepalived and haproxy must be running on both load balancers
sudo service keepalived start
sudo service keepalived status
sudo service haproxy start
sudo service haproxy status
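At this point, exactly one of the two load balancers should hold the VIP, and haproxy should be listening on port 8443 on both. The following quick checks illustrate this; note that the haproxy backend health checks will keep failing until the Kubernetes API servers actually exist:
# run on each load balancer: the VIP shows up only on the current MASTER
ip -4 addr show enp1s0 | grep 192.168.122.211
# confirm haproxy is bound to the frontend port
sudo ss -tlnp | grep 8443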
We have created the load balancer nodes, and the kubespray configuration is ready. Trigger the playbook for cluster creation.
Trigger the cluster creation
#trigger the playbook, supplying the required user name via -u and the sudo password via -K
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml -u technekey -K
Once the playbook completes, you should see zero failures in the recap, indicating that everything we have done so far is working.
PLAY RECAP **************************************************************************************************************************************************************************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
prod-kube-controller-1 : ok=813 changed=136 unreachable=0 failed=0 skipped=1268 rescued=0 ignored=9
prod-kube-controller-2 : ok=711 changed=125 unreachable=0 failed=0 skipped=1102 rescued=0 ignored=4
prod-kube-worker-1 : ok=560 changed=92 unreachable=0 failed=0 skipped=783 rescued=0 ignored=3
Locate the kubeconfig file
Remember that we set the kubeconfig_localhost variable in an earlier step. That causes the kubeconfig to be copied to inventory/mycluster/artifacts/admin.conf.
ls -lrt inventory/mycluster/artifacts/admin.conf
-rw------- 1 technekey technekey 5683 Jul 21 14:06 inventory/mycluster/artifacts/admin.conf
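To avoid passing --kubeconfig on every command, you can export it for the current shell. It is also worth checking which endpoint the file points at; with the external load balancer configured, it should normally be the VIP on port 8443 (if it still references a single controller, you can edit the server: line to https://192.168.122.211:8443 yourself):
# use the generated kubeconfig for the rest of this session
export KUBECONFIG=$PWD/inventory/mycluster/artifacts/admin.conf
# show the API endpoint kubectl will talk to
grep 'server:' inventory/mycluster/artifacts/admin.conf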
Checking the addons and cluster state
kubectl get pod -A --kubeconfig inventory/mycluster/artifacts/admin.conf
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-controller-59g57 1/1 Running 0 3m37s
ingress-nginx ingress-nginx-controller-bnxvf 1/1 Running 0 3m37s
ingress-nginx ingress-nginx-controller-djv4p 1/1 Running 0 3m37s
kube-system calico-node-5pxff 1/1 Running 0 3m59s
kube-system calico-node-gnkhj 1/1 Running 0 3m59s
kube-system calico-node-k6xjs 1/1 Running 0 3m59s
kube-system coredns-74d6c5659f-d9n87 1/1 Running 0 3m24s
kube-system coredns-74d6c5659f-pqd4n 1/1 Running 0 3m19s
kube-system dns-autoscaler-59b8867c86-z7dhx 1/1 Running 0 3m21s
kube-system kube-apiserver-prod-kube-controller-1 1/1 Running 1 5m35s
kube-system kube-apiserver-prod-kube-controller-2 1/1 Running 1 5m9s
kube-system kube-controller-manager-prod-kube-controller-1 1/1 Running 1 5m34s
kube-system kube-controller-manager-prod-kube-controller-2 1/1 Running 1 5m7s
kube-system kube-proxy-6h96q 1/1 Running 0 5m11s
kube-system kube-proxy-9hpg2 1/1 Running 0 4m14s
kube-system kube-proxy-q72x6 1/1 Running 0 5m28s
kube-system kube-scheduler-prod-kube-controller-1 1/1 Running 1 5m34s
kube-system kube-scheduler-prod-kube-controller-2 1/1 Running 1 5m6s
kube-system kubernetes-dashboard-55bf5db569-m4sqm 1/1 Running 0 3m19s
kube-system kubernetes-metrics-scraper-84bbbc8b75-28zr9 1/1 Running 0 3m19s
kube-system metrics-server-68b8967c9f-4fprw 1/1 Running 0 3m
kube-system nodelocaldns-9mlxt 1/1 Running 0 3m21s
kube-system nodelocaldns-k2qqg 1/1 Running 0 3m21s
kube-system nodelocaldns-zrbg6 1/1 Running 0 3m21s
kube-system registry-mw4b6 1/1 Running 0 3m6s
#registry and dashboard are running
kubectl get svc -n kube-system --kubeconfig inventory/mycluster/artifacts/admin.conf
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 4m6s
dashboard-metrics-scraper ClusterIP 10.233.58.178 <none> 8000/TCP 4m1s
kubernetes-dashboard ClusterIP 10.233.57.157 <none> 443/TCP 4m1s
metrics-server ClusterIP 10.233.24.49 <none> 443/TCP 3m42s
registry ClusterIP 10.233.18.18 <none> 5000/TCP 3m49s
# demo showing the metrics server is running
kubectl top node --kubeconfig inventory/mycluster/artifacts/admin.conf
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
prod-kube-controller-1 184m 10% 1503Mi 110%
prod-kube-controller-2 176m 9% 1407Mi 103%
prod-kube-worker-1 114m 6% 1197Mi 73%
#ingress controller is running
kubectl get pod -n ingress-nginx --kubeconfig inventory/mycluster/artifacts/admin.conf
NAME READY STATUS RESTARTS AGE
ingress-nginx-controller-59g57 1/1 Running 0 5m38s
ingress-nginx-controller-bnxvf 1/1 Running 0 5m38s
ingress-nginx-controller-djv4p 1/1 Running 0 5m38s
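Finally, a rough failover test: stop keepalived on the load balancer that currently holds the VIP and confirm that kubectl still reaches the API through the other node. This assumes your kubeconfig targets the VIP, as discussed above; remember to start keepalived again afterwards.
# on the current MASTER load balancer
sudo service keepalived stop
# from the workstation: the API should still answer via the VIP
kubectl get nodes --kubeconfig inventory/mycluster/artifacts/admin.conf
# restore the stopped instance
sudo service keepalived start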
Related reading and references:
- https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md
- https://github.com/kubernetes-sigs/kubespray/blob/master/docs/ha-mode.md
- https://technekey.com/ha-kubernetes-cluster-using-keepalived-and-haproxy/