This playground provides a complete HA Kubernetes cluster with:

- Control plane nodes: `cplane-01`, `cplane-02`, `cplane-03`
- Worker nodes: `worker-01`, `worker-02`

The following tools are preinstalled:

- `kubectl` (alias: `k`): Kubernetes cluster management and debugging
- `nerdctl`: Docker-compatible CLI for containerd
- `krew`: `kubectl` plugin manager for extending functionality

One of the key benefits of an HA cluster is its resilience to node failures. Here are several tests you can perform to verify your cluster's fault tolerance.
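Before breaking anything, it helps to record a healthy baseline so you have something to compare against during the failure tests:

```bash
# Capture the healthy state of the cluster before any failure tests
kubectl get nodes -o wide
kubectl get pods --all-namespaces
```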
First, identify which control plane node currently holds the virtual IP (VIP):

```bash
# On each control plane node
ip a | grep inet | grep vip
# If you see the VIP address, it means the node is active
```
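If you'd rather not log in to each node by hand, a small loop can run the check for you. This is just a convenience sketch and assumes passwordless SSH to the control plane nodes by hostname:

```bash
# Hypothetical helper: probe each control plane node for the VIP over SSH
for node in cplane-01 cplane-02 cplane-03; do
  echo "--- $node ---"
  ssh "$node" 'ip a | grep inet | grep vip' || echo "(VIP not held here)"
done
```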
Next, simulate a control plane failure by stopping the kubelet on the active node:

```bash
# On the active control plane node (e.g., cplane-02)
sudo systemctl stop kubelet

# From another node, verify the cluster still functions
kubectl get nodes
kubectl get pods --all-namespaces

# The failed node should show as "NotReady" (may take a few minutes),
# but the cluster should remain operational
```
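A control plane failure also matters for etcd quorum: with three members, the cluster tolerates exactly one failure. One way to confirm etcd is still healthy is to exec into a surviving etcd pod. The sketch below assumes a standard kubeadm layout (static etcd pods named `etcd-<hostname>`, certificates under `/etc/kubernetes/pki/etcd`, and `etcdctl` available in the etcd image):

```bash
# Check etcd cluster health from a surviving member (kubeadm layout assumed)
kubectl -n kube-system exec etcd-cplane-01 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health --cluster
```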
You can go a step further and stop the API server container on that node as well:

```bash
# On the same control plane node, list containers in the k8s.io namespace
nerdctl --namespace k8s.io ps

# Stop the API server container
nerdctl --namespace k8s.io stop $(
  nerdctl --namespace k8s.io ps -a --format "{{json .}}" \
    | jq -r --arg name "k8s://kube-system/kube-apiserver-$(hostname)" \
        'select(.Names == $name or .Names == $name+"/kube-apiserver") | .ID'
)

# Verify from another node that the API is still accessible
kubectl cluster-info
kubectl get componentstatuses  # deprecated since v1.19, but still works for a quick check
```
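To see the failover from a client's point of view, you can probe the API server's health endpoint directly. A rough sketch, assuming the default kubeadm port 6443 and anonymous access to `/healthz` (replace `<vip>` with your cluster's virtual IP):

```bash
# On the node with the stopped API server, the local endpoint should fail...
curl -k https://localhost:6443/healthz || echo "local API server is down"
# ...while the VIP-backed endpoint is still answered by another control plane node
curl -k https://<vip>:6443/healthz
```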
Once you've confirmed failover, recover the failed node:

```bash
# Find the new active control plane node (the VIP should have moved)
ip a | grep inet | grep vip

# On the previously failed node, start the kubelet service again
sudo systemctl start kubelet

# Verify the node returns to the Ready state
kubectl get nodes
```
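Restarting the kubelet also brings back the static pods it manages, including the API server container you stopped earlier. You can confirm it is running again:

```bash
# The kube-apiserver container should be back on the recovered node
nerdctl --namespace k8s.io ps | grep kube-apiserver
```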
Worker nodes can be tested the same way. Deploy a test application and find out where it landed:

```bash
# Deploy a test application
kubectl create deployment podinfo --image=ghcr.io/stefanprodan/podinfo --port=9898

# Figure out which node the pod is running on
kubectl get pods -o wide
```
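If you prefer to capture the node name in a variable (handy when scripting the test), a JSONPath query works. This relies on the `app=podinfo` label that `kubectl create deployment` sets by default:

```bash
# Grab the name of the node hosting the podinfo pod
NODE=$(kubectl get pods -l app=podinfo -o jsonpath='{.items[0].spec.nodeName}')
echo "podinfo is running on: $NODE"
```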
Then take that worker down and watch Kubernetes reschedule the workload:

```bash
# Stop kubelet on that worker node
sudo systemctl stop kubelet

# Watch pods get rescheduled to other nodes (it may take a few minutes)
kubectl get pods -o wide --watch

# Pods should be automatically rescheduled to healthy worker nodes
```
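The "few minutes" delay is not arbitrary: by default, pods receive `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` tolerations with `tolerationSeconds: 300`, so eviction starts roughly five minutes after the node goes NotReady. You can inspect this on the pod itself:

```bash
# Show the automatic NoExecute tolerations that control the eviction delay
kubectl get pods -l app=podinfo -o json | jq '.items[0].spec.tolerations'
```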
Happy learning! 🚀