Challenge, Medium, on Kubernetes

Diagnose Why a DaemonSet Skips the Control Plane Node

Scenario

A DaemonSet named fluentd-elasticsearch has been deployed in the kube-system namespace, but its Pods are running only on worker nodes. The control plane node cplane-01 carries a taint that the DaemonSet is not configured to tolerate, so no Pod is scheduled there.

The DaemonSet manifest is located at /home/laborant/daemonset.yaml on dev-machine.


Task

  1. Identify why the DaemonSet Pods are not running on the control plane node
  2. Update the DaemonSet manifest at /home/laborant/daemonset.yaml on dev-machine so that Pods are scheduled on all nodes, including cplane-01
  3. Apply the updated manifest and verify all Pods are running

Hint 1 — Identify Why Pods Are Not on the Control Plane

Check the taint on the control plane node. This taint is what prevents DaemonSet Pods from being scheduled there:

kubectl describe node cplane-01 | grep -i taint

Then check whether the DaemonSet has any tolerations configured:

kubectl get daemonset fluentd-elasticsearch -n kube-system \
  -o jsonpath='{.spec.template.spec.tolerations}'

If the output is empty, or lists no toleration that matches the control-plane taint, the DaemonSet cannot tolerate that taint, which is why no Pod is scheduled on cplane-01.
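The two checks above can be combined into a small helper. This is a minimal sketch rather than part of the challenge: the function names node_taints, daemonset_toleration_keys, and tolerates_key are hypothetical, and the match is deliberately simplified to compare keys only. It ignores operator and effect, and it does not recognize a key-less toleration with operator Exists, which tolerates every taint.

```shell
# Print a node's taints as key=value:effect, one per line.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Print the toleration keys declared in a DaemonSet's Pod template, one per line.
daemonset_toleration_keys() {
  kubectl get daemonset "$1" -n "$2" \
    -o jsonpath='{range .spec.template.spec.tolerations[*]}{.key}{"\n"}{end}'
}

# Succeed if the DaemonSet declares a toleration with exactly this key.
# Simplification: only keys are compared, not operator or effect.
tolerates_key() {
  daemonset_toleration_keys "$1" "$2" | grep -qxF "$3"
}

# Usage:
#   node_taints cplane-01
#   tolerates_key fluentd-elasticsearch kube-system \
#     node-role.kubernetes.io/control-plane || echo "toleration missing"
```

A non-zero exit status from tolerates_key points to the same conclusion as the empty jsonpath output: the Pod template has no toleration for that taint key.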


Hint 2 — Add the Toleration to the DaemonSet Manifest

Edit the manifest at /home/laborant/daemonset.yaml and add a toleration for the control-plane taint under spec.template.spec:

vi /home/laborant/daemonset.yaml

Add the following toleration:

spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
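Before applying, it can help to preview what the edited manifest would change on the cluster. The sketch below wraps kubectl diff, which exits with 0 when there are no differences, 1 when differences are found, and a higher code on error. The function name preview_manifest_change is hypothetical and not part of the challenge.

```shell
# Show the difference between the live object and the manifest,
# then summarize the result based on kubectl diff's exit status.
preview_manifest_change() {
  manifest="$1"
  rc=0
  kubectl diff -f "$manifest" || rc=$?
  case "$rc" in
    0) echo "no changes: live object already matches $manifest" ;;
    1) echo "changes pending: review the diff above, then apply" ;;
    *) echo "kubectl diff failed with exit status $rc" >&2; return 2 ;;
  esac
}

# Usage:
#   preview_manifest_change /home/laborant/daemonset.yaml
```

If the toleration was added correctly, the diff should show the new tolerations entry under the Pod template and nothing unexpected.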

Then apply the updated manifest:

kubectl apply -f /home/laborant/daemonset.yaml
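Applying the manifest only submits the change; the new Pod on cplane-01 still has to be created and become ready. One way to wait for that, sketched here with a hypothetical wrapper name and an assumed default timeout of 120 seconds, is kubectl rollout status:

```shell
# Block until the DaemonSet rollout completes or the timeout expires.
# Arguments: DaemonSet name, namespace, optional timeout (assumed default: 120s).
wait_for_daemonset() {
  kubectl rollout status "daemonset/$1" -n "$2" --timeout="${3:-120s}"
}

# Usage:
#   wait_for_daemonset fluentd-elasticsearch kube-system
```

The command returns a non-zero exit status if the rollout does not finish within the timeout, which makes it usable in scripts.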


Hint 3 — Verify All Pods Are Running

After applying the manifest, check that a Pod is now running on every node:

kubectl get pods -n kube-system -l name=fluentd-elasticsearch -o wide

You should see one Pod per node, including cplane-01, and every Pod should be in the Running state.
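The same check can be made without reading the Pod list by comparing two fields of the DaemonSet status: desiredNumberScheduled (nodes that should run a Pod) and numberReady (nodes whose Pod is ready). The sketch below assumes nothing beyond those two status fields; the function name daemonset_ready is hypothetical.

```shell
# Print desired and ready Pod counts for a DaemonSet, and succeed
# only when both are known and equal.
daemonset_ready() {
  status=$(kubectl get daemonset "$1" -n "$2" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
  desired=${status% *}   # text before the space
  ready=${status#* }     # text after the space
  echo "desired=$desired ready=$ready"
  [ -n "$desired" ] && [ "$desired" = "$ready" ]
}

# Usage:
#   daemonset_ready fluentd-elasticsearch kube-system && echo "all Pods ready"
```

Once the toleration is in place, the desired count should equal the total number of nodes reported by kubectl get nodes, since cplane-01 is now eligible.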


