Challenge · Hard · Kubernetes, Containers, Linux

When you limit the memory usage of a multi-process Docker container, the OOM killer often terminates only one of the processes if the container runs out of memory. If this process is not the topmost process of the container (PID 1), the container will keep running, which may or may not be desirable.

Pods consist of containers, and originally, this OOM killer behavior was present in Kubernetes, too. When a memory-limited, multi-process container in a Pod ran out of memory, the OOM killer would terminate only some of its processes, often leaving the container (and hence the Pod) running.

This made some of us start tracking down "invisible" OOM kills, hunting zombie Pods, and dodging silent Pod killers. So, in Kubernetes 1.28, the OOM behavior was changed: on cgroup v2 systems, the kubelet now sets memory.oom.group to 1 on container cgroups, making the kernel terminate the entire container if any of its processes runs out of memory.

However, while improving things for some workloads, in perfect agreement with Hyrum's Law, this change broke some other workloads, and outages followed. So, naturally, a requirement arose to tweak the OOM killer behavior depending on the workload's needs.

The problem is that the change in Kubernetes 1.28 was unconditional, and neither the Kubernetes API nor the kubelet configuration can be used to restore the pre-1.28 OOM killer behavior. At least not yet.

But there is always a way!

In this challenge, you will need to deploy a multi-process container to a Kubernetes cluster and make it tolerate the OOM killer terminating its sub-processes.

The image you will need to use is ghcr.io/iximiuz/labs/resource-hog/herder:v1.0.0. It's a "resource hog" application that gradually consumes all CPU and RAM on the host machine, eventually rendering the host unavailable if deployed without proper limits.

Your task is to deploy this resource-greedy container to a Kubernetes cluster and make it run for a while without restarts or crashes (a manifest sketch follows the list):

  • Run the application as a Deployment called herder in the default namespace.
  • The Deployment must have at least as many replicas as there are nodes in the cluster.
  • Every Pod must run on its own node.
  • No Pods should consume more than 250m of the node’s CPU and 256Mi of RAM.
  • The Deployment must run for at least 60 seconds without a single Pod restart.
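Here is a minimal sketch of what such a Deployment might look like. The replica count and the use of pod anti-affinity to spread the Pods are assumptions, not the only valid approach; adjust replicas to match the actual number of nodes in your cluster (check with kubectl get nodes):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: herder
  namespace: default
spec:
  replicas: 2  # assumption: set this to the number of nodes in your cluster
  selector:
    matchLabels:
      app: herder
  template:
    metadata:
      labels:
        app: herder
    spec:
      # One Pod per node: forbid two herder Pods from sharing a node.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: herder
              topologyKey: kubernetes.io/hostname
      containers:
        - name: herder
          image: ghcr.io/iximiuz/labs/resource-hog/herder:v1.0.0
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
```

Note that resource limits alone won't satisfy the last requirement: once the herder hits its memory limit, the post-1.28 OOM behavior kills the whole container, and the Pod restarts. The hints below deal with exactly that.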

Good luck!

Hint 1 💡

Don't know how to limit the resources of a Pod? Solving this simpler challenge first may help: Deploy a Resource-Greedy Pod Without Breaking the Kubernetes Cluster

Hint 2 💡

You can learn about the involved cgroup machinery by solving this challenge first: Kill the Entire Process Group When One Process Runs Out of Memory
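If you just want to see the flag in question, here is a hypothetical one-off Pod (the name is made up) that prints its own container's memory.oom.group value, assuming cgroup v2 and cgroup namespaces, the defaults on modern runtimes; on Kubernetes 1.28+ it should print 1:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-peek  # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: peek
      image: busybox
      # Inside a cgroup namespace, /sys/fs/cgroup is the container's own cgroup.
      command: ["cat", "/sys/fs/cgroup/memory.oom.group"]
```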

Hint 3 💡

Can you think of a way to write to the memory.oom.group file of the herder Pod on a given node? On all nodes? In an automated way?

Hint 4 💡

Use the Force (a privileged DaemonSet), Luke.
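To make the last two hints concrete, here is a rough sketch of such a DaemonSet. Everything about it is an assumption: the image, the naming, and especially the brute-force approach of flipping memory.oom.group back to 0 for every pod cgroup it finds. A real solution would target only the herder cgroups and keep up with container restarts:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: oom-group-off  # hypothetical name
spec:
  selector:
    matchLabels:
      app: oom-group-off
  template:
    metadata:
      labels:
        app: oom-group-off
    spec:
      # Run on every node, control-plane included.
      tolerations:
        - operator: Exists
      containers:
        - name: tweaker
          image: busybox  # assumption: any image with a shell will do
          securityContext:
            privileged: true  # needed to write to the host's cgroup fs
          volumeMounts:
            - name: cgroup
              mountPath: /sys/fs/cgroup
          command:
            - sh
            - -c
            - |
              # Re-apply periodically: the flag is reset to 1 whenever
              # a container cgroup is (re)created.
              while true; do
                for f in $(find /sys/fs/cgroup/kubepods* -name memory.oom.group 2>/dev/null); do
                  echo 0 > "$f"
                done
                sleep 5
              done
      volumes:
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
```

Apply it, wait for the herder Pods to settle, and watch kubectl get pods for restarts.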
