Welcome to Klustered: Level Four. The previous levels had you elbow-deep in the control plane: admission policies, disabled controllers, and node internals. This one is different. Every break lives entirely at the workload layer. There's nothing to fix in /etc/kubernetes, no policy to delete, no controller to re-enable. It's all kubectl against Deployments, Pods, Services, and Secrets, the day-to-day surface a developer actually owns. Think of it as CKAD batting practice.
The Goal
Open the App tab in the playground UI. Right now it's blank, because nothing is reachable. You're aiming for two milestones:
- The app comes back to life. Once you've repaired the workloads, the App tab shows Rawkode tapping his watch, as if to say "any day now". The plumbing works, but you're still on the old version. Halfway there.
- The
v2upgrade lands. Roll the deployment forward to thev2image and the watch-tapping is replaced with Rawkode's victory dance. That's the win.
You'll know you're done when the dancing starts.
Two Ways to Play
- Hard mode. Just get it working and ship
v2. No hints, no peeking. - Guided mode. Work through each break step by step, gathering evidence before each fix is revealed.
Pick your path below.
Getting Started
You'll be working from cplane-01. The admin kubeconfig is only readable by root, so the very first thing to do is switch users:
sudo su
Quick sanity check (you should see a node listed):
kubectl get nodes
If you get a permission or auth error, double-check you actually became root before running kubectl.
Hard Mode
You're on cplane-01. The klustered app Deployment, its database (PostgreSQL) Deployment, and both Services already exist, but curl http://localhost:30000 gives you nothing. Several independent breaks stand between you and a working app, and they surface in layers: clear the one in front of you and the next reveals itself, all the way down to whether the app can actually talk to its database. The last layer in particular hides behind a pod that looks perfectly healthy, so don't trust a green Running at face value.
Reach for the usual tools (kubectl get, describe, logs, get events) and read the symptoms before you touch the manifests. Everything here is fixable with kubectl edit/patch/set on workloads; you never need to touch the control plane. Then ship v2. Keep the App tab open in another window.
Stop reading here if you want the full challenge.
Guided Mode
There are four planted breaks standing between you and a working app, plus a final step to perform the v1 -> v2 upgrade the series is named after.
Each step gives you the symptom first, and a question to sit with. The hint blocks are staged: where to look, what the evidence means, then how to change it. Before opening a fix block, make yourself name the exact field and value that looked wrong in your own output.
Step 0: Get the Lay of the Land
Before changing anything, survey what exists:
kubectl get pods,svc,deploy
kubectl get events --sort-by=.lastTimestamp
curl -s http://localhost:30000
Read it closely before touching anything:
- Which pods are
Running, and which aren't? For the ones that aren't, what state are they actually in,Pending,Init:0/1,Runningbut notReady? - Does the
klusteredService have endpoints? (kubectl get endpoints klustered) - What does
curl -s http://localhost:30000give you right now?
The state of each pod is the thread you'll pull on in each step below. Some problems announce themselves loudly; at least one is hiding behind something that looks fine.
Step 1: The App Pod Won't Schedule
Look at the klustered-* pod. kubectl get pods shows it Pending, and it never moves off that. A pod sits in Pending until the scheduler can find a node to place it on, so the question is why no node qualifies.
Hint 1: Where to Look
A Pending pod carries the scheduler's verdict with it. Read the pod's events, and then read what the pod is actually asking for:
kubectl describe pod -l app=klustered
kubectl get deploy klustered -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'
kubectl describe node
The Events section will tell you, in the scheduler's own words, why it couldn't place the pod. Let that reason point you at the right field in the spec.
Hint 2: The Concept
A container's resources.requests is a scheduling input, not just a wish. The scheduler will only place a pod on a node that can reserve the full request for it, and a single pod's request can never exceed one node's allocatable capacity. When nothing fits, the pod stays Pending indefinitely, no eviction, no crash, just a FailedScheduling event repeating.
Don't fix this from the word Pending alone. Use the scheduler event to identify which resource does not fit, then compare the request in the Deployment with the node's allocatable capacity. Once you can point to the impossible number, you have the field to change.
Hint 3: The Fix
Bring the CPU request down to something a node can actually reserve:
kubectl edit deployment klustered
# spec.template.spec.containers[0].resources.requests.cpu -> e.g. "100m"
Or patch it in place:
kubectl patch deployment klustered --type json \
-p '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"100m"}]'
The scheduler places the new pod immediately. It won't reach Running yet though, watch where it gets stuck next.
Step 2: Stuck in Init
The pod scheduled, but kubectl get pods now shows it as Init:0/1, and it sits there. That prefix means an init container is still running (or stuck), and the app container hasn't been allowed to start yet.
Hint 1: Where to Look
Init containers run before the app containers, so their logs are where the story is. Address the init container by name:
kubectl logs -l app=klustered -c wait-for-db
kubectl get deploy klustered -o jsonpath='{.spec.template.spec.initContainers[0].command}{"\n"}'
It's clearly waiting on something. Look at exactly what host and port it's trying to reach, then check that literal name against the Services that actually exist:
kubectl get svc
Hint 2: The Concept
Init containers run to completion, in order, before any app container starts. If one never exits, the pod is parked at Init:0/1 forever. This one waits for the database with until nc -z <host> 5432; do sleep 2; done, a reasonable gate only if <host> resolves and accepts a connection.
Service DNS names are literal. Before opening the fix, write down the hostname from the init command and the actual metadata.name of the Postgres Service. If those strings differ, nc will fail every iteration and the loop will never end.
Hint 3: The Fix
Point the init container at the Service that actually exists:
kubectl edit deployment klustered
# initContainers[0].command: change the hostname to the real Postgres Service name
With the host corrected, nc connects (the database has been Running the whole time), the init container exits 0, and the app container finally starts. The pod moves past Init:0/1, but check whether it actually reaches Ready.
Step 3: Running But No Endpoints
The pod is finally Running, yet curl http://localhost:30000 still returns nothing. Look at the READY column: it says 0/1. The process is up, but Kubernetes is holding it back from receiving traffic, and a Service only sends traffic to pods it considers ready.
Hint 1: Where to Look
Start from the empty pipe and work backwards: no traffic means no endpoints, and no endpoints means no ready pod.
kubectl get endpoints klustered
kubectl describe pod -l app=klustered
kubectl get deploy klustered -o jsonpath='{.spec.template.spec.containers[0]}{"\n"}'
describe will show you why the pod isn't ready. Then compare the related fields: which port is the readiness probe testing, which port does the Service target, and which port does the app manifest declare? Treat containerPort as a clue, not proof that the process is listening there.
Hint 2: The Concept
A pod joins its Service's endpoint list only once it is Ready, and readiness is decided by the readiness probe. Point that probe at a port (or socket) nothing is listening on and it fails forever: the pod never goes Ready, never lands in the endpoints, and the Service quietly routes to nothing. There's no error on the Service itself, just an empty endpoint list and a curl that hangs or refuses.
containerPort does not make a process bind a port; it documents the port Kubernetes objects are expected to use. In this lab, the Service and the app declaration agree, while the readiness probe is testing a different socket. The field keeping the pod out of endpoints is the probe port.
Hint 3: The Fix
Point the readiness probe at the port the app actually serves on:
kubectl edit deployment klustered
# readinessProbe -> set the port to the one the container serves on
Once the new pod passes its probe it becomes 1/1 READY and lands in the Service endpoints:
kubectl get endpoints klustered # now shows the pod IP
curl -s http://localhost:30000
This time curl gets a real answer from the app, but it isn't the quote you were hoping for. One layer left.
Step 4: Reachable, But It Can't Reach Its Database
Traffic finally lands on the app, but instead of a quote you get a database error. Here's the twist: the database pod has been sitting there Running and Ready this whole time, kubectl get pods has nothing bad to say about it. So why does a healthy database reject the app?
Hint 1: Where to Look
The app is talking; let it tell you what it's hearing back. Read its logs, and read the database's:
kubectl logs deploy/klustered
kubectl logs deploy/database
The app's log spells out exactly how Postgres is turning it away. That phrasing is the difference between "I can't find the database" and "the database won't let me in". Decide which category you are seeing before you inspect anything else.
Hint 2: The Concept
The app authenticates to Postgres as a fixed user and password. Postgres sets that superuser password when it first initialises an empty data directory, from the POSTGRES_PASSWORD env var. If the value it was initialised with isn't the one the app sends, every connection is rejected with an authentication error, while the database itself stays perfectly healthy. That's why nothing looked wrong until now.
So inspect how that env var is built, but pause on the output before opening the fix:
kubectl get deploy database -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}{" = "}{.value}{.valueFrom.secretKeyRef.key}{"\n"}{end}'
kubectl get secret postgres-credentials -o jsonpath='{.data.password}' | base64 -d; echo
Answer these from what you see:
- How many
POSTGRES_PASSWORDentries are present? - Which entry comes from the Secret?
- Which entry is a plain literal?
- Which value will the container actually receive?
Kubernetes accepts the duplicate-name list, and duplicate names are easy to miss because the pod still becomes healthy. In this manifest, list order is the trap.
Hint 3: The Fix
Remove the duplicate literal so the Secret-sourced value is the only one left:
kubectl edit deployment database
# delete the second `- name: POSTGRES_PASSWORD` / `value: ...` entry,
# leaving the one that reads from secretKeyRef
Editing the env isn't enough on its own, though. Postgres only reads POSTGRES_PASSWORD when it initialises a fresh data directory, so the running instance is still carrying the old password. This database has no persistent volume, so the cleanest reset is to let it re-initialise from scratch:
kubectl rollout restart deployment database # or: kubectl delete pod -l app=postgresql
The new pod initialises with the Secret's password and re-seeds the quotes table. Give it a few seconds, then:
curl -s http://localhost:30000
Flip to the App tab. You should see Rawkode tapping his watch, with a quote underneath. The app is alive, just running the wrong version. One step to the dance floor.
Step 5: Final Step (Ship v2)
This is the goal of the level. The cluster is healthy, the app is serving a quote, Rawkode is tapping his watch on the App tab. Time to roll the deployment forward and unlock the dance.
Hint 1: The Concept
A Deployment's pod template is the source of truth for what runs. Change the image tag and Kubernetes performs a rolling update: it creates a new ReplicaSet, scales it up, scales the old one down, and keeps the old pods serving until the new ones are Ready. If a rollout ever seems to hang, kubectl rollout status tells you why, and kubectl rollout undo walks it back.
Watch the new pod become Ready before the old one goes away, that's readiness gating the rollout, the same mechanism you fixed in Step 3. It's worth remembering that a bad pod template can stall a rollout: an infeasible resource request like Step 1, or a failing readiness probe like Step 3, leaves the old pods serving while the new ReplicaSet never goes Ready.
Hint 2: The Rollout
kubectl set image deployment/klustered klustered=ghcr.io/rawkode-academy/klustered:v2
kubectl rollout status deployment/klustered
Or kubectl edit deployment klustered and change :v1 to :v2.
Refresh the App tab. Dance party.