Kubernetes Pod Scaling: Horizontal Pod Autoscaler
HPA watches a deployment and adjusts its replica count based on observed CPU utilization. The utilization is always a ratio: actual CPU usage divided by requests.cpu. Without requests set, HPA has no baseline to compare against and reports <unknown> - it cannot calculate or act on anything.
Task 1 - HPA Without CPU Requests
Deploy a workload without CPU requests and attach an HPA to it. Observe what happens.
Steps:
- Create a Deployment named web, image registry.k8s.io/hpa-example, 1 replica
- Expose the deployment as a ClusterIP Service on port 80
- Create an HPA for web: min 1 replica, max 5, target CPU utilization 50%
kubectl get hpa web
The TARGETS column shows <unknown>/50%. The HPA exists and knows its target - but it cannot calculate current utilization because there is no requests.cpu to divide against.
Hint: Create HPA imperatively
kubectl expose deployment web --port=80
kubectl autoscale deployment web --min=1 --max=5 --cpu=50%
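The hint above covers the Service and the HPA but not the Deployment itself. As a minimal sketch, the Deployment from the first step could be written declaratively as below. The file name web.yaml, the app: web label and the container name are illustrative choices, not something the challenge prescribes.

```shell
# Write a Deployment manifest for the Task 1 workload.
# web.yaml, the app: web label and the container name are assumptions.
cat > web.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: hpa-example
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        # No resources.requests.cpu here on purpose: this is what leaves
        # the HPA showing <unknown> in this task.
EOF
```

The manifest can then be applied with kubectl apply -f web.yaml before running the expose and autoscale commands.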
Hint: Add behavior section via dry-run
Generate the HPA manifest, add the behavior section, then apply:
kubectl autoscale deployment web --min=1 --max=5 --cpu=50% --dry-run=client -o yaml > hpa.yaml
Add under spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30
kubectl apply -f hpa.yaml
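To show where the behavior section sits relative to the rest of the spec, here is a sketch of what the edited file might look like. It assumes the cluster serves the autoscaling/v2 API and that the HPA is written by hand rather than generated, so the dry-run output on a given kubectl version may differ in field order or defaults.

```shell
# Write a complete HPA manifest with the behavior section in place.
# Assumes autoscaling/v2 is available; values match the task (min 1, max 5, 50% CPU).
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30
EOF
```

Note that behavior is a sibling of metrics and minReplicas, indented one level under spec.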
Task 2 - Add CPU Requests, HPA Activates
Set CPU requests on the deployment. HPA uses them as the denominator: utilization% = actual_cpu / requests.cpu × 100.
Steps:
- Set CPU requests of 200m on the web deployment
kubectl get hpa web
kubectl describe hpa web
Hint: Set resource requests imperatively
kubectl set resources deployment web --requests=cpu=200m
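Changing the requests triggers a rollout of new pods. Once the new pod is running and metrics have been collected (this can take up to a minute or so), the TARGETS column updates from <unknown> to a real percentage. At idle, the app uses very little CPU - the percentage will be low. The HPA is now active and will act if utilization crosses 50%.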
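The utilization formula from this task can be checked by hand. The sketch below is a small shell helper, not part of kubectl; the function name and the sample usage figures (4m, 100m, 150m) are made up for illustration. The real HPA averages usage across all pods of the deployment, which this single-pod version ignores.

```shell
# Utilization as a percentage: usage divided by the CPU request.
# Both arguments are in millicores (200 means 200m). Integer division
# truncates, which is good enough for a rough check.
cpu_utilization_pct() {
  usage_m=$1
  request_m=$2
  if [ "$request_m" -le 0 ]; then
    # No request set: there is no denominator, so mirror what HPA shows.
    echo "<unknown>"
    return 0
  fi
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_pct 4 200     # idle pod using about 4m -> 2
cpu_utilization_pct 100 200   # exactly at the 50% target -> 50
cpu_utilization_pct 150 0     # no request configured -> <unknown>
```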
Task 3 - Generate Load and Watch Scale-Up
Send traffic to the app. The registry.k8s.io/hpa-example image runs a PHP script that does CPU work on every request. Under load, utilization climbs above 50% and HPA adds replicas.
Steps:
- Create a load generator pod that floods the service:
kubectl run load-gen \
  --image=busybox \
  --restart=Never \
  -- sh -c "while true; do wget -q -O- http://web; done"
Watch HPA react:
kubectl get hpa web -w
As average CPU usage per pod rises above 100m (50% of the 200m requested), utilization crosses the 50% target and HPA scales the deployment up. With sustained load the replica count can reach 5 - the max defined in the HPA. At that point HPA stops adding replicas even if utilization stays high.
kubectl get pods
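How many replicas HPA asks for follows the documented rule desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min and max. The sketch below reproduces that arithmetic in shell. The function name and the sample utilization values are illustrative, and the real controller also applies a tolerance (10% by default) and stabilization, which are left out here.

```shell
# desired = ceil(current_replicas * current_pct / target_pct), clamped to [min, max].
# Arguments: current replicas, current utilization %, target utilization %, min, max.
desired_replicas() {
  current=$1
  current_pct=$2
  target_pct=$3
  min=$4
  max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * current_pct + target_pct - 1) / target_pct ))
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi
  echo "$desired"
}

desired_replicas 1 120 50 1 5   # ceil(2.4) -> 3
desired_replicas 1 250 50 1 5   # ceil(5.0) -> 5
desired_replicas 5 300 50 1 5   # 30, capped at max -> 5
desired_replicas 5 2 50 1 5     # ceil(0.2) -> 1
```

The third call shows why the replica count stays at 5 under heavy load: the formula asks for more, but the max wins.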
Task 4 - Stop the Load
Delete the load generator. CPU utilization drops and HPA will scale back down to the minimum replica count.
kubectl delete pod load-gen
kubectl describe hpa web
Scale-down does not happen immediately. HPA uses a stabilization window (5 minutes by default) before reducing replicas - this prevents flapping when load briefly dips. The Events section in describe shows the full scaling history.
kubectl get hpa web -w
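The delay comes from how the stabilization window is applied: for scale-down, HPA looks at the replica counts it recommended during the window and uses the highest one. A rough sketch of that selection follows. The function name and the sample recommendation lists are invented for illustration, and the real controller tracks timestamps rather than a fixed list.

```shell
# Given the replica recommendations computed during the stabilization
# window, scale-down settles on the highest of them.
stabilized_replicas() {
  highest=$1
  for r in "$@"; do
    if [ "$r" -gt "$highest" ]; then
      highest=$r
    fi
  done
  echo "$highest"
}

stabilized_replicas 5 5 3 1 1   # load stopped recently: still 5
stabilized_replicas 1 1 1 1     # whole window is quiet: 1
```

Only once every recommendation in the window has dropped does the replica count actually fall.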
This challenge is part of the Kubernetes Pod Scheduling skill path.