Puzzle Block Game Application Is Gone, Recover It from etcd
Scenario
It's 2:00 AM. Your phone rings — the on-call alert is firing. Someone ran
kubectl delete namespace prod by mistake and the entire production application
is gone. The Puzzle Block Game — a browser-based puzzle game running in
the prod namespace — has been wiped out along with its Deployment, ConfigMap,
and NodePort Service.
Fortunately, your team follows backup best practices. An etcd snapshot was taken before the incident and synced to an S3 bucket. The on-call engineer has already downloaded it to the node.
The backup is available at: /home/laborant/prod-etcd-backup/snapshot.db
Task
Your job is to bring everything back before the business wakes up.
Perform a full etcd disaster recovery and restore the production application:
- Restore the etcd snapshot from /home/laborant/prod-etcd-backup/snapshot.db into /var/lib/prod-etcd
- Stop the API server and etcd safely, update the etcd manifest to point to the restored data, and bring the cluster back up
- Restart the necessary components so the application is reachable on NodePort 32222
Backup location : /home/laborant/prod-etcd-backup/snapshot.db
Restore target : /var/lib/prod-etcd
This cluster runs etcd v3.6. The snapshot restore subcommand was removed
from etcdctl in v3.6 and moved to etcdutl. Use etcdutl snapshot restore
— not etcdctl snapshot restore — or you will get Error: unknown flag: --data-dir.
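The task asks for the API server and etcd to be stopped safely, but the hints do not show how. One common approach on a node that runs them as static pods is to move their manifests out of the kubelet's manifest directory and move them back later. The sketch below assumes the kubeadm default file names kube-apiserver.yaml and etcd.yaml; the function names and the stash directory are hypothetical.

```shell
# Sketch only: stop/start static pods by moving their manifests.
# Assumed file names (kubeadm defaults): kube-apiserver.yaml, etcd.yaml.

# stash_manifests <manifests_dir> <stash_dir>
# Moves the API server manifest first, then etcd, so the API server
# is not left running against a stopped datastore.
stash_manifests() {
  sm_src=$1
  sm_dst=$2
  mkdir -p "$sm_dst" || return 1
  for sm_name in kube-apiserver.yaml etcd.yaml; do
    if [ -f "$sm_src/$sm_name" ]; then
      mv "$sm_src/$sm_name" "$sm_dst/$sm_name" || return 1
    fi
  done
  return 0
}

# unstash_manifests <manifests_dir> <stash_dir>
# Moves etcd back first, then the API server.
unstash_manifests() {
  um_dst=$1
  um_src=$2
  for um_name in etcd.yaml kube-apiserver.yaml; do
    if [ -f "$um_src/$um_name" ]; then
      mv "$um_src/$um_name" "$um_dst/$um_name" || return 1
    fi
  done
  return 0
}
```

On the node this would be run as root, for example stash_manifests /etc/kubernetes/manifests /root/manifests-stash before the restore and unstash_manifests with the same arguments afterwards. If the manifests are stashed this way, the etcd manifest can be edited in the stash directory before it is moved back. The stash directory should live outside /etc/kubernetes/manifests, since the kubelet treats files in that directory as pod manifests.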
Hint 1 — Restore the Snapshot
Use etcdutl snapshot restore and point --data-dir to the target path.
sudo etcdutl snapshot restore \
/home/laborant/prod-etcd-backup/snapshot.db \
--data-dir=/var/lib/prod-etcd
# confirm the directory structure
sudo ls -la /var/lib/prod-etcd/member/
A successful restore produces:
- /var/lib/prod-etcd/member/wal/ : write-ahead log
- /var/lib/prod-etcd/member/snap/db : the restored database file
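The layout check can also be scripted so that a failed restore is caught before the manifest is touched. This is a minimal sketch; check_restore_layout is a hypothetical helper name, and it only tests for the two paths listed above.

```shell
# check_restore_layout <data_dir>
# Returns 0 if the restored directory has member/wal and member/snap/db,
# otherwise prints what is missing to stderr and returns 1.
check_restore_layout() {
  crl_dir=$1
  crl_rc=0
  if [ ! -d "$crl_dir/member/wal" ]; then
    echo "missing directory: $crl_dir/member/wal" >&2
    crl_rc=1
  fi
  if [ ! -f "$crl_dir/member/snap/db" ]; then
    echo "missing file: $crl_dir/member/snap/db" >&2
    crl_rc=1
  fi
  return "$crl_rc"
}
```

Because the restored directory is created by a command run with sudo, the check likely needs to run as root as well, for example from a root shell with check_restore_layout /var/lib/prod-etcd.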
Hint 2 — Update the etcd Manifest
Edit the etcd static pod manifest:
sudo vi /etc/kubernetes/manifests/etcd.yaml
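Find and update these places:
1. The --data-dir flag in the command args:
- --data-dir=/var/lib/prod-etcd
2. The hostPath volume that mounts the data directory into the pod:
volumes:
- hostPath:
    path: /var/lib/prod-etcd # ← update this
    type: DirectoryOrCreate
  name: etcd-data
3. The volumeMounts entry named etcd-data: set its mountPath to the same value as --data-dir (/var/lib/prod-etcd). If the mountPath is left at the old path, etcd starts on an empty directory inside the container instead of the restored data.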
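For a scripted alternative to editing by hand, the same changes can be expressed as a sed filter. The sketch below is an assumption-laden example: retarget_etcd_data_dir is a hypothetical helper, and the old path /var/lib/etcd used in the usage note is the kubeadm default, not something this lab states. Besides --data-dir and the hostPath, it also rewrites a mountPath that equals the old directory, so the path etcd writes to inside the container stays backed by the host volume.

```shell
# retarget_etcd_data_dir <manifest> <old_dir> <new_dir>
# Prints the manifest to stdout with every exact occurrence of <old_dir>
# replaced by <new_dir> in three places:
#   --data-dir=<old_dir>, "path: <old_dir>" and "mountPath: <old_dir>".
# The input file is not modified.
retarget_etcd_data_dir() {
  red_manifest=$1
  red_old=$2
  red_new=$3
  sed -e "s|--data-dir=$red_old\$|--data-dir=$red_new|" \
      -e "s|\([pP]ath: \)$red_old\$|\1$red_new|" \
      "$red_manifest"
}
```

A possible way to use it as root: write the output to a file outside the manifest directory, for example retarget_etcd_data_dir /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd /var/lib/prod-etcd > /root/etcd.yaml.new, review it, then copy it over the original. Keeping backups and intermediate files outside /etc/kubernetes/manifests avoids the kubelet picking them up as additional static pods.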
Then restart the kubelet to apply immediately:
sudo systemctl restart kubelet
Wait for etcd and the API server to recover. Until they do, kubectl commands fail with connection errors, so retry the commands below until they succeed:
# watch etcd pod come back
kubectl get pod -n kube-system -l component=etcd -w
# confirm cluster is responding
kubectl get nodes
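Since kubectl cannot connect while the API server is still restarting, a small retry loop can stand in for re-running the command by hand. This is a generic sketch; wait_until is a hypothetical helper, and the attempt count and delay in the usage note are arbitrary.

```shell
# wait_until <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
# Command output is discarded; returns 0 on success, 1 on timeout.
wait_until() {
  wu_max=$1
  wu_delay=$2
  shift 2
  wu_attempt=1
  while [ "$wu_attempt" -le "$wu_max" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wu_attempt=$((wu_attempt + 1))
    if [ "$wu_attempt" -le "$wu_max" ]; then
      sleep "$wu_delay"
    fi
  done
  return 1
}
```

For example, wait_until 30 5 kubectl get nodes would retry every 5 seconds for up to 30 attempts and return once the API server answers.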
Hint 3 — Verify the Application is Restored
# check prod namespace is back
kubectl get namespace prod
# check deployment and pods
kubectl get all -n prod
# check configmap
kubectl get configmap puzzle-block-config -n prod
# check the service and nodeport
kubectl get svc puzzle-block-game-deployment -n prod
# hit the app from the node
curl http://localhost:32222
If pods are in Pending or ContainerCreating, give them a minute — the scheduler and kubelet need to reconcile after the restore.