Avoid Data Loss When Running Database Containers: Named vs Anonymous Volumes

You are working on a service that uses PostgreSQL for data storage. For development, you spin up a local copy of the application. To avoid installing Postgres on your host, you run it in a Docker container. Day-to-day, this setup works well:

On day 1, you start the Postgres container, apply the schema, and insert the initial dataset.
As you keep working on the app, the database schema and the dataset evolve.
Even after a dev machine reboot, a quick docker start on the exited container brings Postgres back with your data intact.

Now you need to upgrade to a newer Postgres version. You stop and remove the old container, then start a new, almost identical one from a newer image. However, when you point the app to it, the data is suddenly gone. But why? Postgres containers put the data folder on a volume by default, so shouldn't the database survive a container upgrade? 🤔

Using a volume to persist app state across different versions of the same container (e.g., image upgrades).

In this challenge, you'll reproduce the above unfortunate scenario, then try to recover the missing data, and finally improve the setup by storing the Postgres database in a named volume.

⚠️ Cross-major Postgres upgrades require pg_upgrade or manual dump and restore; they're out of scope here. This challenge focuses on avoiding data loss when replacing containers by persisting Postgres data on a Docker volume.

1. Run Postgres and store some data in it

Start a postgres:17.5 container named app-db-1 with the following environment variables:

POSTGRES_USER=admin
POSTGRES_PASSWORD=1234
POSTGRES_DB=acme

Hint: Passing environment variables to containers 💡

If you need a refresher on passing environment variables when starting containers, check out this challenge.
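One possible way to start the container is sketched below. It assumes you want Postgres running in the background (-d). The wait loop is an addition of this sketch, not part of the challenge: it polls pg_isready over TCP on 127.0.0.1, on the assumption that the image's one-off initialization server only listens on a Unix socket, so the TCP check succeeds only once the final server is up.

```shell
# Start Postgres 17.5 in the background with the required user, password, and database
docker run -d --name app-db-1 \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=1234 \
  -e POSTGRES_DB=acme \
  postgres:17.5

# Wait (up to ~20 s) until the server accepts TCP connections inside the container
for _ in $(seq 1 20); do
  docker exec app-db-1 pg_isready -h 127.0.0.1 -U admin -d acme && break
  sleep 1
done
```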

Then create the notes table and insert one or more rows (any text value is fine). You can do it by executing these psql commands inside the container:

psql -U admin -d acme -c \
    "CREATE TABLE IF NOT EXISTS notes( \
        id uuid primary key default gen_random_uuid(), \
        created_at timestamp default current_timestamp, \
        message text \
    );"

psql -U admin -d acme -c \
    "INSERT INTO notes(message) VALUES ('hello'), ('world');"

Hint: Executing commands inside running containers 💡

If you need a refresher on executing commands inside running containers, check out this challenge.
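If you'd rather stay on the host shell, a minimal sketch is to wrap the same psql commands in docker exec. No -it flags are needed because psql -c runs a single statement non-interactively; the multi-line SQL string is passed to psql as is.

```shell
# Create the notes table (gen_random_uuid() is built into Postgres 13+)
docker exec app-db-1 psql -U admin -d acme -c \
  "CREATE TABLE IF NOT EXISTS notes(
     id uuid primary key default gen_random_uuid(),
     created_at timestamp default current_timestamp,
     message text
   );"

# Insert a couple of rows
docker exec app-db-1 psql -U admin -d acme -c \
  "INSERT INTO notes(message) VALUES ('hello'), ('world');"
```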

2. Verify the data is stored in a volume

Postgres stores its data on disk at a location specified by the PGDATA environment variable, which usually defaults to /var/lib/postgresql/data.

Inspect the Postgres container and confirm that the data is stored in a volume mounted at the PGDATA location.

Hint: Inspecting container details 💡

If you need a refresher on inspecting container details, check out this challenge.
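One way to check this is sketched below: print the PGDATA value from inside the container, then format the container's Mounts list with a Go template. If things are as described, you should see a line of type volume with a long, random-looking name whose destination matches PGDATA.

```shell
# Confirm where Postgres keeps its data inside the container
docker exec app-db-1 printenv PGDATA

# List app-db-1's mounts as "<type> <name> -> <destination>"
docker container inspect \
  --format '{{range .Mounts}}{{.Type}} {{.Name}} -> {{.Destination}}{{println}}{{end}}' \
  app-db-1
```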

The Postgres container uses a volume even though your docker run command didn't specify the -v|--volume or --mount flags. This happens because the Postgres container image has a VOLUME instruction, typically set in the Dockerfile as follows:

Dockerfile

FROM base-image

RUN ...install postgres...

VOLUME /var/lib/postgresql/data

The above VOLUME instruction makes Docker create a new anonymous volume and mount it at the container's /var/lib/postgresql/data every time a new container is created from the image. Restarting an existing container reuses the volume it already has.
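You don't have to read the image's Dockerfile to spot such declarations: VOLUME instructions are recorded in the image config. A quick check, assuming the postgres:17.5 image is already pulled; the printed JSON object should include /var/lib/postgresql/data as a key.

```shell
# Show the mount points declared by the image's VOLUME instruction(s)
docker image inspect --format '{{json .Config.Volumes}}' postgres:17.5
```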

3. Restart the DB container and verify the data is still there

Now restart the app-db-1 container and confirm the rows are still present:

Hint: Stopping and restarting containers 💡

If you need a refresher on stopping and restarting containers, check out this challenge.
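A minimal sketch of this step follows (docker restart app-db-1 would do the same in one command). The readiness loop is an addition of this sketch, so the query doesn't race the server's startup.

```shell
# Stop and start the same container; its anonymous volume stays attached
docker stop app-db-1
docker start app-db-1

# Wait until the server is accepting connections again, then list the rows
for _ in $(seq 1 20); do
  docker exec app-db-1 pg_isready -h 127.0.0.1 -U admin -d acme && break
  sleep 1
done
docker exec app-db-1 psql -U admin -d acme -c "SELECT * FROM notes;"
```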

Select the primary key (UUID) of one of the rows and paste it into the form below:

psql -U admin -d acme -tAc "SELECT id FROM notes LIMIT 1;"

4. Upgrade the DB container and observe missing data

Upgrade the Postgres container to use a newer version of the image: postgres:17.6.

The new container should have the same name: app-db-1, so you'll need to stop and remove the old container first.
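A sketch of the "naive" upgrade, reusing the same settings as before. Note that a plain docker rm (without -v) leaves the old container's anonymous volume in place, which matters for the recovery steps that follow.

```shell
# Remove the 17.5 container; its anonymous volume is left behind, not deleted
docker stop app-db-1
docker rm app-db-1

# Start a 17.6 container with the same name and settings (and no explicit volume)
docker run -d --name app-db-1 \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=1234 \
  -e POSTGRES_DB=acme \
  postgres:17.6
```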

If you try to select a row from the notes table, you'll see that not only is the data gone, but the table itself is missing too: the database is brand new. Even though the PGDATA folder was stored outside of the container's ephemeral filesystem, the new postgres:17.6 container started with a completely new anonymous volume mounted at the PGDATA location, because Docker creates a fresh anonymous volume for every new container.

Luckily, the data is most likely still available on the host, because Docker keeps anonymous volumes around even after the container that created them is removed (unless it was removed with docker rm -v).

⚠️ The above is true most of the time. However, if the docker run command specified the --rm flag, the anonymous volume will be removed when the container exits, making data on it truly ephemeral.
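To find the orphaned volume, one approach is to list "dangling" volumes (those not referenced by any container) together with their creation time, then pick the one created around the time you first started app-db-1. This is a heuristic sketch: on a busy machine there may be several candidates, and recording the volume name before removing a container is more reliable.

```shell
# Print "<name> <created-at>" for every volume not used by any container
for v in $(docker volume ls --filter dangling=true --quiet); do
  docker volume inspect --format '{{.Name}} {{.CreatedAt}}' "$v"
done
```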

5. Recover the missing database

To recover the data, you need to create a named volume and mount both the original anonymous volume and the new named volume into a temporary container, so that you can copy the data from the original volume to the new named volume.

🤔 Why not just mount the original anonymous volume to the new Postgres container by its (long and random) name?

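Anonymous volumes tend to pile up over time, and system administrators (or automation scripts) will usually run docker volume prune to clean them up. If, at that moment, the anonymous volume holding your database is not attached to any container, it will be removed, and the database will be gone forever. Named volumes are safer here: since Docker Engine 23.0, docker volume prune removes only unused anonymous volumes by default and keeps named ones unless you pass --all.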

Create a named volume called app-db-1-data:

Hint: Creating and using a named volume 💡

If you need a refresher on creating and using a named volume, check out this challenge.
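A one-line sketch; the volume is created empty, using the default local driver.

```shell
# Create an empty named volume for the database files
docker volume create app-db-1-data
```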

Start a temporary container (e.g., alpine or debian), mounting the original anonymous volume and the new named volume at arbitrary locations:

Copy the data from the original anonymous volume to the new named volume using the standard cp command:
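Inside the temporary container's shell, a minimal sketch, assuming the old volume is mounted at /from and the named volume at /to (example paths; use whichever locations you picked). The -a flag keeps file ownership and permissions, so the files stay owned by the postgres user as the server expects, and the trailing /. copies hidden files too. When done, exit the shell.

```shell
# Copy the whole data directory, preserving ownership, permissions, and timestamps
cp -a /from/. /to/

# Sanity check: every Postgres data directory has a PG_VERSION file
# holding the major version number
cat /to/PG_VERSION
```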

6. Launch a new DB container using the named volume and upgrade safely

Run another container named app-db-1, mounting the prepopulated app-db-1-data volume at the PGDATA location:
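A sketch of this step. It assumes this container still uses postgres:17.5 (inferred from the upgrade to 17.6 that follows) and that the app-db-1 name is currently taken by the 17.6 container from step 4, which has to go first. Because the named volume already contains an initialized database, the POSTGRES_* variables are not used for initialization here; keeping them is harmless.

```shell
# Free up the name: remove the 17.6 container that got a fresh, empty volume
docker stop app-db-1
docker rm app-db-1

# Start Postgres 17.5 again, this time with the named volume at PGDATA
docker run -d --name app-db-1 \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=1234 \
  -e POSTGRES_DB=acme \
  -v app-db-1-data:/var/lib/postgresql/data \
  postgres:17.5
```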

To verify the data is still present, find the row with the latest creation time (created_at) in the notes table and paste its timestamp into the form below:

psql -U admin -d acme -tAc \
    "SELECT to_char(MAX(created_at), 'YYYY-MM-DD HH24:MI:SS') FROM notes;"

Now stop and remove the current app-db-1 container and start a new one with the same name but using a newer postgres:17.6 image:
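A sketch of the upgrade, this time mounting the named volume explicitly. A minor-version bump within Postgres 17 keeps the on-disk format, so the new server can open the existing data directory directly; the readiness loop and the final count are additions of this sketch.

```shell
# Replace the 17.5 container; the named volume is untouched by docker rm
docker stop app-db-1
docker rm app-db-1

# Start 17.6 with the same named volume at PGDATA
docker run -d --name app-db-1 \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=1234 \
  -e POSTGRES_DB=acme \
  -v app-db-1-data:/var/lib/postgresql/data \
  postgres:17.6

# Wait for the server, then count the rows
for _ in $(seq 1 20); do
  docker exec app-db-1 pg_isready -h 127.0.0.1 -U admin -d acme && break
  sleep 1
done
docker exec app-db-1 psql -U admin -d acme -c "SELECT count(*) FROM notes;"
```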

Is the data still present this time?