What's Inside Distroless Container Images: Taking a Closer Look

GoogleContainerTools' distroless images are often mentioned as one of the ways to produce small(er), fast(er), and secure(r) containers. But what are these distroless images, really? What's the difference between a container built from a distroless base and one built FROM scratch? Let's dive in!

Why do we need distroless images?

Container images built FROM full-blown Linux distributions like debian, ubuntu, or their derivatives (e.g., node:lts or python:3) often come packed with tools and libraries. While comprehensive, most of these components are unnecessary for typical containerized applications at runtime. The result? Larger image sizes and more potential vulnerabilities to manage for no good technical reason.

The need to create smaller, more secure images by including only the essentials is quite natural, and the most extreme way to achieve this minimalism is to start FROM scratch (i.e. an empty folder) and add only the required files and packages. However, as explored in the previous post, scratch-based containers come with their own set of challenges:

  • No essential directories like /tmp, /home, or /var.
  • Missing user management files (/etc/passwd, /etc/group).
  • Missing CA certificates for secure connections.
  • Missing shared libraries for dynamically linked applications.
  • No timezone information.

...and possibly more.

In other words, while FROM scratch containers offer a clean start, they are often incomplete and impractical for production use without significant manual effort to fill in the gaps.

That's where distroless images come in!

The GoogleContainerTools/distroless project provides prebuilt minimal base images that are as close to scratch as possible but have the necessary system files and folders in place. And they are pretty easy to use - the only thing you need is to understand the hierarchy of the distroless images and pick the one that best fits your application's requirements.

Meet the first distroless image: gcr.io/distroless/static

A good starting point to become familiar with the project's offering is the gcr.io/distroless/static image:

docker pull gcr.io/distroless/static

Inspect it with dive:

dive gcr.io/distroless/static
The contents of the gcr.io/distroless/static image.

The dive output tells us that:

  • The image is Debian-based (so, there is a distro in the distroless image after all, but it's stripped down to the bones).
  • It's just ~2MB big and has a single layer (which is just great).
  • There is a Linux distro-like directory structure inside.
  • The /etc/passwd, /etc/group, and even /etc/nsswitch.conf files are present.
  • Certificates and the timezone db seem to be in place as well.
  • Last but not least, the licenses seem to be preserved (but I'm not a copyright expert).

And that's it! So, it's 99.99% static assets (well, there is a tzconfig executable). No packages, no package manager, not even a trace of libc!

Guess what? If I used gcr.io/distroless/static as a base image (instead of scratch), it'd be a single-line fix for almost all the issues I explored in the previous post 🔥 Even the one with the nonroot user, because here is what the /etc/passwd file looks like in the distroless image:

root:x:0:0:root:/root:/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/sbin/nologin
nonroot:x:65532:65532:nonroot:/home/nonroot:/sbin/nologin

Not every program is statically linked

A nice by-product of experimenting with FROM scratch containers is that it allows you to learn what is actually needed for a program to run. For a statically linked executable, it seems to be just a bunch of config files and a proper rootfs directory structure. But what would it take for a dynamically linked one?

I'll try to compile this Go program with CGO enabled and then run it on a full-blown Ubuntu distro to see what dynamically-loaded libraries it needs:

package main

import (
  "fmt"
  "os/user"
)

func main() {
    u, err := user.Current()
    if err != nil {
        panic(err)
    }
    fmt.Println("Hello from", u.Username)
}

The complete scenario:

Dockerfile:

# syntax=docker/dockerfile:1
# -=== Builder image ===-
FROM golang:1 AS builder

WORKDIR /app

COPY <<EOF main.go
package main

import (
  "fmt"
  "os/user"
)

func main() {
    u, err := user.Current()
    if err != nil {
        panic(err)
    }
    fmt.Println("Hello from", u.Username)
}
EOF

RUN CGO_ENABLED=1 go build main.go

# -=== Target image ===-
FROM ubuntu
COPY --from=builder /app/main /
CMD ["/main"]

Build and run it:

docker buildx build -t go-cgo-ubuntu .
docker run --rm go-cgo-ubuntu
Hello from root

The mighty ldd should do the trick:

docker run --rm go-cgo-ubuntu ldd /main
linux-vdso.so.1 (0x0000ffffbe929000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffbe8d0000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbe720000)
/lib/ld-linux-aarch64.so.1 (0x0000ffffbe8f0000)

The output looks like a standard set of shared libraries needed for a dynamically linked Linux executable, including libc. But, of course, none of them can be found in the gcr.io/distroless/static image...

Meet the second distroless image: gcr.io/distroless/base

The gcr.io/distroless/static image sounds like a perfect choice for a base image if your program is a statically linked Go binary. But what if you absolutely have to use CGO and the libraries you depend on can't be statically linked (I'm looking at you, glibc)? Or you write things in Rust, or C, or any other compiled language with less perfect support for static builds than Go?

Meet the distroless/base image!

docker pull gcr.io/distroless/base
dive gcr.io/distroless/base
The contents of the gcr.io/distroless/base image.

What the dive output tells us:

  • It's 10 times bigger than distroless/static (but still just ~20MB).
  • It has two layers (and the first layer IS distroless/static).
  • The second layer brings tons of shared libraries - most notably libc and openssl.
  • Again, no typical Linux distro fluff.

Here is how to adjust the target Go image to make it work with the new distroless base:

Dockerfile
# ...build stage remains unchanged...

# -=== Target image ===-
# Replace the previous base ('FROM ubuntu' in the scenario above) with 'FROM gcr.io/distroless/base'
FROM gcr.io/distroless/base

COPY --from=builder /app/main /
CMD ["/main"]

Not every dynamically linked use case is the same

I mentioned Rust in the previous section because it's pretty popular these days. Let's see if it can actually work with the distroless/base image. Here is a simple hello-world program:

fn main() {
  println!("Hello world! (Rust edition)");
}

The complete scenario:

Dockerfile:

# syntax=docker/dockerfile:1

# -=== Builder image ===-
FROM rust:1 AS builder

WORKDIR /app

COPY <<EOF Cargo.toml
[package]
name = "hello-world"
version = "0.0.1"
EOF

COPY <<EOF src/main.rs
fn main() {
  println!("Hello world! (Rust edition)");
}
EOF

RUN cargo install --path .

# -=== Target image ===-
FROM gcr.io/distroless/base

COPY --from=builder /usr/local/cargo/bin/hello-world /

CMD ["/hello-world"]

Build it with:

docker buildx build -t distroless-base-rust .

Let's try to run it:

docker run --rm distroless-base-rust
/hello-world: error while loading shared libraries:
libgcc_s.so.1: cannot open shared object file:
No such file or directory

Oh, shoot! Apparently, the gcr.io/distroless/base image doesn't provide all the needed shared libraries. Rust binaries built for glibc targets have a runtime dependency on libgcc (the standard library relies on libgcc_s for stack unwinding), and it's not present in the container.

Meet the third distroless image: gcr.io/distroless/cc

Apparently, Rust is not so unique in its requirements. This dependency is so common that even a separate base image has been created - distroless/cc:

docker pull gcr.io/distroless/cc
dive gcr.io/distroless/cc
The contents of the gcr.io/distroless/cc image.

The dive output tells us that:

  • It's a three-layered image (based on gcr.io/distroless/base).
  • The new layer is just ~2MB big.
  • The new layer contains libstdc++, a bunch of static assets, and even some Python scripts (but no Python itself)!

The fix for the Rust example:

# -=== Target image ===-
FROM gcr.io/distroless/cc

COPY --from=builder /usr/local/cargo/bin/hello-world /

CMD ["/hello-world"]

Base images for interpreted or VM-based languages

Some languages (like Python) require an interpreter for a script to run. Some others (like JavaScript or Java) require a full-blown runtime (like Node.js or JVM). Since the distroless images considered so far lack package managers, adding Python, OpenJDK, or Node.js to them might be problematic.

Luckily, the distroless project seems to support the most popular runtimes out of the box: Java, Node.js, and Python (the gcr.io/distroless/{java,nodejs,python} images).

These base images are built on top of the distroless/cc image, adding one or two extra layers with the corresponding runtime or interpreter.

Here is what the final image hierarchy looks like:

The hierarchy of the distroless images.

Who uses distroless base images

I do! Mainly the distroless/static one. It's my favorite replacement for FROM scratch. On a more serious note though, I'm aware of the following prominent users:

...and 40K+ matches on GitHub code search for FROM gcr.io/distroless/base.

Pros, cons, and alternatives of distroless images

The distroless images are small, fast, and, potentially, more secure. To me, it's the most important pro. Additionally, since their generation is deterministic, theoretically, it should be possible to encode SBOM(-like) information in every build simplifying life for the vulnerability scanners (but to the best of my knowledge, it's not done yet, and the scanners actually struggle to produce meaningful results for the distroless-based images).

At the same time, this particular implementation of distroless seems to be inflexible. Adding new stuff to a distroless base is tricky: changing the base itself requires knowing bazel (and becoming a fork maintainer?), and adding things later on is complicated by the lack of package managers. The choice of base images is limited by the project maintainers, so if you don't fit, you can't benefit from them.

The distroless base images (automatically) track the upstream Debian releases, so it makes CVE resolution in them as good as it is in the said distro (draw your own conclusion here) and in the corresponding language runtime.

So, my opinion is - the idea is brilliant and much needed, but the implementation might not be the best one.

If you're keen on the idea of carefully crafting your images from some minimal base, you may want to take a look at:

  • Chainguard Images - use Wolfi as a minimalistic & secure base and, with the help of two tools, apko and melange, let you build an application-tailored image containing only (mostly?) the necessary bits.
  • Chisel - a somewhat similar idea to the above project, but from Canonical, hence, Ubuntu-based. The project seems very new, but Microsoft has already used it in production.
  • Multi-stage Docker builds - no kidding! You can still start FROM scratch and carefully copy over only the needed bits from the build stages to your target image.
  • buildah - a powerful tool to build container images that, in particular, allows you to build containers FROM scratch, potentially leveraging the host system. Here is an example.

Still want to have minimalistic container images but don't have time for the above wizardry? Then I have a "wizard in the box" for you:

  • minT(oolkit) (formerly DockerSlim) - a CLI tool that allows you to automatically convert a "fat" container image into a "slim" one by doing a runtime analysis of the target container and throwing away the unneeded stuff.

You can read more about the struggle of producing decent container images in 👉 this article of mine.

Conclusion

So, when to use the distroless images? Here is my rule of thumb:

  • Every time you want to build an image FROM scratch, you should consider the gcr.io/distroless/static image.
  • Find yourself adding shared libraries to a FROM scratch image? Try the gcr.io/distroless/base or gcr.io/distroless/cc images instead.
  • Running in a highly regulated environment where security and compliance are top priorities? Then the gcr.io/distroless/{java,nodejs,python} images might be worth a try.
