What's Inside Distroless Container Images: Taking a Closer Look
GoogleContainerTools' distroless images are often mentioned as
one of the ways to produce small(er), fast(er), and secure(r) containers.
But what are these distroless images, really?
What's the difference between a container built from a distroless base and one built FROM scratch?
Let's dive in!
Why do we need distroless images?
Container images built FROM full-blown Linux distributions like debian, ubuntu, or their derivatives (e.g., node:lts or python:3) often come packed with tools and libraries.
While comprehensive, most of these components are unnecessary for typical containerized applications at runtime.
The result? Larger image sizes and more potential vulnerabilities to manage for no good technical reason.
The need to create smaller, more secure images by including only the essentials is quite natural,
and the most extreme way to achieve this minimalism is to start FROM scratch (i.e., an empty folder) and add only the required files and packages.
However, as explored in the previous post,
scratch-based containers come with their own set of challenges:
- No essential directories like /tmp, /home, or /var.
- Missing user management files (/etc/passwd, /etc/group).
- Missing CA certificates for secure connections.
- Missing shared libraries for dynamically linked applications.
- No timezone information.
...and possibly more.
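To see these gaps in a concrete image, a small statically linked checker can be copied into it and run. The following is only a sketch: the exact CA bundle and zoneinfo locations are assumptions based on Debian conventions, and the helper names (missing, onDisk) are made up for this illustration.

```go
package main

import (
	"fmt"
	"os"
)

// essentials lists paths that a FROM scratch image typically lacks.
// The CA bundle and zoneinfo locations are assumptions (Debian conventions)
// and may differ in other images.
var essentials = []string{
	"/tmp",
	"/home",
	"/var",
	"/etc/passwd",
	"/etc/group",
	"/etc/ssl/certs/ca-certificates.crt",
	"/usr/share/zoneinfo",
}

// missing returns the paths for which exists reports false, in input order.
func missing(paths []string, exists func(string) bool) []string {
	var out []string
	for _, p := range paths {
		if !exists(p) {
			out = append(out, p)
		}
	}
	return out
}

// onDisk reports whether a path exists on the real filesystem.
func onDisk(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	gaps := missing(essentials, onDisk)
	if len(gaps) == 0 {
		fmt.Println("all essential paths are present")
		return
	}
	for _, p := range gaps {
		fmt.Println("missing:", p)
	}
}
```

The existence check is passed in as a function so the same logic can be exercised against a fake filesystem, not just the real one.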
In other words, while FROM scratch containers offer a clean start,
they are often incomplete and impractical for production use without significant manual effort to fill in the gaps.
That's where distroless images come in!
The GoogleContainerTools/distroless project provides prebuilt minimal base images that are as close to scratch as possible but have the necessary system files and folders in place. And they are pretty easy to use - the only thing you need is to understand the hierarchy of the distroless images and pick the one that best fits your application's requirements.
Meet the first distroless image: gcr.io/distroless/static
A good starting point to become familiar with the project's offering is the gcr.io/distroless/static image:
docker pull gcr.io/distroless/static
Inspect it with dive:
dive gcr.io/distroless/static
The dive output tells us that:
- The image is Debian-based (so, there is a distro in the distroless image after all, but it's stripped down to the bones).
- It's just ~2MB big and has a single layer (which is just great).
- There is a Linux distro-like directory structure inside.
- The /etc/passwd, /etc/group, and even /etc/nsswitch.conf files are present.
- Certificates and the timezone db seem to be in place as well.
- Last but not least, the licenses seem to be preserved (but I'm not a copyright expert).
And that's it!
So, it's 99.99% static assets (well, there is a tzconfig executable).
No packages, no package manager, not even a trace of libc!
Guess what?
If I used gcr.io/distroless/static as a base image (instead of scratch), it'd be a single-line fix for almost all the issues I explored in the previous post.
Even the one with the nonroot user, because here is what the /etc/passwd file looks like in the distroless image:
root:x:0:0:root:/root:/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/sbin/nologin
nonroot:x:65532:65532:nonroot:/home/nonroot:/sbin/nologin
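To make the listing easier to read, here is a minimal sketch that parses passwd-formatted text and prints each user's UID, GID, and home directory. The sample input is copied from the listing above; the type and function names (passwdEntry, parsePasswd) are invented for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// passwdEntry holds the fields of one /etc/passwd line that matter here.
type passwdEntry struct {
	Name  string
	UID   int
	GID   int
	Home  string
	Shell string
}

// parsePasswd parses passwd-formatted text (name:password:uid:gid:gecos:home:shell).
// Lines with the wrong number of fields or non-numeric IDs are skipped.
func parsePasswd(content string) []passwdEntry {
	var entries []passwdEntry
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Split(strings.TrimSpace(line), ":")
		if len(fields) != 7 {
			continue
		}
		uid, errUID := strconv.Atoi(fields[2])
		gid, errGID := strconv.Atoi(fields[3])
		if errUID != nil || errGID != nil {
			continue
		}
		entries = append(entries, passwdEntry{
			Name:  fields[0],
			UID:   uid,
			GID:   gid,
			Home:  fields[5],
			Shell: fields[6],
		})
	}
	return entries
}

// distrolessPasswd is the sample copied from the article's listing.
const distrolessPasswd = `root:x:0:0:root:/root:/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/sbin/nologin
nonroot:x:65532:65532:nonroot:/home/nonroot:/sbin/nologin`

func main() {
	for _, e := range parsePasswd(distrolessPasswd) {
		fmt.Printf("%s uid=%d gid=%d home=%s\n", e.Name, e.UID, e.GID, e.Home)
	}
}
```

The nonroot entry is the interesting one: it gives an unprivileged user a name, a numeric ID, and a home directory without any extra setup in the target image.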
Not every program is statically linked
A nice by-product of experimenting with FROM scratch containers is that you learn what a program actually needs to run.
For a statically linked executable,
it seems to be just a bunch of config files and a proper rootfs directory structure.
But what would it take for a dynamically linked one?
I'll try to compile this Go program with CGO enabled and then run it on a full-blown Ubuntu distro to see what dynamically-loaded libraries it needs:
package main

import (
	"fmt"
	"os/user"
)

func main() {
	u, err := user.Current()
	if err != nil {
		panic(err)
	}
	fmt.Println("Hello from", u.Username)
}
The complete scenario (Dockerfile, followed by the build and run commands):
# syntax=docker/dockerfile:1
# -=== Builder image ===-
FROM golang:1 AS builder
WORKDIR /app
COPY <<EOF main.go
package main

import (
	"fmt"
	"os/user"
)

func main() {
	u, err := user.Current()
	if err != nil {
		panic(err)
	}
	fmt.Println("Hello from", u.Username)
}
EOF
RUN CGO_ENABLED=1 go build main.go
# -=== Target image ===-
FROM ubuntu
COPY --from=builder /app/main /
CMD ["/main"]
docker buildx build -t go-cgo-ubuntu .
docker run --rm go-cgo-ubuntu
Hello from root
The mighty ldd should do the trick:
docker run --rm go-cgo-ubuntu ldd /main
linux-vdso.so.1 (0x0000ffffbe929000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffbe8d0000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbe720000)
/lib/ld-linux-aarch64.so.1 (0x0000ffffbe8f0000)
The output looks like a standard set of shared libraries needed for a dynamically linked Linux executable, including libc.
But, of course, none of them can be found in the gcr.io/distroless/static image...
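Since minimal images ship neither ldd nor a shell, it can be handy to read a binary's dependencies without executing anything. Below is a rough sketch using Go's standard debug/elf package. Unlike ldd, it reports only the direct DT_NEEDED entries of the file, it does not resolve them to paths, and it does not list the dynamic loader or the vDSO. The function name neededLibs is made up for this example.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// neededLibs returns the DT_NEEDED entries of an ELF image held in memory.
// A statically linked binary yields an empty list.
func neededLibs(image []byte) ([]string, error) {
	f, err := elf.NewFile(bytes.NewReader(image))
	if err != nil {
		return nil, fmt.Errorf("not a readable ELF file: %w", err)
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	// Inspect the file given as the first argument, or this program's own
	// executable when no argument is passed.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	image, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	libs, err := neededLibs(image)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s needs %d shared libraries\n", path, len(libs))
	for _, lib := range libs {
		fmt.Println(" ", lib)
	}
}
```

An empty list suggests the binary is a candidate for a libc-free base image, while a non-empty one tells you which libraries the base image has to provide.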
Meet the second distroless image: gcr.io/distroless/base
The gcr.io/distroless/static image sounds like a perfect choice for a base image if your program is a statically linked Go binary. But what if you absolutely have to use CGO and the libraries you depend on can't be statically linked (I'm looking at you, glibc)? Or you write things in Rust, or C, or any other compiled language with less perfect support for static builds than Go?
Meet the distroless/base image!
docker pull gcr.io/distroless/base
dive gcr.io/distroless/base
What the dive output tells us:
- It's 10 times bigger than distroless/static (but still just ~20MB).
- It has two layers (and the first layer IS distroless/static).
- The second layer brings tons of shared libraries - most notably libc and openssl.
- Again, no typical Linux distro fluff.
Here is how to adjust the target Go image to make it work with the new distroless base:
...build stage remains unchanged...
# -=== Target image ===-
# Replace the previous base image with 'FROM gcr.io/distroless/base'
FROM gcr.io/distroless/base
COPY --from=builder /app/main /
CMD ["/main"]
Not every dynamically linked use case is the same
I mentioned Rust in the previous section because it's pretty popular these days. Let's see if it can actually work with the distroless/base image. Here is a simple hello-world program:
fn main() {
    println!("Hello world! (Rust edition)");
}
The complete scenario:
Dockerfile:
# syntax=docker/dockerfile:1
# -=== Builder image ===-
FROM rust:1 AS builder
WORKDIR /app
COPY <<EOF Cargo.toml
[package]
name = "hello-world"
version = "0.0.1"
EOF
COPY <<EOF src/main.rs
fn main() {
    println!("Hello world! (Rust edition)");
}
EOF
RUN cargo install --path .
# -=== Target image ===-
FROM gcr.io/distroless/base
COPY --from=builder /usr/local/cargo/bin/hello-world /
CMD ["/hello-world"]
Build it with:
docker buildx build -t distroless-base-rust .
Let's try to run it:
docker run --rm distroless-base-rust
/hello-world: error while loading shared libraries:
libgcc_s.so.1: cannot open shared object file:
No such file or directory
Oh, shoot! Apparently, the gcr.io/distroless/base image doesn't provide all the needed shared libraries.
For some reason, Rust has a runtime dependency on libgcc, and it's not present in the container.
Meet the third distroless image: gcr.io/distroless/cc
Apparently, Rust is not so unique in its requirements. This dependency is so common that a separate base image has been created for it - distroless/cc:
docker pull gcr.io/distroless/cc
dive gcr.io/distroless/cc
The dive output tells us that:
- It's a three-layered image (based on gcr.io/distroless/base).
- The new layer is just ~2MB big.
- The new layer contains libstdc++, a bunch of static assets, and even some Python scripts (but no Python itself)!
The fix for the Rust example:
# -=== Target image ===-
FROM gcr.io/distroless/cc
COPY --from=builder /usr/local/cargo/bin/hello-world /
CMD ["/hello-world"]
Base images for interpreted or VM-based languages
Some languages (like Python) require an interpreter for a script to run. Some others (like JavaScript or Java) require a full-blown runtime (like Node.js or JVM). Since the distroless images considered so far lack package managers, adding Python, OpenJDK, or Node.js to them might be problematic.
Luckily, the distroless project seems to support the most popular runtimes out of the box:
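The runtime-specific base images (such as gcr.io/distroless/{java,nodejs,python}) are built on top of the distroless/cc image, adding one or two extra layers with the corresponding runtime or interpreter.
The final image hierarchy is a chain: distroless/static at the bottom, distroless/base on top of it, distroless/cc on top of that, and the language runtime images on top of distroless/cc.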
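The layering described in this section can be sketched as a small parent map. This is only an illustration of the relationships stated in the text: the short names used for the runtime images (java, nodejs, python) are placeholders, not exact registry paths, and the lineage helper is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// parent maps each image to the image it is built on, as described in the
// article. The short runtime names are placeholders, not exact registry paths.
var parent = map[string]string{
	"base":   "static",
	"cc":     "base",
	"java":   "cc",
	"nodejs": "cc",
	"python": "cc",
}

// lineage returns the chain of images from the root of the hierarchy down to
// the named image. Names without a known parent yield a single-element chain.
func lineage(image string) []string {
	chain := []string{image}
	for {
		p, ok := parent[chain[0]]
		if !ok {
			return chain
		}
		chain = append([]string{p}, chain...)
	}
}

func main() {
	for _, img := range []string{"static", "cc", "python"} {
		fmt.Println(strings.Join(lineage(img), " -> "))
	}
}
```

Reading a chain left to right gives the order in which layers are stacked, which is also why each step up the hierarchy can only add files, never remove them.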
Who uses distroless base images
I do! Mainly the distroless/static one. It's my favorite replacement for FROM scratch.
On a more serious note though, I'm aware of the following prominent users:
- Kubernetes (motivation)
- Knative
- Kubebuilder
- ko (switched to cgr.dev/chainguard/static)
- Jib (switched to eclipse-temurin)
...and 40K+ matches on GitHub code search for FROM gcr.io/distroless/base.
Pros, cons, and alternatives of distroless images
The distroless images are small, fast, and, potentially, more secure. To me, it's the most important pro. Additionally, since their generation is deterministic, theoretically, it should be possible to encode SBOM(-like) information in every build simplifying life for the vulnerability scanners (but to the best of my knowledge, it's not done yet, and the scanners actually struggle to produce meaningful results for the distroless-based images).
At the same time, this particular implementation of distroless seems to be inflexible. Adding new stuff to a distroless base is tricky: changing the base itself requires knowing bazel (and becoming a fork maintainer?), and adding things later on is complicated by the lack of package managers. The choice of base images is limited by the project maintainers, so if you don't fit, you can't benefit from them.
The distroless base images (automatically) track the upstream Debian releases, so it makes CVE resolution in them as good as it is in the said distro (draw your own conclusion here) and in the corresponding language runtime.
So, my opinion is - the idea is brilliant and much needed, but the implementation might not be the best one.
If you're keen on the idea of carefully crafting your images from some minimal base, you may want to take a look at:
- Chainguard Images - uses Wolfi as a minimalistic & secure base image and, with the help of two tools, apko and melange, lets you build an application-tailored image containing only (mostly?) the necessary bits.
- Chisel - a somewhat similar idea to the above project, but from Canonical, hence, Ubuntu-based. The project seems very new, but Microsoft has already used it in production.
- Multi-stage Docker builds - no kidding! You can still start FROM scratch and carefully copy over only the needed bits from the build stages to your target image.
- buildah - a powerful tool to build container images that, in particular, allows you to build containers FROM scratch, potentially leveraging the host system. Here is an example.
Still want to have minimalistic container images but don't have time for the above wizardry? Then I have a "wizard in the box" for you:
- minT(oolkit) (formerly DockerSlim) - a CLI tool that allows you to automatically convert a "fat" container image into a "slim" one by doing a runtime analysis of the target container and throwing away the unneeded stuff.
You can read more about the struggle of producing decent container images in this article of mine.
Conclusion
So, when to use the distroless images? Here is my rule of thumb:
- Every time you want to build an image FROM scratch, you should consider the gcr.io/distroless/static image.
- Find yourself adding shared libraries to a FROM scratch image? Try the gcr.io/distroless/base or gcr.io/distroless/cc images instead.
- Running in a highly regulated environment where security and compliance are the top priority? Then the gcr.io/distroless/{java,nodejs,python} images might be worth a try.
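The first two rules can be approximated in code. The sketch below suggests a base image from the list of shared libraries a binary needs. The mapping is a deliberate simplification of this article's findings (no libraries means static, libgcc or libstdc++ means cc, anything else means base), and it does not verify that a given library actually ships in the suggested image. The function name pickBase is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// pickBase suggests a distroless base image from the shared libraries a
// binary needs. The mapping is an assumption drawn from the article:
// no libraries -> static, libgcc or libstdc++ -> cc, anything else -> base.
func pickBase(needed []string) string {
	if len(needed) == 0 {
		return "gcr.io/distroless/static"
	}
	for _, lib := range needed {
		if strings.HasPrefix(lib, "libgcc_s.so") || strings.HasPrefix(lib, "libstdc++.so") {
			return "gcr.io/distroless/cc"
		}
	}
	return "gcr.io/distroless/base"
}

func main() {
	// A statically linked binary.
	fmt.Println(pickBase(nil))
	// The CGO-enabled Go example from this article.
	fmt.Println(pickBase([]string{"libpthread.so.0", "libc.so.6"}))
	// The Rust example from this article.
	fmt.Println(pickBase([]string{"libgcc_s.so.1", "libc.so.6"}))
}
```

Treat the result as a starting point only: a binary that needs a library outside these images still requires a different base or a manual copy step.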