Like me, you’re probably tired of all the tiresome (read: bullshit) alerts about dependencies found in your container that have some big scary vulnerability in them - even though those dependencies have nothing to do with your application. Surely, there’s a way to get rid of them!
Enter: distroless containers.
“Distroless” images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution. … Restricting what’s in your runtime container to precisely what’s necessary for your app is a best practice employed by Google and other tech giants that have used containers in production for many years. It improves the signal to noise of scanners (e.g. CVE) and reduces the burden of establishing provenance to just what you need.
From what I’ve observed, here are the key pros and cons of distroless containers.
Pros:
- Distroless containers create super small images, which are quick to build and deploy, and require a tiny amount of storage.
- Because distroless containers are “distro-less” there are very few resources available in your container outside of your application. This will make it harder for attackers to escalate their privileges, attain persistence, or move laterally in your environment.
- By running distroless containers, you’ll get more relevant (i.e. less bullshit) notifications from your business’ security teams and dependency scanners, because the notifications you do get will be more often about actual runtime dependencies of your application.
Cons:
- You need to be prepared to do some extra legwork to get all your dependencies installed into your container. A lot of dependencies are bloat, but they’re convenient to have by default, and moving to distroless containers will mean you need to be intentional about what you install.
- If you regularly need to shell into your containers for debugging, you will not have any tools for this by default. Ex. if you run distroless containers in production, you may need to deploy a whole new version of your application to a production environment with debugging tools.
These tradeoffs work for me, so let’s walk through converting a sample Python application to distroless. It’ll only take about ten minutes!
Our Original Application
Our application will be simple, but basically representative of typical containerized Python applications:
- Downloads Python modules that it depends on from PyPI
- Our whole application is plopped in some directory; in this case we’ll start in /opt
- We’re already being good about using a “slim” image to limit size and waste
Here’s our project layout for those wanting to follow along at home:
demo/
├── Dockerfile
├── requirements.txt
└── src
    └── app.py
Our app.py contains:
import emoji
print(emoji.emojize("Distroless is :kissing_face_with_smiling_eyes::OK_hand:"))
And requirements.txt contains our sole dependency from PyPI:
emoji
Finally, our Dockerfile, which will be the only file we’re revising throughout the rest of this post, will start with:
FROM python:3.11-slim
# Install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Install our application
ADD src /opt
ENTRYPOINT ["python3", "/opt/app.py"]
Using python:3.11-slim (a Debian Bookworm-based image, provided by Docker), the uncompressed size of our container is 138 MB. Enumerating all the packages with Anchore’s Syft tool, we find 117 dependencies across the base OS, Python, Python dependencies, and the Python package we installed.
This is already a lot better than if we’d used python:3.11, which takes up 979 MB of space uncompressed and has 442 dependencies, but we can drive bloat down even further!
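To reproduce counts like these, one option is to have Syft emit JSON and tally the entries with a few lines of Python. This is a minimal sketch, not necessarily how the numbers above were produced: it assumes the SBOM was generated with something like syft <image> -o json, and that packages are listed under a top-level "artifacts" key with a "type" field on each entry. The sample data and the count_packages helper are made up for illustration.

```python
import json
from collections import Counter


def count_packages(sbom: dict) -> Counter:
    """Tally SBOM packages by type (e.g. "deb", "python")."""
    # Assumption: packages live under a top-level "artifacts" list.
    artifacts = sbom.get("artifacts", [])
    return Counter(pkg.get("type", "unknown") for pkg in artifacts)


# Made-up sample standing in for a real SBOM file; names and versions
# are placeholders, not real scan results.
SAMPLE_SBOM = json.loads("""
{
  "artifacts": [
    {"name": "libc6", "version": "0.0.0", "type": "deb"},
    {"name": "emoji", "version": "0.0.0", "type": "python"},
    {"name": "pip", "version": "0.0.0", "type": "python"}
  ]
}
""")

if __name__ == "__main__":
    counts = count_packages(SAMPLE_SBOM)
    print(f"total packages: {sum(counts.values())}")
    for pkg_type, n in sorted(counts.items()):
        print(f"  {pkg_type}: {n}")
```

With a real SBOM you would replace SAMPLE_SBOM with the result of json.load() on the saved file.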
Moving to Distroless
Reading through a menu of premade distroless images, we can see that distroless images for Python already exist. The container gcr.io/distroless/python3-debian12 looks like it’d be a good choice for us, as it’ll be close to the image we used already.
Let’s see what happens with our new Dockerfile based on a distroless container!
# Switch to the distroless image
FROM gcr.io/distroless/python3-debian12
# Install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Install our application
ADD src /opt
ENTRYPOINT ["python3", "/opt/app.py"]
But our build failed!
#5 0.079 runc run failed: unable to start container process: error during container init: exec: "/bin/sh": stat /bin/sh: no such file or directory
#5 ERROR: process "/bin/sh -c pip install --no-cache-dir -r /opt/requirements.txt" did not complete successfully: exit code: 1
Installing Packages from PyPI
When we say that only the application’s runtime dependencies will be available, we mean it! sh isn’t present, but switching what shell we’re using won’t help as pip isn’t either! This is because pip is a build-time dependency and doesn’t need to be present after your application has been built - so we should get the dependencies we need without installing pip in our application container.
To solve this, what we’ll do is use multi-stage builds to download/install our dependencies in a build container (that has typical tools like sh and pip), and then copy the installed dependencies into our application container without dragging along any unneeded dependencies! So let’s add a first stage called “builder” to our Dockerfile, install our required packages, and copy the installed packages from our first stage (the builder image) into our second stage (the application image). This sounds difficult, but it’s actually quite simple to get going - we have almost all of the code available already!
The only thing aside from “multi stage build basics” that we’ll need to know is where the dependencies are being installed to. You can run pip show <packagename> to find out on your local machine (or run the same in your build container after installing) - but to cut to the chase, when running pip as root on Debian[1] it will install packages you download from PyPI to /usr/local/lib/python<version>/site-packages. As we’re using Python 3.11, that’ll be /usr/local/lib/python3.11/site-packages, and after our pip install finishes, we’ll copy that directory from our build container to the application container.
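If you’d rather ask Python directly than read the output of pip show, the standard library’s sysconfig module reports the default install location. A minimal sketch: pip_install_dir is a name made up for this example, it assumes a default (not --user) install, and the result reflects whichever interpreter runs it, so run it inside the build container to see the path that applies there.

```python
import sys
import sysconfig


def pip_install_dir() -> str:
    """Return the default directory for installed pure-Python packages."""
    # "purelib" is sysconfig's name for the pure-Python package directory.
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}")
    print(f"packages install to: {pip_install_dir()}")
    # If this prints False, imports from that directory will fail
    # unless the directory is added to the module search path.
    print(f"on sys.path: {pip_install_dir() in sys.path}")
```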
Because pip and pip dependencies (wheel, setuptools…) themselves are installed in the site-packages directory, we’ll also add a line to remove these after we’re done installing dependencies, so build dependencies aren’t accidentally copied over into our application image![2]
# Use a regular distro-containing image as our build container
FROM python:3.11-slim as builder
# Install our dependencies in the build container
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Ensure pip, setuptools, and wheel don't piggyback into our application container
RUN rm -r /usr/local/lib/python3.11/site-packages/pip* \
/usr/local/lib/python3.11/site-packages/setuptools* \
/usr/local/lib/python3.11/site-packages/wheel*
# Switch to distroless for our application container
FROM gcr.io/distroless/python3-debian12
# Copy the packages we need from our build container
COPY --from=builder /usr/local/lib/python3.11/site-packages \
/usr/local/lib/python3.11/site-packages
# Install our application
ADD src /opt
ENTRYPOINT ["python3", "/opt/app.py"]
And now our container builds - the build container fetches the packages we need, and they’re copied over into our application container! Could it really be that easy??
…Nope! When we try to run our application container we see:
% docker run --rm demo
Traceback (most recent call last):
File "/opt/app.py", line 1, in <module>
import emoji
ModuleNotFoundError: No module named 'emoji'
Making Installed Packages Discoverable
There are a lot of possible reasons the emoji dependency we’ve tried to install couldn’t be found, and when working on the first distroless container I made, I originally assumed that I had broken the install or needed to copy additional data from my build container.[3]
In this case, the answer is deceptively simple - the Python interpreter in our distroless container doesn’t search the location where pip installed our packages in the build container, and nothing has added that location to the PYTHONPATH environment variable (…if that variable’s been set at all)! Python discovers modules by searching the directories listed in sys.path, and PYTHONPATH adds directories to that list, so modules installed via pip can be imported in your application.
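To see the effect of PYTHONPATH for yourself, you can start a fresh interpreter with and without the variable and compare its sys.path. This is a minimal sketch using only the standard library; child_sys_path and is_searched are helpers invented for this example, and a temporary directory stands in for the site-packages directory.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath=None):
    """Start a fresh interpreter and return its sys.path as a list."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def is_searched(directory, search_path):
    """True if `directory` appears in `search_path`, ignoring symlinks."""
    target = os.path.realpath(directory)
    return any(os.path.realpath(entry) == target for entry in search_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as extra:
        print("without PYTHONPATH:", is_searched(extra, child_sys_path()))
        print("with PYTHONPATH:   ", is_searched(extra, child_sys_path(extra)))
```

The directory only shows up in the second case, which is the same mechanism the ENV line in a Dockerfile relies on.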
All we need to do to make our dependencies discoverable is set that environment variable to where our application container can expect to find the modules we downloaded from PyPI:
# Use a regular distro-containing image as our build container
FROM python:3.11-slim as builder
# Install our dependencies in the build container
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Ensure pip, setuptools, and wheel don't piggyback into our application container
RUN rm -r /usr/local/lib/python3.11/site-packages/pip* \
/usr/local/lib/python3.11/site-packages/setuptools* \
/usr/local/lib/python3.11/site-packages/wheel*
# Switch to distroless for our application container
FROM gcr.io/distroless/python3-debian12
# Copy the packages we need from our build container
COPY --from=builder /usr/local/lib/python3.11/site-packages \
/usr/local/lib/python3.11/site-packages
# Set environment variable so Python can find the packages we installed
ENV PYTHONPATH=/usr/local/lib/python3.11/site-packages
# Install our application
ADD src /opt
ENTRYPOINT ["python3", "/opt/app.py"]
And with that, our simple application works!
% docker run --rm demo
Distroless is 😙👌
Bonus Round: Moving to a Non-Root User
The distroless container we’re using also has a nonroot tag available, which we can take advantage of to switch our application away from running as root. Keen-eyed readers may have noticed that our sample application is running as root by default.
Containers by default are not a security boundary, though there are more solutions than ever to build security boundaries around containers. In many modern multi-tenant container setups, the container is not used as a security boundary - ex. in services like AWS Fargate, Fly.io, etc. containers are converted to microVMs, and the microVM is the security boundary (fun factoid: both use Firecracker under the hood).
Litigating this exact argument is out of scope for this post (and feel free to skip to the results if you prefer), but since we can get a non-root user for ~free (three line change) in our distroless journey, I’m adding this section for completeness’ sake. Normally you’d need to create an unprivileged user and switch the working directory yourself, so this is one less thing to do/maintain. I’m all about having less to maintain!
All we need to do is update our application image to be based on gcr.io/distroless/python3-debian12:nonroot, and for consistency’s sake we’ll move our application from /opt to /home/nonroot:
# Use a regular distro-containing image as our build container
FROM python:3.11-slim as builder
# Install our dependencies in the build container
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Ensure pip, setuptools, and wheel don't piggyback into our application container
RUN rm -r /usr/local/lib/python3.11/site-packages/pip* \
/usr/local/lib/python3.11/site-packages/setuptools* \
/usr/local/lib/python3.11/site-packages/wheel*
# Switch to distroless+nonroot for our application container
# USER: nonroot
# WORKDIR: /home/nonroot/
FROM gcr.io/distroless/python3-debian12:nonroot
# Copy the packages we need from our build container
COPY --from=builder /usr/local/lib/python3.11/site-packages \
/usr/local/lib/python3.11/site-packages
# Set environment variable so Python can find the packages we installed
ENV PYTHONPATH=/usr/local/lib/python3.11/site-packages
# Install our application to nonroot
ADD src /home/nonroot
ENTRYPOINT ["python3", "/home/nonroot/app.py"]
And everything still works!
% docker run --rm demo
Distroless is 😙👌
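To confirm the change from inside the application, you could temporarily have app.py report the user it runs as. A minimal sketch, assuming a Unix-like container (os.geteuid is not available on Windows); running_as_root is a helper invented for this example.

```python
import os


def running_as_root() -> bool:
    """True when the current process has root's effective user ID (0)."""
    # os.geteuid() exists on Unix-like systems only.
    return os.geteuid() == 0


if __name__ == "__main__":
    who = "root" if running_as_root() else f"non-root (uid {os.geteuid()})"
    print(f"running as {who}")
```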
Results
Our new distroless container is 55.3 MB and contains a mere 34 dependencies as counted by Syft. While it’d be possible to slim this down even further, we’ve made significant headway from where we started. By comparison, our new application image is:
- 60% smaller and contains 71% fewer dependencies than python:3.11-slim
- 94% smaller and contains 92% fewer dependencies than python:3.11
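For anyone who wants to check the arithmetic, the percentages follow directly from the sizes and dependency counts quoted in this post. The snippet below is just a worked calculation; percent_reduction is a name chosen for this example.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Figures reported in this post: (uncompressed size in MB, dependency count)
DISTROLESS = (55.3, 34)
BASELINES = {
    "python:3.11-slim": (138, 117),
    "python:3.11": (979, 442),
}

if __name__ == "__main__":
    for image, (size_mb, deps) in BASELINES.items():
        size_cut = percent_reduction(size_mb, DISTROLESS[0])
        deps_cut = percent_reduction(deps, DISTROLESS[1])
        print(f"{image}: {size_cut:.0f}% smaller, {deps_cut:.0f}% fewer dependencies")
    # python:3.11-slim: 60% smaller, 71% fewer dependencies
    # python:3.11: 94% smaller, 92% fewer dependencies
```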
Most critically: we’ve reduced the amount of “noise” we deal with from security scanners, eliminating unnecessary dependency upgrades to maintain a clean bill of health for our application, and ensured alerts we do get in the future are likely to be relevant. A modest investment of time in changing how we build our application will pay dividends by saving our time for as long as we need to maintain our application.
In addition:
- We’ve made all sorts of post-exploitation activities more difficult for attackers, and unsophisticated attackers will struggle for purchase before eventually giving up. No more wget -O - ... | bash one-liners!
- The baseline of “what is normal” for our application, dependencies, etc. is now as small as possible - so any runtime security detections based on identifying anomalies will be more accurate. This is likely only applicable for larger companies that have the resources to build and monitor anomaly-based runtime detections.
- If you’ve followed along, you’re probably further on your journey towards immutable infrastructure now than when you started, because none of your build dependencies are present in your application container and most-to-all write activity is in your build container. This wasn’t a major factor for our sample application, but it’s another security and operational excellence goal you could drive towards with distroless containers.
- Finally, our application will require fewer resources (esp. storage), so it is cheaper to deploy and scale. However, this is only a marginal benefit (if a benefit at all) as we were already using small images, and storage/compute is typically cheap compared to engineer-time outside highly scaled applications.
I’m looking forward to getting zero or near-zero bullshit vulnerability alerts after this - reducing the time I need to take for triaging new security findings (or upgrading when those are - occasionally - relevant)!
Samples
I’ve included the SBOMs produced by Syft for each of the following versions of our application image, so you can compare if you’re interested in the details of what’s included/excluded from each:
- SBOM for our sample application based on python:3.11 (a full size image based on Debian Bookworm) here
- SBOM for our sample application based on python:3.11-slim (a reduced size image based on Debian Bookworm) here
- SBOM for our final sample application based on gcr.io/distroless/python3-debian12 (a ‘distroless’ image based on Debian Bookworm) here
I also have a live distroless application example, for those who want to see the build live or clone a repository to start: a small program I run called s3-recycler to clean out an S3 bucket periodically.
Appendix
1. Assuming you don’t have any virtual environment (venv) set up.
2. A cleaner way to do this in the future would be to exclude pip/setuptools/wheel (etc.) directories using COPY --exclude when that is added to the Dockerfile stable syntax.
3. This can be true! It just isn’t true in this case. pip-installed packages do not necessarily only download/modify files in /usr/local/lib/python<version>/site-packages.