It seems to me that the prevailing mental model among users of container technology[1] right now is that a container is a tiny little virtual machine. It’s like a machine in the sense that it is provisioned and deprovisioned by explicit decisions, and we talk about “booting” containers. We configure it sort of like we configure a machine: dropping a bunch of files into a volume, setting some environment variables.
In my mind, though, a container is something fundamentally different from a VM. Rather than coming from the perspective of “let’s take a VM and make it smaller so we can do cool stuff” - get rid of the kernel, get rid of fixed memory allocations, get rid of emulated memory access and instructions, so we can provision more of them at higher density - I’m coming at it from the opposite direction.
For me, containers are “let’s take a program and make it bigger so we can do cool stuff”. Let’s add in the whole user-space filesystem so it’s got all the same bits every time, so we don’t need to worry about library management, so we can ship it around from computer to computer as a self-contained unit. Awesome!
Of course, there are other ecosystems that figured this out a really long time ago, but having it as a commodity within the most popular server deployment environment has changed things.
Of course, an individual container isn’t a whole program. That’s why we need tools like compose to put containers together into a functioning whole. This makes a container not just a program, but rather, a part of a program. And of course, we all know what the smaller parts of a program are called:
Functions.[2]
A container, of course, is not the function itself; the image is the function. A container is a function call.
Perceived through this lens, it becomes apparent that Docker is missing some pretty important information. As a tiny VM, it has all the parts you need: it has an operating system (in the `docker build`), the ability to boot and reboot (`docker run`), instrumentation (`docker inspect`), debugging (`docker exec`), etc. As a really big function, it’s strangely anemic.
Specifically: in every programming language worth its salt, we have a type system; some mechanism to identify what parameters a function will take, and what return value it will have.
You might find this weird coming from a Python person, a language where

```python
def foo(a, b, c):
    return a.x(c.d(b))
```

is considered an acceptable level of type documentation by some[3]; there’s no requirement to say what `a`, `b`, and `c` are. However, just because the type system is implicit, that doesn’t mean it’s not there, even in the text of the program. Let’s consider, from reading this tiny example, what we can discover:
- `foo` takes 3 arguments; their names are “a”, “b”, and “c”, and it returns a value.
- Somewhere else in the codebase there’s an object with an `x` method, which takes a single argument and also returns a value.
- The type of `<unknown>.x`’s argument is the same as the return type of another method somewhere in the codebase, `<unknown-2>.d`.
And so on, and so on. At runtime, each of these values takes on a specific, concrete type, and if you set a breakpoint and single-step into it with a debugger, you can see each of those types very easily. Also at runtime, you will get `TypeError` exceptions telling you exactly what was wrong with what you tried to do at a number of points, if you make a mistake.
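For instance, here’s a minimal sketch of both sides of that implicit contract; the classes `A` and `C` are invented purely for illustration:

```python
# Hypothetical classes, invented to satisfy foo's implicit signature.
class C:
    def d(self, b):
        return len(b)          # implicitly requires b to have a length

class A:
    def x(self, n):
        return n * 2           # implicitly requires something it can multiply

def foo(a, b, c):
    return a.x(c.d(b))

print(foo(A(), "hello", C()))  # 10 -- every implicit expectation is satisfied
foo(A(), None, C())            # TypeError: object of type 'NoneType' has no len()
```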
The analogy to containers isn’t exact; inputs and outputs aren’t obviously in the shape of “arguments” and “return values”, especially since containers tend to be long-running; but nevertheless, a container does have inputs and outputs in the form of env vars, network services, and volumes.
Let’s consider the “foo” of docker, which would be the middle tier of a 3-tier web application (cribbed from a real live example):
```dockerfile
FROM pypy:2
RUN apt-get update -ym
RUN apt-get upgrade -ym
RUN apt-get install -ym libssl-dev libffi-dev
RUN pip install virtualenv
RUN mkdir -p /code/env
RUN virtualenv /code/env
RUN pwd
COPY requirements.txt /code/requirements.txt
RUN /code/env/bin/pip install -r /code/requirements.txt
COPY main /code/main
RUN chmod a+x /code/main
VOLUME /clf
VOLUME /site
VOLUME /etc/ssl/private
ENTRYPOINT ["/code/main"]
```
In this file, we can only see three inputs, which are filesystem locations: `/clf`, `/site`, and `/etc/ssl/private`. How is this different from our Python example, a language with supposedly “no type information”?
- The image has no metadata explaining what might go in those locations, or what roles they serve. We have no way to annotate them within the `Dockerfile`.
- What services does this container need to connect to in order to get its job done? What hostnames will it connect to, what ports, and what will it expect to find there? We have no way of knowing. It doesn’t say. Any errors about the failed connections will come in a custom format, possibly in logs, from the application itself, and not from docker.
- What services does this container export? It could have used an `EXPOSE` line to give us a hint, but it doesn’t need to; and even if it did, all we’d have is a port number.
- What environment variables does its code require? What format do they need to be in?
- We do know that we could look in `requirements.txt` to figure out what libraries are going to be used, but in order to figure out what the service dependencies are, we’re going to need to read all of the code in all of those libraries.
Of course, the one way that this example is unrealistic is that I deleted all the comments explaining all of those things. Indeed, best practice these days would be to include comments in your `Dockerfile`s, and include example compose files in your repository, to give users some hint as to how these things all wire together.
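For example, a hypothetical compose file for the image above might look something like this; the service names and image tag are invented, and the comments are doing all of the work that the format itself cannot:

```yaml
# docker-compose.yml -- hypothetical sketch; the comments carry the contract
# that the Dockerfile above cannot express on its own.
services:
  web:
    image: example/middle-tier         # built from the Dockerfile above
    volumes:
      - ./logs:/clf                    # the app writes common-log-format logs here
      - ./site:/site                   # site content the app expects to read
      - ./certs:/etc/ssl/private:ro    # TLS private keys, readable by the app's UID
    depends_on:
      - db                             # it will try to reach a database by hostname
  db:
    image: postgres
```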
This sort of state isn’t entirely uncommon in programming languages. In fact, in this popular GitHub project you can see that large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.
That is the current state of the container ecosystem. We are at the “late ’60s assembly language” stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.
When you’re building an image, you’re building it for a particular purpose. You already pretty much know what you’re trying to do and what you’re going to need to do it.
- When instantiated, the image is going to consume network services. This is not just a matter of hostnames and TCP ports; those services need to be providing a specific service, over a specific protocol. A generic reverse proxy might be able to handle an arbitrary HTTP endpoint, but an API client needs that specific API. A database admin tool might be OK with just “it’s a database” but an application needs a particular schema.
- It’s going to consume environment variables. But not just any variables; the variables have to be in a particular format.
- It’s going to consume volumes. The volumes need to contain data in a particular format, readable and writable by a particular UID.
- It’s also going to produce all of these things; it may listen on a network service port, provision a database schema, or emit some text that needs to be fed back into an environment variable elsewhere.
Here’s a brief sketch of what I want to see in a `Dockerfile` to allow me to express this sort of thing:

```
FROM ...
RUN ...

LISTENS ON: TCP:80 FOR: org.ietf.http/com.example.my-application-api
CONNECTS TO: pgwritemaster.internal ON: TCP:5432 FOR: org.postgresql.db/com.example.my-app-schema
CONNECTS TO: {{ETCD_HOST}} ON: TCP:{{ETCD_PORT}} FOR: com.coreos.etcd/client-communication
ENVIRONMENT NEEDS: ETCD_HOST FORMAT: HOST(com.coreos.etcd/client-communication)
ENVIRONMENT NEEDS: ETCD_PORT FORMAT: PORT(com.coreos.etcd/client-communication)
VOLUME AT: /logs FORMAT: org.w3.clf REQUIRES: WRITE UID: 4321
```
An image thusly built would refuse to run unless:
- Somewhere else on its network, there was an etcd host/port known to it, its host and port supplied via environment variables.
- Somewhere else on its network, there was a postgres host, listening on port 5432, with a name-resolution entry of “pgwritemaster.internal”.
- An environment variable for the etcd configuration was supplied.
- A writable volume for /logs was supplied, owned by user-ID 4321 where it could write common log format logs.
There are probably a lot of flaws in the specific syntax here, but I hope you can see past that, to the broader point that the software inside a container has precise expectations of its environment, and that we presently have no way of communicating those expectations beyond writing a Melvilleian essay in each `Dockerfile`’s comments, beseeching those who would run the image to give it what it needs.
Why bother with this sort of work, if all the image can do with it is “refuse to run”?
First and foremost, today, the image effectively won’t run. Oh, it’ll start up, and it’ll consume some resources, but it will break when you try to do anything with it. What this metadata will allow the container runtime to do is to tell you why the image didn’t run, and give you specific, actionable, fast feedback about what you need to do in order to fix the problem. You won’t have to go groveling through logs, which is always especially hard if the back-end service you forgot to properly connect to was the log aggregation service. So this will be an order-of-magnitude speed improvement on initial deployments and development-environment setups for utility containers. Whole applications typically already come with a compose file, of course, but ideally applications would be built out of functioning self-contained pieces and not assembled one custom container at a time.
Secondly, if there were a strong tooling standard for providing this metadata within the image itself, it might become possible for infrastructure service providers (like, ahem, my employer) to automatically detect and satisfy service dependencies. Right now, if you have a database as a service that lives outside the container system in production, but within the container system in development and test, there’s no way for the orchestration layer to say “good news, everyone! you can find the database you need here: ...”.
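As a rough approximation, such metadata could even be carried today in ordinary `LABEL` instructions; the key names below are invented, and nothing would enforce them, but they would at least make the expectations machine-readable via `docker inspect`:

```dockerfile
# Hypothetical, unenforced label schema approximating the syntax sketched above;
# the key names are invented for illustration.
LABEL com.example.listens-on="tcp:80 org.ietf.http/com.example.my-application-api" \
      com.example.connects-to.db="pgwritemaster.internal tcp:5432 org.postgresql.db/com.example.my-app-schema" \
      com.example.connects-to.etcd="{{ETCD_HOST}}:{{ETCD_PORT}} com.coreos.etcd/client-communication" \
      com.example.env-needs="ETCD_HOST ETCD_PORT" \
      com.example.volume.logs="/logs org.w3.clf write uid=4321"
```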
My main interest is in allowing open source software developers to give service operators exactly what they need, so the upstream developers can get useful bug reports. There’s a constant tension where volunteer software developers find themselves fielding bug reports where someone deployed their code in a weird way, hacked it up to support some strange environment, built a derived container that had all kinds of extra junk in it to support service discovery or logging or somesuch, and so they don’t want to deal with the support load that that generates. Both people in that exchange are behaving reasonably. The developers gave the ops folks a container that runs their software to the best of their abilities. The service vendors made the minimal modifications they needed to have the container become a part of their service fabric. Yet we arrive at a scenario where nobody feels responsible for the resulting artifact.
If we could just say what it is that the container needs in order to really work, in a way which was precise and machine-readable, then it would be clear where the responsibility lies. Service providers could just run the container unmodified, and they’d know very clearly whether or not they’d satisfied its runtime requirements. Open source developers - or even commercial service vendors! - could say very clearly what they expected to be passed in, and when they got bug reports, they’d know exactly how their service should have behaved.
[1] which mostly but not entirely just means “docker”; it’s weird, of course, because there are pieces that docker depends on and tools that build upon docker which are part of this, but docker remains the nexus.
[2] Yes yes, I know that they’re not really functions, Tristan; they’re subroutines, but that’s the word people use for “subroutines” nowadays.
[3] Just to be clear: no it isn’t. Write a damn docstring, or at least some type annotations.