Today, I want to talk a bit more about Docker. My previous post focused on the basics of Docker (what it is, what problem it solves) and how to run an image that already exists. This time, I’ll run through building your own image from scratch so you can containerize your own application.

This is part 2 in a series on Docker and containers.

The Dockerfile

Since we’ve already covered how containers work, let’s jump right into building an image. Drawing from the previous post, a Docker image is a specification for how a Docker container should run. The image is defined in a Dockerfile, which is a series of instructions that Docker executes in order to build the image. Instructions that change the filesystem (RUN, COPY, and ADD) each add a new “layer” to the image, while the others only record metadata. Every build stage starts with a FROM instruction naming the image to use as a base, and the modifications that follow are stored as layers on top of that base.

As an example, let’s take a look at the etheos Dockerfile.

[hosted on GitHub]
FROM alpine:3.17.2 AS build

WORKDIR /build
COPY . .

RUN apk add --update --no-cache git curl gnupg bash gcompat &&\
    ./scripts/install-deps.sh &&\
    ./build-linux.sh -c --sqlite ON --mariadb ON --sqlserver ON

FROM alpine:3.17.2

WORKDIR /etheos

COPY scripts/install-deps.sh /tmp
COPY --from=build /build/install/ .

# curl/gnupg :: required for package installs in install-deps
# bash :: required to run install-deps
# gcompat :: compatibility layer for glibc in Alpine Linux. See: https://wiki.alpinelinux.org/wiki/Running_glibc_programs
RUN rm -rf config_local test &&\
    apk add --update --no-cache curl gnupg bash gcompat &&\
    /tmp/install-deps.sh --skip-cmake --skip-json &&\
    rm /tmp/install-deps.sh &&\
    addgroup -g 8078 -S etheos && adduser -S -D -u 8078 -h /etheos etheos etheos &&\
    chown -R etheos:etheos /etheos

USER etheos:etheos

VOLUME /etheos/config_local /etheos/data

EXPOSE 8078

CMD [ "./etheos" ]

There’s a lot going on here, but hopefully the domain language of the Dockerfile is easy enough to understand with a quick read.

FROM

FROM alpine:3.17.2 AS build

As mentioned, the FROM command allows a Dockerfile to specify the source image that should be used. In the case of etheos, this is alpine:3.17.2. Alpine is a Linux distro focused on security and small runtime size. The :3.17.2 indicates which version we should use. You can ignore the AS clause for now - we’ll cover that further down.

You may see the format imageName:version in many places. The two pieces of this syntax are referred to as a “repository” (left side of the colon) and “tag” (right side of the colon). Together they identify exactly which image is being referenced.

If you are wondering where the tag is in a command like docker run -it darthchungis/etheos, docker defaults to using the latest tag if no tag is specified. So the full command is actually docker run -it darthchungis/etheos:latest. It is a good idea to always use a specific image version to avoid accidentally pulling in regressions as newer images are published.

Registries

Speaking of image names, there are semantics that allow you to pull images from other sources as well. These sources are referred to as registries. The default registry is Docker Hub (docker.io, browsable at hub.docker.com). If you specify an image repository and tag without a registry URL, Docker Hub is used. If you want to specify a separate registry (e.g. Azure Container Registry, or a self-hosted option), you can simply prefix the image name with the registry’s hostname.
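To make the naming rules concrete, here is a minimal shell sketch that splits an image reference into registry, repository, and tag. The function name parse_image_ref is made up for illustration, and the logic is a simplification of what Docker really does: it assumes the first path component is a registry only when it contains a dot or a colon (or is localhost), it uses docker.io (the hostname behind Docker Hub) as the default, and it ignores digests and the implicit library/ prefix on official images.

```shell
# Hypothetical helper: split an image reference into "registry repository tag".
# Simplified on purpose: no digest (@sha256:...) or library/ prefix handling.
parse_image_ref() (
    ref=$1
    registry=docker.io          # default registry (Docker Hub)

    # Treat the first path component as a registry only if it looks like a host.
    case $ref in
        */*)
            first=${ref%%/*}
            case $first in
                *.*|*:*|localhost)
                    registry=$first
                    ref=${ref#*/}
                    ;;
            esac
            ;;
    esac

    # A tag is a colon in the last path component; otherwise default to latest.
    last=${ref##*/}
    case $last in
        *:*) tag=${last##*:}; repo=${ref%:*} ;;
        *)   tag=latest;      repo=$ref ;;
    esac

    printf '%s %s %s\n' "$registry" "$repo" "$tag"
)

parse_image_ref darthchungis/etheos               # docker.io darthchungis/etheos latest
parse_image_ref etheos.azurecr.io/etheos:ci-test  # etheos.azurecr.io etheos ci-test
```

The second call shows why the registry prefix matters: the same repository name, etheos, resolves to a completely different place once a hostname is in front of it.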

I host a private registry for CI testing of etheos. This allows publishing an untested version of the etheos image without making it publicly available. You can see in the pipeline yaml where the image gets published, and later, consumed. Note that this image is not available for public consumption because access to the registry is managed by Azure rules.

[hosted on GitHub]
    - name: Push CI test docker image
      run: |
        az acr login -n etheos
        docker push etheos.azurecr.io/etheos:ci-test
    # ... (intervening steps omitted) ...
    - name: Pull CI test docker image
      run: |
        az acr login -n etheos
        docker pull etheos.azurecr.io/etheos:ci-test

WORKDIR

WORKDIR /build

The next command we see is WORKDIR. This command sets the current working directory for the image. Any subsequent commands in the Dockerfile such as COPY, RUN, CMD, and ENTRYPOINT (more on those later) will be executed with this directory as the current working directory.

COPY

COPY . .

The next command in the Dockerfile is COPY. This command allows us to copy files into the image’s file system for that particular layer. The syntax is COPY [host_path] [container_path]. Any files copied in this way will be available when the built image is later run as a container, at the specified target container_path. As previously mentioned, the WORKDIR sets the working directory for the remaining commands in the Dockerfile, so the line COPY . . copies everything from the build context (the repository root) recursively to the WORKDIR, or /build.

When copying files from the host, COPY is dependent on the “build context” that is set by the build command. Usually, this context is the same directory as your Dockerfile. A common pattern for single-image software projects (including etheos) is to keep a top-level Dockerfile at the root of the repository so that it is easy to copy any files generated by the build system into the image.
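As a rough illustration of how the build context and the Dockerfile location relate, here is a small wrapper around docker build. The function name build_image and its argument order are my own invention; the only Docker features it relies on are the -t and -f flags. It assumes that when no Dockerfile path is given, the file named Dockerfile at the root of the context should be used, which mirrors Docker's default behavior.

```shell
# Hypothetical wrapper: build_image TAG [CONTEXT] [DOCKERFILE]
# CONTEXT is the directory whose files COPY can see (defaults to ".").
# DOCKERFILE defaults to the Dockerfile at the root of the context.
build_image() (
    tag=$1
    context=${2:-.}
    dockerfile=${3:-$context/Dockerfile}
    docker build -t "$tag" -f "$dockerfile" "$context"
)

# Usage (not run here):
#   build_image local_demo                         # context ".", ./Dockerfile
#   build_image local_demo . docker/Dockerfile     # same context, Dockerfile elsewhere
```

Keeping the context at the repository root, even when the Dockerfile lives in a subfolder, is what lets COPY . . see the whole source tree.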

RUN

RUN apk add --update --no-cache git curl gnupg bash gcompat &&\
    ./scripts/install-deps.sh &&\
    ./build-linux.sh -c --sqlite ON --mariadb ON --sqlserver ON

The easiest and most obvious way to make meaningful changes to an existing image on a new layer is to run a command to do something. This is where the RUN command comes in - it allows us to run a shell command using the base image’s default shell. Note that the shell used by RUN may be changed via the SHELL command.

You may be wondering why, in this Dockerfile, I’ve chosen to chain commands together with && in a single RUN instruction rather than using individual instructions. Surely readability and maintainability are much better when each command is on its own line! The answer is simple: every RUN instruction creates a new layer, and each layer adds overhead in image size and in the time it takes to build and pull the image. Files deleted in a later layer still take up space in the earlier layer that created them, so installing and cleaning up in one instruction keeps the image smaller. As such, it is a common practice to combine related RUN commands. One exception is when debugging container build errors: the build cache can reuse the layers from instructions that have not changed, so separating the RUN instructions means less of the image has to be rebuilt after each change. In this case, separate RUN instructions are the preferred approach.
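To see the difference in numbers, the sketch below counts the instructions in a Dockerfile that add a filesystem layer (RUN, COPY, and ADD). The helper name count_layers and the two sample Dockerfiles are made up for this example (the samples are shortened variants of the etheos build stage), and the count is only a rough approximation because it just matches instruction names at the start of a line.

```shell
# Hypothetical helper: count the instructions that add a filesystem layer.
# Continuation lines (after a trailing backslash) are not counted again.
count_layers() {
    grep -Eci '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$1"
}

tmp=$(mktemp -d)

# One chained RUN instruction.
cat > "$tmp/chained" <<'EOF'
FROM alpine:3.17.2
RUN apk add --update --no-cache git curl &&\
    ./scripts/install-deps.sh &&\
    ./build-linux.sh -c
EOF

# The same commands split into separate RUN instructions (handy while debugging).
cat > "$tmp/split" <<'EOF'
FROM alpine:3.17.2
RUN apk add --update --no-cache git curl
RUN ./scripts/install-deps.sh
RUN ./build-linux.sh -c
EOF

chained_count=$(count_layers "$tmp/chained")
split_count=$(count_layers "$tmp/split")
rm -rf "$tmp"

echo "chained: $chained_count, split: $split_count"   # chained: 1, split: 3
```

On a real image, docker history {image} lists the layers that were actually produced.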

Generally, RUN commands install dependencies and execute container setup. In this first RUN instruction, the etheos image installs build-time dependencies and executes the linux build script.

FROM - again?

FROM alpine:3.17.2

If you recall from the first time we covered FROM, I mentioned we’d get to the AS part of the command later. Well, now is the time!

etheos uses a process called “multi-stage” builds. Multi-stage builds allow you to select multiple images as sources, and use a prior “stage” (each stage being marked by a new FROM command) as a source for files. This means we can use one stage of the Dockerfile as a build stage, selectively copying outputs to the final image. When we write FROM {image} AS {name}, we name a particular stage; this name can be used as an additional argument to COPY commands to specify the source from which files should be copied.

You can see this usage in a later COPY instruction, which takes the output of the build from the build stage of the Dockerfile.

COPY --from=build /build/install/ .
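If it helps to see where stage names come from, this small sketch lists the named stages in a Dockerfile. The helper list_stages is hypothetical, and it assumes each FROM line has the simple shape FROM {image} AS {name}, with no extra flags such as --platform in between.

```shell
# Hypothetical helper: print the name of every named stage in a Dockerfile.
# Assumes lines of the form "FROM <image> AS <name>".
list_stages() {
    awk 'toupper($1) == "FROM" && toupper($3) == "AS" { print $4 }' "$1"
}

# The two FROM lines from the etheos Dockerfile.
tmp=$(mktemp)
printf '%s\n' 'FROM alpine:3.17.2 AS build' 'FROM alpine:3.17.2' > "$tmp"
stages=$(list_stages "$tmp")
rm -f "$tmp"

echo "$stages"   # build
```

Only the first stage is named, and that name is exactly what COPY --from=build refers to. A named stage can also be built on its own with docker build --target build, which is useful for checking the build output before the final image is assembled.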

USER

USER etheos:etheos

Hopefully by this point the syntax of the Dockerfile is starting to make sense, so we’ll jump to the commands that haven’t been covered yet. Next up on the list is USER, which specifies which Linux user (and optionally, group) the eventual entrypoint and image command are executed as. By default, docker images will run as root within the container, which has implications for any files mounted into the container and is generally less secure. It is highly recommended to specify a non-root user when building a Docker image.

The user can be a ‘uid’ rather than a friendly name. In the case of etheos, we run the container as the etheos user, which is created in the RUN command of the Dockerfile immediately prior:

    addgroup -g 8078 -S etheos && adduser -S -D -u 8078 -h /etheos etheos etheos &&\
    chown -R etheos:etheos /etheos

VOLUME

VOLUME /etheos/config_local /etheos/data

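The VOLUME command marks a path (or paths) in the image as volumes. When a container is started, Docker creates an anonymous volume for each of these paths unless one is mounted explicitly via docker run -v. Files written there live outside the container’s writable layer, so they survive restarts of that container and can be reached from the host. Note that each new docker run gets fresh anonymous volumes; to reuse the same data across containers, mount a named volume or a host directory with -v.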
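Here is a sketch of running the image with named volumes mounted over the two VOLUME paths, so the same data is picked up by every container that uses those names. The function name run_with_named_volumes and the volume names etheos-config and etheos-data are assumptions for illustration; only the two container paths come from the Dockerfile.

```shell
# Hypothetical helper: run an image with named volumes on the etheos VOLUME paths.
# The volume names are made up; Docker creates them on first use.
run_with_named_volumes() {
    docker run -d \
        -v etheos-config:/etheos/config_local \
        -v etheos-data:/etheos/data \
        "$1"
}

# Usage (not run here):
#   run_with_named_volumes darthchungis/etheos:latest
```

Without the -v flags the container still gets volumes at both paths, but they are anonymous and a new pair is created on every docker run.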

EXPOSE

EXPOSE 8078

EXPOSE documents a network port that the container will be listening on. It does not publish the port by itself; making it reachable from the host still requires docker run -p {host_port}:{container_port}. Docker also provides a -P flag (note the uppercase, as opposed to -p) which will take all ports marked via the EXPOSE command and automatically bind them to random ports on the host system. This can be convenient if there are many ports in an image, although you will need to examine the port mapping (for example with docker port) to figure out which ports are available and what container ports they each map to.
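As a small example of examining the mapping, the sketch below pulls the host port out of one line of docker port output. It assumes lines of the form 8078/tcp -> 0.0.0.0:49153, which is the shape I see from docker port; the host port 49153 is invented for the example and host_port is a hypothetical helper name.

```shell
# Hypothetical helper: extract the host port from one line of `docker port` output,
# assuming the line ends in "<address>:<port>".
host_port() {
    printf '%s\n' "${1##*:}"   # drop everything up to and including the last colon
}

host_port '8078/tcp -> 0.0.0.0:49153'   # 49153

# Usage against a real container (not run here):
#   docker run -d -P --name etheos darthchungis/etheos
#   docker port etheos
```

With the port in hand, a client on the host connects to that port rather than 8078.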

CMD (and ENTRYPOINT)

CMD [ "./etheos" ]

The final command in our Dockerfile is CMD. This command sets which binary is executed by the container when it is run or started. In the case of etheos, we simply launch the server with no arguments, which is the standard way of running it.

The complement to CMD is ENTRYPOINT. When an ENTRYPOINT is set, it is the executable that the container runs, and CMD supplies its default arguments. When no ENTRYPOINT is set, as in etheos, CMD itself is the command that runs. The form matters too: the JSON-array (“exec”) form used here runs ./etheos directly, while the shell form (CMD ./etheos) would wrap it as /bin/sh -c "./etheos". In both cases the path is resolved relative to the WORKDIR set earlier, so the process that starts is /etheos/etheos.
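The way ENTRYPOINT, CMD, and extra docker run arguments combine is easier to see with a toy model. The function below is only a simulation I wrote for this post (effective_command is not a Docker feature): it treats each part as a plain string and applies the rule that arguments given after the image name on docker run replace CMD, and that the result is appended to ENTRYPOINT if one is set.

```shell
# Toy model: effective_command ENTRYPOINT CMD [RUN_ARGS]
# Prints the command line a container would end up executing (exec form only).
effective_command() (
    entrypoint=$1
    cmd=${3:-$2}               # arguments passed to `docker run` replace CMD

    if [ -z "$entrypoint" ]; then
        printf '%s\n' "$cmd"
    elif [ -z "$cmd" ]; then
        printf '%s\n' "$entrypoint"
    else
        printf '%s\n' "$entrypoint $cmd"
    fi
)

effective_command ""   "./etheos"          # ./etheos      (the etheos image: CMD only)
effective_command echo hello               # echo hello    (CMD acts as default arguments)
effective_command echo hello goodbye       # echo goodbye  (run arguments replace CMD)
```

The echo examples are generic placeholders rather than anything etheos does, but they show why images meant to be used like a command-line tool tend to set ENTRYPOINT and leave CMD for default arguments.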

Building and Pushing

Now that we understand the format of the Dockerfile, let’s move on to what we can actually do with it. The whole point is to assemble an image for redistribution, so let’s talk a bit about the ‘build’ and ‘push’ commands.

The docker build command

The build command is relatively straightforward:

docker build -t {image_tag} {context_folder}

Where -t specifies the image repository and tag, and context_folder specifies the directory to send to the build command as the build context (for things like COPY).

In the case of etheos, you can see this invoked in the build pipeline:

[hosted on GitHub]
        export FULL_VERSION="$BASE_VERSION.$VERSION_NUMBER_WITH_OFFSET"
        docker build -t darthchungis/etheos:$FULL_VERSION .

Here, the image is named darthchungis/etheos and given the $FULL_VERSION tag ($FULL_VERSION being some pipeline env var magic to expand to 0.7.1.{rev}). This way, each new build generates an image with a different tag so that prior versions do not get clobbered.
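For a concrete picture of how the tag is assembled, here is the same expansion with sample values filled in. The base version 0.7.1 comes from the text above; the revision number 42 is a made-up stand-in for whatever the pipeline computes.

```shell
# Sample values; in the real pipeline these come from the CI environment.
BASE_VERSION="0.7.1"
VERSION_NUMBER_WITH_OFFSET="42"    # hypothetical revision number

FULL_VERSION="$BASE_VERSION.$VERSION_NUMBER_WITH_OFFSET"
IMAGE="darthchungis/etheos:$FULL_VERSION"

# Print the build command instead of running it.
echo "docker build -t $IMAGE ."    # docker build -t darthchungis/etheos:0.7.1.42 .
```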

On your local system, you can see the results of an image build by running docker images. The tag you specified will be shown in the list, with a timestamp indicating it was recently built.

If a tag is not specified with a repository name (for example, building with -t local_demo), the latest tag is used as a default.

The docker push command

After an image is built, the next thing to do is share it with the world! The docker push command is the mechanism for sharing an image so it may be easily consumed by others via docker pull. The push command is relatively straightforward as well:

docker push {image_tag}

Remember that a reference to a docker image is actually three parts (two of which are often omitted):

  1. The registry name (default value: docker.io, i.e. Docker Hub)
  2. The repository name (default value: unset)
  3. The image tag (default value: latest)

In the case of etheos, publishing the image is a matter of ensuring it has both the latest tag (which is clobbered on every build) and a unique version tag that can be traced back to the pipeline that built it.

        docker tag darthchungis/etheos:$FULL_VERSION darthchungis/etheos:latest
    - name: Push latest docker image
      run: docker push --all-tags darthchungis/etheos

Because the default registry is Docker Hub, this will publish a public image to Docker Hub with the repository darthchungis/etheos and the tags latest and $FULL_VERSION.

Note that you must be authenticated to the remote registry in order to push an image. On Docker hub, your username is the first part of the repository (in my case, darthchungis). The command docker login -u darthchungis provides an interactive prompt for your password. Automated systems such as CI pipelines usually store the credentials in a secret and pass that to a specialized task to authenticate with the registry.
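For non-interactive use, one common approach is to feed the password to docker login on standard input using its --password-stdin flag. The sketch below assumes the credentials arrive in two environment variables, DOCKER_USERNAME and DOCKER_PASSWORD; those names and the function registry_login are placeholders of mine, not something the etheos pipeline necessarily uses.

```shell
# Hypothetical helper: log in to the default registry without a prompt.
# DOCKER_USERNAME and DOCKER_PASSWORD are assumed to be provided as CI secrets.
registry_login() {
    if [ -z "${DOCKER_USERNAME:-}" ] || [ -z "${DOCKER_PASSWORD:-}" ]; then
        echo "DOCKER_USERNAME and DOCKER_PASSWORD must be set" >&2
        return 1
    fi
    # Piping the password avoids exposing it in the process list or shell history.
    printf '%s' "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
}

# Usage in a pipeline step (not run here):
#   registry_login && docker push darthchungis/etheos:latest
```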

Wrapping up

That’s about it for this one. Between this explanation and part 1, you should have enough requisite knowledge to start exploring other features of Docker.

Here are some helpful documentation links:

Next I plan to cover more advanced deployment scenarios, such as docker compose and Kubernetes. Docker compose allows specification of container runs in a yaml format, which makes it very useful for setting up development environments, especially if you want to test integration with an external system (e.g. SQL Server or MySQL). Kubernetes is a full-featured, distributed, orchestration platform that provides reliability and hosting capabilities. It is extremely powerful and makes use of containers as compute units. Discussing both should provide for an interesting read.

Until next time - happy dockering!