From 0 to Docker
Recently, I took some time to learn about Docker.
Terminology
A single Docker image can be reused to provision any number of containers. An image can be thought of like a DVD: you can use it on multiple devices and the same movie plays.
A Dockerfile is a text document with the sequential instructions necessary to build a Docker image.
An application along with its dependencies is packaged in a container. A container is a running, paused or stopped instance of an image. Although you can run more than one process in a container, containers are designed to run a single process. They are also designed to be ephemeral, and the data stored in them is inherently ephemeral as well.
A registry is a repository for images. The most popular example of a registry is Docker Hub. I think of Docker Hub like I think of the NPM registry. Instead of NPM packages you can pull container images.
A tag allows you to add a unique version identifier to an image name.
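For example, pulling the official ubuntu image at a specific version tag instead of the default latest tag:
$ docker pull ubuntu:20.04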
Install
While experimenting with Docker I’ve been using WSL2 on my Windows 10 machine to run Ubuntu 20.04. These are the installation steps I’ve followed to get Docker up and running.
Make sure to start the docker service before running the sample image.
$ sudo service docker start
$ sudo docker run hello-world
This runs one instance (container) of the `hello-world` image.
I followed the Linux post install steps to avoid having to run `sudo` every time I run a docker command.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Command-line usage
Container management
You can use the container’s name in many (maybe all?) of these commands. For the sake of brevity I’ll use `container` when I mean `container_id` or `container_name`.
To provision a container use the `docker run` command:
$ docker run image
To name your container instead of using the randomly generated name:
$ docker run --name cool_name image
Note: You can’t have 2 containers deployed with the same name.
Use the `-e` option to add environment variables.
$ docker run -e MY_VAR=myValue image
Containers are isolated, so you need to explicitly expose them to the outside world. With the `docker run` command we can leverage the `--expose` option to expose a port. You can also map, or publish, an exposed port to a port on your host system with the `-P` option or with `-p host_port:container_port`. I had to use `-P` in order to properly connect to my Ubuntu localhost from my Windows browser when using nginx.
$ docker run -P nginx
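Here’s a sketch of the explicit mapping, assuming you want the container’s port 80 reachable on an arbitrarily chosen host port 8080:
$ docker run -p 8080:80 nginx
With this, http://localhost:8080 on the host reaches nginx inside the container.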
See which containers are running:
$ docker ps
See all containers (running, paused and stopped):
$ docker ps -a
You can use the smallest number of characters that uniquely identifies a container. For example, if you’re only running one container and it has an ID of `d64c42228850`, you can just use `d` to identify the container when running docker commands.
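For instance, stopping that container by its one-character prefix:
$ docker stop d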
Get detailed info about a container:
$ docker inspect container
Some other useful commands for managing containers are listed below.
$ docker pause container
$ docker unpause container
$ docker stop container
Similar to `docker stop`, the `docker kill` command will stop the container, but it kills the process instead of gracefully shutting it down.
$ docker kill container
Stopped containers remain in the system and take up space. To remove a container (it must be stopped before you can remove it):
$ docker rm container
To automatically remove the container when it exits:
$ docker run --rm image
To stop all running containers in bash:
$ docker stop $(docker ps -q)
To remove all exited containers in bash:
$ docker rm $(docker ps -q -f status=exited)
If all containers are stopped you can remove them all with one command.
$ docker rm $(docker ps -aq)
The `exec` command executes a command against a running container.
$ docker exec container ls /
To get into an interactive bash shell inside a container:
$ docker exec -it container /bin/bash
This is nice for testing out commands against a container before adding them to a Dockerfile, or for debugging a running container. We can use the `exit` command or Ctrl + d to disconnect.
To inspect container logs:
$ docker logs container
We can also add `-f` to tail the logs.
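For example, to stream new log output as it’s written:
$ docker logs -f container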
Image management
To list local images:
$ docker images
Get new images:
$ docker pull image_name
To remove docker images:
$ docker rmi image_name
Note: to remove an image, all associated containers (running, paused or stopped) must first be removed.
To tag images:
$ docker tag image new_image
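For example, assuming a local image named hello_world (built later in this post) and a hypothetical Docker Hub username my_user, tagging the image for that account with a version:
$ docker tag hello_world my_user/hello_world:1.0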
To push images to a repository:
$ docker login
$ docker push image
Persisting Data
Volume Mapping
We can map a container directory to a directory on the host machine. This will allow data saved to the specified directory in the container to remain in the directory on the host machine when the container is removed. Likewise, changes made on the host machine will be reflected in the container.
$ docker run -itd -v ~/local_relative_dir:/container_dir ubuntu
Note: I had to use the absolute path to get this to work properly with WSL 2.
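For example, with a hypothetical home directory of /home/me:
$ docker run -itd -v /home/me/local_dir:/container_dir ubuntu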
Docker Volumes
Another option is to use volume mounts that are managed by Docker. Docker Volumes are the preferred way to persist data. You can mount more than one volume inside a container, and a single volume can be mounted into any number of containers.
To create a volume:
$ docker volume create my_volume
To list volumes:
$ docker volume ls
To map a volume:
$ docker run -itd -v my_volume:/container_dir ubuntu
To find the path to the volume on your local host machine:
$ docker volume inspect my_volume
In the resulting JSON, the value of “Mountpoint” is the path to the volume. This path can be configured.
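The output looks roughly like this (trimmed to the relevant fields; exact output varies by Docker version):
[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/my_volume/_data",
        "Name": "my_volume"
    }
]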
Dockerfiles
A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
There are several Dockerfile instruction options.
- `FROM` - Specify the image to base your image on. Multiple FROM instructions designate a multi-stage docker build.
- `RUN` - Run a Linux command.
- `ADD` - Copy files into the container from the host machine or from a remote URL.
- `ENV` - Set an environment variable.
- `EXPOSE` - Expose a port.
- `WORKDIR` - Set the working directory.
- `USER` - Set the user or group to use when running the container.
- `VOLUME` - Create a mount point with the specified name and mark it as holding externally mounted volumes from the host or other containers.
- `ENTRYPOINT` - Set how to enter, or how to start, the application.
- `CMD` - Set the default command for running a container. It can also be used to append arguments to the ENTRYPOINT command if one exists. If you list more than one CMD, only the last one takes effect. An inline CMD overrides the default in the Dockerfile (`docker run image [CMD]`). See the sketch after this list.
- `LABEL` - Add metadata.
See the Dockerfile reference documentation for a full list of all the options.
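Here’s a minimal sketch of the ENTRYPOINT/CMD interplay (the entry_demo image name is my own):
FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["default message"]
Building and running it shows the override behavior:
$ docker build -t entry_demo .
$ docker run entry_demo
default message
$ docker run entry_demo hi there
hi there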
Once we have our Dockerfile we can use `docker build` to create an automated build that executes the Dockerfile’s command-line instructions in succession. If you run the build command more than once, it will only build the delta between build runs. However, you can utilize the `--no-cache` flag to build everything from scratch instead.
Here’s a simple example of building an image using a Dockerfile.
Script file (hello-world.sh)
#!/bin/bash
echo "Hey! We did some docker stuff!"
Dockerfile
FROM ubuntu
RUN apt update -y && apt upgrade -y
WORKDIR /script
COPY hello-world.sh .
RUN chmod +x hello-world.sh
CMD ./hello-world.sh
$ docker build -t hello_world .
$ docker run hello_world
Performance
Multi-stage builds are useful for ending up with smaller image sizes. Here’s an excerpt from the Docker docs for multi-stage builds:
One of the most challenging things about building images is keeping the image size down. Each instruction in the Dockerfile adds a layer to the image, and you need to remember to clean up any artifacts you don’t need before moving on to the next layer. To write a really efficient Dockerfile, you have traditionally needed to employ shell tricks and other logic to keep the layers as small as possible and to ensure that each layer has the artifacts it needs from the previous layer and nothing else.
It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”. Maintaining two Dockerfiles is not ideal.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image. To show how this works, let’s adapt the Dockerfile from the previous section to use multi-stage builds.
Here’s an example of cutting down image size using a multi-stage build.
From 373MB:
FROM golang:alpine
WORKDIR /fizzbuzz
COPY . .
RUN go build -v -o FizzBuzzApp
CMD ["./FizzBuzzApp"]
to 7.64MB:
FROM golang:alpine AS build-step
WORKDIR /fizzbuzz
COPY . .
RUN go build -v -o FizzBuzzApp

FROM alpine
WORKDIR /app
COPY --from=build-step /fizzbuzz/FizzBuzzApp .
CMD ["./FizzBuzzApp"]
In addition to multi-stage builds, we can leverage layers to boost performance. Layers are the intermediate filesystem changes that make up an image; each instruction in a Dockerfile creates a new layer. Layers can be cached and reused for a performance benefit.
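Here’s a sketch of ordering instructions to exploit the layer cache, continuing the Go example above (the go.mod and go.sum file names are assumptions about the project). Copying the module files and downloading dependencies before copying the rest of the source means those layers are rebuilt only when the module files change, not on every source edit.
FROM golang:alpine
WORKDIR /fizzbuzz
# These layers stay cached until go.mod or go.sum change
COPY go.mod go.sum ./
RUN go mod download
# Source edits only invalidate the layers from here down
COPY . .
RUN go build -v -o FizzBuzzApp
CMD ["./FizzBuzzApp"]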
Networking
Network management
$ docker network ls
$ docker network create my_network
$ docker network inspect my_network
$ docker network rm my_network
You can’t remove a network with running containers associated with it.
Containers cannot communicate across networks, only within them. To run a container within a specified network:
$ docker run -d --network=my_network nginx
Note: Since the container ID is also the hostname of the container, you can use it to ping the container, etc.
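For example (the container names are my own), two containers on the same user-defined network can reach each other by name:
$ docker run -d --name web --network=my_network nginx
$ docker run --rm --network=my_network alpine ping -c 1 web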
Network types
Containers can communicate via IP addresses or hostnames within a network (unless they have a network type of `none`).
- Bridge - The default network type, created by Docker to allow connections between containers and the host machine.
- Host - The container uses the network of the host machine directly, with no port mapping required.
- None - No networking outside of the container, for true isolation.
Docker Compose
Docker Compose is used for managing more than one container.
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
This hands-on example voting application uses Docker Compose.
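Here’s a minimal sketch of a docker-compose.yml (the service name and port choice are my own), defining a single nginx service published on a host port:
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"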
To get all the images defined in the `docker-compose.yml` file:
$ docker-compose pull
To build everything in the `docker-compose.yml` file:
$ docker-compose build
To start Compose and run the entire application:
$ docker-compose up
To run in background:
$ docker-compose up -d
Some other useful commands:
$ docker-compose ps
$ docker-compose logs -f
$ docker-compose down
Note: The `down` command also removes containers.
Compose won’t help with things like load balancing; that’s where orchestrators come in. Container orchestration is the automated process of managing and scheduling the work of individual containers for applications.
Common Orchestrators
- Kubernetes
- Amazon ECS
- Docker Swarm
- Red Hat Openshift
Docker-izing an existing application
The general strategy for taking an existing application and incorporating Docker looks something like this.
- Identify which base image you need
- Identify which tools/apps/runtimes to install in the container
- Identify what you’ll need to copy to the container (if anything)
- Get the app working
- Then worry about data persistence
- Identify opportunities for configuration (env variables, config file(s), scripts, etc)
- Optimize
  - Make the image as small as possible
  - Build time optimization (via cached layering, etc)
- Add logging
Here’s an example of an existing application and its docker-ized implementation.