From 0 to Docker
Recently, I took some time to learn about Docker.
Terminology
A single Docker image can be reused to provision any number of containers. An image can be thought of like a DVD: you can use it on multiple devices and the same movie plays.
A Dockerfile is a text document with the sequential instructions necessary to build a Docker image.
An application along with its dependencies is packaged in a container. A container is a running, paused or stopped instance of an image. Although you can run more than one process in a container, containers are designed to run a single process. They are also designed to be ephemeral, and the data stored in them is inherently ephemeral as well.
A registry is a repository for images. The most popular example of a registry is Docker Hub. I think of Docker Hub like I think of the NPM registry. Instead of NPM packages you can pull container images.
A tag allows you to add a unique version identifier to an image name.
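For example, pulling the official ubuntu image at a specific version tag instead of the default latest tag:
$ docker pull ubuntu:20.04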
Install
While experimenting with Docker I’ve been using WSL2 on my Windows 10 machine to run Ubuntu 20.04. These are the installation steps I’ve followed to get Docker up and running.
Make sure to start the docker service before running the sample image.
$ sudo service docker start
$ sudo docker run hello-world
This runs one instance (container) of the `hello-world` image.
I followed the Linux post install steps to avoid having to run `sudo` every time I run a docker command.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Command-line usage
Container management
You can use the container’s name in many (maybe all?) of these commands. For the sake of brevity I’ll use `container` when I mean `container_id` or `container_name`.
To provision a container use the `docker run` command:
$ docker run image
To name your container instead of using the randomly generated name:
$ docker run --name cool_name image
Note: You can’t have 2 containers deployed with the same name.
Use the `-e` option to add environment variables.
$ docker run -e MY_VAR=myValue image
Containers are isolated, so you need to explicitly expose them to the outside world. With the `docker run` command we can leverage the `--expose` option to expose a port. You can also map, or publish, an exposed port to a port on your host system with the `-P` option or with `-p host_port:container_port`. I had to use `-P` in order to properly connect to my Ubuntu localhost from my Windows browser when using nginx.
$ docker run -P nginx
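Here’s a sketch of the explicit mapping, assuming you want the container’s port 80 reachable on an arbitrarily chosen host port 8080:
$ docker run -p 8080:80 nginx
With this, http://localhost:8080 on the host reaches nginx inside the container.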
See which containers are running:
$ docker ps
See all containers (running, paused and stopped):
$ docker ps -a
You can use the smallest number of characters that uniquely identifies a container. For example, if you’re only running one container and it has an ID of `d64c42228850`, you can just use `d` to identify the container when running docker commands.
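For instance, stopping that container by its one-character prefix:
$ docker stop d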
Get detailed info about a container:
$ docker inspect container
Some other useful commands for managing containers are listed below.
$ docker pause container
$ docker unpause container
$ docker stop container
Similar to `docker stop`, the `docker kill` command will stop the container, but it kills the process instead of gracefully shutting it down.
$ docker kill container
Stopped containers remain in the system and take up space. To remove a container (it must be stopped before you can remove it):
$ docker rm container
To automatically remove the container when it exits:
$ docker run --rm image
To stop all running containers in bash:
$ docker stop $(docker ps -q)
To remove all exited containers in bash:
$ docker rm $(docker ps -q -f status=exited)
If all containers are stopped you can remove them all with one command.
$ docker rm $(docker ps -aq)
The `exec` command executes a command against a running container.
$ docker exec container ls /
To get into an interactive bash shell inside a container:
$ docker exec -it container /bin/bash
This is nice for testing out commands against a container before adding them to a Dockerfile, or for debugging a running container. We can use the `exit` command or Ctrl + d to disconnect.
To inspect container logs:
$ docker logs container
We can also add `-f` to tail the logs.
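For example, to stream new log output as it’s written:
$ docker logs -f container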
Image management
To list local images:
$ docker images
Get new images:
$ docker pull image_name
To remove docker images:
$ docker rmi image_name
Note: to remove an image, all associated containers (running, paused or stopped) must first be removed.
To tag images:
$ docker tag image new_image
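For example, assuming a local image named hello_world (built later in this post) and a hypothetical Docker Hub username my_user, tagging the image for that account with a version:
$ docker tag hello_world my_user/hello_world:1.0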
To push images to a repository:
$ docker login
$ docker push image
Persisting Data
Volume Mapping
We can map a container directory to a directory on the host machine. This will allow data saved to the specified directory in the container to remain in the directory on the host machine when the container is removed. Likewise, changes made on the host machine will be reflected in the container.
$ docker run -itd -v ~/local_relative_dir:/container_dir ubuntu
Note: I had to use the absolute path to get this to work properly with WSL 2.
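For example, with a hypothetical home directory of /home/me:
$ docker run -itd -v /home/me/local_dir:/container_dir ubuntu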
Docker Volumes
Another option is to use volume mounts that are managed by Docker. Docker Volumes are the preferred way to persist data. You can mount more than one volume inside a container, and a single volume can be mounted into any number of containers.
To create a volume:
$ docker volume create my_volume
To list volumes:
$ docker volume ls
To map a volume:
$ docker run -itd -v my_volume:/container_dir ubuntu
To find the path to the volume on your local host machine:
$ docker volume inspect my_volume
In the resulting JSON, the value of “Mountpoint” is the path to the volume. This path can be configured.
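The output looks roughly like this (trimmed to the relevant fields; exact output varies by Docker version):
[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/my_volume/_data",
        "Name": "my_volume"
    }
]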
Dockerfiles
A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
There are several Dockerfile instruction options.
- `FROM` - Specify the image to base your image on. Multiple FROM instructions designate a multi-stage docker build.
- `RUN` - Run a Linux command.
- `ADD` - Copy files into the container from the host machine or from a remote URL.
- `ENV` - Set an environment variable.
- `EXPOSE` - Expose a port.
- `WORKDIR` - Set the working directory.
- `USER` - Set the user or group to use when running the container.
- `VOLUME` - Create a mount point with the specified name and mark it as holding externally mounted volumes from the host or other containers.
- `ENTRYPOINT` - Set how to enter, or how to start, the application.
- `CMD` - Set the default command for running a container. It can also be used to append arguments to the ENTRYPOINT command if one exists. If you list more than one CMD, only the last one takes effect. An inline CMD overrides the default in the Dockerfile (`docker run image [CMD]`). See the sketch after this list.
- `LABEL` - Add metadata.
See the Dockerfile reference documentation for a full list of all the options.
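Here’s a minimal sketch of the ENTRYPOINT/CMD interplay (the entry_demo image name is my own):
FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["default message"]
Building and running it shows the override behavior:
$ docker build -t entry_demo .
$ docker run entry_demo
default message
$ docker run entry_demo hi there
hi there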
Once we have our Dockerfile we can use `docker build` to create an automated build that executes the Dockerfile’s command-line instructions in succession. If you run the build command more than once, it will only build the delta between build runs. However, you can utilize the `--no-cache` flag to build everything from scratch instead.
Here’s a simple example of building an image using a Dockerfile.
Script file (hello-world.sh)
#!/bin/bash
echo "Hey! We did some docker stuff!"
Dockerfile
FROM ubuntu
RUN apt update -y && apt upgrade -y
WORKDIR /script
COPY hello-world.sh .
RUN chmod +x hello-world.sh
CMD ./hello-world.sh
$ docker build -t hello_world .
$ docker run hello_world
Performance
Multi-stage builds are useful for ending up with smaller image sizes. Here’s an excerpt from the Docker docs for multi-stage builds:
One of the most challenging things about building images is keeping the image size down. Each instruction in the Dockerfile adds a layer to the image, and you need to remember to clean up any artifacts you don’t need before moving on to the next layer. To write a really efficient Dockerfile, you have traditionally needed to employ shell tricks and other logic to keep the layers as small as possible and to ensure that each layer has the artifacts it needs from the previous layer and nothing else.
It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”. Maintaining two Dockerfiles is not ideal.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image. To show how this works, let’s adapt the Dockerfile from the previous section to use multi-stage builds.
Here’s an example of cutting down image size using a multi-stage build.
From 373MB:
FROM golang:alpine
WORKDIR /fizzbuzz
COPY . .
RUN go build -v -o FizzBuzzApp
CMD ["./FizzBuzzApp"]
to 7.64MB:
FROM golang:alpine AS build-step
WORKDIR /fizzbuzz
COPY . .
RUN go build -v -o FizzBuzzApp

FROM alpine
WORKDIR /app
COPY --from=build-step /fizzbuzz/FizzBuzzApp .
CMD ["./FizzBuzzApp"]
In addition to multi-stage builds, we can leverage layers to boost performance. Layers are the intermediate filesystem changes that make up an image; each instruction in a Dockerfile creates a new layer. Layers can be cached and reused for a performance benefit.
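Here’s a sketch of ordering instructions to exploit the layer cache, continuing the Go example above (the go.mod and go.sum file names are assumptions about the project). Copying the module files and downloading dependencies before copying the rest of the source means those layers are rebuilt only when the module files change, not on every source edit.
FROM golang:alpine
WORKDIR /fizzbuzz
# These layers stay cached until go.mod or go.sum change
COPY go.mod go.sum ./
RUN go mod download
# Source edits only invalidate the layers from here down
COPY . .
RUN go build -v -o FizzBuzzApp
CMD ["./FizzBuzzApp"]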
Networking
Network management
$ docker network ls
$ docker network create my_network
$ docker network inspect my_network
$ docker network rm my_network
You can’t remove a network with running containers associated with it.
Containers cannot communicate across networks, only within them. To run a container within a specified network:
$ docker run -d --network=my_network nginx
Note: Since the container ID is also the hostname of the container, you can use it to ping the container, etc.
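For example (the container names are my own), two containers on the same user-defined network can reach each other by name:
$ docker run -d --name web --network=my_network nginx
$ docker run --rm --network=my_network alpine ping -c 1 web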
Network types
Containers can communicate via IP addresses or hostnames within a network (unless they have a network type of `none`).
- Bridge - The default network type, created by Docker to allow connections between containers and the host machine.
- Host - The container uses the network of the host machine directly, with no port mapping required.
- None - No networking outside of the container, for true isolation.
Docker Compose
Docker Compose is used for managing more than one container.
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
This hands-on example voting application uses Docker Compose.
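Here’s a minimal sketch of a docker-compose.yml (the service name and port choice are my own), defining a single nginx service published on a host port:
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"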
To get all the images defined in the `docker-compose.yml` file:
$ docker-compose pull
To build everything in the `docker-compose.yml` file:
$ docker-compose build
To start Compose and run the entire application:
$ docker-compose up
To run in background:
$ docker-compose up -d
Some other useful commands:
$ docker-compose ps
$ docker-compose logs -f
$ docker-compose down
Note: The `down` command also removes containers.
Compose won’t help with things like load balancing; that’s where orchestrators come in. Container orchestration is the automated process of managing and scheduling the work of individual containers for applications.
Common Orchestrators
- Kubernetes
- Amazon ECS
- Docker Swarm
- Red Hat Openshift
Docker-izing an existing application
The general strategy for taking an existing application and incorporating Docker looks something like this.
- Identify which base image you need
- Identify which tools/apps/runtimes to install in the container
- Identify what you’ll need to copy to the container (if anything)
- Get the app working
- Then worry about data persistence
- Identify opportunities for configuration (env variables, config file(s), scripts, etc)
- Optimize
  - Make the image as small as possible
  - Build time optimization (via cached layering, etc)
- Add logging
Here’s an example of an existing application and its docker-ized implementation.