An Introduction to Docker and Containerization

What is Docker?

Docker is both a brand and a technology. It was developed by Docker (the company, formerly known as dotCloud) at a time when the company was reportedly struggling to stay afloat. Docker (the product) not only helped it raise funds but also paved the way for its strong revival. The company later helped found the Open Container Initiative, which standardizes container image formats and runtimes.

On a Linux platform, Docker lets you run multiple containers, each of which typically holds a single application. In precise technical terms, when you run an application on an operating system, it runs in the “user space” of that system, and an OS normally comes with a single instance of this user space. With Docker, every container gets its own separate user space, so containers let us have multiple isolated user spaces on a single operating system, all sharing the same kernel. Therefore, in the simplest terms, a container is just an isolated instance of user space. That’s it!

How Is It Different From VMs?

Docker is different from a VM in the following ways:

  1. It’s very lightweight in comparison to a VM in terms of size and resource consumption. This is because a container shares the host’s kernel instead of booting its own operating system, so its image contains only the user-space files (libraries and packages) the application needs.
  2. It spins up faster than a virtual machine (the exact time depends on the application you host in the container, but usually it’s just a matter of seconds).
  3. Unlike a VM, a container is built around one main process, and when that process stops for any reason, the container exits as well. You can modify this behavior to have it run multiple processes, but then you lose the “loose coupling” between your components that containers are meant to give you. If you need that, you may be better off sticking with VMs.

Core Constructs of Docker

A Docker-based environment consists mainly of the following things:

Docker Engine

This is the main component responsible for running workloads in the form of containers. It has been offered as a Community edition and an Enterprise edition; it also has experimental features that you can switch on, which shouldn’t be used in production.

Docker Client

It ships with the Docker Engine package in the form of the docker binary and, by default, connects to the locally installed Docker Engine. You typically interact with the Docker Engine through this client.

Docker Image

A Docker image is to a container roughly what an ISO image is to a VM. A Docker image consists of multiple layers stacked on top of one another and presented as a single filesystem via union mounts. For example, the bottom layer might be the base image, the second the application layer (like Tomcat or NGINX), and the third any updates. When you start a container from an image, an additional writable layer is added on top, while the rest of the layers stay read-only.
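To see the layering for yourself, here is a minimal sketch that lists the layers of an image. It assumes the public nginx image as an example and only talks to Docker when a daemon is reachable; the function name inspect_layers is made up for this illustration.

```shell
#!/bin/sh
# inspect_layers is a hypothetical helper name used only in this sketch.
inspect_layers() {
    image="$1"
    # Download the image if it is not available locally.
    docker pull "$image"
    # Show the instructions that produced the image, newest first.
    docker history "$image"
    # Count the read-only filesystem layers recorded in the image metadata.
    docker image inspect --format '{{len .RootFS.Layers}}' "$image"
}

if docker info >/dev/null 2>&1; then
    inspect_layers nginx
else
    echo "Docker daemon not reachable; skipping."
fi
```

The writable layer mentioned above is not part of this output, since it only exists once a container has been started from the image.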

Docker Repository and Registry

A repository is a named collection of related Docker images, usually different versions of the same image distinguished by tags, and it is where an image goes when you push it. A repository lives inside a registry, so be aware that the two are different things. One well-known public Docker registry is Docker Hub.

Docker Container

A container, as explained above, is an isolated instance of user space, started from a Docker image. An important thing to note here is that, unlike a Linux system where PID 1 is assigned to init or systemd, in a container PID 1 is assigned to the command or service that it is supposed to run. When that process dies, the container exits.
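The PID 1 behavior is easy to check. The sketch below assumes a Linux system with /proc and uses the public alpine image as an example; the Docker part is skipped when no daemon is reachable.

```shell
#!/bin/sh
# On a regular Linux host, PID 1 is normally init or systemd.
# (If you run this script inside a container, it is the container's command.)
host_pid1=$(cat /proc/1/comm 2>/dev/null || echo "unknown")
echo "PID 1 on this system: $host_pid1"

if docker info >/dev/null 2>&1; then
    # Inside the container, the command we pass becomes PID 1.
    # The single quotes make $$ expand in the container's shell, not ours.
    docker run --rm alpine sh -c 'echo "PID of the container command: $$"'
    # The container lives only as long as that process: this one is gone
    # as soon as sleep returns.
    docker run --rm alpine sleep 2
else
    echo "Docker daemon not reachable; skipping the container part."
fi
```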

How Does Docker Work?

On a Linux-based OS, a container leverages existing kernel features.

Namespaces

Don’t confuse namespaces with user space; they are different things. The namespaces Docker relies on are Network, PID, IPC, User, Mount, and UTS. They allow a Docker container to have its own view of the network, process IDs, hostname, users and groups, mount points, and so on.
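As a rough illustration, the kernel exposes the namespaces a process belongs to under /proc. The sketch below assumes a Linux system and the public alpine image; the hostname demo-container is an arbitrary example value.

```shell
#!/bin/sh
# Each entry under /proc/self/ns is one namespace this process belongs to.
ls -l /proc/self/ns 2>/dev/null || echo "/proc/self/ns is not available here"

# A namespace is identified by the number in brackets, e.g. "pid:[...]".
host_pid_ns=$(readlink /proc/self/ns/pid 2>/dev/null || echo "unavailable")
echo "PID namespace on this system: $host_pid_ns"

if docker info >/dev/null 2>&1; then
    # A container gets its own PID namespace, so the identifier differs.
    docker run --rm alpine readlink /proc/self/ns/pid
    # Its UTS namespace lets it carry a hostname independent of the host's.
    docker run --rm --hostname demo-container alpine hostname
else
    echo "Docker daemon not reachable; skipping the container part."
fi
```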

CGroups

CGroups, short for Control Groups, are what allow the resources a container uses, such as CPU and memory, to be limited and accounted for, so that each container gets its own allotted share.
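Here is a small sketch of how such limits are requested on the command line, using the --memory and --cpus options of docker run. The values (256 MB, half a CPU) are arbitrary examples, and where the limit shows up inside the container is an assumption that depends on whether the host uses cgroup v2 or v1, so the script tries both paths.

```shell
#!/bin/sh
mem_limit_mb=256
# The kernel reports the memory limit in bytes.
expected_bytes=$((mem_limit_mb * 1024 * 1024))
echo "Expected memory limit in bytes: $expected_bytes"

if docker info >/dev/null 2>&1; then
    # Cap the container at 256 MB of memory and half a CPU, then read the
    # limit back from inside it (cgroup v2 path first, then the v1 path).
    docker run --rm --memory "${mem_limit_mb}m" --cpus 0.5 alpine sh -c \
        'cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
else
    echo "Docker daemon not reachable; skipping."
fi
```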

Apart from these two (namespaces and CGroups), Docker also makes use of storage drivers like AUFS, DeviceMapper, Overlay, BTRFS, and VFS. I won’t explain the differences between them or their features, to keep this article as simple as possible. Just keep in mind that the default has changed over time: older releases defaulted to DeviceMapper on an RHEL-type OS (like CentOS) and to AUFS on a Debian-based OS like Ubuntu, while recent releases generally default to overlay2 on most Linux distributions.
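Rather than guessing, you can ask your own installation which storage driver it uses. A minimal sketch, which reports "unknown" when no Docker daemon is reachable:

```shell
#!/bin/sh
storage_driver="unknown"
if docker info >/dev/null 2>&1; then
    # The Driver field of "docker info" holds the active storage driver.
    storage_driver=$(docker info --format '{{.Driver}}')
fi
echo "Storage driver: $storage_driver"
```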

What Do We Need to Run a Docker Environment?

At the very basic level you need the following two Docker components:

  1. A Docker Engine
  2. A Docker image (as appropriate)

How Do We Get or Create a Docker Image?

If you don’t have very specific requirements, you can simply find an image on Docker Hub that fulfills your needs and pull it using the Docker command line. For example, if you just need to run NGINX with default settings, you don’t need to build your own image; just pull one from Docker Hub. Keep in mind that a high star count only tells you an image is popular; images marked as official, or published by the software’s own vendor, are generally the safer choice. If you have specific requirements and need a custom image that’s not already available, then you can:

  • Pull a base image, run a container from it, do all the modifications as needed, and commit it as an image.
  • Create a Dockerfile and build an image from it. Again, I don’t want to make this article too complex, so I will just give an overview of what a Dockerfile is.
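The first option, modifying a running container and committing it, could look roughly like the sketch below. The container name, the image tag, and the function name are made up for this example, and the file path assumes the default web root of the public nginx image.

```shell
#!/bin/sh
container="custom-nginx-build"   # hypothetical container name
image="my-nginx:custom"          # hypothetical name for the new image

# build_by_commit is a hypothetical helper name used only in this sketch.
build_by_commit() {
    # Start a container from the base image.
    docker run -d --name "$container" nginx
    # Make the modification inside the running container.
    docker exec "$container" sh -c \
        'echo "Hello from a committed image" > /usr/share/nginx/html/index.html'
    # Save the container's current state as a new image.
    docker commit "$container" "$image"
    # Clean up the temporary container.
    docker rm -f "$container"
}

if docker info >/dev/null 2>&1; then
    build_by_commit
else
    echo "Docker daemon not reachable; skipping."
fi
```

This route is quick for experiments, but the steps live only in your shell history, which is one reason a Dockerfile is usually easier to maintain.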

What Is a Dockerfile?

A Dockerfile (the file name is case sensitive) is a plain text file where you write the instructions to create an image. These instructions are read one at a time, from top to bottom. They include keywords like FROM, LABEL (which replaces the deprecated MAINTAINER), RUN, CMD, and ENV. You can read more in the Dockerfile reference in the Docker documentation, and while you gain more familiarity with it, just keep two things in mind:

  • The more RUN instructions you add (these are mainly meant to provision an image), the more layers get added to the image. Recall that an image is made up of layers.
  • Only one CMD instruction takes effect per Dockerfile. If you add several, the last one overrides the others.
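To make this concrete, here is a minimal sketch that writes a small Dockerfile and builds it. It uses the instruction keywords as the Dockerfile reference spells them (CMD and ENV). The image tag, the label value, and the greeting text are placeholders invented for this example, and the build is skipped when no Docker daemon is reachable.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing else ends up in the build context.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps $GREETING from being expanded by this shell.
cat > "$workdir/Dockerfile" <<'EOF'
# Base image to build on.
FROM nginx
# Metadata; LABEL replaces the deprecated MAINTAINER instruction.
LABEL maintainer="you@example.com"
# Environment variable, visible at build time and in the container.
ENV GREETING="Hello from a custom image"
# A single RUN instruction, so it adds a single layer.
RUN echo "$GREETING" > /usr/share/nginx/html/index.html
# The one command the container runs as its main process.
CMD ["nginx", "-g", "daemon off;"]
EOF

if docker info >/dev/null 2>&1; then
    docker build -t my-nginx:dockerfile "$workdir"
else
    echo "Docker daemon not reachable; Dockerfile written to $workdir"
fi
```

Chaining several shell commands with && inside one RUN instruction is a common way to keep the number of layers down.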

Alright, let’s take a look at what Docker commands look like.

Docker Command Examples

  • To pull an image from the default registry (Docker Hub), use: docker pull [NAME_OF_THE_IMAGE]
  • To run a container using the downloaded image, run: docker run -d [NAME_OF_THE_IMAGE] [COMMAND]. By the way, this command will automatically pull the image if it doesn’t already exist on the local Docker host.
  • The -d parameter detaches you from the container and returns you to the host’s shell. If you omit [COMMAND], the container runs the default command defined in the image, and once that command finishes, the container exits (hope you still remember that too).
  • To search Docker Hub for an image, use: docker search [ANY_STRING]
  • To view all the locally available images, run: docker images
  • To view just the running containers, use: docker ps
  • To view all containers, including stopped ones, run: docker ps -a
  • To remove a container, run: docker rm [CONTAINER_ID/NAME]
  • To remove an image, run: docker rmi [IMAGE_NAME]
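Put together, a typical session with these commands might look like the sketch below. The container name demo-nginx and the function name are made up for this example, nginx is just a convenient public image, and docker stop is one extra command that the list above doesn’t cover.

```shell
#!/bin/sh
image="nginx"
container="demo-nginx"   # hypothetical container name

# lifecycle_tour is a hypothetical helper name used only in this sketch.
lifecycle_tour() {
    docker pull "$image"                          # fetch the image from Docker Hub
    docker run -d --name "$container" "$image"    # start it in the background
    docker ps                                     # the container shows up as running
    docker stop "$container"                      # stop its main process
    docker ps -a                                  # now listed only with -a, as exited
    docker rm "$container"                        # remove the stopped container
    docker rmi "$image"                           # remove the image itself
}

if docker info >/dev/null 2>&1; then
    lifecycle_tour
else
    echo "Docker daemon not reachable; skipping."
fi
```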

I could go on and on and on but would prefer to stop here. For a full list of commands, go to the Docker command documentation.

How Docker Is Used in the Real World

You cannot rely on Docker alone to handle your workloads, especially production ones. You need to have a scheduling and orchestration solution in place for a containerized environment. Some of the most popular container orchestration solutions include:

  • Kubernetes, originally from Google
  • Elastic Container Service (formerly EC2 Container Service) from AWS (a managed service)
  • Marathon, which runs on Apache Mesos
  • Docker Swarm from Docker (mainly a solution for clustering Docker hosts)

Which one you should use depends mainly on your business and workload needs and on how familiar you are with each tool.

I’d prefer Kubernetes because it grew out of more than a decade of Google’s experience running containerized workloads, and because I’m more familiar with it. However, a business or organization runs according to its own needs, and one should be ready to understand and respect that fact and work accordingly.