This is a series of blog posts about Docker and Kubernetes. I'm not 100% sure about the structure yet, so it may be a bit fragmented 🐳. Anyway, here we go!
When I search for Docker basics, I mostly find tutorials that teach you to create and run an image from scratch. But that's not what we want here. I feel it's important to understand the basic concepts before you start building something.
So what is Docker?
Docker is a containerisation system. It allows multiple self-contained applications to run on a single server. What does that mean? With Docker, each application is isolated from the others and gets its own virtualised view of an operating system (so on Windows you can run a Linux-based Docker container, backed by a lightweight Linux VM). Note that inside one Docker container you can run multiple applications, as long as they target the same operating system (so two applications requiring Linux and Windows respectively cannot run in one container).
Why would we need Docker?
- Provides a robust way to bundle, run and test services and their dependencies, which can easily be shared across an organisation and shipped across different environments, since a Docker image is fully self-sufficient: it carries all the dependencies it needs to run.
- Requires fewer resources compared to e.g. virtual machines, as a container does not ship a full operating system of its own (it shares the host's kernel and only virtualises the layers above it). Also, you can bundle multiple applications together in one container.
- Improves scalability and development efficiency. With Docker, developers no longer need to check dependency versions and install them one by one; they can just wrap a whole application inside a container and pass it around. No more "It works on my local machine!"
So what is inside Docker Ecosystem?
When we say "Docker" we might be referring to a lot of things: Docker images, Docker Compose, etc. Really, it's just a platform, or ecosystem, around creating and running containers. So here are a few key concepts:
- Docker Engine: It acts as a client-server application, with a server (a long-running daemon process) and a client that takes user input from the CLI and communicates with the daemon.
- Docker Machine: It allows you to install Docker Engine on virtual hosts and manage those hosts across different environments, including virtual machines on your local system, on cloud providers, or on physical computers. For example, AWS or DigitalOcean in the cloud.
- Docker Image: A single-file template that defines the steps and configuration needed to run a container. An image is defined by its Dockerfile.
- Docker Container: Docker containers are created from Docker images. You can view a container as an instance created from an image.
- Docker Hub: A cloud platform where you can upload your images and share them with others. (A bit like GitHub for Git repositories.)
- Docker Compose: A way to run multi-container applications — a collection of images.
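To make the image/container relationship concrete, here is a minimal, hypothetical Dockerfile sketch (the `app.py` script and image contents are made up for illustration; `python:3.12-slim` is an official base image):

```dockerfile
# Start from an official Python base image (a read-only layer)
FROM python:3.12-slim

# Each instruction below adds a new layer on top of the base
WORKDIR /app
COPY app.py .

# The command the container runs when started
CMD ["python", "app.py"]
```

Building this file with `docker build -t my-app .` produces an image; `docker run my-app` then creates a container from that image.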
You may wonder what we mean by "server" vs "client" in Docker Engine. Well, let me give you an example. Let's say we have a Docker image "hello-world" that simply prints "hello world" to the terminal.
So what's the process of running this application?
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the “hello-world” image from the Docker Hub. (amd64)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
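All four steps are kicked off by a single CLI command (this needs Docker installed and its daemon running):

```shell
# The client sends this request to the daemon, which pulls the image
# from Docker Hub (if not already cached), creates a container from it,
# runs it, and streams the output back to your terminal
docker run hello-world
```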
Now you understand the process. But what is happening under the hood?
Docker takes advantage of several features of the Linux kernel to build its core. So we need to understand two important concepts that Docker uses to run independent application processes: control groups (cgroups) and namespaces.
Each Docker container starts from a clean-slate view of a Linux operating system. cgroups (control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. Docker relies on cgroups to control and enforce resource limits.
For example, the cpu cgroup manages system CPU time and usage, while the memory cgroup manages memory accounting, limits and notifications.
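On any Linux system you can see which cgroups a process belongs to by reading `/proc/<pid>/cgroup`; Docker also exposes cgroup limits as `docker run` flags. A small sketch, assuming a Linux host (the `my-app` image name is hypothetical):

```shell
# Show the cgroup membership of the current shell process
cat /proc/self/cgroup

# Hypothetical example: cap a container at half a CPU core and 256 MB of RAM
# (needs a running Docker daemon, so it is commented out here)
# docker run --cpus=0.5 --memory=256m my-app
```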
In contrast, while cgroups manage hardware resources, namespaces virtualise and partition system resources. When you run a container, Docker creates a set of namespaces for that container. Each aspect of a container runs in a separate namespace, and its access is limited to that namespace.
Docker Engine uses namespaces such as the following on Linux:
- pid namespace: Process isolation (PID: Process ID).
- net namespace: Managing network interfaces (NET: Networking).
- ipc namespace: Managing access to IPC resources (IPC: Inter-Process Communication).
- mnt namespace: Managing filesystem mount points (MNT: Mount).
- uts namespace: Isolating kernel and version identifiers (UTS: Unix Timesharing System).
Many of these namespaces are organised as trees. For example, the process tree of the PID namespace marks the first process, PID 1, as its root, and the filesystem tree organises the filesystem from the root directory /. With all these namespaces, Docker helps you achieve isolation to the level you require.
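You can see the namespaces the current process lives in under `/proc/self/ns`; each entry corresponds to one of the namespace types above (Linux only):

```shell
# Each entry is a symlink naming a namespace type and its inode id,
# e.g. pid:[4026531836] — you should spot pid, net, ipc, mnt and uts here
ls -l /proc/self/ns
```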
Now that we understand how Docker manages independent services, let's look at how Docker builds and stores images, and how containers use these images:
Docker uses a copy-on-write technique and a union file system for both images and containers to optimise resources and speed performance. Multiple copies of an entity share the same instance and each one makes only specific changes to its unique layer.
In Docker, each image consists of a list of read-only filesystem layers, built from the instructions in its Dockerfile, that together serve as the base image.
When a container is created from an image, a writable layer is added on top of the base image. All additions, modifications and deletions are made in this layer, and the base image stays untouched.
Docker uses storage drivers to manage the contents of the image layers and the writable container layer. Each storage driver handles the implementation differently, but all drivers use stackable image layers and the copy-on-write (CoW) strategy.
Copy on Write Strategy
Copy-on-write is a strategy of sharing and copying files for maximum efficiency. If a file or directory exists in a lower layer within the image, and another layer (including the writable layer) needs read access to it, it just uses the existing file. The first time another layer needs to modify the file (when building the image or running the container), the file is copied into that layer and modified. This minimizes I/O and the size of each of the subsequent layers. These advantages are explained in more depth below.
Docker makes use of copy-on-write technology with both images and containers. This CoW strategy optimizes both image disk space usage and the performance of container start times.
One implementation of CoW is the snapshot filesystem. This means that a given file or directory is only copied to the container’s writable layer when it is modified or deleted by that container.
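Two commands make the layering visible (both need a running Docker daemon; the `my-app` image and `my-container` names are hypothetical):

```shell
# List the read-only layers an image is built from,
# roughly one line per Dockerfile instruction
docker image history my-app

# Show what a running container has added (A), changed (C)
# or deleted (D) in its writable layer relative to the image
docker diff my-container
```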
Union file systems
Union file systems complement the copy-on-write strategy to make image builds more efficient. Docker uses union file systems in conjunction with copy-on-write techniques to provide the building blocks for containers, making them very lightweight and fast. Docker Engine can use multiple UnionFS variants, including AUFS, btrfs, vfs, and DeviceMapper.
Union file systems implement a union mount and operate by creating layers/branches, which means that it layers multiple directories on a single Linux host and presents them as a single directory. These directories are called branches in AUFS terminology, and layers in Docker terminology.
On AUFS, for example, each image layer, and the container layer, are represented on the Docker host as subdirectories within /var/lib/docker/. The union mount then provides the unified view of all layers; the directory names do not directly correspond to the IDs of the layers themselves.
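The idea of a union mount can be sketched with the Linux overlay filesystem directly (this is an illustration, not what Docker runs verbatim; it needs root and a kernel with overlayfs, and the directory names are made up):

```shell
# Two read-only "image" layers plus a writable "container" layer
mkdir -p lower1 lower2 upper work merged

# Present all layers as one directory at ./merged;
# writes go to ./upper, the lower layers stay read-only
sudo mount -t overlay overlay \
  -o lowerdir=lower1:lower2,upperdir=upper,workdir=work merged
```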
So now you understand how a Docker container can be deleted with no trace left behind. But what if you have a database container with some example data in it? When that container gets deleted, your data is lost as well. What can you do to persist the data?
In order to be able to save (persist) data and also to share data between containers, Docker came up with the concept of volumes. Quite simply, volumes are directories (or files) that are outside of the default Union File System and exist as normal directories and files on the host filesystem.
So you can say volumes exist outside the lifecycle of a given container. There are several ways of persisting data in Docker other than volumes, but volumes are normally the best way to do so.
- Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux).
- Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
- tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
With volumes, you can:
- Share data among multiple running containers
- Store and migrate your data to the cloud
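A typical volume workflow looks like this (requires a running Docker daemon; the volume name is illustrative, and `postgres` is the official PostgreSQL image, which keeps its data under /var/lib/postgresql/data):

```shell
# Create a named volume managed by Docker
docker volume create app-data

# Mount it into a container; data written to the mounted path
# survives even if the container is removed
docker run -d -v app-data:/var/lib/postgresql/data postgres

# The volume is still listed after the container is gone
docker volume ls
```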
That’s it for today!
🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋🐬 🐳 🐋