Understanding Basic Kubernetes Concepts I - An Introduction To Pods, Labels, and Replicas
This post is the first in a series of blog posts about basic Kubernetes concepts. In the second post we talk about Deployments. The third post explains the Services concept and in the fourth we look at Secrets and ConfigMaps. In the fifth and final post we talk about Daemon Sets and Jobs.
There are lots of introductions and tutorials out there that help you get started with Kubernetes. However, most of them focus on showing how the commands work and how to get stuff running. Don’t get me wrong, trying out the commands is important, but to really work productively with Kubernetes you need to go deeper and understand the concepts and their functionality, so you can use them the way they were intended. This is especially hard coming from vanilla Docker, as the concepts of Docker don’t directly translate to Kubernetes - at least if you want to use Kubernetes the right way.
In this blog post we will take a look at the most basic concepts: pods, labels & selectors, and replica sets. I won’t include any actual usage instructions, but try to focus solely on explaining the concepts, their functionality, and what you should use them for. These are only the most basic concepts and you need some more to get a full overview of Kubernetes, so rest assured I will explore further concepts in following blog posts.
Pods
“Pods are the smallest deployable units of computing that can be created and managed in Kubernetes,” say the official Kubernetes docs for pods. This sometimes leads to the misconception that a pod is a single container, as that’s what people are used to from Docker. While a pod can contain just one container, it is not limited to one and can contain as many containers as needed.
What makes these containers a pod is that they all run as if they were running on a single host in the pre-container world. They share a set of Linux namespaces, most notably the network and IPC namespaces, and are therefore not fully isolated from each other. As a result, they share an IP address and port space, and can find each other over localhost or communicate over the IPC namespace. Further, all containers in a pod have access to shared volumes, that is, they can mount and work on the same volumes if needed.
To make all this possible, a pod is a single deployable unit: each instance of the pod (with all its containers) is always scheduled together onto the same node.
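To make the structure a bit more tangible without going into usage, here is a minimal sketch of a two-container pod written as a plain Python dictionary. The field names follow the layout of a Pod manifest, but the pod name, container names, images, and mount paths are made up for illustration, and the shared_volumes helper is a hypothetical function of mine, not part of any Kubernetes client.

```python
# Plain-dict sketch of a pod with an application container and a helper
# container. Names, images, and paths are illustrative assumptions.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-log-agent"},
    "spec": {
        # One volume, declared once at pod level.
        "volumes": [{"name": "logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "app",
                "image": "example/app:1.0",
                "ports": [{"containerPort": 8080}],
                "volumeMounts": [{"name": "logs", "mountPath": "/var/log/app"}],
            },
            {
                "name": "log-agent",
                "image": "example/log-agent:1.0",
                "volumeMounts": [{"name": "logs", "mountPath": "/logs"}],
            },
        ],
    },
}


def shared_volumes(pod):
    """Return the names of volumes mounted by more than one container."""
    counts = {}
    for container in pod["spec"]["containers"]:
        for mount in container.get("volumeMounts", []):
            counts[mount["name"]] = counts.get(mount["name"], 0) + 1
    return sorted(name for name, n in counts.items() if n > 1)


print(shared_volumes(pod))
```

Both containers mount the same volume, which is how the helper container gets at the files the application writes. Because they also share the network namespace, the helper could reach the application on localhost at the application's port.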
Now, for the typical Docker user this concept is quite new. While Giant Swarm users have been able to use pods for quite a while even without Kubernetes, only a few really do. For some it might sound like going back from the isolated “one process per container” to “deploying your whole LAMP stack together”. However, this is not the intended use case for pods. The main motivation for the pod concept is supporting co-located, co-managed helper containers next to the application container. These include logging or monitoring agents, backup tooling, data change watchers, event publishers, proxies, etc. If you are not sure what to use pods for in the beginning, you can for now use them with single containers like you might be used to from Docker.
The pod is the most basic concept in Kubernetes. By itself, it is ephemeral and won’t be rescheduled to a new node once it dies. If we want to keep one or more instances of a pod alive we need another concept: replica sets. But before that we need to understand what labels and selectors are.
Labels & Selectors
Labels are key/value pairs that can be attached to objects, such as pods, but also any other object in Kubernetes, even nodes. They should be used to specify identifying attributes of objects that are meaningful and relevant to users. You can attach labels to objects at creation time and modify them or add new ones later.
Labels can be used to organize and select subsets of objects. They are often used, for example, to identify releases (beta, stable), environments (dev, prod), or tiers (frontend, backend), but are flexible enough to support many other cases, too. They help bring some order into the multi-dimensionality of modern deployment pipelines.
Labels are a key concept of Kubernetes as they are used together with selectors to manage objects or groups thereof. This is done without the actual need for any specific information about the object(s), not even the number of objects that exist.
The fact that the number of objects is unknown is especially worth keeping in mind when working with label selectors. In general, you should expect many objects to carry the same label(s).
Currently, there are two types of selectors in Kubernetes: equality-based and set-based. The former use key/value pairs to filter based on basic equality (or inequality), for example environment = production or tier != frontend. The latter are a bit more powerful and allow filtering keys according to a set of values, for example environment in (production, qa).
Using label selectors, a client or user can identify and subsequently manage a group of objects. This is the core grouping primitive of Kubernetes and is used in many places. One example of its use is working with replica sets.
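As a rough illustration of how the two selector types behave, here is a minimal Python sketch. It is not how Kubernetes implements selectors. The Obj class and the functions matches_equality, matches_set, and select are hypothetical names made up for this example. The sketch assumes that several requirements are combined with a logical AND and that the negated forms also match objects that do not carry the key at all.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for any labeled object (pod, node, ...)."""
    name: str
    labels: dict = field(default_factory=dict)


def matches_equality(labels, key, value, negate=False):
    """Equality-based requirement: key = value, or key != value if negate."""
    if negate:
        # An object without the key also satisfies "key != value".
        return labels.get(key) != value
    return labels.get(key) == value


def matches_set(labels, key, values, negate=False):
    """Set-based requirement: key in (values), or key notin (values) if negate."""
    if negate:
        # An object without the key also satisfies "key notin (...)".
        return labels.get(key) not in values
    return key in labels and labels[key] in values


def select(objects, *requirements):
    """Return the objects whose labels satisfy all requirements (ANDed)."""
    return [o for o in objects if all(req(o.labels) for req in requirements)]


pods = [
    Obj("web-1", {"tier": "frontend", "environment": "prod"}),
    Obj("web-2", {"tier": "frontend", "environment": "dev"}),
    Obj("db-1", {"tier": "backend", "environment": "prod"}),
    Obj("job-1", {}),
]

# tier = frontend, environment in (prod, staging)
prod_frontend = select(
    pods,
    lambda l: matches_equality(l, "tier", "frontend"),
    lambda l: matches_set(l, "environment", {"prod", "staging"}),
)
print([p.name for p in prod_frontend])
```

Note that select never needs to know how many objects exist or what they are called. It only looks at labels, which is exactly what makes selectors usable as a grouping primitive.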
Replica Sets (and Replication Controllers)
Replica sets, for those who have read a bit about Kubernetes before, are the next generation of replication controllers. Currently, the main difference is that replica sets support the more advanced set-based selectors and thus are more flexible than replication controllers. However, the gist of the following explanation applies to both.
As mentioned above, a pod by itself is ephemeral and won’t be rescheduled if the node it is running on goes down. This is where the replica set comes in: it ensures that a specific number of pod instances (or replicas) is running at any given time. Thus, if you want your pod to stay alive, make sure you have a corresponding replica set specifying at least one replica for that pod. The replica set then takes care of (re)scheduling your instances for you.
As indicated above the replica set ensures a specific number of replicas are running. By modifying the number of replicas in the set’s definition you can scale your pods up and down.
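The behavior described above can be pictured as a loop that compares the desired number of replicas with the pods that currently match the selector and then corrects the difference. The following Python sketch is a heavily simplified model under my own assumptions, not the actual controller: Pod, ReplicaSet, and reconcile are hypothetical names, the cluster is just a list, and only equality-based selectors are handled.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class ReplicaSet:
    """Hypothetical, heavily simplified model of a replica set."""
    name: str
    replicas: int          # desired number of pods
    selector: dict         # equality-based: every key/value must match
    template_labels: dict  # labels given to pods created from the template


_counter = itertools.count(1)  # used only to generate unique pod names


def reconcile(rs, cluster):
    """Add or remove pods in `cluster` until the number of pods matching
    rs.selector equals rs.replicas."""
    matching = [
        p for p in cluster
        if all(p.labels.get(k) == v for k, v in rs.selector.items())
    ]
    diff = rs.replicas - len(matching)
    for _ in range(diff):  # too few pods: create new ones from the template
        cluster.append(Pod(f"{rs.name}-{next(_counter)}", dict(rs.template_labels)))
    for pod in matching[:max(-diff, 0)]:  # too many pods: remove the surplus
        cluster.remove(pod)


cluster = []
counts = []
rs = ReplicaSet("frontend", replicas=3,
                selector={"app": "frontend"},
                template_labels={"app": "frontend"})

reconcile(rs, cluster)        # initial creation
counts.append(len(cluster))
cluster.pop(0)                # simulate a pod disappearing with its node
counts.append(len(cluster))
reconcile(rs, cluster)        # the lost pod is replaced
counts.append(len(cluster))
rs.replicas = 1               # scale down by changing the desired count
reconcile(rs, cluster)
counts.append(len(cluster))
print(counts)
```

Scaling and self-healing are the same mechanism here: in both cases the loop only sees a mismatch between the desired and the actual number of matching pods.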
A replica set does not track its pods by name but selects them based on a common label. It therefore counts all pods matching its selector toward the desired number of replicas, including matching pods it did not create itself. Keep in mind, though, that new replicas are always created from the replica set’s own pod template, so in practice each replica set should use a selector that matches only the pods from its template.
You can include the definition of the pod directly in the definition of the replica set, so you can manage them together. However, there is a higher-level concept called a deployment, which manages replica sets. Therefore, you usually won’t need to create or manipulate replica set objects directly. It’s still important to know about this concept, as without it you won’t understand how Kubernetes helps you run and manage your applications. I will explain deployments and the many features they bring with them in a follow-up blog post. For now, suffice it to say that they will manage your replica sets for you.
Now that you have learned a bit about the basic concepts, you can read more about them and their usage in the official Kubernetes docs.
Try to play around with them and then maybe move on to a more advanced example and try it out in a local environment. Keep watching out for follow-ups to this blog post explaining deployments and more Kubernetes concepts. For feedback and requests just get in touch.