Kubernetes from Scratch in 2022
Kubernetes is the leading platform for running self-healing containerized applications with fine-grained configuration, access, and security settings. Having worked with a managed Kubernetes distribution for more than two years, I decided to prepare for the various certifications to deepen my learning. What could be a better approach than to start exploring Kubernetes from scratch? That means learning about the components and their interactions, surveying the various Kubernetes distributions, and performing installations on on-premise or cloud infrastructure.
In this article, you will get a comprehensive overview of and introduction to Kubernetes. You will learn about the basic components of a Kubernetes cluster, see the features of managed Kubernetes platforms, and explore Kubernetes distributions that you can install on your own.
Kubernetes Components
A Kubernetes cluster is divided into master nodes and worker nodes. The master nodes define the control plane of the cluster, and the worker nodes are used to host the containerized application workloads.
Master nodes run the cluster state store (almost always ETCD), the API server, the scheduler, and the controller manager. Worker nodes require a container runtime engine (Docker, containerd) and the kubelet for communication with the kube API server. In addition, a kube-proxy process runs on each worker node to enable network communication between the various K8S resources.
In general, components can be installed as binaries on the nodes and started as processes, or they can run as containers themselves.
Let’s take a closer look at these components and what they do.
- ETCD: A high-performance key-value store that holds the complete configuration of the Kubernetes cluster and all Kubernetes resources. It is the single source of truth for the continuously changing state of the cluster.
- Kube API Server: The central component that offers a versioned API for all Kubernetes operations, including node management and CRUD for all Kubernetes resources. It uses ETCD as its datastore and offers its API to all other components. Technically speaking, it authorizes and validates API requests, fetches data from ETCD, and returns it to the caller. It can be queried manually, yet the scheduler, controller manager, and kubelet are its main users.
- Kube Scheduler: This component decides which of the available worker nodes each pod should run on. It takes into account all available information and restrictions about the pods, such as resource requests and limits, the specified node affinity, tolerations, and more. Once the decision is made, the pod's node assignment is written through the API server to ETCD, and the kubelet on the chosen node then creates the pod as defined.
- Kube Controller Manager: The controller manager is a collection of individual controllers for all K8S resources and nodes. Its purpose is simple to describe: observe the status of its resources, compare it to the desired state as stored in ETCD, and take all necessary actions to reach the desired state by creating, updating, or deleting resources through the API server.
- Kubelet: This component runs on each worker node and performs several functions. First, it registers the node as a worker node with the kube API server. Second, it communicates with the kube API server to get the desired state regarding pods. If it sees that a new pod needs to be created on its node, it starts it immediately. Lastly, it monitors the node itself and its pods, in conjunction with the controller manager.
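- Kube-proxy: The kube-proxy is a network proxy that runs on each node. It constantly watches the kube API server for information about all available services and their endpoints, and then updates the node's iptables rules so that traffic sent to a service is forwarded to one of the pods behind it. It is not a DNS registry: name resolution for services is handled by a separate cluster DNS add-on. Together with the cluster network plugin, kube-proxy is the backbone of service-to-pod communication.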
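To make the interplay of scheduler, kubelet, and control loop more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Kubernetes code: the `Node` and `Pod` classes, the `schedule` and `reconcile` functions, and the "most free CPU wins" rule are all assumptions made for illustration. The real scheduler filters and scores nodes using many more criteria, and the real components talk to each other only through the API server.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float                                  # free CPU cores (hypothetical unit)
    pods: List[str] = field(default_factory=list)    # names of pods running on the node


@dataclass
class Pod:
    name: str
    cpu_request: float
    node: Optional[str] = None                       # set once the pod is scheduled


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    """Scheduler role: filter nodes that fit the request, then pick the one with most free CPU."""
    candidates = [n for n in nodes if n.cpu_free >= pod.cpu_request]
    if not candidates:
        return None                                  # no node fits, the pod stays pending
    return max(candidates, key=lambda n: n.cpu_free)


def reconcile(desired: List[Pod], nodes: List[Node]) -> None:
    """One pass of a control loop: compare desired and actual state, act only on differences."""
    by_name = {n.name: n for n in nodes}
    for pod in desired:
        if pod.node is None:
            target = schedule(pod, nodes)
            if target is None:
                continue
            pod.node = target.name                   # the "binding" that would be stored in ETCD
            target.cpu_free -= pod.cpu_request
        node = by_name[pod.node]
        if pod.name not in node.pods:
            node.pods.append(pod.name)               # kubelet role: start the pod on its node


if __name__ == "__main__":
    nodes = [Node("worker-1", 2.0), Node("worker-2", 4.0)]
    pods = [Pod("web", 1.0), Pod("db", 3.0), Pod("batch", 2.0), Pod("huge", 8.0)]
    reconcile(pods, nodes)
    print({p.name: p.node for p in pods})
```

In this run, "web" and "db" land on worker-2, "batch" lands on worker-1, and "huge" stays unscheduled because no node has 8 free cores. Calling `reconcile` a second time changes nothing, which is the point of a control loop: it only acts when the actual state differs from the desired state.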
With this understanding, we continue with a brief introduction to managed Kubernetes platforms.
Managed Kubernetes Platforms
A Kubernetes platform is like a framework: It provides a functional, modular system that can be instantiated immediately. Most essential configuration is already defined, and users will need to parametrize the system. The primary execution environment for platforms is cloud computing.
According to a ContainIQ overview article and a user survey from Red Hat, the most popular platforms are the following:
- OpenShift is an open source platform that facilitates on-premise and cloud installations. It supports several container runtimes and comes with a dedicated network plugin for software defined networking. Its enterprise features promise tight integration with Red Hat Enterprise Linux and Ansible automation.
- Amazon Elastic Kubernetes Service is a managed Kubernetes service built into AWS. Its most prominent feature is simply its embedding into the wider AWS environment. Computational resources and storage can be added to the Kubernetes cluster on demand, giving powerful autoscaling capabilities. Several CRI, CNI, and storage systems are supported.
- Google Kubernetes Engine provides a similar experience in the context of Google Cloud. It’s a completely managed Kubernetes platform, with emphasis on autoscaling, security features such as container scanning, and a marketplace for prebuilt Kubernetes applications. An additional add-on is Google Anthos, a management environment that allows GKE applications to run in non-Google environments, including on-premise and edge infrastructures.
- Rancher is an enterprise managed platform that works well for medium to large corporations. It includes all required components. It requires Docker as the container runtime engine, and for block storage it uses its own solution, Longhorn.
- Tanzu is a Kubernetes platform that emphasizes the developer. It provides custom container images for the most important technology stacks, and several additional abstractions that help with application development and streamlined Kubernetes hosting. Like several other platforms, it’s built with strong support for multi-cluster management.
- Mirantis offers an enterprise distribution of OpenStack aimed at building IaaS platforms on-premise and in the cloud. This platform enriches OpenStack with integrated databases, messaging queues, and service orchestration components that enable very complex applications to be hosted. It uses Ceph for block storage and Calico as its network interface.
- Docker Kubernetes Service is an add-on to Docker Enterprise that streamlines application development and hosting in Docker. Via the Docker Desktop tool suite, developers gain quick and easy access to a Kubernetes cluster for deploying their applications. The Kubernetes cluster is fully CRI compliant, allowing other runtimes such as containerd. Several storage providers are supported as well.
Managed platforms are great when you need a running and very scalable cluster. If you want to install Kubernetes yourself, take a look at the following distributions.
Kubernetes Distributions
A Kubernetes distribution is an opinionated, slimmed-down framework: it comes with essential configuration and fewer options, and you need to provide the target infrastructure on which Kubernetes is installed. The following distributions are popular open-source projects:
- Kubespray is a framework for configuring nodes and Kubernetes components using Ansible. Users typically start by creating an inventory of nodes, determining which are the master nodes and which are the worker nodes. With the default configuration, a Kubernetes cluster is created. However, the power of Ansible is that it provides infrastructure as code: every aspect of the cluster, including the components, their versions, and their configurations, can be edited and applied to the nodes. It offers several CNI options (Calico, Flannel, Weave), load balancing and ingress options (MetalLB, Nginx Ingress), and can also interface directly with cloud providers.
- Kubeadm is a more bare-bones tool, focusing on cluster node setup and Kubernetes component installation. It’s a command line tool that facilitates the various setup operations, but configuration needs to be done manually and per node.
- Kops is a cluster setup and management command line tool that deploys a Kubernetes cluster to AWS. It provides configuration abstractions such as manifest YAML files that facilitate node and component configuration. Like Ansible, it provides dry-run capabilities and ensures that changes to the nodes are idempotent.
- K3S is an open source Kubernetes distribution for edge computing devices (yes, it can run on your Raspberry Pi!). It provides a powerful command line tool for installation, node creation, and joining nodes to a cluster. It does not offer as many options as the others, but emphasizes containerd and Flannel as core plugins. The CLI makes node management and updates very simple. A complete cluster can be created in mere minutes.
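Dry runs and idempotency, mentioned above for Kops and Ansible, are easier to grasp with a small example. The following Python sketch is only an illustration of the idea and does not reflect how either tool is implemented: the `plan` and `apply_changes` functions and the configuration keys (`node_count`, `version`, `network_plugin`) are hypothetical names chosen for this example.

```python
from typing import Any, Dict, Tuple

Change = Tuple[Any, Any]  # (current value, desired value)


def plan(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Change]:
    """Compute the changes needed to bring `actual` in line with `desired`."""
    changes = {}
    for key, value in desired.items():
        if actual.get(key) != value:
            changes[key] = (actual.get(key), value)
    # Simplification: keys present only in `actual` are left untouched.
    return changes


def apply_changes(desired: Dict[str, Any], actual: Dict[str, Any],
                  dry_run: bool = False) -> Dict[str, Change]:
    """Apply the plan to `actual`; with dry_run=True, only report what would change."""
    changes = plan(desired, actual)
    if not dry_run:
        for key, (_, new_value) in changes.items():
            actual[key] = new_value
    return changes


if __name__ == "__main__":
    # Hypothetical cluster settings, not real manifest fields.
    desired = {"node_count": 3, "version": "1.24", "network_plugin": "calico"}
    actual = {"node_count": 2, "version": "1.24"}

    print(apply_changes(desired, actual, dry_run=True))  # reports changes, modifies nothing
    print(apply_changes(desired, actual))                # performs the same changes
    print(apply_changes(desired, actual))                # second run: nothing left to do
```

The dry run reports that `node_count` and `network_plugin` would change but leaves the actual state alone. The first real run applies exactly those changes, and the second real run returns an empty plan. That last property is what idempotency means here: applying the same desired configuration repeatedly gives the same result as applying it once.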
Conclusion
In this introductory article on Kubernetes, you learned three things. First, the main components of Kubernetes, their responsibilities, and how they work together to create a cluster. Second, managed Kubernetes platforms: powerful and scalable out-of-the-box Kubernetes installations in cloud computing environments. Third, Kubernetes distributions: opinionated installation scripts and binaries that create a cluster on custom infrastructure. In the next articles, I will systematically explain the Kubernetes distributions.