Kubernetes

These last few months I’ve been writing a lot for Greater and today I can give away a tiny bit of the writing process.

I’ve always been OK at writing documentation, but I certainly won’t go as far as saying that writing is second nature to me. Writing a good piece of documentation, and especially a blog post, is hard. A single post takes me a tremendous amount of time preparing, researching, testing, rewriting, proofreading, rewriting some more, etc. Even just coming up with the idea for a new post can be quite the challenge, given the gazillions of possibilities in the cloud native landscape.

Now why am I telling you this? Because while working on a few new blog posts, I realized I was assuming you already know what Kubernetes is: where it came from, what it does and doesn’t do, what the pros are, what the cons are. That’s quite an assumption. Although I’m sure most of you know Kubernetes a little bit, I would’ve sold it short by not dedicating a little post to it.

Containers

Let’s start with the building blocks: containers. What are they? Years ago, I saw a presentation that explained a container as a baby computer inside a computer. Although the analogy is not entirely correct, I like it and it stuck with me.

A container is a set of one or more processes that are isolated from the rest of the system. It is a technology that allows you to package and isolate applications with their entire runtime environment. This makes it easy to move the contained application between environments (development, test, production, etc.) while retaining full functionality.

You probably already know virtualization and are thinking of it while reading this, and this picture compares the two nicely:

You can substitute the ‘Docker Engine’ on the right with any container engine, because although Docker made containers what they are today, it is not the only option out there. The main point to take away from the image is that where a virtual machine holds an entire operating system, a container holds just one app. That makes it way more portable, lightweight and flexible.

Containers go way back. They originated from Unix chroot in 1979 and were first widely used in FreeBSD jails in 2000. Via a couple of other technologies like Solaris Zones, LMCTFY and LXC, the concept was picked up in 2008 by a small group of people who later founded Docker Inc., and their container engine was released publicly and open sourced in 2013. This is when companies all around the world started embracing containers in their IT infrastructures.

Orchestration

In the next few years container usage grew exponentially. People recognized the benefits of containers (isolation, portability, efficiency, speed, versioning, automation, to name a few) and converted their applications to microservices. Having dozens of containers on a single host was no exception. But with this rapid adoption, new challenges arose, probably the simplest example being the dreaded single point of failure: when that single host dies, you can imagine what happens to the dozens of containers on it…

So, we need a way to cluster these hosts to make them redundant, but also a way to make the containers running on them redundant. In come the orchestrators. Container orchestration automates the provisioning, deployment, networking, scaling, availability, and lifecycle management of containers. There are quite a few that roam(ed) the IT landscape:
  • Apache Aurora (dead)
  • Apache Mesos (kind of)
  • Azure Kubernetes Service
  • Docker Swarm (dead)
  • Elastic Kubernetes Service
  • Google Kubernetes Engine
  • Kontena (now Lakend Labs) Pharos
  • Kubernetes
  • Mirantis Kubernetes Engine
  • New Relic Centurion (dead)
  • Nomad from HashiCorp
  • Rancher Kubernetes Engine
  • Red Hat OpenShift

This is just a very small subset, and I left the dead projects in there on purpose, for two reasons:

  1. To show that it is a fast-moving market
  2. To show the dominance of Kubernetes and its different distributions

Basically, the rule of thumb is when you need container orchestration, you pick Kubernetes or one of its distributions, unless there is a very clear reason you’ll need a non-Kubernetes solution. A container orchestration war took place for a while, but Kubernetes has become the clear winner.

Benefits

So, what about the need for and benefits of container orchestration? The previous section already mentioned the bulk of them, but I would like to give you the key benefits of container orchestration in general and Kubernetes in particular:

  1. Reducing development and release times. Container orchestration greatly simplifies the development, release, and deployment processes: for example, it enables container integration or facilitates the administration of access to storage resources from different providers.
  2. Optimizing IT costs. Through dynamic and intelligent container administration, an orchestrator can help organizations save on their ecosystem management, ensuring scalability across multiple environments. Resource allocation is automatically modulated to the actual application needs, while low-level manual operations on the infrastructure are significantly reduced.
  3. Increased software scalability and availability. All orchestrators can scale the applications up or down and almost all can also do this for the underlying infrastructure, based on the contingent needs of the organization, facilitating the dynamic management of peaks. For example, as an event date nears, an e-ticketing system will experience a sudden increase in requests to purchase tickets.
  4. Flexibility in multi-cloud environments. Containerization and orchestration – one of the biggest benefits offered by the solution – make it possible to realize the promises of the new hybrid and multi-cloud environments, guaranteeing the operation of applications in any public and private environment, without functional or performance losses.
  5. Cloud migration paths. Finally, an orchestrator makes it possible to simplify and accelerate the migration of applications from an on-premises environment to public or private clouds, offered by any provider. Applications can be migrated to the cloud through the adoption of various methodologies:

    a. the simple transposition of the application, without any coding changes (Lift & Shift).
    b. the minimum changes necessary to allow the application to work on new environments (replatforming).
    c. the extensive rewriting of the application structure and functionality (refactoring).

Cloud Native

The above list is mostly ‘stolen’ from the CNCF (source), the Cloud Native Computing Foundation.

The CNCF website is an excellent resource for some of the cloud native questions you might have. This immediately forces me to answer the question ‘What is cloud native?’. It’s an excellent and important question in these times.

You can find A LOT of material and definitions out there, which are probably all at least partly correct. I’ll try to give my simplest take on it in one line:

“Cloud native is all about speed and agility – it’s about running elastic workloads in dynamic environments like the cloud.”

Like DevOps, it is important to understand that cloud native is not a technology or a set of technologies. It is a way of doing things. It’s an approach to building, deploying and maintaining your workloads, and doing so mostly in the cloud. It leverages cloud technologies like dynamically provisioned load balancers and databases providing single-digit millisecond performance at any scale.

These are some of the technologies I want you to think about when hearing the term ‘cloud native’:

  • Containers
  • Microservices
  • Service meshes
  • Immutable infrastructure
  • Serverless deployments
  • Declarative APIs

These are the principles I want you to think about when hearing the term ‘cloud native’:

  • Scalability
  • Resiliency
  • Manageability
  • Observability
  • Automation

Finally, these two gems:

  1. The cloud native trail map. How to begin building a cloud native app and some of the options you have.
  2. The cloud native landscape. All the options you have in this vast landscape. What is open source, what is closed, what the relation to the CNCF is, filter by stars, etc.

Because this topic is really a blog post on its own, I’ll leave it at that. This should get you going in the right direction for sure.

Kubernetes

The above was a bit of an excursion, deviating from the original topic, but I think it is essential for the overall picture of Kubernetes. Because, in case you missed it: Kubernetes is in fact a cloud native technology.

This finally gets us to the question: what is Kubernetes? We already know it does container orchestration, we know the benefits it brings, and we know it is cloud native technology. But there is more to tell about it. I might have mentioned some of this info already, but let’s take it from the start.

Kubernetes is an open source container orchestration tool that was originally developed by Google under the project name ‘Project 7’, inspired by an internal orchestration tool named ‘Borg’ (Hi Trekkies!). In 2015 it was donated to the newly founded CNCF. The end goal is to manage hundreds or thousands of containers on clustered physical and/or virtual machines. A cluster is, most simply explained, a group of computers working together as if it were one bigger computer. It most often involves a scheduler that divides workloads over the computers, and redundancy in various ways, simply because there is more than one box.

The architecture. A Kubernetes cluster has at least one master, but needs three (because of the Raft consensus in etcd) for high availability. The masters house the control plane, which consists of:

  1. The controller manager. Keeps track of everything that happens in the cluster
  2. The API server. The entrypoint to your cluster (via API, CLI or UI)
  3. The scheduler. Intelligent process that schedules all your pods
  4. The etcd database. Holds all configuration data and current cluster state. The etcd database is sometimes placed on its own cluster

With only these masters we have not yet deployed a single one of your applications, because those land on the worker nodes. A cluster needs at least one worker, but at least two for high availability. All workers run an important process called the kubelet. The kubelet makes communication with the control plane possible and creates the pods for your workloads. A pod is the smallest unit in Kubernetes, consisting of one or more containers.
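
To make that a bit more concrete, here is a minimal sketch of a pod manifest (the name and image are chosen purely for illustration) that wraps a single Nginx container:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod          # hypothetical name, for illustration only
  labels:
    app: nginx
spec:
  containers:
  - name: nginx            # a single container inside the pod
    image: nginx:1.14.2
    ports:
    - containerPort: 80    # port the container listens on

You will recognize this exact structure in the template section of the deployment example a bit further down.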

Explaining all the ins and outs is waaay too much to fit in a single post, but these pods are the building blocks of Kubernetes. As said, inside a pod there are one or more containers that of course hold a part of your app. When a container inside a pod dies, it is automatically restarted. When the whole pod dies, a new one is created. This holds for all pods that are part of another Kubernetes construct called a ‘deployment’. These are tracked by the earlier mentioned controller manager and its reconciliation loops, making sure the actual state of the cluster is the same as the desired state.

That’s actually a very important concept that came up here: Kubernetes is based on a declarative approach, meaning we tell it what we would like the configuration to be, and it must figure out on its own how to get to that desired state. We feed it YAML like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

And Kubernetes figures out the rest. Besides pods and deployments there is of course way more, like services, ingresses, validating webhooks, CRDs, etc. Maybe something to elaborate on another time.
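
Just to give you a small taste already, here is a minimal sketch of a service (a hypothetical example that assumes the nginx deployment from above); it gives the nginx pods one stable address inside the cluster:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service      # hypothetical name, for illustration only
spec:
  selector:
    app: nginx             # matches the pod labels from the deployment above
  ports:
  - port: 80               # port the service exposes inside the cluster
    targetPort: 80         # port on the pods the traffic is forwarded to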

A last important concept I want to mention is networking in Kubernetes. Every pod has a unique IP address, so how does this work on existing networks with existing IP ranges, subnets, VLANs and other networky stuff? Well, there is a lot of network magic going on in and around Kubernetes, and I want to cover two topics:

  1. Container vs pod networking
  2. Networking overlays

Container vs pod networking. Why did the Kubernetes developers feel the need to put containers in pods? Over time the ability to put a helper container (more often called a sidecar container) next to the main container in a pod has proven to be very useful, but this is not why containers were put into pods. This was done because of networking restrictions, more specifically because of network ports.

Every host can only have one service listening on a specific port. So, let’s take an Nginx container that by default listens on port 80. You can map any free host port to this port 80: for Nginx container ABC you can map host port 8080 to container port 80, and for Nginx container XYZ you can map host port 8081 to container port 80. Doing it like this is technically possible, but it has many drawbacks when the system grows, and in large distributed systems it becomes virtually unmanageable.
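
A quick sketch of what this manual port juggling looks like, here as a hypothetical Docker Compose file (names and ports chosen just for illustration):

# docker-compose.yml (hypothetical example): two Nginx containers on one host
services:
  nginx-abc:
    image: nginx:1.14.2
    ports:
      - "8080:80"          # host port 8080 -> container port 80
  nginx-xyz:
    image: nginx:1.14.2
    ports:
      - "8081:80"          # host port 8081 -> container port 80, since 8080 is already taken

Every extra container means hunting for yet another free host port, and every consumer has to know which of those arbitrary ports to use.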

Then there are pods. They all have a unique IP address which is, by default, reachable from all other pods in the cluster. A pod can in fact be seen as a separate mini virtual machine: a separate host with its own range of ports to allocate. When a pod is created on a host, it gets its own network namespace and its own virtual ethernet connection. This is an important concept, because it means the port restriction explained above doesn’t apply to pods.

Containers in the same pod, so a main container and its sidecar containers, can talk to each other via localhost inside the pod’s own, already mentioned, network namespace.
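
Here is a small sketch of such a pod (hypothetical names and images, just to illustrate the idea), with a main container and a sidecar that reaches it via localhost:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar   # hypothetical name, for illustration only
spec:
  containers:
  - name: nginx            # main container, listening on port 80
    image: nginx:1.14.2
    ports:
    - containerPort: 80
  - name: sidecar          # helper container in the same network namespace
    image: busybox:1.36
    # polls the main container over localhost every 30 seconds
    command: ["sh", "-c", "while true; do wget -q -O- http://localhost:80 > /dev/null; sleep 30; done"]

The sidecar simply talks to localhost; no service discovery or port mapping needed.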

Networking overlays. Basically, there are two options when you’re dealing with networking in Kubernetes. You’ve got your flat, routed, unencapsulated networking model, as you probably know it from traditional networking. In this scenario pods get an IP address directly on the host network, avoiding extra hops and encapsulation for maximum performance and simplicity. The Azure CNI is an example here, which we will be playing with in a future post.

The other option, as shown above, is overlay networking. Although this is not a Kubernetes construct, Kubernetes certainly popularized it in recent years. In this model pods still receive a unique IP address, but they get one on the overlay network, which has its own IP range and is completely isolated from the rest of the network.

Flannel is the easiest one to get started with in a dev environment. More production-ready solutions include Calico and Weave Net. Be sure to check out the official Kubernetes documentation on most of them.

A final word on this: it is also a fast-moving market. Plugins come and go; new features are adopted fast. I find it very interesting to see this evolve from something basic but usable like Flannel to something as awesome as Cilium, which pushes eBPF as the next huge thing (which it probably is) for Kubernetes.

Concluding

Kubernetes is awesome but complex. I could have diverted for hundreds of words on different topics along the way, but we must keep it compact somehow. I’m sure this is useful to give you a good grasp of the basics. And maybe soon I’ll elaborate some more on Kubernetes YAMLs you can feed to the cluster, storage, ingresses, Custom Resource Definitions, etc. But I think you believe me now when I say that each of these topics warrants a blog post of its own.