Kubernetes in Production Essentials - Cloud infrastructure (1/6)
In a previous article, we covered the six most important things you need to run Kubernetes in production. To give a more in-depth overview of setting up a production-ready Kubernetes cluster, this is the first article in a more detailed series on running Kubernetes in production.
One of those six categories is infrastructure, and it is where you will start when you set up a Kubernetes cluster. This article focuses on the cloud infrastructure that is necessary for a production-ready Kubernetes stack, so you will (hopefully) get it right the first time.
Setting up Kubernetes in the cloud can be done with just a few clicks in the cloud portal of your choice; Azure makes this even simpler than AWS. However, if you want a mature, production-ready stack, manual actions in a portal, console, or any other type of UI are not the way to go. Moreover, the default Kubernetes setup (e.g. from a networking perspective) is unfortunately not the most secure. You need to design your IaC setup, networking strategy, node pools, and surrounding infrastructure when you create a Kubernetes cluster.
1 - Use Infrastructure as Code
First of all, Infrastructure as Code should be your only deployment strategy when working with cloud infrastructure. Don't use a user interface to create production infrastructure. Also, try to stay away from scripting your infrastructure directly against cloud APIs or cloud CLIs, as you would need to write code for the entire CRUD (Create, Read, Update, Delete) lifecycle of your infrastructure yourself. The best way is to set up your infrastructure declaratively, specifying only the desired end state.
There are many tools for Infrastructure as Code. Terraform is the most widely used open-source IaC tool and has the best documentation to get started quickly. Pulumi can be a good alternative if you would like a developer-focused IaC tool. A more recent and promising IaC solution is Crossplane, which is built entirely on Kubernetes and has great potential as a GitOps-style IaC solution; it can even run Terraform. But before you go this route, make sure you do extensive testing. At this moment, Terraform is the most mature and widely used solution.
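To make the declarative approach concrete, here is a minimal Terraform sketch of a managed AKS cluster. All names, sizes, and the region are hypothetical placeholders; you would extend this with networking, node pools, and state configuration.

```hcl
# Illustrative sketch only: resource names, region, and VM size are
# placeholders, not recommendations.
resource "azurerm_resource_group" "k8s" {
  name     = "rg-production-k8s"
  location = "westeurope"
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-production"
  location            = azurerm_resource_group.k8s.location
  resource_group_name = azurerm_resource_group.k8s.name
  dns_prefix          = "aks-production"

  default_node_pool {
    name       = "system"
    node_count = 3
    vm_size    = "Standard_D4s_v5"
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Note that the entire file describes only the desired end state: Terraform itself works out whether to create, update, or delete resources, which is exactly the CRUD logic you would otherwise have to script by hand.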
Build infrastructure deployment Pipelines
Having IaC is already a great benefit, but to be completely code-driven, automate the deployment as much as possible. Use (semi-)automated pipelines for infrastructure deployments, and don't forget to treat your infrastructure truly "as code": automatically test it in your deployment pipeline (e.g. using Terratest).
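As a sketch of what such a semi-automated pipeline could look like, here is a hypothetical GitHub Actions workflow: every push is formatted, validated, and planned, while the apply step is gated behind a manually approved environment. Job names and the environment name are assumptions, and passing the plan file between jobs (artifact upload/download) is omitted for brevity.

```yaml
# Illustrative CI pipeline sketch; adapt to your own CI system and
# Terraform backend. Plan-artifact handling between jobs is omitted.
name: infrastructure
on: [push]
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform fmt -check    # fail on unformatted code
      - run: terraform validate      # catch configuration errors early
      - run: terraform plan -out=tfplan
  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production          # manual approval gate
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve tfplan
```

A Terratest suite would typically run as an additional job between plan and apply, spinning up the infrastructure in a sandbox environment and asserting on its actual behavior.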
Use Blue/Green deployments
Undoubtedly the best way to deploy new versions of modern infrastructure is by recreating it. Zero-downtime infrastructure updates, for example, can be done using blue/green deployments. This way, you can treat your Kubernetes clusters as cattle: a cluster can be recreated at any time if it is not healthy. Be sure to test your blue/green deployment strategy frequently to avoid configuration drift.
2 - Have a networking strategy
Kubernetes is probably not the only part of your technology stack. You might have existing VMs, databases, VPNs, or even an entire on-premises network. This means that Kubernetes needs to integrate with your current network.
IP Address planning
Most managed Kubernetes services (like AKS or EKS) come with a default network and a default IP range (often 10.0.0.0/16). If you stick with the cloud defaults, you will have a hard time peering networks later, which often results in complex and confusing NAT rules to route traffic between overlapping networks. It is better to get it right the first time by setting up a good network architecture with predefined CIDR blocks. Make sure your private IP address ranges do not overlap, even if the networks are not peered (yet).
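A simple address plan can be kept next to your IaC code. The following is a purely hypothetical example of non-overlapping /16 blocks carved out of 10.0.0.0/8, so that any two networks can be peered later without NAT:

```yaml
# Hypothetical CIDR plan; every range is an example value.
on-premises: 10.0.0.0/16
shared-hub:  10.1.0.0/16
production:  10.10.0.0/16   # e.g. nodes 10.10.0.0/18, internal LBs 10.10.64.0/18
staging:     10.11.0.0/16
development: 10.12.0.0/16
```

Leaving gaps between environments (as between 10.1 and 10.10 above) gives you room to add networks later without renumbering.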
You probably know that Kubernetes consists of a control plane (also referred to as master nodes) and nodes where you run your applications (worker nodes). By default, in Azure (AKS) and AWS (EKS) your control plane API is exposed publicly to the internet, which is an unnecessary security risk: in a recent scan, 380,000 out of 450,000 analyzed clusters were publicly exposed. This increases the attack surface and should be avoided. Read more on how to fix this on AKS or EKS. A VPN or IP whitelisting can give engineers access to the Kubernetes API while blocking all other external traffic.
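On EKS, for example, the API endpoint visibility can be declared in an eksctl cluster config. This is a sketch (cluster name and region are placeholders); on AKS a comparable effect can be achieved with a private cluster or with `az aks update --api-server-authorized-ip-ranges`.

```yaml
# Illustrative eksctl config: private-only API endpoint.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production      # placeholder
  region: eu-west-1     # placeholder
vpc:
  clusterEndpoints:
    publicAccess: false   # no API access from the internet
    privateAccess: true   # API reachable from within the VPC (e.g. via VPN)
```

With `publicAccess: false`, engineers reach the API server through the VPC, typically over a VPN or a peered network.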
Another part that is easily missed is the management applications that expose APIs and UIs (ArgoCD, Prometheus, Grafana, the Kubernetes Dashboard, etc.). Make sure these applications are only privately accessible, either via IP whitelisting or a VPN. Needless to say, you should also secure them with authentication & authorization mechanisms.
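If you run the NGINX Ingress Controller, IP whitelisting for such a management UI can be done with a single annotation. The hostname, service name, and source range below are placeholders for illustration:

```yaml
# Sketch: restrict a management UI (Grafana as an example) to a VPN/office range.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  annotations:
    # Example CIDR: replace with your own VPN/office range.
    nginx.ingress.kubernetes.io/whitelist-source-range: "203.0.113.0/24"
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```

Requests from any other source IP receive a 403 from the ingress controller before they ever reach Grafana.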
Ingress & Egress traffic
The best practice for public-facing APIs and UIs in your Kubernetes cluster is to use an Ingress Controller linked to a cloud load balancer, IP, and DNS. This way, you can centrally manage all incoming traffic with the corresponding network policies and ingress rules. Make sure these components are set up securely and your Ingress Controller is patched regularly.
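The link between the ingress controller and the cloud load balancer is usually just a `LoadBalancer`-type Service. A minimal sketch, assuming ingress-nginx on AWS (the annotation shown requests an NLB; Azure and GCP have their own equivalents):

```yaml
# Sketch: front the ingress controller with a cloud load balancer.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: https
```

The cloud provider then provisions the load balancer and public IP automatically, and your DNS records point at that single entry point instead of at individual applications.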
By default, Kubernetes containers in the cloud can communicate freely with the internet. However, some organizations, especially enterprises, have stricter security standards, and restricting internet traffic from your apps is a very common requirement. If this is the case, design your firewall rules accordingly. This can also be arranged inside Kubernetes using Network Policies.
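Inside Kubernetes, a default-deny egress policy per namespace is a common starting point. This sketch (namespace name is a placeholder) blocks all outgoing traffic from pods in the namespace except DNS, after which you would add explicit allow rules per application:

```yaml
# Sketch: deny all egress in a namespace by default, allowing only DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: my-app        # placeholder namespace
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}   # any in-cluster namespace
      ports:
        - protocol: UDP
          port: 53               # keep DNS resolution working
```

Note that Network Policies require a CNI plugin that enforces them (e.g. Calico or Cilium); on a cluster without policy enforcement this manifest is silently ignored.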
3 - Design your Node Groups / Pools
A Kubernetes cluster can be compared to a living organism. Like cells, apps continuously move around across your nodes (VMs): they can be killed, or scale up and down. You need to architect your apps and clusters for this continuous movement. Think about your applications and their requirements, such as CPU, memory, storage, autoscaling, and high availability.
Autoscaling can be implemented at multiple levels, both at the application and at the node level, and it has multiple benefits. Firstly, it saves costs because you only pay for the resources you actually use. Secondly, your applications become more mature: they have to deal with being killed, which automatically gives you a more robust architecture.
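At the application level, this is typically a HorizontalPodAutoscaler; at the node level, the cluster autoscaler (or a managed node pool autoscaler) adds nodes when pods no longer fit. A minimal HPA sketch, with placeholder names and thresholds:

```yaml
# Sketch: scale a Deployment between 2 and 10 replicas on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app               # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # placeholder
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # example target
```

For CPU-based scaling to work, the target pods must declare CPU resource requests, since utilization is calculated against the requested amount.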
Kubernetes is meant to be "up" all the time. If you want to prevent downtime in case of a failure, make sure you set up your stack to be highly available at multiple failure levels:
Run multiple replicas of your application - multiple pods (Pod-level)
Deploy multiple nodes to your cluster (Node-level)
Deploy your nodes in multiple Availability Zones (AZ-level)
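The pod- and AZ-level points above can be combined in a single Deployment by running several replicas and spreading them over zones with a topology spread constraint. A sketch with placeholder names and a hypothetical image:

```yaml
# Sketch: 3 replicas (pod level), spread evenly over availability zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # hypothetical image
```

This only helps if the node pool itself spans multiple availability zones, which is the node- and AZ-level part of the list above.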
4 - Connect external cloud services
A managed Kubernetes cluster should never be a standalone service. There are many managed (cloud) services that can be used to offload certain functionality. A lot of things should not be managed within the cluster, as doing so creates a big maintenance overhead. Some examples of external services are listed below:
Identity & Access Management (IAM)
Alert notification platforms like Slack, PagerDuty, etc.
Git repositories for GitOps deployments
SSL Certificate issuers (e.g. Let's Encrypt)
In order to integrate with these external (cloud) services, you need to set up Kubernetes Service Accounts with the proper access and install Kubernetes Operators like ArgoCD or External Secrets. With all these external tools, make sure you follow the principle of least privilege for security reasons.
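On EKS, one common way to do this is IAM Roles for Service Accounts (IRSA): a Kubernetes ServiceAccount is annotated with an IAM role so a workload can call AWS APIs with only the permissions that role grants. The namespace, name, and role ARN below are hypothetical; AKS offers a comparable mechanism via workload identity.

```yaml
# Sketch (EKS IRSA): bind a ServiceAccount to a narrowly scoped IAM role,
# e.g. for an External Secrets deployment. The ARN is a placeholder.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets
  namespace: external-secrets
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-secrets
```

Because the IAM role is scoped to exactly the secrets or APIs this workload needs, the setup follows the principle of least privilege instead of sharing one broad credential across the cluster.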
It takes some planning and design to set up a production-grade cloud infrastructure for Kubernetes. First of all, make sure you have a declarative Infrastructure as Code setup for all your cloud infrastructure. Secondly, think about your networking in advance, so you won't run into issues later on. Finally, make sure your Node Groups are set up correctly and you have a proper integration with external cloud services.
Please don't reinvent the wheel. There are a lot of examples and best practices that you can use to set up a good architecture. To save precious time and get straight to application development: at Pionative we can get you a complete production-ready Kubernetes stack within a week using our Code Libraries. Feel free to plan a demo; it'll be worth your time!