A Kubernetes cluster usually has multiple applications running on it. One of the best practices for running workloads on Kubernetes is to ensure that appropriate memory and CPU are allocated to the pods in the cluster. This is usually hard to achieve because no engineer knows the exact resources an application needs to run. The simplest and fastest thing to do is to guess based on previous experience. For example, an application written in Java/JVM might require between 4GB and 8GB of memory to start up and run, whereas a Golang or Rust-based application does not need that much to get up and running. But guessing is not a good way to determine the right resources to allocate to applications running in the cluster. The resources allocated to a pod directly affect the performance and stability of the application. Beyond allocating optimal resources to ensure application stability, over-provisioning needs to be adequately curtailed to avoid wasted resources and to reduce the carbon footprint. I also have a previous article that speaks to capacity management in Kubernetes here. Now let us look at ways to improve the resource allocation of pods in Kubernetes and right-size adequately.
Pod Right-sizing
As mentioned earlier, it is almost impossible to know the resources an application will need when it is deployed for the first time in a Kubernetes cluster. But one trick that can be applied is to get the assumed metrics from the developer and double them. This can be a good way to start, which I would call “informed guessing”. When developers run tests locally, they can collect metrics on the application’s resource consumption. For example, if an application uses 200m CPU and 500MB of memory to run locally, that can be doubled to 400m and 1GB respectively and allocated to the pod at the initial stage (the best option for allocation is the Guaranteed QoS class, where requests equal limits). This does not mean the application will use all of the resources allocated, but it is a safe starting point.
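As a minimal sketch, here is what that starting allocation could look like in a pod spec. The pod name and image are placeholders I have made up; the relevant part is that requests equal limits, which is what gives the pod the Guaranteed QoS class.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                                   # hypothetical pod name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:latest  # placeholder image
      resources:
        requests:
          cpu: 400m      # doubled from the ~200m CPU observed locally
          memory: 1Gi    # doubled from the ~500MB memory observed locally
        limits:
          cpu: 400m      # requests == limits -> Guaranteed QoS
          memory: 1Gi
```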
The access patterns and the quality of the code will determine whether the initially allocated resources should be decreased or increased. This cannot be done by guessing, or any form of informed guessing, either. At this stage, the pod has been running in the cluster and there are clear indications of how resources are being used. To determine the appropriate resources to allocate, a tool can be used: Robusta KRR, which stands for Kubernetes Resource Recommender.
Robusta KRR is an open-source project from the same company behind the Robusta alert-management system, which works closely with Prometheus and Alertmanager. KRR is a command-line tool designed to optimize resource allocation in Kubernetes clusters. It collects pod usage metrics from Prometheus and provides recommendations for setting CPU and memory requests and limits. By doing so, it helps lower costs and enhance overall performance. Robusta KRR has integrations with Prometheus, Victoria Metrics, Thanos, Amazon Managed Prometheus, Azure Managed Prometheus, Google Managed Prometheus, Grafana Cloud Managed Prometheus, and Grafana Mimir. Cost savings for a cluster can also be posted to Slack for better visibility.
Since Robusta KRR is a CLI tool, it does not need to run in the Kubernetes cluster as an agent. The CLI runs on Mac, Windows, and in Docker. There is also the option to run it inside Kubernetes as a Job, but that is not mandatory. It can likewise be configured to send weekly or monthly recommendations to a Slack channel.
See the Robusta KRR installation instructions here.
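Once installed, a basic scan is a single command. The sketch below uses KRR’s default “simple” strategy; the Prometheus URL and namespace are placeholders, and flag details can vary between versions, so check krr --help.

```sh
# Scan the whole cluster and print CPU/memory recommendations per workload
krr simple --prometheus-url http://prometheus.monitoring.svc.cluster.local:9090

# Restrict the scan to a single namespace (namespace name is hypothetical)
krr simple -n my-namespace
```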
When Robusta KRR recommends resources, it is good to try them out. In my experience, it recommends the bare minimum, which means that in the event of a spike, the application might need more resources than what is recommended. So take note of that when applying the recommendations KRR makes.
Apart from Robusta KRR, there is another tool that is good for pod right-sizing: Goldilocks. Goldilocks is a tool designed to help Kubernetes users optimize resource requests for CPU and memory. It utilizes the Vertical Pod Autoscaler (VPA) in recommendation mode to provide suggested resource values without automatically applying them. This allows teams to fine-tune their deployments, avoiding over-allocation or under-allocation of resources, which can lead to unnecessary costs or performance issues. By simplifying resource tuning, Goldilocks helps Kubernetes users strike the right balance, ensuring efficient and cost-effective workload management.
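Under the hood, Goldilocks creates a VPA object per workload with updates switched off, so the VPA only publishes recommendations. A minimal sketch of such an object is below; the names are hypothetical, and in practice Goldilocks generates these for you once a namespace is labeled for it.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # hypothetical name; Goldilocks generates its own
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical workload to observe
  updatePolicy:
    updateMode: "Off"       # recommendation mode: suggest values, never apply them
```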
When pod right-sizing is fully sorted, the next resources that require right-sizing are the worker nodes, which power the pods.
Node Right-sizing
Node right-sizing involves using the right virtual machines to run the pods in the Kubernetes cluster. The same optimization applied to pods can be applied to the nodes. The nodes host the pods, so the scalability of the nodes directly affects the scalability of the pods and applications running in the cluster. Let us discuss the right-sizing options that exist for a Kubernetes cluster:
Cluster Autoscaler: This is the first-generation node-autoscaling tool. The cluster autoscaler is a powerful tool that ensures the high availability and scalability of the applications running within the cluster. When properly configured, it creates a new node whenever there is a pending pod in the cluster. A pending pod indicates that there are no more resources within the cluster on which that pod can be scheduled. The cluster autoscaler automatically kicks in to create a new node so the pending pod can be scheduled, which keeps the applications running in the pod stable.
In the same vein, the cluster autoscaler can scale the worker nodes down when it notices they are idle. It terminates such nodes to ensure unused nodes do not linger in the cluster; this way, worker nodes are right-sized in the best way possible.
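Scale-down behaviour is controlled by flags on the cluster-autoscaler deployment. The excerpt below is an illustrative sketch rather than a complete manifest; the cloud provider, image tag, and threshold values are assumptions to adapt, though the flags themselves are standard cluster-autoscaler options.

```yaml
# Container excerpt from a cluster-autoscaler Deployment
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # match the tag to your cluster version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws                    # assumption: an AWS-based cluster
      - --scale-down-enabled=true               # allow removal of idle nodes
      - --scale-down-utilization-threshold=0.5  # node is a scale-down candidate below 50% requested utilization
      - --scale-down-unneeded-time=10m          # how long a node must stay unneeded before removal
```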
Even with the amazing features of the cluster autoscaler, the right-sizing it achieves is not optimal. There are cases where the pods on a node do not utilize all of the node’s resources, which leads to over-provisioning. There is another tool that solves this problem more efficiently: Karpenter.
Karpenter: This helps ensure that the nodes provisioned are exactly what the pods need to run efficiently, based on the resource requests configured for the pods. This makes accurate pod resource allocation all the more important, because if the values are too high, Karpenter will select a larger instance size to accommodate the pod, thereby over-provisioning the worker nodes. Karpenter is unique and different from the cluster autoscaler because it can use different instance sizes based on the requirements of the pods running within the cluster. It ensures no pod is stuck in Pending at any point in time. But instead of creating the same instance type to get the pod scheduled, as the cluster autoscaler does, it selects an appropriate instance type, which is often different from the instance type the workload is initially running on.
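Karpenter’s choices are constrained by a NodePool that tells it which instance types and capacity types it may provision. A minimal sketch for the AWS provider is below; the API version and nodeClassRef shape depend on your Karpenter release (older releases use karpenter.sh/v1beta1), and the requirement values and names are assumptions.

```yaml
apiVersion: karpenter.sh/v1          # use karpenter.sh/v1beta1 on older releases
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]      # could also allow "spot"
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]    # let Karpenter pick sizes within these families
      nodeClassRef:                  # assumption: an AWS EC2NodeClass named "default" exists
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"                       # cap the total CPU this pool may provision
```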
One important fact to note is that every pod running in a Kubernetes cluster has its own unique resource requirements to run efficiently.
Conclusion
Right-sizing is essential for the stability of the applications running in a Kubernetes cluster. It also helps optimize resources and reduce the overall cost of running the cluster, ensuring that pods are allocated the optimal resources for the load they are handling over time.