Welcome back to our series on Kubernetes. This is part 4, where we cover scaling and load balancing. In the previous part, we deployed a few basic applications on AWS Elastic Kubernetes Service (EKS), with a fair amount of manual intervention along the way. Now it is time to look at how we can improve those applications in terms of performance and scalability. Kubernetes offers several mechanisms for scaling containerised applications and services across a cluster of machines; the most important are Horizontal Pod Autoscaling and the Cluster Autoscaler. Beyond that, there are several ways to distribute traffic across a service, from Ingress controllers to AWS load balancers such as the Application Load Balancer (ALB) and the Network Load Balancer (NLB). Together, these tools and approaches help keep applications running smoothly under varying load.

Horizontal Pod Autoscaling (HPA)

Horizontal Pod Autoscaling is one of the most fundamental capabilities for handling varying workloads. The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas in a deployment or replica set up and down based on observed CPU utilisation or another selected metric, which it reads from the Metrics Server or from a custom or external metric source.

HPA can consume standard resource metrics such as CPU and memory, as well as custom and external metrics exposed through the Kubernetes metrics APIs.

How HPA Works

Before we get into configuring HPA, it helps to understand how it operates. The following points summarise how HPA works, along with its objectives and benefits.

• Metric Collection: HPA gathers real-time performance indicators from the Metrics Server, which periodically samples the CPU and memory in use by pods. This information is the basis for Kubernetes scaling decisions.

• Decision Making: HPA compares the collected metrics against the thresholds you define and decides whether to increase or decrease the number of pod replicas. This lets your application scale up when demand rises and scale back down when demand subsides, conserving resources.

• Resource Efficiency: HPA's main objective is to make good use of cluster resources. By tuning the pod count automatically, it avoids wasting capacity on underutilised pods while still meeting performance objectives and absorbing load increases.

As a developer, you define those metrics and thresholds in the HPA configuration. With HPA in place, your applications stay responsive even under heavy workloads, because the Horizontal Pod Autoscaler automatically scales the number of pods in a deployment based on observed CPU utilisation or other selected metrics.

        Here is how you set up HPA in AWS EKS:

        HPA in AWS EKS

        Step 1: Deploy Metrics Server

        First, ensure that the Metrics Server, which HPA uses to fetch metrics, is deployed in your cluster. You can install it using Helm.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/

helm install metrics-server metrics-server/metrics-server \
  --set args="{--kubelet-insecure-tls,--kubelet-preferred-address-types=InternalIP}"

This step integrates a monitoring component into the Kubernetes (K8s) environment. In a K8s cluster, such a component is essential for managing and observing the applications deployed there. The first command adds a Helm repository so that the Metrics Server chart can be installed; the second installs the Metrics Server from that chart. The Metrics Server aggregates resource usage data, such as CPU and memory consumption, for every pod and node in the cluster. It focuses on resource efficiency and cluster health rather than network connections or security policy, and its data is what HPA relies on when making scaling decisions.
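Once the chart is installed, it is worth checking that the Metrics Server is actually serving data before creating an HPA. A minimal sanity check might look like the following (note that the Helm command above installs into whatever namespace your current context points at, since no -n flag was passed):

# adjust -n if you installed the chart into a specific namespace
kubectl get deployment metrics-server

# these return live usage numbers once the Metrics Server is ready
kubectl top nodes
kubectl top pods --all-namespaces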

        Step 2: Define an HPA Resource

Create an HPA resource that targets your deployment, and save it as hpa.yaml. Below is an example of an HPA configuration that scales based on CPU usage:

        apiVersion: autoscaling/v1
        kind: HorizontalPodAutoscaler
        metadata: 
           name: example-hpa
           namespace: default
        spec: 
           scaleTargetRef:
             apiVersion: apps/v1 
             kind: Deployment  
             name: example-deployment
           minReplicas: 1
           maxReplicas: 10
           targetCPUUtilizationPercentage: 50

        Apply this configuration using kubectl:

        kubectl apply -f hpa.yaml

This HPA scales the replicas in while CPU utilisation stays below 50% and scales them out when CPU utilisation exceeds 50%. In simpler terms, the configuration adjusts how many copies of one specific application are running according to how hard those copies are working: if the existing pods are using a lot of CPU, more copies are created to spread the load; if they are mostly idle, the count is reduced to save resources. Note that the target deployment's containers must declare CPU resource requests, because the utilisation percentage is measured against those requests. By scaling up and down dynamically with demand, the configuration keeps the application running efficiently without wasting resources.
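You can watch the autoscaler's decisions with kubectl, and if you prefer not to maintain a manifest, a roughly equivalent HPA can be created imperatively. The deployment and HPA names below match the example manifest above:

# inspect the HPA created from hpa.yaml
kubectl get hpa example-hpa --watch
kubectl describe hpa example-hpa

# imperative alternative to the manifest above
kubectl autoscale deployment example-deployment --cpu-percent=50 --min=1 --max=10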

        Cluster Autoscaler

Kubernetes architecture is broadly divided into two parts: the control plane, and the nodes or compute machines. While HPA adjusts the number of pod replicas, the Cluster Autoscaler focuses on the nodes themselves. It automatically adjusts the size of a Kubernetes cluster when it detects that:

            • Pods fail to launch due to insufficient resources.

            • Nodes are underutilised and could be consolidated into fewer nodes, reducing costs.

          Cluster Autoscaler Benefits:

• Cost-Effective Scaling: The system automatically adds resources during periods of high demand and removes them during quieter periods, which helps keep costs under control.

              • Improved Resource Utilization: It ensures that resources are used efficiently, avoiding waste and maintaining optimal performance around the clock.

The Cluster Autoscaler is a key tool in cloud environments such as AWS, where node provisioning and de-provisioning can be automated through integrations with services like EC2 Auto Scaling. It adjusts the number of nodes in your cluster according to demand. With AWS EKS, your worker nodes must be part of an Auto Scaling Group for this to work.
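For the auto-discovery setup used later in this section, the Cluster Autoscaler also expects your Auto Scaling Group to carry two discovery tags. A sketch of adding them with the AWS CLI is shown below; the ASG and cluster names are placeholders you would substitute with your own values:

aws autoscaling create-or-update-tags --tags \
  "ResourceId=<Your-ASG-Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=<Your-ASG-Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<Your-Cluster-Name>,Value=owned,PropagateAtLaunch=false"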

            Cluster Autoscaler in AWS EKS

            Step 1: Enable Cluster Autoscaler

            To enable Cluster Autoscaler, you must allow it to modify your Auto Scaling group. This is done by adding the following IAM policy to the role used by your worker nodes:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
         ],
         "Resource": "*"
      }
   ]
}

This policy is a set of permissions for the AWS Auto Scaling service that spells out what the Cluster Autoscaler may do with your server resources. With it, the autoscaler can describe Auto Scaling groups, their instances, launch configurations, and tags, and it can set the desired capacity of a group or terminate instances within it. In effect, it grants the worker-node role just enough access to observe and manage the fleet so that the right number of servers is running: enough to handle the workload, but not so many that resources are wasted.
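One way to attach this policy is as an inline policy on the worker-node IAM role using the AWS CLI. The role and policy names below are placeholders, and the JSON above is assumed to be saved as cluster-autoscaler-policy.json:

aws iam put-role-policy \
  --role-name <Your-Node-Role-Name> \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json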

            Step 2: Deploy Cluster Autoscaler

            Deploy the Cluster Autoscaler to your cluster using Helm:

helm repo add autoscaler https://kubernetes.github.io/autoscaler

helm install cluster-autoscaler autoscaler/cluster-autoscaler-chart \
  --set autoDiscovery.clusterName=<Your-Cluster-Name> \
  --set awsRegion=<Your-Region> \
  --set rbac.create=true \
  --set image.tag=<Cluster-Autoscaler-Version>

Replace <Your-Cluster-Name>, <Your-Region>, and <Cluster-Autoscaler-Version> with your specific values.

Helm, the Kubernetes package manager, is used here to set up and install the Cluster Autoscaler in an environment running Kubernetes. The first command, helm repo add autoscaler, adds a new Helm repository called “autoscaler” to your local Helm configuration. This repository contains the charts needed to deploy the Kubernetes autoscaler; the Kubernetes community maintains it at the URL shown above.

The second command installs the Cluster Autoscaler itself. The deployment is identified within the Kubernetes cluster by the release name “cluster-autoscaler” and uses the chart autoscaler/cluster-autoscaler-chart from the repository added above. Setting the cluster name and AWS region tells the autoscaler which cluster it manages and where that cluster lives. The rbac.create flag creates the RBAC resources the autoscaler needs to manage cluster resources with the required rights, and the image tag pins the Cluster Autoscaler version to be used. With this configuration in place, the Cluster Autoscaler can dynamically adjust the number of nodes in the cluster in response to workload demand, improving both resource usage and operational efficiency.
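A quick way to confirm the autoscaler is running and making decisions is to look at its pod and logs. The exact pod name depends on how the chart names its resources, so this sketch simply filters for it (and assumes the chart was installed into your current namespace, since no -n flag was passed above):

kubectl get pods | grep cluster-autoscaler

# substitute the pod name reported above
kubectl logs <cluster-autoscaler-pod-name> | grep -i scale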

            Load Balancing Strategies

Efficient load balancing improves application responsiveness and availability, which is where Kubernetes load balancing comes in. Kubernetes offers several ways to achieve this, notably through Ingress controllers and integrated cloud load balancers.

            Ingress Controllers

Ingress controllers are the most flexible tool in the Kubernetes ecosystem for routing external traffic into the cluster and on to internal services, according to the rules defined in Ingress resources.

            Key Features:

• Traffic Management: An Ingress controller routes incoming traffic to the appropriate service based on the host name and path requested, getting visitors to the right backend quickly and reliably.

• SSL/TLS Termination: It centralises TLS handling for incoming connections, which reduces the load on individual services and simplifies security management.

              AWS ALB and NLB

For applications deployed on AWS, leveraging native AWS load balancers such as the ALB (Application Load Balancer) and the NLB (Network Load Balancer) can enhance performance.

• ALB: This load balancer operates at the application layer (HTTP/HTTPS) and handles complex web traffic well. It manages traffic routing and connection handling securely, supports path- and host-based routing, and also supports real-time workloads such as WebSockets.

• NLB: This load balancer operates at the connection (transport) layer and is built for heavy data traffic. It handles very high volumes with low latency, making it ideal when speed and throughput are the main concerns.

When you combine an AWS load balancer with a Kubernetes service, you get the scaling mechanisms of AWS layered on top of the deployment strategies that Kubernetes brings.
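For example, an NLB can be requested simply by annotating a Service of type LoadBalancer. The sketch below is illustrative only: the service name, selector, and ports are assumptions, and the annotation shown is the classic in-tree one (the AWS Load Balancer Controller introduced in the next section uses its own annotations):

apiVersion: v1
kind: Service
metadata:
  name: example-nlb-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080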

                Load Balancing Strategies in AWS EKS

                Using AWS ALB Ingress Controller

On EKS, the AWS ALB is provisioned by the AWS Load Balancer Controller, which acts as an Ingress controller. You need to set up this controller in your cluster before you can use it.

                Step 1: Deploy the ALB Ingress Controller

                You can deploy the AWS ALB Ingress Controller using Helm:

helm repo add eks https://aws.github.io/eks-charts

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<Your-Cluster-Name> \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller

These commands add a Helm repository and install the AWS Load Balancer Controller into a Kubernetes cluster managed by Amazon EKS (Elastic Kubernetes Service). The first command, helm repo add eks, adds a Helm repository called “eks” to your local Helm settings. This repository hosts the charts that AWS provides specifically for EKS, making it simple to deploy AWS-managed integrations in a Kubernetes environment.

The second command installs the AWS Load Balancer Controller using a chart from that “eks” repository. The installation goes into the kube-system namespace, which is reserved for Kubernetes system components. Several options are passed with --set flags: clusterName specifies the name of your Kubernetes cluster, while serviceAccount.create=true and serviceAccount.name=aws-load-balancer-controller tell Helm to create a Kubernetes service account called “aws-load-balancer-controller” if it does not already exist. The controller runs under this service account, which grants it the permissions required to manage AWS load balancers on behalf of the cluster. By letting the cluster use AWS load balancers to distribute traffic between pods, this configuration improves the scalability and availability of the application.
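Before defining any Ingress resources, it is worth confirming the controller is up. The deployment name below matches the Helm release name used above, though it remains an assumption about how the chart names its resources:

kubectl get deployment -n kube-system aws-load-balancer-controller
kubectl logs -n kube-system deployment/aws-load-balancer-controller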

                Step 2: Define an Ingress Resource

                Here’s an example of defining an Ingress resource that uses the ALB Ingress Controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    kubernetes.io/ingress.class: "alb"
    alb.ingress.kubernetes.io/scheme: "internet-facing"
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80

This manifest defines how internet traffic for the application is channelled to the right place. When a visitor requests a particular part of the site (in this case, the /testpath path), the Ingress directs that traffic to a service named test listening on port 80. Because the scheme annotation is set to internet-facing, the resulting ALB handles public internet traffic, so everyone reaching that part of the site is routed seamlessly to the backing service. In plain terms, this is what keeps the site user-friendly and responsive to visitors.
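Assuming the manifest above is saved as ingress.yaml, you can apply it and then watch for the ALB's DNS name to appear in the ADDRESS column once AWS has provisioned the load balancer:

kubectl apply -f ingress.yaml
kubectl get ingress example-ingress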

                Conclusion

Performance and reliability matter for any application, and scaling and load balancing are how you achieve them. Kubernetes comes with a powerful set of tools and features to make this happen, whether you are auto-scaling your pods with HPA, right-sizing your cluster with the Cluster Autoscaler, or steering network traffic to your services with the load balancing strategies above.

                For in-depth information and knowledge about Kubernetes scaling, load balancing, cloud computing and its services, cloud security, and more, visit the Cloudzenia website. 

                Aug 14, 2024