Autoscaling-jiriaf

From epsciwiki

Supporting Horizontal Pod Autoscaling (HPA) in Kubernetes

Introduction

This document explains how to make Horizontal Pod Autoscaling (HPA) in Kubernetes work with Virtual Kubelet (VK). The key requirement is that VK reports accurate pod conditions and start times, because the HPA uses them to decide which pods count toward the CPU utilization it scales on.

Understanding Autoscaling through Code Analysis

When scaling on CPU, the HPA controller first decides which pods are ready enough for their metrics to be used. The following snippet from the Kubernetes source code (https://github.com/kubernetes/kubernetes/blob/v1.29.3/pkg/controller/podautoscaler/replica_calculator.go#L378) shows this check:

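// Only applies when scaling on CPU: pods judged unready are set aside.
if resource == v1.ResourceCPU {
    var unready bool
    _, condition := podutil.GetPodCondition(&pod.Status, v1.PodReady)
    if condition == nil || pod.Status.StartTime == nil {
        // No Ready condition or no start time: always treated as unready.
        unready = true
    } else {
        if pod.Status.StartTime.Add(cpuInitializationPeriod).After(time.Now()) {
            // Still within the CPU initialization period: ignore the sample if the pod is
            // not ready, or if a full metric window has not passed since the last transition.
            unready = condition.Status == v1.ConditionFalse || metric.Timestamp.Before(condition.LastTransitionTime.Time.Add(metric.Window))
        } else {
            // Past initialization: ignore the sample only if the pod is not ready and has never been ready.
            unready = condition.Status == v1.ConditionFalse && pod.Status.StartTime.Add(delayOfInitialReadinessStatus).After(condition.LastTransitionTime.Time)
        }
    }
    if unready {
        unreadyPods.Insert(pod.Name)
        continue
    }
}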

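This logic ensures that, when scaling on CPU, only pods that are ready and past their initialization phase contribute to the utilization average; the others are set aside. The two time constants come from kube-controller-manager flags: cpuInitializationPeriod (--horizontal-pod-autoscaler-cpu-initialization-period, default 5 minutes) and delayOfInitialReadinessStatus (--horizontal-pod-autoscaler-initial-readiness-delay, default 30 seconds). A pod that VK reports without a Ready condition or without a StartTime is therefore always treated as unready.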

Implementing Correct Pod Conditions

For HPA to work as intended, VK must set correct pod conditions when a pod is created and keep them accurate as the pod moves through its lifecycle.

Pod Creation (CreatePod)

The initial conditions for running and failed pods need to reflect their true state to avoid misinterpretation by the HPA logic.

  • startTime is the time when the pod was created.
  • The podReady status is determined by the current phase of the pod:
    • If a pod has failed, podReady is set to False.
    • If a pod is currently running, podReady is set to True.
  • The conditions of the pod are updated as follows:
pod.Status.Conditions = []v1.PodCondition{
  {
    Type:               v1.PodScheduled,
    Status:             v1.ConditionTrue,
    LastTransitionTime: startTime,
  },
  {
    Type:               v1.PodReady,
    Status:             podReady,
    LastTransitionTime: startTime,
  },
  {
    Type:               v1.PodInitialized,
    Status:             v1.ConditionTrue,
    LastTransitionTime: startTime,
  },
}

Retrieving Pods (GetPods)

In GetPods, VK reports each pod's status from its current state. Two values matter most for the HPA check shown earlier: the readiness status, held in the podReady variable, and each condition's LastTransitionTime, which records when that condition last changed.

  • prevPodStartTime is equivalent to startTime in the CreatePod method.
  • prevContainerStartTime[pod.Spec.Containers[0].Name] is the start time of the first container in the pod. Using the first container is sufficient even when a pod has several containers, because they all start at the same time.
  • The podReady status is determined by the current phase of the pod:
    • If a pod has either failed or succeeded, podReady is set to False.
    • If a pod is currently running, podReady is set to True.
  • The conditions of the pod are updated as follows:
Conditions: []v1.PodCondition{
  {
    Type:               v1.PodScheduled,
    Status:             v1.ConditionTrue,
    LastTransitionTime: *prevPodStartTime,
  },
  {
    Type:               v1.PodInitialized,
    Status:             v1.ConditionTrue,
    LastTransitionTime: *prevPodStartTime,
  },
  {
    Type:               v1.PodReady,
    Status:             podReady,
    LastTransitionTime: prevContainerStartTime[pod.Spec.Containers[0].Name],
  },
}

Conclusion

Understanding and implementing pod condition checks correctly is crucial for effective use of Horizontal Pod Autoscaling in Kubernetes. By ensuring accurate status and condition reporting, we can enhance the reliability and efficiency of autoscaled deployments.


Testing Upscaling and Downscaling of Pods for VK using HPA of Kubernetes

This document describes the process of testing the upscaling and downscaling of pods for VK using the Horizontal Pod Autoscaler (HPA) of Kubernetes.

Setup

The test setup uses an HTTP load balancer implemented in Go (`load_balancer.go`). The load balancer forwards HTTP requests to multiple HTTP servers, each implemented in Go (`server.go`).

The Kubernetes Deployment runs this HTTP server: each replica (pod) starts one server instance, so the number of replicas determines how many HTTP servers sit behind the load balancer. The HPA scales the number of replicas.

Load Generation

The load of HTTP requests is generated by the `hey` application, which is invoked by the bash script `add-load.sh`.

Test Results

The test shows that the HPA works with VK for both upscaling and downscaling. When the load increases, the HPA adds pods to handle it (upscaling). When the load decreases, the HPA removes pods (downscaling), but only after a delay of about five minutes from the last scaling operation. This delay is consistent with the HPA's default downscale stabilization window of 300 seconds.

Load Balancer

The load balancer is implemented in Go and defined in `load_balancer.go`. It keeps a list of registered servers and forwards each incoming request to the next server in round-robin order. Before forwarding, it opens a TCP connection to the server to check that it is reachable; servers that cannot be reached are removed from the list. Servers add themselves through the `/register` endpoint, and the `/list` endpoint prints the current list.

Here is the complete code for the load balancer:

package main

import (
	"container/list"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

var (
	mu      sync.Mutex // guards servers; HTTP handlers run concurrently
	servers *list.List
)

// pickServer returns the next reachable server in round-robin order and moves
// it to the back of the list. Unreachable servers are removed. Returns nil if
// no server is available.
func pickServer() *url.URL {
	mu.Lock()
	defer mu.Unlock()
	for e := servers.Front(); e != nil; {
		server := e.Value.(*url.URL)
		conn, err := net.DialTimeout("tcp", server.Host, time.Second)
		if err != nil {
			// Remove the unreachable server and try the next one.
			next := e.Next()
			servers.Remove(e)
			e = next
			continue
		}
		conn.Close()
		// Move the server to the back of the list for round-robin order.
		servers.MoveToBack(e)
		return server
	}
	return nil
}

func helloHandler(w http.ResponseWriter, r *http.Request) {
	server := pickServer()
	if server == nil {
		http.Error(w, "No servers available", http.StatusInternalServerError)
		return
	}
	httputil.NewSingleHostReverseProxy(server).ServeHTTP(w, r)
}

func registerHandler(w http.ResponseWriter, r *http.Request) {
	serverURL, err := url.Parse(r.URL.Query().Get("url"))
	if err != nil || serverURL.Host == "" {
		http.Error(w, "Invalid server URL", http.StatusBadRequest)
		return
	}

	mu.Lock()
	servers.PushBack(serverURL)
	mu.Unlock()
}

func listServersHandler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()
	for e := servers.Front(); e != nil; e = e.Next() {
		w.Write([]byte(e.Value.(*url.URL).String() + "\n"))
	}
}

func main() {
	servers = list.New()

	http.HandleFunc("/", helloHandler)
	http.HandleFunc("/register", registerHandler)
	http.HandleFunc("/list", listServersHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

HTTP Server

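The HTTP server is implemented in Go and defined in `server.go`. It serves a single endpoint, `/`, which responds with "Hello, World!". On startup, the server listens on a free port on localhost (port 0 lets the operating system choose one) and registers its URL with the load balancer at `http://localhost:8080/register`.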

Here is the complete code for the server:

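package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"os"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "Hello, World!")
}

func main() {
	http.HandleFunc("/", helloHandler)

	// Port 0 lets the OS pick a free port, so several replicas can share a host.
	listener, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to listen: %v\n", err)
		os.Exit(1)
	}

	serverURL := url.URL{
		Scheme: "http",
		Host:   listener.Addr().String(),
	}

	// Register with the load balancer in the background.
	go func() {
		resp, err := http.Get("http://localhost:8080/register?url=" + url.QueryEscape(serverURL.String()))
		if err != nil {
			fmt.Fprintf(os.Stderr, "Failed to register with load balancer: %v\n", err)
			os.Exit(1)
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			fmt.Fprintf(os.Stderr, "Load balancer rejected registration: %s\n", resp.Status)
			os.Exit(1)
		}
	}()

	fmt.Printf("Server is listening on %s\n", listener.Addr().String())
	if err := http.Serve(listener, nil); err != nil {
		fmt.Fprintf(os.Stderr, "Server stopped: %v\n", err)
		os.Exit(1)
	}
}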

Load Generation

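`hey` is an HTTP load generator used to produce the load. To install it, run `go install github.com/rakyll/hey@latest`. The binary is installed into `$(go env GOPATH)/bin` (or `$GOBIN` if set); the script below runs `./hey`, so copy the binary into the working directory or adjust the path.

Here is the bash script `add-load.sh` that generates the load. It sends 3,000,000 requests (`-n`) to the load balancer using a single concurrent worker (`-c 1`).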

#!/bin/bash
./hey -n 3000000 -c 1 http://localhost:8080/

Deployment and HPA YAML files for Kubernetes

Here is the deployment file `deployment.yaml` for the HTTP server:

kind: ConfigMap
apiVersion: v1
metadata:
  name: http-server
data:
  http-server.sh: |
    #!/bin/bash
    $SERVER_BIN/server

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: c
spec:
  selector:
    matchLabels:
      app: c
  template:
    metadata:
      labels:
        app: c
    spec:
      containers:
        - name: c1
          image: http-server
          command: ["bash"]
          args: [""]
          env:
            - name: SERVER_BIN
              value: ~/JIRIAF/virtual-kubelet-cmd/test-run/HPA/load
          volumeMounts:
            - name: http-server
              mountPath: stress/job1
          resources:
            requests:
              cpu: "1"
              memory: "7Mi"
            limits:
              cpu: "8"
              memory: "10Mi"

      volumes:
        - name: http-server
          configMap:
            name: http-server
      nodeSelector:
        kubernetes.io/role: agent
        kubernetes.io/hostname: vk
      tolerations:
        - key: "virtual-kubelet.io/provider"
          operator: "Equal"
          value: "mock"
          effect: "NoSchedule"
      restartPolicy: Always

Here is the HPA file `hpa.yaml`:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler

metadata:
  name: c
  namespace: default

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: c
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # - type: Resource
  #   resource:
  #     name: memory
  #     target:
  #       type: Utilization
  #       averageUtilization: 50
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
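With `averageUtilization: 30` and a CPU request of "1" (1000 millicores) per pod in `deployment.yaml`, the target corresponds to an average of about 300m of CPU per pod. The HPA computes the current utilization as the summed CPU usage of the counted pods divided by their summed requests, as an integer percentage, and divides that by the target to get the usage ratio. The sketch below is a simplified version of that arithmetic with hypothetical usage values; utilizationRatio is an illustrative name, not a Kubernetes function.

```go
package main

import (
	"errors"
	"fmt"
)

// utilizationRatio returns the current CPU utilization (percent of requests,
// summed over pods) and its ratio to the target utilization. Usage and
// requests are given per pod in millicores.
func utilizationRatio(usageMilli, requestMilli []int64, targetUtilization int64) (float64, int64, error) {
	var usage, requests int64
	for i := range usageMilli {
		usage += usageMilli[i]
		requests += requestMilli[i]
	}
	if requests == 0 {
		return 0, 0, errors.New("no CPU requests for the counted pods")
	}
	current := usage * 100 / requests // integer percentage
	return float64(current) / float64(targetUtilization), current, nil
}

func main() {
	// Hypothetical: two pods requesting 1000m each, using 900m and 300m.
	ratio, current, err := utilizationRatio([]int64{900, 300}, []int64{1000, 1000}, 30)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("utilization %d%%, ratio %.2f\n", current, ratio)
}
```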

Horizontal Pod Autoscaler (HPA) Formula Explanation

This section explains the formula that the Kubernetes Horizontal Pod Autoscaler (HPA) uses to determine the desired number of pod replicas by comparing current metrics with target metrics.

HPA Replica Calculation Formula

The Horizontal Pod Autoscaler calculates the desired number of replicas using the following formula:

Desired Replicas = ceil[Current Replicas * (Current Metric / Target Metric)]
  • Desired Replicas is the number of replicas HPA aims to maintain for a particular deployment or replication controller, based on the current load.
  • Current Replicas is the current number of replicas in the deployment.
  • Current Metric is the current value of the metric being used for autoscaling (e.g., CPU utilization, memory usage).
  • Target Metric is the desired target value for that metric, as specified in the HPA configuration.

The formula adjusts the number of replicas dynamically to meet the target metric value, ensuring that the deployment scales up or down based on the actual demand.

Example

Assume you have an application deployed with HPA configured to maintain a CPU utilization of 50%. If the current CPU utilization is 100% and there are 4 current replicas, the formula for calculating the desired replicas would be:

Desired Replicas = ceil[4 * (100 / 50)] = ceil[8] = 8

This calculation suggests that to achieve the target CPU utilization of 50%, the number of replicas should be increased to 8.
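The formula can be checked with a few lines of Go. The sketch also includes two details of the real controller that the simplified formula leaves out, treated here as assumptions of the sketch: the default tolerance of 0.1 (--horizontal-pod-autoscaler-tolerance), within which the replica count is left unchanged, and clamping the result to minReplicas and maxReplicas. desiredReplicas is a hypothetical helper name. The third call applies the 30% target from `hpa.yaml` to the same 100% load.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is the default of --horizontal-pod-autoscaler-tolerance: if the
// metric ratio is within 10% of 1.0, the current replica count is kept.
const tolerance = 0.1

// desiredReplicas applies ceil(current * currentMetric / targetMetric) and
// clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int32, currentMetric, targetMetric float64, minReplicas, maxReplicas int32) int32 {
	ratio := currentMetric / targetMetric
	if math.Abs(1.0-ratio) <= tolerance {
		return current
	}
	desired := int32(math.Ceil(float64(current) * ratio))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	fmt.Println(desiredReplicas(4, 100, 50, 1, 10))  // example above: 8
	fmt.Println(desiredReplicas(4, 105, 100, 1, 10)) // within tolerance: 4
	fmt.Println(desiredReplicas(4, 100, 30, 1, 10))  // ceil(13.33) = 14, capped at 10
}
```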

Conclusion

Understanding this formula helps in configuring HPA appropriately and ensuring that your deployments are scaled efficiently according to the real-time demand or load on your application.