Autoscaling-jiriaf
Supporting Horizontal Pod Autoscaling (HPA) in Kubernetes
Introduction
This document provides essential insights and solutions for implementing Horizontal Pod Autoscaling (HPA) in Kubernetes effectively, specifically for VK. It emphasizes that VK must set accurate pod conditions, which are crucial for HPA to function correctly.
Understanding Autoscaling through Code Analysis
The HPA mechanism relies heavily on specific Kubernetes code to evaluate pod readiness, especially concerning CPU resource scaling. The following snippet from the Kubernetes source code illustrates this process:
```go
if resource == v1.ResourceCPU {
	var unready bool
	_, condition := podutil.GetPodCondition(&pod.Status, v1.PodReady)
	if condition == nil || pod.Status.StartTime == nil {
		unready = true
	} else {
		if pod.Status.StartTime.Add(cpuInitializationPeriod).After(time.Now()) {
			unready = condition.Status == v1.ConditionFalse || metric.Timestamp.Before(condition.LastTransitionTime.Time.Add(metric.Window))
		} else {
			unready = condition.Status == v1.ConditionFalse && pod.Status.StartTime.Add(delayOfInitialReadinessStatus).After(condition.LastTransitionTime.Time)
		}
	}
	if unready {
		unreadyPods.Insert(pod.Name)
		continue
	}
}
```
This critical piece of logic helps ensure that only ready and appropriately initialized pods are considered for scaling actions based on CPU usage.
Implementing Correct Pod Conditions
For HPA to function as intended, it's crucial to correctly set pod conditions upon creation and update their status based on lifecycle events accurately.
Pod Creation (CreatePod)
The initial conditions for running and failed pods need to reflect their true state to avoid misinterpretation by the HPA logic.
- startTime is the time when the pod was created.
- The podReady status is determined by the current phase of the pod:
- If a pod has failed, podReady is set to False.
- If a pod is currently running, podReady is set to True.
- The conditions of the pod are updated as follows:
```go
pod.Status.Conditions = []v1.PodCondition{
	{
		Type:               v1.PodScheduled,
		Status:             v1.ConditionTrue,
		LastTransitionTime: startTime,
	},
	{
		Type:               v1.PodReady,
		Status:             podReady,
		LastTransitionTime: startTime,
	},
	{
		Type:               v1.PodInitialized,
		Status:             v1.ConditionTrue,
		LastTransitionTime: startTime,
	},
}
```
Retrieving Pods (GetPods)
How HPA treats a pod depends heavily on its readiness status, which is encapsulated by the podReady variable. Another significant attribute is LastTransitionTime, which records when the status last changed.
- prevPodStartTime is equivalent to startTime in the CreatePod method.
- prevContainerStartTime[pod.Spec.Containers[0].Name] denotes the start time of the first container in the pod. This holds even with multiple containers, since they all start simultaneously.
- The podReady status is determined by the current phase of the pod:
- If a pod has either failed or succeeded, podReady is set to False.
- If a pod is currently running, podReady is set to True.
- The conditions of the pod are updated as follows:
```go
Conditions: []v1.PodCondition{
	{
		Type:               v1.PodScheduled,
		Status:             v1.ConditionTrue,
		LastTransitionTime: *prevPodStartTime,
	},
	{
		Type:               v1.PodInitialized,
		Status:             v1.ConditionTrue,
		LastTransitionTime: *prevPodStartTime,
	},
	{
		Type:               v1.PodReady,
		Status:             podReady,
		LastTransitionTime: prevContainerStartTime[pod.Spec.Containers[0].Name],
	},
}
```
Conclusion
Understanding and implementing pod condition checks correctly is crucial for effective use of Horizontal Pod Autoscaling in Kubernetes. By ensuring accurate status and condition reporting, we can enhance the reliability and efficiency of autoscaled deployments.
Horizontal Pod Autoscaler (HPA) Formula Explanation
This section explains the formula used by Kubernetes' Horizontal Pod Autoscaler (HPA) to determine the desired number of pod replicas from current metrics compared to target metrics.
HPA Replica Calculation Formula
The Horizontal Pod Autoscaler calculates the desired number of replicas using the following formula:
Desired Replicas = ceil[Current Replicas * (Current Metric / Target Metric)]
- Desired Replicas is the number of replicas HPA aims to maintain for a particular deployment or replication controller, based on the current load.
- Current Replicas is the current number of replicas in the deployment.
- Current Metric is the current value of the metric being used for autoscaling (e.g., CPU utilization, memory usage).
- Target Metric is the desired target value for that metric, as specified in the HPA configuration.
The formula adjusts the number of replicas dynamically to meet the target metric value, ensuring that the deployment scales up or down based on the actual demand.
Example
Assume you have an application deployed with HPA configured to maintain a CPU utilization of 50%. If the current CPU utilization is 100% and there are 4 current replicas, the formula for calculating the desired replicas would be:
Desired Replicas = ceil[4 * (100 / 50)] = ceil[8] = 8
This calculation suggests that to achieve the target CPU utilization of 50%, the number of replicas should be increased to 8.
Conclusion
Understanding this formula helps in configuring HPA appropriately and ensuring that your deployments are scaled efficiently according to the real-time demand or load on your application.
Evaluating Pod Scaling for VK using Kubernetes' Horizontal Pod Autoscaler and Metrics Server
This document describes the process of testing the upscaling and downscaling of pods for VK using the Horizontal Pod Autoscaler (HPA) of Kubernetes. The metrics used for the HPA are CPU and memory from metrics-server.
Setup
The test setup involves a HTTP load balancer implemented in Go (`load_balancer.go`). This load balancer redirects HTTP requests to multiple HTTP servers, each implemented in Go (`server.go`).
The Kubernetes Deployment runs this HTTP server, so the number of replicas (pods) determines how many HTTP server instances exist. The scaling of these pods is managed by the HPA.
Load Generation
The load of HTTP requests is generated by the `hey` application, which is invoked by the bash script `add-load.sh`.
Test Results
The results of the test demonstrate that the HPA works for VK, including the upscaling and downscaling of pods. When the load increases, the HPA increases the number of pods to handle the load (upscaling). When the load decreases, the HPA reduces the number of pods (downscaling) five minutes after the last scaling operation.
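The five-minute downscale delay corresponds to the HPA's default stabilization window of 300 seconds. With the autoscaling/v2 API it can be tuned through the `behavior` field of the HPA spec, for example:

```yaml
# Sketch: shorten the downscale stabilization window (default 300 s).
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
```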
Load Balancer
The load balancer is implemented in Go and is defined in `load_balancer.go`. It maintains a list of servers and forwards incoming requests to these servers in a round-robin fashion. If a server is down, it is removed from the list. New servers can be added to the list through the `/register` endpoint.
Here is the complete code for the load balancer:
```go
package main

import (
	"container/list"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
)

var servers *list.List

func helloHandler(w http.ResponseWriter, r *http.Request) {
	// Walk the list manually so that removing a dead server does not
	// skip the entry that follows it.
	for e := servers.Front(); e != nil; {
		server := e.Value.(*url.URL)
		conn, err := net.Dial("tcp", server.Host)
		if err != nil {
			// Remove the unreachable server from the list.
			next := e.Next()
			servers.Remove(e)
			e = next
			continue
		}
		conn.Close()
		proxy := httputil.NewSingleHostReverseProxy(server)
		proxy.ServeHTTP(w, r)
		// Move the server to the back of the list for round-robin rotation.
		servers.MoveToBack(e)
		return
	}
	http.Error(w, "No servers available", http.StatusInternalServerError)
}

func registerHandler(w http.ResponseWriter, r *http.Request) {
	serverURL, err := url.Parse(r.URL.Query().Get("url"))
	if err != nil {
		http.Error(w, "Invalid server URL", http.StatusBadRequest)
		return
	}
	servers.PushBack(serverURL)
}

func listServersHandler(w http.ResponseWriter, r *http.Request) {
	for e := servers.Front(); e != nil; e = e.Next() {
		server := e.Value.(*url.URL)
		w.Write([]byte(server.String() + "\n"))
	}
}

func main() {
	servers = list.New()
	http.HandleFunc("/", helloHandler)
	http.HandleFunc("/register", registerHandler)
	http.HandleFunc("/list", listServersHandler)
	http.ListenAndServe(":8080", nil)
}
```
HTTP Server
The HTTP server is implemented in Go and is defined in `server.go`. It has a single endpoint `/` which responds with "Hello, World!" The server registers itself with the load balancer upon startup.
Here is the complete code for the server:
```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"os"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "Hello, World!")
}

func main() {
	http.HandleFunc("/", helloHandler)
	// Listen on a random free port so multiple replicas can coexist.
	listener, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to listen: %v\n", err)
		os.Exit(1)
	}
	serverURL := url.URL{
		Scheme: "http",
		Host:   listener.Addr().String(),
	}
	// Register this server with the load balancer.
	go func() {
		_, err := http.Get("http://localhost:8080/register?url=" + url.QueryEscape(serverURL.String()))
		if err != nil {
			fmt.Fprintf(os.Stderr, "Failed to register with load balancer: %v\n", err)
			os.Exit(1)
		}
	}()
	fmt.Printf("Server is listening on %s\n", listener.Addr().String())
	http.Serve(listener, nil)
}
```
Load Generation
`hey` is an HTTP load generator used to produce the request load. To install `hey`, run `go install github.com/rakyll/hey@latest`.
Here is the bash script `add-load.sh` that generates the load.
```shell
#!/bin/bash
./hey -n 3000000 -c 1 http://localhost:8080/
```
Deployment and HPA yaml files for Kubernetes
Here is the deployment file `deployment.yaml` for the HTTP server:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: http-server
data:
  http-server.sh: |
    #!/bin/bash
    $SERVER_BIN/server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: c
spec:
  selector:
    matchLabels:
      app: c
  template:
    metadata:
      labels:
        app: c
    spec:
      containers:
        - name: c1
          image: http-server
          command: ["bash"]
          args: [""]
          env:
            - name: SERVER_BIN
              value: ~/JIRIAF/virtual-kubelet-cmd/test-run/HPA/load
          volumeMounts:
            - name: http-server
              mountPath: stress/job1
          resources:
            requests:
              cpu: "1"
              memory: "7Mi"
            limits:
              cpu: "8"
              memory: "10Mi"
      volumes:
        - name: http-server
          configMap:
            name: http-server
      nodeSelector:
        kubernetes.io/role: agent
        kubernetes.io/hostname: vk
      tolerations:
        - key: "virtual-kubelet.io/provider"
          operator: "Equal"
          value: "mock"
          effect: "NoSchedule"
      restartPolicy: Always
```
Here is the HPA file `hpa.yaml`:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: c
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: c
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # - type: Resource
    #   resource:
    #     name: memory
    #     target:
    #       type: Utilization
    #       averageUtilization: 50
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30
```