# Autoscale Cluster With Horizontal Pod Autoscaling

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

[Cluster Autoscaling (CA)](https://docs.digitalocean.com/products/kubernetes/how-to/autoscale/index.html.md) manages the number of nodes in a cluster. It monitors the number of unschedulable pods sitting in the `Pending` state and uses that information to determine the appropriate cluster size. [Horizontal Pod Autoscaling (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) adds more pod replicas based on events like sustained CPU spikes. The HPA uses the spare capacity of the existing nodes and does not change the cluster’s size.

CA and HPA can work in conjunction: if the HPA attempts to schedule more pods than the current cluster size can support, the CA responds by increasing the cluster size to add capacity. Together, these tools take the guesswork out of estimating the capacity your workloads need while controlling costs and managing cluster performance.

In this tutorial, you deploy an example application that simulates workloads so you can see how the CA and the HPA interact, both when scaling up in response to demand and when scaling down as load decreases.

**Note**: Autoscaling works best with stateless applications that run multiple instances accepting traffic in parallel. We do not recommend using this tutorial’s autoscaling strategy for more complex processes such as database autoscaling, where it cannot account for race conditions, data integrity, data synchronization, and the constant addition and removal of database cluster members.

## Prerequisites

To run the example application, you need to set up two tools:

1. [Install `doctl`](https://github.com/digitalocean/doctl), the DigitalOcean command-line tool, v1.32.2 or higher.
2. [Install `kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/), the Kubernetes command-line tool.

## Set Up Autoscaling-Enabled Cluster

You can [enable autoscaling](https://docs.digitalocean.com/products/kubernetes/how-to/autoscale/index.html.md) on an existing cluster for this tutorial. Alternatively, once you have `doctl` and `kubectl`, create a new DigitalOcean Kubernetes cluster with autoscaling enabled:

```shell
doctl k8s cluster create mycluster \
  --node-pool "name=mypool;auto-scale=true;min-nodes=1;max-nodes=10"
```

## Install Metrics Server Tool

Install the [DigitalOcean Kubernetes metrics server tool](https://marketplace.digitalocean.com/apps/kubernetes-metrics-server) from the DigitalOcean Marketplace so the HPA can monitor the cluster’s resource usage. Confirm that the metrics server is installed using the following command:

```shell
kubectl top nodes
```

It takes a few minutes for the metrics server to start reporting metrics. If your installation is successful, the command returns your nodes’ CPU and memory statistics:

Output
```
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
mypool-3z4hs   369m         36%    783Mi           49%
mypool-3z4tz   84m          8%     791Mi           50%
mypool-3z520   425m         42%    917Mi           58%
mypool-3z52d   341m         34%    937Mi           59%
mypool-3zhiq   324m         32%    856Mi           54%
```

**Note**: The `kubectl top node` command may report higher memory usage than the [**Insights** tab](https://docs.digitalocean.com/products/kubernetes/how-to/monitor-basic/index.html.md). This is because the metrics server collects data from [cAdvisor](https://github.com/google/cadvisor), a component the kubelet uses to collect pod and container metrics. That data includes reclaimable buffer and cache memory, which does not reflect actual memory demand; the **Insights** tab excludes this reclaimable memory.
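To spot-check overall utilization at a glance, you can post-process the `kubectl top nodes` output. The following is a small sketch (not part of the tutorial's required steps) that computes the average node CPU percentage; it runs here against the sample output above, and on a live cluster you can pipe `kubectl top nodes` into the same `awk` program.

```shell
# Sketch: compute the average node CPU% from `kubectl top nodes` output.
# Here we feed in the sample output shown above; on a live cluster, pipe
# `kubectl top nodes` into the same awk program (NR > 1 skips the header).
awk 'NR > 1 { gsub(/%/, "", $3); total += $3; n++ }
     END    { printf "average CPU: %.0f%%\n", total / n }' <<'EOF'
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
mypool-3z4hs   369m         36%    783Mi           49%
mypool-3z4tz   84m          8%     791Mi           50%
mypool-3z520   425m         42%    917Mi           58%
mypool-3z52d   341m         34%    937Mi           59%
mypool-3zhiq   324m         32%    856Mi           54%
EOF
```

For the sample data above, this prints `average CPU: 30%`.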
## Install CPU-spiking Service and HPA

Deploy the CPU-spiking service and the HPA itself by applying [`hpa.yaml`](https://docs.digitalocean.com/products/kubernetes/how-to/set-up-autoscaling/hpa.yaml), which defines the service and an HPA configured to scale up to 20 replicas of any service on the cluster that experiences sustained CPU usage at or above 80%:

```shell
kubectl apply -f https://docs.digitalocean.com/products/kubernetes/how-to/set-up-autoscaling/hpa.yaml
```

## Test Autoscaling

Test the autoscaling behavior by scheduling the load generator using [`load-generator.yaml`](https://docs.digitalocean.com/products/kubernetes/how-to/set-up-autoscaling/load-generator.yaml), which repeatedly sends requests to the CPU-spiking service:

```shell
kubectl apply -f https://docs.digitalocean.com/products/kubernetes/how-to/set-up-autoscaling/load-generator.yaml
```

As the load generator runs, you can check the status of the HPA and the CA:

```shell
kubectl describe hpa hello                                              # Check HPA status
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml  # Check CA status
```

Continue checking the status of the HPA and CA. You can apply pressure to the cluster capacity by scaling up the load generator:

```shell
kubectl scale deployment/load-generator --replicas 2
```

After 5 minutes of sustained CPU spiking, the HPA starts scheduling more and more pods. About 5 minutes after that, when the cluster runs out of capacity and unscheduled pods start piling up, the CA kicks in to add more nodes.

Conversely, you can scale down the load generator and watch the number of pods in your workload decrease:

```shell
kubectl scale deployment/load-generator --replicas 1
```

After 5 minutes of lowered CPU use, the HPA starts deleting underutilized pods. About 5 minutes after that, the CA notices the excess capacity and begins scaling down the number of nodes in the cluster as well.

## Next Steps

In this tutorial, you repeatedly sent CPU spikes to a DigitalOcean Kubernetes cluster and tested autoscaling both when scaling up in response to demand and when scaling down as load decreased.
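To observe the interplay between the two autoscalers more closely on a rerun, you could poll both in a single loop rather than alternating between the status commands. The following is a sketch under the same assumptions as the tutorial (the HPA is named `hello`, matching the `kubectl describe hpa hello` check earlier):

```shell
# Sketch: print the HPA's current replica count and the cluster's node
# count together every 30 seconds while the load generator runs.
# Assumes an HPA named "hello", as used elsewhere in this tutorial.
while true; do
  replicas=$(kubectl get hpa hello -o jsonpath='{.status.currentReplicas}')
  nodes=$(kubectl get nodes --no-headers | wc -l)
  echo "$(date +%T)  replicas=${replicas}  nodes=${nodes}"
  sleep 30
done
```

Watching both numbers side by side makes the two 5-minute delays visible: `replicas` climbs first, then `nodes` follows once pending pods accumulate.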
You can customize many parts of this example’s configuration, including the kinds of events that trigger an action from the HPA and how long they must persist before the HPA responds. In general, you need to configure the HPA to balance responsiveness (being sensitive enough to respond to load changes in time) against thrashing (being so sensitive that replica counts fluctuate wildly). For more details on configuring HPAs, see [Horizontal Pod Autoscaler in the Kubernetes documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).
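As a starting point for such customization, an `autoscaling/v2` manifest lets you set both the utilization target and, via the optional `behavior` field, how quickly scaling reacts. The following is an illustrative sketch only, not the tutorial’s actual `hpa.yaml`; the `hello` name and the 20-replica/80% values mirror the figures used above, while the `behavior` stanza is an assumed example of tuning the scale-down delay:

```yaml
# Illustrative sketch, NOT the tutorial's actual hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello           # assumed name of the CPU-spiking deployment
  minReplicas: 1
  maxReplicas: 20         # upper bound used in this tutorial
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # CPU threshold used in this tutorial
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait ~5 minutes before scaling down
```

Lowering `stabilizationWindowSeconds` makes scale-down more responsive; raising it dampens thrashing when load is bursty.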