# Kubernetes Best Practices

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

Kubernetes clusters require a balance of resources in both pods and nodes to maintain high availability and scalability. This article outlines some best practices to help you avoid common disruption problems.

## Use Replicas Instead of Bare Pods

We recommend deploying all of your applications in a highly available manner. This means [deploying multiple stable replicas](https://kubernetes.io/docs/concepts/workloads/) of your applications and not using [bare pods](https://kubernetes.io/docs/concepts/workloads/controllers/job/#bare-pods). Using replicas ensures that a stable set of pods are running your application at any given time.

### How do I do this?

Use the `replicas` field in your application spec to define at least three replicas:

An example app spec

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # modify replicas according to your case
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3
```

## Size Nodes Appropriately

The size of nodes determines the maximum amount of memory you can allocate to pods. Because of this, we recommend using nodes with less than 2 GB of allocatable memory only for development purposes and not production. For production clusters, we recommend sizing nodes large enough (2.5 GB or more) to absorb the workload of a down node.

### How do I do this?

During [cluster creation](https://docs.digitalocean.com/products/kubernetes/how-to/create-clusters/index.html.md), choose a plan from the **NODE PLAN** drop-down menu that fits your project’s purpose. Plans are divided into two categories: **Development plans** and **Production plans**.

## Size Nodes Pools For High Availability

To further ensure high availability, node pools with production workloads should have at least three nodes. This gives the cluster more flexibility to distribute and schedule work on other nodes if a node becomes unavailable.

### How do I do this?

You can specify three or more nodes during a [cluster’s creation](https://docs.digitalocean.com/products/kubernetes/how-to/create-clusters/index.html.md#choose-cluster-capacity) or you can [configure the autoscale function](https://docs.digitalocean.com/products/kubernetes/how-to/autoscale/index.html.md) to ensure minimum cluster size of three nodes.

## Set Requests and Limits

To keep your cluster running efficiently, we recommend defining the `requests` and `limits` objects in your application spec for all deployments.

- `requests` - Specifies how much of a resource (such as CPU and memory resources) a pod is allowed to request on a node before being scheduled. If the node doesn’t have the available resources, the pod is not scheduled. This prevents pods from being scheduled on nodes that are already under heavy workload.
- `limits` - Specifies the amount of resources (such as CPU and memory resources) a pod is allowed to utilize on a node. This prevents pods from potentially slowing down the work of other pods.

### How do I do this?

To set `requests` and `limits`, define their values in your application spec. See the [Kubernetes’ documentation for more resource types](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types).

Example app spec

```text
apiVersion: v1
kind: Deployment
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        cpu: 250m
        memory: 64Mi
      limits:
        memory: 128Mi
        cpu: 500m
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        memory: 64Mi
        cpu: 250m
      limits:
        memory: 128Mi
        cpu: 500m
```

## Set Pod Disruption Budgets

To avoid disruptions to your production, such as during cluster upgrades, you can set up a [pod disruption budget (PDB)](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) that limits the number of replicated pods that can be down simultaneously. For example, you can have a replica count of 10 and a PDB that allows downtime of three simultaneous replicas by setting the `minAvailable` to 7.

### How do I do this?

To set up a pod disruption budget, you need to create a `PodDisruptionBudget` policy spec.

`policy/example-app-pdb.yaml`

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 7
  selector:
    matchLabels:
      app: example-app
```

You can set the minimum available pods for an application using the `minAvailable` field and apply it to your applications using the `matchLabels` object.

## Enable Automatic Upgrades

Enabling automatic upgrades ensures that your cluster is running the latest features, security patches, and stability improvements.

We also recommend enabling [surge upgrades](https://docs.digitalocean.com/products/kubernetes/how-to/upgrade-cluster/index.html.md#surge-upgrades) when upgrading a cluster.

### How do I do this?

To enable automatic upgrades, follow our [Kubernetes upgrade guide](https://docs.digitalocean.com/products/kubernetes/how-to/upgrade-cluster/index.html.md).

## Enable Surge Upgrades

Surge upgrades reduce the overall cluster upgrade time and impact on applications. We recommend enabling surge upgrades when upgrading an existing cluster. Surge upgrades are enabled by default when you create a new cluster.

### How do I do this?

To enable surge upgrades, follow our [surge upgrade guide](https://docs.digitalocean.com/products/kubernetes/how-to/upgrade-cluster/index.html.md#surge-upgrades).

## Check Cluster Linter Messages

[Clusterlint](https://docs.digitalocean.com/support/clusterlint-error-fixes/index.html.md) is a standalone tool that connects to the cluster’s API server and flags issues with workloads deployed in a cluster. These issues might cause downtime during maintenance or upgrades and could complicate the maintenance or upgrade itself. Using `clusterlint` regularly can inform you of ongoing issues that otherwise might not be immediately apparent.

### How to do this?

You can access `clusterlint` using three different methods:

- **Control panel** - To view `clusterlint` messages from the control panel, click **Kubernetes** in the main menu, then select your cluster from the list of clusters. From the cluster’s **Overview** page, under **Production Readiness Check**, click **Run Check**. After the card updates, it displays any `clusterlint` results.
- **DigitalOcean API call** - [Send a diagnostic request](https://developers.digitalocean.com/documentation/v2/#run-clusterlint-checks-on-a-kubernetes-cluster) to the DigitalOcean API and then [retrieve the results](https://developers.digitalocean.com/documentation/v2/#fetch-clusterlint-diagnostics-for-a-kubernetes-cluster).
- **Command line tool** - [Install `clusterlint` from the command line](https://github.com/digitalocean/clusterlint) and begin using it to access your clusters.

For a list of common `clusterlint` errors and their respective fixes, see [Clusterlint Error Fixes](https://docs.digitalocean.com/support/clusterlint-error-fixes/index.html.md).

## Check Reconciler Messages

The DOKS reconciler checks the [managed elements of the cluster](https://docs.digitalocean.com/products/kubernetes/details/managed/index.html.md) during cluster reconciliation and returns information about any issues. These include issues related to account and resource limits and do not contain information about events inside the cluster.

### How to do this?

You can access the reconciler messages by navigating to the **Overview** tab of your cluster in the DigitalOcean Control Panel.

The messages only appear if the reconciler finds any issues. Review each message and follow the instructions in the message to fix the issue. These are not real-time messages, which means that if you resolve the issue, it may take some time for its status to update.