How to Upgrade DOKS Clusters to Newer Versions

DigitalOcean Kubernetes (DOKS) is a managed Kubernetes service that lets you deploy Kubernetes clusters without the complexities of handling the control plane and containerized infrastructure. Clusters are compatible with standard Kubernetes toolchains and integrate natively with DigitalOcean Load Balancers and block storage volumes.

You can upgrade DigitalOcean Kubernetes clusters to newer patch versions (e.g. 1.13.1 to 1.13.2) as well as to new minor versions (e.g. 1.12.1 to 1.13.1) in the DigitalOcean Control Panel or with doctl, the DigitalOcean command line interface (CLI) tool.

There are two ways to upgrade:

  • On demand. When an upgrade becomes available for DigitalOcean Kubernetes, you can manually trigger the upgrade process. You can upgrade to a new minor version using the manual process, provided you first perform all available patch-level upgrades for your current minor version.

  • Automatically. You can enable automatic upgrades for a cluster that happen within a maintenance window you specify. Automatic updates trigger on new patch versions of Kubernetes and new point releases of DigitalOcean Kubernetes subsystems, like the DigitalOcean Cloud Controller Manager or DigitalOcean Container Storage Interface. However, your cluster will not be automatically upgraded to new minor Kubernetes versions (e.g. 1.12.1 to 1.13.1).

The Upgrade Process

To avoid downtime and speed up the upgrade, we recommend enabling surge upgrades on existing clusters.

During an upgrade, the control plane (Kubernetes master) is replaced with a new control plane running the new version of Kubernetes. This process takes a few minutes, during which API access to the cluster is unavailable but workloads are not impacted.

Once the control plane has been replaced, the worker nodes are replaced in a rolling fashion, one worker pool at a time. Kubernetes reschedules each worker node’s workload, then replaces the node with a new node running the new version and reattaches any block storage volumes to the new nodes. The new worker nodes have new IP addresses.

Any data stored on the local disks of the worker nodes will be lost in the upgrade process. We recommend using persistent volumes for data storage, and not relying on local disk for anything other than temporary data.

During this process, workloads running on clusters with a single worker node will experience downtime because there is no additional capacity to host the node’s workload during the replacement.

If security-related issues arise, it may be necessary for us to force cluster upgrades even on clusters with automatic upgrades disabled. When this is the case, we work to upgrade during specified maintenance windows with advance notification via email, control panel notifications, and via our status page.

Surge Upgrades

Surge upgrades are enabled by default when you create a new Kubernetes cluster, and we recommend enabling them when upgrading an existing cluster. Surge upgrades make upgrades faster and more stable: they create up to 10 duplicate nodes running the new version, drain workloads from the old nodes onto the new ones, and then delete the old nodes.

Because at most 10 duplicate nodes are created at a time, clusters with more than 10 nodes are upgraded in batches of 10.

To enable surge upgrades, in the Surge upgrades section of the Settings tab of your cluster, click Edit. Select the Enable surge upgrades option and click Save.

Enable surge upgrade

To use surge upgrades for the entire upgrade duration, your Droplet limit must be at least n + min(10, num_nodes), where num_nodes is the number of nodes in your cluster and n is your current Droplet count. For example, if you have a 12-node cluster and no other Droplets, then n is 12 and your Droplet limit must be at least 22. You can request a Droplet limit increase at any time.

Surge upgrade droplet limit

If an upgrade starts with fewer than the required number of available Droplets, or the Droplet limit is reached during the upgrade, a partial surge upgrade is done using the available Droplets and the rest of the upgrade proceeds without surge.

Surge upgrades can incur some additional cost, which depends on how long it takes to drain workloads from the old nodes. Depending on the length of the upgrade, the estimated cost ranges from $0 to one hour of cluster cost per upgrade. For example, if your cluster costs $0.19/hr, a surge upgrade costs between $0 and $0.19. To help drain your nodes smoothly and minimize the cost of the upgrade, you can implement the measures described in Enabling Disruption-Free Upgrades.

Upgrading via Control Panel

Upgrading On Demand

To upgrade a cluster manually, visit the Overview tab of the cluster in the control panel. Under Available Upgrades, you will see an Upgrade Now button if a new version is available for your cluster. Click this button to begin the upgrade process.

Upgrading to a New Minor Version

The on-demand process is required when upgrading your cluster to a new minor version of Kubernetes. During this process, you can run our cluster linter before upgrading. The linter automatically checks that the cluster conforms to common best practices and links to the fixes recommended in our documentation, helping to mitigate issues that might affect your cluster’s compatibility with the newer version of Kubernetes. Click Run Linter on the upgrade modal to begin.

Screenshot of upgrade modal showing 'Run Linter' link.

Upgrading Automatically

To enable automatic upgrades for a cluster, visit the Settings tab of the cluster. In the Version Upgrades section, click Enable Auto Upgrades.

Automatic upgrades occur during a cluster’s 4-hour maintenance window. The default maintenance window is chosen by the DigitalOcean Kubernetes backend to guarantee an even workload across all maintenance windows for optimal processing.

You can specify a different maintenance window in the Settings tab of a cluster. In the Maintenance Window section, click Edit to specify a different start time. Maintenance windows are made up of two parts: a time of day and, optionally, a day of the week. For example, you can set your maintenance window to 5am any day of the week or to 8pm on Mondays.

Even if you have auto upgrades enabled, you can still upgrade on-demand by clicking the Upgrade Now button in the Overview tab.

Upgrading via CLI

Upgrading to the Latest Version

First, obtain your cluster ID:

doctl kubernetes cluster list

Then pass the cluster ID to the upgrade command to upgrade to the latest version:

doctl kubernetes cluster upgrade 41b74c5d-9bd0-5555-5555-a57c495b81a3

Upgrading to a Specific Version

To upgrade to a specific Kubernetes version, rather than just automatically upgrading to the latest version, you must first use your cluster ID to get a list of available upgrades for that cluster:

doctl kubernetes cluster get-upgrades 41b74c5d-9bd0-5555-5555-a57c495b81a3

Then, use the slug value returned by the get-upgrades call to perform the upgrade:

doctl kubernetes cluster upgrade 41b74c5d-9bd0-5555-5555-a57c495b81a3 --version 1.15.3-do.3

Enabling Disruption-Free Upgrades

Upgrading your cluster can cause disruptions in the availability of services running in your workloads. Consider the following measures to ensure service availability during upgrades.

Configure a PodDisruptionBudget

A PodDisruptionBudget (PDB) limits how many of an application’s replicas can be unavailable during a voluntary disruption, relative to how many it is intended to have. For example, if you set the replicas value for a deployment to 5 and set the PDB’s maxUnavailable to 1, potentially disruptive actions like cluster upgrades and resizes will occur with no fewer than four pods running.

For more information, see Specifying a Disruption Budget for your Application in the Kubernetes documentation.
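As a sketch, a PDB matching the five-replica example above might look like the following manifest, assuming the deployment’s pods carry a hypothetical app: web label:

```yaml
# PDB allowing at most one pod of the "web" app to be down at a time.
# policy/v1 is available on Kubernetes 1.21+; older clusters use policy/v1beta1.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1        # never evict more than one replica at once
  selector:
    matchLabels:
      app: web             # hypothetical label on the deployment's pods
```

You can apply a manifest like this with kubectl apply, and Kubernetes will then refuse to evict a pod whenever doing so would violate the budget.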

Implement Graceful Shutdowns

Ensure that the containers in your workloads respond to shutdown requests without abruptly dropping service. You can use tools like a preStop hook that runs before a scheduled Pod shutdown, and specify a grace period other than the 30-second default.

This is important because cluster upgrades will result in Pod shutdowns, which follow the standard Kubernetes termination lifecycle:

  1. The Pod is set to the “Terminating” state and removed as an endpoint.
  2. The preStop hook is executed, if it exists.
  3. A SIGTERM signal is sent to the Pod, notifying the containers that they are going to be shut down soon. Your code should listen for this event and start shutting down at this point.
  4. Kubernetes waits for a grace period to pass; the default grace period is 30 seconds.
  5. A SIGKILL signal is sent to any containers that still haven’t shut down, and the Pod is removed.

For more information, see Termination of Pods in the Kubernetes documentation.
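The lifecycle steps above can be sketched in a Pod spec like the following, where the image name and commands are placeholders for your own application:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app                    # placeholder name
spec:
  terminationGracePeriodSeconds: 60     # override the 30-second default (step 4)
  containers:
  - name: app
    image: example.com/my-app:1.0       # placeholder image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent (step 2); here a placeholder that
          # marks the container as draining and gives it time to finish work.
          command: ["/bin/sh", "-c", "touch /tmp/draining; sleep 5"]
```

The application itself should still handle SIGTERM (step 3); the preStop hook only buys time before that signal arrives.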

Set up Readiness Probes

Readiness probes are useful when an application is running but not yet able to serve traffic, for example because an external service it depends on is still starting up or it is loading a large data set. You can configure a readiness probe to report such a status. Think of a command you could execute in the container every few seconds that exits with status 0 when the application is ready, and specify that command and its schedule in your Pod spec.
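For example, a readiness probe built around such a command (here, a placeholder check for a marker file the application writes once it is ready) might look like this fragment of a Pod spec:

```yaml
containers:
- name: app
  image: example.com/my-app:1.0     # placeholder image
  readinessProbe:
    exec:
      # Exits 0 only once the app has written /tmp/ready.
      command: ["cat", "/tmp/ready"]
    initialDelaySeconds: 5          # wait before the first check
    periodSeconds: 10               # re-check every 10 seconds
```

While the probe fails, the Pod is removed from Service endpoints but is not restarted, so traffic only reaches Pods that report ready.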

For more information, see Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation.