How to Set Up Resource Alerts

DigitalOcean Monitoring is a free, opt-in service that gathers and displays metrics about Droplet-level resource utilization. Monitoring supports configurable alert policies with integrated email and Slack notifications to help you track the operational health of your infrastructure.


Resource alerts send notifications via Slack or email when Droplet metrics, like CPU usage or bandwidth, fall outside of a threshold you set.

Create Resource Alerts

Warning
Resource alerts rely on information provided by the DigitalOcean metrics agent: a lightweight, open-source program that gathers metrics. You must install the metrics agent on every Droplet that you want to receive alerts for. Kubernetes worker nodes have the metrics agent installed by default.

Once the metrics agent is running, you can create resource alerts. In the Create menu, click Resource Alerts:

DigitalOcean create page pull-down menu

This opens the resource alert creation page.

The pattern for defining a resource alert is the same for all metrics: choose a metric, rule, threshold, and duration.

  1. Choose the Metric Type:

    • 1 Minute Load Average: The average number of processes being executed and waiting to be executed by the CPU, over the past 1 minute
    • 5 Minute Load Average: The average number of processes being executed and waiting to be executed by the CPU, over the past 5 minutes
    • 15 Minute Load Average: The average number of processes being executed and waiting to be executed by the CPU, over the past 15 minutes
    • Memory Utilization: The percentage of total memory being used, out of 100%
    • Disk Utilization Percentage: The percentage of the root disk storage being used, out of 100%
    • CPU Utilization Percentage: The percentage of total CPU used on the Droplet, out of 100%
    • Disk Read I/O: The amount of read activity for the Droplet’s disks, in Mbps
    • Disk Write I/O: The amount of write activity for the Droplet’s disks, in Mbps
    • Public Outbound Bandwidth: The amount of outgoing traffic from the Droplet over its public network interface, in Mbps
    • Public Inbound Bandwidth: The amount of incoming traffic to the Droplet over its public network interface, in Mbps
    • Private Outbound Bandwidth: The amount of outgoing traffic from the Droplet over its private network interface, in Mbps
    • Private Inbound Bandwidth: The amount of incoming traffic to the Droplet over its private network interface, in Mbps
  2. Specify a Rule to apply to the metric, either is above or is below.

  3. Specify the usage Threshold itself as either a percentage of the total available capacity being used or as a usage rate, depending on the selected metric.

    An appropriate value depends on the metric, the goal of the alert, and the typical server usage patterns. In most scenarios, alerting when usage climbs above the threshold is the more helpful option because high usage indicates that the current resources may no longer be sufficient.

  4. Pick the Duration, which is how long a Droplet's average usage must stay above or below the threshold (depending on the rule) before a notification is triggered:

    • 5 minutes
    • 10 minutes
    • 30 minutes
    • 1 hour

Apply the Resource Alert to Droplets

The Select Droplets or Tags section includes a field where you apply the resource alert to specific Droplets or groups of Droplets.

Adding Droplets by name allows you to target individual resources unambiguously. Adding tags to a resource alert provides flexibility in deciding which Droplets are covered by the resource alert.

Note
Kubernetes worker nodes do not retain their names when a node is recycled. To ensure that worker node alert policies persist on node recycling, use tags (like the cluster name tag) instead of worker node names.
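
The difference between name-based and tag-based targeting can be sketched as follows. This is an illustrative model only: the `covered_droplets` function, the dictionary layout, and the Droplet and tag names are all made up for the example and do not reflect how DigitalOcean resolves targets internally.

```python
def covered_droplets(droplets, names=(), tags=()):
    """Return the sorted names of Droplets matched by name or by any tag."""
    wanted_names = set(names)
    wanted_tags = set(tags)
    return sorted(
        d["name"]
        for d in droplets
        if d["name"] in wanted_names or wanted_tags & set(d["tags"])
    )


# Hypothetical fleet before a worker node is recycled.
before = [
    {"name": "web-01", "tags": ["frontend"]},
    {"name": "pool-a1b2", "tags": ["k8s:my-cluster"]},
]
# After recycling, the worker node has a new name but keeps its tag.
after = [
    {"name": "web-01", "tags": ["frontend"]},
    {"name": "pool-c3d4", "tags": ["k8s:my-cluster"]},
]

by_name_before = covered_droplets(before, names=["pool-a1b2"])
by_name_after = covered_droplets(after, names=["pool-a1b2"])
by_tag_after = covered_droplets(after, tags=["k8s:my-cluster"])
```

In this model the name-based alert matches nothing once the node is recycled, while the tag-based alert still covers the replacement node.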

Select the Alert Notification Method

Select at least one of the two possible notification methods: email or Slack.

The default email address is the email address associated with your DigitalOcean account. You can add the email addresses of other team members to receive notifications.

Edit alert recipients window

If you are part of a Slack organization, you can choose to connect your Slack account to receive notifications in Slack. Click the Connect Slack button to authorize DigitalOcean to create notifications within your Slack organization:

Send alerts via window

On the authorization page that follows, you can select any Slack teams you are authenticated to or log in to a different team.

Warning
If the Slack team name includes characters outside the supported character set, such as emojis, monitoring notifications fail with a 500 error. We’re working on expanding character set support.

You can then choose to notify Slackbot (which sends messages only to you), notify a channel, or notify any person or group through direct messages.

Once you’ve authorized the link between DigitalOcean Monitoring and a Slack team, that connection is available and enabled by default the next time you create a resource alert. If you unlink Slack while creating a new resource alert, you can select a different channel or a different team without affecting any previous connections.

Name and Create the Resource Alert

Finally, choose a unique and descriptive name for the resource alert. This name is used to identify this specific resource alert when notifications are sent.

The name you choose:

  • Identifies the resource alert on the Monitoring index page.
  • Forms part of the subject line of the email alert.

Once configured, click the Create Resource Alert button. This creates the resource alert and starts the evaluation of incoming data.

The new alert appears on the Monitoring page in the Resource Alerts tab:

Untriggered resource alert

Receiving Notifications and Viewing Triggered Alerts

When you first create a resource alert, it may take a few minutes before the alerting service begins evaluating incoming data. After that delay, each time the alerting service receives a new data point from the metrics agent, it replaces the oldest data point in the alert interval with the newest and recalculates the average over that interval.

If the average of the data points in the alert interval exceeds the threshold, the service triggers an alert. When the average of the data points in the alert interval falls back within the threshold, the service resolves the alert. At this time, it is not possible to manually resolve or acknowledge an alert.
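
The rolling-average evaluation described above can be sketched as follows. This is a simplified model rather than the alerting service's actual implementation: the `AlertEvaluator` class is hypothetical, the window is expressed as a fixed number of data points, and the sketch assumes no evaluation happens until the window is full.

```python
from collections import deque


class AlertEvaluator:
    """Track a rolling average and report trigger/resolve transitions."""

    def __init__(self, threshold, window_size, rule="above"):
        self.threshold = threshold
        self.rule = rule  # "above" or "below"
        # A deque with maxlen drops the oldest point when a new one arrives.
        self.points = deque(maxlen=window_size)
        self.triggered = False

    def add(self, value):
        """Add a data point; return "triggered", "resolved", or None."""
        self.points.append(value)
        if len(self.points) < self.points.maxlen:
            return None  # assumption: wait until the window is full
        average = sum(self.points) / len(self.points)
        if self.rule == "above":
            breached = average > self.threshold
        else:
            breached = average < self.threshold
        if breached and not self.triggered:
            self.triggered = True
            return "triggered"
        if not breached and self.triggered:
            self.triggered = False
            return "resolved"
        return None


# CPU samples (percent) against a 70% threshold over a 3-point window.
evaluator = AlertEvaluator(threshold=70.0, window_size=3)
events = [evaluator.add(v) for v in [60, 65, 72, 80, 85, 50, 40, 30]]
```

Because the comparison uses the window average, the single sample of 72 does not trigger the alert, and the sample of 50 does not resolve it until the average itself falls back under 70.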

When the alerting service triggers or resolves an alert, it sends a notification using each method you’ve chosen (email, Slack, or both). Each notification includes the name of the alert, the name and IP address of the triggering Droplet, and a link to the triggering Droplet’s page in the control panel. Additionally, notifications about triggered alerts include the resource alert’s parameters and the average resource usage at the time the alert was triggered. Resolution notifications include the length of the alert event and the current average resource usage.

Control Panel Alerts

Triggered alerts are visible on the Monitoring page in the Triggered Alerts section. This section is only visible when there are active alerts:

Triggered resource alert

Email Alerts

If you’ve selected email notifications, you receive a notification email when an alert is triggered:

Subject: DigitalOcean monitoring triggered: CPU is running high - example_droplet

CPU Utilization Percent is currently at 71.56%, above setting of 70.00% for the last 5m

View droplet: https://cloud.digitalocean.com/droplets/12345
IP: 203.0.113.1
Edit monitor: https://cloud.digitalocean.com/monitors/b0fa6de7-00ex-ampl-e920-e52eeb35a903

Once the alert has been resolved, a similar resolution email is sent:

Subject: DigitalOcean monitoring resolved: Disk Utilization is high on a server tagged 'Database' - Database-01
The monitor was triggered for more than 1 hour.
Disk Utilization is currently at 69.70%.

View droplet Database-01: https://cloud.digitalocean.com/droplets/12345678
IP: 203.0.113.1

This indicates that the alert has been resolved.
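
If you filter or route these emails automatically, the subject line carries the alert state, the alert name, and the Droplet name. The following is a minimal sketch of extracting them, assuming the subject always follows the layout of the two sample emails above; the `parse_subject` function and the pattern are illustrative, and the format is not a documented contract.

```python
import re

# Pattern inferred from the sample subjects above; treat it as an assumption.
SUBJECT_RE = re.compile(
    r"^DigitalOcean monitoring (triggered|resolved): (.+) - (.+)$"
)


def parse_subject(subject):
    """Split an alert email subject into state, alert name, and Droplet name."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None  # not an alert email in the assumed format
    state, alert_name, droplet_name = match.groups()
    return {"state": state, "alert": alert_name, "droplet": droplet_name}


triggered = parse_subject(
    "DigitalOcean monitoring triggered: CPU is running high - example_droplet"
)
resolved = parse_subject(
    "DigitalOcean monitoring resolved: Disk Utilization is high on a server "
    "tagged 'Database' - Database-01"
)
```

The greedy first group means the split happens at the last " - " in the subject, so an alert name that itself contains " - " is still kept whole. A Droplet name containing " - " would be split incorrectly, which is one reason to treat this as a convenience rather than a reliable parser.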

Slack Alerts

If you’ve enabled Slack notifications, you receive a notification in Slack in the team and channel selected in the alert’s configuration:

Slack alert triggered

Once the average resource consumption has returned within the threshold, the alerting service sends a similar Slack notification to indicate that the alert has been resolved:

Slack alert resolved