How to Set Up Monitoring Alerts

DigitalOcean Monitoring is a free, opt-in service that gathers metrics about Droplet-level resource utilization. It provides additional Droplet graphs and supports configurable metrics alert policies with integrated email and Slack notifications to help you track the operational health of your infrastructure.

Like other DigitalOcean Monitoring features, alert policies and notifications rely on information provided by the DigitalOcean metrics agent, a lightweight, open-source program that gathers metrics.

Before setting up alert policies, you must install the metrics agent on each of the participating Droplets. Kubernetes worker nodes, a kind of Droplet, already have the metrics agent by default.

Create an Alert Policy

Once the metrics agent is running, you can begin creating alert policies. In the control panel, click the Create button in the top menu and select Alert Policies:

[Image: DigitalOcean create page pull-down menu]

This opens the policy creation page.

The pattern for defining an alert policy is the same for all metrics:

  • Choose the metric:
    • CPU: The percentage of total CPU used on the Droplet, out of 100%
    • Bandwidth — Inbound: The amount of incoming traffic to the Droplet, in MBps
    • Bandwidth — Outbound: The amount of outgoing traffic from the Droplet, in MBps
    • Disk — Read: The read activity for the Droplet’s disks, in MB/s
    • Disk — Write: The write activity for the Droplet’s disks, in MB/s
    • Memory Utilization: The percentage of total memory being used, out of 100%
    • Disk Utilization: The percentage of the root disk storage being used, out of 100%
  • Specify the usage threshold, either as a percentage of the total available capacity or as a usage rate, depending on the selected metric. An appropriate value depends on the metric, the goal of the alert, and the typical server usage patterns. In most scenarios, alerting when usage climbs above the threshold is the more helpful option, because high usage suggests that the current resources may no longer be sufficient.
  • Pick the alert interval, which is how long a Droplet must exceed the threshold before a notification is triggered:
    • 5 minutes
    • 10 minutes
    • 30 minutes
    • 1 hour
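
The three choices above (metric, threshold, alert interval) can be modeled as a small data structure. The following is a minimal sketch for illustration only: the class name, field names, and metric identifiers are hypothetical and are not part of any DigitalOcean API. It only encodes the options listed above and rejects combinations that the form would not allow.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the metrics listed above (not official names).
PERCENT_METRICS = {"cpu", "memory_utilization", "disk_utilization"}
RATE_METRICS = {"bandwidth_inbound", "bandwidth_outbound", "disk_read", "disk_write"}

# The four alert intervals offered by the form, in minutes.
ALERT_INTERVALS_MINUTES = (5, 10, 30, 60)


@dataclass(frozen=True)
class AlertPolicy:
    metric: str
    threshold: float
    interval_minutes: int
    alert_when_above: bool = True  # alerting above the threshold is the usual choice

    def __post_init__(self):
        if self.metric not in PERCENT_METRICS | RATE_METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.interval_minutes not in ALERT_INTERVALS_MINUTES:
            raise ValueError(f"unsupported interval: {self.interval_minutes}")
        if self.threshold < 0:
            raise ValueError("thresholds cannot be negative")
        # Percentage metrics are expressed out of 100%.
        if self.metric in PERCENT_METRICS and self.threshold > 100:
            raise ValueError("percentage thresholds cannot exceed 100")


# Alert when root disk utilization averages above 70% over one hour.
disk_policy = AlertPolicy(metric="disk_utilization", threshold=70.0, interval_minutes=60)
```

Validating at construction time keeps an invalid policy, such as a 150% CPU threshold, from ever being created.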

Apply the Policy to Droplets

The Select Droplets or Tags section includes a field where you apply the alert policy to specific Droplets or groups of Droplets.

Adding Droplets by name allows you to target individual resources unambiguously. Adding tags to an alert policy is more flexible: you control which Droplets the policy covers by adding or removing tags on the Droplets themselves, without editing the policy.

Kubernetes worker nodes do not retain their names when a node is recycled. To ensure that worker node alert policies persist when nodes are recycled, use tags (for example, the cluster name) instead of worker node names.
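
The difference between targeting by name and targeting by tag can be illustrated with a short sketch. Everything here is hypothetical: the `Droplet` class, the `covered_droplets` helper, and the node names are invented for the example, and it assumes that a recycled worker node comes back under a new name while still carrying the cluster tag.

```python
from dataclasses import dataclass, field


@dataclass
class Droplet:
    name: str
    tags: set = field(default_factory=set)


def covered_droplets(droplets, names=(), tags=()):
    """Return the names of Droplets matched by name or by at least one tag."""
    wanted_names, wanted_tags = set(names), set(tags)
    return [
        d.name
        for d in droplets
        if d.name in wanted_names or d.tags & wanted_tags
    ]


# Before recycling: one worker node and one ordinary Droplet.
before = [Droplet("pool-abc1", {"my-cluster"}), Droplet("web-01")]
# After recycling: the worker node has a new name but the same tag (assumed).
after = [Droplet("pool-xyz9", {"my-cluster"}), Droplet("web-01")]

by_name = covered_droplets(after, names=["pool-abc1"])  # old name no longer matches
by_tag = covered_droplets(after, tags=["my-cluster"])   # tag still matches
```

In this sketch the name-based selection ends up empty after the node is recycled, while the tag-based selection picks up the replacement node automatically.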

Select the Alert Notification Method

To create a policy, you must select at least one of the two possible notification methods: email or Slack. The first choice, checked by default, is the verified email address of the account you’re using when you create the policy.

When you create an alert policy from a Team account, you’ll have the option to select any of your teammates as email recipients for an alert. When you click Add more recipients, you’ll be given a list of your team’s email addresses. Select each address individually or use the checkbox by the Team members header to select or deselect all the addresses on the list.

[Image: Edit alert recipients window]

If you are part of a Slack organization, you can choose to connect your Slack account to receive notifications in Slack. Click the Connect Slack button to authorize DigitalOcean to create notifications within your Slack organization:

[Image: Send alerts via window]

On the authorization page that follows, you can select any Slack teams you are authenticated to or log in to a different team.

If the Slack team name includes characters that monitoring does not support, such as emojis, monitoring notifications will fail with a 500 error. We’re working on expanding monitoring’s character set support.

You can then choose to notify Slackbot (which will send messages only to you), notify a channel, or notify any person or group through direct messages.

[Image: Slack authorization prompt]

Once you’ve authorized the link between DigitalOcean Monitoring and a Slack team, that connection will be available and enabled by default the next time you create an alert policy. If you choose to unlink in a new alert policy, you’ll be able to select a different channel or a different team without affecting any previous connections.

Name and Create the Alert Policy

Finally, choose a unique and descriptive name for the alert policy. This name will be used to identify this specific alert policy when notifications are sent.

The name you choose will:

  • Identify the policy on the Monitoring index page.
  • Form part of the subject line of the email alert.

When everything is configured, clicking the Create alert policy button will create the policy and kick off the evaluation of incoming data.

The new policy will appear on the Monitoring page under a section called Alert Policies:

[Image: Untriggered alert policy]

Data Point Collection and Alert States

When a policy is first created, it may take a few minutes before the evaluation of incoming data begins. After that slight delay, data will be evaluated at regular intervals.

If the average of the data points in the alert interval exceeds the threshold, an alert is triggered. For example, consider a policy that alerts when disk utilization is above 70%. Once monitoring begins and a full alert interval has elapsed, monitoring averages the data points collected over that period to determine the percentage of disk usage. If the average indicates that disk usage is above 70%, we would receive a notification.

This same data point evaluation process is used to determine when an alert has been resolved. Data points continue to be collected at regular intervals. Each time a new data point is received, the oldest point drops off, the newest is added, and the average over the alert interval is re-evaluated. This means that if a threshold was barely exceeded and a new data point brings the average below the threshold, a resolution notification could be triggered without much delay.
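
The sliding evaluation described above can be sketched in a few lines. This is an illustration of the idea, not the service's actual implementation: the `RollingAverage` class and `exceeds` helper are hypothetical, and the window of five data points is an arbitrary stand-in, since the real collection frequency is not stated here.

```python
from collections import deque


class RollingAverage:
    """Keeps the most recent `window_size` data points and their average."""

    def __init__(self, window_size):
        # A bounded deque discards the oldest point when a new one arrives.
        self.points = deque(maxlen=window_size)

    def add(self, value):
        self.points.append(value)

    def is_full(self):
        return len(self.points) == self.points.maxlen

    def average(self):
        return sum(self.points) / len(self.points)


def exceeds(window, threshold):
    """True only once a full interval of data averages above the threshold."""
    return window.is_full() and window.average() > threshold


window = RollingAverage(window_size=5)  # hypothetical: 5 points per alert interval
for reading in (72, 71, 70, 69, 70):
    window.add(reading)
triggered = exceeds(window, 70)  # average is 70.4, so the alert triggers

window.add(66)  # the oldest point (72) drops off
still_triggered = exceeds(window, 70)  # average is now 69.2, so the alert resolves
```

Because the threshold was only barely exceeded, a single lower data point is enough to pull the average back under 70 and resolve the alert.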

With our disk example, let’s say that a log rotation policy deletes an old and particularly large log file, causing disk usage to drop dramatically. We will receive the resolution notification in the same channels where we received the alert notification (unless, of course, we’ve edited the policy in the interim).

At this time, it is not possible to manually resolve or acknowledge an alert. Alerts are automatically resolved when resource usage falls back to an acceptable level according to the alert policy.

Receiving Notifications and Viewing Active Alerts

When an alert is triggered according to the process outlined above, a notification is sent using the chosen mediums. You will be notified once per configured medium when an alert has been triggered. A second notification is sent when the alert has been resolved.
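
The "once on trigger, once on resolve" behavior amounts to sending notifications only when the alert changes state. The sketch below shows that idea under stated assumptions: the `AlertState` class and its method are hypothetical names, and the averages fed to it are made-up values.

```python
class AlertState:
    """Tracks whether an alert is active and reports state changes only."""

    def __init__(self, threshold, mediums):
        self.threshold = threshold
        self.mediums = list(mediums)
        self.triggered = False

    def update(self, average):
        """Return the (medium, event) notifications caused by this evaluation."""
        above = average > self.threshold
        if above == self.triggered:
            return []  # no state change, so nothing is sent again
        self.triggered = above
        event = "triggered" if above else "resolved"
        return [(medium, event) for medium in self.mediums]


state = AlertState(threshold=70.0, mediums=["email", "slack"])
# Successive interval averages: below, above, still above, back below.
sent = [state.update(avg) for avg in (65.0, 71.5, 72.0, 69.7)]
```

Here `sent` contains notifications only for the second evaluation (one "triggered" per medium) and the fourth (one "resolved" per medium); the evaluation that stays above the threshold produces nothing.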

Each notification includes the name of the alert, the name and IP address of the triggering Droplet, and a link to the triggering Droplet’s page in the control panel. Additionally, notifications about triggered alerts include the alert policy parameters and the average resource usage at the time the alert was triggered. Resolution notifications include the length of the alert event and the current average resource usage.

Alerts in the Control Panel

If an alert is triggered, a new section called Triggered Alerts is displayed in the Monitoring interface. This section is only visible when there are active alerts:

[Image: Triggered alert policy]

This section of the page displays the active alerts, including each of the Droplets that are currently above the usage threshold. Once the alert has been resolved, the entry will drop out of the Triggered Alerts section. If there are no longer any active alerts, the Triggered Alerts section will be hidden.

Email Notifications

If you’ve selected email notifications, you will receive a notification email when an alert is triggered:

Subject: DigitalOcean monitoring triggered: CPU is running high - example_droplet

CPU Utilization Percent is currently at 71.56%, above setting of 70.00% for the last 5m

View droplet:
Edit monitor:

Once the alert has been resolved, a similar resolution email will be sent:

Subject: DigitalOcean monitoring resolved: Disk Utilization is high on a server tagged 'Database' - Database-01
The monitor was triggered for more than 1 hour.
Disk Utilization is currently at 69.70%.

View droplet Database-01:

This indicates that the alert has been resolved.

Slack Notifications

If you’ve enabled Slack notifications, you will receive a notification in Slack in the team and channel selected in the alert policy:

[Image: Slack alert triggered]

Once the average resource consumption has dipped below the threshold again, a similar Slack notification will be sent indicating that the alert has been resolved:

[Image: Slack alert resolved]

Again, this message indicates that the alert has been resolved.