Monitoring Features

Validated on 3 Nov 2025 • Last edited on 10 Nov 2025

DigitalOcean Monitoring is a free, opt-in service that lets you track Droplet resource usage in real time, visualize performance metrics, and receive alerts via email or Slack to proactively manage your infrastructure’s health.

GPU Observability

GPU Observability extends DigitalOcean Insights to display GPU-level metrics for DOKS clusters that include GPU node pools created with AI/ML Ready images for AMD and NVIDIA GPUs. You can track GPU utilization, temperature, memory usage, and performance directly in the Insights tab.

The DigitalOcean metrics agent (do-agent) automatically detects the GPU type on each node and enables the matching exporter: DCGM for NVIDIA GPUs or ROCm for AMD GPUs. Metrics are collected locally on each GPU worker node.
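The vendor-to-exporter pairing described above can be pictured as a simple lookup. The sketch below is illustrative only: `EXPORTER_BY_VENDOR` and `select_exporter` are hypothetical names, and this document does not describe how do-agent actually performs detection.

```python
from typing import Optional

# Hypothetical lookup table. It mirrors the pairing stated in the text
# (DCGM for NVIDIA, ROCm for AMD), not do-agent's real implementation.
EXPORTER_BY_VENDOR = {"nvidia": "DCGM", "amd": "ROCm"}


def select_exporter(vendor: str) -> Optional[str]:
    """Return the exporter name for a GPU vendor, or None if none applies."""
    return EXPORTER_BY_VENDOR.get(vendor.strip().lower())
```

A vendor outside the table returns `None`, which corresponds to a node where no GPU exporter would be enabled.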

GPU Observability is available on DOKS 1.33.1-do.5 or later and is automatically enabled when you select Improved Metrics and Monitoring during cluster creation.
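To check a cluster version against this minimum in a script, a version string can be parsed into a comparable tuple. The sketch below assumes version strings always follow the `<major>.<minor>.<patch>-do.<n>` shape seen in `1.33.1-do.5`. The helper names are hypothetical.

```python
import re
from typing import Tuple

# Minimum version stated in the text: 1.33.1-do.5
MIN_VERSION = (1, 33, 1, 5)

# Assumed format: <major>.<minor>.<patch>-do.<n>
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-do\.(\d+)")


def parse_doks_version(version: str) -> Tuple[int, ...]:
    """Parse a string such as '1.33.1-do.5' into (1, 33, 1, 5)."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized DOKS version: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_gpu_observability(version: str) -> bool:
    """True if the version is at or above the stated minimum."""
    return parse_doks_version(version) >= MIN_VERSION
```

Comparing integer tuples avoids the pitfalls of plain string comparison, where `1.9.0` would sort after `1.33.1`.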

For security, GPU exporters listen only on 127.0.0.1 to prevent external access.

On GPU Droplets, whether GPU metrics are enabled depends on the image:

  • AI/ML Ready Droplets: GPU metrics are enabled automatically when you select Improved Metrics and Monitoring during Droplet creation.
  • Basic Images: GPU metrics are not enabled by default. To enable them, manually install the exporter, bind it to 127.0.0.1, reconfigure do-agent to scrape it, and restart do-agent.
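When setting up an exporter by hand, it can help to confirm that the listen address you configured is a loopback address. The sketch below checks an IPv4 `host` or `host:port` string using Python's standard `ipaddress` module. The function name and the port in the usage note are illustrative assumptions, not values taken from this document.

```python
import ipaddress


def is_loopback_bind(listen_address: str) -> bool:
    """True if an IPv4 'host' or 'host:port' string names a loopback address.

    Hostnames such as 'localhost' are not resolved and return False.
    """
    host = listen_address.split(":", 1)[0]
    try:
        return ipaddress.IPv4Address(host).is_loopback
    except ipaddress.AddressValueError:
        return False
```

For example, `is_loopback_bind("127.0.0.1:9400")` is true, while `is_loopback_bind("0.0.0.0:9400")` is false because that address accepts connections on every interface.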

Droplet Graphs

Droplet graphs provide visual representations of system-level metrics. Use them to monitor resource usage over time and understand how it correlates to performance.

By default, Droplet graphs show public and private bandwidth usage, CPU usage, and disk I/O. Installing the DigitalOcean metrics agent adds load averages (1-, 5-, and 15-minute), memory usage, and disk usage.
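For reference, the same three load averages can be read on the Droplet itself. The sketch below uses Python's standard `os.getloadavg()`, which is available on Unix-like systems. It shows what the numbers are, not how the metrics agent collects them. The function name and dictionary keys are illustrative.

```python
import os
from typing import Dict


def read_load_averages() -> Dict[str, float]:
    """Return the 1-, 5-, and 15-minute system load averages."""
    one, five, fifteen = os.getloadavg()
    return {"1m": one, "5m": five, "15m": fifteen}
```

A load average is easiest to interpret relative to the number of CPU cores, which `os.cpu_count()` reports.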

Alert Policies

Alert policies let you define thresholds for resource usage. When usage exceeds a threshold, DigitalOcean sends a notification by email or Slack.

You can set alerts for total CPU usage, incoming and outgoing bandwidth, disk read and write operations, memory usage, and disk usage.
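The threshold idea can be sketched as follows. This is a simplified local model, not the DigitalOcean API: `AlertPolicy`, `triggered_policies`, and the metric names are hypothetical, and details such as evaluation windows are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertPolicy:
    """Hypothetical policy: alert when `metric` rises above `threshold`."""
    metric: str
    threshold: float


def triggered_policies(
    policies: List[AlertPolicy], samples: Dict[str, float]
) -> List[AlertPolicy]:
    """Return the policies whose metric sample exceeds its threshold.

    Metrics with no sample are skipped rather than treated as zero.
    """
    return [
        policy
        for policy in policies
        if policy.metric in samples and samples[policy.metric] > policy.threshold
    ]
```

With a CPU policy at 80 and a memory policy at 90, samples of 85 for CPU and 40 for memory would trigger only the CPU policy.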
