Apache Airflow Blueprint

This Terraform blueprint deploys Apache Airflow on DigitalOcean, streamlining workflow orchestration and management. It includes a managed PostgreSQL database for reliable data storage, a managed Redis instance for efficient caching and message brokering, and a DigitalOcean Spaces bucket for object storage and remote logging.

Apache Airflow is a powerful tool for scheduling and monitoring workflows. This blueprint simplifies the setup, allowing you to focus on developing and optimizing your workflows without worrying about infrastructure. Leveraging DigitalOcean’s managed services ensures high availability, security, and performance.

Ideal for data engineers, data scientists, and developers, this solution minimizes operational overhead while providing a robust environment for your data pipelines. Get started quickly with this preconfigured setup and streamline your workflow orchestration on DigitalOcean.

Getting Started After Deploying Apache Airflow Blueprint

Welcome to the DigitalOcean Airflow Terraform Stack!

This stack will deploy the following resources:

  • A Droplet with Apache Airflow preinstalled and pre-configured with basic connections.
  • A managed (DBaaS) PostgreSQL database cluster.
  • A managed (DBaaS) Redis instance.
  • A Spaces object storage bucket for remote logging.

The Airflow connections for PostgreSQL, Spaces, and Redis are configured out of the box.
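
For orientation, here is a simplified sketch of the kind of DigitalOcean resources the blueprint manages. The resource names, image slug, and engine versions below are illustrative assumptions rather than the blueprint's actual definitions:

# Illustrative sketch only -- names, image, and versions are assumptions,
# not the blueprint's actual resource definitions.
resource "digitalocean_droplet" "airflow" {
  name     = "airflow-droplet"
  region   = "nyc3"
  size     = "s-4vcpu-8gb"
  image    = "ubuntu-22-04-x64" # assumed base image
  ssh_keys = var.ssh_key_ids
}

resource "digitalocean_database_cluster" "postgres" {
  name       = "airflow-stack-db-cluster"
  engine     = "pg"
  version    = "15" # assumed engine version
  size       = "db-s-1vcpu-2gb"
  region     = "nyc3"
  node_count = 1
}

resource "digitalocean_database_cluster" "redis" {
  name       = "airflow-stack-kv-cluster"
  engine     = "redis"
  version    = "7" # assumed engine version
  size       = "db-s-1vcpu-2gb"
  region     = "nyc3"
  node_count = 1
}

resource "digitalocean_spaces_bucket" "logs" {
  name   = "airflow-bucket"
  region = "sfo3"
}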

How to use this blueprint?

Install Terraform

Head to the Terraform install page and follow the instructions for your platform.

You can validate your local Terraform installation by running:

$ terraform -v 
Terraform v1.5.7
...

Create DigitalOcean API token

Head to the Applications & API page and create a new personal access token (PAT) by clicking the Generate New Token button. Make sure the token has the Write scope, as Terraform needs it to create new resources. Save the token somewhere safe: it is shown only once, so if you lose it you will have to delete it and create a new one.

Set up a blueprint

Clone this repository to the machine where Terraform is installed:

$ git clone https://github.com/digitalocean/marketplace-blueprints.git

Navigate to the blueprint you’re interested in, for example, Airflow:

$ cd marketplace-blueprints/blueprints/airflow/

Define your variables

Edit the variables.tf file and specify your API token like this:

variable "do_token" {
  default = "dop_v1_your_beautiful_token_here"
}
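
If you prefer not to hardcode the token in variables.tf, Terraform can also read variable values from environment variables prefixed with TF_VAR_. This is standard Terraform behavior and works with the do_token variable declared above; the token value here is a placeholder:

$ export TF_VAR_do_token="dop_v1_your_beautiful_token_here"  # placeholder value

With this set, the default in variables.tf can stay a placeholder.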

(Optional but Recommended) Use SSH keys to deploy your Droplets instead of passwords.

Retrieve your SSH key IDs using doctl:

$ doctl compute ssh-key list
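
If you only need the numeric IDs, you can limit the columns doctl prints:

$ doctl compute ssh-key list --format ID,Name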

Specify which SSH keys to use:

variable "ssh_key_ids" {
  default = [123, 456, 789] # Replace these numbers with actual SSH key IDs
  type = list(number)
}

(Optional but Recommended) Specify the region where you want your Droplets to be deployed:

variable "region" {
  default = "nyc3"
}
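
To see the available region slugs, you can list them with doctl:

$ doctl compute region list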

Below is a table of configurable variables along with their default values and descriptions:

| Variable Name | Default Value | Description |
| --- | --- | --- |
| do_token | "dop_v1_your_token" | DigitalOcean API token. Create one on the Applications & API page. |
| ssh_key_ids | [] | List of SSH key IDs. Retrieve them with doctl. |
| region | "nyc3" | DigitalOcean region. See the regions documentation for available regions. |
| spaces_access_id | "your_spaces_access_key_here" | Access key for DigitalOcean Spaces. Create one in the DigitalOcean control panel. |
| spaces_secret_key | "your_spaces_secret_key_here" | Secret key for DigitalOcean Spaces. |
| spaces_bucket_name | "airflow-bucket" | Name of the Spaces bucket used for remote logging. |
| spaces_host | "https://sfo3.digitaloceanspaces.com" | Region-specific host URL for DigitalOcean Spaces. |
| droplet_name | "airflow-droplet" | Name of the Airflow Droplet. |
| droplet_size_slug | "s-4vcpu-8gb" | Size slug for the Airflow Droplet. See the Droplet sizes documentation for available sizes. |
| db_node_count | 1 | Number of nodes in the PostgreSQL database cluster. |
| db_cluster_name | "airflow-stack-db-cluster" | Name of the PostgreSQL database cluster. |
| db_size_slug | "db-s-1vcpu-2gb" | Size slug for the database cluster nodes. See the database sizes documentation for available sizes. |
| keystore_node_count | 1 | Number of nodes in the Redis (keystore) cluster. |
| keystore_name | "airflow-stack-kv-cluster" | Name of the Redis (keystore) cluster. |
| keystore_size_slug | "db-s-1vcpu-2gb" | Size slug for the Redis (keystore) cluster nodes. |
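
Instead of changing the defaults in variables.tf, you can also put your values in a terraform.tfvars file, which Terraform loads automatically for variables that are already declared. The values below are placeholders, not working credentials:

do_token           = "dop_v1_your_beautiful_token_here"
ssh_key_ids        = [123, 456]
region             = "nyc3"
spaces_access_id   = "your_spaces_access_key_here"
spaces_secret_key  = "your_spaces_secret_key_here"
spaces_bucket_name = "airflow-bucket"
spaces_host        = "https://nyc3.digitaloceanspaces.com"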

Initialize and Apply the Blueprint

Initialize the Terraform project by running:

$ terraform init
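
Optionally, preview the resources Terraform will create before applying:

$ terraform plan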

Finally, after the project is initialized, run terraform apply to spin up the blueprint:

$ terraform apply

It can take a few minutes to spin up the Droplets, and some blueprints need additional time after creation to finish their configuration.
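
One way to look up the Droplet's public IP address after the run finishes (the blueprint may also print it as a Terraform output) is with doctl:

$ doctl compute droplet list --format Name,PublicIPv4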

Getting started with Airflow

After the stack is deployed, you can access the Airflow dashboard at http://your_droplet_public_ipv4. You should see the Login screen:

Airflow Login

After you log in, you will have access to the Airflow dashboard!

There are two example DAGs preinstalled to test connectivity with the Spaces bucket used for remote logging and with Redis.

Sample DAGs

To view the connection details, go to the Connections option under Admin.

Connection Details
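
You can also list the same connections from the Droplet over SSH, assuming the airflow CLI is on the PATH for the user that runs Airflow:

$ airflow connections list  # assumes the airflow CLI is on PATH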

Stack details

  • Latest versions of Airflow, PostgreSQL, and Redis.
  • The Airflow webserver and scheduler run as systemd services (a status check sketch follows this list).
  • Remote logging is pre-configured out of the box and is available at https://<bucket-name>.<region>.digitaloceanspaces.com/logs/.
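
To check that both services are running, you can query systemd. The unit names below are assumptions; the actual names on the image may differ:

$ sudo systemctl status airflow-webserver airflow-scheduler  # unit names are assumptions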

Security

Certbot is preinstalled; run it to configure HTTPS. To make your Airflow Droplet more secure, refer to the Airflow Docs.
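
A typical Certbot run looks like the following; the domain is a placeholder, and the --nginx plugin is an assumption about how the Droplet serves Airflow, so adjust it to match the web server actually installed on the image:

$ sudo certbot --nginx -d airflow.example.com  # domain and plugin are assumptions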


This guide should help you get started with deploying and configuring the Apache Airflow Terraform stack on DigitalOcean.