This Terraform blueprint deploys Apache Airflow on DigitalOcean, streamlining workflow orchestration and management. It includes a managed PostgreSQL database for reliable data storage, a managed Redis instance for efficient caching and message brokering, and a DigitalOcean Spaces bucket for object storage and remote logging.
Apache Airflow is a powerful tool for scheduling and monitoring workflows. This blueprint simplifies the setup, allowing you to focus on developing and optimizing your workflows without worrying about infrastructure. Leveraging DigitalOcean’s managed services ensures high availability, security, and performance.
Ideal for data engineers, data scientists, and developers, this solution minimizes operational overhead while providing a robust environment for your data pipelines. Get started quickly with this preconfigured setup and streamline your workflow orchestration on DigitalOcean.
Click the Deploy to DigitalOcean button to deploy this offering. If you aren’t logged in, this link will prompt you to log in with your DigitalOcean account.
This stack will deploy the following resources:

- A Droplet running Apache Airflow
- A managed PostgreSQL database cluster
- A managed Redis cluster for caching and message brokering
- A DigitalOcean Spaces bucket for object storage and remote logging
The connections for PostgreSQL, Spaces object storage, and Redis are configured out of the box.
Head to the Terraform install page and follow the instructions for your platform.
You can validate your local Terraform installation by running:
$ terraform -v
Terraform v1.5.7
...
Head to the Applications & API page and create a new personal access token (PAT) by clicking the Generate New Token button. Make sure to select the Write scope, as Terraform needs it to create new resources. Save the token somewhere safe: it is shown only once, and if you lose it you will need to delete it and create a new one.
Clone this repository to the machine where Terraform is installed:
$ git clone https://github.com/digitalocean/marketplace-blueprints.git
Navigate to the blueprint you’re interested in, for example, Airflow:
$ cd marketplace-blueprints/blueprints/airflow/
Edit the variables.tf file and specify your API token like this:
variable "do_token" {
default = "dop_v1_your_beautiful_token_here"
}
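If you prefer not to store the token in the file, Terraform can also read it from an environment variable named TF_VAR_do_token; a minimal sketch (the token value is a placeholder):

$ export TF_VAR_do_token="dop_v1_your_beautiful_token_here"   # picked up by Terraform as var.do_token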
(Optional but Recommended) Deploy your Droplets with SSH keys instead of passwords. Retrieve your list of SSH key IDs using doctl:
doctl compute ssh-key list
Specify which SSH keys to use:
variable "ssh_key_ids" {
default = [123, 456, 789] # Replace these numbers with actual SSH key IDs
type = list(number)
}
(Optional but Recommended) Specify the region where you want your Droplets deployed:
variable "region" {
default = "nyc3"
}
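If you're not sure which region slugs are valid, you can list them with doctl (assuming doctl is installed and authenticated):

$ doctl compute region list   # shows the slug, name, and availability of each region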
Below is a table of configurable variables along with their default values and descriptions:
| Variable Name | Default Value | Description |
|---|---|---|
| do_token | "dop_v1_your_token" | DigitalOcean API token. Create one here. |
| ssh_key_ids | [] | List of SSH Key IDs. Retrieve your list of SSH Key IDs using doctl. |
| region | "nyc3" | DigitalOcean region. See regions for available regions. |
| spaces_access_id | "your_spaces_access_key_here" | Access key for DigitalOcean Spaces. Create one here. |
| spaces_secret_key | "your_spaces_secret_key_here" | Secret key for DigitalOcean Spaces. |
| spaces_bucket_name | "airflow-bucket" | Name of the Spaces bucket. |
| spaces_host | "https://sfo3.digitaloceanspaces.com" | Host URL for DigitalOcean Spaces. Find the region-specific host URL here. |
| droplet_name | "airflow-droplet" | Name of the Airflow droplet. |
| droplet_size_slug | "s-4vcpu-8gb" | Size slug for the Airflow droplet. See sizes for available sizes. |
| db_node_count | 1 | Number of nodes in the database cluster. |
| db_cluster_name | "airflow-stack-db-cluster" | Name of the database cluster. |
| db_size_slug | "db-s-1vcpu-2gb" | Size slug for the database cluster. See sizes for available sizes. |
| keystore_node_count | 1 | Number of nodes in the keystore cluster. |
| keystore_name | "airflow-stack-kv-cluster" | Name of the keystore cluster. |
| keystore_size_slug | "db-s-1vcpu-2gb" | Size slug for the keystore cluster. See sizes for available sizes. |
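Any of these variables can also be overridden on the command line when you run terraform apply (covered below) instead of editing variables.tf; for example, with placeholder values:

$ terraform apply -var="region=fra1" -var="droplet_size_slug=s-2vcpu-4gb"   # override the region and Droplet size for this run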
Initialize the Terraform project by running:
$ terraform init
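Optionally, preview the resources Terraform will create before applying:

$ terraform plan   # shows the planned changes without creating anything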
Finally, after the project is initialized, run terraform apply to deploy the blueprint:
$ terraform apply
It can take a few minutes to spin up the Droplets, and some blueprints require extra time after creation to finish the configuration.
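Once the apply finishes, you can look up the Droplet's public IPv4 address with doctl if you don't have it handy (assuming doctl is installed and authenticated):

$ doctl compute droplet list --format Name,PublicIPv4   # lists Droplet names and their public IPv4 addresses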
After the stack is deployed, you can access the Airflow dashboard at http://your_droplet_public_ipv4. You should see the login screen:
After you log in, you will have access to the Airflow dashboard!
There are two example DAGs preinstalled to test connectivity with the Spaces bucket used for remote logging and with Redis.
To view the connection details, go to the Connections option under Admin.
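You can also inspect the same connections from a shell on the Droplet with the Airflow CLI (assuming the airflow command is on the PATH; if the blueprint runs Airflow in containers, run this inside the webserver or scheduler container instead):

$ airflow connections list   # prints the configured connections, including the PostgreSQL, Redis, and Spaces entries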
Remote task logs are written to the Spaces bucket at https://<bucket-name>.<region>.digitaloceanspaces.com/logs/.

Certbot is preinstalled; run it to configure HTTPS, as sketched below. To make your Airflow Droplet more secure, please refer to the Airflow Docs.
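A minimal Certbot sketch, assuming you have pointed a DNS record at the Droplet (airflow.example.com is a placeholder) and that nginx fronts the Airflow webserver; if it doesn't, use standalone mode instead:

$ sudo certbot --nginx -d airflow.example.com                  # obtains a certificate and updates the nginx config
$ sudo certbot certonly --standalone -d airflow.example.com    # alternative: standalone mode, requires port 80 to be free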
This guide should help you get started with deploying and configuring the Apache Airflow Terraform stack on DigitalOcean.