Vector Search Quickstart for OpenSearch
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
DigitalOcean Managed OpenSearch for vector search uses the same managed OpenSearch engine available under Managed Databases. It bundles the k-NN, ML Commons, and Neural Search plugins for vector similarity search, hybrid vector and keyword search, and remote embedding models.
This quickstart walks you through creating a DigitalOcean Managed OpenSearch cluster, configuring it as a vector store, indexing a handful of sample embeddings, and running a k-Nearest Neighbor (k-NN) similarity query. It takes about 15 minutes.
These plugins come preinstalled on DigitalOcean managed OpenSearch 2.19 clusters, so you can create a k-NN index and run similarity queries as soon as the cluster is online.
Prerequisites
To complete this quickstart, you need:
- A DigitalOcean account. You provision the cluster from the Control Panel.
- A terminal with `curl` installed, or Python 3.9 or later with `opensearch-py` (>=2.4.0). Both approaches are shown below.
If you already have a managed OpenSearch cluster running version 2.14 or later, you can skip Step 1 and start at Step 2 by pointing the requests in this guide at your existing cluster.
Step 1: Create an OpenSearch Vector Database
- In the Control Panel, click Create, then Vector Database.
- Select OpenSearch 2.19 as the engine.
- For this quickstart, the default Basic Shared CPU plan (1 vCPU, 2 GB RAM, 40 GiB disk) is enough.
- Choose the region closest to your application and name the cluster.
- Click Create Vector Database Cluster.
For production vector workloads, size for RAM: OpenSearch holds the HNSW graph in memory. See Create a Cluster for full sizing guidance.
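For a back-of-the-envelope estimate, the OpenSearch k-NN documentation approximates HNSW graph memory for its native engines at roughly 1.1 × (4 × dimension + 8 × m) bytes per vector; the exact figure varies by engine and encoding, so treat the sketch below as a rough guide rather than a sizing guarantee:

```python
def hnsw_memory_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    # Approximation from the OpenSearch k-NN docs for HNSW graphs:
    # about 1.1 * (4 * dimension + 8 * m) bytes per vector.
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

# One million 768-dimensional vectors with m=16 need roughly 3.5 GB of RAM.
print(f"{hnsw_memory_bytes(1_000_000, 768) / 1e9:.2f} GB")
```

If the estimate approaches your plan's RAM, move up a plan size before indexing rather than after.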
Step 2: Secure the Cluster and Collect Connection Details
While the cluster provisions, open its Overview tab:
- Under Trusted Sources, add your workstation IP or a DigitalOcean resource. Only listed sources can connect.
- Copy the host, port, and `doadmin` password.
- Export them as environment variables:

```bash
export OPENSEARCH_HOST="<your-cluster-host>"
export OPENSEARCH_PORT="25060"
export OPENSEARCH_USER="doadmin"
export OPENSEARCH_PASSWORD="<your-doadmin-password>"
export OS="https://$OPENSEARCH_USER:$OPENSEARCH_PASSWORD@$OPENSEARCH_HOST:$OPENSEARCH_PORT"
```

- Verify connectivity:

```bash
curl -sS "$OS/" | jq '.version.number'
```

You should see `"2.19.x"`.
Step 3: Create a k-NN Index
Create an index that stores 4-dimensional vectors. In production you typically use 384, 768, 1024, or 1536 dimensions depending on your embedding model.
```bash
curl -X PUT "$OS/articles" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "body": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 4,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "cosinesimil",
          "parameters": { "m": 16, "ef_construction": 128 }
        }
      }
    }
  }
}'
```

- `"knn": true` enables the k-NN plugin for this index.
- `"type": "knn_vector"` declares the vector field; `dimension` must match your embedding model.
- `"engine": "lucene"` uses OpenSearch's native HNSW implementation, which supports efficient filtered search. Use Faiss for very large indexes (greater than 10 million vectors).
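If you would rather create the index from Python, the same request body can be built as a plain dict and passed to `opensearch-py`'s `client.indices.create`. This sketch mirrors the curl request above; the helper name `knn_index_body` is invented here for illustration:

```python
def knn_index_body(dimension: int, m: int = 16, ef_construction: int = 128) -> dict:
    # Hypothetical helper mirroring the curl request body: a k-NN-enabled
    # index with one Lucene HNSW vector field plus two text fields.
    return {
        "settings": {"index": {"knn": True, "knn.algo_param.ef_search": 100}},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "body": {"type": "text"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                },
            }
        },
    }

# With a connected client (see the optional Python section below):
# client.indices.create(index="articles", body=knn_index_body(4))
```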
Step 4: Index Sample Vectors
Load four tiny documents with pre-computed embeddings.
```bash
curl -X POST "$OS/articles/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_id": "1" } }
{ "title": "Coffee brewing basics", "body": "Pour-over, espresso, and cold brew compared.", "embedding": [0.91, 0.10, 0.05, 0.02] }
{ "index": { "_id": "2" } }
{ "title": "Best espresso machines", "body": "A buyer guide for home espresso setups.", "embedding": [0.88, 0.15, 0.07, 0.04] }
{ "index": { "_id": "3" } }
{ "title": "Intro to deep learning", "body": "Neural networks, backpropagation, activations.", "embedding": [0.05, 0.92, 0.18, 0.10] }
{ "index": { "_id": "4" } }
{ "title": "Hiking trails near Denver", "body": "Five scenic day hikes within an hour of the city.", "embedding": [0.12, 0.08, 0.90, 0.22] }
'
```

OpenSearch responds with `"errors": false` when the bulk request succeeds.
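When your embeddings come from code rather than a hand-written file, you can assemble the same NDJSON payload in Python. `to_bulk_ndjson` below is a hypothetical helper; the string it returns is what the `_bulk` endpoint expects, and you can POST it with any HTTP client:

```python
import json

def to_bulk_ndjson(docs: list[tuple[str, dict]]) -> str:
    # The _bulk endpoint expects alternating lines: an action-metadata
    # line, then the document source, ending with a trailing newline.
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

payload = to_bulk_ndjson([
    ("1", {"title": "Coffee brewing basics", "embedding": [0.91, 0.10, 0.05, 0.02]}),
])
# POST payload to "$OS/articles/_bulk" with Content-Type: application/x-ndjson.
```

In real ingestion pipelines, `opensearch-py`'s bulk helpers can do this interleaving for you.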
Step 5: Run Your First k-NN Query
Find the two documents closest to a query vector that looks like a coffee-related embedding:
```bash
curl -X POST "$OS/articles/_search" -H 'Content-Type: application/json' -d '{
  "size": 2,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.90, 0.12, 0.06, 0.03],
        "k": 2
      }
    }
  }
}'
```

You should see the two coffee articles ranked highest, with `_score` values close to 1.0. OpenSearch normalizes cosine similarity so that higher scores are better.
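Because the index uses the Lucene engine, you can also restrict the nearest-neighbor search with a metadata filter. The sketch below builds such a query body; the `filtered_knn_query` helper and the `match` filter on `body` are illustrative choices, not part of this quickstart's data model:

```python
def filtered_knn_query(vector: list[float], k: int, filter_clause: dict) -> dict:
    # A k-NN search limited to documents matching the filter; with the
    # Lucene engine the filter is applied during graph traversal rather
    # than as a post-filter on the top-k results.
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": vector,
                    "k": k,
                    "filter": filter_clause,
                }
            }
        },
    }

# Example: only consider articles whose body mentions espresso.
query = filtered_knn_query([0.90, 0.12, 0.06, 0.03], 2, {"match": {"body": "espresso"}})
```

Filtered k-NN is covered in more depth in Index and Query Vectors, linked under Next Steps.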
Optional: The Same Query in Python
```python
import os

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{
        "host": os.environ["OPENSEARCH_HOST"],
        "port": int(os.environ.get("OPENSEARCH_PORT", 25060)),
    }],
    http_auth=(os.environ["OPENSEARCH_USER"], os.environ["OPENSEARCH_PASSWORD"]),
    use_ssl=True,
    verify_certs=True,
)

resp = client.search(
    index="articles",
    body={
        "size": 2,
        "query": {
            "knn": {
                "embedding": {
                    "vector": [0.90, 0.12, 0.06, 0.03],
                    "k": 2,
                }
            }
        },
    },
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```

Next Steps
- Create a k-NN Index: tune engines, space types, and HNSW parameters for your workload.
- Index and Query Vectors: bulk ingestion, filtered k-NN, and exact search.
- Run Hybrid Searches: combine BM25 with vector similarity.
- Register a Remote Embedding Model: let OpenSearch call your embedding service directly.
For upstream OpenSearch vector documentation, see the official OpenSearch vector-search docs.