Run Hybrid (Vector plus Keyword) Searches in OpenSearch
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
DigitalOcean Managed OpenSearch for vector search uses the same managed OpenSearch engine available under Managed Databases. It bundles the k-NN, ML Commons, and Neural Search plugins for vector similarity search, hybrid vector and keyword search, and remote embedding models.
Pure vector search is good at semantic matches and weak at exact matches. It finds conceptually similar text but can miss specific product codes, names, or keywords. Pure BM25 is the opposite. Hybrid search runs both queries and combines the scores.
OpenSearch 2.19 implements hybrid search as a compound query (hybrid) plus a search pipeline that normalizes and combines sub-query scores.
Prerequisites
- A k-NN index with searchable text fields. See Create a k-NN Index.
- Documents already indexed. See Index and Query Vectors.
- OpenSearch 2.10 or later. The hybrid query and normalization processor were promoted to GA in 2.10 and ship with DigitalOcean Managed OpenSearch 2.19.
Step 1: Create a Search Pipeline
A search pipeline applies processors to search requests and responses. For hybrid search, use the normalization-processor, a phase-results processor that rescales each sub-query’s scores so BM25’s unbounded scores and k-NN’s [0,1] similarity scores can be combined.
curl -X PUT "$OS/_search/pipeline/hybrid-search-pipeline" \
-H 'Content-Type: application/json' -d '{
"description": "Normalize and combine hybrid search sub-query scores",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": { "technique": "min_max" },
"combination": {
"technique": "arithmetic_mean",
"parameters": { "weights": [0.3, 0.7] }
}
}
}
]
}'

The weights array has one weight per sub-query in the hybrid query, in order. This example assumes two sub-queries and weights the second (k-NN) more heavily than the first (BM25).
Normalization Techniques
| Technique | Behavior |
|---|---|
| min_max (recommended) | Rescales each sub-query’s scores to [0, 1] based on that response’s highest and lowest scores. |
| l2 | L2-normalizes the score vector. Use only if you have benchmarked it against min_max and it wins. |
| z_score | Standardizes each sub-query’s scores by mean and standard deviation. More sensitive to outliers. |
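To make the difference concrete, here is a minimal Python sketch applying min_max and z_score to the same raw BM25 scores. This is illustrative only, not OpenSearch’s internal code:

```python
# Illustrative sketch of two normalization techniques applied to one
# sub-query's raw BM25 scores. Not OpenSearch's internal implementation.
from statistics import mean, stdev

scores = [12.4, 7.1, 3.3, 2.9]  # raw BM25 scores for four hits

def min_max(xs):
    """Rescale to [0, 1] using the response's highest and lowest scores."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Standardize by mean and standard deviation; output is unbounded."""
    mu, sigma = mean(xs), stdev(xs)
    return [(x - mu) / sigma for x in xs]

print(min_max(scores))  # every value lands in [0, 1]
print(z_score(scores))  # values can be negative or exceed 1
```

min_max keeps every sub-query on the same [0, 1] scale, which is why it is the recommended default; z_score output is unbounded, and a single outlier shifts every other score.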
Combination Techniques
| Technique | Behavior |
|---|---|
| arithmetic_mean (recommended) | Weighted average of normalized scores. Fast, predictable, and supports per-query weights. |
| geometric_mean | Weighted geometric mean. Penalizes documents that score low on any sub-query. |
| harmonic_mean | Emphasizes the lower-scoring sub-query more than geometric_mean. Rarely the best choice. |
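The practical difference shows up on documents with imbalanced sub-query scores. A short Python sketch, with equal weights assumed for brevity (the pipeline above uses [0.3, 0.7]):

```python
# Sketch of the three combination techniques applied to one document's
# already-normalized sub-query scores, showing how each treats an
# imbalanced document. Equal weights assumed for brevity.
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

balanced = [0.6, 0.6]  # similar BM25 and k-NN scores
lopsided = [1.0, 0.2]  # strong BM25 hit, weak vector match

for combine in (arithmetic_mean, geometric_mean, harmonic_mean):
    print(combine.__name__, combine(balanced), combine(lopsided))
```

arithmetic_mean ranks both documents equally (0.6 each); geometric_mean and harmonic_mean push the lopsided document down, with harmonic_mean penalizing it hardest.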
Step 2: Run a Hybrid Query
Attach the pipeline with the search_pipeline query-string parameter, then use the hybrid compound query:
curl -X POST "$OS/documents/_search?search_pipeline=hybrid-search-pipeline" \
-H 'Content-Type: application/json' -d '{
"size": 10,
"_source": ["title", "source"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"body": "opensearch vector search"
}
},
{
"knn": {
"embedding": {
"vector": [0.013, -0.041, "..."],
"k": 10
}
}
}
]
}
}
}'

Each hit’s _score is the weighted, normalized combination of the BM25 and k-NN scores. Documents appear at most once.
The hybrid query supports up to five sub-queries. Beyond two (BM25 plus k-NN), common additions are a match_phrase for exact phrase boosting or a second knn against a different embedding field.
Step 3: Set the Pipeline as the Index Default
To make hybrid search the default for an index, attach the pipeline so clients do not have to pass the parameter:
curl -X PUT "$OS/documents/_settings" -H 'Content-Type: application/json' -d '{
"index.search.default_pipeline": "hybrid-search-pipeline"
}'

Tune the Balance Between BM25 and Vector
Two knobs to experiment with:
- Weights. Start at [0.5, 0.5]. Increase the vector weight when queries are predominantly natural language. Increase BM25 when queries are short or keyword-heavy.
- k. The number of candidates the k-NN sub-query retrieves before combining. Set k to 3 * size or higher so the normalization step has enough vector candidates to rank against BM25 hits.
The only reliable way to tune weights is to build a labeled test set (query to relevant document IDs), run searches across a grid of weight combinations, and measure nDCG@10 or recall@10.
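A minimal sketch of that tuning loop. Here run_hybrid_search is a hypothetical helper, not part of any client library: it would re-create the pipeline with the candidate weights, run each query, and return ranked document IDs. Only nDCG@10 is computed:

```python
# Hedged sketch of a weight-tuning grid search: evaluate each candidate
# [bm25_weight, vector_weight] pair against a labeled query set and keep
# the pair with the best mean nDCG@10. `run_hybrid_search` is a
# hypothetical stand-in for a real search client call.
import math
from statistics import mean

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: 1.0 means every relevant doc ranked first."""
    dcg = sum(1 / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    ideal = sum(1 / math.log2(rank + 2)
                for rank in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal else 0.0

def tune_weights(labeled_queries, run_hybrid_search):
    """labeled_queries: dict of query text -> set of relevant doc IDs.
    Returns the (weights, mean nDCG@10) pair that scored best."""
    best_weights, best_score = None, -1.0
    for i in range(11):  # BM25 weight from 0.0 to 1.0 in 0.1 steps
        weights = [round(i / 10, 1), round(1 - i / 10, 1)]
        score = mean(
            ndcg_at_k(run_hybrid_search(q, weights), relevant)
            for q, relevant in labeled_queries.items()
        )
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

In practice, run_hybrid_search would update the pipeline (PUT the pipeline body with the new weights) before each grid point, since the weights live in the pipeline rather than in the query.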
Debug Hybrid Scores
OpenSearch 2.19 added the hybrid_score_explanation response processor, which shows exactly what each sub-query contributed to a hit’s final score.
curl -X PUT "$OS/_search/pipeline/hybrid-debug-pipeline" \
-H 'Content-Type: application/json' -d '{
"phase_results_processors": [
{ "normalization-processor": { "normalization": { "technique": "min_max" },
"combination": { "technique": "arithmetic_mean" } } }
],
"response_processors": [
{ "hybrid_score_explanation": {} }
]
}'

Add "explain": true to your search body. OpenSearch returns the normalized score for each sub-query, the combination weight, and the final combined score for every hit. See the upstream hybrid search explain docs.
Next Steps
- Register a Remote Embedding Model: let OpenSearch generate the query vector from raw text using a neural sub-query inside a hybrid query.