Run Hybrid (Vector plus Keyword) Searches in OpenSearch
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
DigitalOcean Managed OpenSearch for vector search uses the same managed OpenSearch engine available under Managed Databases. It bundles the k-NN, ML Commons, and Neural Search plugins for vector similarity search, hybrid vector and keyword search, and remote embedding models.
Pure vector search is good at semantic matches and weak at exact matches. It finds conceptually similar text but can miss specific product codes, names, or keywords. Pure BM25 is the opposite. Hybrid search runs both queries and combines the scores.
OpenSearch 2.19 implements hybrid search as a compound query (hybrid) plus a search pipeline that normalizes and combines sub-query scores.
Prerequisites
- A k-NN index with searchable text fields. See Create a k-NN Index.
- Documents already indexed. See Index and Query Vectors.
- OpenSearch 2.10 or later. The hybrid query and normalization processor were promoted to GA in 2.10 and ship with DigitalOcean Managed OpenSearch 2.19.
Step 1: Create a Search Pipeline
A search pipeline applies processors to search requests and responses. For hybrid search, use the normalization-processor, a phase-results processor that rescales each sub-query’s scores so BM25’s unbounded scores and k-NN’s [0,1] similarity scores can be combined.
curl -X PUT "$OS/_search/pipeline/hybrid-search-pipeline" \
-H 'Content-Type: application/json' -d '{
"description": "Normalize and combine hybrid search sub-query scores",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": { "technique": "min_max" },
"combination": {
"technique": "arithmetic_mean",
"parameters": { "weights": [0.3, 0.7] }
}
}
}
]
}'

The weights array has one weight per sub-query in the hybrid query, in order. This example assumes two sub-queries and weights the second (k-NN) more heavily than the first (BM25).
Normalization Techniques
| Technique | Behavior |
|---|---|
| min_max (recommended) | Rescales each sub-query’s scores to [0, 1] based on that response’s highest and lowest scores. |
| l2 | L2-normalizes the score vector. Use only if you have benchmarked it against min_max and it wins. |
| z_score | Standardizes each sub-query’s scores by mean and standard deviation. More sensitive to outliers. |
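To make the difference concrete, here is a minimal Python sketch applying min_max and z_score to the same raw BM25 scores. This is illustrative only, not OpenSearch’s internal code:

```python
# Illustrative sketch of two normalization techniques applied to one
# sub-query's raw BM25 scores. Not OpenSearch's internal implementation.
from statistics import mean, stdev

scores = [12.4, 7.1, 3.3, 2.9]  # raw BM25 scores for four hits

def min_max(xs):
    """Rescale to [0, 1] using the response's highest and lowest scores."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Standardize by mean and standard deviation; output is unbounded."""
    mu, sigma = mean(xs), stdev(xs)
    return [(x - mu) / sigma for x in xs]

print(min_max(scores))  # every value lands in [0, 1]
print(z_score(scores))  # values can be negative or exceed 1
```

min_max keeps every sub-query on the same [0, 1] scale, which is why it is the recommended default; z_score output is unbounded, and a single outlier shifts every other score.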
Combination Techniques
| Technique | Behavior |
|---|---|
| arithmetic_mean (recommended) | Weighted average of normalized scores. Fast, predictable, and supports per-query weights. |
| geometric_mean | Weighted geometric mean. Penalizes documents that score low on any sub-query. |
| harmonic_mean | Emphasizes the lower-scoring sub-query more than geometric_mean. Rarely the best choice. |
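The practical difference shows up on documents with imbalanced sub-query scores. A short Python sketch, with equal weights assumed for brevity (the pipeline above uses [0.3, 0.7]):

```python
# Sketch of the three combination techniques applied to one document's
# already-normalized sub-query scores, showing how each treats an
# imbalanced document. Equal weights assumed for brevity.
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

balanced = [0.6, 0.6]  # similar BM25 and k-NN scores
lopsided = [1.0, 0.2]  # strong BM25 hit, weak vector match

for combine in (arithmetic_mean, geometric_mean, harmonic_mean):
    print(combine.__name__, combine(balanced), combine(lopsided))
```

arithmetic_mean ranks both documents equally (0.6 each); geometric_mean and harmonic_mean push the lopsided document down, with harmonic_mean penalizing it hardest.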
Step 2: Run a Hybrid Query
Attach the pipeline with the search_pipeline query-string parameter, then use the hybrid compound query:
curl -X POST "$OS/documents/_search?search_pipeline=hybrid-search-pipeline" \
-H 'Content-Type: application/json' -d '{
"size": 10,
"_source": ["title", "source"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"body": "opensearch vector search"
}
},
{
"knn": {
"embedding": {
"vector": [0.013, -0.041, "..."],
"k": 10
}
}
}
]
}
}
}'

Each hit’s _score is the weighted, normalized combination of the BM25 and k-NN scores. Documents appear at most once.
The hybrid query supports up to five sub-queries. Beyond two (BM25 plus k-NN), common additions are a match_phrase for exact phrase boosting or a second knn against a different embedding field.
Step 3: Set the Pipeline as the Index Default
To make hybrid search the default for an index, attach the pipeline so clients do not have to pass the parameter:
curl -X PUT "$OS/documents/_settings" -H 'Content-Type: application/json' -d '{
"index.search.default_pipeline": "hybrid-search-pipeline"
}'

Tune the Balance Between BM25 and Vector
Two knobs to experiment with:
- Weights. Start at [0.5, 0.5]. Increase the vector weight when queries are predominantly natural language. Increase BM25 when queries are short or keyword-heavy.
- k. The number of candidates the k-NN sub-query retrieves before combining. Set k to 3 * size or higher so the normalization step has enough vector candidates to rank against BM25 hits.
The only reliable way to tune weights is to build a labeled test set (query to relevant document IDs), run searches across a grid of weight combinations, and measure nDCG@10 or recall@10.
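A minimal sketch of that tuning loop. Here run_hybrid_search is a hypothetical helper, not part of any client library: it would re-create the pipeline with the candidate weights, run each query, and return ranked document IDs. Only nDCG@10 is computed:

```python
# Hedged sketch of a weight-tuning grid search: evaluate each candidate
# [bm25_weight, vector_weight] pair against a labeled query set and keep
# the pair with the best mean nDCG@10. `run_hybrid_search` is a
# hypothetical stand-in for a real search client call.
import math
from statistics import mean

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: 1.0 means every relevant doc ranked first."""
    dcg = sum(1 / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    ideal = sum(1 / math.log2(rank + 2)
                for rank in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal else 0.0

def tune_weights(labeled_queries, run_hybrid_search):
    """labeled_queries: dict of query text -> set of relevant doc IDs.
    Returns the (weights, mean nDCG@10) pair that scored best."""
    best_weights, best_score = None, -1.0
    for i in range(11):  # BM25 weight from 0.0 to 1.0 in 0.1 steps
        weights = [round(i / 10, 1), round(1 - i / 10, 1)]
        score = mean(
            ndcg_at_k(run_hybrid_search(q, weights), relevant)
            for q, relevant in labeled_queries.items()
        )
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

In practice, run_hybrid_search would update the pipeline (PUT the pipeline body with the new weights) before each grid point, since the weights live in the pipeline rather than in the query.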
Debug Hybrid Scores
OpenSearch 2.19 added the hybrid_score_explanation response processor, which shows exactly what each sub-query contributed to a hit’s final score.
curl -X PUT "$OS/_search/pipeline/hybrid-debug-pipeline" \
-H 'Content-Type: application/json' -d '{
"phase_results_processors": [
{ "normalization-processor": { "normalization": { "technique": "min_max" },
"combination": { "technique": "arithmetic_mean" } } }
],
"response_processors": [
{ "hybrid_score_explanation": {} }
]
}'

Add "explain": true to your search body. OpenSearch returns the normalized score for each sub-query, the combination weight, and the final combined score for every hit. See the upstream hybrid search explain docs.
Next Steps
- Register a Remote Embedding Model: let OpenSearch generate the query vector from raw text using a neural sub-query inside a hybrid query.