Index and Tune Vector Search on PostgreSQL

Validated on 27 Apr 2026 • Last edited on 27 Apr 2026

DigitalOcean Managed PostgreSQL for vector search uses the same managed PostgreSQL engine available under Managed Databases, with the pgvector and pgvectorscale extensions for storing and querying vector embeddings alongside relational data.

Once you have embeddings in a table, the next job is to make similarity search fast and accurate. This guide covers pgvector distance operators, when to use HNSW versus IVFFlat, how to set index and query parameters, and how to combine similarity search with ordinary SQL filters.

Distance Operators

pgvector exposes four distance operators. Pick the one that matches how your embedding model was trained, and be consistent: the operator, the index operator class, and any ORDER BY clauses must all agree.

  • Cosine distance: <=>. The default for most modern embedding models (OpenAI, Cohere, BGE, sentence-transformers). Scores are in [0, 2]; smaller is more similar.
  • Negative inner product: <#>. Fastest to compute; for unit-length (normalized) vectors it ranks results identically to cosine distance. pgvector returns the negative so that ascending ORDER BY still puts the most similar rows first.
  • L2 distance: <->. Euclidean distance. A good default when you are unsure; used by some image models.
  • L1 distance: <+>. Manhattan distance. Rare.
For example, a top-5 cosine search that also reports similarity (1 minus cosine distance):

SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM   documents
ORDER  BY embedding <=> $1
LIMIT  5;

Choosing an Index Type

By default, a query with no index runs an exact nearest-neighbor scan: every row is compared against the query vector. That is fine for a few thousand rows but becomes slow as the table grows.
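To confirm which plan a query gets, inspect it with EXPLAIN. The subquery below is only a convenient way to pull a query vector of the right dimension from an existing row:

EXPLAIN
SELECT id
FROM   documents
ORDER  BY embedding <=> (SELECT embedding FROM documents LIMIT 1)
LIMIT  5;
-- "Seq Scan on documents" means every row is compared to the query vector.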

Approximate nearest-neighbor (ANN) indexes trade a small amount of recall for a large speedup. pgvector ships two index types: HNSW and IVFFlat. Choose one per embedding column based on the characteristics below.

  • HNSW: Higher recall and faster queries; slower to build; uses more memory. The best default for most workloads up to tens of millions of rows.
  • IVFFlat: Faster to build; uses less memory; lower recall unless you raise probes. Good when ingest rate is high and you rebuild indexes often.

For IVFFlat, always populate the table before creating the index. IVFFlat picks list centroids from the data present at build time. HNSW has no training step, but bulk-loading first and then indexing is still much faster than inserting into an existing HNSW index.
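A sketch of that order of operations, assuming the documents table from the examples and a placeholder CSV file; raising maintenance_work_mem (within your plan's RAM) speeds up the build:

-- Bulk-load first. \copy runs client-side in psql, which suits managed databases.
\copy documents (id, content, embedding) FROM 'embeddings.csv' WITH (FORMAT csv)

-- Give the build more memory, then index the populated table.
SET maintenance_work_mem = '2GB';
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);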

HNSW Index

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph of vectors. Queries traverse the graph instead of scanning the table.

Creation

CREATE INDEX documents_embedding_hnsw
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

Swap the operator class to match your distance metric:

  • vector_cosine_ops pairs with <=>
  • vector_ip_ops pairs with <#>
  • vector_l2_ops pairs with <->
  • vector_l1_ops pairs with <+> (pgvector 0.7.0 and later)

Key Parameters

  • m: Maximum connections per node. Default 16. Higher values increase recall and memory use. Typical values run from 8 to 64.
  • ef_construction: Size of the dynamic candidate list during the index build. Default 64. Higher values build a better graph at the cost of build time. Typical values run from 64 to 200.
  • ef_search: Size of the dynamic candidate list at query time. Set it per session to trade latency for recall:
SET hnsw.ef_search = 100;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

HNSW indexes perform best when fully resident in memory. A 1M-row index on 1,536-dimensional vectors is roughly 6 GB on disk and needs a comparable amount of RAM to stay resident. Size your plan accordingly.
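To check how much memory an index wants, look at its on-disk size (the index name is from the example above):

SELECT pg_size_pretty(pg_relation_size('documents_embedding_hnsw'));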

IVFFlat Index

IVFFlat (Inverted File with Flat quantization) partitions the vector space into a fixed number of Voronoi cells, set by the lists parameter. Queries scan only the probes cells nearest the query vector.

Creation

-- Rule of thumb: lists = rows / 1000 for < 1M rows,
--                lists = sqrt(rows) for >= 1M rows
CREATE INDEX documents_embedding_ivfflat
    ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

Key Parameters

  • lists: Number of partitions. Higher gives more precise partitioning, but each list holds fewer vectors.
  • probes: Number of lists to scan at query time. Default 1. Raising probes increases recall at the cost of latency. A reasonable starting point is sqrt(lists).
SET ivfflat.probes = 10;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

IVFFlat uses the data in the table at the time the index is created to pick list centroids. If your data distribution shifts (for example, you switch embedding models), rebuild the index.
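On PostgreSQL 12 and later, you can rebuild in place without blocking reads and writes:

REINDEX INDEX CONCURRENTLY documents_embedding_ivfflat;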

Query Patterns and Filtering

The power of pgvector comes from being able to combine similarity search with ordinary SQL. The planner can use your B-tree, GIN, or partial indexes together with the vector index.

Top-k with a Filter

SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM   documents
WHERE  tenant_id = $2
  AND  created_at > now() - interval '30 days'
ORDER  BY embedding <=> $1
LIMIT  10;
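The filter columns come from the example above; if they exist in your schema, a plain B-tree index gives the planner the option of filtering first and sorting the smaller result exactly, which often wins when the filter is selective:

CREATE INDEX documents_tenant_created_idx
    ON documents (tenant_id, created_at);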

Partial Index for a Hot Tenant or Segment

CREATE INDEX documents_tenant_hot_hnsw
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WHERE tenant_id = '11111111-1111-1111-1111-111111111111';

Partial indexes are a good fit for multi-tenant systems where one tenant is disproportionately large. The partial index is smaller, faster, and only maintained for rows matching the predicate.
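The planner uses a partial index only when it can prove the query's WHERE clause implies the index predicate, so filter on the same literal value:

SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM   documents
WHERE  tenant_id = '11111111-1111-1111-1111-111111111111'
ORDER  BY embedding <=> $1
LIMIT  10;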

Rerank with Inner Product After an ANN Recall

If you need maximum precision, first fetch a wider candidate set from the ANN index, then rerank it with a different scoring function. Because <#> returns the negative inner product, ascending order still puts the most similar rows first:

WITH candidates AS (
    SELECT id, content, embedding
    FROM   documents
    ORDER  BY embedding <=> $1
    LIMIT  50
)
SELECT id, content, (embedding <#> $1) AS score
FROM   candidates
ORDER  BY embedding <#> $1
LIMIT  10;

Memory, Maintenance, and Operations

  • VACUUM and ANALYZE: Run after large inserts so the planner has accurate statistics. See the commands after this list.
  • REINDEX: Rebuild the index if you changed embedding models, shifted distributions, or see recall degrade.
  • Resize: Vector indexes are RAM-hungry. If searches start spilling to disk, move to a larger plan with more RAM.
  • Read-only nodes: Add a read-only node and route heavy search traffic there to protect writes on the primary.
  • Backups: Vectors are ordinary columns, so daily backups and point-in-time recovery cover them automatically.
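The first two items map to ordinary commands; the index name is the one from the HNSW example:

VACUUM ANALYZE documents;  -- reclaim dead rows and refresh planner statistics
REINDEX INDEX CONCURRENTLY documents_embedding_hnsw;  -- rebuild without blocking reads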

Troubleshooting Slow Queries

Work through the checklist below in order. In most cases the issue is one of the first three.

  1. Confirm the index is being used. Run EXPLAIN (ANALYZE, BUFFERS), as in the example after this list. If the plan shows a Seq Scan instead of a scan of the vector index, the planner has rejected the index (usually because the operator class does not match the ORDER BY operator).
  2. Match the operator class and operator. vector_cosine_ops works only with <=>, and so on.
  3. Raise ef_search or probes. If recall is low, increase these per session and re-measure.
  4. Check memory pressure. On a too-small plan the index no longer fits in memory and reads hit disk. Upgrade the plan or reduce index size by moving to halfvec (sketch after this list) or a lower-dimensional embedding model.
  5. Reduce dimensions. 1,536-dim embeddings are often overkill. 768- or 384-dim models cut index size and speed up queries noticeably.
  6. Consider pgvectorscale. For corpora that exceed RAM by a lot, the StreamingDiskANN index is designed for disk-resident vector search. See Advanced Vector Workloads.
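For step 1, a minimal plan check. The subquery is only a convenient way to pull a query vector of the right dimension from an existing row:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM   documents
ORDER  BY embedding <=> (SELECT embedding FROM documents LIMIT 1)
LIMIT  10;
-- Expect an index scan that names your vector index; a Seq Scan means
-- the planner rejected it.

For step 4, a sketch of the halfvec route (pgvector 0.7.0 and later). The column stays full precision; only the index stores half-precision values, roughly halving its size. The index name and the 1,536-dimension figure are assumptions:

CREATE INDEX documents_embedding_halfvec_hnsw
    ON documents
    USING hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops);

-- Queries must apply the same cast for the planner to use this index.
SELECT id
FROM   documents
ORDER  BY embedding::halfvec(1536) <=> $1::halfvec(1536)
LIMIT  10;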
