Ingest and Retrieve Data on Managed Weaviate

Validated on 27 Apr 2026 • Last edited on 27 Apr 2026

DigitalOcean Managed Weaviate is a fully managed Weaviate vector database for retrieval-augmented generation, semantic search, and similarity-based AI workloads. Clusters are provisioned, secured, backed up, and patched by DigitalOcean.

This guide covers the data plane of a Managed Weaviate cluster: defining a collection, loading objects with DigitalOcean Serverless Inference embeddings, and running hybrid, vector, and keyword searches.

For control plane operations (provisioning, credentials, resizing, backups), see DigitalOcean Managed Weaviate. Once your cluster is active and you have its endpoints and API token, continue here.

Overview

This guide configures Weaviate with a server-side vectorizer, so Weaviate calls DigitalOcean Serverless Inference on your behalf for every insert and every vector or hybrid query. Your application sends raw text and Weaviate handles the embedding round trip.

Managed Weaviate exposes the same data plane API as open-source Weaviate:

  • REST at /v1/schema, /v1/objects, and /v1/batch/* for schema and ingest.
  • GraphQL at /v1/graphql for search and aggregation.
  • gRPC on a separate -grpc hostname (used by the Weaviate SDKs).

Both the HTTP and gRPC endpoints run over TLS on port 443. The walkthrough below uses curl end to end so you can run it from a terminal with no SDK installed. The same operations work with the Weaviate Python, JavaScript and TypeScript, Go, and Java clients when you’re ready for production.

Prerequisites

  • An active Managed Weaviate cluster. See Provision and Connect to a Cluster.
  • A DigitalOcean AI Platform workspace with a Serverless Inference endpoint.
  • Your Weaviate HTTP and gRPC endpoints and API token from GET /v2/vector-databases/{id}/credentials.
  • Your DigitalOcean Inference base URL and API key.
  • curl, plus jq if you want to prettify JSON output.

Set Environment Variables

The examples below assume the following variables are exported:

export WEAVIATE_URL="my-vector-db-abc123.weaviate.digitalocean.com"
export WEAVIATE_GRPC_URL="my-vector-db-abc123-grpc.weaviate.digitalocean.com"
export WEAVIATE_HTTP_PORT=443
export WEAVIATE_GRPC_PORT=443
export WEAVIATE_API_KEY="<your-api-token-from-credentials>"

export DO_INFERENCE_URL="https://inference.do-ai.run/v1"
export DO_INFERENCE_API_KEY="<your-do-inference-api-key>"
export DO_EMBED_MODEL="gte-large-en-v1.5"

The HTTP and gRPC endpoints are separate hostnames. The gRPC hostname has a -grpc suffix. The walkthrough uses HTTP and GraphQL via WEAVIATE_URL. The gRPC variables are useful when you connect with an SDK later.
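
If you plan to use an SDK later, the same variables map directly onto the Python client's custom-connection helper. The following is a minimal sketch, assuming the v4 weaviate-client package is installed; the headers argument forwards your DigitalOcean Inference key the same way the curl examples below do with X-OpenAI-Api-Key.

# Minimal connection sketch with the Weaviate Python client (v4, assumed installed).
import os

import weaviate
from weaviate.auth import AuthApiKey

client = weaviate.connect_to_custom(
    http_host=os.environ["WEAVIATE_URL"],
    http_port=int(os.environ["WEAVIATE_HTTP_PORT"]),
    http_secure=True,
    grpc_host=os.environ["WEAVIATE_GRPC_URL"],
    grpc_port=int(os.environ["WEAVIATE_GRPC_PORT"]),
    grpc_secure=True,
    auth_credentials=AuthApiKey(os.environ["WEAVIATE_API_KEY"]),
    # Forwarded on every request so Weaviate can call DigitalOcean Inference.
    headers={"X-OpenAI-Api-Key": os.environ["DO_INFERENCE_API_KEY"]},
)
print(client.is_ready())  # True once the HTTP and gRPC endpoints respond
client.close()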

Step 1: Create a Collection

A collection is the top-level container for your objects, the rough equivalent of a table. The example below creates an Article collection with title, body, author, and tags. The vectorizer is set to text2vec-openai and pointed at DigitalOcean Serverless Inference, so Weaviate embeds properties on insert and queries on read. The index uses the cluster’s default RQ8 compression.

curl -X POST "https://$WEAVIATE_URL/v1/schema" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
      "text2vec-openai": {
        "model":   "gte-large-en-v1.5",
        "baseURL": "https://inference.do-ai.run",
        "vectorizeClassName": false
      }
    },
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
      "distance": "cosine",
      "rq": { "enabled": true, "bits": 8 }
    },
    "properties": [
      { "name": "title",  "dataType": ["text"] },
      { "name": "body",   "dataType": ["text"] },
      { "name": "author", "dataType": ["text"],   "indexFilterable": true,
        "moduleConfig": { "text2vec-openai": { "skip": true } } },
      { "name": "tags",   "dataType": ["text[]"], "indexFilterable": true,
        "moduleConfig": { "text2vec-openai": { "skip": true } } }
    ]
  }'

The baseURL in the schema is https://inference.do-ai.run with no /v1 suffix. Weaviate’s text2vec-openai module appends the OpenAI path itself when calling the endpoint. The application-side DO_INFERENCE_URL keeps the /v1 so direct curl calls to DigitalOcean Inference still work.

Parameters

  • class (required): Collection name. Must start with an uppercase letter.
  • vectorizer (required): "text2vec-openai" to have Weaviate embed via an OpenAI-compatible endpoint (DigitalOcean Serverless Inference here). Use "none" for bring-your-own vectors.
  • moduleConfig.text2vec-openai.model (required): The DigitalOcean-hosted embedding model to use. Bound to the collection at creation time.
  • moduleConfig.text2vec-openai.baseURL (required): The DigitalOcean Inference base URL, without the /v1 suffix.
  • moduleConfig.text2vec-openai.vectorizeClassName (optional): If true, the class name is included in the embedding input. Usually safer to set false.
  • vectorIndexType (optional): hnsw (default) or flat. Use hnsw for production.
  • vectorIndexConfig.distance (optional): cosine (default), l2-squared, dot, or hamming.
  • vectorIndexConfig.quantizer (optional): rq, pq, bq, or sq. The cluster default is rq at 8 bits.
  • properties[].indexFilterable (optional): Set true for properties you filter on. Builds an inverted index.
  • properties[].moduleConfig.text2vec-openai.skip (optional): Set true on properties you do not want included in the embedding (author and tags here, which are facets rather than search content).

The response echoes the schema with all defaults populated. Sample output:

{
  "class": "Article",
  "vectorizer": "text2vec-openai",
  "vectorIndexType": "hnsw",
  "moduleConfig": {
    "text2vec-openai": {
      "baseURL": "https://inference.do-ai.run",
      "model": "gte-large-en-v1.5",
      "vectorizeClassName": false
    }
  },
  "invertedIndexConfig": {
    "bm25": { "b": 0.75, "k1": 1.2 },
    "stopwords": { "preset": "en" }
  },
  "vectorIndexConfig": {
    "distance": "cosine",
    "rq": { "enabled": true, "bits": 8, "rescoreLimit": 20 }
  },
  "replicationConfig": { "factor": 3 },
  "shardingConfig": { "actualCount": 3 }
}

Use DigitalOcean Serverless Inference for Embeddings

DigitalOcean Serverless Inference is recommended for embeddings on Managed Weaviate:

  • Less application code: With the server-side vectorizer above, your application sends raw text on insert and raw text on search. Weaviate handles the embedding round trip and the failure and retry logic.
  • Single-vendor stack: Your vector database and your embedding model both live inside DigitalOcean. The network path between them stays inside the DigitalOcean backbone, which keeps embed-then-insert latency low at high throughput.
  • OpenAI-compatible API: The endpoint speaks the OpenAI /v1/embeddings contract, so Weaviate’s text2vec-openai module works against it unchanged. The same endpoint also works for any OpenAI SDK or curl recipe if you ever need to call it directly; see the sketch after this list.
  • Open-source embedding models: DigitalOcean hosts open-source models you would otherwise have to deploy yourself. See the AI Platform model catalog for the current list.
  • Pay-per-token billing: No minimum commitment.
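
Because the contract is OpenAI-compatible, you can sanity-check the endpoint directly with the OpenAI Python SDK. A hedged sketch, assuming the openai package is installed and the environment variables from earlier are exported:

# Direct embeddings call against DigitalOcean Serverless Inference.
import os

from openai import OpenAI

oai = OpenAI(
    base_url=os.environ["DO_INFERENCE_URL"],  # keeps the /v1 suffix
    api_key=os.environ["DO_INFERENCE_API_KEY"],
)
resp = oai.embeddings.create(
    model=os.environ["DO_EMBED_MODEL"],
    input=["Steeped for 12 hours at room temperature."],
)
print(len(resp.data[0].embedding))  # 1024 for gte-large-en-v1.5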

Embedding Models

A representative set of embedding models hosted on DigitalOcean AI Platform:

  • gte-large-en-v1.5 (1024 dimensions): General-purpose English text. Strong MTEB performance with an 8K-token context. A good default for English RAG.
  • Qwen3-Embedding-0.6B (1024 dimensions): Multilingual content (100+ languages) with flexible dimension sizing. Reach for this when you want to trade vector size against quality, or need strong multilingual and code retrieval.
  • all-MiniLM-L6-v2 (384 dimensions): Lightweight, fast English embeddings for short text (up to 256 tokens). Best when latency, storage, and cost matter more than peak accuracy, for example high-volume semantic search on snippets.
  • multi-qa-mpnet-base-dot-v1 (768 dimensions): English question answering and semantic search over short passages (up to 512 tokens). Tuned specifically for query-to-passage retrieval; uses dot-product similarity.
  • bge-m3 (1024 dense dimensions): Multilingual content (100+ languages) and long passages (up to 8K tokens). Reach for this when your corpus spans languages or contains long-form documents. Also supports sparse and ColBERT outputs for hybrid retrieval.
  • e5-large-v2 (1024 dimensions): High-recall English search. Requires query: and passage: prefixes, and input is capped at 512 tokens.

The catalog evolves. Always check the AI Platform model catalog for the current list and exact model IDs.

Switch Models

With a server-side vectorizer, the embedding model is bound to the collection at creation time. To switch models, create a new collection with the new model and re-ingest.

Vectors from different models are not interchangeable. Mixing models within a single collection is unsupported. Always create a new collection when changing the embedding model.
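
A re-ingest can be scripted against the same REST endpoints this guide uses. The sketch below is illustrative rather than a definitive migration tool: ArticleV2 is a hypothetical target collection you would first create with the new model (as in Step 1), and it assumes Weaviate's cursor pagination (the after parameter) on GET /v1/objects.

# Copy objects from Article into ArticleV2, letting the new collection's
# vectorizer re-embed each object with the new model.
import os

import requests

BASE = f"https://{os.environ['WEAVIATE_URL']}/v1"
HEADERS = {
    "Authorization": f"Bearer {os.environ['WEAVIATE_API_KEY']}",
    "X-OpenAI-Api-Key": os.environ["DO_INFERENCE_API_KEY"],
}

after = None
while True:
    params = {"class": "Article", "limit": 100}
    if after:
        params["after"] = after  # cursor: the last UUID seen
    page = requests.get(f"{BASE}/objects", headers=HEADERS, params=params)
    page.raise_for_status()
    objects = page.json().get("objects", [])
    if not objects:
        break
    batch = [{"class": "ArticleV2", "properties": o["properties"]} for o in objects]
    requests.post(f"{BASE}/batch/objects", headers=HEADERS,
                  json={"objects": batch}).raise_for_status()
    after = objects[-1]["id"]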

Step 2: Load Data

Because the collection has a server-side vectorizer, you do not embed in your application. You send the raw properties and Weaviate calls DigitalOcean Serverless Inference for you. The DigitalOcean Inference API key is forwarded as a per-request header (X-OpenAI-Api-Key) so the cluster can authenticate without storing the key.

Insert a Single Object

curl -X POST "https://$WEAVIATE_URL/v1/objects" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "properties": {
      "title":  "Cold brew coffee",
      "body":   "Steeped for 12 hours at room temperature.",
      "author": "A. Roaster",
      "tags":   ["coffee", "recipe"]
    }
  }'

Sample output:

{
  "class": "Article",
  "id": "c8bef156-691b-4782-a889-a8907a3a75e2",
  "creationTimeUnix": 1777246488487,
  "lastUpdateTimeUnix": 1777246488487,
  "properties": {
    "title":  "Cold brew coffee",
    "body":   "Steeped for 12 hours at room temperature.",
    "author": "A. Roaster",
    "tags":   ["coffee", "recipe"]
  },
  "vector": [0.002541669, -0.062028483, -0.6472548, "...1021 more dims..."]
}

Weaviate populates vector from DigitalOcean Inference automatically. The vector dimension matches the model (1024 for gte-large-en-v1.5).
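
For reference, the equivalent insert with the Python client is a single call once you have a connection (client from the sketch in Set Environment Variables). A hedged sketch; run it instead of the curl call, not in addition, or you will store the object twice:

# Same insert as the curl example above, via the v4 client.
articles = client.collections.get("Article")
new_id = articles.data.insert(
    properties={
        "title": "Cold brew coffee",
        "body": "Steeped for 12 hours at room temperature.",
        "author": "A. Roaster",
        "tags": ["coffee", "recipe"],
    }
)
print(new_id)  # Weaviate generates the UUID and fetches the vector server-side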

Insert in Batches

For ingest beyond a handful of objects, use the batch endpoint. The embedding round trip and the request overhead happen once per batch instead of once per object. The batch below loads the remaining sample articles used by the search steps later in this guide.

curl -X POST "https://$WEAVIATE_URL/v1/batch/objects" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "objects": [
      { "class": "Article", "properties": { "title": "Iced coffee",      "body": "Brewed hot, served on ice." } },
      { "class": "Article", "properties": { "title": "Espresso",         "body": "Pulled as a double shot." } },
      { "class": "Article", "properties": { "title": "French press",     "body": "Coarse grind, four minute steep, then plunge." } },
      { "class": "Article", "properties": { "title": "Aeropress recipe", "body": "One scoop, hot water just off boil, invert and press in 30s." } }
    ]
  }'

Sample output:

[
  {
    "class": "Article",
    "id": "87b122fb-f090-45d5-995f-8ef5242c665d",
    "properties": { "title": "Iced coffee", "body": "Brewed hot, served on ice.", "author": "B. Bean" },
    "vector": [-1.0839128, -0.92500794, -1.3542689, "...1021 more dims..."],
    "result": { "status": "SUCCESS" }
  },
  {
    "class": "Article",
    "properties": { "title": "Espresso", "body": "Pulled as a double shot.", "author": "C. Crema" },
    "vector": ["...1024 dims..."],
    "result": { "status": "SUCCESS" }
  },
  {
    "class": "Article",
    "properties": { "title": "French press", "body": "Coarse grind, four minute steep, then plunge.", "author": "D. Drip" },
    "vector": ["...1024 dims..."],
    "result": { "status": "SUCCESS" }
  },
  {
    "class": "Article",
    "properties": { "title": "Aeropress recipe", "body": "One scoop, hot water just off boil, invert and press in 30s.", "author": "A. Roaster" },
    "vector": ["...1024 dims..."],
    "result": { "status": "SUCCESS" }
  },
  {
    "class": "Article",
    "properties": { "title": "Pour over basics", "body": "Bloom for thirty seconds, then pour in slow concentric circles." },
    "vector": ["...1024 dims..."],
    "result": { "status": "SUCCESS" }
  },
  {
    "class": "Article",
    "properties": { "title": "Green tea", "body": "Steep at 80C for two minutes, no longer.", "author": "T. Leaf" },
    "vector": ["...1024 dims..."],
    "result": { "status": "SUCCESS" }
  }
]

For batch sizing, start with 50 to 100 objects per call for vectors up to 1024 dimensions. Drop to 25 to 50 for 1536+ dimensional vectors so the request body stays under the load balancer's size limit and the embedding round trip completes within its timeout.

If DigitalOcean Inference returns an error during ingest (rate limit, auth failure), Weaviate surfaces it per object in the batch response. Inspect the response and retry the failed objects.
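
Putting the last two paragraphs together, an ingest loop needs chunking plus per-object failure collection. A hedged Python sketch against the same batch endpoint; it relies on the batch response being a JSON array in input order, as in the sample output above, and my_articles is a hypothetical list of objects in the request format:

# Chunked ingest with one retry pass for objects that did not return SUCCESS.
import os
import time

import requests

BASE = f"https://{os.environ['WEAVIATE_URL']}/v1"
HEADERS = {
    "Authorization": f"Bearer {os.environ['WEAVIATE_API_KEY']}",
    "X-OpenAI-Api-Key": os.environ["DO_INFERENCE_API_KEY"],
}

def ingest(objects, batch_size=50):
    """POST objects in chunks; return the ones whose status is not SUCCESS."""
    failed = []
    for i in range(0, len(objects), batch_size):
        batch = objects[i : i + batch_size]
        resp = requests.post(f"{BASE}/batch/objects", headers=HEADERS,
                             json={"objects": batch})
        resp.raise_for_status()
        for obj, result in zip(batch, resp.json()):
            if result.get("result", {}).get("status") != "SUCCESS":
                failed.append(obj)  # rate-limited or rejected; try again later
    return failed

pending = ingest(my_articles)
if pending:
    time.sleep(5)  # flat backoff; anything still failing needs a closer look
    pending = ingest(pending)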

Step 3: Run a Hybrid Search

Hybrid search blends vector similarity and BM25 keyword scoring with a tunable alpha. Use it as the default for end-user search UIs because users mix semantic intent (“cold brew at home”) with exact-match terms (“Aeropress”).

With the server-side vectorizer, you only pass the text query. Weaviate embeds it via DigitalOcean Serverless Inference, runs both halves in parallel, and blends the scores.

curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(hybrid: {query: \"cold brew coffee\", alpha: 0.5, properties: [\"title^2\", \"body\"]}, limit: 10) { title body author _additional { score } } } }"
  }'

Parameters

  • query: The raw text query. Used for the BM25 side and embedded by Weaviate for the vector side.
  • alpha: 0.0 is pure BM25 (keyword only), 1.0 is pure vector, and 0.5 is a sensible default. Lower it toward 0 when users search by exact identifier; raise it toward 1 for long, conversational queries.
  • properties: Which properties to score against. Use ^N to boost: "title^2" weights title matches twice as much as body matches.
  • limit: Maximum number of objects to return.

Sample output:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Cold brew coffee",
          "body": "Steeped for 12 hours at room temperature.",
          "author": "A. Roaster",
          "_additional": { "score": "1" }
        },
        {
          "title": "Aeropress recipe",
          "body": "One scoop, hot water just off boil, invert and press in 30s.",
          "author": "A. Roaster",
          "_additional": { "score": "0.3498051" }
        },
        {
          "title": "Iced coffee",
          "body": "Brewed hot, served on ice.",
          "author": "B. Bean",
          "_additional": { "score": "0.34319013" }
        },
        {
          "title": "French press",
          "body": "Coarse grind, four minute steep, then plunge.",
          "author": "D. Drip",
          "_additional": { "score": "0.31527957" }
        },
        {
          "title": "Espresso",
          "body": "Pulled as a double shot.",
          "author": "C. Crema",
          "_additional": { "score": "0.12629698" }
        },
        {
          "title": "Green tea",
          "body": "Steep at 80C for two minutes, no longer.",
          "author": "T. Leaf",
          "_additional": { "score": "0" }
        }
      ]
    }
  }
}

The top hit (Cold brew coffee) scores 1 because both halves of the hybrid score, vector similarity and BM25 keyword overlap, rank it highest. Coffee-related results follow with intermediate scores, and off-topic content (Green tea) drops to 0.

Step 4: Run a Vector Search

For pure semantic search with no BM25 contribution, use nearText instead. Weaviate still embeds the query via DigitalOcean Inference for you, then ranks by vector distance only. This is useful when the user query is conversational and unlikely to share keywords with the indexed text.

curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(nearText: {concepts: [\"how do I make cold brew at home?\"]}, limit: 5) { title body _additional { distance } } } }"
  }'

Sample output:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Cold brew coffee",
          "body": "Steeped for 12 hours at room temperature.",
          "_additional": { "distance": 0.32995296 }
        },
        {
          "title": "Aeropress recipe",
          "body": "One scoop, hot water just off boil, invert and press in 30s.",
          "_additional": { "distance": 0.33600712 }
        },
        {
          "title": "French press",
          "body": "Coarse grind, four minute steep, then plunge.",
          "_additional": { "distance": 0.36295372 }
        },
        {
          "title": "Pour over basics",
          "body": "Bloom for thirty seconds, then pour in slow concentric circles.",
          "_additional": { "distance": 0.38054264 }
        }
      ]
    }
  }
}

Distance is lower-is-better (0 is identical). The nearText query returns coffee-related results even though the query string does not contain the word “coffee”. This is semantic matching rather than keyword matching. Use a distance threshold in your application code to filter out weak matches.
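
As a sketch of that thresholding with the Python client (v4 assumed; the 0.35 cutoff is illustrative and worth tuning against your own data):

# Keep only nearText hits closer than an application-chosen distance.
from weaviate.classes.query import MetadataQuery

articles = client.collections.get("Article")  # client from the earlier sketch
res = articles.query.near_text(
    query="how do I make cold brew at home?",
    limit=5,
    return_metadata=MetadataQuery(distance=True),
)
THRESHOLD = 0.35  # illustrative; tune per model and corpus
strong = [o for o in res.objects
          if o.metadata.distance is not None and o.metadata.distance < THRESHOLD]
for o in strong:
    print(o.properties["title"], o.metadata.distance)

The v4 client can also apply the cutoff server side by passing a distance argument to near_text.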

Step 5: Run a Keyword Search (BM25)

BM25 ignores vectors entirely and ranks on keyword overlap. Use it when your query is dominated by proper nouns, identifiers, or terms where semantic similarity hurts more than helps. Keyword search runs against the inverted index and does not require the X-OpenAI-Api-Key header. There is no embedding round trip.

curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(bm25: {query: \"cold brew\", properties: [\"title^2\", \"body\"]}, limit: 5) { title body author _additional { score } } } }"
  }'

Parameters

  • query: The keyword string. Tokenized and matched against the inverted index.
  • properties: Which properties to search. Use ^N to boost: "title^2" weights title matches twice as much as body matches.
  • limit: Maximum number of objects to return.

Sample output:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Cold brew coffee",
          "body": "Steeped for 12 hours at room temperature.",
          "author": "A. Roaster",
          "_additional": { "score": "1.6965694" }
        }
      ]
    }
  }
}

Only the article that literally contains the words cold and brew matches. Semantically related results (Aeropress, French press) are absent because they do not share the keywords.

To scope keyword search to a subset of objects, add a where filter. The filter below runs against the inverted index on author, which the schema marked indexFilterable:

curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
  -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(bm25: {query: \"cold brew\", properties: [\"title\", \"body\"]}, where: {path: [\"author\"], operator: Equal, valueText: \"A. Roaster\"}, limit: 5) { title body author _additional { score } } } }"
  }'

Sample output:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Cold brew coffee",
          "body": "Steeped for 12 hours at room temperature.",
          "author": "A. Roaster",
          "_additional": { "score": "0.8482847" }
        }
      ]
    }
  }
}

The BM25 score is lower than in the unfiltered query because the property boosts changed (title is no longer weighted ^2); the where filter itself simply restricts the result set to articles by A. Roaster.

Step 6: Rerank Candidates

Hybrid and vector search return a recall set: the top N candidates that look promising. For higher-stakes use cases like RAG context selection or end-user search UIs where the quality of the top three results matters most, run a second pass through a reranker to re-score the candidates with a more expensive, higher-precision cross-encoder.

DigitalOcean Serverless Inference exposes a /rerank endpoint that takes a query plus the candidate documents (as plain strings) and returns a re-ordered list of indices with relevance scores.

curl -X POST "$DO_INFERENCE_URL/rerank" \
  -H "Authorization: Bearer $DO_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "how do I make cold brew at home?",
    "documents": [
      "Cold brew coffee. Steeped for 12 hours at room temperature.",
      "Aeropress recipe. One scoop, hot water just off boil, invert and press in 30s.",
      "French press. Coarse grind, four minute steep, then plunge.",
      "Pour over basics. Bloom for thirty seconds, then pour in slow concentric circles."
    ]
  }'

Parameters

  • model (required): The reranker model. See the AI Platform model catalog.
  • query (required): The user’s query, identical to what you passed to hybrid or nearText.
  • documents (required): The candidate documents to re-score, as plain strings. Concatenate the searchable fields (for example, title + ". " + body) yourself before passing them in.

Sample output:

{
  "results": [
    { "index": 0, "relevance_score": -0.6698001623153687 },
    { "index": 3, "relevance_score": -5.365661144256592 },
    { "index": 1, "relevance_score": -6.568672180175781 },
    { "index": 2, "relevance_score": -6.795877456665039 }
  ],
  "usage": { "total_tokens": 129 }
}

The index field refers to the position of each document in your input array. Use it to map back to the original Weaviate objects (UUIDs, properties, and so on). Results are returned sorted by relevance_score, highest first.
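
An end-to-end sketch in Python, retrieving candidates with the v4 client and mapping the reranker's indices back to Weaviate objects (client is the connection from the earlier sketch; the query and model match the curl example above):

# Hybrid recall pass, then a rerank pass, then map indices back to objects.
import os

import requests

question = "how do I make cold brew at home?"
articles = client.collections.get("Article")
res = articles.query.hybrid(query=question, alpha=0.5, limit=10)
candidates = res.objects

docs = [f"{o.properties['title']}. {o.properties['body']}" for o in candidates]
rr = requests.post(
    f"{os.environ['DO_INFERENCE_URL']}/rerank",
    headers={"Authorization": f"Bearer {os.environ['DO_INFERENCE_API_KEY']}"},
    json={"model": "bge-reranker-v2-m3", "query": question, "documents": docs},
)
rr.raise_for_status()

# results arrive sorted by relevance_score; index points back into candidates
top3 = [candidates[r["index"]] for r in rr.json()["results"][:3]]
for o in top3:
    print(o.uuid, o.properties["title"])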

Pure-keyword (BM25) results do not usually benefit much from reranking. If BM25 found a literal match, that is already a strong signal. Save the rerank pass for hybrid and pure-vector results where the top of the list is a fuzzy neighborhood of similar items, and where the latency budget allows the extra inference call.

Best Practices

  • Skip non-search properties from the vectorizer: Set moduleConfig.text2vec-openai.skip: true on facet properties (author, tags, IDs) so they do not dilute the embedding. The example schema does this for author and tags.
  • Mark filter properties indexFilterable: true: Filters that hit the inverted index are far cheaper than ones that do not.
  • Cache hot query embeddings at the application layer: Server-side vectorization means every vector or hybrid query triggers a DigitalOcean Inference call. For repeat queries (autocomplete, popular searches), consider caching the response at the application layer; see the sketch after this list.
  • Plan for model immutability: The embedding model is fixed at collection creation. To change models, create a parallel collection, re-ingest, and use a collection alias for the cutover.
  • Start with hybrid at alpha = 0.5: Tune from there. Lower toward 0 when users search by exact identifier, raise toward 1 for long, conversational queries.
  • Use the SDKs for high-throughput production workloads: The Python, TypeScript, Go, and Java clients connect over gRPC at WEAVIATE_GRPC_URL:443, handle retries and back-pressure, and tune batch sizes automatically. The curl examples in this guide are useful for prototyping, scripting, and debugging.
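
As an illustration of the caching bullet above, a minimal in-process cache keyed on the query string (functools.lru_cache here; a shared store such as Redis fits multi-process apps better):

# Repeat queries skip the embedding round trip entirely.
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple[str, ...]:
    articles = client.collections.get("Article")  # client from the earlier sketch
    res = articles.query.hybrid(query=query, alpha=0.5, limit=10)
    return tuple(o.properties["title"] for o in res.objects)  # hashable for the cache

cached_search("cold brew coffee")  # triggers a DigitalOcean Inference call
cached_search("cold brew coffee")  # served from the cache

An lru_cache never invalidates on writes, so bound its size and accept briefly stale results, or use a TTL-based cache when the collection changes often.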

Next Steps

  • For control plane operations such as provisioning, credential rotation, resizing, and backups, see DigitalOcean Managed Weaviate.
  • When you move to production traffic, switch from curl to one of the Weaviate SDKs over gRPC, as described in Best Practices above.