Ingest and Retrieve Data on Managed Weaviate
Validated on 28 Apr 2026 • Last edited on 8 May 2026
DigitalOcean Managed Weaviate is a fully managed Weaviate vector database for retrieval-augmented generation, semantic search, and similarity-based AI workloads. Clusters are provisioned, secured, backed up, and patched by DigitalOcean.
Managed Weaviate’s data plane lets you define collections, ingest objects, and run hybrid, vector, and keyword searches using embeddings generated with DigitalOcean Serverless Inference.
For provisioning, credentials, scaling, and backups, see DigitalOcean Managed Weaviate.
Overview
You can configure Weaviate with a server-side vectorizer, so Weaviate calls DigitalOcean Serverless Inference on your behalf for every insert and every vector or hybrid query. Your application sends raw text and Weaviate manages the embedding process.
Managed Weaviate exposes the same data plane API as open-source Weaviate:
- REST: Endpoints at `/v1/schema`, `/v1/objects`, and `/v1/batch/*` for schema management and data ingestion.
- GraphQL: At `/v1/graphql` for search and aggregation queries.
- gRPC: On a separate `-grpc` hostname (used by Weaviate SDKs).
Both the HTTP and gRPC endpoints run over TLS on port 443. The sections below use curl for end-to-end examples so you can run everything from a terminal without installing an SDK. The same operations are supported in the Weaviate client libraries for Python, JavaScript and TypeScript, Go, and Java when you’re ready to move into production.
Prerequisites
- Managed Weaviate cluster: An `active` cluster. See How to Provision and Connect to a Cluster.
- AI Platform workspace: A DigitalOcean AI Platform workspace with a Serverless Inference endpoint configured.
- Weaviate endpoints and token: Your HTTP and gRPC endpoints, plus API token, from `GET /v2/vector-databases/{id}/credentials`.
- Inference credentials: Your DigitalOcean Inference base URL and API key.
- Tools: `curl`, and optionally `jq` for formatting JSON output.
Set Environment Variables
The examples below assume the following variables are exported:
export WEAVIATE_URL="my-vector-db-abc123.weaviate.digitalocean.com"
export WEAVIATE_GRPC_URL="my-vector-db-abc123-grpc.weaviate.digitalocean.com"
export WEAVIATE_HTTP_PORT=443
export WEAVIATE_GRPC_PORT=443
export WEAVIATE_API_KEY="<your-api-token-from-credentials>"
export DO_INFERENCE_URL="https://inference.do-ai.run/v1"
export DO_INFERENCE_API_KEY="<your-do-inference-api-key>"
export DO_EMBED_MODEL="gte-large-en-v1.5"
The HTTP and gRPC endpoints are separate hostnames. The gRPC hostname has a -grpc suffix. The sections below use HTTP and GraphQL via WEAVIATE_URL. The gRPC variables are useful when you connect with an SDK later.
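If you later script these calls from Python instead of curl, the same variables can be turned into a base URL and request headers. The following is a minimal sketch that assumes the variable names exported above; `build_request_config` is a hypothetical helper, not part of any Weaviate SDK.

```python
import os


def build_request_config(env=None):
    """Return (base_url, headers) for Weaviate HTTP calls.

    `env` defaults to os.environ; pass a plain dict to test without exporting anything.
    """
    env = os.environ if env is None else env
    host = env["WEAVIATE_URL"]
    port = int(env.get("WEAVIATE_HTTP_PORT", "443"))
    # Port 443 is the HTTPS default, so it can be left out of the URL.
    base_url = f"https://{host}" if port == 443 else f"https://{host}:{port}"
    headers = {
        "Authorization": f"Bearer {env['WEAVIATE_API_KEY']}",
        "Content-Type": "application/json",
    }
    # The inference key is only needed for calls that trigger embedding
    # (inserts, nearText, hybrid), so it is treated as optional here.
    if env.get("DO_INFERENCE_API_KEY"):
        headers["X-OpenAI-Api-Key"] = env["DO_INFERENCE_API_KEY"]
    return base_url, headers
```

Calling `build_request_config()` with no arguments reads the exported variables and raises `KeyError` if `WEAVIATE_URL` or `WEAVIATE_API_KEY` is missing.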
Step 1: Create a Collection
A collection is the top-level container for your objects, the rough equivalent of a table. The example below creates an Article collection with title, body, author, and tags. The vectorizer is set to text2vec-openai and pointed at DigitalOcean Serverless Inference, so Weaviate embeds properties on insert and at search time. The index uses the cluster’s default RQ8 compression.
curl -X POST "https://$WEAVIATE_URL/v1/schema" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "gte-large-en-v1.5",
"baseURL": "https://inference.do-ai.run",
"vectorizeClassName": false
}
},
"vectorIndexType": "hnsw",
"vectorIndexConfig": {
"distance": "cosine",
"rq": { "enabled": true, "bits": 8 }
},
"properties": [
{ "name": "title", "dataType": ["text"] },
{ "name": "body", "dataType": ["text"] },
{ "name": "author", "dataType": ["text"], "indexFilterable": true,
"moduleConfig": { "text2vec-openai": { "skip": true } } },
{ "name": "tags", "dataType": ["text[]"], "indexFilterable": true,
"moduleConfig": { "text2vec-openai": { "skip": true } } }
]
}'
The baseURL in the schema is https://inference.do-ai.run with no /v1 suffix. Weaviate’s text2vec-openai module appends the OpenAI path itself when calling the endpoint. The application-side DO_INFERENCE_URL keeps the /v1 so direct curl calls to DigitalOcean Inference still work.
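Because the schema and the application use two forms of the same URL, it can help to derive one from the other instead of maintaining both by hand. The sketch below builds the same collection definition as the curl example; `vectorizer_base_url` and `article_schema` are hypothetical helper names used only for illustration.

```python
def vectorizer_base_url(inference_url):
    """Drop a trailing /v1 (and any trailing slash) from an OpenAI-style URL."""
    url = inference_url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url


def article_schema(inference_url, model):
    """Build the Article collection definition used in this step."""
    module = "text2vec-openai"
    # author and tags are facets, so they are excluded from the embedding input.
    skip = {module: {"skip": True}}
    return {
        "class": "Article",
        "vectorizer": module,
        "moduleConfig": {
            module: {
                "model": model,
                "baseURL": vectorizer_base_url(inference_url),
                "vectorizeClassName": False,
            }
        },
        "vectorIndexType": "hnsw",
        "vectorIndexConfig": {
            "distance": "cosine",
            "rq": {"enabled": True, "bits": 8},
        },
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
            {"name": "author", "dataType": ["text"],
             "indexFilterable": True, "moduleConfig": skip},
            {"name": "tags", "dataType": ["text[]"],
             "indexFilterable": True, "moduleConfig": skip},
        ],
    }
```

Serializing the result with `json.dumps(article_schema(...))` gives a body you can POST to `/v1/schema`.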
Parameters
| Parameter | Required | Description |
|---|---|---|
| `class` | Yes | Collection name. Must start with an uppercase letter. |
| `vectorizer` | Yes | "text2vec-openai" to have Weaviate embed via an OpenAI-compatible endpoint (DigitalOcean Serverless Inference here). Use "none" for bring-your-own vectors. |
| `moduleConfig.text2vec-openai.model` | Yes | The DigitalOcean-hosted embedding model to use. Bound to the collection at creation time. |
| `moduleConfig.text2vec-openai.baseURL` | Yes | The DigitalOcean Inference base URL, without the /v1 suffix. |
| `moduleConfig.text2vec-openai.vectorizeClassName` | No | If true, the class name is included in the embedding input. Commonly set to false. |
| `vectorIndexType` | No | hnsw (default) or flat. Use hnsw for production. |
| `vectorIndexConfig.distance` | No | cosine (default), l2-squared, dot, or hamming. |
| `vectorIndexConfig.quantizer` | No | rq, pq, bq, or sq. The cluster default is rq at 8 bits. |
| `properties[].indexFilterable` | No | Set true for properties you filter on. Builds an inverted index. |
| `properties[].moduleConfig.text2vec-openai.skip` | No | Set true on properties you do not want included in the embedding (author and tags here, which are facets rather than search content). |
The response echoes the schema with all defaults populated. Sample output:
{
"class": "Article",
"vectorizer": "text2vec-openai",
"vectorIndexType": "hnsw",
"moduleConfig": {
"text2vec-openai": {
"baseURL": "https://inference.do-ai.run",
"model": "gte-large-en-v1.5",
"vectorizeClassName": false
}
},
"invertedIndexConfig": {
"bm25": { "b": 0.75, "k1": 1.2 },
"stopwords": { "preset": "en" }
},
"vectorIndexConfig": {
"distance": "cosine",
"rq": { "enabled": true, "bits": 8, "rescoreLimit": 20 }
},
"replicationConfig": { "factor": 3 },
"shardingConfig": { "actualCount": 3 }
}
Use DigitalOcean Serverless Inference for Embeddings
DigitalOcean Serverless Inference is recommended for embeddings with Managed Weaviate:
- Less application code: With the server-side vectorizer, your application sends raw text on insert and on search. Weaviate performs embedding requests and handles retries.
- Single-vendor stack: Your vector database and embedding model both run on DigitalOcean. Traffic between them stays on the DigitalOcean network, which reduces latency for embedding and ingestion.
- OpenAI-compatible API: The endpoint speaks the OpenAI `/v1/embeddings` contract, so Weaviate’s `text2vec-openai` module works against it unchanged. The same endpoint also works for any OpenAI SDK or `curl` recipe if you ever need to call it directly.
- Open-source embedding models: DigitalOcean hosts open-source models, so you do not need to deploy them yourself. See the AI Platform model catalog for the current list.
- Pay-per-token billing: No minimum commitment.
Embedding Models
A representative set of embedding models hosted on DigitalOcean Inference:
| Model | Dimensions | Best for |
|---|---|---|
| `gte-large-en-v1.5` | 1024 | General-purpose English text. Strong MTEB performance with 8K token context. Good default for English RAG. |
| `Qwen3-Embedding-0.6B` | 1024 | Multilingual content (100+ languages) with flexible dimension sizing. Suitable when you want to trade off vector size against quality, or need strong multilingual and code retrieval. |
| `all-MiniLM-L6-v2` | 384 | Lightweight, fast English embeddings for short text (up to 256 tokens). Best when latency, storage, and cost matter more than peak accuracy, for example, high-volume semantic search on snippets. |
| `multi-qa-mpnet-base-dot-v1` | 768 | English question-answering and semantic search over short passages (up to 512 tokens). Tuned specifically for query-to-passage retrieval; uses dot-product similarity. |
| `bge-m3` | 1024 (dense) | Multilingual content (100+ languages) and long passages (up to 8K tokens). Reach for this when your corpus spans languages or contains long-form documents. Also supports sparse and ColBERT outputs for hybrid retrieval. |
| `e5-large-v2` | 1024 | High-recall English search, strong on long documents. Requires `query:` and `passage:` prefixes. Input is capped at 512 tokens despite the “long documents” reputation. |
The catalog evolves. Always check the AI Platform model catalog for the current list and exact model IDs.
Switch Models
With a server-side vectorizer, the embedding model is bound to the collection at creation time. To switch models, create a new collection with the new model and re-ingest.
Vectors from different models are not interchangeable. Mixing models within a single collection is unsupported. Always create a new collection when changing the embedding model.
Step 2: Load Data
Because the collection has a server-side vectorizer, you do not embed in your application. You send the raw properties and Weaviate calls DigitalOcean Serverless Inference. The DigitalOcean Inference API key is forwarded as a per-request header (X-OpenAI-Api-Key) so the cluster can authenticate without storing the API key.
Insert a Single Object
curl -X POST "https://$WEAVIATE_URL/v1/objects" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"properties": {
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"author": "A. Roaster",
"tags": ["coffee", "recipe"]
}
}'
Sample output:
{
"class": "Article",
"id": "c8bef156-691b-4782-a889-a8907a3a75e2",
"creationTimeUnix": 1777246488487,
"lastUpdateTimeUnix": 1777246488487,
"properties": {
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"author": "A. Roaster",
"tags": ["coffee", "recipe"]
},
"vector": [0.002541669, -0.062028483, -0.6472548, "...1021 more dims..."]
}
Weaviate populates vector from DigitalOcean Inference automatically. The vector dimension matches the model (1024 for gte-large-en-v1.5).
Insert in Batches
For ingest beyond a handful of objects, use the batch endpoint. Embedding requests and HTTP overhead occur once per batch instead of once per object.
curl -X POST "https://$WEAVIATE_URL/v1/batch/objects" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"objects": [
{ "class": "Article", "properties": { "title": "Iced coffee", "body": "Brewed hot, served on ice." } },
{ "class": "Article", "properties": { "title": "Espresso", "body": "Pulled as a double shot." } },
{ "class": "Article", "properties": { "title": "French press", "body": "Coarse grind, four minute steep, then plunge." } },
{ "class": "Article", "properties": { "title": "Aeropress recipe", "body": "One scoop, hot water just off boil, invert and press in 30s." } }
]
}'
Sample output:
[
{
"class": "Article",
"id": "87b122fb-f090-45d5-995f-8ef5242c665d",
"properties": { "title": "Iced coffee", "body": "Brewed hot, served on ice." },
"vector": [-1.0839128, -0.92500794, -1.3542689, "...1021 more dims..."],
"result": { "status": "SUCCESS" }
},
{
"class": "Article",
"properties": { "title": "Espresso", "body": "Pulled as a double shot." },
"vector": ["...1024 dims..."],
"result": { "status": "SUCCESS" }
},
{
"class": "Article",
"properties": { "title": "French press", "body": "Coarse grind, four minute steep, then plunge." },
"vector": ["...1024 dims..."],
"result": { "status": "SUCCESS" }
},
{
"class": "Article",
"properties": { "title": "Aeropress recipe", "body": "One scoop, hot water just off boil, invert and press in 30s." },
"vector": ["...1024 dims..."],
"result": { "status": "SUCCESS" }
}
]
For batch sizing, start with 50 to 100 objects per call for vectors up to 1024 dimensions. Drop to 25 to 50 for 1536+ dimensional vectors so the request body and the embedding requests stay within load balancer limits.
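The sizing guidance can be applied mechanically when splitting a large list of objects. The sketch below is one way to do that. It starts from the upper end of each range and, as an assumption not stated in the guidance, treats anything above 1024 dimensions as a large vector. All function names are hypothetical.

```python
def batch_size_for(dimensions):
    """Pick a starting batch size from the sizing guidance."""
    # Up to 1024 dims: 50 to 100 objects per call. Larger vectors: 25 to 50.
    # Assumption: dimensions between 1025 and 1535 use the smaller size too.
    return 100 if dimensions <= 1024 else 50


def chunked(objects, size):
    """Yield consecutive slices of `objects` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(objects), size):
        yield objects[start:start + size]


def batch_payloads(objects, dimensions):
    """Wrap each chunk in the {"objects": [...]} body sent to /v1/batch/objects."""
    size = batch_size_for(dimensions)
    return [{"objects": chunk} for chunk in chunked(objects, size)]
```

If you see timeouts or oversized-request errors, lower the value returned by `batch_size_for` toward the bottom of the range.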
If DigitalOcean Inference returns an error during ingest (rate limit, auth failure), Weaviate surfaces it per object in the batch response. Inspect the response and retry failed objects.
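A minimal sketch of that inspect-and-retry step follows. It assumes each entry in the batch response carries a `result.status` field as in the sample output, and that failed entries list their messages under `result.errors.error`, which is the shape open-source Weaviate uses. Verify this against a real failure from your cluster before relying on it. The helper names are hypothetical.

```python
def split_batch_results(response_items):
    """Separate a /v1/batch/objects response into succeeded and failed entries."""
    succeeded, failed = [], []
    for item in response_items:
        result = item.get("result") or {}
        # Assumed error shape: {"errors": {"error": [{"message": "..."}]}}
        errors = (result.get("errors") or {}).get("error") or []
        if result.get("status") == "SUCCESS" and not errors:
            succeeded.append(item)
        else:
            messages = [e.get("message", "") for e in errors]
            failed.append({"object": item, "messages": messages})
    return succeeded, failed


def retry_payload(failed):
    """Build a new batch body containing only the failed objects."""
    return {
        "objects": [
            {"class": f["object"]["class"], "properties": f["object"]["properties"]}
            for f in failed
        ]
    }
```

Log the collected `messages` before retrying: a rate-limit error usually calls for a pause, while an authentication failure will not succeed on retry.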
Step 3: Run a Hybrid Search
Hybrid search blends vector similarity and BM25 keyword scoring with a tunable alpha. It is commonly used for end-user search UIs because queries often mix semantic intent (“cold brew at home”) with exact-match terms (“Aeropress”).
With the server-side vectorizer, you only pass the text query. Weaviate embeds the query, executes both vector and keyword searches, and combines the scores.
curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(hybrid: {query: \"cold brew coffee\", alpha: 0.5, properties: [\"title^2\", \"body\"]}, limit: 10) { title body author _additional { score } } } }"
}'
Parameters
| Parameter | Description |
|---|---|
| `query` | The raw text query. Used for the BM25 side and embedded by Weaviate for the vector side. |
| `alpha` | 0.0 is pure BM25 (keyword only). 1.0 is pure vector. 0.5 is a sensible default. Lower toward 0 when users search by exact identifier, raise toward 1 for long, conversational queries. |
| `properties` | Which properties to score against. Use ^N to boost. "title^2" weights title matches twice as much as body matches. |
| `limit` | Maximum objects to return. |
Sample output:
{
"data": {
"Get": {
"Article": [
{
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"author": "A. Roaster",
"_additional": { "score": "1" }
},
{
"title": "Aeropress recipe",
"body": "One scoop, hot water just off boil, invert and press in 30s.",
"author": "A. Roaster",
"_additional": { "score": "0.3498051" }
},
{
"title": "Iced coffee",
"body": "Brewed hot, served on ice.",
"author": "B. Bean",
"_additional": { "score": "0.34319013" }
},
{
"title": "French press",
"body": "Coarse grind, four minute steep, then plunge.",
"author": "D. Drip",
"_additional": { "score": "0.31527957" }
},
{
"title": "Espresso",
"body": "Pulled as a double shot.",
"author": "C. Crema",
"_additional": { "score": "0.12629698" }
},
{
"title": "Green tea",
"body": "Steep at 80C for two minutes, no longer.",
"author": "T. Leaf",
"_additional": { "score": "0" }
}
]
}
}
}
The top hit (Cold brew coffee) scores 1 because both components of the hybrid score, vector similarity and BM25 keyword overlap, rank it highest. Coffee-related results follow with intermediate scores. Off-topic content (Green tea) receives a score of 0.
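To build intuition for how alpha weights the two signals, the sketch below blends two score sets after scaling each to the 0 to 1 range. It is a simplified illustration, not Weaviate's fusion implementation. It assumes both inputs are higher-is-better scores keyed by object, and the function names and sample numbers are invented for the example.

```python
def min_max(scores):
    """Scale a {key: score} map to the 0..1 range; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def blend(vector_scores, keyword_scores, alpha=0.5):
    """Return (key, fused_score) pairs sorted best first.

    alpha = 1.0 uses only the vector scores, alpha = 0.0 only the keyword scores.
    """
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    keys = set(vec) | set(kw)
    fused = {
        k: alpha * vec.get(k, 0.0) + (1 - alpha) * kw.get(k, 0.0)
        for k in keys
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
```

An object that ranks first on both signals ends up with a fused score of 1, and one that ranks last on both ends up at 0, the same pattern seen in the sample output.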
Step 4: Run a Pure Vector Search
For semantic search without BM25 scoring, use nearText. Weaviate embeds the query and ranks results by vector distance. This is useful when queries are conversational and unlikely to share keywords with the indexed text.
curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-OpenAI-Api-Key: $DO_INFERENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(nearText: {concepts: [\"how do I make cold brew at home?\"]}, limit: 5) { title body _additional { distance } } } }"
}'
Sample output:
{
"data": {
"Get": {
"Article": [
{
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"_additional": { "distance": 0.32995296 }
},
{
"title": "Aeropress recipe",
"body": "One scoop, hot water just off boil, invert and press in 30s.",
"_additional": { "distance": 0.33600712 }
},
{
"title": "French press",
"body": "Coarse grind, four minute steep, then plunge.",
"_additional": { "distance": 0.36295372 }
},
{
"title": "Pour over basics",
"body": "Bloom for thirty seconds, then pour in slow concentric circles.",
"_additional": { "distance": 0.38054264 }
}
]
}
}
}
Distance is lower-is-better (0 indicates identical vectors). The nearText query returns coffee-related results even though the query string does not contain the word “coffee”. This reflects semantic matching rather than keyword matching. Use a distance threshold in application code to filter out weak matches.
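A distance threshold can be applied to the parsed GraphQL response in a few lines. In the sketch below, the 0.35 cutoff is only an illustrative value chosen to fit the sample output; pick your own by inspecting distances for known good and bad matches. `filter_by_distance` is a hypothetical helper name.

```python
def filter_by_distance(graphql_response, collection="Article", max_distance=0.35):
    """Keep only hits whose _additional.distance is at or below max_distance."""
    data = graphql_response.get("data") or {}
    hits = (data.get("Get") or {}).get(collection) or []
    return [hit for hit in hits if hit["_additional"]["distance"] <= max_distance]
```

With the sample output above and a cutoff of 0.35, only "Cold brew coffee" and "Aeropress recipe" remain. The query must request `_additional { distance }` for this to work.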
Step 5: Run a Keyword Search (BM25)
BM25 ignores vectors and ranks based on keyword overlap. This is useful when queries are dominated by proper nouns, identifiers, or terms where semantic similarity is less effective. Keyword search runs against the inverted index and does not require the X-OpenAI-Api-Key header. No embedding requests are made.
curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(bm25: {query: \"cold brew\", properties: [\"title^2\", \"body\"]}, limit: 5) { title body author _additional { score } } } }"
}'
Parameters
| Parameter | Description |
|---|---|
| `query` | The keyword string. Tokenized and matched against the inverted index. |
| `properties` | Which properties to search. Use ^N to boost. "title^2" weights title matches twice as much as body matches. |
| `limit` | Maximum objects to return. |
Sample output:
{
"data": {
"Get": {
"Article": [
{
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"author": "A. Roaster",
"_additional": { "score": "1.6965694" }
}
]
}
}
}
Only the article containing the words cold and brew matches. Semantically related results (Aeropress, French press) are absent because they do not share those keywords.
Filter a Keyword Search
curl -X POST "https://$WEAVIATE_URL/v1/graphql" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(bm25: {query: \"cold brew\", properties: [\"title\", \"body\"]}, where: {path: [\"author\"], operator: Equal, valueText: \"A. Roaster\"}, limit: 5) { title body author _additional { score } } } }"
}'
Sample output:
{
"data": {
"Get": {
"Article": [
{
"title": "Cold brew coffee",
"body": "Steeped for 12 hours at room temperature.",
"author": "A. Roaster",
"_additional": { "score": "0.8482847" }
}
]
}
}
}
The BM25 score is lower than in the unfiltered query because the property boost changed (title is no longer weighted ^2). The filter restricts the result set to articles by A. Roaster.
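The GraphQL strings in these examples are embedded in JSON with hand-escaped quotes, which is easy to get wrong when the query text or filter value comes from user input. The sketch below generates the same request body as the filtered curl example. It relies on `json.dumps` to quote and escape strings, which also yields valid GraphQL string literals for ordinary text. `bm25_query_payload` is a hypothetical helper name.

```python
import json


def bm25_query_payload(query, author=None, limit=5):
    """Build the JSON body for POST /v1/graphql: BM25 on Article, optional author filter."""
    # json.dumps adds the surrounding double quotes and escapes embedded ones.
    args = [f'bm25: {{query: {json.dumps(query)}, properties: ["title", "body"]}}']
    if author is not None:
        args.append(
            f'where: {{path: ["author"], operator: Equal, valueText: {json.dumps(author)}}}'
        )
    args.append(f"limit: {int(limit)}")
    graphql = (
        "{ Get { Article(" + ", ".join(args) + ") "
        "{ title body author _additional { score } } } }"
    )
    return json.dumps({"query": graphql})
```

Pass the returned string as the request body (the value of `-d` in the curl examples). Omitting `author` produces the unfiltered form of the query.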
Step 6: Rerank Candidates
Hybrid and vector search return a recall set: the top N candidates that look promising. For uses like RAG context selection or search UIs where top results need additional precision, run a second pass through a reranker to re-score candidates with a higher-precision cross-encoder.
DigitalOcean Serverless Inference exposes a /rerank endpoint that takes a query and candidate documents (as plain strings) and returns a re-ordered list of indices with relevance scores.
curl -X POST "$DO_INFERENCE_URL/rerank" \
-H "Authorization: Bearer $DO_INFERENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "how do I make cold brew at home?",
"documents": [
"Cold brew coffee. Steeped for 12 hours at room temperature.",
"Aeropress recipe. One scoop, hot water just off boil, invert and press in 30s.",
"French press. Coarse grind, four minute steep, then plunge.",
"Pour over basics. Bloom for thirty seconds, then pour in slow concentric circles."
]
}'
Parameters
| Parameter | Required | Description |
|---|---|---|
| `model` | Yes | The reranker model. See the AI Platform model catalog. |
| `query` | Yes | The query string, identical to what you passed to hybrid or nearText. |
| `documents` | Yes | Candidate documents to re-score, as plain strings. Concatenate searchable fields (for example title + ". " + body) before passing them in. |
Sample output:
{
"results": [
{ "index": 0, "relevance_score": -0.6698001623153687 },
{ "index": 3, "relevance_score": -5.365661144256592 },
{ "index": 1, "relevance_score": -6.568672180175781 },
{ "index": 2, "relevance_score": -6.795877456665039 }
],
"usage": { "total_tokens": 129 }
}
The index field refers to the position of each document in your input array. Use it to map back to the original Weaviate objects (UUIDs, properties, and so on). Results are returned sorted by relevance_score, highest first.
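The round trip from search hits to reranker input and back can be sketched as follows. It assumes each candidate is a dict with `title` and `body` keys, as returned by the GraphQL queries above, and that the rerank response has the shape shown in the sample output. The function names are hypothetical.

```python
def to_documents(candidates):
    """Concatenate searchable fields into the plain strings the reranker expects."""
    return [f"{c['title']}. {c['body']}" for c in candidates]


def apply_rerank(candidates, rerank_response, top_k=None):
    """Reorder `candidates` using the `index` values in a /rerank response."""
    # Sort defensively instead of relying on the order of the response.
    results = sorted(
        rerank_response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    # Copy each candidate so the original search hits are left unmodified.
    ordered = [
        {**candidates[r["index"]], "relevance_score": r["relevance_score"]}
        for r in results
    ]
    return ordered if top_k is None else ordered[:top_k]
```

Keep the candidate list in the same order you used to build `documents`, since `index` is only meaningful relative to that order.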
Pure keyword (BM25) results generally do not benefit from reranking. If BM25 returns a literal match, that is already a strong signal. Reranking is more useful for hybrid and pure vector results, where top candidates may be semantically similar and require additional scoring to order precisely.
Best Practices
- Skip non-search properties from the vectorizer: Set `moduleConfig.text2vec-openai.skip: true` on properties like `author`, `tags`, and IDs so they are not included in embeddings. The example schema applies this to `author` and `tags`.
- Mark filter properties with `indexFilterable: true`: Filters that use the inverted index are more efficient than those that do not.
- Cache frequently used query embeddings at the application layer: Server-side vectorization means each vector or hybrid query triggers a DigitalOcean Inference call. For repeated queries (for example, autocomplete or popular searches), consider caching results at the application layer.
- Plan for model immutability: The embedding model is fixed at collection creation time. To change models, create a new collection, re-ingest data, and use a collection alias for cutover.
- Start with hybrid at `alpha = 0.5`: Tune from there. Lower toward `0` for identifier-heavy queries, raise toward `1` for longer, conversational queries.
- Use the SDKs for high-throughput workloads: The Python, TypeScript, Go, and Java clients connect over gRPC at `WEAVIATE_GRPC_URL:443`, handle retries and back-pressure, and manage batching. The `curl` examples in this article are useful for prototyping, scripting, and debugging.
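As one possible shape for the application-layer caching mentioned above, the sketch below keeps search results in memory for a fixed time, keyed by a normalized form of the query text. `TTLCache` is a hypothetical class written for this example. A production setup would more likely use a shared cache, and the five-minute default is an arbitrary choice.

```python
import time


class TTLCache:
    """Small in-memory cache keyed by normalized query text."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    @staticmethod
    def _key(query):
        # Treat queries that differ only in case or spacing as the same.
        return " ".join(query.lower().split())

    def get_or_compute(self, query, compute):
        """Return the cached value for `query`, calling compute(query) on a miss."""
        key = self._key(query)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute(query)
        self._entries[key] = (now, value)
        return value
```

Pass the function that performs the actual search as `compute`. Repeated queries inside the TTL window then skip both the Weaviate request and the embedding call behind it.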
Next Steps
- Weaviate search documentation for the full search API surface (`nearVector`, `nearText`, generative search, group-by, aggregation)
- DigitalOcean AI Platform models for the current embedding and chat model catalog
- Weaviate client libraries for Python, JavaScript and TypeScript, Go, and Java SDKs