Give Feedback

DigitalOcean Knowledge Bases Features

Last verified 8 May 2026

DigitalOcean Knowledge Bases let you store, index, and retrieve data from private files, websites, Spaces buckets, and other sources to power retrieval-augmented generation with your own content.

Copy page as Markdown View page as Markdown

A knowledge base is a private repository of unstructured content, such as files, folders, and URLs, that improves agent responses using retrieval-augmented generation (RAG). Knowledge bases store source data in DigitalOcean Spaces object storage and store indexes in a DigitalOcean OpenSearch cluster.

You can add data sources to knowledge bases from Spaces buckets, local files, seed or site map URLs, Dropbox folders, and Amazon S3 buckets.

Embeddings Models

The embeddings models convert unstructured data into vector embeddings so AI agents can find content that matches a user’s input.

Chunking

Chunking controls how documents are split before indexing. You configure chunking per data source and use different strategies in the same knowledge base.

Data Services supports section-based, semantic, hierarchical, and fixed length chunking for different document types, retrieval patterns, and cost needs.

Section-based, semantic, hierarchical, and fixed-length chunking support different document types, retrieval patterns, and cost requirements. Section-based chunking follows document structure, semantic chunking groups related content by meaning, hierarchical chunking preserves section and subsection relationships, and fixed-length chunking splits content into uniform chunk sizes. For more details and recommendations, see our chunking best practices and chunking parameters reference.

Retrieve

The retrieve feature lets you query a knowledge base for relevant chunks, apply metadata filters, and review the results for use in your applications and agent workflows. Each user query is vectorized using the knowledge base’s selected embeddings model.

You can retrieve data and run semantic, keyword, or hybrid searches, review scored chunks, and generate live Gradient SDK (Python) and cURL examples from the current query via the Control Panel or API.

You can optionally enable reranking to re-score and reorder retrieved chunks so the most relevant results appear first.

Knowledge base retrieval is also available through an MCP server for querying, filtering, and retrieving chunks. For setup information, see Knowledge Bases MCP Tools.

Reranking

Reranking re-scores and reorders retrieved chunks after the initial search so the most relevant results appear first. Enabling reranking can improve retrieval quality, especially for ambiguous queries, large data sets, or content with similar language across many chunks. This often leads to better grounded generated responses because the strongest matches are more likely to be included.

However, reranking also adds latency and separate token charges, so it is best used when relevance is more important than response speed or cost. For pricing details, see knowledge base pricing.

RAG Playground

RAG Playground lets you test how a selected serverless inference model answers a query using content retrieved from a knowledge base. You can enter a query, choose a model, and adjust settings such as system instructions, max tokens, and temperature.

RAG Playground shows the generated answer alongside retrieved chunks, including source details, page numbers, relevance scores, and which chunks were used in the response.

Auto-Indexing

Auto-indexing keeps data sources up to date by re-indexing changes on a recurring schedule.

Activity Logs

Activity logs give you visibility into indexing jobs for each knowledge base. You can view recent activity and download CSVs for debugging.