DigitalOcean Knowledge Bases Quickstart
Validated on 23 Apr 2026 • Last edited on 27 Apr 2026
DigitalOcean Knowledge Bases let you store, index, and retrieve data from private files, websites, Spaces buckets, and other sources to power retrieval-augmented generation with your own content.
Create a DigitalOcean Knowledge Base
You can create a knowledge base, which is a collection of indexed content on a specific topic or domain. A knowledge base gives extra context during generation, such as your company’s internal documentation.
To create a knowledge base:
- From the DigitalOcean Control Panel, in the top-right, click Create, and then under the DATA SERVICES section, click Knowledge Base to open the Create a knowledge base page.
- In the Add data step, under the Choose your embeddings model section, click the Embeddings model dropdown list to select a model used for indexing your data. Choose one based on your data type and token budget. For options, see available embeddings models.
- Under the Choosing your reranking model section, click the Reranking model (optional) dropdown list, and then select the available reranking model to re-score retrieved chunks and improve relevance. Alternatively, you can enable reranking later in the knowledge base settings. When reranking is enabled, you can also enable or disable it for individual retrieval queries. Reranking incurs charges per request when used.
- Under the Add data sources section, choose at least one data source to upload or connect. You can add multiple sources, but to reduce indexing time and cost, use folders, buckets, or local files that contain only relevant content.
Knowledge bases support common text-based formats, including .csv, .pdf, .docx, .md, .html, .json, and .jsonl.
You can add any of the following data sources:
Click Upload a file, and then under the Choose Files section, either click Upload or drag-and-drop files. To remove a file, click its trash icon. To add more files, click Upload more files.
For best performance, upload files no larger than 2 GB and fewer than 100 files at a time.
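As a pre-upload check, the format list and size guidance above can be sketched in Python. This is a minimal, illustrative sketch and not part of any DigitalOcean tooling: the function names are hypothetical, 2 GB is assumed to mean 2 GiB, and the extension set covers only the formats named in this guide, which may not be exhaustive.

```python
from pathlib import Path

# Extensions named in this guide as supported text-based formats.
SUPPORTED_EXTENSIONS = {".csv", ".pdf", ".docx", ".md", ".html", ".json", ".jsonl"}
MAX_FILE_BYTES = 2 * 1024**3  # assumption: "2 GB" is treated as 2 GiB
MAX_FILES_PER_BATCH = 99      # "fewer than 100 files at a time"


def is_uploadable(name: str, size_bytes: int) -> bool:
    """Return True if the file has a listed extension and fits the size guidance."""
    has_supported_ext = Path(name).suffix.lower() in SUPPORTED_EXTENSIONS
    return has_supported_ext and size_bytes <= MAX_FILE_BYTES


def plan_batches(files: list[tuple[str, int]]) -> list[list[str]]:
    """Group uploadable file names into batches of fewer than 100 files."""
    names = [name for name, size in files if is_uploadable(name, size)]
    return [
        names[i:i + MAX_FILES_PER_BATCH]
        for i in range(0, len(names), MAX_FILES_PER_BATCH)
    ]


def scan_directory(root: str) -> list[tuple[str, int]]:
    """Collect (path, size in bytes) pairs for every regular file under root."""
    return [(str(p), p.stat().st_size) for p in Path(root).rglob("*") if p.is_file()]
```

For example, `plan_batches(scan_directory("./docs"))` would return the lists of files to upload in successive rounds, skipping anything with an unlisted extension or above the size guidance.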
Click Pull from a Spaces bucket or folder, and then choose one or more buckets or folders to index. To limit indexing to specific folders, click + on the left of the bucket to expand its contents and select the folders you want.
Knowledge bases index all supported files in the selected buckets and folders, regardless of privacy settings. For best results, use five or fewer buckets and store only indexing data in them.
When you specify a website URL as a data source for your knowledge base, DigitalOcean uses a custom agent named DigitalOceanGradientAICrawler/1.0 to index the website content. The crawler indexes up to 5,500 pages and skips inaccessible or disallowed links to prevent excessively large indexing jobs.
Depending on the behavior you select, the crawler follows HTML links on the site, indexes text and certain image types, and ignores videos and navigation links. It respects the website’s robots.txt rules, including any Disallow directives or the wildcard *.
Select Add a web or sitemap URL, then choose either Seed URL or Sitemap URL.
For a seed URL, enter the public URL in the Seed URL field, then choose a crawl scope:
- Scoped crawls only the seed URL.
- URL and all linked pages in path crawls the seed URL and pages within the same path.
- URL and all linked pages in domain crawls pages in the same domain.
- Subdomains crawls the domain and its subdomains.
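To make the four crawl scopes above more concrete, the following Python sketch approximates which URLs each scope would include. It is an interpretation for illustration only, not the crawler's actual logic: the `in_scope` function and its scope labels are hypothetical, "same path" is assumed to mean a path prefix match, and "subdomains" is assumed to be relative to the seed URL's host.

```python
from urllib.parse import urlparse


def in_scope(seed: str, candidate: str, scope: str) -> bool:
    """Approximate whether `candidate` falls inside the crawl scope chosen for `seed`.

    `scope` is a hypothetical label: "scoped", "path", "domain", or "subdomains".
    """
    s, c = urlparse(seed), urlparse(candidate)
    seed_host = s.hostname or ""
    cand_host = c.hostname or ""
    if scope == "scoped":
        # Only the seed URL itself (ignoring a trailing slash).
        return candidate.rstrip("/") == seed.rstrip("/")
    if scope == "path":
        # Assumption: "same path" means the candidate path extends the seed path.
        seed_path = s.path.rstrip("/")
        same_path = c.path == seed_path or c.path.startswith(seed_path + "/")
        return cand_host == seed_host and same_path
    if scope == "domain":
        return cand_host == seed_host
    if scope == "subdomains":
        # Assumption: subdomains are hosts ending in ".<seed host>".
        return cand_host == seed_host or cand_host.endswith("." + seed_host)
    raise ValueError(f"unknown scope: {scope}")
```

With a seed of `https://docs.example.com/products/`, a page at `https://docs.example.com/pricing` would fall outside the path scope but inside the domain scope under these assumptions.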
For a sitemap URL, enter the .xml sitemap in the Sitemap URL field, such as docs.digitalocean.com/sitemap.xml. Sitemap crawls only index URLs listed in the sitemap.
To index supported images and other media, select Index embedded media. To include header and footer navigation content, select Include headers and footers navigation links. These options can significantly increase token count.
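Because sitemap crawls index only the listed URLs, it can be useful to count the entries in a sitemap before adding it. The sketch below parses sitemap XML you have already downloaded and compares the count with the 5,500-page figure from this guide. The function names are hypothetical, and whether the page cap applies to sitemap crawls in exactly this way is an assumption.

```python
import xml.etree.ElementTree as ET

PAGE_LIMIT = 5500  # page cap stated in this guide


def _local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}loc' down to 'loc'."""
    return tag.rsplit("}", 1)[-1]


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for entry in root.iter():
        if _local_name(entry.tag) != "url":
            continue
        # Only direct children count, so nested image or video <loc> tags are skipped.
        for child in entry:
            if _local_name(child.tag) == "loc" and child.text:
                urls.append(child.text.strip())
    return urls


def within_page_limit(xml_text: str, limit: int = PAGE_LIMIT) -> bool:
    """Return True if the sitemap lists no more URLs than the limit."""
    return len(sitemap_urls(xml_text)) <= limit
```

This sketch reads plain `urlset` sitemaps only. A sitemap index file, which points to other sitemaps, yields an empty list here.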
To verify a crawl completed, re-add the same seed or sitemap URL as a new data source. If it shows zero tokens, the original crawl indexed all content and you can delete the duplicate.
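Since the crawler respects robots.txt, pages you expect to be indexed may be skipped if the site disallows them. As a rough local check, Python's standard `urllib.robotparser` can evaluate robots.txt text against the user agent named in this guide. This is a sketch under assumptions: the `allowed_by_robots` helper is hypothetical, and Python's parser may not interpret every rule (path wildcards in particular) the same way the crawler does.

```python
from urllib.robotparser import RobotFileParser

# User agent string named in this guide.
CRAWLER_AGENT = "DigitalOceanGradientAICrawler/1.0"


def allowed_by_robots(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example rules: block one directory for every crawler.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(EXAMPLE_ROBOTS, "https://example.com/docs/"))
print(allowed_by_robots(EXAMPLE_ROBOTS, "https://example.com/private/page"))
```

To check a live site, you could fetch its robots.txt yourself and pass the text to the same helper.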
If you have not connected Dropbox, click Connect account next to Dropbox, then log in and authorize the connection.
Click Pull from a Dropbox folder, and then choose one or more folders to index. To limit indexing to specific folders, click + on the left of the folder to expand its contents and select the folders you want.
Select Pull from an AWS S3 bucket folder, and then enter the required credentials:
- Access Key ID: The IAM access key ID for the bucket or folder.
- Secret Key: The secret key for the access key ID.
- Bucket Name: The bucket’s name to index.
- Region: The AWS region, such as us-east-1 or eu-west-1.
On the right of the Region field, click + to add the S3 bucket folder.
Click Advanced Options to configure the chunking strategy. By default, data sources use section-based chunking. For guidance, see Chunking Strategy Best Practices.
Click Add selected data source, and then either add another data source or click Next step: Configure database.
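To give a feel for what section-based chunking means, the toy Python sketch below splits a Markdown document into one chunk per heading-delimited section. It is not DigitalOcean's implementation: the `chunk_by_section` function is hypothetical, it assumes Markdown headings mark section boundaries, and it ignores details such as chunk size limits or headings inside code fences.

```python
import re

HEADING = re.compile(r"#{1,6}\s")  # Markdown heading: 1 to 6 '#' then whitespace


def chunk_by_section(markdown_text: str) -> list[dict[str, str]]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks: list[dict[str, str]] = []
    title = ""
    body: list[str] = []

    def flush() -> None:
        # Emit the current section unless it is completely empty.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown_text.splitlines():
        if HEADING.match(line):
            flush()
            title, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Text that appears before the first heading becomes a chunk with an empty title, so no content is dropped.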
- In the Knowledge base name field, either enter a name for your knowledge base or use the auto-generated name.
- In the Where should your knowledge base live? section, under the OpenSearch database options sub-section, select either Use existing to connect to an existing OpenSearch database or Create new to provision a new one.
If you choose Use existing, click the Select an OpenSearch database dropdown list, and then select the database you want to use. If it already contains data, it may limit how much new data you can index. You only pay for successfully indexed data.
If you choose Create new, under the Choose a datacenter region section, select the default datacenter region for your knowledge base, or click the Additional datacenter regions dropdown list to choose a different one.
For DigitalOcean AI Platform agents, choose the same region as your agents to reduce latency. Most Agent Platform infrastructure is in TOR1.
New databases use the smallest size that fits your data. For embeddings, we recommend allocating about twice your original dataset size.
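The sizing recommendation above is simple arithmetic, sketched here in Python. The `recommended_storage_gb` function is hypothetical, and the factor of 2 reflects the "about twice" guidance in this guide, so treat the result as a rough estimate and not a provisioning rule.

```python
def recommended_storage_gb(dataset_gb: float, factor: float = 2.0) -> float:
    """Estimate storage to allocate as roughly `factor` times the dataset size."""
    if dataset_gb < 0:
        raise ValueError("dataset size cannot be negative")
    return dataset_gb * factor


# A 15 GB source dataset suggests allocating about 30 GB.
print(recommended_storage_gb(15))
```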
Under the VPC Network section, choose the VPC network you want to use.
Click Next step: Review and create.
- Under the Final Details section, click the Select a project dropdown list to choose the project your knowledge base should live in.
- In the Tags field, optionally add tags to help organize and filter your knowledge base. Tags can include letters, numbers, colons, dashes, and underscores.
- Under the Review section, review the selected embeddings model, its token price, the estimated size and number of data sources, and the OpenSearch database configuration. Then, click Create knowledge base (or Create Knowledge Base and database).
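If you generate tags programmatically, the character rule from the Tags step above (letters, numbers, colons, dashes, and underscores) can be checked with a short Python sketch. The `is_valid_tag` helper is hypothetical. It assumes ASCII letters and does not enforce any length limit, since this guide does not state one.

```python
import re

# Letters, numbers, colons, dashes, and underscores, per the Tags step.
TAG_PATTERN = re.compile(r"[A-Za-z0-9:_-]+")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag is non-empty and uses only the allowed characters."""
    return TAG_PATTERN.fullmatch(tag) is not None


print(is_valid_tag("team:docs_v2-prod"))
print(is_valid_tag("has space"))
```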
After you create a knowledge base, you can optionally test your knowledge base’s retrieval or test retrieval and responses in RAG Playground.
After the knowledge base is created, you can also attach it to a new or existing agent.
Destroy a Knowledge Base
Deleting a knowledge base permanently deletes its data sources, indexed data, embeddings, and automated backups. Any DigitalOcean AI Platform agents using the knowledge base are automatically redeployed without it, which may affect performance.
If you need to keep indexing history for record-keeping or debugging, download all relevant CSV files from the Activity tab before destroying the knowledge base.
To destroy an existing knowledge base:
- In the left menu of the Control Panel, click DATA SERVICES, and then click Knowledge Bases.
- Find the knowledge base you want to destroy, click … to its right, and then click Destroy to open the Destroy knowledge base window.
- To confirm, enter the name of your knowledge base, and then click Destroy.
The OpenSearch database storing your knowledge base is not destroyed along with it. To destroy your OpenSearch database, see How to Destroy OpenSearch Clusters.
Next Steps
Once you’ve created your knowledge base, you can:
- Test the knowledge base’s retrieval results.
- Test model responses with retrieved content in the RAG Playground.
- Attach the knowledge base to an AI agent.