How to Create, Edit, and Destroy Knowledge Bases

Validated on 28 Apr 2025 • Last edited on 22 May 2025

DigitalOcean GenAI Platform lets you build GPU-powered AI agents with fully-managed deployment. Agents can use pre-built or custom foundation models, incorporate function and agent routes, and implement RAG pipelines with knowledge bases.

A knowledge base stores private data sources such as unstructured files, Spaces folders, or web pages to supplement an agent’s training data and improve response accuracy. Using retrieval-augmented generation (RAG), agents can search and reference external data to deliver more accurate, up-to-date, and domain-specific answers.

When you create a knowledge base, we automatically index your data by transforming it into vector embeddings: numerical representations that capture the meaning of the text and help agents efficiently find relevant information. These embeddings are stored in a Managed OpenSearch database, which appears in your Databases list and which you can scale at any time for better performance.

Create a Knowledge Base Using the Control Panel

To create a knowledge base from the DigitalOcean Control Panel, in the left-hand menu, click GenAI Platform, click the Knowledge Bases tab, then click Create Knowledge Base to open the creation page.

In the Configure your knowledge base section, either keep the autogenerated name or choose a unique name using 3 to 63 characters, including only letters, numbers, dashes, and periods.
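
If you generate names in scripts, you can check them against the naming rule before opening the creation page. The following is a minimal sketch, assuming that "letters" means ASCII letters; the function name `is_valid_kb_name` is hypothetical, and the control panel may enforce additional rules not described here.

```python
import re

# 3 to 63 characters: letters, numbers, dashes, and periods only.
# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9.-]{3,63}")


def is_valid_kb_name(name: str) -> bool:
    """Return True if the name satisfies the documented length and character rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_kb_name("kb-api-create"))  # True
print(is_valid_kb_name("my kb"))          # False (spaces are not allowed)
```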

Select Your Data Sources

In the Select data sources to index section, click Select data sources to open the data source selection window, then click the dropdown menu to select a data source type: Spaces bucket or folder, File upload, or URL for web crawling.

Info
For web crawling data sources, the crawler indexes up to 5,500 pages to prevent excessively large indexing jobs. It also skips inaccessible or disallowed links.

You can add multiple types of data sources to a knowledge base and include as many as needed. To save processing time and cost, organize your files in dedicated Spaces buckets, specific folders, or local storage containing only relevant files.

Supported File Formats

We support a wide range of text-based file formats, including: .csv, .eml, .epub, .xls, .xlsx, .html, .md, .odt, .pdf, .txt, .rst, .rtf, .tsv, .doc, .docx, .xml, .json, and .jsonl.

Info
PowerPoint files (.ppt, .pptx) are partially supported. We extract text but do not process images or other visual content. Image files (such as .png, .jpeg, .tiff, and .bmp) are not currently supported.
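
To screen a local folder before uploading, you can sort files by extension. This is a rough sketch: the extension sets are copied from the lists above, the helper name `classify_file` is hypothetical, and comparing extensions case-insensitively is an assumption of this example rather than documented platform behavior.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
FULLY_SUPPORTED = {
    ".csv", ".eml", ".epub", ".xls", ".xlsx", ".html", ".md", ".odt", ".pdf",
    ".txt", ".rst", ".rtf", ".tsv", ".doc", ".docx", ".xml", ".json", ".jsonl",
}
# PowerPoint files: text is extracted, images and visual content are not.
TEXT_ONLY = {".ppt", ".pptx"}


def classify_file(filename: str) -> str:
    """Return 'supported', 'text-only', or 'unsupported' for a file name."""
    # Assumption: extensions are compared case-insensitively.
    extension = Path(filename).suffix.lower()
    if extension in FULLY_SUPPORTED:
        return "supported"
    if extension in TEXT_ONLY:
        return "text-only"
    return "unsupported"


for name in ["notes.md", "slides.pptx", "diagram.png"]:
    print(name, "->", classify_file(name))
```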

You can add any of the following data sources:

Add a Spaces Bucket or Folder

Add entire Spaces buckets or select specific folders to organize files in your knowledge base. The system indexes all supported file formats in selected buckets and folders, regardless of privacy settings.

For optimal performance and indexing quality:

  • Include only indexing data: keep bucket contents limited to files you intend to index.
  • Use five buckets maximum: performance is best with five or fewer buckets.
  • Use supported file formats: make sure every file is in a supported format.

Click + next to a bucket to expand it and select folders.

Add a File Upload

In the Choose Files section, drag and drop files from your local storage, or click Upload to select them manually.

Add a URL for Web Crawling

The web crawler indexes only publicly accessible content, follows HTML links, supports certain image types, ignores videos and navigation links, and respects robots.txt rules.

In the Seed URL field, type the public URL you want to crawl.

Under the Crawling Rules section, define the crawl scope:

  • Scoped: crawls only the seed URL.
  • Path: crawls the seed URL and all pages within the same path.
  • Domain: crawls all pages in the same domain.
  • Subdomains: crawls the domain and all its subdomains.
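
One way to reason about the four scopes is as a predicate over URLs. The sketch below is an interpretation of the descriptions above, not the crawler's actual matching logic: the function `in_scope`, the lowercase scope names, and the treatment of trailing slashes are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def in_scope(seed: str, candidate: str, scope: str) -> bool:
    """Guess whether a candidate URL falls inside the crawl scope of a seed URL."""
    s, c = urlsplit(seed), urlsplit(candidate)
    s_host = (s.hostname or "").lower()
    c_host = (c.hostname or "").lower()

    if scope == "scoped":
        # Only the seed page itself (scheme, query, and trailing slash ignored here).
        return (c_host, c.path.rstrip("/")) == (s_host, s.path.rstrip("/"))
    if scope == "path":
        # Same host, and the path is the seed path or sits below it.
        prefix = s.path if s.path.endswith("/") else s.path + "/"
        return c_host == s_host and (c.path == s.path or c.path.startswith(prefix))
    if scope == "domain":
        # Any page on exactly the same host.
        return c_host == s_host
    if scope == "subdomains":
        # The seed host or any host ending in ".<seed host>".
        return c_host == s_host or c_host.endswith("." + s_host)
    raise ValueError(f"unknown scope: {scope}")


seed = "https://example.com/docs"
print(in_scope(seed, "https://example.com/docs/api", "path"))    # True
print(in_scope(seed, "https://blog.example.com/", "domain"))     # False
print(in_scope(seed, "https://blog.example.com/", "subdomains")) # True
```

Thinking through a few representative URLs this way before choosing a scope can help you avoid indexing more pages than you intend.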

To verify the crawl completed, re-add the same seed URL as a new data source. If it shows zero tokens, the original crawl indexed all content and you can delete the duplicate.

For smooth uploads, keep batches under 100 files, each no larger than 2 GB. For larger files or batches, use the DigitalOcean API.

After selecting your data source, click Add selected data source. If needed, you can add more files later.

View your selected data sources and check the Status of each:

  • Ready: the data source is uploaded and ready for indexing.

  • Error: the upload or processing failed. Remove the data source and try again. If it fails again, contact support.

  • Uploading: the data source is still uploading and not ready for indexing.

    To avoid delays, upload fewer than 100 files at a time, each under 2 GB. For larger uploads, use the DigitalOcean API. If uploads continue to stall, contact support.
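
If you prepare uploads with a script, the batch guidance can be applied mechanically. This is a minimal sketch under stated assumptions: `plan_batches` is a hypothetical helper, "2 GB" is read conservatively as 2 × 1000³ bytes, and files at or above that size are set aside rather than uploaded.

```python
MAX_FILES_PER_BATCH = 99        # "fewer than 100 files at a time"
MAX_FILE_BYTES = 2 * 1000**3    # assumption: 2 GB read as decimal gigabytes


def plan_batches(files):
    """Split (name, size_in_bytes) pairs into upload batches.

    Returns (batches, too_large), where too_large lists the files that
    should go through the API instead of the control panel.
    """
    small = [name for name, size in files if size < MAX_FILE_BYTES]
    too_large = [name for name, size in files if size >= MAX_FILE_BYTES]
    batches = [
        small[i:i + MAX_FILES_PER_BATCH]
        for i in range(0, len(small), MAX_FILES_PER_BATCH)
    ]
    return batches, too_large


files = [(f"doc-{i}.pdf", 1_000_000) for i in range(250)]
files.append(("huge.pdf", 3 * 1000**3))
batches, too_large = plan_batches(files)
print([len(batch) for batch in batches])  # [99, 99, 52]
print(too_large)                          # ['huge.pdf']
```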

Knowledge bases require a new or existing OpenSearch database to store the vector embeddings created from your data. Below the list, Estimated Size shows the total size of all uploaded data. Use this value to estimate the final embedding size and allocate at least twice that amount to ensure your database is properly sized to store embeddings. This may affect costs based on OpenSearch pricing.
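
The sizing rule is simple arithmetic, sketched below. The factor of two comes from the guidance above; the function names and the list of storage options are hypothetical, so substitute the sizes actually offered for OpenSearch clusters.

```python
def required_storage_gib(estimated_size_gib: float, headroom: float = 2.0) -> float:
    """Storage to allocate: at least twice the Estimated Size shown in the panel."""
    return estimated_size_gib * headroom


def smallest_fitting_option(required_gib: float, options_gib):
    """Return the smallest storage option that holds the data, or None if none does."""
    fitting = [option for option in sorted(options_gib) if option >= required_gib]
    return fitting[0] if fitting else None


needed = required_storage_gib(12)                     # 12 GiB of source data
print(needed)                                         # 24.0
# The option sizes below are made up for illustration.
print(smallest_fitting_option(needed, [20, 40, 80]))  # 40
```

If you reuse an existing database, remember that data already stored in it reduces the space available for new embeddings.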

Choose Your OpenSearch Database

In the Where should your knowledge base live? section, under the OpenSearch database options sub-section, select either Use existing to connect to an existing OpenSearch database or Create new to provision a new one.

Use Existing OpenSearch Database

If you choose Use existing, under the Select an OpenSearch database section, click the dropdown menu, then select the database you want to use. If it already contains data, it may limit how much new data you can index. You only pay for successfully indexed data.

Create a New OpenSearch Database

If you choose Create new, under the Choose a datacenter region section, select the default datacenter region for your knowledge base, or click Additional datacenter regions to choose a different one. New databases are automatically sized to the smallest option that fits your data. We recommend allocating about twice the size of your original dataset to efficiently store embeddings. Most GenAI Platform infrastructure is in TOR1, so choosing a different region may increase latency between your agents and knowledge bases.

We charge your account monthly for the database. For more information on our database pricing, see our OpenSearch pricing page.

Choose Your Embedding Model

An embedding model converts your data into vector embeddings, which are stored in your OpenSearch database. In the How much will I pay? section, click the Embeddings model dropdown menu, then select a model. You can’t change the model after creating your knowledge base. We offer multiple embedding models for different use cases, and indexing costs depend on the selected model and the size of your data.

The pricing table estimates token counts and indexing costs based on your dataset size and the model’s token rate. Each row shows the Dataset Size, the approximate Token Count, and the estimated Indexing Cost. Larger datasets generate more tokens, which increases the indexing cost. The cost scales linearly with both the model’s token rate and the size of your data, and you only pay for successfully indexed data. Final costs may vary. For more details, see our embedding model pricing.
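
The linear relationship can be sketched as a short calculation. Everything numeric here is an assumption for illustration: the bytes-per-token ratio, the price, and the idea that the token rate is quoted per million tokens are not taken from the pricing page, and `estimate_indexing_cost` is a hypothetical helper.

```python
def estimate_indexing_cost(dataset_bytes, bytes_per_token, usd_per_million_tokens):
    """Return (approximate token count, estimated cost in USD).

    Both inputs other than the dataset size are caller-supplied assumptions;
    the real token count depends on the embedding model's tokenizer.
    """
    tokens = dataset_bytes / bytes_per_token
    cost = tokens / 1_000_000 * usd_per_million_tokens
    return round(tokens), cost


# Made-up inputs: 400 MB of text, 4 bytes per token, $0.01 per million tokens.
tokens, cost = estimate_indexing_cost(400_000_000, 4.0, 0.01)
print(f"{tokens:,} tokens, about ${cost:.2f}")
```

Doubling either the dataset size or the token rate doubles the estimate, which is what linear scaling means in practice.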

Finalize Details

In the Final Details section, under the Select a project sub-section, choose the project where you want the knowledge base to live. You can use the default project or select another, and attach the knowledge base to agents in any project.

Under the Tags sub-section, add tags to help organize and filter your knowledge base. Tags can include letters, numbers, colons, dashes, and underscores. Choose a tag name, then press ENTER or SPACEBAR to add it. Use the arrow keys to navigate and the BACKSPACE key to remove tags.
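
When tags come from a script or a spreadsheet, a quick check against the allowed characters can catch problems early. This is a minimal sketch, assuming ASCII letters and no length limit beyond "at least one character"; `invalid_tags` is a hypothetical helper.

```python
import re

# Letters, numbers, colons, dashes, and underscores.
# Assumption: ASCII letters only, and at least one character.
TAG_PATTERN = re.compile(r"[A-Za-z0-9:_-]+")


def invalid_tags(tags):
    """Return the tags that contain characters outside the allowed set."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


print(invalid_tags(["env:prod", "team_docs", "kb-1", "has space"]))  # ['has space']
```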

After adding your knowledge base to a project and providing your tags, click Create Knowledge Base.

Provisioning Your Knowledge Base

After creation, your knowledge base appears under the GenAI Platform’s Knowledge Bases tab and begins indexing its data sources.

To track indexing progress, go to the Knowledge Bases tab, find your knowledge base, then check the last indexing time. Click the knowledge base to view detailed progress, including updates for each data source, tokens indexed, and any sources still processing. The list updates automatically, and agents begin using the updated embeddings as soon as they become available.

Provisioning typically takes five minutes or longer while the system processes, embeds, and stores your data. After indexing completes, go to the knowledge base’s Overview tab, where the EMBEDDINGS DETAILS section summarizes the indexing results, including final costs.

If indexing takes longer than expected, click Stop job to cancel it, then Re-run job to restart it. If issues persist, contact support.

After the knowledge base is created, you can add more data sources to it, attach it to an existing agent, or include it during agent creation. You can also edit its name, project, and tags under the knowledge base’s Settings tab if needed.

Create a Knowledge Base Using the API

To create a knowledge base using the DigitalOcean API, provide a name, an embedding model, a data source, a project ID, and a datacenter region. You can also specify the ID of an existing OpenSearch database. If you don’t, a new one is created and automatically sized to about twice the size of your data to accommodate embeddings.

To list available embedding models and their IDs, call the /v2/gen-ai/models endpoint with the usecases query parameter. After creation, your data sources are indexed. For details, see Index Data Using the API.
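
As a sketch of the model lookup, the following builds the request with Python's standard library without sending it. The endpoint and the `usecases` parameter come from the text above; the value `MODEL_USECASE_KNOWLEDGEBASE` is an assumption about what the parameter accepts, so confirm it against the API reference. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.digitalocean.com"


def build_list_models_request(token: str, usecase: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /v2/gen-ai/models."""
    query = urllib.parse.urlencode({"usecases": usecase})
    return urllib.request.Request(
        f"{API_BASE}/v2/gen-ai/models?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Assumed use-case value; check the API reference for the accepted values.
request = build_list_models_request("example-token", "MODEL_USECASE_KNOWLEDGEBASE")
print(request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```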

How to Create a Knowledge Base Using the DigitalOcean API
  1. Create a personal access token and save it for use with the API.
  2. Send a POST request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases.

Using cURL:

curl -X POST \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases" \
  -d '{
    "name": "kb-api-create",
    "embedding_model_uuid": "05700391-7aa8-11ef-bf8f-4e013e2ddde4",
    "project_id": "37455431-84bd-4fa2-94cf-e8486f8f8c5e",
    "tags": [
      "tag1"
    ],
    "database_id": "abf1055a-745d-4c24-a1db-1959ea819264",
    "datasources": [
      {
        "bucket_name": "test-public-gen-ai",
        "bucket_region": "tor1"
      }
    ],
    "region": "tor1",
    "vpc_uuid": "f7176e0b-8c5e-4e32-948e-79327e56225a"
  }'
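
The same request can be assembled in Python. This is a sketch rather than an official client: the field names mirror the cURL body above, `build_create_kb_request` is a hypothetical helper, and `database_id` is left out of the payload when you want a new database created. The request is built but not sent.

```python
import json
import urllib.request

KB_ENDPOINT = "https://api.digitalocean.com/v2/gen-ai/knowledge_bases"


def build_create_kb_request(token, name, embedding_model_uuid, project_id,
                            region, datasources, database_id=None, tags=None):
    """Build (but do not send) the POST request that creates a knowledge base."""
    payload = {
        "name": name,
        "embedding_model_uuid": embedding_model_uuid,
        "project_id": project_id,
        "region": region,
        "datasources": datasources,
    }
    # Omit database_id to have a new OpenSearch database provisioned.
    if database_id is not None:
        payload["database_id"] = database_id
    if tags:
        payload["tags"] = list(tags)
    return urllib.request.Request(
        KB_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_create_kb_request(
    token="example-token",
    name="kb-api-create",
    embedding_model_uuid="05700391-7aa8-11ef-bf8f-4e013e2ddde4",
    project_id="37455431-84bd-4fa2-94cf-e8486f8f8c5e",
    region="tor1",
    datasources=[{"bucket_name": "test-public-gen-ai", "bucket_region": "tor1"}],
    tags=["tag1"],
)
print(json.dumps(json.loads(request.data), indent=2))
# To send it, pass the request to urllib.request.urlopen(request).
```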

After creating a knowledge base, you can list all available knowledge bases, view details, or update the knowledge base.

Edit an Existing Knowledge Base

You can edit an existing knowledge base to change its name, project, or tags, and view details like the model in use, attached agents, and the OpenSearch database storing its data.

Note
You currently cannot edit attributes of your knowledge base using the DigitalOcean API.

To make changes from the DigitalOcean Control Panel, in the left-hand menu, click GenAI Platform, click the Knowledge Bases tab, select the knowledge base you want to edit, then open its Settings tab. In the Settings section, click Edit next to the section you want to update, then click Submit to apply your changes.

You can edit the following attributes:

  • Knowledge base info: change the knowledge base name or select a different project.
  • Tags: add or remove tags.
  • Destroy: destroy the knowledge base.

You can view but not edit the following sections:

  • Embeddings Model: shows the model in use and the token rate for indexing events.
  • Associated agents: lists the agents using the knowledge base. You can attach it to any agent as needed, or leave it unattached.
  • OpenSearch DB: shows the database in use and its region. To manage databases, see our OpenSearch documentation.

Destroy a Knowledge Base Using the Control Panel

If a knowledge base is no longer needed, you can permanently delete it along with its embeddings and automated backups. This process is irreversible, triggers redeployment of any agents using it, and may affect their performance. Destroying a knowledge base does not delete the associated OpenSearch database, but you can delete the database separately if needed.

To delete a knowledge base from the DigitalOcean Control Panel, in the left-hand menu, click GenAI Platform, click the Knowledge Bases tab, find the knowledge base you want to destroy, open the menu to its right, then select Destroy.

In the confirmation window, type the knowledge base name to confirm deletion, then click Destroy to complete the deletion.

Destroy a Knowledge Base Using the API

To destroy a knowledge base using the DigitalOcean API, provide its unique identifier. You can retrieve available knowledge bases and their IDs using the /v2/gen-ai/knowledge_bases endpoint.
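
Once you have the JSON returned by the list endpoint, picking out an identifier by name is a small filtering step. The sketch below assumes the response contains a `knowledge_bases` array whose entries have `uuid` and `name` fields; that shape is an assumption, the sample data is made up, and `find_kb_uuids` is a hypothetical helper.

```python
def find_kb_uuids(response_body: dict, name: str) -> list:
    """Return the UUIDs of all knowledge bases whose name matches exactly."""
    return [
        kb["uuid"]
        for kb in response_body.get("knowledge_bases", [])
        if kb.get("name") == name
    ]


# Made-up sample in the assumed response shape.
sample = {
    "knowledge_bases": [
        {"uuid": "8241f44e-b0da-11ef-bf8f-4e013e2ddde4", "name": "kb-api-create"},
        {"uuid": "00000000-0000-0000-0000-000000000000", "name": "another-kb"},
    ]
}
print(find_kb_uuids(sample, "kb-api-create"))
```

Returning a list rather than a single value makes it obvious when a name matches more than one knowledge base, which matters before an irreversible delete.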

How to Destroy a Knowledge Base Using the DigitalOcean API
  1. Create a personal access token and save it for use with the API.
  2. Send a DELETE request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases/{uuid}.
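
Because deletion is irreversible, it can be worth validating the identifier before sending anything. The following sketch builds the DELETE request with Python's standard library and rejects strings that are not well-formed UUIDs; `build_delete_kb_request` is a hypothetical helper, and the request is built but not sent.

```python
import urllib.request
import uuid

KB_ENDPOINT = "https://api.digitalocean.com/v2/gen-ai/knowledge_bases"


def build_delete_kb_request(token: str, kb_uuid: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one knowledge base."""
    # uuid.UUID raises ValueError for malformed input, so a typo fails here
    # instead of producing a request for the wrong path.
    canonical = str(uuid.UUID(kb_uuid))
    return urllib.request.Request(
        f"{KB_ENDPOINT}/{canonical}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


request = build_delete_kb_request(
    "example-token", "8241f44e-b0da-11ef-bf8f-4e013e2ddde4"
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```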

Using cURL:

curl -X DELETE \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases/8241f44e-b0da-11ef-bf8f-4e013e2ddde4"
