Give Feedback

How to Create, Edit, and Destroy Knowledge Bases on DigitalOcean Gradient™ AI Platform

Validated on 28 Apr 2025 • Last edited on 14 Jan 2026

DigitalOcean Gradient™ AI Platform lets you build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.

A knowledge base stores private data sources such as unstructured files, Spaces folders, or web pages to supplement an agent’s training data and improve response accuracy. Using retrieval-augmented generation (RAG), agents can search and reference external data to deliver more accurate, up-to-date, and domain-specific answers.

Knowledge bases use an embedding model to convert your text into vector embeddings. The selected embedding model determines the maximum chunk sizes, hierarchical parent and child ranges, and token limits. It also affects whether size estimates are available for a data source.

Size estimates are based on the total data size, not the amount of extractable or indexable text. Estimates are available only for local uploads and Spaces buckets because their storage size is known. For all other data sources, the size is shown as “estimate unavailable” because we cannot determine the total data size before ingestion. Even for sources with estimates, the final indexable text may differ significantly depending on file structure, parsing behavior, and non-text content.

For chunking details and strategy guidance, see the Chunking Strategy Best Practices.

When you create a knowledge base, we automatically index your data by transforming it into vector embeddings, numerical representations that capture the meaning of the text and help agents efficiently find relevant information. These embeddings are stored in a Managed OpenSearch database, which appears in your Databases list and you can scale to increase its performance.

Knowledge bases support the following data sources:

Direct file uploads from your local machine.
DigitalOcean Spaces buckets or specific folders.
Public websites crawled at a URL you specify. You can crawl a seed URL or a sitemap URL.
Dropbox folders.
Amazon S3 buckets.

Each knowledge base requires at least one data source. You can add more or remove data sources after creation.

Create a Knowledge Base Using the Control Panel

To create a knowledge base from the DigitalOcean Control Panel, in the left menu, click Agent Platform. Then, in the Knowledge Bases tab, click Create Knowledge Base.

Choose Your Embedding Model

In the Create a knowledge base page, under the Add data tab, in the Choose your embedding model page, choose an embedding model.

An embedding model converts your data into vector embeddings which are stored in your OpenSearch database. From the Embedding model dropdown list, select a model. You can’t change the model after creating your knowledge base. We offer multiple embedding models for different use cases, and indexing costs depend on the selected model and the size of your data.

To understand how much you pay for the indexing job, click How much will I pay for an indexing job, which shows the pricing table estimates token counts and indexing costs based on your dataset size and the model’s token rate. Each row shows the Dataset size, the approximate Token Count, and the estimated Indexing Cost. Larger datasets generate more tokens, which increases the indexing cost. Pricing scales linearly with both model and data size, and you only pay for successfully indexed data. Final costs may vary. For more details, see embedding model pricing.

Add Data Sources

In the Add data sources section, click the data source you want to add to open its data source upload window.

You can add multiple types of data sources to a knowledge base and include as many as needed. To save processing time and cost, organize your files in dedicated Spaces buckets, specific folders, or local storage containing only relevant files.

Knowledge bases support the following text-based file formats: .csv, .eml, .epub, .xls, .xlsx, .html, .md, .odt, .pdf, .txt, .rst, .rtf, .tsv, .doc, .docx, .xml, .json, and .jsonl.

You can add any of the following data sources:

File Upload

To add files to update, click Upload a file, and then choose at least one file you want to upload. On the right of the file, you can click the trash can icon to remove the file, or on the bottom right, click Upload more files if you want to upload more files.

For performance and reliability, we recommend uploading files no larger than 2 GB and uploading fewer than 100 files at a time.

Spaces Bucket or Folder

To add a Spaces bucket or folder, click Spaces bucket or folder, and then choose at least one bucket or folder you want to index. On the left of the bucket, you can click + to expand their contents and select specific folders to limit the indexed content.

The system indexes all supported file formats in selected buckets and folders, regardless of privacy settings. For optimal performance and indexing quality, we recommend using five or fewer buckets and uploading only indexing data to your buckets.

Web or Sitemap URL

When you specify a website URL as a data source for your knowledge base, DigitalOcean uses a custom agent named DigitalOceanGradientAICrawler/1.0 to index the website content. The crawler indexes up to 5,500 pages and skips inaccessible or disallowed links to prevent excessively large indexing jobs.

Depending on the behavior you select, the crawler follows HTML links on the site, indexes text and certain image types, and ignores videos and navigation links. It respects the website’s robots.txt rules, including any Disallow directives or the wildcard *.

To add a URL for web crawling, select Add a web or sitemap URL. You can then choose to specify a Seed URL or a Sitemap URL.

Specify Seed URL

This option crawls only seed URLs and linked pages within the same path, domain, or subdomains. To specify a seed URL, select the Seed URL option. Then, in the Seed URL field, enter the public URL you want to crawl. The crawler indexes pages that are reachable from links you provide in this URL and indexes up to 5,500 pages.

Under the Crawling Rules section, define the crawl scope:

Scoped crawls only the seed URL.
URL and all linked pages in path crawls the seed URL and all pages within the same path.
URL and all linked pages in domain crawls all pages in the same domain.
Subdomains crawls the domain and all its subdomains.

Select the Index embedded media option to index supported images and other media encountered during the crawl. To include each page’s header and footer content, such as links in them, select the Include headers and footers navigation links option. We attempt to index supported embedded media types, such as images and SVGs and may increase indexing token count significantly.

Specify Sitemap URL

This option crawls only URLs listed in the sitemap. To crawl other URLs, use the Seed URL option or add another web crawling data source.

The sitemap URL must be in .xml format where you can identify a specific list of URLs to crawl. You can use this option to add scoped URLs all at once instead of adding them individually or choosing a crawling rule for a seed URL.

To specify a sitemap URL, select the sitemap URL option. Then, in the Sitemap URL field, type the URL you want to crawl. For example, docs.digitalocean.com/sitemap.xml.

Select the Index embedded media option to index supported images and other media encountered during the crawl. To include each page’s header and footer content, such as links they contain, select the Include headers and footers navigation links option. We attempt to index supported embedded media types, like images and SVGs. These media types may increase indexing token count significantly.

To verify the crawl completed, re-add the same seed or sitemap URL as a new data source. If it shows zero tokens, the original crawl indexed all content, and you can delete the duplicate.

Dropbox Folder

If you haven’t connected your Dropbox account, on the right of the Dropbox option, click Connect account to first log in to your Dropbox account and authorize the connection.

To add a Dropbox folder, click Pull from a Dropbox folder, and then choose at least one folder you want to index. On the left of the folder, click the + to expand their contents and select specific folders to limit the indexed content.

Amazon S3 Bucket or Folder

To add an Amazon S3 bucket or folder, select Amazon S3 bucket or folder, and then provide the following credentials in the fields provided:

Access Key ID, the IAM access key ID for your S3 bucket or folder.
Secret Key, the secret key associated with your access key ID.
Bucket or folder, the name of the S3 bucket or folder you want to index.
Region, the AWS region where your S3 bucket or folder is located, such as us-east-1 or eu-west-1.

On the right of the S3 bucket or folder, click the + button to add the S3 bucket or folder as a data source, and then below your recently added S3 bucket or folder, you can fill out the Bucket or folder and Region fields to add another S3 bucket or file.

If you want to control how the data source is split into chunks during indexing, click Advanced Options to configure its chunking strategy. To use chunking, enable it through Feature Preview. By default, all data sources use section-based chunking. For more information about chunking strategies, see Chunking Strategy Best Practices.

Click Add selected data source to add the seed URL as a data source.

Specify Chunking Strategy public

Chunking determines how your content is split before embedding and indexing. Chunking applies to each data source independently. To use different strategies for different files or URLs, add them as separate data sources.

Data sources are chunked with a section-based strategy by default. Settings are stored per data source and mixed strategies are allowed within the same knowledge base. Changing chunking settings requires re-indexing, which consumes additional tokens.

Chunking options depend on the selected embedding model. All chunk sizes must remain within the model’s token window. Minimum chunk size is approximately 100 tokens. For detailed strategy guidance, defaults, costs, and use cases, see the Chunking Strategy reference page.

Under the Select a chunking strategy for this data source section, chucking strategies determine how the content from this data source is sectioned and added to your knowledge base. Refer to our chunking reference for configuration recommendations. You can choose either:

Section-based chunking (default): Split content using structural elements such as headings, paragraphs, tables, and lists. This strategy is fast and low-cost. It merges or divides adjacent sections to meet your Maximum chunk size, which must stay within the embedding model’s limits.

Under the Maximum chunk size section, use the slider to select the maximum number of tokens in a chunk.
Semantic chunking: Groups text by meaning using embeddings during chunking. This strategy is slower and higher cost because it uses embeddings for both chunk detection and final embedding. Adjust Similarity threshold and Maximum chunk size.

Under the Similarity threshold section, click the up and down arrows to adjust. This determines how similar a sentence must be to be grouped with another. Lower values create learger sentence groups.

Under the Maximum chunk size section, use the slider to select the maximum number of tokens in a chunk. Maximum depends on the chosen embeddings model.
Hierarchical chunking: Generates two chunk levels: parent chunks (large context blocks) and child chunks (retrieval units). Retrieval returns the child chunk first, and then automatically receives the parent chunk as added context for better grounding. Configure Parent chunk size and Child chunk size, but the child must be smaller than parent.

Under the Maximum parent chunk size section, use the slider to select the maximum number of tokens in the parent chunk.

Under the Maximum child chunk size section, use the slider to select the maximum number of tokens in the child chunk. The maximum number of tokens in a child chunk. This is the primary returned context and determines which parent chunk is also returned.
Fixed-length chunking: Splits text strictly by token count, but it ignores formatting and structure. Best for logs, telemetry, OCR, and other unstructured data. Configure a Maximum chunk size within model limits.

Under the Maximum chunk size section, use the slider to select the maximum number of tokens in a chunk. Maximum depends on the chosen embeddings model.

After selecting your data source, click Add selected data source. You can upload additional files later if needed.

On the top of the creation page, you can view your added data sources and their statuses:

Ready, the data source is uploaded and ready for indexing.

We can only estimate data sizes for sources with known values, such as Spaces buckets and uploaded files. If your data sources do not contain these, or contain other inestimate sources, you get a size after the initial indexing job for all data sources completes.
Error, the upload or processing failed. Remove the data source and try again. If it fails again, contact support.
Uploading, the data source is still uploading and not ready for indexing.

To avoid delays, upload fewer than 100 files at a time, each under 2 GB. For larger uploads, use the DigitalOcean API. If uploads continue to stall, contact support.

Knowledge bases require a new or existing OpenSearch database to store the vector embeddings created from your data. Below the list, Estimated Size shows the total size of all uploaded data. Use this value to estimate the final embedding size and allocate at least twice that amount to ensure your database is properly sized to store embeddings. This may affect costs based on OpenSearch pricing.

Afterwards, click Next step: Configure database.

Choose Knowledge base name

In the Configure database section, either keep the autogenerated name or choose a unique name using 3 to 63 characters, including only letters, numbers, dashes, and periods.

Choose Your OpenSearch Database

In the Where should your knowledge base live? section, under the OpenSearch database options sub-section, select either Use existing to connect to an existing OpenSearch database or Create new to provision a new one.

OpenSearch database size depends on the size fo the data indexed. Anticipate doubling your datset size to represent the size of the storage your database needs.

Use Existing OpenSearch Database

If you choose Use existing, under the Select an OpenSearch database section, click the dropdown menu, then select the database you want to use. If it already contains data, it may limit how much new data you can index. You only pay for successfully indexed data.

Create a New OpenSearch Database

If you choose Create new, under the Choose a datacenter region section, select the default datacenter region for your knowledge base, or click the dropdown menu on the right of the datacenter region below the default, to choose a different one. We recommend choosing the same region as your Gradient AI Platform agents to reduce latency. Most of the Agent Platform infrastructure is in the TOR1 region. Creating an OpenSearch database in a different region may increase latency between your agents and your knowledge base.

New databases are automatically sized to the smallest option that fits your data. We recommend allocating about twice the size of your original dataset to efficiently store embeddings.

Under the VPC Network section, choose the VPC network you want to use.

Afterwards, click Next step: Review and create.

Finalize Details

In the Review tab, under the Final Details page, under the Select a project sub-section, choose the project where you want the knowledge base to live. You can use the default project or select another, and attach the knowledge base to agents in any project.

Under the Tags sub-section, add tags to help organize and filter your knowledge base. Tags can include letters, numbers, colons, dashes, and underscores. Choose a tag name, then press ENTER or SPACEBAR to add it. Use the arrow keys to navigate and the BACKSPACE key to remove tags.

Under Review section, see your chosen embeddings model with its token cost, the data sources you chose with its estimated size, and your chosen or newly created OpenSearch database with its specs and price. If you choose an existing database, there are no new charges to use the database.

After adding your knowledge base to a project and providing your tags and reviewing, click Create knowledge base (or Create Knowledge Base and database).

Provisioning Your Knowledge Base

After creation, your knowledge base appears under Gradient AI Platform’s Knowledge Bases tab.

Provisioning typically takes five minutes or longer while the system processes, embeds, and stores your data. After indexing completes, go to the knowledge base’s Overview tab, and then under the Embeddings Details section is a summary of the indexing results, including final costs.

You can also enable auto-indexing from the knowledge base’s Data sources tab by clicking Schedule Indexing. Auto-indexing keeps your knowledge base up to date automatically without manual re-ingest.

If indexing takes longer than expected, let the job continue running until it either completes or fails. If it fails, check the Activity tab for detailed logs to understand what went wrong (for example, failed or skipped files). After reviewing the logs and fixing any issues, you can click Re-run job to restart indexing. If problems persist, contact support.

Test and Update Chunking Strategy public

Chunking quality varies based on document structure and formatting. To improve retrieval accuracy, test your agent with Agent Evaluations, adjust your chunking configuration, and re-index the data source. Each re-indexing job consumes tokens.

High-cost strategies such as semantic and hierarchical chunking can increase token usage significantly. Section-based and fixed-length chunking offer the most predictable cost. For details on indexing and retrieval pricing, see the pricing page.

If you want to change the chunking strategy of an existing data source, you must remove the data source and re-add it with the new chunking settings. Updating a data source’s chunking configuration automatically triggers re-indexing.

View Indexing Job Logs

When you create or update a knowledge base, the system indexes its data sources. You can track this activity in the Activity tab of your knowledge base. To access your knowledge base’s activity logs, go to the control panel and click Agent Platform in the left menu. Next, click the Knowledge bases tab, and then select the knowledge base you want to see activity logs for.

On the knowledge base’s overview page, click the Activity tab to view its logs. The Activity tab lists the 15 most recent indexing jobs. Since only 15 are stored, we recommend downloading the CSV for each run you perform.

While a job is in progress, the Activity tab shows real-time updates. When it finishes, the status helps you interpret the results:

Completed: All files indexed successfully.
No Changes: No updates were detected in the data sources and no files or URLs were re-indexed.
Partially Completed: Some files were indexed, but others were skipped or failed. This may create gaps in knowledge base results.
Failed: No files were indexed, usually due to a system or configuration issue (for example, formatting problems or unsupported characters).

We only estimate data sizes for sources with known values, such as Spaces buckets and uploaded files. For all other data sources, estimates show “estimate unavailable”. Actual indexable text may differ from file size due to binary content, formatting, or parsing behavior.

Each log entry shows:

The overall status (see above).
How many files were processed, skipped, or failed (for example, “Indexed 0 of 2 files/URLs (0%)”).
Token usage and charges for the run.
A timestamp showing when the job ran.
A per–data source summary (source name and type, number of files scanned, skipped, or failed).

You can then download a CSV file by clicking Download Details for debugging or auditing. File-level details, such as filenames and error reasons, are only available in the downloadable CSV. Files with unchanged content are skipped to avoid extra charges. If another indexing job is already running, the new job is skipped since only one job can run at a time.

Logs also include the token count with the indexing rate, for example 0 tokens @ $0.09/1M. You only pay for successfully indexed data, and prices are rounded down to two decimal places. For more details, see knowledge base pricing.

If some files fail, download the CSV for that run to see the specific errors. Fix the issue with your data, re-upload the file, and re-run the job. If problems continue, contact support.

Create a Knowledge Base Using the API

To create a knowledge base using the DigitalOcean API, provide a name, an embedding model, a project ID, and a datacenter region. You can also specify the ID of an existing OpenSearch database or a chunking strategy. If you do not provide a database, DigitalOcean creates and sizes one automatically. If you do not specify a chunking strategy, the knowledge base uses section-based chunking by default.

You can define chunking when creating or updating a knowledge base or its individual data sources using the following optional fields:

chunking_algorithm: The chunking strategy (section, semantic, hierarchical, or fixed).
chunking_options: A configuration object containing parameters such as max_chunk_size, semantic_threshold, parent_chunk_size, or child_chunk_size.

Chunking is applied per data source. Updating chunking settings triggers re-indexing, which consumes tokens.

To list available embedding models and their IDs, call the /v2/gen-ai/models endpoint with the usecases query parameter.

How to Create a Knowledge Base Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a POST request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases.

cURL

Using cURL:

curl -X POST \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases" \
  -d '{
    "name": "kb-api-create",
    "embedding_model_uuid": "05700391-7aa8-11ef-bf8f-4e013e2ddde4",
    "project_id": "37455431-84bd-4fa2-94cf-e8486f8f8c5e",
    "tags": [
      "tag1"
    ],
    "database_id": "abf1055a-745d-4c24-a1db-1959ea819264",
    "datasources": [
      {
          "spaces_data_source": {
              "bucket_name": "test-public-gen-ai",
              "region": "tor1"
            },
            "chunking_algorithm": "CHUNKING_ALGORITHM_HIERARCHICAL",
            "chunking_options": {
              "parent_chunk_size": 1000,
              "child_chunk_size": 350
            }
      },
      {
        "web_crawler_data_source": {
          "base_url": "https://example.com",
          "crawling_option": "SCOPED",
          "embed_media": false,
          "exclude_tags": ["nav","footer","header","aside","script","style","form","iframe", "noscript"]
        },
        "chunking_algorithm": "CHUNKING_ALGORITHM_SEMANTIC",
        "chunking_options": {
          "max_chunk_size": 500,
          "semantic_threshold": 0.6
        }
      },
      {
        "spaces_data_source": {
            "bucket_name": "test-public-gen-ai-2",
            "region": "tor1"
          },
          "chunking_algorithm": "CHUNKING_ALGORITHM_FIXED_LENGTH",
          "chunking_options": {
            "max_chunk_size": 400
          }
      },
    ],
    "region": "tor1",
    "vpc_uuid": "f7176e0b-8c5e-4e32-948e-79327e56225a"
  }'

After creation, indexing begins automatically.

You can list all knowledge bases, view a knowledge base, or update one.

To add a data source with chunking, use the Data Sources endpoint. To retrieve metadata for embedding models, use the List Models endpoint.

Retrieve Data from a Knowledge Base Using the Knowledge Base API public

You can query a knowledge base to retrieve the most relevant chunks, along with metadata, scores, and any hierarchical context. To use chunking, enable it through the Feature Preview. The knowledge base retrieve API is available at https://kbaas.do-ai.run and has the following endpoint:

Endpoint	Verb	Description
`/v1/<knowledge-base-uuid>/retrieve`	POST	Retrieves the most relevant chunks from a knowledge base using hybrid retrieval (and optional metadata filters).

Retrieval methods determine how indexed chunks are searched and ranked at query time. They are distinct from chunking strategies, which control how documents are split before indexing. The retrieve API uses hybrid retrieval, which combines lexical retrieval (based on keyword matches) and semantic retrieval (based on meaning using embeddings).

You can also call this retrieve endpoint using the Gradient SDK.

To retrieve relevant chunks from a knowledge base, send a POST request to /v1/<knowledge-base-uuid>/retrieve using your DigitalOcean API token. Requests to the retrieve API require a DigitalOcean API token created from the Settings page with the GenAI:read scope enabled.

You can include the following fields in the request body:

query: Specifies the search query string.
num_results: Defines the number between 0 and 100 of results to return.
alpha: Controls the balance between lexical and semantic retrieval in hybrid search. Values range from 0 (lexical only) to 1 (semantic only). Lower values favor exact keyword matches for structured or technical queries, while higher values favor semantic similarity for conversational or exploratory queries. Mid-range values balance precision and recall and are a good default.
filters: Optionally applies filter rules to chunk metadata.

The following example shows a hybrid retrieval request that does not apply any filters:

curl --location 'https://kbaas.do-ai.run/v1/<knowledge-base-uuid>/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $DO_API_TOKEN' \
--data '{
    "query": "How do I build an agent on DigitalOcean?",
    "num_results": 5,
    "alpha": 0.5 
}'

Filters let you narrow results by comparing a metadata field to a specific value. The following filter operations are supported:

equals: Matches records where the field is exactly the provided value.
not_equals: Excludes records that match the provided value.
greater_than: Matches values strictly higher than the input.
greater_than_or_equals: Matches values higher than or equal to the input.
less_than: Matches values strictly lower than the input.
less_than_or_equals: Matches values lower than or equal to the input.
starts_with: Matches text fields that begin with the specified characters.

You can combine multiple filter rules using logical groups such as and_all (match all conditions) and or_all (match any condition), and nest these groups to build more complex search logic.

To ensure your request is processed correctly, the key must reference a valid metadata field present in the dataset, and a value must be provided that matches the field’s data type (string, number, or boolean). Use comparison operators that are compatible with the field type; for example, avoid greater_than with string fields unless the field represents a supported date or version format.

This example filters results to chunks that match a specific documentation path or that were ingested on or after a specified date.

curl --location 'https://kbaas.do-ai.run/v1/<knowledge-base-uuid>/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $DO_API_TOKEN' \
--data '{
    "query": "How do I build an agent on DigitalOcean",
    "num_results": 5,
    "filters": {
        "or_all": [
            {
                "starts_with": {
                    "key": "item_name",
                    "value": "https://docs.digitalocean.com/products/gradient-ai-platform/"
                },
                "greater_than_or_equals":{
                    "key":"ingested_timestamp",
                    "value": "2025-12-01"
                }
            }
        ]
    },
    "alpha": 0.5 
}'

The following example shows a sample response returned by the knowledge base API:

{
    "results": [
        {
            "metadata": {
                "chunk_category": "CompositeElement",
                "ingested_timestamp": "2025-12-15T15:23:19.191428+00:00",
                "item_name": "https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-agents/",
            },
            "text_content": "Chunk 1 Content"
        },
        {
            "metadata": {
                "chunk_category": "CompositeElement",
                "ingested_timestamp": "2025-12-15T15:23:19.191428+00:00",
                "item_name": "https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/",
            },
            "text_content": "Chunk2 content"
        },
        .......
    ],
    "total_results": 5
}

Manage Data Sources Using the Control Panel

You can add, remove, reindex, or enable auto-indexing of existing knowledge base data sources as needed.

To add, remove, or reindex a data source using the DigitalOcean Control Panel, in the left-hand menu, click Agent Platform, click the Knowledge Bases tab, find and then select the knowledge base you want to update, then click the Data sources tab.

Add Data Sources

To add a data source from the Data sources tab, click Add source.

On the Add Data Source page, click Select data source and then select a data source from the dropdown menu. For detailed information about each data source type, see the Select Your Data Sources section of the create workflow.

Click Advanced Options to configure chunking for this data source. Chunking determines how your content is split before embedding and indexing. The chosen strategy applies only to this data source. To use different strategies for different files or URLs, add them as separate data sources. Data sources use a section-based strategy by default. For more detailed information about chunking strategy options, see the Specify Chunking Strategy section.

Click Add data source to add the sitemap URL as a data source.

After adding new data sources, review the estimated price to index the data under the Indexing event summary section and then click Index added source. The data sources are added to the knowledge base and the data is automatically indexed. You can track progress and review results in the Activity tab.

Remove Data Sources

To remove a data source from the Data sources tab, click the … menu beside the data source you want to remove and then click Remove source from the dropdown menu.

In the Remove data source modal, enter the name of the data source to confirm its removal, and then click Destroy to remove it.

After removal, the knowledge base automatically reindexes the remaining data sources. You can track the reindexing process in the Activity tab.

Manually Reindex Data Sources

To manually reindex a data source from the Data sources tab, click the … menu beside the data source you want to reindex and then click Update source from the dropdown menu.

In the confirmation window, click Update source to reindex the data. You are only charged for any new data found during the indexing.You can view the results of the reindexing job in the Activity tab.

Auto-Index Data Sources

You can enable auto-indexing to keep your knowledge base up to date without manual re-ingest. Review indexing details and resolve any data source issues before scheduling to avoid failed jobs or skipped content.

To set up auto-indexing, open the Data sources tab, click Schedule Indexing on the right to open the Create Indexing Schedule window. In this window, under Days, select the days you want indexing to run. Under Trigger Time, set the time of day using the Hrs and Mins dropdowns. Scheduling time is in UTC. Lastly, under Summary, review your schedule, and then click Create Indexing Schedule.

Your schedule appears in the Data sources tab with details such as the current schedule, the next scheduled run, the last indexing job (manual or auto-index), whether the last job completed or failed, and when the schedule was created.

Indexing jobs normally generate costs because they process your data and create embeddings. If no changes are detected in the data sources, the job completes with no changes, and you are not billed.

If a manual job starts while another job is running, it queues until the current job finishes. If an auto-indexing job overlaps with another job (manual or scheduled), it is skipped. You can view results of each run in the Activity tab.

Failed auto-indexing jobs do not cancel the schedule. Failures are logged in the Activity tab. Review the logs to identify the cause of the failure, then either wait for the next scheduled run shown in the auto-indexing table or manually re-run the job.

To manage your indexing schedule, go to the Data sources tab, find your schedule, and then click its … menu. Then, click either Pause Indexing, Edit Schedule, or Destroy. If you pause indexing, you can resume it in the schedule’s settings. To destroy a schedule, click Destroy to open the Remove Scheduled Indexing window, type “delete”, and then click Destroy to confirm.

Add, Remove, Reindex, or Auto-index Data Sources Using the API

You can add, remove, reindex, or auto-index existing knowledge base data sources as needed using the API.

Add a Data Source

To add a data source using the API, provide the knowledge base’s unique identifier and specify the Spaces bucket, folder, file, or URL you want to index. To retrieve knowledge base IDs, use the /v2/gen-ai/knowledge_bases endpoint.

You can optionally configure chunking at the data source level:

chunking_algorithm: The chunking strategy (section, semantic, hierarchical, or fixed).
chunking_options: A configuration object defining parameters such as max_chunk_size, semantic_threshold, parent_chunk_size, or child_chunk_size.

Chunking is applied per data source. Updating these settings triggers a re-indexing job, which consumes tokens.

After adding the data source, start indexing it to make its content available for retrieval.

How to Add a Data Source Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a POST request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources.

cURL

Using cURL:

curl -X POST \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases/20cd8434-6ea1-11f0-bf8f-4e013e2ddde4/data_sources" \
  -d '{
"knowledge_base_uuid": "20cd8434-6ea1-11f0-bf8f-4e013e2ddde4",
"web_crawler_data_source": {
  "base_url": "https://example.com",
  "crawling_option": "SCOPED",
  "embed_media": false,
  "exclude_tags": ["nav","footer","header","aside","script","style","form","iframe", "noscript"]
},
"chunking_algorithm": "CHUNKING_ALGORITHM_SECTION_BASED",
"chunking_options": {
  "max_chunk_size": 500
}
}'

To confirm the data source was added, list the knowledge base’s data sources.

Remove a Data Source

To remove a data source using the API, provide the knowledge base ID and the specific data source ID. This detaches the data source from the knowledge base but does not delete the original source file or URL.

You can find data source IDs by listing the knowledge base’s data sources.

How to Remove a Data Source Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a DELETE request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}.

cURL

Using cURL:

curl -X DELETE \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases/9a6e3975-b0c6-11ef-bf8f-4e013e2ddde4/data_sources/bd2a2db5-b8b0-11ef-bf8f-4e013e2ddde4"

Manually Index a Data Source

To index a data source using the API, create an indexing job with the knowledge base ID and data source ID. Use the Create Indexing Job endpoint to start the process.

How to Start an Indexing Job Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a POST request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}/indexing_jobs.

You can check the job status using the Get Indexing Job endpoint.

After indexing completes, use the Get Knowledge Base endpoint to confirm completion and review the final token count and indexing cost.

If the job takes longer than expected, cancel it using the Cancel Indexing Job endpoint, then restart it. If issues persist, contact support for assistance.

Auto-index Data Sources

To schedule auto-indexing using the API, create an auto-indexing schedule with the knowledge base ID, the time of day you want the job to run (in UTC, using 24-hour format), and the days of the week to schedule the runs, where days are numbered 1 (Monday) through 7 (Sunday). Use the Create Scheduled Indexing Job endpoint.

How to Create an Auto-Indexing Schedule Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a POST request to https://api.digitalocean.com/v2/gen-ai/scheduled-indexing.

cURL

Using cURL:

curl --location --request POST 'https://api.digitalocean.com/v2/gen-ai/scheduled-indexing' \
--header 'Authorization: Bearer $DIGITALOCEAN_TOKEN"' \
--header 'Content-Type: application/json' \
--data '{
    "knowledge_base_uuid": "knowledge_base_uuid",
    "time": "18:00",
    "days": [
        1,
        2,
        3
    ]
}'

After you set up auto-indexing, use the List Scheduled Indexing Jobs for a Knowledge Base endpoint to verify the schedule on your knowledge base. If you no longer need the schedule, use the Delete Scheduled Indexing Job endpoint.

Deleting a schedule does not affect existing knowledge base data or previously completed indexing jobs.

Edit Knowledge Base Settings

You can edit an existing knowledge base to change its name, project, or tags, and view details like its embedding model, attached agents, and the OpenSearch database storing its data.

To make changes from the DigitalOcean Control Panel, on the left-hand menu, click GenAI Platform, click the Knowledge Bases tab, select the knowledge base you want to edit, then open its Settings tab. In the Settings section, click Edit next to the section you want to update, then click Submit to apply your changes.

You can edit the following attributes:

Knowledge base info, change the knowledge base name or select a different project.
Tags, add or remove tags.
Destroy, destroy the knowledge base.

You can view but not edit the following sections:

Embeddings Model shows the model in use and the token rate for indexing events.
Associated agents lists the agents using the knowledge base. You can attach it to any agent as needed or leave it unattached.
OpenSearch databases show the databases in use and its region. To manage databases, see OpenSearch documentation.

After making changes, check the Activity tab to confirm that indexing jobs completed successfully if your edits triggered reindexing.

Destroy a Knowledge Base Using the Control Panel

If you no longer need a knowledge base, you can permanently and irreversibly delete it along with its embeddings and automated backups. Destroying a knowledge base does not delete the associated OpenSearch database, but you can delete the database separately.

Deleting a knowledge base triggers redeployment of any agents using it and may affect their performance.

To delete a knowledge base from the DigitalOcean Control Panel, in the left-hand menu, click GenAI Platform, click the Knowledge Bases tab, find the knowledge base you want to destroy, then on the right of it, click …, then select Destroy.

In the confirmation window, type the knowledge base name to confirm deletion, then click Destroy to complete the deletion.

Once a knowledge base is destroyed, its indexing logs are no longer available. If you need to keep activity history for record-keeping or debugging, download all relevant CSV files from the Activity tab before destroying the knowledge base.

Destroy a Knowledge Base Using the API

To destroy a knowledge base using the DigitalOcean API, provide its unique identifier. You can retrieve available knowledge bases and their IDs using the /v2/gen-ai/knowledge_bases endpoint.

How to Destroy a Knowledge Base Using the DigitalOcean API

Create a personal access token and save it for use with the API.
Send a DELETE request to https://api.digitalocean.com/v2/gen-ai/knowledge_bases/{uuid}.

cURL

Using cURL:

curl -X DELETE \
  -H "Content-Type: application/json"  \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/gen-ai/knowledge_bases/8241f44e-b0da-11ef-bf8f-4e013e2ddde4"

How to Create, Edit, and Destroy Knowledge Bases on DigitalOcean Gradient™ AI Platform

Create a Knowledge Base Using the Control Panel

Choose Your Embedding Model

Add Data Sources

Specify Seed URL

Specify Sitemap URL

Specify Chunking Strategy public

Choose Knowledge base name

Choose Your OpenSearch Database

Finalize Details

Provisioning Your Knowledge Base

Test and Update Chunking Strategy public

View Indexing Job Logs

Create a Knowledge Base Using the API

cURL

Retrieve Data from a Knowledge Base Using the Knowledge Base API public

Manage Data Sources Using the Control Panel

Add Data Sources

Remove Data Sources

Manually Reindex Data Sources

Auto-Index Data Sources

Add, Remove, Reindex, or Auto-index Data Sources Using the API

Add a Data Source

cURL

Remove a Data Source

cURL

Manually Index a Data Source

Auto-index Data Sources

cURL

Edit Knowledge Base Settings

Destroy a Knowledge Base Using the Control Panel

Destroy a Knowledge Base Using the API

cURL

We can't find any results for your search.