DigitalOcean GenAI Platform lets you build GPU-powered AI agents with fully managed deployment. Agents can use pre-built or custom foundation models, incorporate function and agent routes, and implement RAG pipelines with knowledge bases.
Creating a knowledge base using the DigitalOcean API requires a name for the knowledge base, an embedding model to use for indexing, a data source, the identifier of the project the knowledge base belongs to, and the datacenter region. You can also provide the unique identifier of a DigitalOcean OpenSearch database to store the vector embeddings of your data. If you do not provide one, we create a DigitalOcean OpenSearch database for the knowledge base to use. The new database we create is typically about double the size of your data.
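The following Python sketch shows one way to assemble that create request with only the standard library. The endpoint path (POST /v2/gen-ai/knowledge_bases) and the JSON field names (embedding_model_uuid, project_id, region, datasources, database_id, tags) are assumptions modeled on the API reference, so verify them before use. Leaving out database_id corresponds to letting us create a new OpenSearch database.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.digitalocean.com"


def build_knowledge_base_body(
    name: str,
    embedding_model_uuid: str,
    project_id: str,
    region: str,
    datasources: list[dict],
    database_id: str | None = None,
    tags: list[str] | None = None,
) -> dict:
    """Assemble the JSON body for a create-knowledge-base call.

    Field names are assumptions; check them against the API reference.
    """
    body = {
        "name": name,
        "embedding_model_uuid": embedding_model_uuid,
        "project_id": project_id,
        "region": region,
        "datasources": datasources,
    }
    # Omitting database_id lets the platform create a new OpenSearch database.
    if database_id is not None:
        body["database_id"] = database_id
    if tags:
        body["tags"] = tags
    return body


def make_create_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    # Assumed endpoint path for creating knowledge bases.
    return urllib.request.Request(
        f"{API_BASE}/v2/gen-ai/knowledge_bases",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen with a valid API token. Each datasources entry (for example, a Spaces bucket or a web crawler seed URL) has its own schema, which this sketch deliberately treats as an opaque dictionary.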
You can obtain a list of embedding models with their unique identifiers using the /v2/gen-ai/models endpoint with the usecases query parameter.
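A minimal Python sketch of that lookup follows. It assumes the usecase value MODEL_USECASE_KNOWLEDGEBASE and a response that lists models under a top-level models key with uuid and name fields; all three are assumptions, so confirm them in the API reference. The uuid you pick is the embedding model identifier the create request needs.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API_BASE = "https://api.digitalocean.com"


def embedding_models_url(usecase: str = "MODEL_USECASE_KNOWLEDGEBASE") -> str:
    """Build the models URL filtered by the usecases query parameter."""
    # The default usecase value is an assumption; check the allowed values.
    query = urllib.parse.urlencode({"usecases": usecase})
    return f"{API_BASE}/v2/gen-ai/models?{query}"


def parse_models(payload: dict) -> list[tuple[str, str]]:
    """Extract (uuid, name) pairs, assuming a top-level "models" list."""
    return [(m["uuid"], m["name"]) for m in payload.get("models", [])]


def list_embedding_models(token: str) -> list[tuple[str, str]]:
    """Fetch and parse the model list (performs a network call)."""
    req = urllib.request.Request(
        embedding_models_url(),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_models(json.load(resp))
```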
You can list all available knowledge bases, view the details of a knowledge base, or update a knowledge base after creation.
To create a knowledge base from the DigitalOcean Control Panel, click GenAI Platform in the left sidebar, click the Knowledge bases tab, and then click the Create Knowledge Base button.
You can leave the automatically generated name for the knowledge base or choose a custom name. Names must be unique, be between 3 and 63 characters long, and contain only alphanumeric characters, dashes, and periods.
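If you script knowledge base creation, a local check of these naming rules can catch mistakes before you submit. This Python sketch checks only the length and character rules (uniqueness can only be checked against your account), and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# ASCII letters, digits, dashes, and periods; 3 to 63 characters in total.
_NAME_RE = re.compile(r"[A-Za-z0-9.-]{3,63}")


def is_valid_kb_name(name: str) -> bool:
    """Check the documented length and character rules (not uniqueness)."""
    return _NAME_RE.fullmatch(name) is not None
```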
You can organize your knowledge base files in dedicated Spaces buckets or folders, or in local storage, and include only relevant files to save processing time and money. GenAI Platform supports the .txt, .html, .md, .pdf, .doc, .json, and .csv formats.
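To pick out only the supported files before uploading them or copying them into a Spaces bucket, you could filter a local directory by extension, as in this Python sketch. Treating extensions case-insensitively is an assumption, and the check looks only at the file name, not the file contents.

```python
from __future__ import annotations

from pathlib import Path

# Formats listed as supported by GenAI Platform.
SUPPORTED_EXTENSIONS = {".txt", ".html", ".md", ".pdf", ".doc", ".json", ".csv"}


def is_supported(filename: str) -> bool:
    """Match on the file extension, ignoring case (an assumption)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def supported_files(root: str) -> list[Path]:
    """Recursively collect supported files under root, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_supported(p.name)
    )
```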
Click Select data source to open the Select data source window. From the Data source dropdown list, select one of the following options:
Spaces bucket or folder: Select one or more Spaces buckets or folders in a bucket where your data is stored. If you do not have Spaces buckets for your data, see How to Create a Spaces Bucket and How to Migrate Spaces with Flexify.IO.
Web crawling: Add a static or dynamic seed URL to extract data with the GenAI crawler. The URL must use HTTPS and be publicly accessible. The crawler indexes up to 5500 links within the defined scope. It follows robots.txt, respects disallow directives, and skips inaccessible links. It can also index .svg, .jpeg, and .png images if specified; however, including images and SVGs may increase the indexing token count. The crawler ignores videos and avoids scraping links in footers, headers, and navigation elements. Downloadable files are processed only if they fall within the defined crawling scope; otherwise, they are ignored.

| Scope | Seed URL Example | Crawls |
|---|---|---|
| Scoped (Most Narrow): Crawls only the seed URL and ignores all links to external pages. | https://www.example.com/products/ai-ml/ | Only this page. |
| URL and all linked pages in path (Narrow): Crawls the seed URL and all pages within the same URL path, ignoring pages outside this path. | https://www.example.com/docs/ | Includes: https://www.example.com/docs/tutorials/ ; Excludes: https://www.example.com/products/ |
| URL and all linked pages in domain (Broad): Crawls all pages within the same domain as the seed URL but does not include subdomains. | https://www.example.com/docs/ | Includes: https://www.example.com/products/ ; Excludes: https://docs.example.com/ |
| Subdomains (Most Broad): Crawls all pages within the domain and its subdomains, including docs.example.com and marketplace.example.com. | https://www.example.com/docs/ | Includes: https://community.example.com/ |
If you add a seed URL for web crawling, you can check if it’s fully indexed by adding it again and starting a new crawl. If it returns zero tokens, the initial crawl indexed all content.
File: Drag and drop data files from your local storage or click Upload to select the files to add in the file browser.
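The four web crawling scopes in the table can be approximated with simple URL comparisons. The Python sketch below uses hypothetical scope names (scoped, path, domain, subdomains) and a simplification for subdomains: it treats the last two host labels as the base domain, which is wrong for suffixes such as .co.uk, where a Public Suffix List lookup would be needed. It illustrates the table only and does not reproduce the GenAI crawler's actual rules.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def _base_domain(host: str) -> str:
    # Simplification: keep the last two labels (www.example.com -> example.com).
    # A real implementation should consult the Public Suffix List.
    return ".".join(host.split(".")[-2:])


def in_crawl_scope(seed: str, url: str, scope: str) -> bool:
    """Return True if url falls inside the given scope for this seed URL."""
    s, u = urlsplit(seed), urlsplit(url)
    same_host = s.hostname == u.hostname
    if scope == "scoped":
        # Only the seed page itself (ignoring a trailing slash).
        return same_host and u.path.rstrip("/") == s.path.rstrip("/")
    if scope == "path":
        # The seed page and anything below its path on the same host.
        prefix = s.path if s.path.endswith("/") else s.path + "/"
        return same_host and (u.path + "/").startswith(prefix)
    if scope == "domain":
        # Any page on the seed's exact host; other subdomains are excluded.
        return same_host
    if scope == "subdomains":
        # The base domain and every subdomain of it.
        base = _base_domain(s.hostname or "")
        host = u.hostname or ""
        return host == base or host.endswith("." + base)
    raise ValueError(f"unknown scope: {scope!r}")
```

Checking the example rows, in_crawl_scope("https://www.example.com/docs/", "https://docs.example.com/", "domain") is False, while the same pair under "subdomains" is True.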
Next, click Add selected data source to add the data source.
A knowledge base requires a new or existing OpenSearch cluster to store the vector embeddings of your data from the data source. To use an existing cluster, in the OpenSearch database options section, select the Use existing option and then select the existing cluster from the Select an OpenSearch database dropdown.
To create a new one, select the Create new option and select the datacenter region to create the cluster. Embeddings are typically double the size of the data you add. We create an OpenSearch cluster with the smallest size that can store the embeddings of your data.
An embedding model converts your data into vector embeddings, which are then stored in your OpenSearch database cluster. Use the Embeddings Model drop-down menu to select your model.
You currently cannot change embedding models after creating the knowledge base.
Review the estimated cost of your knowledge base per token count. For reference, each token is about four characters, or, at scale, 100 tokens roughly equal 75 words of text. Estimates assume a Latin-alphabet dataset. Using non-Latin characters, emojis, or binary data may result in more tokens.
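These rules of thumb translate into a quick back-of-the-envelope estimate. In this Python sketch, price_per_million_tokens is a hypothetical input; substitute the rate shown in the Control Panel. Results are approximate and assume Latin-alphabet text.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """About four characters per token, rounded up."""
    return (len(text) + 3) // 4


def estimate_tokens_from_words(word_count: int) -> int:
    """100 tokens per 75 words, i.e. tokens = words * 4 / 3, rounded up."""
    return (word_count * 4 + 2) // 3


def estimate_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Scale a hypothetical per-million-token price to the token count."""
    return tokens / 1_000_000 * price_per_million_tokens
```

For example, a 750-word document comes out to about 1,000 tokens.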
Choose the project to add the knowledge base to and any tags you want to use.
Select a project: You can leave the default project or choose another one.
Tags: You can add a tag by typing it into the text box and pressing Enter. Tags can contain only letters, numbers, colons, dashes, and underscores.
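A similar local check works for tags. This Python sketch returns any tags that contain characters outside the documented set, assuming "letters" and "numbers" mean ASCII.

```python
from __future__ import annotations

import re

# ASCII letters, digits, colons, dashes, and underscores.
_TAG_RE = re.compile(r"[A-Za-z0-9:_-]+")


def invalid_tags(tags: list[str]) -> list[str]:
    """Return the tags that break the documented character rule."""
    return [t for t in tags if _TAG_RE.fullmatch(t) is None]
```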
When you’re ready, click the Create Knowledge Base button. Knowledge bases typically take five minutes or more to provision.
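If you create knowledge bases through the API, you may want to wait for provisioning to finish before using one. The Python sketch below polls a caller-supplied fetch_status function until it returns one of the given ready states. fetch_status, any state names you pass, and the default 30-second interval and 30-minute timeout are hypothetical choices; wire fetch_status to whatever status field the knowledge base endpoint actually returns.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    ready_states: set[str],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it reports a ready state or time runs out.

    fetch_status is a hypothetical caller-supplied function, for example one
    that fetches the knowledge base and returns its status field.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ready_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop easy to test without real waiting.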