pydo.genai.create_knowledge_base()

Generated on 4 Jun 2026 from pydo version v0.35.0

Usage

client.genai.create_knowledge_base(
    body={
        "database_id": "12345678-1234-1234-1234-123456789012",
        "datasources": [...],
        "embedding_model_uuid": "12345678-1234-1234-1234-123456789012",
        ...,
    },
)
Returns JSON
Raises HttpResponseError

Description

To create a knowledge base, send a POST request to /v2/gen-ai/knowledge_bases.

Parameters

database_id string optional

Example: "12345678-1234-1234-1234-123456789012"

Identifier of the DigitalOcean OpenSearch database this knowledge base will use.
If omitted, a new database is created for the knowledge base in
the same region as the knowledge base.
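
Because `database_id` is optional, the request body can leave the key out entirely when a new database should be created. A minimal sketch, assuming a hypothetical helper named `build_kb_body` (it is not part of pydo) and placeholder identifiers:

```python
from typing import Any, Dict, Optional


def build_kb_body(
    name: str,
    embedding_model_uuid: str,
    project_id: str,
    region: str,
    database_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of pydo): assemble a request body."""
    body: Dict[str, Any] = {
        "name": name,
        "embedding_model_uuid": embedding_model_uuid,
        "project_id": project_id,
        "region": region,
    }
    # Send database_id only when reusing an existing OpenSearch database;
    # leaving the key out asks the API to create a new one.
    if database_id is not None:
        body["database_id"] = database_id
    return body


# Placeholder identifier taken from the examples on this page.
placeholder = "12345678-1234-1234-1234-123456789012"
new_db_body = build_kb_body("My Knowledge Base", placeholder, placeholder, "tor1")
reuse_db_body = build_kb_body(
    "My Knowledge Base", placeholder, placeholder, "tor1", database_id=placeholder
)
```

Either dictionary could then be passed as `client.genai.create_knowledge_base(body=...)`.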

datasources array of objects optional

Optional data sources to attach at creation. Omit or use an empty list to create the knowledge base without sources, then add sources (with chunking strategy and sizes) using Add a Data Source to a Knowledge Base. When provided, see Organize Data Sources for best practices.

Child properties:
aws_data_source object optional

AWS S3 Data Source

Child properties:
bucket_name string optional

Example: example name

S3 bucket name

item_path string optional

Example: example string

key_id string optional

Example: 123e4567-e89b-12d3-a456-426614174000

The AWS Key ID

region string optional

Example: example string

Region of bucket

secret_key string optional

Example: example string

The AWS Secret Key

bucket_name string optional

Example: example name

Deprecated, moved to data_source_details

bucket_region string optional

Example: example string

Deprecated, moved to data_source_details

chunking_algorithm string optional

One of: CHUNKING_ALGORITHM_UNKNOWN, CHUNKING_ALGORITHM_SECTION_BASED, CHUNKING_ALGORITHM_HIERARCHICAL, CHUNKING_ALGORITHM_SEMANTIC, CHUNKING_ALGORITHM_FIXED_LENGTH

Default: CHUNKING_ALGORITHM_UNKNOWN

chunking_options object optional
Child properties:
child_chunk_size integer optional

Example: 350

max_chunk_size integer optional

Example: 750

Common options

parent_chunk_size integer optional

Example: 1000

Hierarchical options

semantic_threshold number optional

Example: 0.5

Semantic options
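
The chunking fields sit next to the data source object inside each `datasources` entry. A minimal sketch of attaching them, assuming a hypothetical helper `with_chunking` (not part of pydo). Pairing `parent_chunk_size` and `child_chunk_size` with the hierarchical algorithm is inferred from the "Hierarchical options" label above, not confirmed by the reference:

```python
from typing import Any, Dict

# Enum values copied from the chunking_algorithm parameter above.
CHUNKING_ALGORITHMS = {
    "CHUNKING_ALGORITHM_UNKNOWN",
    "CHUNKING_ALGORITHM_SECTION_BASED",
    "CHUNKING_ALGORITHM_HIERARCHICAL",
    "CHUNKING_ALGORITHM_SEMANTIC",
    "CHUNKING_ALGORITHM_FIXED_LENGTH",
}


def with_chunking(
    datasource: Dict[str, Any], algorithm: str, **options: Any
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of datasource with chunking settings."""
    if algorithm not in CHUNKING_ALGORITHMS:
        raise ValueError(f"unknown chunking algorithm: {algorithm}")
    result = dict(datasource)
    result["chunking_algorithm"] = algorithm
    if options:
        # Keys such as parent_chunk_size are passed through unchanged.
        result["chunking_options"] = dict(options)
    return result


# Sizes reuse the example values listed for each option.
hierarchical = with_chunking(
    {"spaces_data_source": {"bucket_name": "example name", "region": "example string"}},
    "CHUNKING_ALGORITHM_HIERARCHICAL",
    parent_chunk_size=1000,
    child_chunk_size=350,
)
```

The helper only checks the algorithm name locally; whether a given option is accepted for a given algorithm is decided by the API.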

dropbox_data_source object optional

Dropbox Data Source

Child properties:
folder string optional

Example: example string

refresh_token string optional

Example: example string

Refresh token. You can obtain a refresh token by following the OAuth2 flow. See /v2/gen-ai/oauth2/dropbox/tokens for reference.

file_upload_data_source object optional

File to upload as data source for knowledge base.

Child properties:
original_file_name string optional

Example: example name

The original file name

size_in_bytes string optional

Example: 12345

The size of the file in bytes

stored_object_key string optional

Example: example string

The object key the file was stored as
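
Note that `size_in_bytes` is typed as a string, so an integer file size needs converting before it goes into the body. A minimal sketch, assuming a hypothetical helper `file_upload_source` (not part of pydo) and assuming the file has already been uploaded under some object key:

```python
import os
import tempfile
from typing import Dict


def file_upload_source(path: str, stored_object_key: str) -> Dict[str, str]:
    """Hypothetical helper: describe an already-uploaded file."""
    return {
        "original_file_name": os.path.basename(path),
        # The reference types this field as a string, so convert the integer size.
        "size_in_bytes": str(os.path.getsize(path)),
        "stored_object_key": stored_object_key,
    }


# Demonstration with a throwaway five-byte file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "notes.txt")
    with open(demo_path, "wb") as fh:
        fh.write(b"hello")
    source = {
        "file_upload_data_source": file_upload_source(demo_path, "example string")
    }
```

The resulting `source` dictionary is one entry for the `datasources` list; the object key value here is a placeholder.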

google_drive_data_source object optional

Google Drive Data Source

Child properties:
folder_id string optional

Example: 123e4567-e89b-12d3-a456-426614174000

refresh_token string optional

Example: example string

Refresh token. You can obtain a refresh token by following the OAuth2 flow. See /v2/gen-ai/oauth2/google/tokens for reference.

item_path string optional

Example: example string

spaces_data_source object optional

Spaces Bucket Data Source

Child properties:
bucket_name string optional

Example: example name

Spaces bucket name

item_path string optional

Example: example string

region string optional

Example: example string

Region of bucket

web_crawler_data_source object optional

Web Crawler Data Source

Child properties:
base_url string optional

Example: example string

The base URL to crawl.

crawling_option string optional

Options for specifying how URLs found on pages should be handled.

- UNKNOWN: Default unknown value
- SCOPED: Only include the base URL.
- PATH: Crawl the base URL and linked pages within the URL path.
- DOMAIN: Crawl the base URL and linked pages within the same domain.
- SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
- SITEMAP: Crawl URLs discovered in the sitemap.

One of: UNKNOWN, SCOPED, PATH, DOMAIN, SUBDOMAINS, SITEMAP

Default: UNKNOWN

embed_media boolean optional

Example: True

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags array of strings optional

Example: ['example string']

Tags to exclude from web pages while crawling.
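
The web crawler properties above can be assembled into a `datasources` entry as follows. This is a minimal sketch, assuming a hypothetical helper `web_crawler_source` (not part of pydo). The helper's `SCOPED` default is a choice made for this example (the documented default is `UNKNOWN`), and treating `exclude_tags` as HTML tag names is an assumption:

```python
from typing import Any, Dict, List, Optional

# Enum values copied from the crawling_option parameter above.
CRAWLING_OPTIONS = {"UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"}


def web_crawler_source(
    base_url: str,
    crawling_option: str = "SCOPED",
    embed_media: bool = False,
    exclude_tags: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one datasources entry for a web crawl."""
    if crawling_option not in CRAWLING_OPTIONS:
        raise ValueError(f"unknown crawling option: {crawling_option}")
    crawler: Dict[str, Any] = {
        "base_url": base_url,
        "crawling_option": crawling_option,
        "embed_media": embed_media,
    }
    # Leave exclude_tags out when there is nothing to exclude.
    if exclude_tags:
        crawler["exclude_tags"] = list(exclude_tags)
    return {"web_crawler_data_source": crawler}


# PATH crawls the base URL and linked pages within the URL path.
docs_source = web_crawler_source(
    "https://example.com/docs", "PATH", exclude_tags=["nav", "footer"]
)
```

Validating the option locally catches a misspelled value before the request is sent.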

embedding_model_uuid string optional

Example: "12345678-1234-1234-1234-123456789012"

Identifier for the embedding model.

name string optional

Example: "My Knowledge Base"

Name of the knowledge base.

project_id string optional

Example: "12345678-1234-1234-1234-123456789012"

Identifier of the DigitalOcean project this knowledge base will belong to.

region string optional

Example: "tor1"

The datacenter region to deploy the knowledge base in.

reranking_config object optional

Configuration for cross-encoder reranking during retrieval.

Child properties:
enabled boolean optional

Example: True

Whether reranking is enabled for retrieval

model string optional

Example: "bge-reranker-v2-m3"

Reranker model internal name

size string optional

One of: OPEN_SEARCH_PLAN_SIZE_UNSPECIFIED, OPEN_SEARCH_PLAN_SIZE_SMALL, OPEN_SEARCH_PLAN_SIZE_MEDIUM, OPEN_SEARCH_PLAN_SIZE_LARGE, OPEN_SEARCH_PLAN_SIZE_EXTRA_LARGE

Default: OPEN_SEARCH_PLAN_SIZE_UNSPECIFIED

tags array of strings optional

Example: ['example string']

Tags to organize your knowledge base.

vpc_uuid string optional

Example: "12345678-1234-1234-1234-123456789012"

The VPC to deploy the knowledge base database in

Request Sample

import os
from pydo import Client

# Read the API token from the environment rather than hard-coding it.
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))

# Placeholder values; replace them with identifiers from your own account.
req = {
  "database_id": "12345678-1234-1234-1234-123456789012",
  "datasources": [
    {
      # bucket_name and bucket_region are marked deprecated in the parameter list above.
      "bucket_name": "example name",
      "bucket_region": "example string",
      "chunking_algorithm": "CHUNKING_ALGORITHM_UNKNOWN",
      "item_path": "example string"
    }
  ],
  "embedding_model_uuid": "12345678-1234-1234-1234-123456789012",
  "name": "My Knowledge Base",
  "project_id": "12345678-1234-1234-1234-123456789012",
  "region": "tor1",
  "reranking_config": {
    "enabled": True,
    "model": "bge-reranker-v2-m3"
  },
  "size": "OPEN_SEARCH_PLAN_SIZE_UNSPECIFIED",
  "tags": [
    "example string"
  ],
  "vpc_uuid": "12345678-1234-1234-1234-123456789012"
}

resp = client.genai.create_knowledge_base(body=req)

Response Example

{
  "knowledge_base": {
    "added_to_agent_at": "2023-01-01T00:00:00Z",
    "created_at": "2023-01-01T00:00:00Z",
    "database_id": "123e4567-e89b-12d3-a456-426614174000",
    "embedding_model_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "is_public": true,
    "last_indexing_job": {
      "completed_datasources": 123,
      "created_at": "2023-01-01T00:00:00Z",
      "data_source_jobs": [],
      "data_source_uuids": [
        "example string"
      ],
      "finished_at": "2023-01-01T00:00:00Z",
      "is_report_available": true,
      "knowledge_base_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "phase": "BATCH_JOB_PHASE_UNKNOWN",
      "started_at": "2023-01-01T00:00:00Z",
      "status": "INDEX_JOB_STATUS_UNKNOWN",
      "tokens": 123,
      "total_datasources": 123,
      "total_tokens": "12345",
      "updated_at": "2023-01-01T00:00:00Z",
      "uuid": "123e4567-e89b-12d3-a456-426614174000"
    },
    "name": "example name",
    "project_id": "123e4567-e89b-12d3-a456-426614174000",
    "region": "example string",
    "reranking_config": {
      "enabled": true,
      "model": "bge-reranker-v2-m3"
    },
    "tags": [
      "example string"
    ],
    "updated_at": "2023-01-01T00:00:00Z",
    "user_id": "12345",
    "uuid": "123e4567-e89b-12d3-a456-426614174000"
  }
}
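
Reading fields out of the response can be sketched as below. This assumes the call returns the JSON as a plain Python dictionary shaped like the example above, and that `last_indexing_job` may be absent on a freshly created knowledge base (an assumption, not stated in the reference). The helper `summarize_knowledge_base` is hypothetical:

```python
from typing import Any, Dict, Optional, Tuple


def summarize_knowledge_base(
    resp: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (knowledge base uuid, last indexing job status)."""
    kb = resp.get("knowledge_base") or {}
    # Tolerate a missing or null last_indexing_job.
    job = kb.get("last_indexing_job") or {}
    return kb.get("uuid"), job.get("status")


# Trimmed copy of the response example above.
sample = {
    "knowledge_base": {
        "uuid": "123e4567-e89b-12d3-a456-426614174000",
        "last_indexing_job": {"status": "INDEX_JOB_STATUS_UNKNOWN"},
    }
}
kb_uuid, job_status = summarize_knowledge_base(sample)
```

Using `.get` with fallbacks keeps the helper from raising `KeyError` when optional parts of the response are missing.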

More Information

See /v2/gen-ai/knowledge_bases in the API reference for additional detail on responses, headers, parameters, and more.
