DigitalOcean AI Website Crawler
Validated on 14 Jan 2026 • Last edited on 27 Apr 2026
DigitalOcean AI Platform lets you build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more.
When you specify a website URL as a data source for your knowledge base, DigitalOcean uses a custom user agent named DigitalOceanGradientAICrawler/1.0 to index the website content. To keep indexing jobs from growing excessively large, the crawler indexes at most 5,500 pages and skips inaccessible or disallowed links.
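The page cap and link skipping described above can be pictured with a small, self-contained sketch. This is not DigitalOcean's implementation: the breadth-first order, the in-memory `links` graph, the `blocked` set, and the choice not to count skipped pages toward the cap are all assumptions made for illustration.

```python
from collections import deque

MAX_PAGES = 5500  # page cap stated in the documentation


def crawl(seed, links, blocked, max_pages=MAX_PAGES):
    """Walk a link graph breadth-first and return the pages that get indexed.

    `links` maps each URL to the URLs it links to, and `blocked` holds URLs
    that are inaccessible or disallowed. Both are hypothetical stand-ins for
    real HTTP fetches and robots.txt checks.
    """
    indexed = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        if url in blocked:
            # Assumption: skipped pages do not count toward the cap.
            continue
        indexed.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed
```

For example, calling `crawl("a", {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, blocked={"c"})` indexes `a`, `b`, and `d`, and passing `max_pages=2` stops after `a` and `b`.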
Depending on the behavior you select, the crawler follows HTML links on the site, indexes text and certain image types, and ignores videos and navigation links. It respects the website’s robots.txt rules, including any Disallow directives and the wildcard *.
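If you own the site, you can check locally which URLs your robots.txt rules would block for this user agent. The following is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt content, the `example.com` URLs, and the `is_allowed` helper are hypothetical, and matching on the token `DigitalOceanGradientAICrawler` without the version suffix is an assumption about how the rules are matched. Python's parser may also differ in detail from the crawler's own rule handling.

```python
from urllib.robotparser import RobotFileParser

# User agent string named in the documentation.
CRAWLER_UA = "DigitalOceanGradientAICrawler/1.0"

# Hypothetical robots.txt with a group addressed to the crawler by name.
NAMED_RULES = """\
User-agent: DigitalOceanGradientAICrawler
Disallow: /private/
"""

# Hypothetical robots.txt that addresses every crawler with the wildcard.
WILDCARD_RULES = """\
User-agent: *
Disallow: /drafts/
"""


def is_allowed(robots_txt, url, user_agent=CRAWLER_UA):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for rules, url in [
        (NAMED_RULES, "https://example.com/docs/index.html"),
        (NAMED_RULES, "https://example.com/private/report.html"),
        (WILDCARD_RULES, "https://example.com/drafts/post.html"),
    ]:
        print(url, "allowed" if is_allowed(rules, url) else "disallowed")
```

Parsing the rules from a string keeps the check offline. To test a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to fetch the file over HTTP.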
For guidance on adding seed URLs and sitemap URLs to a DigitalOcean knowledge base, see Add a Data Source.