Inference

Last verified 22 Jun 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

digitalocean-product-icon-available-standalone-service
Browse Models in Model Catalog

Identify the right model for your use case by filtering available foundation models by capabilities and price.

digitalocean-product-icon-available-standalone-service
Use Model Playground

Test and compare foundation models in the Model Playground.

digitalocean-product-icon-available-standalone-service
Use Serverless Inference

Send API requests directly to foundation models without creating an AI agent or managing infrastructure.

digitalocean-product-icon-available-standalone-service
Deploy to Dedicated Inference Endpoints

Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.

digitalocean-product-icon-available-standalone-service
Use Inference Router

Route serverless inference requests to foundation models using rules.

digitalocean-product-icon-available-standalone-service
Evaluate Models

Determine which model best fits your specific use case.

digitalocean-product-icon-available-standalone-service
Use Batch Inference

Batch Inference lets you run large collections of LLM requests as a single asynchronous job.

digitalocean-product-icon-available-standalone-service
Use Agent Platform

Use to build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more.

Latest Updates

1 July 2026

  • Prompt caching for open-source models in serverless inference chat completions and responses API is now in public preview. Open-source models cache context automatically, so you do not need to set the cache_control or prompt_cache_retention parameters.

    Prompt caching is available for the following open-source models:

    • DeepSeek V3.2
    • DeepSeek V4 Pro
    • DeepSeek V4 Flash
    • Kimi K2.5
    • Kimi K2.6
    • GLM 5
    • GLM-5.1
    • GLM-5.2
    • gpt-oss-120b
    • MiMo V2.5
    • MiMo V2.5 Pro
    • MiniMax M2.5
    • Qwen 3.5
    • Qwen3 Coder Flash

    For more information, see Use Prompt Caching.

30 June 2026

29 June 2026

  • Serverless Inference now requires a positive prepaid account balance before you can send inference requests. Usage charges are deducted from this balance, and access is suspended if it reaches $0. You can add a prepayment manually or enable auto-reload to replenish your balance automatically. For more information, see Manage Serverless Inference Prepayment.

For more information, see the full release notes.

We can't find any results for your search.

Try using different keywords or simplifying your search terms.