Inference How-Tos

Generated on 28 Apr 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model, and run inference on serverless or dedicated deployments.

Serverless Inference

Serverless Inference Overview

What serverless inference is and how it differs from dedicated inference.

Serverless Inference API Endpoints

Synchronous and asynchronous API endpoints for serverless inference.

Use Serverless Inference

Send API requests directly to foundation models without creating an AI agent or managing infrastructure.

How to Retrieve Available Models

How to retrieve models available for serverless inference.
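As a quick sketch, listing the available models is a single authenticated GET request. The base URL (`https://inference.do-ai.run/v1`) and bearer-token auth shown here are assumptions based on the API being OpenAI-compatible; confirm both in the control panel.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible base URL; verify the exact endpoint in the control panel.
BASE_URL = "https://inference.do-ai.run/v1"

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model list payload."""
    return [m["id"] for m in payload["data"]]

def list_models(api_key: str) -> list[str]:
    """Return the IDs of models available for serverless inference."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))

if __name__ == "__main__":
    print(list_models(os.environ["DO_MODEL_ACCESS_KEY"]))
```

The returned IDs are what you pass as the `model` field in subsequent inference requests.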

How to Send Prompts to a Model Using the Chat Completions API

Send prompts and use reasoning with the Chat Completions API.
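A minimal Chat Completions call, as a sketch: the base URL and the model ID `llama3.3-70b-instruct` are placeholders (check the Model Catalog for real model names), but the request body follows the standard OpenAI-compatible shape.

```python
import json
import os
import urllib.request

BASE_URL = "https://inference.do-ai.run/v1"  # assumed endpoint; verify in the control panel

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(api_key: str, model: str, prompt: str) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # "llama3.3-70b-instruct" is a placeholder model ID.
    print(chat(os.environ["DO_MODEL_ACCESS_KEY"], "llama3.3-70b-instruct", "Hello"))
```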

How to Send Prompts to a Model Using the Responses API

Send prompts with the Responses API.
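For comparison, the Responses API takes an `input` field instead of a `messages` array and returns a list of output items. This sketch assumes the same OpenAI-compatible base URL and response shape; confirm both against the live API.

```python
import json
import urllib.request

BASE_URL = "https://inference.do-ai.run/v1"  # assumed endpoint

def build_responses_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a Responses API call."""
    return {"model": model, "input": prompt}

def respond(api_key: str, model: str, prompt: str) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(build_responses_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Collect the text parts from the output items (assumed OpenAI-style shape).
    return "".join(
        part["text"]
        for item in payload.get("output", [])
        if item.get("type") == "message"
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    )
```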

How to Use Prompt Caching in Chat Completions and Responses API

Use prompt caching with the Chat Completions and Responses API.

How to Use Reasoning with the Chat Completions and Responses API

Use reasoning with the Chat Completions and Responses API.

How to Generate Images from Text Prompts

Generate or edit images from text prompts.

How to Use fal Models to Generate Image, Audio, or Text-to-Speech

Generate images, audio, or text-to-speech output using fal models.

How to Convert Text Into Dense Vector Representations

Convert text into dense vector representations for use in semantic search, retrieval-augmented generation (RAG), clustering, classification, and similarity matching.
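The similarity-matching step mentioned above usually means comparing embedding vectors with cosine similarity. This sketch builds an OpenAI-compatible embeddings request body (the exact endpoint and model IDs are assumptions) and shows the similarity math on plain Python lists:

```python
import math

def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Body for an OpenAI-compatible POST /v1/embeddings call (assumed shape)."""
    return {"model": model, "input": texts}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In a semantic-search or RAG pipeline, you would embed your documents once, embed each query at request time, and rank documents by cosine similarity to the query vector.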

How to View Serverless Inference Metrics

View metrics such as latency, throughput, error rates, token consumption, cost attribution, and rate limiting.

How to Use Built-in Tools

Extend model capabilities with server-side tools like knowledge base retrieval and MCP during inference requests.

How to Use Serverless Inference After Updating to Another Model

Continue using serverless inference after updating to another model.

Manage Model Catalog

How to Browse Models in Model Catalog

Identify the right model for your use case by filtering available foundation models by capabilities and price.

Use Model Playground

Test and Compare Models Using the Model Playground

Test and compare foundation models in the Model Playground.

Manage Inference Deployments

How to Use Dedicated Inference

Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.

How to Use Batch Inference on DigitalOcean AI

Run text jobs asynchronously through OpenAI- and Anthropic-compatible batch APIs using your serverless inference model access key.
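In the OpenAI-compatible batch format, the input is a JSONL file where each line names a `custom_id`, a target endpoint, and a full request body. This sketch writes such a file; the model ID is a placeholder, and the upload/submit steps (which are account-specific) are omitted.

```python
import json

def batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One JSONL line in the OpenAI-compatible batch input format."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Write a small batch input file; "llama3.3-70b-instruct" is a placeholder model ID.
prompts = ["Summarize document A", "Summarize document B"]
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(batch_line(f"req-{i}", "llama3.3-70b-instruct", prompt) + "\n")
```

The `custom_id` on each line is how you match results back to requests once the batch completes, since outputs are not guaranteed to arrive in input order.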

How to Create and Manage Model Access Keys

Create, scope, and manage model access keys for foundation models, inference routers, and batch inference, with VPC restrictions and team-owner visibility.

Agentic Workflows

How to Use Claude Code and Other Agentic Workflows on DigitalOcean

Use the Messages API with Claude Code and similar agentic workflows.
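The Messages API differs from Chat Completions in two ways worth noting: `max_tokens` is required, and auth uses an `x-api-key` header plus an `anthropic-version` header rather than a bearer token. The base URL below is an assumption; use the endpoint shown in your control panel.

```python
import json
import urllib.request

BASE_URL = "https://inference.do-ai.run"  # assumed; confirm in the control panel

def build_messages_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Anthropic Messages API body; max_tokens is required, unlike Chat Completions."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def send_message(api_key: str, model: str, prompt: str) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(build_messages_request(model, prompt)).encode(),
        headers={
            "x-api-key": api_key,                 # Messages API auth header
            "anthropic-version": "2023-06-01",    # required version header
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```

Agentic tools like Claude Code would be pointed at the same endpoint and key through their own configuration rather than through code like this.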

Evaluate Models

How to Evaluate Models

Determine which model best fits your specific use case.

Use Coding Agents

How to Use Coding Agents With DigitalOcean

Configure Codex CLI, Claude Code, Cline, OpenCode, Cursor, and OpenClaw to use inference with your model access key.
