Serverless Inference Overview
Validated on 10 Apr 2026 • Last edited on 16 Apr 2026
DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.
Serverless inference lets you send API requests directly to foundation models without creating an AI agent or managing infrastructure. Requests are authenticated using a model access key and sent to the serverless inference API.
Serverless inference automatically scales to handle incoming requests and supports generating text, images, audio, and other model outputs. Because serverless inference does not maintain sessions, each request must include the full context needed by the model.
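Because the service is stateless, every call must carry the complete conversation context. The sketch below shows what such a request might look like, assuming an OpenAI-compatible chat completions endpoint; the URL and model name here are illustrative placeholders, so check the Model Catalog and API documentation for the actual endpoint and model IDs available to your account.

```python
import json
import urllib.request

# Assumed endpoint path for illustration only; confirm the actual
# serverless inference URL in the product documentation.
INFERENCE_URL = "https://inference.example.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build the URL, headers, and JSON body for a serverless inference call.

    Serverless inference does not maintain sessions, so `messages` must
    include the full context the model needs on every request.
    """
    headers = {
        "Content-Type": "application/json",
        # Requests are authenticated with a model access key.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return INFERENCE_URL, headers, body

def chat(api_key, model, messages):
    """Send one stateless inference request and return the parsed response."""
    url, headers, body = build_chat_request(api_key, model, messages)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To continue a conversation, append the model's previous reply and the new user turn to `messages` and send the whole list again; nothing is stored server-side between calls.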
All requests are billed per input and output token.
When to Use Serverless Inference Versus Dedicated Inference
Dedicated inference is a managed inference service that enables you to host and scale open-source and commercial LLMs on dedicated GPUs. It gives you more control over the environment so you can choose the GPU, tune performance, and optimize your models for throughput, latency, cost, or concurrency. Dedicated inference is best suited for steady, high-throughput workloads.
Serverless inference, by contrast, requires no infrastructure behind the endpoint. Choose serverless inference over dedicated inference when you need to get started quickly without managing any components behind an inference endpoint, don't have a custom model to host or optimize, or have unpredictable or spiky inference traffic.
Pricing for serverless inference is based on the number of tokens used, while pricing for dedicated inference is based on the GPU hours used.
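The pricing difference implies a break-even point: below some token volume, paying per token is cheaper than reserving a GPU. The sketch below works through that arithmetic with hypothetical prices; the per-token and per-GPU-hour rates are assumptions for illustration only, so substitute the actual rates from the pricing page.

```python
# Hypothetical rates for illustration only; these are NOT real prices.
SERVERLESS_USD_PER_1M_TOKENS = 0.60  # assumed serverless per-token rate
DEDICATED_USD_PER_GPU_HOUR = 2.50    # assumed dedicated per-GPU-hour rate

def serverless_cost(tokens):
    """Cost of processing `tokens` (input + output) on serverless inference."""
    return tokens / 1_000_000 * SERVERLESS_USD_PER_1M_TOKENS

def breakeven_tokens_per_hour():
    """Sustained tokens/hour at which one dedicated GPU matches serverless cost.

    Below this volume, per-token serverless billing is cheaper; above it,
    a steadily utilized dedicated GPU becomes the better deal.
    """
    return DEDICATED_USD_PER_GPU_HOUR / SERVERLESS_USD_PER_1M_TOKENS * 1_000_000
```

With these assumed rates, the break-even point is roughly 4.2 million tokens per hour, sustained; spiky traffic that averages well below that favors serverless, which is consistent with the guidance above.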
If you want to use dedicated inference, see Use Dedicated Inference.