Serverless Inference Overview
Validated on 10 Apr 2026 • Last edited on 16 Apr 2026
DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.
Serverless inference lets you send API requests directly to foundation models without creating an AI agent or managing infrastructure. Requests are authenticated using a model access key and sent to the serverless inference API.
Serverless inference automatically scales to handle incoming requests and supports generating text, images, audio, and other model outputs. Because serverless inference does not maintain sessions, each request must include the full context needed by the model.
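Because the service is stateless, every call must carry the complete conversation context. The sketch below shows what such a request might look like, assuming an OpenAI-compatible chat completions endpoint; the URL and model name here are illustrative placeholders, so check the Model Catalog and API documentation for the actual endpoint and model IDs available to your account.

```python
import json
import urllib.request

# Assumed endpoint path for illustration only; confirm the actual
# serverless inference URL in the product documentation.
INFERENCE_URL = "https://inference.example.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build the URL, headers, and JSON body for a serverless inference call.

    Serverless inference does not maintain sessions, so `messages` must
    include the full context the model needs on every request.
    """
    headers = {
        "Content-Type": "application/json",
        # Requests are authenticated with a model access key.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return INFERENCE_URL, headers, body

def chat(api_key, model, messages):
    """Send one stateless inference request and return the parsed response."""
    url, headers, body = build_chat_request(api_key, model, messages)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To continue a conversation, append the model's previous reply and the new user turn to `messages` and send the whole list again; nothing is stored server-side between calls.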
All requests are billed per input and output token.
When to Use Serverless Inference Versus Dedicated Inference
Dedicated inference is a managed inference service that enables you to host and scale open-source and commercial LLMs on dedicated GPUs. It gives you more control over the environment so you can choose the GPU, tune performance, and optimize your models for throughput, latency, cost, or concurrency. Dedicated inference is best suited for steady, high-throughput workloads.
Serverless inference, by contrast, requires no infrastructure behind the endpoint. Choose serverless inference over dedicated inference when you need to get started quickly without managing any components behind an inference endpoint, don't have a custom model to host or optimize, or have unpredictable or spiky inference traffic.
Pricing for serverless inference is based on the number of tokens used, while pricing for dedicated inference is based on the GPU hours used.
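The pricing difference implies a break-even point: below some token volume, paying per token is cheaper than reserving a GPU. The sketch below works through that arithmetic with hypothetical prices; the per-token and per-GPU-hour rates are assumptions for illustration only, so substitute the actual rates from the pricing page.

```python
# Hypothetical rates for illustration only; these are NOT real prices.
SERVERLESS_USD_PER_1M_TOKENS = 0.60  # assumed serverless per-token rate
DEDICATED_USD_PER_GPU_HOUR = 2.50    # assumed dedicated per-GPU-hour rate

def serverless_cost(tokens):
    """Cost of processing `tokens` (input + output) on serverless inference."""
    return tokens / 1_000_000 * SERVERLESS_USD_PER_1M_TOKENS

def breakeven_tokens_per_hour():
    """Sustained tokens/hour at which one dedicated GPU matches serverless cost.

    Below this volume, per-token serverless billing is cheaper; above it,
    a steadily utilized dedicated GPU becomes the better deal.
    """
    return DEDICATED_USD_PER_GPU_HOUR / SERVERLESS_USD_PER_1M_TOKENS * 1_000_000
```

With these assumed rates, the break-even point is roughly 4.2 million tokens per hour, sustained; spiky traffic that averages well below that favors serverless, which is consistent with the guidance above.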
If you want to use dedicated inference, see Use Dedicated Inference.