Serverless Inference API Endpoints

Validated on 10 Apr 2026 • Last edited on 16 Apr 2026

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.

To use serverless inference, authenticate your HTTP requests with a model access key, then send prompts to models for chat completions, image generation, audio, and text-to-speech workloads.

Prerequisites

You can create a model access key in the DigitalOcean Control Panel or by sending a POST request to the /v2/gen-ai/models/api_keys endpoint. Then, send your prompts to models from OpenAI, Anthropic, Meta, or other providers using the serverless inference API endpoints.
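As a sketch of the API route, the following builds the key-creation request with the standard library. The `name` field in the body and the token placeholder are assumptions; check the API reference for the exact request schema.

```python
import json
import urllib.request

DO_API_TOKEN = "dop_v1_example"  # placeholder; use your DigitalOcean API token

# Build the key-creation request. The "name" field is an assumption about
# the request body; verify the schema in the API reference.
payload = {"name": "my-inference-key"}
req = urllib.request.Request(
    "https://api.digitalocean.com/v2/gen-ai/models/api_keys",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {DO_API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(req) would send it; printed here instead so the
# sketch runs without credentials.
print(req.method, req.full_url)
```

The returned key is what you pass as the bearer token to the inference endpoints below; keep it separate from your DigitalOcean API token.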

Serverless Inference API Endpoints

Depending on the modality, the serverless inference endpoints can be synchronous or asynchronous:

  • Synchronous: The request blocks until the job completes, and the output is returned directly in the API response as base64; output is not stored long-term on behalf of users. These endpoints are OpenAI-compatible. Use synchronous endpoints for image generation and audio workloads.

  • Asynchronous: You submit a job, receive a job ID, and poll for the result. When the job completes, fetch the generated result from the result endpoint. Use asynchronous endpoints for video generation and other long-running workloads.

    Warning
    For asynchronous video generation requests, result storage is temporary and expires 2 hours after the job completes. After this window, the generated video and any presigned download URLs are permanently purged and cannot be retrieved.

The following table shows the available serverless inference endpoints:

| API Name | Type | Base URL | Endpoint | Verb | Description |
| --- | --- | --- | --- | --- | --- |
| Model | Synchronous | https://inference.do-ai.run | /v1/models | GET | Returns a list of available models and their IDs. |
| Chat Completions | Synchronous | https://inference.do-ai.run | /v1/chat/completions | POST | Sends chat-style prompts and returns model responses. |
| Responses | Synchronous | https://inference.do-ai.run | /v1/responses | POST | Sends chat-style prompts and returns text or multimodal model responses. |
| Images | Synchronous | https://inference.do-ai.run | /v1/images/generations | POST | Generates images from text prompts. |
| Images | Asynchronous | https://inference.do-ai.run | /v1/async-invoke | POST | Sends text, image, or text-to-speech generation requests to fal models. |

For more information, see the API reference.
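Since synchronous output arrives as base64 in the response body and is not stored long-term, decode and persist it yourself as soon as the response arrives. A minimal sketch, using a trimmed stand-in response whose `data[].b64_json` shape follows the OpenAI-compatible images schema:

```python
import base64

# Trimmed stand-in for a synchronous image response; the "b64_json" field
# name follows the OpenAI-compatible images schema.
response = {
    "data": [
        {"b64_json": base64.b64encode(b"\x89PNG...fake bytes").decode()}
    ]
}

# Decode and write the image immediately; the API does not retain it.
image_bytes = base64.b64decode(response["data"][0]["b64_json"])
with open("generated.png", "wb") as f:
    f.write(image_bytes)
print(len(image_bytes), "bytes written")
```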

We support both Chat Completions and Responses APIs for sending prompts. Choose the endpoint that best fits your use case:

  • Use the Chat Completions API when building or maintaining chat-style integrations that rely on structured messages with roles such as system, user, and assistant, or when migrating existing chat-based code with minimal changes.

  • Use the Responses API when building new integrations or working with newer models that only support the Responses API. It’s also useful for multi-step tool use in a single request, preserving state across turns with store: true, and simplifying requests by using a single input field with improved caching efficiency.
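The two request shapes above can be contrasted side by side. Both bodies are sketches modeled on the OpenAI Chat Completions and Responses schemas that these endpoints are compatible with; the model ID is a placeholder (list real IDs via GET /v1/models).

```python
# Chat Completions: structured, role-tagged messages.
chat_request = {
    "model": "example-model",  # placeholder ID
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize serverless inference in one line."},
    ],
}

# Responses: a single input field; "store": True preserves state across turns.
responses_request = {
    "model": "example-model",  # placeholder ID
    "input": "Summarize serverless inference in one line.",
    "store": True,
}
```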

You can use these endpoints through cURL, Python OpenAI, Gradient Python SDK, and PyDo.

Alternatively, you can call serverless inference from your automation workflows. The n8n community node connects to any DigitalOcean-hosted model using your model access key. You can self-host n8n using the n8n Marketplace app.
