How to Send Prompts to a Model Using the Responses API

Validated on 10 Apr 2026 • Last edited on 16 Apr 2026

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.

The following examples show how to send a prompt to the /v1/responses endpoint using cURL, the OpenAI Python SDK, the Gradient Python SDK, and PyDo. Include your model access key and the following parameters in your request:

  • model: The model ID of the model you want to use. Get the model ID using /v1/models or on the available models page.

  • input: The prompt or input content you want the model to respond to.

  • max_output_tokens: The maximum number of tokens to generate in the response.

  • temperature: A value between 0.0 and 1.0 to control randomness and creativity.

  • stream: Set to true to stream partial responses.

You can also use prompt caching parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.

Send a POST request to the /v1/responses endpoint using your model access key.

The following cURL example sends the prompt What is the capital of France? to the OpenAI GPT-OSS-20B model with a temperature of 0.7 and a maximum of 50 output tokens.

curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-20b",
    "input": "What is the capital of France?",
    "max_output_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'

The response includes structured output, with the model's reasoning and final answer as separate items, and token usage details:

{
  ...
  "output": [
    {
      "content": [
        {
          "text": "We need to answer: The capital of France is Paris. This is straightforward.",
          "type": "reasoning_text"
        }
      ],
      ...
    },
    {
      "content": [
        {
          "text": "The capital of France is **Paris**.",
          "type": "output_text"
        }
      ],
      ...
    }
  ],
  ...
  "usage": {
    "input_tokens": 72,
    "input_tokens_details": {
      "cached_tokens": 32
    },
    "output_tokens": 35,
    "output_tokens_details": {
      "reasoning_tokens": 17,
      "tool_output_tokens": 0
    },
    "total_tokens": 107
  },
  ...
}
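Because the output array can contain both reasoning and answer items, extracting the final text takes a small amount of filtering. The helper below is a minimal sketch that walks a response dictionary shaped like the sample above and concatenates the output_text content, skipping reasoning_text items (the function name is our own, not part of any SDK):

```python
def extract_output_text(response: dict) -> str:
    """Join all output_text parts, skipping reasoning_text items."""
    parts = []
    for item in response.get("output", []):
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)

# Trimmed-down version of the sample response above.
sample = {
    "output": [
        {"content": [{"text": "We need to answer: ...", "type": "reasoning_text"}]},
        {"content": [{"text": "The capital of France is **Paris**.", "type": "output_text"}]},
    ]
}

print(extract_output_text(sample))  # The capital of France is **Paris**.
```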
The following example sends the same request using the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)

# output[0] holds the reasoning item; output[1] holds the final answer
print(resp.output[1].content[0].text)
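The examples on this page set stream to false and read the full response at once. When stream is true, the API instead emits a sequence of events whose text deltas you concatenate as they arrive. The loop below sketches that pattern over a hard-coded list of events rather than a live call, so the event shape here is an assumption for illustration, not the API's exact schema:

```python
# Simulated stream: each event carries a fragment of the final answer.
# With a live request (stream=True), you would iterate over the events
# returned by the SDK instead of this hard-coded list.
events = [
    {"type": "response.output_text.delta", "delta": "The capital of France "},
    {"type": "response.output_text.delta", "delta": "is Paris."},
    {"type": "response.completed"},
]

answer = ""
for event in events:
    if event["type"] == "response.output_text.delta":
        answer += event["delta"]  # in a live UI, print each delta as it arrives

print(answer)  # The capital of France is Paris.
```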
The following example sends the same request using the Gradient Python SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)

# output[0] holds the reasoning item; output[1] holds the final answer
print(resp.output[1].content[0].text)
The following example sends the same request using PyDo:

from pydo import Client
from dotenv import load_dotenv
import os

load_dotenv()

client = Client(token=os.getenv("MODEL_ACCESS_KEY"))

resp = client.inference.create_response(
    body={
        "model": "openai-gpt-oss-20b",
        "input": "What is the capital of France?",
        "max_output_tokens": 50,
        "temperature": 0.7,
    }
)

# output[0] holds the reasoning item; output[1] holds the final answer
print(resp["output"][1]["content"][0]["text"])
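The usage block in the response reports cached input tokens and reasoning tokens separately, which is useful for tracking prompt-cache effectiveness. The snippet below is a small sketch that summarizes a usage dictionary shaped like the sample response above (the helper name is our own):

```python
def summarize_usage(usage: dict) -> dict:
    """Split token usage into cached vs. uncached input and reasoning vs. answer output."""
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "uncached_input_tokens": usage["input_tokens"] - cached,
        "cached_input_tokens": cached,
        "reasoning_tokens": reasoning,
        "answer_tokens": usage["output_tokens"] - reasoning,
        "total_tokens": usage["total_tokens"],
    }

# Values from the sample response above.
usage = {
    "input_tokens": 72,
    "input_tokens_details": {"cached_tokens": 32},
    "output_tokens": 35,
    "output_tokens_details": {"reasoning_tokens": 17, "tool_output_tokens": 0},
    "total_tokens": 107,
}

print(summarize_usage(usage))
```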
