How to Send Prompts to a Model Using the Responses API

Validated on 27 Apr 2026 • Last edited on 27 Apr 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.

The following cURL, OpenAI Python SDK, Gradient Python SDK, and PyDo examples show how to send a prompt to the /v1/responses endpoint. Include your model access key and the following parameters in your request:

  • model: The ID of the model you want to use. Get the model ID from the /v1/models endpoint or the available models page.

  • input: The prompt or input content you want the model to respond to.

  • max_output_tokens: The maximum number of tokens to generate in the response.

  • temperature: A value between 0.0 and 1.0 to control randomness and creativity.

  • stream: Set to true to stream partial responses.

You can also use prompt caching parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.

Send a POST request to the /v1/responses endpoint using your model access key.

The following example request sends the prompt What is the capital of France? to an OpenAI GPT-OSS-20B model with a temperature of 0.7 and a maximum of 50 output tokens.

curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-20b",
    "input": "What is the capital of France?",
    "max_output_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'

The response includes structured output and token usage details:

{
  ...
  "output": [
    {
      "content": [
        {
          "text": "We need to answer: The capital of France is Paris. This is straightforward.",
          "type": "reasoning_text"
        }
      ],
      ...
    },
    {
      "content": [
        {
          "text": "The capital of France is **Paris**.",
          "type": "output_text"
        }
      ],
      ...
    }
  ],
  ...
  "usage": {
    "input_tokens": 72,
    "input_tokens_details": {
      "cached_tokens": 32
    },
    "output_tokens": 35,
    "output_tokens_details": {
      "reasoning_tokens": 17,
      "tool_output_tokens": 0
    },
    "total_tokens": 107
  },
  ...
}
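As a minimal sketch, you can walk the output array to collect the final answer while skipping the reasoning item. The helper below assumes the field names shown in the example response above; other models may return a differently shaped body.

```python
def final_output_text(response):
    """Concatenate the output_text parts of a Responses API body,
    skipping reasoning items."""
    texts = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
    return "".join(texts)

# Works against a body shaped like the example response above:
response = {
    "output": [
        {"content": [{"type": "reasoning_text", "text": "We need to answer..."}]},
        {"content": [{"type": "output_text", "text": "The capital of France is **Paris**."}]},
    ]
}
print(final_output_text(response))  # The capital of France is **Paris**.
```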

The following example sends the same request using the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)

# output[0] is the reasoning item; output[1] holds the final answer.
print(resp.output[1].content[0].text)

The following example sends the same request using the Gradient Python SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)

# output[0] is the reasoning item; output[1] holds the final answer.
print(resp.output[1].content[0].text)

The following example sends the same request using PyDo:

from pydo import Client
from dotenv import load_dotenv
import os

load_dotenv()

client = Client(token=os.getenv("MODEL_ACCESS_KEY"))

resp = client.inference.create_response(
    body={
        "model": "openai-gpt-oss-20b",
        "input": "What is the capital of France?",
        "max_output_tokens": 50,
        "temperature": 0.7,
    }
)

# output[0] is the reasoning item; output[1] holds the final answer.
print(resp["output"][1]["content"][0]["text"])
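When stream is set to true, the endpoint emits partial responses as a sequence of events instead of a single JSON body. The sketch below illustrates one way to accumulate text deltas from such a stream; the event type name response.output_text.delta follows the OpenAI Responses API streaming events and is an assumption here, as is the simulated event list.

```python
def accumulate_deltas(events):
    """Join the text deltas from a sequence of stream events."""
    parts = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)

# Simulated events, as they might arrive over the stream:
events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "The capital of France "},
    {"type": "response.output_text.delta", "delta": "is Paris."},
    {"type": "response.completed"},
]
print(accumulate_deltas(events))  # The capital of France is Paris.
```

In a real client, you would iterate over the server-sent events from the response and print each delta as it arrives rather than buffering the whole answer.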
