How to Send Prompts to a Model Using the Responses API
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, and compare model capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
The following examples show how to send a prompt to the /v1/responses endpoint using cURL, the OpenAI Python SDK, the Gradient Python SDK, and PyDo. Include your model access key and the following parameters in your request:
- model: The model ID of the model you want to use. Get the model ID using /v1/models (see the example after this list) or on the available models page.
- input: The prompt or input content you want the model to respond to.
- max_output_tokens: The maximum number of tokens to generate in the response.
- temperature: A value between 0.0 and 1.0 to control randomness and creativity.
- stream: Set to true to stream partial responses, as shown in the example at the end of this page.
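For example, you can list the available model IDs by sending a GET request to the /v1/models endpoint. This is a minimal sketch that assumes the same base URL and model access key as the examples below; the exact shape of the response may differ:

# List available models; each entry's ID is the model ID to use in requests
curl -sS https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY"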
You can also use prompt caching parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.
Send a POST request to the /v1/responses endpoint using your model access key.
The following example request sends the prompt What is the capital of France? to an OpenAI GPT-OSS-20B model with a temperature of 0.7 and the maximum number of output tokens set to 50.
curl -sS -X POST https://inference.do-ai.run/v1/responses \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-20b",
"input": "What is the capital of France?",
"max_output_tokens": 50,
"temperature": 0.7,
"stream": false
}'

The response includes structured output and token usage details:
{
...
"output": [
{
"content": [
{
"text": "We need to answer: The capital of France is Paris. This is straightforward.",
"type": "reasoning_text"
}
],
...
},
{
"content": [
{
"text": "The capital of France is **Paris**.",
"type": "output_text"
}
],
...
}
],
...
"usage": {
"input_tokens": 72,
"input_tokens_details": {
"cached_tokens": 32
},
"output_tokens": 35,
"output_tokens_details": {
"reasoning_tokens": 17,
"tool_output_tokens": 0
},
"total_tokens": 107
},
...
}

The following example uses the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
resp = client.responses.create(
model="openai-gpt-oss-20b",
input="What is the capital of France?",
max_output_tokens=50,
temperature=0.7,
)
# output[0] is the model's reasoning item; output[1] contains the final answer
print(resp.output[1].content[0].text)

The following example uses the Gradient Python SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
resp = client.responses.create(
model="openai-gpt-oss-20b",
input="What is the capital of France?",
max_output_tokens=50,
temperature=0.7,
)
print(resp.output[1].content[0].text)

The following example uses PyDo:

from pydo import Client
from dotenv import load_dotenv
import os
load_dotenv()
client = Client(token=os.getenv("MODEL_ACCESS_KEY"))
resp = client.inference.create_response(
body={
"model": "openai-gpt-oss-20b",
"input": "What is the capital of France?",
"max_output_tokens": 50,
"temperature": 0.7,
}
)
print(resp["output"][1]["content"][0]["text"])