How to Send Prompts to a Model Using the Chat Completions API
Validated on 10 Apr 2026 • Last edited on 16 Apr 2026
DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.
The following cURL, Python OpenAI, Gradient Python SDK, and PyDo examples show how to send a prompt to a model. Include your model access key and the following fields in your request:

- `model`: The model ID of the model you want to use. Get the model ID using `/v1/models` or on the available models page.
- `messages`: The input prompt or conversation history. Serverless inference does not have sessions, so include all relevant context in this field.
- `temperature`: A value between `0.0` and `1.0` that controls randomness and creativity.
- `max_completion_tokens`: The maximum number of tokens to generate in the response. Use this to manage output length and cost. For Anthropic models, we recommend specifying this parameter for better accuracy and control of the model's response. For models from other providers, this parameter is optional and defaults to around 2048 tokens.
- `max_tokens`: This parameter is deprecated. Use `max_completion_tokens` instead to control the size of the generated response.
You can also use prompt caching and reasoning parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.
Send a POST request to the /v1/chat/completions endpoint using your model access key.
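As a concrete illustration of that request, the sketch below assembles (but does not send) the same POST using only the Python standard library. The endpoint, headers, and field values are taken from the examples in this article; `YOUR_MODEL_ACCESS_KEY` is a placeholder for your own key.

```python
import json
import urllib.request

endpoint = "https://inference.do-ai.run/v1/chat/completions"

# Request body using the fields described above.
body = {
    "model": "llama3.3-70b-instruct",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,
    "max_completion_tokens": 256,
}

req = urllib.request.Request(
    endpoint,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_MODEL_ACCESS_KEY",  # replace with your key
        "Content-Type": "application/json",
    },
    method="POST",
)
# Calling urllib.request.urlopen(req) would send the request;
# it is omitted here so the sketch stays offline.
```

The SDK-based examples below wrap these same mechanics, so in practice you only construct requests by hand when working outside a supported SDK.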
Textual Q&A
The following example sends the prompt What is the capital of France? to a Llama 3.3 Instruct-70B model with a temperature of 0.7 and max_completion_tokens set to 256.
curl -X POST https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"temperature": 0.7,
"max_completion_tokens": 256
}'

The response includes the generated text and token usage details:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"audio": null,
"content": "The capital of France is Paris.",
"refusal": null,
"role": ""
}
}
],
"created": 1747247763,
"id": "",
"model": "llama3.3-70b-instruct",
"object": "chat.completion",
"service_tier": null,
"usage": {
"completion_tokens": 8,
"prompt_tokens": 43,
"total_tokens": 51
}
}

Python OpenAI:

from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
resp = client.chat.completions.create(
model="llama3.3-70b-instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a fun fact about octopuses."}
],
)
print(resp.choices[0].message.content)

Gradient Python SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
resp = client.chat.completions.create(
model="llama3.3-70b-instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a fun fact about octopuses."}
],
)
print(resp.choices[0].message.content)

PyDo:
from pydo import Client
from dotenv import load_dotenv
import os
load_dotenv()
client = Client(token=os.getenv("MODEL_ACCESS_KEY"))
resp = client.inference.create_chat_completion(
body={
"model": "llama3-8b-instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a fun fact about octopuses."},
],
}
)
print(resp["choices"][0]["message"]["content"])