
How to Send Prompts to a Model Using the Chat Completions API

Last verified 13 Jul 2026

Inference provides a single control plane for managing inference workflows. Its Model Catalog lets you view the available foundation models, including both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.


The following Python PyDo, JavaScript dots, cURL, Python OpenAI, and Gradient Python SDK examples show how to send a prompt to a model. In your request, include your model access key and the following parameters:

model: The ID of the model you want to use. Get the model ID from the /v1/models endpoint or from the available models page.
messages: The input prompt or conversation history. Serverless inference does not keep sessions, so include all relevant context in this field on every request.
temperature: A value between 0.0 and 1.0 that controls randomness and creativity. Lower values produce more focused, predictable output; higher values produce more varied output.
max_completion_tokens: The maximum number of tokens to generate in the response. Use this to manage output length and cost. For Anthropic models, we recommend specifying this parameter for better accuracy and control of the model response. For models by other providers, this parameter is optional and defaults to around 2048 tokens.
max_tokens: This parameter is deprecated. Use max_completion_tokens instead to control the size of the generated response.
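Because serverless inference keeps no session state, every request carries the full conversation. The following minimal sketch builds a request body from the parameters above. The helper `build_chat_request` and its checks (temperature in the 0.0 to 1.0 range, a positive token limit) are illustrative assumptions, not part of any DigitalOcean SDK.

```python
import json


def build_chat_request(model, history, user_message, temperature=0.7,
                       max_completion_tokens=256):
    """Return a Chat Completions request body as a dict.

    Hypothetical helper: `history` holds the earlier messages, which must be
    resent on every call because serverless inference has no sessions.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_completion_tokens <= 0:
        raise ValueError("max_completion_tokens must be positive")
    # Build a new list so the caller's history is not mutated.
    messages = list(history) + [{"role": "user", "content": user_message}]
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        # max_completion_tokens replaces the deprecated max_tokens.
        "max_completion_tokens": max_completion_tokens,
    }


history = [
    {"role": "user", "content": "What is the capital of Portugal?"},
    {"role": "assistant", "content": "The capital of Portugal is Lisbon."},
]
body = build_chat_request("llama3.3-70b-instruct", history,
                          "What is its population?")
print(json.dumps(body, indent=2))
```

The follow-up question only makes sense to the model because the earlier question and answer are resent in `messages`.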

You can also use prompt caching and reasoning parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.

Textual Q&A

The following example requests send the prompt What is the capital of Portugal? to the Llama 3.3 Instruct-70B model (llama3.3-70b-instruct), with a temperature of 0.7 and a maximum of 256 completion tokens.

Create a model access key and save it for use with the API.
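A minimal sketch of loading the key in Python, assuming you export it as MODEL_ACCESS_KEY (the name the Python OpenAI and Gradient examples on this page use). The helper `get_model_access_key` is hypothetical; failing early with a clear message is easier to debug than an authentication error returned by the API.

```python
import os


def get_model_access_key(env=None, var_name="MODEL_ACCESS_KEY"):
    """Return the model access key, or raise if it is not set.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set {var_name} to your model access key before calling the API."
        )
    return key
```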

Python

Using PyDo, the official DigitalOcean API client for Python:

import os
from pydo import Client

client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))

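resp = client.chat.completions.create(
    model="llama3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "What is the capital of Portugal?"},
    ],
    temperature=0.7,  # lower values give more predictable answers
    max_completion_tokens=256,  # upper limit on generated tokens
)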

print(resp.choices[0].message.content)

JavaScript

Using dots, the official DigitalOcean API client for JavaScript:

import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
    apiKey: process.env.DIGITALOCEAN_TOKEN,
});

const completion = await client.chat.completions.create({
    model: "llama3.3-70b-instruct",
    messages: [
        { role: "user", content: "What is the capital of Portugal?" },
    ],
    temperature: 0.7,
    max_completion_tokens: 256,
});

console.log(completion.choices[0].message.content);

cURL

Send a POST request to https://inference.do-ai.run/v1/chat/completions.

Using cURL:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -d '{
    "model": "llama3.3-70b-instruct",
    "messages": [{"role": "user", "content": "What is the capital of Portugal?"}],
    "temperature": 0.7,
    "max_completion_tokens": 256
  }' \
  "https://inference.do-ai.run/v1/chat/completions"

The response includes the generated text and token usage details:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "The capital of Portugal is Lisbon.",
        "refusal": null,
        "role": ""
      }
    }
  ],
  "created": 1747247763,
  "id": "",
  "model": "llama3.3-70b-instruct",
  "object": "chat.completion",
  "service_tier": null,
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 43,
    "total_tokens": 51
  }
}
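If you prefer not to add a dependency, the same request can be sent with Python's standard library. This is a sketch, not an official client: the helper names are hypothetical, the request body mirrors the cURL example, and the `"length"` finish reason check assumes the endpoint follows the OpenAI convention for responses cut short by the token limit.

```python
import json
import urllib.request

API_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_chat_http_request(api_key, prompt, model="llama3.3-70b-instruct",
                            temperature=0.7, max_completion_tokens=256):
    """Build, without sending, the same POST request as the cURL example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_response(response):
    """Pull the reply text, truncation flag, and token count from a response."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "answer": choice["message"]["content"],
        # Assumed OpenAI convention: "length" means the reply hit
        # max_completion_tokens and was cut short.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage["prompt_tokens"] + usage["completion_tokens"],
    }


# The sample response above, trimmed to the fields summarize_response reads.
sample = {
    "choices": [{"finish_reason": "stop",
                 "message": {"content": "The capital of Portugal is Lisbon."}}],
    "usage": {"completion_tokens": 8, "prompt_tokens": 43, "total_tokens": 51},
}
print(summarize_response(sample))
```

Against the live endpoint, pass the result of `build_chat_http_request(os.environ["MODEL_ACCESS_KEY"], "What is the capital of Portugal?")` to `send_chat_request`, then hand the returned dict to `summarize_response`.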

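You can also send prompts with the OpenAI Python SDK or the Gradient Python SDK. Both examples read the model access key from the MODEL_ACCESS_KEY environment variable, which load_dotenv() can load from a .env file: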

Python OpenAI

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.chat.completions.create(
    model="llama3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about octopuses."}
    ],
)

print(resp.choices[0].message.content)
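The model IDs mentioned earlier can also be fetched with the same OpenAI client, because `client.models.list()` is the OpenAI Python SDK call for GET /v1/models. A minimal sketch, assuming the DigitalOcean endpoint returns OpenAI-compatible model objects with an `id` field; `list_model_ids` is a hypothetical helper, and the stand-in object only lets the sketch run offline.

```python
from types import SimpleNamespace


def list_model_ids(client):
    """Return the IDs of the models visible to an OpenAI-style client."""
    return [model.id for model in client.models.list()]


# Offline stand-in shaped like the SDK client; with a real client, call
# list_model_ids(client) instead.
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [SimpleNamespace(id="llama3.3-70b-instruct")]
    )
)
print(list_model_ids(fake_client))
```

With the `client` created in the example above, `list_model_ids(client)` returns IDs you can pass as `model`.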

Gradient Python SDK

from gradient import Gradient
from dotenv import load_dotenv
import os

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

resp = client.chat.completions.create(
    model="llama3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about octopuses."}
    ],
)

print(resp.choices[0].message.content)
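Both SDK examples call `client.chat.completions.create` and read `resp.choices[0].message.content`, so a multi-turn helper can accept either client. The sketch below keeps the conversation in a local list and resends it on every turn, since serverless inference has no sessions. `chat_turn` and the offline stub are hypothetical; with a real client, pass the `client` object from either example instead of `fake_client`.

```python
from types import SimpleNamespace


def chat_turn(client, model, history, user_text, **params):
    """Send one user turn with the full history and record the reply.

    Extra keyword arguments (for example temperature or
    max_completion_tokens) are passed through to the API call.
    """
    user_msg = {"role": "user", "content": user_text}
    resp = client.chat.completions.create(
        model=model, messages=history + [user_msg], **params
    )
    reply = resp.choices[0].message.content
    # Record the turn only after the call succeeds, so a failed request
    # leaves the history unchanged.
    history.extend([user_msg, {"role": "assistant", "content": reply}])
    return reply


def _fake_create(model, messages, **params):
    # Stand-in for the SDK call: reports how many messages it received.
    text = f"received {len(messages)} messages"
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

history = [{"role": "system", "content": "You are a helpful assistant."}]
print(chat_turn(fake_client, "llama3.3-70b-instruct", history,
                "Tell me a fun fact about octopuses."))
print(chat_turn(fake_client, "llama3.3-70b-instruct", history,
                "Tell me another one.", temperature=0.7))
```

The second call sends four messages (system prompt, first question, first reply, second question), which is how the model sees the earlier turn.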
