How to Use Batch Inference on DigitalOcean AI
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models (both DigitalOcean-hosted and third-party commercial models), compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.
Batch inference lets you run large collections of LLM requests as a single asynchronous job and retrieve results when processing completes, typically within 24 hours.
With batch inference, you run large sets of text prompts asynchronously against OpenAI and Anthropic models. You use the same model access key and the same serverless inference base URL (https://inference.do-ai.run); DigitalOcean forwards compatible batch traffic to the model provider.
Use Batch Inference Using the API
Batch inference follows a three-step asynchronous pattern: first, prepare and upload your input file and create the batch job; then, poll the job's status; finally, when the job completes, download the results.
Prepare Your Input File
Submit batch inputs as a JSONL (JSON Lines) file, where each line represents one inference request. The file must be 200 MB or smaller and can contain no more than 50,000 requests. Each line follows the batch input schema for the provider you use:
{"custom_id": "req-1", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1-mini", "input": "Summarize the following article: ...", "max_output_tokens": 256}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1-mini", "input": "Classify this review as positive or negative: ...", "temperature": 0.2}}{"custom_id": "req-1", "params": {"model": "claude-3-5-sonnet-latest", "messages": [{"role": "user", "content": [{"type": "text", "text": "Summarize this document: ..."}]}], "max_tokens": 256}}
{"custom_id": "req-2", "params": {"model": "claude-3-5-sonnet-latest", "messages": [{"role": "user", "content": [{"type": "text", "text": "Extract key entities from: ..."}]}], "max_tokens": 256, "temperature": 0.2}}custom_id must be unique within the file. Duplicate custom_id values cause validation to fail.
Upload Your File
Before creating a batch job, upload your JSONL file to get a file_id. We use a two-step presigned upload to avoid routing large payloads through the API gateway. Send the following request with the name of your file to get the file_id and a presigned URL:
curl -X POST https://inference.do-ai.run/v1/batches/files \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_name": "<your-file-name>.jsonl"
}'
The response looks similar to the following:
{
"file_id": "file_abc123",
"upload_url": "<presigned_spaces_url>",
"expires_at": "2026-04-01T12:15:00Z"
}
Note the file_id and upload_url values. The file_id is valid for 29-30 days, and you can reuse it across multiple batch jobs. The presigned URL is valid for 10–15 minutes; upload your file before it expires.
Next, upload the file directly to storage using the following request:
curl -X PUT "<presigned_spaces_url>" \
-H "Content-Type: application/jsonl" \
--data-binary @<your-file-name>.jsonl
If the upload fails, retry with the same presigned URL until it expires. If it expires, get a new one by re-requesting the presigned URL (see the retry sketch after the upload example below).
import os

import requests

# Assumes client is an initialized inference SDK client authenticated with your model access key.

# Request a presigned upload URL for the input file
upload_info = client.batches.files.create(
    file_name="eval_prompts.jsonl",
    file_size_in_bytes=os.path.getsize("eval_prompts.jsonl")
)

# Upload the file directly to storage using the presigned URL
with open("eval_prompts.jsonl", "rb") as f:
    requests.put(
        upload_info.upload_url,
        data=f,
        headers={"Content-Type": "application/jsonl"},
    )

file_id = upload_info.file_id
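If you script retries, a sketch like the following reuses the same presigned URL until it stops working and then requests a fresh one. It reuses the client.batches.files.create call shown above; the 403 status check for an expired presigned URL is an assumption, so adjust it to the errors your uploads actually return:
import os
import time

import requests

def upload_with_retry(client, path, max_attempts=3):
    # Request an initial presigned URL
    upload_info = client.batches.files.create(
        file_name=path,
        file_size_in_bytes=os.path.getsize(path),
    )
    for attempt in range(max_attempts):
        with open(path, "rb") as f:
            resp = requests.put(
                upload_info.upload_url,
                data=f,
                headers={"Content-Type": "application/jsonl"},
            )
        if resp.ok:
            return upload_info.file_id
        if resp.status_code == 403:
            # Presigned URL likely expired; request a new one (assumed behavior)
            upload_info = client.batches.files.create(
                file_name=path,
                file_size_in_bytes=os.path.getsize(path),
            )
        time.sleep(2 ** attempt)  # simple backoff between attempts
    raise RuntimeError(f"Upload of {path} failed after {max_attempts} attempts")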
Create the Batch Job
Once your file is uploaded, create the batch job using the file_id.
curl -X POST https://inference.do-ai.run/v1/batches \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "<your-file-id>",
"completion_window": "24h",
"parameters": {
"temperature": 0.2,
"max_tokens": 1024
}
}'
The response looks similar to the following:
{
"id": "batch_xyz789",
"status": "validating",
"model": "gpt-4o-mini",
"file_id": "file_abc123",
"completion_window": "24h",
"created_at": "2026-04-01T10:00:00Z",
"request_counts": {
"total": 0,
"completed": 0,
"failed": 0
}
}
Note the id value; this is the batch job ID.
batch = client.batches.create(
    file_id=file_id,
    model="gpt-4o-mini",
    completion_window="24h",
    parameters={
        "temperature": 0.2,
        "max_tokens": 1024
    },
    # Optional: receive a notification when the job finishes (see Use Webhooks below)
    webhook={
        "url": "https://your-server.com/batch-webhook",
        "secret": "your-webhook-secret"
    }
)
print(f"Batch created: {batch.id}, status: {batch.status}")
Monitor Job Status
Poll the batch status endpoint to track job progress. The status is updated in near real-time.
curl https://inference.do-ai.run/v1/batches/<your-batch-job-id> \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The following response is returned:
{
"id": "batch_xyz789",
"status": "in_progress",
"model": "gpt-4o-mini",
"created_at": "2026-04-01T10:00:00Z",
"request_counts": {
"total": 50000,
"completed": 23410,
"failed": 12
},
"usage": {
"input_tokens": 14200000,
"output_tokens": 3100000,
"cached_tokens": 800000
}
}
The job status is shown in the status field.
import time

while True:
    batch = client.batches.retrieve("batch_xyz789")
    print(f"Status: {batch.status} | Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # Poll every 60 seconds
Jobs progress through the following states:
- validating: The platform is checking the JSONL file structure, unique custom_id values, token counts, and other basic constraints.
- queued: Validation passed. The job is waiting for compute resources.
- in_progress: A worker is actively executing inference requests.
- completed: All requests have been processed. This state is reached even if some individual requests failed; failed requests are logged in the error file.
- failed: The entire job failed due to a systemic or unrecoverable error, such as complete file validation failure.
- cancelling: A cancellation was requested. In-flight requests may still complete.
- cancelled: The job was cancelled. Results for completed requests are preserved and available.
- expired: The job exceeded the 24-hour completion window. Results for any completed requests are preserved and available.
Download Results
Once the job reaches a terminal state (completed, cancelled, or expired), retrieve your results. The GET /v1/batches/{batch_id} status response includes result_available: true when results are ready to download.
curl https://inference.do-ai.run/v1/batches/<your-batch-job-id>/results \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response looks like the following:
{
"output_url": "<presigned_download_url>",
"error_url": "<presigned_download_url>",
"expires_at": "2026-04-01T11:00:00Z"
}
Each call to this endpoint returns a new presigned URL. Download the files before the URLs expire.
The output file is a JSONL file where each line corresponds to one request from your input. Each line includes the following fields and values:
{"custom_id": "req-1", "response": {"id": "chatcmpl-...", "choices": [{"message": {"role": "assistant", "content": "Summary: ..."}}], "usage": {"prompt_tokens": 312, "completion_tokens": 89}}, "error": null}
{"custom_id": "req-2", "response": null, "error": {"code": "context_length_exceeded", "message": "Request exceeded maximum context length."}}Requests that failed are written to a separate error file that has the following fields and values:
{"custom_id": "req-2", "error": {"code": "context_length_exceeded", "message": "Request exceeded maximum context length."}}
{"custom_id": "req-47", "error": {"code": "content_policy_violation", "message": "Request was blocked by content moderation."}}import requests
import json

import requests

result_info = client.batches.results("batch_xyz789")

# Download the output file
output = requests.get(result_info.output_url)
with open("batch_output.jsonl", "wb") as f:
    f.write(output.content)

# Parse the results line by line
with open("batch_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if record["error"] is None:
            print(record["custom_id"], record["response"]["choices"][0]["message"]["content"])
        else:
            print(f"Failed: {record['custom_id']} - {record['error']['message']}")
Cancel a Batch Job
You can cancel a batch job at any time before it reaches a terminal state. Results for requests that completed before cancellation are preserved and billed; incomplete requests are not billed.
curl -X POST https://inference.do-ai.run/v1/batches/<your-batch-job-id>/cancel \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response looks like the following:
{
"id": "batch_xyz789",
"status": "cancelling"
}
The job transitions to the cancelling status immediately. Because both OpenAI and Anthropic process cancellations asynchronously, the job remains in cancelling until the provider confirms the final state, at which point it transitions to cancelled. Continue polling until you see the cancelled status.
client.batches.cancel("batch_xyz789")

# Poll until the job reaches a terminal state
while True:
    batch = client.batches.retrieve("batch_xyz789")
    if batch.status in ("cancelled", "completed", "failed", "expired"):
        break
    time.sleep(30)

print(f"Final status: {batch.status}")
print(f"Completed requests: {batch.request_counts.completed}")
List Batch Jobs
To retrieve a list of all batch jobs, use one of the following:
curl https://inference.do-ai.run/v1/batches \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
batches = client.batches.list()
for batch in batches:
    print(batch.id, batch.status, batch.created_at)
Use Webhooks
To receive a notification when the job reaches a terminal state (instead of polling), you can configure a webhook when creating a batch job. The request must include the webhook's url and secret in the JSON body:
curl -X POST https://inference.do-ai.run/v1/batches \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "<your-file-id>",
"completion_window": "24h",
"parameters": {
"temperature": 0.2,
"max_tokens": 1024
},
"webhook": {
"url": "https://your-server.com/batch-webhook",
"secret": "your-webhook-secret"
}
}'
If delivery fails, we retry the webhook up to 3 times with exponential backoff. Polling remains available as a fallback; use polling for mission-critical workflows.
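On your server, the webhook handler only needs to acknowledge the delivery and trigger result retrieval. The delivery payload schema and the header that carries the secret are not documented here, so the following Flask sketch makes assumptions about both; adjust it to match the deliveries you actually receive:
import hmac

from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = "your-webhook-secret"  # the secret configured on the batch job

@app.route("/batch-webhook", methods=["POST"])
def batch_webhook():
    # Header name is an assumption; check the headers on a real delivery.
    provided = request.headers.get("X-Webhook-Secret", "")
    if not hmac.compare_digest(provided, WEBHOOK_SECRET):
        abort(401)
    payload = request.get_json(force=True)
    # Field names are assumptions based on the batch status responses above.
    batch_id = payload.get("id")
    status = payload.get("status")
    print(f"Batch {batch_id} reached terminal state: {status}")
    # Fetch results here, for example with client.batches.results(batch_id)
    return "", 200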
Full Python Example
The following is an end-to-end example showing how to create and run a batch inference job:
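This sketch stitches together the snippets above. The SDK client construction is not covered in this guide, so it is left as a placeholder; fill it in with the client you use for serverless inference.
import json
import os
import time

import requests

# Initialize your inference SDK client with your model access key
# (construction not shown in this guide).
client = ...

INPUT_FILE = "eval_prompts.jsonl"

# 1. Request a presigned upload URL and upload the input file.
upload_info = client.batches.files.create(
    file_name=INPUT_FILE,
    file_size_in_bytes=os.path.getsize(INPUT_FILE),
)
with open(INPUT_FILE, "rb") as f:
    requests.put(
        upload_info.upload_url,
        data=f,
        headers={"Content-Type": "application/jsonl"},
    )

# 2. Create the batch job.
batch = client.batches.create(
    file_id=upload_info.file_id,
    model="gpt-4o-mini",
    completion_window="24h",
    parameters={"temperature": 0.2, "max_tokens": 1024},
)
print(f"Batch created: {batch.id}")

# 3. Poll until the job reaches a terminal state.
while True:
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status}")
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

# 4. Download and parse the results.
if batch.status == "completed":
    result_info = client.batches.results(batch.id)
    output = requests.get(result_info.output_url)
    with open("batch_output.jsonl", "wb") as f:
        f.write(output.content)
    with open("batch_output.jsonl") as f:
        for line in f:
            record = json.loads(line)
            if record["error"] is None:
                print(record["custom_id"], "ok")
            else:
                print(record["custom_id"], "failed:", record["error"]["message"])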