How to Use Batch Inference
Validated on 14 May 2026 • Last edited on 15 May 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.
Batch inference lets you run large collections of LLM requests as a single asynchronous job and retrieve results when processing completes, typically within 24 hours.
With batch inference, you run large sets of text prompts asynchronously with OpenAI and Anthropic models. You use the same model access key and the same base URL (https://inference.do-ai.run) as for serverless inference. DigitalOcean forwards compatible batch traffic to the model provider.
Use the Batch Inference API
Batch inference follows a three-step asynchronous pattern. First, prepare and submit your input file and create the batch job. Then, poll the job for results. When the job completes, download the results.
Prepare Your Input File
Submit batch inputs as a JSONL (JSON Lines) file, where each line represents one inference request. Each file must be 200 MB or smaller and contain no more than 50,000 requests. Each line follows the batch input schema for the provider you use:
- OpenAI: Each line must follow the OpenAI Batch API input schema, for example custom_id, method, url, and body for the chosen endpoint. The endpoint you set on the job must match the URLs used in the file (such as /v1/chat/completions).
- Anthropic: Each line should follow Anthropic Message Batches JSONL conventions for batch requests.
Refer to the provider’s batch documentation for the exact per-line JSON shape and size limits.
OpenAI example:
{"custom_id": "req-1", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1-mini", "input": "Summarize the following article: ...", "max_output_tokens": 256}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1-mini", "input": "Classify this review as positive or negative: ...", "temperature": 0.2}}
Anthropic example:
{"custom_id": "req-1", "params": {"model": "claude-3-5-sonnet-latest", "messages": [{"role": "user", "content": [{"type": "text", "text": "Summarize this document: ..."}]}], "max_tokens": 256}}
{"custom_id": "req-2", "params": {"model": "claude-3-5-sonnet-latest", "messages": [{"role": "user", "content": [{"type": "text", "text": "Extract key entities from: ..."}]}], "max_tokens": 256, "temperature": 0.2}}
custom_id must be unique within the file. Duplicate custom_id values cause validation to fail.
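The file limits and the custom_id uniqueness rule can be checked locally before uploading. The following is a minimal sketch, not part of the DigitalOcean API: the function name validate_batch_input is hypothetical, "200 MB" is assumed to mean 200,000,000 bytes, and the service may apply further checks (such as token counts) that this does not cover.

```python
import json

# Limits taken from the text above. Treating "200 MB" as 200,000,000 bytes is
# an assumption; it is the stricter of the two common interpretations.
MAX_FILE_BYTES = 200 * 1000 * 1000
MAX_REQUESTS = 50_000


def validate_batch_input(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_FILE_BYTES}")

    seen_ids = set()
    request_count = 0
    # Split on "\n" only, so unusual Unicode line separators inside JSON
    # strings do not break a record apart.
    for line_no, raw in enumerate(data.decode("utf-8").split("\n"), start=1):
        if not raw.strip():
            continue  # ignore blank lines such as the one after a trailing newline
        request_count += 1
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        custom_id = record.get("custom_id") if isinstance(record, dict) else None
        if not isinstance(custom_id, str) or not custom_id:
            problems.append(f"line {line_no}: missing custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {line_no}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)

    if request_count > MAX_REQUESTS:
        problems.append(f"{request_count} requests; limit is {MAX_REQUESTS}")
    return problems
```

For example, passing the bytes of batch_requests.jsonl to validate_batch_input and uploading only when the returned list is empty avoids spending an upload on a file that would fail validation for these reasons.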
Upload Your JSONL File
Before creating a batch job, create a batch file intent, which creates a file record and returns a presigned PUT URL and a file_id (UUID) that you pass when creating the batch job. We use a two-step presigned upload to avoid routing large payloads through the API gateway. In the cURL request, the file_name must end with .jsonl (case-insensitive).
Create a model access key and save it for use with the API.
Send a POST request to https://inference.do-ai.run/v1/batches/files.
Using cURL:
curl -sS -X POST "https://inference.do-ai.run/v1/batches/files" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_name": "batch_requests.jsonl"
}' | jq
Using PyDo, the official DigitalOcean API client for Python:
import json
import os
from pathlib import Path
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
input_path = Path("batch_requests.jsonl")
requests = [
    {
        "custom_id": "q-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama3.3-70b-instruct",
            "messages": [
                {"role": "user", "content": "One fun fact about octopuses."}
            ],
            "max_tokens": 128,
        },
    },
    {
        "custom_id": "q-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama3.3-70b-instruct",
            "messages": [
                {"role": "user", "content": "One fun fact about sharks."}
            ],
            "max_tokens": 128,
        },
    },
]
input_path.write_text("\n".join(json.dumps(r) for r in requests) + "\n")
uploaded = client.files.create(file=input_path, purpose="batch")
print("file_id: ", uploaded.file_id)
print("filename:", uploaded.filename)
print("bytes: ", uploaded.bytes)
The response looks similar to the following:
{
"expires_at": "2026-04-24T19:34:19Z",
"file_id": "a1b2c3d4-e5f6-4789-90ab-cdef12345678",
"upload_url": "<presigned_spaces_url>"
}
Note the file_id and upload_url values. The file_id is valid for up to 30 days and you can reuse it across multiple batch jobs. The presigned upload URL is short-lived (about 15 minutes). Upload your file before it expires. If you miss the window, create a new file intent and upload again.
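Both the file name rule and the upload window can be checked in client code before attempting the PUT. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that expires_at in the file intent response marks the expiry of the presigned upload URL, which the example response suggests but the text does not state.

```python
from datetime import datetime, timedelta, timezone


def has_jsonl_extension(file_name: str) -> bool:
    """The file name must end with .jsonl; the check is case-insensitive."""
    return file_name.lower().endswith(".jsonl")


def upload_url_is_usable(
    expires_at: str,
    now: datetime,
    margin: timedelta = timedelta(minutes=1),
) -> bool:
    """Return True if the URL is still valid with a safety margin to spare.

    Assumes expires_at uses the UTC format shown in the example response,
    such as 2026-04-24T19:34:19Z.
    """
    expiry = datetime.strptime(expires_at, "%Y-%m-%dT%H:%M:%SZ")
    expiry = expiry.replace(tzinfo=timezone.utc)
    return now + margin < expiry
```

If upload_url_is_usable returns False, requesting a new file intent before uploading is likely cheaper than starting a large PUT that fails partway through.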
Next, send the raw JSONL bytes in a PUT request to the exact upload_url from the previous step. In the request:
- Use --data-binary '@yourfile.jsonl' (or curl -T yourfile.jsonl '<UPLOAD_URL>') so that line endings and UTF-8 are preserved.
- For the Content-Type header, use whatever the presigned URL expects. Many Spaces/S3-style presigned PUT requests are sensitive to headers. If you are unsure, try the request without an extra Content-Type, or use application/octet-stream. Snippets with application/jsonl are nonstandard and can break signature matching. Avoid that value unless your URL was signed for that header.
Complete the PUT request before calling POST /v1/batches.
Send a PUT request to the dynamic URL from the previous step.
Using cURL:
# UPLOAD_URL is the exact upload_url returned by POST /v1/batches/files.
# Use it verbatim; do not modify the host, path, or query string.
#
# Send the raw JSONL bytes with --data-binary so line endings and UTF-8
# are preserved. The presigned URL is signature-sensitive: prefer
# application/octet-stream (or omit Content-Type entirely) — a custom
# value such as application/jsonl can break signature matching unless
# the URL was signed for that exact header.
curl -X PUT "$UPLOAD_URL" \
-H "Content-Type: application/octet-stream" \
--data-binary "@batch_requests.jsonl"
Using PyDo, the official DigitalOcean API client for Python:
# Two-step upload flow:
# 1. Reserve a file_id + presigned upload_url via client.batches.files.create.
# 2. PUT the raw JSONL bytes to upload_url.
#
# The presigned URL is short-lived (~15 minutes) and signature-sensitive —
# use it verbatim and prefer Content-Type application/octet-stream (or
# omit the header entirely). A custom value such as application/jsonl
# can break signature matching.
import os
from pathlib import Path
import requests
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
input_path = Path("batch_requests.jsonl")
# Step 1: reserve the upload slot.
intent = client.batches.files.create(file_name=input_path.name)
upload_url = intent["upload_url"]
file_id = intent["file_id"]
# Step 2: PUT the JSONL bytes to the presigned URL.
with input_path.open("rb") as fh:
    put = requests.put(
        upload_url,
        data=fh,
        headers={"Content-Type": "application/octet-stream"},
        timeout=60,
    )
put.raise_for_status()
print("uploaded file_id:", file_id)
If the upload fails, retry with the same presigned URL until it expires. If the URL expires, create a new file intent to get a new presigned URL and upload again.
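The retry guidance above can be sketched as a small loop. This is one possible approach under stated assumptions, not a prescribed implementation: upload_with_retry, put_once, and url_expired are hypothetical names, and the attempt count and backoff delays are arbitrary illustrative values.

```python
import time
from typing import Callable


def upload_with_retry(
    put_once: Callable[[], None],
    url_expired: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry one upload against the same presigned URL.

    put_once performs a single PUT and raises on failure.
    url_expired reports whether the presigned URL is past its expiry.
    Returns True on success, or False if the URL expired or attempts ran
    out, in which case the caller should create a new file intent.
    """
    for attempt in range(max_attempts):
        if url_expired():
            return False
        try:
            put_once()
            return True
        except Exception:
            # Broad catch keeps the sketch short; real code would catch only
            # transport errors (for example requests.RequestException).
            if attempt + 1 < max_attempts:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    return False
```

Passing the upload as a callable keeps the loop independent of the HTTP client. Note that a file handle opened once cannot be re-sent after a partial read, so put_once should reopen the file on each call.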
Create the Batch Job
Once your file is uploaded, create the batch job by using the file_id. The request body uses the following fields:
- file_id: UUID returned from POST /v1/batches/files.
- provider: Model provider, either openai or anthropic.
- completion_window: Time window in which the job must complete. Currently, only 24h is accepted. Jobs that do not finish within this window transition to expired.
- request_id: Idempotency key you choose, such as a new UUID per logical job. Retries with the same request_id replay the existing job instead of creating a duplicate.
- endpoint: Allowed OpenAI values are /v1/responses and /v1/chat/completions. The value must match the url set on each line of the JSONL file. Required for OpenAI models; omit for Anthropic models.
Creating a batch job performs a HEAD check against object storage and fails until the JSONL file is in place. Always finish the PUT upload from the previous step before calling POST /v1/batches.
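Because the endpoint rule differs by provider, it can help to build the request body in one place and reject invalid combinations before sending. The following is a minimal sketch; build_batch_request is a hypothetical helper, not a PyDo function, and it only encodes the rules described above.

```python
import uuid

# Endpoint values listed in the field descriptions above.
OPENAI_ENDPOINTS = {"/v1/responses", "/v1/chat/completions"}


def build_batch_request(file_id, provider, endpoint=None, request_id=None):
    """Build the JSON body for POST /v1/batches, enforcing the field rules."""
    uuid.UUID(file_id)  # raises ValueError if file_id is not a valid UUID

    if provider == "openai":
        if endpoint not in OPENAI_ENDPOINTS:
            raise ValueError("openai batches require a supported endpoint")
    elif provider == "anthropic":
        if endpoint is not None:
            raise ValueError("anthropic batches must not set endpoint")
    else:
        raise ValueError(f"unsupported provider: {provider!r}")

    body = {
        "file_id": file_id,
        "provider": provider,
        "completion_window": "24h",  # the only accepted value at present
        # Reuse the same request_id when retrying the same logical job.
        "request_id": request_id or str(uuid.uuid4()),
    }
    if endpoint is not None:
        body["endpoint"] = endpoint
    return body
```

Generating the request_id once per logical job and passing it in on every retry is what makes the retries idempotent; letting the helper generate a fresh UUID on each retry would create duplicate jobs.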
Create a model access key and save it for use with the API.
Send a POST request to https://inference.do-ai.run/v1/batches.
Using cURL:
# OpenAI provider — endpoint required (/v1/responses or /v1/chat/completions)
curl -sS -X POST "https://inference.do-ai.run/v1/batches" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_id": "a1b2c3d4-e5f6-4789-90ab-cdef12345678",
"provider": "openai",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"request_id": "c7e3ad1e-20c3-4e47-9bf2-6f2a4d6a2f11"
}'
# Anthropic provider — DO NOT send endpoint
curl -sS -X POST "https://inference.do-ai.run/v1/batches" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_id": "a1b2c3d4-e5f6-4789-90ab-cdef12345678",
"provider": "anthropic",
"completion_window": "24h",
"request_id": "2f1a7d9e-8c03-4d2c-9b7e-6f8e2b1a4c77"
}'
Using PyDo, the official DigitalOcean API client for Python:
import os
import uuid
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
batch = client.batches.create(
file_id=os.environ["BATCH_INPUT_FILE_ID"],
provider="openai",
endpoint="/v1/chat/completions",
completion_window="24h",
request_id=str(uuid.uuid4()),
)
print("batch_id:", batch.get("batch_id"))
print("status: ", batch.get("status"))
The response includes the batch_id and an initial status, similar to the following:
{
"batch_id": "0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21",
...
"provider": "openai",
"request_counts": {
"completed": 0,
"failed": 0,
"total": 10000
},
"request_id": "c7e3ad1e-20c3-4e47-9bf2-6f2a4d6a2f11",
"status": "in_progress"
}
Note the returned batch_id for polling and downloading the results.
Monitor Job Status
Poll the batch status endpoint to track job progress. The status is updated in near real-time.
Create a model access key and save it for use with the API.
Send a GET request to https://inference.do-ai.run/v1/batches/{batch_id}.
Using cURL:
curl -sS -X GET "https://inference.do-ai.run/v1/batches/0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json"
Using PyDo, the official DigitalOcean API client for Python:
import os
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
batch = client.batches.retrieve(os.environ["BATCH_ID"])
print("batch_id: ", batch.get("batch_id"))
print("status: ", batch.get("status"))
print("request_counts:", batch.get("request_counts"))
print("output_file_id:", batch.get("output_file_id"))
The response looks like the following:
{
"batch_id": "0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21",
"cancelled_at": "2026-04-24T19:45:11Z",
"completed_at": "2026-04-24T20:15:30Z",
"completion_window": "24h",
"created_at": "2026-04-24T19:19:19Z",
"endpoint": "/v1/chat/completions",
...
"in_progress_at": "2026-04-24T19:20:05Z",
"input_file_id": "a1b2c3d4-e5f6-4789-90ab-cdef12345678",
"output_file_id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
"provider": "openai",
"request_counts": {
"completed": 0,
"failed": 0,
"total": 10000
},
"request_id": "c7e3ad1e-20c3-4e47-9bf2-6f2a4d6a2f11",
"status": "in_progress"
}
The job status is shown in the status field. Jobs progress through the following states:
- validating: The platform is checking the JSONL file structure, unique custom_id values, token counts, and other basic checks.
- queued: Validation passed. The job is waiting for compute resources.
- in_progress: A worker is actively executing inference requests.
- completed: All requests have been processed. This state is reached even if some individual requests failed. Failed requests are logged in the error file.
- failed: The entire job failed due to a systemic or unrecoverable error such as complete file validation failure.
- cancelling: A cancellation was requested. In-flight requests may still complete.
- cancelled: The job was cancelled. Results for completed requests are preserved and available.
- expired: The job exceeded the 24-hour completion window. Results for any completed requests are preserved and available.
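The state list above maps naturally onto a polling loop that stops at any terminal state. The sketch below is an illustration under assumptions: wait_for_terminal_state and fetch_batch are hypothetical names, and the 30-second interval and 25-hour timeout (the 24-hour window plus a margin) are arbitrary choices, not documented values.

```python
import time

# Terminal states from the list above; every other state means "keep polling".
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def wait_for_terminal_state(
    fetch_batch,
    poll_interval=30.0,
    timeout=25 * 60 * 60,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll until the job reaches a terminal state and return the last response.

    fetch_batch is a zero-argument callable that returns the batch status
    response as a dict, for example a wrapper around
    GET /v1/batches/{batch_id}.
    """
    deadline = clock() + timeout
    while True:
        batch = fetch_batch()
        if batch.get("status") in TERMINAL_STATES:
            return batch
        if clock() >= deadline:
            raise TimeoutError("batch did not reach a terminal state in time")
        sleep(poll_interval)
```

Because completed is reached even when individual requests failed, the caller should still inspect request_counts and the error file after the loop returns.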
Download Results
Once the job reaches a terminal state (completed, cancelled, failed, or expired), retrieve your results. The results endpoint returns result_available: true when the output is ready. If result_available is false, continue polling and retry the request.
Create a model access key and save it for use with the API.
Send a GET request to https://inference.do-ai.run/v1/batches/{batch_id}/results.
Using cURL:
curl -sS -X GET "https://inference.do-ai.run/v1/batches/0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21/results" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" | jq
Using PyDo, the official DigitalOcean API client for Python:
import os
from pathlib import Path
import requests
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
batch_id = os.environ["BATCH_ID"]
links = client.batches.results.retrieve(batch_id)
if not links.get("result_available"):
    print("results not ready yet; poll batch status and retry")
    raise SystemExit(0)
resp = requests.get(links["output_file_url"], timeout=60)
resp.raise_for_status()
out = Path("batch_output.jsonl")
out.write_bytes(resp.content)
print("wrote:", out)
print("----- preview -----")
print(resp.text[:500])
The response looks like the following:
{
"batch_id": "0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21",
"error_file_url": "string",
"expires_at": "2026-04-24T20:19:19Z",
"output_file_url": "https://batch-inference.nyc3.digitaloceanspaces.com/outputs/0e9d1d35-3d1e-4d66-9a2f-8c7e0f6bxxxx.jsonl?X-Amz-Signature=...",
"result_available": true
}
Each call to the results endpoint returns new presigned URLs that are short-lived, so download the files soon after fetching them. The error_file_url is only present when an error file was generated for the job.
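Since error_file_url is optional, client code should not assume both URLs are present. A minimal sketch of selecting what to download follows; pending_downloads and the local file names are hypothetical, and the keys are those shown in the example response.

```python
def pending_downloads(results: dict) -> dict:
    """Map local file names to the presigned URLs present in a results response.

    Returns an empty dict when result_available is false, so the caller
    knows to keep polling instead of downloading.
    """
    if not results.get("result_available"):
        return {}
    downloads = {}
    if results.get("output_file_url"):
        downloads["batch_output.jsonl"] = results["output_file_url"]
    if results.get("error_file_url"):  # only present if an error file exists
        downloads["batch_errors.jsonl"] = results["error_file_url"]
    return downloads
```

Because the URLs are short-lived, it is safer to call the results endpoint again immediately before each download attempt than to store the URLs for later use.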
The output file is a JSONL file where each line corresponds to one request from your input. Each line includes the following fields and values:
{"custom_id": "req-1", "response": {"id": "chatcmpl-...", "choices": [{"message": {"role": "assistant", "content": "Summary: ..."}}], "usage": {"prompt_tokens": 312, "completion_tokens": 89}}, "error": null}
{"custom_id": "req-2", "response": null, "error": {"code": "context_length_exceeded", "message": "Request exceeded maximum context length."}}
Requests that failed are written to a separate error file that has the following fields and values:
{"custom_id": "req-2", "error": {"code": "context_length_exceeded", "message": "Request exceeded maximum context length."}}
{"custom_id": "req-47", "error": {"code": "content_policy_violation", "message": "Request was blocked by content moderation."}}
Cancel a Batch Job
You can cancel a batch job at any time before it reaches a terminal state. Results for requests that were already completed before cancellation are preserved and billed. Incomplete requests are not billed.
Create a model access key and save it for use with the API.
Send a POST request to https://inference.do-ai.run/v1/batches/{batch_id}/cancel.
Using cURL:
curl -sS -X POST "https://inference.do-ai.run/v1/batches/0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21/cancel" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" | jq
Using PyDo, the official DigitalOcean API client for Python:
import os
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
result = client.batches.cancel(os.environ["BATCH_ID"])
print("batch_id: ", result.get("batch_id"))
print("status: ", result.get("status"))
print("cancelled_at:", result.get("cancelled_at"))
The job transitions to a cancelling status immediately. Because both OpenAI and Anthropic process cancellations asynchronously, the job remains in cancelling until the provider confirms the final state, at which point it transitions to cancelled. Continue polling until you see the cancelled status. The response looks like the following:
{
"batch_id": "0e9d1d35-3d1e-4d66-9a2f-8c7e0f6b3e21",
...
"provider": "openai",
"request_counts": {
"completed": 0,
"failed": 0,
"total": 10000
},
"request_id": "c7e3ad1e-20c3-4e47-9bf2-6f2a4d6a2f11",
"status": "cancelling"
}
The cancel endpoint can return 409 Conflict in two cases:
- The job is already in a terminal state (completed, failed, expired, or cancelled).
- The job has not yet been submitted to the upstream provider, so there is no provider-side batch to cancel. Wait until the job moves out of validating and try again.
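The two 409 cases call for different reactions: stop if the job is already finished, or wait and retry if it has not yet reached the provider. One way to express that is sketched below. The names request_cancellation, send_cancel, and fetch_status are hypothetical, as are the attempt count and delay; the sketch assumes send_cancel returns the HTTP status code of the cancel request and fetch_status returns the job's current status string.

```python
import time

TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def request_cancellation(
    send_cancel,
    fetch_status,
    max_attempts=10,
    delay=5.0,
    sleep=time.sleep,
):
    """Try to cancel a job, distinguishing the two 409 Conflict cases.

    Returns "requested" once the cancel call is accepted,
    "already_finished" if the job is in a terminal state, or "gave_up"
    if the job never became cancellable within max_attempts.
    """
    for _ in range(max_attempts):
        code = send_cancel()
        if 200 <= code < 300:
            return "requested"
        if code != 409:
            raise RuntimeError(f"unexpected HTTP status from cancel: {code}")
        if fetch_status() in TERMINAL_STATES:
            return "already_finished"
        # Otherwise the job has likely not reached the provider yet
        # (for example, it is still validating); wait and try again.
        sleep(delay)
    return "gave_up"
```

A "requested" result only means the job entered cancelling; polling the status endpoint is still needed to observe the final cancelled state.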
List Batch Jobs
To retrieve a list of all batch jobs, use one of the following. The endpoint supports pagination through the limit query parameter and an optional status filter (for example, completed, failed, in_progress, validating, queued, cancelling, cancelled, or expired).
Create a model access key and save it for use with the API.
Send a GET request to https://inference.do-ai.run/v1/batches.
Using cURL:
curl -sS -X GET "https://inference.do-ai.run/v1/batches?limit=20" \
-H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
-H "Content-Type: application/json" | jq
Using PyDo, the official DigitalOcean API client for Python:
import os
from pydo import Client
client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
resp = client.batches.list(limit=20)
for b in resp.get("data") or []:
    print(f"{b.get('batch_id'):40} {b.get('status'):12} {b.get('created_at')}")
print("has_more:", resp.get("has_more"))
print("last_id: ", resp.get("last_id"))
Troubleshooting
| Symptom | Likely cause |
|---|---|
| 403 on batch routes | Batch inference is not enabled for the team or account. |
| 400 on file_name | Missing .jsonl extension on the file name. |
| 400 on create batch | file_id is not a UUID, completion_window is invalid, endpoint is missing for an OpenAI batch, or endpoint was set on an Anthropic batch. |
| 409 “upload first” or “not ready” on create batch | The JSONL file was not uploaded to upload_url, or the upload URL expired before the PUT completed. |
| 429 on create batch | Batch create rate limit or daily batch request limit. |
| Empty or missing results | The job is not yet complete, or result_available is false. Continue polling and retry after the job reaches a terminal state. |
Full Python Example
The following is an end-to-end example showing how to create and run a batch inference job: