doctl serverless-inference chat-completions create
Generated on 3 Jun 2026
from doctl version
v1.160.1
Usage
doctl serverless-inference chat-completions create [flags]Description
Creates a chat completion using the specified model. Use –model and –message for quick prompts, or –request for a full JSON body. Use –stream to receive tokens as they arrive via server-sent events.
Example
doctl inference chat-completions create --model llama3-8b-instruct --message "Hello"Flags
| Option | Description |
|---|---|
--help, -h |
Help for this command |
--max-tokens |
Maximum tokens to generate Default: 0 |
--message |
User message (required unless –request is set) |
--model, -m |
Model ID (required unless –request is set) |
--request |
Path to JSON request body. Use “-” for stdin. |
--stream |
Stream using server-sent events Default: false |
--system-message |
Optional system message |
--temperature |
Sampling temperature Default: 0 |
Related Commands
| Command | Description |
|---|---|
| doctl serverless-inference chat-completions | Display commands for creating chat completions |
Global Flags
| Option | Description |
|---|---|
--access-token, -t |
API V2 access token |
--api-url, -u |
Override default API endpoint |
--config, -c |
Specify a custom config file Default: |
--context |
Specify a custom authentication context name |
--http-retry-max |
Set maximum number of retries for requests that fail with a 429 or 500-level error
Default: 5 |
--http-retry-wait-max |
Set the minimum number of seconds to wait before retrying a failed request
Default: 30 |
--http-retry-wait-min |
Set the maximum number of seconds to wait before retrying a failed request
Default: 1 |
--interactive |
Enable interactive behavior. Defaults to true if the terminal supports it (default false)
Default: false |
--output, -o |
Desired output format [text|json] Default: text |
--trace |
Show a log of network activity while performing a command Default: false |
--verbose, -v |
Enable verbose output Default: false |