How to Use fal Models to Generate Image, Audio, or Text-to-Speech
Validated on 10 Apr 2026 • Last edited on 16 Apr 2026
DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.
The following examples show how to generate an image or audio clip, or use text-to-speech, with fal models through the /v1/async-invoke endpoint.
The following example sends a request to generate an image using the fal-ai/flux/schnell model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/flux/schnell",
"input": { "prompt": "A futuristic city at sunset" }
}'

You can update the image generation request to also include the image size, number of inference steps, guidance scale, number of images to generate, and safety checker option:
curl -X POST https://inference.do-ai.run/v1/async-invoke \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/fast-sdxl",
"input": {
"prompt": "A futuristic cityscape at sunset, with flying cars and towering skyscrapers.",
"output_format": "landscape_4_3",
"num_inference_steps": 4,
"guidance_scale": 3.5,
"num_images": 1,
"enable_safety_checker": true
},
"tags": [
{"key": "type", "value": "test"}
]
}'

The following example sends a request to generate a 60-second audio clip using the fal-ai/stable-audio-25/text-to-audio model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/stable-audio-25/text-to-audio",
"input": {
"prompt": "Techno song with futuristic sounds",
"seconds_total": 60
},
"tags": [
{ "key": "type", "value": "test" }
]
}'

The following example sends a request to generate text-to-speech audio using the fal-ai/elevenlabs/tts/multilingual-v2 model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/elevenlabs/tts/multilingual-v2",
"input": {
"text": "This text-to-speech example uses DigitalOcean multilingual voice."
},
"tags": [
{ "key": "type", "value": "test" }
]
}'

When you send a request to the /v1/async-invoke endpoint, it starts an asynchronous job for the image, audio, or text-to-speech generation and returns a request_id. The job's status is initially QUEUED, and the response looks similar to the following:
"completed_at": null,
"created_at": "2026-01-22T19:19:19.112403432Z",
"error": null,
"model_id": "fal-ai/fast-sdxl",
"output": null,
"request_id": "6590784a-ce47-4556-9ff4-53baff2693fb",
"started_at": null,
"status": "QUEUED"Query the status endpoint frequently using the request_id to check the progress of the job:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>/status" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
When the job completes, the status updates to COMPLETED. You can then use the /v1/async-invoke/<request_id> endpoint to fetch the complete generated result:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response includes a URL to the generated image, audio, or text-to-speech file, which you can download or open directly in your browser or app:
{
...
"images": [
{
"content_type": "image/jpeg",
"height": 768,
"url": "https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg",
"width": 1024
}
...
],
"request_id": "6f76e8f7-f6b4-4e20-ab9a-ca0f01a9d2f4",
"started_at": null,
"status": "COMPLETED"
}
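To use the result programmatically, you can pull the file URLs out of the completed response. The sketch below walks the payload recursively rather than hard-coding a key, because the list name varies by output type (images for image models; audio and TTS results use different keys) and because the exact nesting shown above is truncated; treat the shape as illustrative, not a documented schema:

```python
def extract_file_urls(body):
    """Recursively collect string 'url' values from nested dicts and lists."""
    urls = []
    if isinstance(body, dict):
        if isinstance(body.get("url"), str):
            urls.append(body["url"])
        for value in body.values():
            urls.extend(extract_file_urls(value))
    elif isinstance(body, list):
        for item in body:
            urls.extend(extract_file_urls(item))
    return urls

# Sample shaped like the completed response above (truncated fields omitted).
sample = {
    "images": [
        {"content_type": "image/jpeg",
         "url": "https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg"}
    ],
    "status": "COMPLETED",
}
print(extract_file_urls(sample))
# ['https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg']
```

Once you have a URL, you can download the file with any HTTP client, for example `curl -O <url>`.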