How to Use fal Models to Generate Images, Audio, or Text-to-Speech
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, and compare model capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
The following examples show how to generate an image or audio clip, or use text-to-speech, with fal models through the /v1/async-invoke endpoint.
The following example sends a request to generate an image using the fal-ai/fast-sdxl model.
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/fast-sdxl",
  "input": { "prompt": "A futuristic city at sunset" }
}'

You can update the image generation request to also include the output format, number of inference steps, guidance scale, number of images to generate, and safety checker option:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/fast-sdxl",
  "input": {
    "prompt": "A futuristic cityscape at sunset, with flying cars and towering skyscrapers.",
    "output_format": "landscape_4_3",
    "num_inference_steps": 4,
    "guidance_scale": 3.5,
    "num_images": 1,
    "enable_safety_checker": true
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

The following example sends a request to generate a 60-second audio clip using the fal-ai/stable-audio-25/text-to-audio model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/stable-audio-25/text-to-audio",
  "input": {
    "prompt": "Techno song with futuristic sounds",
    "seconds_total": 60
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

The following example sends a request to generate text-to-speech audio using the fal-ai/elevenlabs/tts/multilingual-v2 model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/elevenlabs/tts/multilingual-v2",
  "input": {
    "text": "This text-to-speech example uses DigitalOcean multilingual voice."
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

When you send a request to the /v1/async-invoke endpoint, it starts an asynchronous job for the image, audio, or text-to-speech generation and returns a request_id. The job status is initially QUEUED, and the response looks similar to the following:
"completed_at": null,
"created_at": "2026-01-22T19:19:19.112403432Z",
"error": null,
"model_id": "fal-ai/fast-sdxl",
"output": null,
"request_id": "6590784a-ce47-4556-9ff4-53baff2693fb",
"started_at": null,
"status": "QUEUED"Query the status endpoint frequently using the request_id to check the progress of the job:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>/status" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
When the job completes, the status updates to COMPLETED. You can then use the /v1/async-invoke/<request_id> endpoint to fetch the generated result:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response includes a URL to the generated image, audio, or text-to-speech file, which you can download or open directly in your browser or app:
{
  ...
  "images": [
    {
      "content_type": "image/jpeg",
      "height": 768,
      "url": "https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg",
      "width": 1024
    }
  ],
  ...
  "request_id": "6f76e8f7-f6b4-4e20-ab9a-ca0f01a9d2f4",
  "started_at": null,
  "status": "COMPLETED"
}
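
To save the generated file from a script rather than opening the URL manually, you can extract the URL from the response with a tool like jq. The following is a minimal sketch, assuming the response shape shown above (an images array with a url field), that jq is installed, and that REQUEST_ID still holds the job's request_id; the exact nesting of the output can vary by model, so adjust the jq path if needed.

# Fetch the completed result and download the first generated image.
# Field names follow the example response above; if your model nests its
# output differently, adjust the jq path accordingly.
RESULT=$(curl -s "https://inference.do-ai.run/v1/async-invoke/${REQUEST_ID}" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY")

IMAGE_URL=$(echo "$RESULT" | jq -r '.images[0].url')
curl -o generated-image.jpg "$IMAGE_URL"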