How to Use fal Models to Generate Images, Audio, or Text-to-Speech
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, and compare model capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
The following examples show how to generate an image or audio clip, or use text-to-speech, with fal models through the /v1/async-invoke endpoint.
The following example sends a request to generate an image using the fal-ai/fast-sdxl model.
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/fast-sdxl",
  "input": { "prompt": "A futuristic city at sunset" }
}'

You can update the image generation request to also include the output format, number of inference steps, guidance scale, number of images to generate, and safety checker option:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/fast-sdxl",
  "input": {
    "prompt": "A futuristic cityscape at sunset, with flying cars and towering skyscrapers.",
    "output_format": "landscape_4_3",
    "num_inference_steps": 4,
    "guidance_scale": 3.5,
    "num_images": 1,
    "enable_safety_checker": true
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

The following example sends a request to generate a 60-second audio clip using the fal-ai/stable-audio-25/text-to-audio model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/stable-audio-25/text-to-audio",
  "input": {
    "prompt": "Techno song with futuristic sounds",
    "seconds_total": 60
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

The following example sends a request to generate text-to-speech audio using the fal-ai/elevenlabs/tts/multilingual-v2 model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model_id": "fal-ai/elevenlabs/tts/multilingual-v2",
  "input": {
    "text": "This text-to-speech example uses DigitalOcean multilingual voice."
  },
  "tags": [
    { "key": "type", "value": "test" }
  ]
}'

When you send a request to the /v1/async-invoke endpoint, it starts an asynchronous job for the image, audio, or text-to-speech generation and returns a request_id. The job status is initially QUEUED, and the response looks similar to the following:
"completed_at": null,
"created_at": "2026-01-22T19:19:19.112403432Z",
"error": null,
"model_id": "fal-ai/fast-sdxl",
"output": null,
"request_id": "6590784a-ce47-4556-9ff4-53baff2693fb",
"started_at": null,
"status": "QUEUED"Query the status endpoint frequently using the request_id to check the progress of the job:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>/status" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
When the job completes, the status updates to COMPLETED. You can then use the /v1/async-invoke/<request_id> endpoint to fetch the generated result:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response includes a URL to the generated image, audio, or text-to-speech file, which you can download or open directly in your browser or app:
{
  ...
  "images": [
    {
      "content_type": "image/jpeg",
      "height": 768,
      "url": "https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg",
      "width": 1024
    }
  ],
  ...
  "request_id": "6f76e8f7-f6b4-4e20-ab9a-ca0f01a9d2f4",
  "started_at": null,
  "status": "COMPLETED"
}
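
To save the generated file from a script rather than opening the URL manually, you can extract the URL from the response with a tool like jq. The following is a minimal sketch, assuming the response shape shown above (an images array with a url field), that jq is installed, and that REQUEST_ID still holds the job's request_id; the exact nesting of the output can vary by model, so adjust the jq path if needed.

# Fetch the completed result and download the first generated image.
# Field names follow the example response above; if your model nests its
# output differently, adjust the jq path accordingly.
RESULT=$(curl -s "https://inference.do-ai.run/v1/async-invoke/${REQUEST_ID}" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY")

IMAGE_URL=$(echo "$RESULT" | jq -r '.images[0].url')
curl -o generated-image.jpg "$IMAGE_URL"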