Video Generation

Generate Talking Video

Generate a talking video by animating a face image or video with speech. Provide either text + voice to generate audio, or supply your own audio file.

Input combinations:

Image + text + voice — Generates speech from text using the specified voice, then animates the face in the image
Image + audio — Uses the provided audio to animate the face in the image
Video + text + voice — Generates speech from text, then lip-syncs the video
Video + audio — Lip-syncs the video with the provided audio

Models:

premium — High-quality avatar generation (VEED Fabric). Supports 480p and 720p.
standard — Faster generation (WaveSpeed InfiniteTalk). Supports 480p and 720p.

Video generation is asynchronous. Use the Get Video endpoint to poll for results.

Note: Requires a paid plan. Image must be at least 512x512 pixels. Audio/video max 5 minutes.

POST /api/generate-talking-video

curl --request POST \
  --url https://easy-peasy.ai/api/generate-talking-video \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "image": "https://example.com/portrait.jpg",
  "text": "Hello! Welcome to our product demo.",
  "voiceID": "21m00Tcm4TlvDq8ikWAM",
  "avatarModel": "premium",
  "resolution": "720p"
}
'

{
  "id": 12345,
  "prompt": "Hello! Welcome to our product demo.",
  "image_url": "",
  "is_video": true,
  "created_at": "2025-01-15T10:30:00.000Z"
}
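The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_talking_video_request` is hypothetical and not part of the API, and the request is built without being sent.

```python
import json
import urllib.request

API_URL = "https://easy-peasy.ai/api/generate-talking-video"


def build_talking_video_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_talking_video_request(
    "<api-key>",  # placeholder, as in the curl example
    {
        "image": "https://example.com/portrait.jpg",
        "text": "Hello! Welcome to our product demo.",
        "voiceID": "21m00Tcm4TlvDq8ikWAM",
        "avatarModel": "premium",
        "resolution": "720p",
    },
)

# To actually submit (requires a valid key and a paid plan):
# with urllib.request.urlopen(request) as response:
#     video_id = json.load(response)["id"]
```

Keeping construction separate from sending makes it possible to inspect the headers and body before spending a generation on the request.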

Workflow

1. Submit a talking video request using this endpoint.
2. Save the `id` from the response.
3. Poll the Get Video endpoint every 15–30 seconds until `status` is `completed`.
4. Download the video from the `url` field.
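The polling part of this workflow might look like the sketch below. It makes several assumptions. The Get Video request itself is not described on this page, so it is passed in as a caller-supplied `fetch_video` function. The 20-second interval is one value inside the suggested 15–30 second range, and the 10-minute timeout is an arbitrary choice. The `"processing"` status in the usage example is a made-up placeholder: only `completed` is named above. Failure statuses are not handled because none are documented here.

```python
import time
from typing import Callable


def poll_video(
    fetch_video: Callable[[int], dict],
    video_id: int,
    interval_seconds: float = 20.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the video reports status "completed", then return its url.

    fetch_video is a hypothetical caller-supplied function that calls the
    Get Video endpoint and returns the decoded JSON response.
    """
    waited = 0.0
    while True:
        video = fetch_video(video_id)
        if video.get("status") == "completed":
            return video["url"]
        if waited >= timeout_seconds:
            raise TimeoutError(f"video {video_id} not completed after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds


# Stand-in for the real Get Video call: completes on the third poll.
fake_responses = iter([
    {"status": "processing"},  # placeholder status, not taken from the API docs
    {"status": "processing"},
    {"status": "completed", "url": "https://example.com/video.mp4"},
])
video_url = poll_video(lambda _id: next(fake_responses), 12345, sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default `time.sleep` applies.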

Talking video generation requires a paid plan. Processing typically takes 1–5 minutes depending on audio length and resolution.

Input combinations

Input	Audio Source	Description
`image`	`text` + `voiceID`	Generates speech, then animates the face
`image`	`audio`	Animates the face with provided audio
`video`	`text` + `voiceID`	Generates speech, then lip-syncs the video
`video`	`audio`	Lip-syncs the video with provided audio
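A client can check a request body against these four combinations before sending it. The sketch below is a hypothetical client-side helper, not part of the API. It reads "provide either image or video" as "exactly one of the two", and it lets `audio` take precedence when `text` is also present. Both are assumptions about how the server behaves.

```python
def validate_inputs(body: dict) -> None:
    """Raise ValueError unless body matches one of the four input combinations."""
    has_image = bool(body.get("image"))
    has_video = bool(body.get("video"))
    if has_image == has_video:
        # Neither given, or both given.
        raise ValueError("provide exactly one of 'image' or 'video'")

    if body.get("audio"):
        return  # provided audio is used directly; no text or voice needed
    if not body.get("text"):
        raise ValueError("provide 'audio', or 'text' together with 'voiceID'")
    if not body.get("voiceID"):
        raise ValueError("'voiceID' is required when 'text' is used without 'audio'")


# The four accepted shapes from the table:
validate_inputs({"image": "https://example.com/portrait.jpg", "text": "Hi", "voiceID": "21m00Tcm4TlvDq8ikWAM"})
validate_inputs({"image": "https://example.com/portrait.jpg", "audio": "https://example.com/speech.mp3"})
validate_inputs({"video": "https://example.com/clip.mp4", "text": "Hi", "voiceID": "21m00Tcm4TlvDq8ikWAM"})
validate_inputs({"video": "https://example.com/clip.mp4", "audio": "https://example.com/speech.mp3"})
```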

Voice IDs

Use the Get TTS Voices endpoint to discover available voice IDs. Both ElevenLabs and OpenAI voices are supported.

Requirements

Image: minimum 512x512 pixels
Video: .mp4 or .mov format, 3–300 seconds
Audio: max 5 minutes
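These limits can be pre-checked locally before a request is submitted. The helpers below are a hypothetical sketch. They assume the limits are inclusive and that the video format can be judged from the URL's file extension. They also expect the caller to supply the image dimensions and media durations already measured, since measuring them is not shown here.

```python
import os
from urllib.parse import urlparse

MIN_IMAGE_SIDE_PX = 512
VIDEO_EXTENSIONS = {".mp4", ".mov"}
MIN_VIDEO_SECONDS, MAX_VIDEO_SECONDS = 3, 300
MAX_AUDIO_SECONDS = 5 * 60


def check_image(width: int, height: int) -> bool:
    """True if both sides meet the 512-pixel minimum."""
    return width >= MIN_IMAGE_SIDE_PX and height >= MIN_IMAGE_SIDE_PX


def check_video(url: str, duration_seconds: float) -> bool:
    """True if the URL path ends in .mp4/.mov and the duration is 3-300 seconds."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    return (
        extension in VIDEO_EXTENSIONS
        and MIN_VIDEO_SECONDS <= duration_seconds <= MAX_VIDEO_SECONDS
    )


def check_audio(duration_seconds: float) -> bool:
    """True if the audio is no longer than 5 minutes."""
    return duration_seconds <= MAX_AUDIO_SECONDS
```

Parsing the URL first means a query string such as `?token=...` does not hide the file extension.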

Authorizations

x-api-key

string

header

required

API key for authentication. Get yours at https://easy-peasy.ai/settings/api

Headers

x-api-key

string

required

Your API key

Body

application/json

image

string<uri>

URL of a face image to animate. Must be at least 512x512 pixels. Provide either image or video.

video

string<uri>

URL of a video to lip-sync. Supported formats: .mp4, .mov. Duration: 3–300 seconds. Provide either image or video.

text

string

Text to convert to speech. Required if audio is not provided.

voiceID

string

Voice ID for text-to-speech. Get available voices from the Get TTS Voices endpoint. Required if text is provided and audio is not.

audio

string<uri>

URL of an audio file to use directly (instead of generating from text). Max 5 minutes.

avatarModel

enum<string>

default: premium

Avatar generation model. premium uses VEED Fabric (higher quality); standard uses WaveSpeed InfiniteTalk (faster). Applies only to image input.

Available options: premium, standard

resolution

enum<string>

default: 480p

Output video resolution.

Available options: 480p, 720p

generateCaptions

boolean

Whether to generate captions on the video.

captionColor

string

Highlight color for captions (hex code).

Response

Talking video generation started

id

integer

Video ID. Use this to poll for the result with the Get Video endpoint.

prompt

string

The prompt used for generation

image_url

string

Video URL. Empty string while processing.

model

string

The model used for generation

is_video

boolean

created_at

string<date-time>
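The response fields above map naturally onto a small typed record. The sketch below is illustrative only: the class and function names are hypothetical, `model` is treated as optional because it is documented but missing from the example response, and an empty `image_url` is read as "still processing", as the field description states.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationResponse:
    id: int
    prompt: str
    image_url: str
    is_video: bool
    created_at: str
    model: Optional[str] = None  # documented, but absent from the example response

    @property
    def is_processing(self) -> bool:
        # An empty image_url means the video is not ready yet.
        return self.image_url == ""


def parse_generation_response(raw: str) -> GenerationResponse:
    """Decode the JSON body returned when generation starts."""
    data = json.loads(raw)
    return GenerationResponse(
        id=data["id"],
        prompt=data["prompt"],
        image_url=data.get("image_url", ""),
        is_video=data["is_video"],
        created_at=data["created_at"],
        model=data.get("model"),
    )


# The example response from this page:
example = parse_generation_response("""
{
  "id": 12345,
  "prompt": "Hello! Welcome to our product demo.",
  "image_url": "",
  "is_video": true,
  "created_at": "2025-01-15T10:30:00.000Z"
}
""")
```

The `id` on the parsed record is the value to pass to the Get Video endpoint when polling.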
