Generate a talking video by animating a face image or video with speech. Provide either text + voice to generate audio, or supply your own audio file.
Input combinations:

| Input | Audio Source | Description |
|---|---|---|
| image | text + voiceID | Generates speech, then animates the face |
| image | audio | Animates the face with provided audio |
| video | text + voiceID | Generates speech, then lip-syncs the video |
| video | audio | Lip-syncs the video with provided audio |

Models:
premium: High-quality avatar generation (VEED Fabric). Supports 480p and 720p.
standard: Faster generation (WaveSpeed InfiniteTalk). Supports 480p and 720p.
Video generation is asynchronous. Use the Get Video endpoint to poll for results: take the id from the response, poll until status is completed, then read the video from the url field.
Note: Requires a paid plan. Image must be at least 512x512 pixels. Audio/video max 5 minutes.
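The four input combinations can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the function name `build_payload` is hypothetical, the field names `image`, `video`, `text`, `voiceID`, and `audio` are taken from this page, and treating supplied audio and text + voiceID as mutually exclusive is an interpretation of "either ... or".

```python
from typing import Optional


def build_payload(
    image: Optional[str] = None,
    video: Optional[str] = None,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
    audio: Optional[str] = None,
) -> dict:
    """Return a request body for one of the four documented input combinations.

    Hypothetical helper: only the field names come from the documentation.
    """
    # Exactly one visual input is allowed: image or video.
    if (image is None) == (video is None):
        raise ValueError("provide exactly one of image or video")

    payload = {"image": image} if image is not None else {"video": video}

    if audio is not None:
        # Assumption: supplied audio excludes text + voiceID.
        if text is not None or voice_id is not None:
            raise ValueError("provide either audio or text + voiceID, not both")
        payload["audio"] = audio
    else:
        # Without audio, speech is generated, so both text and voiceID are needed.
        if text is None or voice_id is None:
            raise ValueError("text and voiceID are required when audio is not provided")
        payload["text"] = text
        payload["voiceID"] = voice_id
    return payload
```

For example, `build_payload(image="https://example.com/face.png", audio="https://example.com/speech.mp3")` corresponds to the second row of the table, while passing both `image` and `video` raises `ValueError`.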
API key for authentication. Get yours at https://easy-peasy.ai/settings/api
URL of a face image to animate. Must be at least 512x512 pixels. Provide either image or video.
URL of a video to lip-sync. Supported formats: .mp4, .mov. Duration: 3–300 seconds. Provide either image or video.
Text to convert to speech. Required if audio is not provided.
Voice ID for text-to-speech. Get available voices from the Get TTS Voices endpoint. Required if text is provided and audio is not.
URL of an audio file to use directly (instead of generating from text). Max 5 minutes.
Avatar generation model. premium uses VEED Fabric (higher quality), standard uses WaveSpeed InfiniteTalk (faster). Only applies to image input. Allowed values: premium, standard.
Output video resolution. Allowed values: 480p, 720p.
Whether to generate captions on the video.
Highlight color for captions (hex code).
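The optional settings have small, fixed value sets, so they are easy to check before submitting. This sketch is illustrative only: `check_options` and its argument names are hypothetical (this section does not give the request field names for these settings), and the hex pattern assumes `#RGB` or `#RRGGBB`, since the accepted hex format is not stated.

```python
import re

# Allowed values as listed in the parameter descriptions.
ALLOWED_MODELS = {"premium", "standard"}
ALLOWED_RESOLUTIONS = {"480p", "720p"}

# Assumption: "hex code" means #RGB or #RRGGBB.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def check_options(model=None, resolution=None, highlight_color=None):
    """Raise ValueError if a provided optional setting is not a documented value."""
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if highlight_color is not None and not HEX_COLOR.fullmatch(highlight_color):
        raise ValueError("highlight color must be a hex code such as #FFCC00")
```

Settings left as `None` are skipped, so the server-side defaults (not documented here) still apply.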
Talking video generation started
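Because generation is asynchronous, the started job has to be polled through the Get Video endpoint until status is completed, at which point the url field holds the result. The loop below is a sketch under stated assumptions: `wait_for_video` is a hypothetical helper, and `fetch_status` stands in for whatever call you use to query the Get Video endpoint with the returned id (its URL and response shape beyond `status` and `url` are not given in this section). Failure statuses are not listed here, so the sketch relies on a timeout instead.

```python
import time
from typing import Callable


def wait_for_video(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports status "completed", then return its url.

    fetch_status is a caller-supplied function (hypothetical) that returns the
    parsed Get Video response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        if result.get("status") == "completed":
            return result["url"]
        # No failure status is documented in this section, so give up on timeout.
        if time.monotonic() >= deadline:
            raise TimeoutError("video was not completed before the timeout")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; the default interval and timeout are arbitrary and should be tuned to the expected generation time.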