---
title: OmniAvatar-14B Video Generation
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
suggested_hardware: "a10g-small"
suggested_storage: "large"
short_description: Avatar video generation with adaptive body animation
models:
- OmniAvatar/OmniAvatar-14B
- Wan-AI/Wan2.1-T2V-14B
- facebook/wav2vec2-base-960h
tags:
- avatar-generation
- video-generation
- text-to-video
- audio-driven-animation
- lip-sync
- body-animation
preload_from_hub:
- OmniAvatar/OmniAvatar-14B
- facebook/wav2vec2-base-960h
---

# 🎬 OmniAvatar-14B: Avatar Video Generation with Adaptive Body Animation

**This is a VIDEO GENERATION application that creates animated avatar videos, not just audio!**

## 🎯 What This Application Does

### **PRIMARY FUNCTION: Avatar Video Generation**

- ✅ **Generates 480p MP4 videos** of animated avatars
- ✅ **Audio-driven lip-sync** with precise mouth movements
- ✅ **Adaptive body animation** that responds to speech content
- ✅ **Reference image support** for character consistency
- ✅ **Prompt-controlled behavior** for specific actions and expressions

### **Input → Output:**

```
Text Prompt + Audio/TTS → MP4 Avatar Video (480p, 25fps)
```

**Example:**

- **Input**: "A professional teacher explaining mathematics" + "Hello students, today we'll learn calculus"
- **Output**: MP4 video of an avatar teacher with lip-sync and teaching gestures

## 🚀 Quick Start - Video Generation

### **1. Generate Avatar Videos**

- **Web Interface**: Use the Gradio interface above
- **API Endpoint**: Available at `/generate`

### **2. Model Requirements**

This application requires large models (~30GB) for video generation:

- **Wan2.1-T2V-14B**: Base text-to-video model (~28GB)
- **OmniAvatar-14B**: Avatar animation weights (~2GB)
- **wav2vec2-base-960h**: Audio encoder (~360MB)

*Note: Models will be automatically downloaded on first use*

## 🎬 Video Generation Examples

### **Web Interface Usage:**

1. **Enter character description**: "A friendly news anchor delivering breaking news"
2. **Provide speech text**: "Good evening, this is your news update"
3. **Select voice profile**: Choose from available options
4. **Generate**: Click to create your avatar video

### **Expected Output:**

- **Format**: MP4 video file
- **Resolution**: 480p (854x480)
- **Frame Rate**: 25fps
- **Duration**: Matches audio length (up to 30 seconds)
- **Features**: Lip-sync, body animation, realistic movements

## 🎯 Prompt Engineering for Videos

### **Effective Prompt Structure:**

```
[Character Description] + [Behavior/Action] + [Setting/Context]
```

### **Examples:**

- `"A professional doctor explaining medical procedures with gentle hand gestures - white coat - modern clinic"`
- `"An energetic fitness instructor demonstrating exercises - athletic wear - gym environment"`
- `"A calm therapist providing advice with empathetic expressions - cozy office setting"`

### **Tips for Better Videos:**

1. **Be specific about appearance** - clothing, hair, age, etc.
2. **Include desired actions** - gesturing, pointing, demonstrating
3. **Specify the setting** - office, classroom, studio, outdoor
4. **Mention emotion/tone** - confident, friendly, professional, energetic

## ⚙️ Configuration

### **Video Quality Settings:**

- **Guidance Scale**: Controls prompt adherence (4-6 recommended)
- **Audio Scale**: Controls lip-sync strength (3-5 recommended)
- **Steps**: Quality vs speed trade-off (20-50 steps)

### **Performance:**

- **GPU Accelerated**: Optimized for A10G hardware
- **Generation Time**: ~30-60 seconds per video
- **Quality**: Professional 480p output with smooth animation

## 🔧 Technical Details

### **Model Architecture:**

- **Base**: Wan2.1-T2V-14B for text-to-video generation
- **Avatar**: OmniAvatar-14B LoRA weights for character animation
- **Audio**: wav2vec2-base-960h for speech feature extraction

### **Capabilities:**

- Audio-driven facial animation with precise lip-sync
- Adaptive body gestures based on speech content
- Character consistency with reference images
- High-quality 480p video output at 25fps

## 💡 Important Notes

### **This is a VIDEO Generation Application:**

- 🎬 **Primary Output**: MP4 avatar videos with animation
- 🎤 **Audio Input**: Text-to-speech or direct audio files
- 🎯 **Core Feature**: Adaptive body animation synchronized with speech
- ✨ **Advanced**: Reference image support for character consistency

## 🔗 References

- **OmniAvatar Paper**: [arXiv:2506.18866](https://arxiv.org/abs/2506.18866)
- **Model Hub**: [OmniAvatar/OmniAvatar-14B](https://huggingface.co/OmniAvatar/OmniAvatar-14B)
- **Base Model**: [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)

---

**🎬 This application creates AVATAR VIDEOS with adaptive body animation - professional quality video generation!**
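
The prompt structure, recommended settings, and output figures described in this README can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: `build_prompt`, `GenerationSettings`, and `frame_count` are hypothetical names. Joining prompt parts with `" - "` and capping audio at 30 seconds are assumptions based on the examples and limits listed above. Only the numeric ranges, the 25fps frame rate, and the 30-second limit come from this document.

```python
from dataclasses import dataclass
from typing import List

FPS = 25             # frame rate stated in this README
MAX_DURATION_S = 30  # maximum clip length stated in this README


def build_prompt(character: str, behavior: str = "", setting: str = "") -> str:
    """Join the non-empty prompt parts with " - " (assumed separator,
    modeled on the README's example prompts)."""
    if not character.strip():
        raise ValueError("a character description is required")
    parts = [p.strip() for p in (character, behavior, setting) if p.strip()]
    return " - ".join(parts)


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; field names are illustrative only.
    Defaults sit in the middle of the README's recommended ranges."""
    guidance_scale: float = 5.0
    audio_scale: float = 4.0
    num_steps: int = 30

    def warnings(self) -> List[str]:
        """Return one note per value outside the recommended range."""
        notes = []
        if not 4 <= self.guidance_scale <= 6:
            notes.append("guidance_scale outside recommended 4-6")
        if not 3 <= self.audio_scale <= 5:
            notes.append("audio_scale outside recommended 3-5")
        if not 20 <= self.num_steps <= 50:
            notes.append("num_steps outside recommended 20-50")
        return notes


def frame_count(audio_seconds: float) -> int:
    """Frames needed at 25fps, assuming audio longer than 30 s is capped."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return round(min(audio_seconds, MAX_DURATION_S) * FPS)


if __name__ == "__main__":
    prompt = build_prompt(
        "A calm therapist",
        "providing advice with empathetic expressions",
        "cozy office setting",
    )
    print(prompt)
    print(GenerationSettings(num_steps=10).warnings())
    print(frame_count(12.0))
```

Values outside the recommended ranges produce warnings rather than errors, since the README presents those ranges as recommendations and not as hard limits.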