---
title: OmniAvatar-14B Video Generation
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
suggested_hardware: "a10g-small"
suggested_storage: "large"
short_description: Avatar video generation with adaptive body animation
models:
- OmniAvatar/OmniAvatar-14B
- Wan-AI/Wan2.1-T2V-14B
- facebook/wav2vec2-base-960h
tags:
- avatar-generation
- video-generation
- text-to-video
- audio-driven-animation
- lip-sync
- body-animation
preload_from_hub:
- OmniAvatar/OmniAvatar-14B
- facebook/wav2vec2-base-960h
---

# 🎬 OmniAvatar-14B: Avatar Video Generation with Adaptive Body Animation

**This application generates animated avatar videos (MP4), not audio alone.**

## 🎯 What This Application Does

### **PRIMARY FUNCTION: Avatar Video Generation**
- ✅ **Generates 480p MP4 videos** of animated avatars
- ✅ **Audio-driven lip-sync** with precise mouth movements  
- ✅ **Adaptive body animation** that responds to speech content
- ✅ **Reference image support** for character consistency
- ✅ **Prompt-controlled behavior** for specific actions and expressions

### **Input → Output:**
```
Text Prompt + Audio/TTS → MP4 Avatar Video (480p, 25fps)
```

**Example:**
- **Input**: prompt "A professional teacher explaining mathematics" + speech text "Hello students, today we'll learn calculus"
- **Output**: MP4 video of an avatar teacher with lip-sync and teaching gestures

## 🚀 Quick Start - Video Generation

### **1. Generate Avatar Videos**
- **Web Interface**: Use the Gradio interface above
- **API Endpoint**: Available at `/generate`
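
A minimal sketch of how a client might prepare a call to `/generate`, assuming the endpoint accepts a JSON body over plain HTTP POST. The field names (`prompt`, `text_to_speech`, `voice_id`), the voice value, and the local URL are illustrative assumptions, not the confirmed schema; check `app.py` for the real one. If the endpoint is exposed as a named Gradio API instead, a Gradio client call would be the better fit. The request is only built here, never sent.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, text_to_speech, voice_id="default"):
    """Build (but do not send) a POST request for the /generate endpoint.

    The JSON field names are hypothetical placeholders for illustration.
    """
    payload = {
        "prompt": prompt,                  # character description
        "text_to_speech": text_to_speech,  # speech text to synthesize
        "voice_id": voice_id,              # assumed voice profile identifier
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "http://localhost:7860",
    "A professional teacher explaining mathematics",
    "Hello students, today we'll learn calculus",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it, provided the server really accepts this shape.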

### **2. Model Requirements**
This application requires large models (~30GB) for video generation:
- **Wan2.1-T2V-14B**: Base text-to-video model (~28GB)
- **OmniAvatar-14B**: Avatar animation weights (~2GB)  
- **wav2vec2-base-960h**: Audio encoder (~360MB)

*Note: Models are downloaded automatically on first use.*
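
Because the first run downloads all three models, it can help to check free disk space beforehand. The sketch below adds up the approximate sizes listed above and compares the total with the free space at a given path. The 5 GB headroom and the helper names are assumptions chosen for illustration.

```python
import shutil

# Approximate sizes in GB, taken from the list above.
MODEL_SIZES_GB = {
    "Wan-AI/Wan2.1-T2V-14B": 28.0,
    "OmniAvatar/OmniAvatar-14B": 2.0,
    "facebook/wav2vec2-base-960h": 0.36,
}


def total_download_gb(sizes):
    """Sum the approximate model sizes in gigabytes."""
    return sum(sizes.values())


def has_enough_disk(path, required_gb, headroom_gb=5.0):
    """Return True if `path` has room for the models plus some headroom."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb + headroom_gb


required = total_download_gb(MODEL_SIZES_GB)
print(f"Approximate download size: {required:.2f} GB")
print("Enough disk space:", has_enough_disk(".", required))
```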

## 🎬 Video Generation Examples

### **Web Interface Usage:**
1. **Enter character description**: "A friendly news anchor delivering breaking news"
2. **Provide speech text**: "Good evening, this is your news update"
3. **Select voice profile**: Choose from available options
4. **Generate**: Click to create your avatar video

### **Expected Output:**
- **Format**: MP4 video file
- **Resolution**: 480p (854x480)
- **Frame Rate**: 25fps
- **Duration**: Matches audio length (up to 30 seconds)
- **Features**: Lip-sync, body animation, realistic movements
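
Since the duration follows the audio length at 25fps, the number of frames in the output can be estimated directly. The sketch below assumes that audio longer than 30 seconds is clipped to the limit; the application may reject such input instead, so treat the clipping as an assumption.

```python
FPS = 25                   # frame rate stated above
MAX_DURATION_SECONDS = 30  # maximum duration stated above


def expected_frame_count(audio_seconds, fps=FPS, max_seconds=MAX_DURATION_SECONDS):
    """Estimate the number of video frames for a given audio length.

    Assumes audio beyond `max_seconds` is clipped (an illustrative choice).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    clipped = min(audio_seconds, max_seconds)
    return round(clipped * fps)


print(expected_frame_count(4.0))   # 100
print(expected_frame_count(30.0))  # 750
print(expected_frame_count(45.0))  # 750 (clipped to the 30-second limit)
```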

## 🎯 Prompt Engineering for Videos

### **Effective Prompt Structure:**
```
[Character Description] + [Behavior/Action] + [Setting/Context]
```

### **Examples:**
- `"A professional doctor explaining medical procedures with gentle hand gestures - white coat - modern clinic"`
- `"An energetic fitness instructor demonstrating exercises - athletic wear - gym environment"`  
- `"A calm therapist providing advice with empathetic expressions - cozy office setting"`
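
The structure above can be assembled programmatically. This is a small sketch with a hypothetical helper, `build_prompt`, that joins the character and behavior into one phrase and appends appearance or setting details with the ` - ` separator used in the examples.

```python
def build_prompt(character, behavior, *details):
    """Compose a prompt as '<character> <behavior> - <detail> - <detail>'.

    Empty or whitespace-only parts are skipped.
    """
    lead = " ".join(part.strip() for part in (character, behavior) if part.strip())
    extras = [detail.strip() for detail in details if detail.strip()]
    return " - ".join([lead] + extras)


prompt = build_prompt(
    "A professional doctor",
    "explaining medical procedures with gentle hand gestures",
    "white coat",
    "modern clinic",
)
print(prompt)
```

With these arguments the helper reproduces the first example prompt in the list above.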

### **Tips for Better Videos:**
1. **Be specific about appearance** - clothing, hair, age, etc.
2. **Include desired actions** - gesturing, pointing, demonstrating
3. **Specify the setting** - office, classroom, studio, outdoor
4. **Mention emotion/tone** - confident, friendly, professional, energetic

## ⚙️ Configuration

### **Video Quality Settings:**
- **Guidance Scale**: Controls prompt adherence (4-6 recommended)
- **Audio Scale**: Controls lip-sync strength (3-5 recommended) 
- **Steps**: Quality vs speed trade-off (20-50 steps)
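
One way to keep settings within the recommended ranges is to check them before starting a generation. The sketch below is illustrative only: the `GenerationSettings` class, its field names, and its defaults are assumptions and may not match the parameters used in `app.py`.

```python
from dataclasses import dataclass

# Recommended ranges from the list above (inclusive bounds).
RECOMMENDED_RANGES = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


@dataclass
class GenerationSettings:
    guidance_scale: float = 5.0  # prompt adherence
    audio_scale: float = 4.0     # lip-sync strength
    num_steps: int = 30          # quality vs speed

    def out_of_range(self):
        """Return the names of settings outside the recommended ranges."""
        return [
            name
            for name, (low, high) in RECOMMENDED_RANGES.items()
            if not low <= getattr(self, name) <= high
        ]


settings = GenerationSettings(guidance_scale=9.0, num_steps=10)
print(settings.out_of_range())  # ['guidance_scale', 'num_steps']
```

Values outside these ranges are not necessarily invalid; the check only flags departures from the recommendations.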

### **Performance:**
- **GPU Accelerated**: Optimized for A10G hardware
- **Generation Time**: ~30-60 seconds per video
- **Quality**: Professional 480p output with smooth animation

## 🔧 Technical Details

### **Model Architecture:**
- **Base**: Wan2.1-T2V-14B for text-to-video generation
- **Avatar**: OmniAvatar-14B LoRA weights for character animation
- **Audio**: wav2vec2-base-960h for speech feature extraction

### **Capabilities:**
- Audio-driven facial animation with precise lip-sync
- Adaptive body gestures based on speech content
- Character consistency with reference images
- High-quality 480p video output at 25fps

## 💡 Important Notes

### **This is a VIDEO Generation Application:**
- 🎬 **Primary Output**: MP4 avatar videos with animation
- 🎤 **Audio Input**: Text-to-speech or direct audio files
- 🎯 **Core Feature**: Adaptive body animation synchronized with speech
- ✨ **Advanced**: Reference image support for character consistency

## 🔗 References

- **OmniAvatar Paper**: [arXiv:2506.18866](https://arxiv.org/abs/2506.18866)
- **Model Hub**: [OmniAvatar/OmniAvatar-14B](https://huggingface.co/OmniAvatar/OmniAvatar-14B)
- **Base Model**: [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)

---

**🎬 This application creates avatar videos with adaptive body animation synchronized to speech.**