AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published 13 days ago • 41
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation Paper • 2508.19320 • Published Aug 26 • 29
Saire2023/wav2vec2-base-finetuned-Speaker-Classification Audio Classification • 94.6M • Updated Apr 16, 2024 • 11 • 2
harshit345/xlsr-wav2vec-speech-emotion-recognition Audio Classification • Updated Dec 12, 2021 • 644 • 62
ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition Audio Classification • 0.3B • Updated Oct 24, 2024 • 37.9k • 236
speechbrain/emotion-recognition-wav2vec2-IEMOCAP Audio Classification • Updated Jul 23, 2024 • 701k • 165