UI-TARS
🌖
33
Find click coordinates on images based on instructions
Track, rank and evaluate open LLMs and chatbots
Generate realistic audio from text
Generate text based on prompts
Generate a talking face video from an image and audio
Generate animated face images using a driving video
Generate a talking face video from text in multiple languages