DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published 7 days ago • 146
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness Paper • 2504.01901 • Published Apr 2
PairUni: Pairwise Training for Unified Multimodal Language Models Paper • 2510.25682 • Published Oct 29 • 13
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs Paper • 2511.07250 • Published about 1 month ago • 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving Paper • 2510.12796 • Published Oct 14 • 12
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21 • 36