- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2503.20215
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 497
- When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
  Paper • 2510.07499 • Published • 48
- Improving Context Fidelity via Native Retrieval-Augmented Reasoning
  Paper • 2509.13683 • Published • 8
- Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
  Paper • 2509.00798 • Published • 1
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context
  Paper • 2510.23763 • Published • 53
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 89
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 140
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
  Paper • 2510.13747 • Published • 29