Article We’re open-sourcing our text-to-image model and the process behind it • 24 days ago • 73
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation Paper • 2411.19331 • Published Nov 28, 2024 • 5
facebook/dinov2-with-registers-small Image Feature Extraction • 22.1M • Updated Dec 23, 2024 • 11.5k • 9
facebook/dinov2-with-registers-base Image Feature Extraction • 86.6M • Updated Dec 23, 2024 • 177k • 7
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 155