samsam55's Collections

Visual Multi Modal LLM

updated 25 days ago

  • NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

    Paper • 2510.08565 • Published Oct 9 • 19

  • Detect Anything via Next Point Prediction

    Paper • 2510.12798 • Published Oct 14 • 46

  • PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

    Paper • 2510.14528 • Published Oct 16 • 103

  • DeepEyesV2: Toward Agentic Multimodal Model

    Paper • 2511.05271 • Published 30 days ago • 42