Visual Multi Modal LLM - a samsam55 Collection

updated 25 days ago

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9 • 19

Detect Anything via Next Point Prediction

Paper • 2510.12798 • Published Oct 14 • 46

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16 • 103

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published 30 days ago • 42