Merged ColGemma3 Model

This model is a linear merge of the two ColGemma3 models listed below.

Source Models

  1. Nayana-cognitivelab/NayanaEmbed-ColGemma3-Modal-1848-colbert
  2. Nayana-cognitivelab/NayanaEmbed-ColGemma3-MultiGPU-merged-1610-22-colbert

Merge Method: LINEAR

Linear interpolation: each parameter of the merged model is a weighted average of the corresponding parameters of the source models.
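
As a rough illustration of what a linear merge computes, the sketch below averages two toy parameter sets. It is not the script used to produce this model: the function name linear_merge, the parameter names, and the 0.5/0.5 weights are all hypothetical (the actual merge weights are not stated here), and plain Python lists stand in for real tensors.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model state dict: parameter name -> flattened values.
StateDict = Dict[str, List[float]]


def linear_merge(state_dicts: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Return the weighted average of several parameter sets (hypothetical helper)."""
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalise so the weights sum to 1 and the result stays on the same scale.
    norm = [w / total for w in weights]

    merged: StateDict = {}
    for name, reference in state_dicts[0].items():
        # Every source model must provide this parameter with the same size.
        for sd in state_dicts[1:]:
            if name not in sd or len(sd[name]) != len(reference):
                raise ValueError(f"parameter mismatch: {name}")
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(len(reference))
        ]
    return merged


# Illustrative values only; real checkpoints hold billions of parameters.
model_a = {"proj.weight": [1.0, 2.0], "proj.bias": [0.0]}
model_b = {"proj.weight": [3.0, 6.0], "proj.bias": [1.0]}
merged = linear_merge([model_a, model_b], weights=[0.5, 0.5])
print(merged)
```

With equal weights this reduces to a plain element-wise mean; unequal weights bias the merged model toward one source.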

Model Architecture

ColGemma3 is a vision-language model for late-interaction retrieval:

  • Base: Gemma3 vision-language model
  • Vision Encoder: Processes images into patch embeddings
  • Custom Projection: Projects embeddings to 128 dimensions
  • Retrieval: Uses MaxSim scoring for multi-vector retrieval
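
To make the MaxSim step concrete, here is a minimal sketch of the scoring rule as it is usually defined for late-interaction retrieval: each query vector is matched to its most similar document vector, and those maxima are summed. The function names and the 2-dimensional toy vectors are illustrative assumptions (the model itself projects to 128 dimensions), and the processor's own scoring method may differ in details such as batching or normalisation.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best dot product with any document vector."""
    if not doc_vecs:
        raise ValueError("document has no vectors")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy multi-vector embeddings: one vector per query token / image patch.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 0.5]]

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
best = max(scores, key=scores.get)
print(scores, best)
```

Because each query vector picks its own best match, a document scores well when different patches answer different parts of the query, which is the motivation for keeping one embedding per patch rather than a single pooled vector.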

Usage

from colpali_engine.models.gemma3.colgemma3 import ColGemma3, ColGemmaProcessor3
from PIL import Image
import torch

# Load model and processor
model = ColGemma3.from_pretrained("Nayana-cognitivelab/NayanaEmbed-ColGemma3-Merge-Colbert-base-nayana-linear", torch_dtype=torch.bfloat16, device_map="auto")
processor = ColGemmaProcessor3.from_pretrained("Nayana-cognitivelab/NayanaEmbed-ColGemma3-Merge-Colbert-base-nayana-linear")

# Process images
images = [Image.open("document.png")]
batch_images = processor.process_images(images).to(model.device)

# Process queries
queries = ["What is this document about?"]
batch_queries = processor.process_queries(queries).to(model.device)

# Generate embeddings
with torch.no_grad():
    img_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

# Compute similarity scores
scores = processor.score([query_embeddings[0]], [img_embeddings[0]])

Citation

If you use this model, please cite the original ColGemma3 work and the source models.


This model was automatically merged using Modal infrastructure.

Model size: 4B params
Tensor type: BF16
Format: Safetensors