SINC-V1-SIGLIP2-KEE-SPEED-Small: Multimodal Product Classifier
Shopify Image Niche Classifier (V1-small): a high-performance multimodal product classification model that combines product titles (text) with product images to classify products into 14 categories.
Model Performance
- Test Accuracy: 92.46%
- Test Macro F1-Score: 0.73
- Test Loss: 0.2450
- Test Set Size: 59,789 samples
Per-Class Performance
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Fashion | 0.97 | 0.99 | 0.98 | 35,880 |
| Jewelry | 0.95 | 0.97 | 0.96 | 2,474 |
| Baby & Kids | 0.91 | 0.88 | 0.90 | 2,511 |
| Consumer Electronics | 0.91 | 0.83 | 0.87 | 1,599 |
| Lights | 0.90 | 0.92 | 0.91 | 3,504 |
| Others | 0.90 | 0.68 | 0.77 | 1,930 |
| Beauty & Personal Care | 0.85 | 0.85 | 0.85 | 1,354 |
| Home & Interior | 0.81 | 0.89 | 0.85 | 6,006 |
| Sports & Fitness Equipment | 0.80 | 0.61 | 0.69 | 727 |
| Outdoor, Garden & Adventure Gear | 0.79 | 0.72 | 0.76 | 1,962 |
| Health & Supplements | 0.74 | 0.56 | 0.64 | 1,240 |
| Hobbies & Collectibles | 0.64 | 0.57 | 0.61 | 435 |
| Office Supplies | 0.51 | 0.45 | 0.48 | 153 |
| Food & Beverages | 0.00 | 0.00 | 0.00 | 14 |
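The per-class numbers above follow the standard precision/recall/F1/support breakdown. As a minimal sketch of how such a report can be reproduced with scikit-learn, assuming you have integer class predictions for the test set (`y_true`, `y_pred`, and `labels` here are placeholders, not artifacts shipped with this repository):

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

def evaluate(y_true, y_pred, labels):
    # Overall accuracy and macro-averaged F1, plus the per-class table shown above
    print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
    print(f"Macro F1: {f1_score(y_true, y_pred, average='macro'):.2f}")
    print(classification_report(y_true, y_pred, target_names=labels, digits=2))
```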
Model Architecture
- Text Encoder: microsoft/deberta-v3-small
- Image Encoder: google/siglip2-base-patch16-256
- Fusion: Concatenation + MLP (512 hidden units); see the sketch after this list
- Output: 14-class classifier
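The actual head is defined in `model_arch.py` in the repository. For orientation only, a minimal sketch of a concatenation-plus-MLP fusion head matching the description above; the pooling strategy, dropout rate, and exact layer layout are assumptions, not the repository's code:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultimodalClassifier(nn.Module):
    """Concatenation fusion: text features + image features -> 512-unit MLP -> class logits."""

    def __init__(self, text_model_name, clip_model, num_labels,
                 text_finetune=True, clip_finetune=False, hidden_dim=512):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model_name)
        self.clip_model = clip_model

        # Freeze or unfreeze each encoder as requested
        for p in self.text_encoder.parameters():
            p.requires_grad = text_finetune
        for p in self.clip_model.parameters():
            p.requires_grad = clip_finetune

        text_dim = self.text_encoder.config.hidden_size               # 768 for deberta-v3-small
        image_dim = self.clip_model.config.vision_config.hidden_size  # 768 for siglip2-base
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),  # assumed dropout rate
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        # Mean-pool token embeddings (DeBERTa-v3 has no pooler output)
        token_states = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        text_feat = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        # Pooled image embedding from the SigLIP vision tower
        image_feat = self.clip_model.get_image_features(pixel_values=pixel_values)

        logits = self.classifier(torch.cat([text_feat, image_feat], dim=-1))
        return {"logits": logits}
```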
Usage
```python
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoProcessor, SiglipModel
import torch
from PIL import Image
import json
from model_arch import MultimodalClassifier

# Download the checkpoint and config from the Hugging Face Hub
repo_id = "manavbangotra/sinc-v1-siglip2-kee-speed-small"
model_path = hf_hub_download(repo_id=repo_id, filename="best_multimodal.pt")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load configuration
with open(config_path) as f:
    config = json.load(f)
labels = config["labels"]
num_labels = config["num_labels"]

# Load model components
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-256")
clip_model = SiglipModel.from_pretrained("google/siglip2-base-patch16-256")

model = MultimodalClassifier("microsoft/deberta-v3-small", clip_model, num_labels,
                             text_finetune=True, clip_finetune=False)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # optionally move to GPU with model.to("cuda") if available

# Example inputs
text = "Elegant summer dress with floral pattern"
image = Image.open("product_image.jpg").convert("RGB")
text_inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
image_inputs = processor(images=image, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(
        input_ids=text_inputs["input_ids"],
        attention_mask=text_inputs["attention_mask"],
        pixel_values=image_inputs["pixel_values"],
    )

predictions = torch.softmax(outputs["logits"], dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
confidence = predictions[0][predicted_class].item()
print(f"Predicted: {labels[predicted_class]} ({confidence:.2%})")
```
Categories
The model classifies products into these 14 categories:
- Baby & Kids
- Beauty & Personal Care
- Consumer Electronics
- Fashion
- Food & Beverages
- Health & Supplements
- Hobbies & Collectibles
- Home & Interior
- Jewelry
- Lights
- Office Supplies
- Others
- Outdoor, Garden & Adventure Gear
- Sports & Fitness Equipment
Training Details
- Training Data: E-commerce product dataset with titles and images
- Training Strategy: Fine-tuned text encoder, frozen image encoder
- Optimizer: AdamW with linear warmup scheduler (sketched below)
- Batch Size: 8
- Learning Rate: 2e-5
- Epochs: 3
- Max Text Length: 64 tokens
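A hedged sketch of the optimizer/scheduler setup these hyperparameters describe, assuming `transformers`' `get_linear_schedule_with_warmup` and a hypothetical 10% warmup ratio (the actual warmup length is not stated in this card):

```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

def build_optimizer_and_scheduler(model, num_train_samples,
                                  batch_size=8, epochs=3, lr=2e-5, warmup_ratio=0.1):
    # Only trainable parameters (text encoder + fusion head); the image encoder stays frozen
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = AdamW(params, lr=lr)

    total_steps = (num_train_samples // batch_size) * epochs
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_ratio * total_steps),  # warmup_ratio is an assumption
        num_training_steps=total_steps,
    )
    return optimizer, scheduler
```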
Limitations
- Performance varies across categories; small classes such as Office Supplies and Food & Beverages score far below the majority classes
- Requires an image input for every prediction; text-only classification is not supported
- Trained on a specific Shopify product domain and may not generalize to other e-commerce catalogs
Citation
```bibtex
@misc{sinc-v1-siglip2-kee-speed-small,
  title={SINC-V1-SIGLIP2-KEE-SPEED-small: High-Performance Multimodal Product Classifier},
  author={Manav Bangotra},
  year={2025},
  url={https://huggingface.co/manavbangotra/sinc-v1-siglip2-kee-speed-small}
}
```
License
MIT License