SINC-V1-SIGLIP2-KEE-SPEED-Small: Multimodal Product Classifier

Shopify Image Niche Classifier (V1-small): a multimodal product classification model that combines text (product titles) and images to assign each product to one of 14 categories.

Model Performance

  • Test Accuracy: 92.46%
  • Test Macro F1-Score: 0.73
  • Test Loss: 0.2450
  • Test Set Size: 59,789 samples

Per-Class Performance

Category                          Precision  Recall  F1-Score  Support
Fashion                           0.97       0.99    0.98      35,880
Jewelry                           0.95       0.97    0.96       2,474
Baby & Kids                       0.91       0.88    0.90       2,511
Consumer Electronics              0.91       0.83    0.87       1,599
Lights                            0.90       0.92    0.91       3,504
Others                            0.90       0.68    0.77       1,930
Beauty & Personal Care            0.85       0.85    0.85       1,354
Home & Interior                   0.81       0.89    0.85       6,006
Sports & Fitness Equipment        0.80       0.61    0.69         727
Outdoor, Garden & Adventure Gear  0.79       0.72    0.76       1,962
Health & Supplements              0.74       0.56    0.64       1,240
Hobbies & Collectibles            0.64       0.57    0.61         435
Office Supplies                   0.51       0.45    0.48         153
Food & Beverages                  0.00       0.00    0.00          14
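
The gap between the 92.46% accuracy and the 0.73 macro F1 comes from class imbalance: macro F1 weights every category equally, so the small, weak classes pull it down. The sketch below recomputes both averages from the per-class figures in the table. The support-weighted F1 is not reported by the model card and is shown only for comparison.

```python
# Per-class (F1, support) pairs copied from the table above.
per_class = {
    "Fashion": (0.98, 35880),
    "Jewelry": (0.96, 2474),
    "Baby & Kids": (0.90, 2511),
    "Consumer Electronics": (0.87, 1599),
    "Lights": (0.91, 3504),
    "Others": (0.77, 1930),
    "Beauty & Personal Care": (0.85, 1354),
    "Home & Interior": (0.85, 6006),
    "Sports & Fitness Equipment": (0.69, 727),
    "Outdoor, Garden & Adventure Gear": (0.76, 1962),
    "Health & Supplements": (0.64, 1240),
    "Hobbies & Collectibles": (0.61, 435),
    "Office Supplies": (0.48, 153),
    "Food & Beverages": (0.00, 14),
}

total_support = sum(n for _, n in per_class.values())

# Macro F1: unweighted mean over the 14 categories.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted F1: each category contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"Test samples: {total_support:,}")
print(f"Macro F1:     {macro_f1:.2f}")
print(f"Weighted F1:  {weighted_f1:.2f}")
```

Fashion alone accounts for 35,880 of the 59,789 test samples, which is why the weighted average sits much closer to the accuracy figure than the macro average does.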

Model Architecture

  • Text Encoder: microsoft/deberta-v3-small
  • Image Encoder: google/siglip2-base-patch16-256
  • Fusion: Concatenation + MLP (512 hidden units)
  • Output: 14-class classifier
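
As a rough illustration of the fusion step, here is a minimal PyTorch sketch of a concatenation + MLP head. It is not the code from model_arch: the class name FusionHead, the 768-dimensional feature sizes, the ReLU activation, and the dropout rate are all assumptions made for the example. Only the 512 hidden units and the 14 output classes come from the list above.

```python
import torch
from torch import nn


class FusionHead(nn.Module):
    """Hypothetical concatenation + MLP head (illustrative, not the released code)."""

    def __init__(self, text_dim=768, image_dim=768, hidden_dim=512,
                 num_labels=14, dropout=0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),  # fuse both modalities
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_labels),            # one logit per category
        )

    def forward(self, text_features, image_features):
        # Concatenate along the feature axis: (batch, text_dim + image_dim)
        fused = torch.cat([text_features, image_features], dim=-1)
        return self.mlp(fused)


# Random stand-ins for pooled encoder outputs (batch of 2).
head = FusionHead().eval()
text_features = torch.randn(2, 768)
image_features = torch.randn(2, 768)
with torch.no_grad():
    logits = head(text_features, image_features)
print(logits.shape)
```

In the real model the two feature tensors would come from the text and image encoders rather than from torch.randn.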

Usage

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoProcessor, SiglipModel
import torch
from PIL import Image
import json
from model_arch import MultimodalClassifier  # model_arch.py must be on the Python path

# Download model and config from Hugging Face
repo_id = "manavbangotra/sinc-v1-siglip2-kee-speed-small"  # Replace with your repo
model_path = hf_hub_download(repo_id=repo_id, filename="best_multimodal.pt")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load configuration
with open(config_path) as f:
    config = json.load(f)

labels = config["labels"]
num_labels = config["num_labels"]

# Load model components
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-256")
clip_model = SiglipModel.from_pretrained("google/siglip2-base-patch16-256")

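model = MultimodalClassifier(
    "microsoft/deberta-v3-small",
    clip_model,
    num_labels,
    text_finetune=True,
    clip_finetune=False,
)
# Load the weights on CPU; move the model and inputs to a GPU afterwards if one is available
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # disable dropout for inference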

# Example usage
text = "Elegant summer dress with floral pattern"
image = Image.open("product_image.jpg").convert("RGB")  # ensure a 3-channel image

text_inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
image_inputs = processor(images=image, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(
        input_ids=text_inputs["input_ids"],
        attention_mask=text_inputs["attention_mask"],
        pixel_values=image_inputs["pixel_values"]
    )
    predictions = torch.softmax(outputs["logits"], dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()

print(f"Predicted: {labels[predicted_class]} ({confidence:.2%})")
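
When the top prediction is uncertain, it can help to look at the runner-up categories too. The helper below is a small, dependency-free sketch of a softmax followed by a top-k ranking. The function name top_k and the example logits are hypothetical. With the model above, something like outputs["logits"][0].tolist() could be passed in together with labels.

```python
import math


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for three of the categories, for illustration only.
example_labels = ["Fashion", "Jewelry", "Lights"]
example_logits = [2.0, 0.5, -1.0]

for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.2%}")
```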

Categories

The model classifies products into these 14 categories:

  1. Baby & Kids
  2. Beauty & Personal Care
  3. Consumer Electronics
  4. Fashion
  5. Food & Beverages
  6. Health & Supplements
  7. Hobbies & Collectibles
  8. Home & Interior
  9. Jewelry
  10. Lights
  11. Office Supplies
  12. Others
  13. Outdoor, Garden & Adventure Gear
  14. Sports & Fitness Equipment

Training Details

  • Training Data: E-commerce product dataset with titles and images
  • Training Strategy: Fine-tuned text encoder, frozen image encoder
  • Optimizer: AdamW with linear warmup scheduler
  • Batch Size: 8
  • Learning Rate: 2e-5
  • Epochs: 3
  • Max Text Length: 64 tokens
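
To make the scheduler concrete, the sketch below computes the learning rate at a given step for a linear warmup followed by a linear decay to zero, a common shape for this kind of schedule. The number of warmup steps, the total step count, and the decay phase are assumptions, since the list above gives only the 2e-5 peak rate. The function name linear_warmup_lr is hypothetical.

```python
def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at `step` for linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 1,000 optimizer steps with 100 warmup steps.
for step in (0, 50, 100, 550, 1000):
    lr = linear_warmup_lr(step, total_steps=1000, warmup_steps=100)
    print(f"step {step:>4}: lr = {lr:.2e}")
```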

Limitations

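  • Performance varies widely across categories and is weakest on small classes: Food & Beverages (14 test samples) scores an F1 of 0.00 and Office Supplies (153 test samples) scores 0.48
  • Requires an image for every prediction
  • Trained on Shopify product data, so results may not transfer to other product domains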

Citation

@misc{sinc-v1-siglip2-kee-speed-small,
  title={SINC-V1-SIGLIP2-KEE-SPEED-small: High-Performance Multimodal Product Classifier},
  author={Manav Bangotra},
  year={2025},
  url={https://huggingface.co/manavbangotra/sinc-v1-siglip2-kee-speed-small}
}

License

MIT License
