What is the best text embedding model for ecommerce product search (short, noisy user queries)?

I am integrating a vector-based semantic search system into a B2B ecommerce platform’s product search, and I want to select the right text embedding model.

Use Case

User queries are often:

  • Very short (1–4 words)

  • Ungrammatical

  • Misspelled

  • Full of specifications or abbreviations (e.g., “m12 nut”, “2hp pump”, “ss tank 1000l”)

  • Heavy on domain-specific technical terms

Each product has:

  • Title

  • Attribute fields (e.g., Material=SS, Voltage=220V)

  • Description text

I need embeddings that capture semantic meaning across these fields and match them with noisy, spec-heavy queries.

Constraints / Setup

  • English-only

  • Running on GPU (model size not a constraint)

  • Throughput: ~100 queries per second

  • Retrieval backend not yet decided but most likely Vespa

  • Fine-tuning will come later — I first need a strong base embedding model

Questions

  1. Which open-source embedding models work best out of the box for ecommerce/product search?

  2. Are there any models that are trained or tuned specifically for ecommerce data?

  3. Should I concatenate title, attributes, and description into a single document and embed that, or embed each field separately and combine the results?
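
To make question 3 concrete, here is a rough sketch of the two options I have in mind. Everything in it is an assumption for illustration only: `toy_embed` is a hashed bag-of-words stand-in rather than a real embedding model, and the `Product` class, the `FIELD_WEIGHTS` values, and the weighted-sum combination are hypothetical choices, not a recommendation.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # arbitrary toy dimensionality


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Product:
    title: str
    attributes: dict[str, str] = field(default_factory=dict)
    description: str = ""


def attributes_text(p: Product) -> str:
    """Flatten attributes such as Material=SS into plain text."""
    return " ".join(f"{k} {v}" for k, v in p.attributes.items())


def score_concatenated(query: str, p: Product) -> float:
    """Option A: one embedding for title + attributes + description."""
    doc = f"{p.title} {attributes_text(p)} {p.description}".strip()
    return cosine(toy_embed(query), toy_embed(doc))


# Hypothetical weights; in practice these would need tuning.
FIELD_WEIGHTS = {"title": 0.5, "attributes": 0.3, "description": 0.2}


def score_per_field(query: str, p: Product) -> float:
    """Option B: one embedding per field, combined as a weighted sum."""
    q = toy_embed(query)
    fields = {
        "title": p.title,
        "attributes": attributes_text(p),
        "description": p.description,
    }
    return sum(
        FIELD_WEIGHTS[name] * cosine(q, toy_embed(text))
        for name, text in fields.items()
        if text
    )


product = Product(
    title="SS hex nut M12",
    attributes={"Material": "SS", "Thread": "M12"},
    description="Stainless steel hex nut for industrial use",
)
print(score_concatenated("ss nut m12", product))
print(score_per_field("ss nut m12", product))
```

Option A needs one vector per product, while option B needs three vectors per product plus a combination step at query time, so the choice also affects index size and ranking configuration in the backend.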

Example queries

  • “2hp motor pump”

  • “ss nut m12”

  • “isi water tank 1000l”

  • “sewing macine” (misspelled)

Any guidance or practical experience with embedding models for ecommerce search would be appreciated.


I looked around a bit. There may not be a single definitive model.


Thanks a lot for such an informative reply!
