I’m building an AI Search system where a user types a query and the system performs a similarity search against a document corpus. While working on the initialization, I realized that both the query and the documents could benefit from preprocessing, optimization, and careful handling before any similarity computations are run.
Instead of figuring out all the details myself, I’m wondering if there’s a blueprint, best-practice guide, or reference implementation for building an end-to-end AI Search pipeline — from query/document preprocessing to embedding, indexing, and retrieval.
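To make the question concrete, here is roughly the flow I have in mind, as a minimal sketch. I’m assuming sentence-transformers for the embedding model and a plain in-memory cosine-similarity lookup purely for illustration; these are placeholder choices, not the libraries or model I’m committed to:

```python
# Rough sketch of the pipeline: preprocess -> embed -> "index" -> retrieve.
# Library and model choices here are illustrative assumptions only.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

def preprocess(text: str) -> str:
    """Basic cleanup: collapse whitespace, strip, lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()

# 1. Document corpus preprocessing
documents = [
    "How to configure the search index.",
    "Embedding models map text to dense vectors.",
    "Retrieval ranks documents by similarity to the query.",
]
clean_docs = [preprocess(d) for d in documents]

# 2. Embedding / "indexing" (here just a normalized matrix held in memory;
#    a vector database or ANN index would replace this at scale)
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_matrix = model.encode(clean_docs, normalize_embeddings=True)

# 3. Query time: preprocess, embed, score by cosine similarity, return top-k
def search(query: str, k: int = 2):
    q_vec = model.encode([preprocess(query)], normalize_embeddings=True)[0]
    scores = doc_matrix @ q_vec  # cosine similarity (vectors are unit-norm)
    top = np.argsort(-scores)[:k]
    return [(documents[i], float(scores[i])) for i in top]

print(search("how do I rank documents against a user query?"))
```

The part I’m least sure about is everything before the encode/indexing step: what preprocessing and optimization is actually recommended, and how the query and documents should be handled so the similarity computation behaves well.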
Any guidance, references, or examples would be greatly appreciated.
If I may offer some feedback: the documentation as a whole is confusing. It repeats the same information at different levels of detail, which makes it hard to follow, and even the numbering is inconsistent. The content itself is very helpful, but the way it is presented makes it nearly impossible to use effectively; even asking an LLM for help doesn’t resolve the confusion, which leaves it essentially unusable.
Hi @EroStefano, a while back we wrote a blog post about how to tackle this problem of contextual search over a document corpus using embeddings and other techniques; you can find it here.