Build your own Internal search using AI (OpenAI, Vector Embeddings, ...)
Make LLMs search your own data source instead of the open internet.
Welcome to our 144th edition!
🔥 Top Stories
Future of Web Development - 2024 and beyond!
🌟 Spotlight
Let’s talk about AI. Well, it has been all over the internet for more than a year now, so why not?
Every one of you knows about OpenAI and the revolution that kicked off what is now known as the AI wave. Every so-called side hustler is trying to benefit from this bubble/wave, however you may refer to it.
Have you ever thought of building something with the OpenAI APIs or using other LLMs (Large-Language-Models)?
Let’s build something useful today.
We're going to build an internal search tool.
Requirements:
We can upload any text content to the tool.
When queried for any information, it should answer from within the information provided and not look at the open internet.
How is it useful?
Personally, you can have an assistant that answers questions about your to-do list, reminders, what you have done so far, …
Professionally, you can use your internal wikis or documents (the non-confidential ones of course!) and create an assistant that answers the questions from within the internal data provided. More like a FAQ on your entire data set.
How are we implementing it?
I’m going to provide high-level information that you can use to build it in your own style, keeping the length of this newsletter edition in mind.
What does the flow look like?
Data Ingestion:
You need a utility to accept the data source. You can build a UI, or if you are just trying things out, you can even do it via a CLI.
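Here's a minimal sketch of the CLI route, assuming Node.js 18+ and a hypothetical embedAndStore helper that we'll sketch further down:

```typescript
// ingest.ts — minimal CLI entry point for data ingestion (a sketch, not a full implementation).
import { readFile } from "node:fs/promises";

async function main() {
  const filePath = process.argv[2];
  if (!filePath) {
    console.error("Usage: npx ts-node ingest.ts <path-to-text-file>");
    process.exit(1);
  }
  const text = await readFile(filePath, "utf8");
  console.log(`Read ${text.length} characters from ${filePath}`);
  // Next steps (sketched later in this edition): chunk the text,
  // embed each chunk, and store the embeddings in a vector database.
  // await embedAndStore(text);
}

main().catch(console.error);
```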
Data Embedding:
Storing the raw text directly in a database won’t help us find the closest possible matches for a query. We need to create something known as vector embeddings. In short, vector embeddings turn text into a series of numbers that capture meaning and the relationships between pieces of text. NLP (Natural Language Processing) uses this concept extensively for sentiment analysis, text classification, …
The classic illustration is the embedding arithmetic “king − man + woman ≈ queen”.
For example, consider the following sentences:
That is a happy dog.
That is a very happy person.
Today is a sunny day.
Each of them will have its own embedding.
Now, I introduce a query: “That is a happy person”.
>>> That is a very happy person -> similarity score = 0.94291496
>>> That is a happy dog -> similarity score = 0.69457746
>>> Today is a sunny day -> similarity score = 0.25687605
Again, depending on the LLM you’re using, the scores may vary, but the idea still applies.
So, for the given query, the closest match is the sentence “That is a very happy person”.
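If you're curious how such similarity scores are computed, here's a small TypeScript helper (saved as, say, similarity.ts — the file name is just an assumption for the later sketches). The exact numbers depend entirely on the embedding model:

```typescript
// similarity.ts — cosine similarity between two embedding vectors of the same length.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage (assuming both vectors came from the same embedding model):
// a score close to 1 means "very similar", close to 0 means "unrelated".
// const score = cosineSimilarity(queryEmbedding, sentenceEmbedding);
```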
Query Interface:
You also need to have an interface (UI, CLI, …) to accept queries/prompts/searches.
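For a quick-and-dirty CLI interface, Node’s built-in readline module is enough. A sketch, assuming the search function we’ll put together in the query flow below:

```typescript
// query.ts — minimal CLI prompt for accepting queries (sketch).
import * as readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
// import { search } from "./queryFlow"; // hypothetical module sketched later

async function main() {
  const rl = readline.createInterface({ input, output });
  const query = await rl.question("Ask your internal data: ");
  rl.close();
  console.log(`Searching for: ${query}`);
  // const results = await search(query);
  // console.log(results[0]);
}

main().catch(console.error);
```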
High-level conceptual diagrams
Data Ingestion Flow:
If your document source is large, it makes sense to divide it into chunks (say, based on size, word count, …).
For each of the chunks, you can generate vector embeddings using your preferred LLM (OpenAI, Llama, …). You can visit this link to understand more about how to generate embeddings using OpenAI.
Now the generated vector embeddings can be stored as individual entries in a vector database (DataStax, MongoDB, …).
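Putting those three steps together, here's a rough sketch using the official OpenAI Node SDK for the embeddings and a plain in-memory array standing in for the vector database (swap in DataStax, MongoDB, or your preferred store in a real setup; the chunk size and model name are just assumptions):

```typescript
// ingestFlow.ts — chunk the text, embed each chunk, store the vectors (sketch).
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export type VectorEntry = { text: string; embedding: number[] };

// Stand-in for a real vector database (DataStax, MongoDB, …).
export const vectorStore: VectorEntry[] = [];

// Naive chunking by character count; real pipelines often chunk by tokens or sentences.
function chunkText(text: string, chunkSize = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

export async function embedAndStore(text: string): Promise<void> {
  for (const chunk of chunkText(text)) {
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small", // assumed model choice
      input: chunk,
    });
    vectorStore.push({ text: chunk, embedding: response.data[0].embedding });
  }
}
```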
Data Query Flow:
The user queries with a prompt (e.g. “That is a happy person” from our previous example).
We can use the LLM to generate the vector embedding for this query.
We can query the vector database using cosine similarity or a related metric.
The vector DB returns a list of matches, which we can sort by similarity score.
In most cases, the top result is the one we want.
You can display that to the user.
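And a matching sketch for the query side, reusing the cosineSimilarity helper and the in-memory vectorStore from the earlier sketches (a real vector database would run the similarity search for you on the server side):

```typescript
// queryFlow.ts — embed the query and rank stored chunks by similarity (sketch).
import OpenAI from "openai";
import { vectorStore } from "./ingestFlow"; // in-memory store from the ingestion sketch
import { cosineSimilarity } from "./similarity"; // helper from the embedding example

const openai = new OpenAI();

export async function search(query: string, topK = 3) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small", // must match the model used at ingestion time
    input: query,
  });
  const queryEmbedding = response.data[0].embedding;

  // Score every stored chunk against the query and return the best matches.
  return vectorStore
    .map((entry) => ({
      text: entry.text,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, topK);
}

// Example:
// const results = await search("That is a happy person");
// console.log(results[0]?.text); // usually the closest matching chunk
```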
Tada! Now you have built your own internal search engine using LLMs, a vector database, …
You’re officially an AI developer now 😎
Hope this was helpful. Please let me know either as a response or via comments in case you need more details on how to do this one.
This is exactly what I thought of building in-depth - https://www.betterassist.online/.
I eventually had to drop this idea as I didn’t get enough traction :|
📚 Popular Articles
😂 Fun memes
Me: Well. My web dev skills are getting used at least somewhere 🤷♂️
Damn, someone knows my credentials to my bank account. Let me reset the password. That makes it secure.
New Password: password
🤣
💬 What do you think about this?
Just hit reply and let us know your thoughts!
📢 Calling for contributions
This newsletter thrives on community contributions. Your expertise, insights, and experiences matter to us! We're open to featuring articles written by our readers.
If you have a valuable perspective, a TypeScript tip, or a frontend engineering story to share, we welcome your submissions!
Just hit reply, and we will connect!
🌻 Your support matters! 🌻
Researching and writing high-quality articles demands considerable time and effort. As this newsletter is offered for free and managed alongside a full-time commitment, your support can help sustain its quality and growth.
If you enjoy the content and find it valuable, please consider supporting my efforts by visiting this link. Every contribution helps in maintaining and enhancing the newsletter's content and reach.
Thank you for being part of this journey!