Subscribers have been asking me to write more about AI and how it works, and so I shall. This post — and others going into the mechanics of AI — is for paid subscribers only. Check out the last post about RAG here.
TL;DR
A vector database is a place where developers store specially formatted data to use for machine learning and AI.
To make large language models more accurate, you need to power them with your own unique data.
But models have a very specific data diet: they only consume vectors, which are just lists of numbers.
Embedding is the process of turning your data (images, text, videos) into vector representations (numbers).
Vector databases are specialized places to store these embeddings, and to search through and retrieve them when you need them.
Vector databases themselves are actually pretty simple, but the context for why they exist is not. So let’s start with that.
Why do we need a vector database in the first place?
Using your data to improve AI models
The point of a vector database is to make it easier for you to integrate your company’s data into a large language model. But why would you want to do that in the first place? The general answer: to make models more accurate and customized to your specific needs.
Back in prehistoric machine learning days, every model you built was trained on your unique data. Today, most people use off-the-shelf foundation models like GPT-4, Claude, or Gemini. These models are trained on a huge chunk of the public internet! But they don’t have access to your company’s internal data, which is where the real value is. You can get decent generic responses out of them for some tasks, but for real business use cases you won’t get anywhere without your data being integrated somehow.
There are two state-of-the-art ways that teams are powering LLMs with their proprietary data. The first is fine-tuning, where you actually retrain a model to take your data into account, updating the model weights as you go and ending up with a new version of the model. The second, and more popular for now, is RAG, or Retrieval-Augmented Generation. It’s a clever way of including the data relevant to your prompt inside the prompt itself, without needing to retrain the model.
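To make the RAG idea concrete, here is a minimal sketch in Python. It is a toy under loud assumptions, not how any particular product does it: `embed` is a hypothetical stand-in that just counts words (a real system would call an embedding model), a plain Python list plays the role of the vector database, and the names `retrieve`, `build_prompt`, and `DOCUMENTS` are made up for illustration. The shape is the part that carries over: turn text into vectors, find the stored vectors closest to the question, and paste the matching text into the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents whose vectors are closest to the question's."""
    q = embed(question)
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, embed(d)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Put the retrieved text inside the prompt itself; no retraining involved."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical company data, invented for this example.
DOCUMENTS = [
    "Refunds are available within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Enterprise plans include single sign-on.",
]

prompt = build_prompt("How many days do I have to request refunds?", DOCUMENTS)
print(prompt)
```

The string that gets printed is what you would send to the model. Swapping the word-count `embed` for a real embedding model and the list for a vector database changes the quality and scale of the search, but not the overall flow.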