Subscribers have been asking me to write more about AI and how it works, and so I shall. This post — and others going into the mechanics of AI — is for paid subscribers only.
TL;DR
- Retrieval-Augmented Generation (RAG) is a way to make LLMs like GPT-4 more accurate and personalized to your specific data.
- LLMs are powerful as hell, but they’re also generic: they’re trained on all data on the internet ever!
- RAG helps you get more personalized responses tailored to your data by inserting your data into your model prompts.
- RAG relies on the model’s context window, which is how much data it can take in a prompt.
- Today’s RAG pipelines are pretty complex and rely on embedding models and vector databases (see the sketch after this list).
- Alongside old-school fine-tuning, RAG is becoming the standard way to get better, more personalized results out of state-of-the-art LLMs.
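To make that concrete, here’s a minimal sketch of the retrieve-then-prompt loop in Python. Everything in it is illustrative: the `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database like Pinecone or pgvector, and the actual LLM call is omitted.

```python
import numpy as np

# Toy stand-in for a real embedding model -- a production pipeline would
# call an actual model here. This just hashes words into a fixed-size
# bag-of-words vector so the sketch runs on its own.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Stand-in "vector database": documents paired with their vectors,
# held in memory instead of a real store.
documents = [
    "Acme Corp's refund policy allows returns within 30 days.",
    "Acme Corp was founded in 2012 and is headquartered in Denver.",
    "Support hours are 9am-5pm Mountain Time, Monday through Friday.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query
    # (vectors are already normalized, so a dot product suffices)
    q = embed(query)
    ranked = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Stuff the most relevant documents into the prompt; this filled-in
# prompt is what you'd send to an LLM like GPT-4 (API call omitted).
question = "When was Acme founded?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The whole trick is that last step: the model never gets retrained, it just sees your data at prompt time, which is why the context window matters so much.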
Back to the future: training models
The funny thing about RAG is that the basic concept has been around for as long as machine learning has. Long-time readers will recall that back in the day, I studied Data Science in undergrad. “Old School” machine learning, before everyone was calling it AI, was entirely predicated on training a new model for every problem.
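To see what that looked like in practice, here’s a toy version of the one-model-per-problem workflow using scikit-learn. The scikit-learn API is real; the tiny spam-detection dataset is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled examples for one narrow task (spam detection, say)
texts = [
    "win a free prize now",
    "meeting moved to 3pm",
    "claim your cash reward",
    "lunch tomorrow?",
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# This model does exactly one thing: classify spam. A different problem
# (sentiment, churn, fraud) meant collecting new data and training again.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["free prize inside"]))  # should print [1], i.e. spam
```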