I’ve already written about how Large Language Models like ChatGPT and Claude work. But how are they made?
How do you actually build and train an AI model to do all of the amazing stuff that ChatGPT can do?
What is training in the first place?
The process of creating a model is called training, kind of like training a kid to ride a bike, or whatever those people were doing at the Pokémon gyms. In old-school machine learning (like the kind I went to school for), training broke down into 4 major steps:
Acquire data on the problem: gather a dataset that you’ll use to teach your model to do what you want it to do, like classify an image or predict a stock price.
Label your dataset: data needs context to be useful to the model, like what’s in an image or if a stock went up or down.
Train your model: using some standard algorithms and linear algebra, teach the model what’s going on in your nicely curated dataset.
Test your model: make sure what your model has learned transfers well to new data (and ideally, the real world).
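To make those four steps concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the "data" is two made-up numeric features per example, the labels come from a hidden rule standing in for a human labeler, and the model is a nearest-centroid classifier (one of the simplest standard algorithms, chosen here for readability rather than because any real product uses it).

```python
import random

# Step 1: Acquire data. Hypothetical: each observation is two made-up numbers.
def acquire_data(n, seed=0):
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

# Step 2: Label it. In real life a person (or a later observation) supplies the label;
# here a hidden rule stands in: "up" if the two numbers sum to something positive.
def label(observations):
    return [(x, "up" if x[0] + x[1] > 0 else "down") for x in observations]

# Step 3: Train. A nearest-centroid model just averages the examples of each label.
def train(labeled):
    sums, counts = {}, {}
    for (a, b), y in labeled:
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += a
        s[1] += b
        counts[y] = counts.get(y, 0) + 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}

def predict(model, x):
    # Pick the label whose average example is closest (squared Euclidean distance).
    return min(model, key=lambda y: (x[0] - model[y][0]) ** 2 + (x[1] - model[y][1]) ** 2)

# Step 4: Test on examples the model never saw during training.
def accuracy(model, labeled):
    correct = sum(1 for x, y in labeled if predict(model, x) == y)
    return correct / len(labeled)

dataset = label(acquire_data(1000))
train_set, test_set = dataset[:800], dataset[800:]  # hold out 200 examples for testing
model = train(train_set)
print(f"test accuracy: {accuracy(model, test_set):.2f}")
```

The held-out split in step 4 is the important design choice: scoring the model on examples it trained on would only tell you it memorized them, not that it learned something that transfers to new data.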
In a sense, training a model really is like teaching a kid how to do something, like riding a bike. It's less about telling them how to do it, and more about giving them repetitions so they can figure out what's going on for themselves. With some well-timed guidance, of course.
In the same sense, a model is a decision-making machine. The way you train a model is by showing it many, many different situations and what the correct outcome is in those situations. The model uses some fancy math to learn the patterns in those situations and learns to apply them to new data. And like teaching a kid to do something, the way you train a model, from the method to the algorithms used, varies slightly depending on what you want the model to do.
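The "repetitions plus well-timed guidance" idea maps neatly onto one of the oldest learning algorithms, the perceptron. The sketch below is an illustration under assumptions, not how modern models are trained: the six example "situations" are made-up pairs of numbers with a correct outcome of +1 or -1, and the learning rate and number of passes are arbitrary choices.

```python
# Classic perceptron learning rule: loop over labeled examples many times,
# and nudge the model's weights only when it gets an example wrong.
def train_perceptron(examples, epochs=20, lr=0.1):
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):  # repetitions: many passes over the same situations
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            guess = 1 if score > 0 else -1
            if guess != y:  # guidance only when the model's guess is wrong
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is now handled correctly
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Made-up situations and their correct outcomes (+1 or -1).
examples = [
    ((2.0, 1.0), 1),
    ((3.0, 2.5), 1),
    ((2.5, 3.0), 1),
    ((-1.0, -2.0), -1),
    ((-2.0, -0.5), -1),
    ((-1.5, -1.5), -1),
]
w, b = train_perceptron(examples)
print(predict(w, b, (1.5, 2.0)))  # a situation it never saw → 1
```

Nobody tells the perceptron the rule; it only sees situations, outcomes, and corrections when it's wrong, and the weights it ends up with let it handle the new situation in the last line.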
Let’s run through a few examples.
Examples of ML training
Iowa has the most fertile soil in America (and maybe the world), so farmers there grow the most profitable mechanized crops: corn and soybeans. Farmers want to use drones to scour their fields and detect whether pests are eating away at their valuable plants. So say one of them wants to build a model that looks at an image (the input) and tells you whether that image contains any bugs (the output). How would you train a model like this?
This is a brown stink bug (real name), which is bad news for your corn.