All about AI reasoning models

All about "reasoning" language models like OpenAI's o3 and Deepseek's R1.

Justin
Jun 10, 2025

The answers you already love from AI models like ChatGPT and Claude might seem like they're deeply thought out and researched. But under the hood, these models have historically been pretty simple: they're just playing the word guessing game. Enter a new generation of models – like DeepSeek's R1 and OpenAI's o3-mini – that are actually learning to think and reason like humans. This post will walk through how these models do it and how you can use reasoning models to get better answers.

The word guessing game vs. actual reasoning

One of the most counterintuitive things about large language models is that sure, they’re incredibly complicated and powerful, but also…very simple at their core. I explained how these models worked a couple of years back:

The way that ChatGPT and LLMs generate these entire paragraphs of text is by playing the word guessing game, over and over and over again. Here’s how it works:

  • You give the model a prompt (this is the "predict" phase)

  • It predicts a word based on the prompt

  • It predicts a 2nd word based on the prompt plus the 1st word

  • It predicts a 3rd word based on the prompt plus the first 2 words

  • …

It's really very primitive when you get down to it. But it turns out that the word guessing game can be very powerful when your model is trained on all of the text on the fucking internet! Data scientists have long said (about ML models) that "garbage in means garbage out" – in other words, your models are only as good as the data you've used to train them. Through its partnership with Microsoft, OpenAI has been able to dedicate tremendous amounts of computing resources to gathering this data and training these models on powerful servers.

Things have changed since then: model architectures have gotten more complicated, and models are guessing more than one word at a time. But fundamentally, word guessing is how LLMs work. This is why model responses sometimes have obvious logical inconsistencies, or miss things that humans never would. They're not really thinking in the human sense.
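To make the word guessing game concrete, here's a minimal sketch of greedy next-token prediction using a small open model (GPT-2 via Hugging Face's transformers library, chosen purely because it's small and free; production models are vastly larger, but the loop is the same idea):

```python
# A minimal sketch of the "word guessing game": generate text one token at a
# time, always appending the model's single most likely next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                      # guess 10 more tokens, one at a time
        logits = model(input_ids).logits     # a score for every possible next token
        next_id = logits[0, -1].argmax()     # pick the single most likely one
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Real chat models add sampling, much bigger contexts, and a lot of post-training on top of this, but the core loop is still "predict the next token, append it, repeat."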

But what if they could? A seminal 2022 research paper introduced an idea called STaR (Self-Taught Reasoner). It's a way to teach models to produce real chains of thought, just like humans might, so models can "think" instead of just generating words right away.

The gist of the STaR method is teaching a model how to develop rationales. Rationales are the reasons why the model gives the answer it did, rather than just the answer itself. The paper develops a method to fine-tune these models using the somewhat convoluted process diagrammed in the paper, but the idea is pretty simple. If the model gets the right answer with a good rationale, it gets rewarded. If it gets the wrong answer, or the right answer with the wrong rationale, we start again.
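In pseudocode, one round of that loop looks roughly like this. This is my own simplified sketch, not the paper's actual code: `generate` and `fine_tune` are hypothetical stand-ins for the real model calls, and it skips the paper's extra "rationalization" step for questions the model gets wrong.

```python
# A simplified sketch of one STaR round. `generate` and `fine_tune` are
# hypothetical helpers standing in for real model calls; the dataset is a
# list of (question, correct_answer) pairs.

def star_round(model, dataset):
    new_training_examples = []
    for question, correct_answer in dataset:
        # Ask the model to write out its reasoning before answering.
        rationale, answer = generate(model, question)  # hypothetical helper
        if answer == correct_answer:
            # Right answer: keep the rationale as a new training example.
            new_training_examples.append((question, rationale, answer))
        # Wrong answer (or no usable rationale): drop it and move on.
    # Fine-tune on the rationales that led to correct answers, then repeat.
    return fine_tune(model, new_training_examples)  # hypothetical helper
```

The key design choice is that the model generates its own training data: only the chains of thought that actually led to correct answers get folded back into the next round of fine-tuning.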

The training data set is made up of questions like this:
