AI and the — em dash
Finally, an explanation for why AI models can't seem to quit them.
This sentence — which I wrote from scratch without the help of AI — contains an em dash (actually two).
If you’ve been keeping up with the online discourse about AI writing, you may be surprised that I put an em dash in this post. That’s because so many human writers are steering away from this once-common punctuation mark, which is now viewed as a hallmark of chatbot-written text.
In fact, AI bots love the em dash so much that it can be hard to get them to write without it, even when you explicitly tell them not to use it. LLMs can be so funny sometimes.
Of course, this raises the question of why em dashes are all over AI-written content — and whether human writers should give up this once-beloved punctuation mark entirely, so their content isn’t immediately clocked as being written by an LLM.
Why does AI love the em dash so much?
AI’s love affair with em dashes seems to have a simple explanation: The data used to train large language models was full of em dashes. The AI is simply mimicking the writers it learned from.
In fact, there’s some evidence to suggest that the content AI was trained on included significantly more em-dashes than you might expect. And weirdly enough, their prevalence seems to have become a deep bias that’s embedded into how LLMs understand the flow and structure of writing.
AI training material may have contained an overabundance of em dashes
One theory behind AI’s love of the em dash is that the later-generation AI models, which rely on it much more heavily than earlier iterations, were trained on older books that contain more em dashes than most modern writers would use.
Early on, most AI models were trained on a mix of public internet data and content from pirated books. However, in a quest for better-quality training data as the tools evolved, AI labs started scanning older texts. Curating the massive data trove that is the internet has been a major focus of AI labs for more recent model generations, and finding quality text from books was certainly part of that.
The exact timeline for when this happened is something of a mystery, but court documents indicate that Anthropic started in 2024, and other AI labs likely made a similar move somewhere between 2022 and 2024.
If AI labs digitized mostly older books, as is commonly believed because their copyrights have expired, their AI programs may have been fed writing with significantly more em dashes in it — especially as studies show that use of the em dash peaked in the 1860s.
It may not have been the books alone, either.
Another theory suggests that AI may also have picked up em dash use from Medium, which automatically converted two hyphens (--) into an em dash because the company’s founder was a fan of typography. Since labs may have treated Medium as a source of high-quality writing (and ergo upweighted it in training), AI may have determined that the em dash is a key feature of high-quality prose.
The brevity theory
There’s also a competing theory: Some suggest that AI favors em dashes for the sake of brevity.
AI models think in tokens, or small chunks of text, and they constantly have to predict the next token to use. The more tokens the model uses, the more chances there are to make a mistake, so AI models prefer to use fewer tokens to reduce loss.
According to this theory, an em dash is just one token, while alternatives like “, and” add up to three. As a result, the em dash ends up being favored to improve efficiency.
Consider the difference between these two sentences:
AI is fast, and it can produce massive volumes of content quickly.
Versus:
AI is fast — producing massive volumes of content quickly.
The em dash removes filler phrases like “and it can,” which humans may prefer to use for better flow, but which add lots of extra tokens. The second sentence ends up shorter, more direct, and denser with meaning, so AI bots tend towards that approach by default.
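To put rough numbers on that comparison, here is a minimal Python sketch that counts units in the two sentences above. It is only an illustration under a loud assumption: every word and every punctuation mark counts as one unit, which is a crude stand-in for real tokenization. Actual LLM tokenizers split text differently, so the exact counts would vary by model. The names `rough_units` and `UNIT_PATTERN` are made up for this example.

```python
import re

# Assumption: one unit per word and one per punctuation mark.
# This is NOT how a real LLM tokenizer works; it only approximates length.
UNIT_PATTERN = re.compile(r"\w+|[^\w\s]")


def rough_units(text: str) -> list[str]:
    """Split text into words and standalone punctuation marks."""
    return UNIT_PATTERN.findall(text)


with_conjunction = "AI is fast, and it can produce massive volumes of content quickly."
# "\u2014" is the Unicode escape for the em dash character.
with_em_dash = "AI is fast \u2014 producing massive volumes of content quickly."

for label, sentence in [("conjunction", with_conjunction), ("em dash", with_em_dash)]:
    print(f"{label}: {len(rough_units(sentence))} units")

# Prints:
# conjunction: 14 units
# em dash: 11 units
```

Even with this simplistic count, the em dash version comes out three units shorter, which is the kind of saving the brevity theory points to.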
This preference for brevity ends up being reinforced during fine-tuning, when models are rewarded for responses that humans find to be clear, helpful, and concise rather than rambling or inefficient. AI models choose the em-dash to keep things short, then learn that using em-dashes is the right thing to do.
Sam Altman announces a fix — but it may not be foolproof
For em-dash haters frustrated with the challenge of training out the em-dash, there was reason to hope in recent months.
In November 2025, Sam Altman, CEO of OpenAI, announced on X that custom instructions to avoid the em-dash would now be followed.
However, responses from X users show that the results haven’t been perfect.
Unfortunately, this problem may actually get worse before it gets better — if it gets better at all. The big issue now is that newer models are training on the output of previous generations.
As the rise in AI-generated content loads the internet with an overabundance of em dashes, this stylistic tic could snowball, leaving the internet filled with even more of them. It becomes a self-perpetuating cycle.
This actually creates the risk of model collapse, as newer models amplify the idiosyncrasies of old ones. For those who don’t like AI writing — and who fear that it will replace them — that may be a good thing. But for those who are tired of AI slop and who don’t think LLMs are going anywhere any time soon, seeing the writing get worse could be a major source of frustration.
Should real human writers stop using em-dashes?
For now, the fate of the em dash hangs in the balance, and the question becomes: Should human writers abandon this punctuation entirely and leave it to the bots?
That’s ultimately a judgment call. Many writers are pushing back, refusing to let AI change the way they write and think.
The key may be to use em dashes sparingly and to make sure your writing otherwise clearly signals that it was written by humans for humans. This can mean injecting personal experiences and emotional depth that chatbots simply haven’t been able to master — at least not yet.







