[Technically dispatch] ChatGPT, and why these new ML models are so good
Explaining recent advances in creepy good AI
This post is sponsored by Hightouch!
Hightouch helps you sync your customer data from your warehouse into the tools your business teams rely on every day. No custom code, no APIs, just SQL. A few clicks and you’ve got the data you need in Salesforce, Hubspot, etc. Check out their guide to Reverse ETL, or book a demo here.
When I graduated with a Data Science degree in 2017, AI was kind of like a funny toy, and mostly something researchers (read: not me) spent their time on. Getting a half-decent result from an ML model involved a bunch of code, several failed attempts at training, and then the inevitable resignation and surrender.
Today, it’s coming for my job:
This is from a model called ChatGPT, a recent release from OpenAI[1] that acts as a sort of conversation companion. ChatGPT has been making the rounds on the web for prompt responses that are very good, like the one above (but don’t worry, all of this post is handwritten by yours truly).
Nothing like this existed when I was in school — and even over just the past year, the quality of available ML models has improved dramatically. Sentiment among the people I know in AI has never been higher, and hundreds of startups have been popping up, building on top of these so-called Large Language Models (LLMs).
How did things improve so quickly? And what are these models actually doing?
Basics of ML models and text generation
Admittedly, this is not the first time I’ve written about this. You might remember GPT-3, another OpenAI model that dropped in 2020 with some seriously impressive results for generating text:
GPT-3 is a language generation model. Machine Learning is just about figuring out relationships – what’s the impact of something on another thing? This is pretty straightforward when you’re tackling structured problems – like predicting housing prices based on the number of bedrooms – but gets kind of confusing when you move into the realm of language and text. What are ML models doing when they generate text? How does that work?
The easiest way to understand text generation is to think about a really good friend of yours (assuming you have one). At some point if you hang out enough, you get a good feel for their mannerisms, phrasing, and preferred topics of conversation - to the point where you might be able to reliably predict what they’re going to say next (“finishing each other’s sentences”). That’s exactly how GPT-3 and other models like it work - they learn a lot (I mean, like really a lot) about text, and then based on what they’ve seen so far, predict what’s coming next.
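To make that concrete, here’s a toy version of “predict what comes next” in a few lines of Python. This is emphatically not how GPT-3 works under the hood (that’s a transformer trained on a huge chunk of the internet), but the core idea – given what came before, what word is most likely next? – is the same:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word tends to follow which.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (it follows "the" twice, "mat" once)
```

Swap the ten-word corpus for billions of documents and the word counts for a giant neural network, and you have the basic shape of the thing.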
The actual internals of language models are obviously Very Scary and Very Complicated - there’s a reason that most big advancements come from big research teams full of PhDs.
ChatGPT is trained on text and code from across the web (articles, books, comments, etc.), but also actual human conversations:
We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.
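Here’s a schematic of that recipe in Python. Every function below is a hypothetical placeholder standing in for a large training job – the point is the shape of the pipeline (imitate humans, learn what humans prefer, then optimize against that preference), not real training code:

```python
# Schematic of the RLHF recipe from the quote above. All functions here are
# hypothetical placeholders; each one stands in for a big training job.

def supervised_finetune(base_model, demonstrations):
    # Stage 1: human trainers write example conversations (playing both
    # the user and the assistant); the base model learns to imitate them.
    return f"{base_model} fine-tuned on {len(demonstrations)} demos"

def train_reward_model(comparisons):
    # Stage 2: humans rank several model answers to the same prompt; a
    # reward model learns to score answers the way the humans did.
    return f"reward model trained on {len(comparisons)} rankings"

def optimize_with_rl(model, reward_model):
    # Stage 3: the fine-tuned model is optimized (OpenAI uses PPO) to
    # produce answers the reward model scores highly.
    return f"{model}, then optimized against the {reward_model}"

chat_model = optimize_with_rl(
    supervised_finetune("base GPT-3.5 model", demonstrations=["..."] * 3),
    train_reward_model(comparisons=["..."] * 3),
)
print(chat_model)
```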
There’s a whole taxonomy of OpenAI models and which other models they’re built off of.
Why this is happening now
Every time an exciting new ML model or interface comes out, the question is always “why now?” and the answers are usually sort of the same.
It’s hard to point to a particular single breakthrough – instead, many of the trends that have been driving better models for years have just continued to develop.
More training data – OpenAI and others have been training models on just colossally large sets of data taken from increasingly diverse sources. Better data means better models
More complex model architectures – foundation models and transformer architectures (neither of which you need to know in depth) are driving more complex, deeper neural networks. In English: models are getting more complex
More horsepower! – the machines we’re using to train and run these models are getting stronger and more efficient. It’s a rough estimate, but training GPT-3 probably cost millions of dollars in compute
These bullet points are nothing new; they’re just steady progress on the same dimensions as the past few years. Instead, my “big brain” take on why AI has been making it into the discourse so much more lately is less about the models themselves, and more about who uses the models.
Model interfaces are now for the public
Over the past couple of years, models have gone from private, code first, and inaccessible to widely available to the public.
If you look at models that have made it into the public discourse recently, like DALL-E or Stable Diffusion, they share a unique quality: whoever built the model also built an interface to the model. Using ChatGPT is as simple as typing a prompt into OpenAI’s website; so is generating an image with DALL-E. It’s for everyone! And that is very weird.
It’s hard to overstate how novel this is. For as long as I can remember, AI was a research-driven discipline, which meant that interfaces to cutting edge models were built in the medium researchers were most familiar with: code. In fact, you can trace a pretty clear progression in these model interfaces – a fancy word for how you use them – from niche and closed off, to widely applicable and open.
Academic: new model architecture is discussed in some paper on arXiv, practitioners read and discuss
Code and local first: a new cutting edge model is available for download, and you work with it on your computer (or a server) with code
Code and remote first: a new model is released via an API, where you can make requests and get responses. You still use code to do that (this is how GPT-3 worked; there’s a sketch of what that looks like after this list)
[today] UI and remote first: many new models are released to the public via a slick UI that’s easy to interact with
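For reference, here’s roughly what the “code and remote first” stage looked like with GPT-3 – a minimal sketch using OpenAI’s Python client as it existed at the time (the API key and prompt are placeholders):

```python
import openai  # pip install openai

openai.api_key = "sk-..."  # your key from the OpenAI dashboard

# Ask GPT-3 to complete a prompt. "text-davinci-003" was the flagship
# GPT-3 model available via the API at the time of writing.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain reverse ETL in one sentence:",
    max_tokens=60,
)
print(response["choices"][0]["text"].strip())
```

No GPUs, no training loop: the model runs on OpenAI’s servers, and you pay per token. Still, you had to write and run code to get anything out of it.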
In other words, anyone can use these new ML models easily, which is a very significant departure from how things used to work. AI has gone from a by-researchers-for-researchers (and practitioners) discipline to a by-researchers-for-the-public discipline.
The implications of this are huge! Part of OpenAI’s research philosophy is transparency: they want major developments to be known and available to the public. This is obviously a double-edged sword; the public is erratic, doesn’t understand what’s going on under the hood, and is very prone to misusing and misunderstanding these models. But it’s also pretty cool to be able to use this stuff, and maybe more than cool, it’s useful.
What ChatGPT means for you and the workforce
With every advance in AI comes the perennial question: is my job safe?
I’m not a sociologist, and I’m definitely not an economist. Entire papers and bodies of work have been dedicated to how AI impacts the economy, and most of it is probably conjecture. What we can talk about are a couple of practical changes that you might see over the next few years.
The concept of the prompt engineer
Models like ChatGPT and DALL-E respond to prompts given by the user. In the same sense that searching on Google is now a skill, getting good at writing ML model prompts may become one too. Some have called this eventual job (or, more likely, skill you’ll use in your job) prompt engineering. E.g. here’s a post with tips and tricks for prompt engineering.
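As a toy illustration of what that skill looks like, compare these two prompts for the same copywriting task. The product and example copy below are invented for demonstration:

```python
# Prompt engineering, in miniature: the second prompt adds a role, an
# audience, constraints, and an example, so the model has far less to guess.

vague_prompt = "Write about our product."

engineered_prompt = """You are a marketing copywriter.
Write a two-sentence product description for a project management tool
aimed at small design agencies. Friendly tone, no jargon.

Example (for a different product):
"Acme Books keeps your finances tidy without the spreadsheet headache.
Invoice clients, track expenses, and see where you stand at a glance."
"""
```

The model is the same in both cases; the difference in output quality comes almost entirely from the prompt.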
Businesses built on language models
In my day job at a VC fund, I’m seeing an absolute explosion in startups using language models like GPT-3 to make tedious work easier. This isn’t a new thing, but it’s definitely accelerating. A few categories and examples:
Generating text instead of writing it: Jasper raised $125M (!) for a tool that generates marketing copy and content with AI.
Using text to analyze data: building charts, writing queries, etc., generated by ML models (see the text-to-SQL sketch after this list)
Using text to build interfaces: describing what you want (e.g. a customer support tool) and generating the code with ML models
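To make the “using text to analyze data” category concrete, here’s a minimal text-to-SQL sketch against the GPT-3 API. The schema, table names, and question are all invented for illustration:

```python
import openai  # assumes the same API key setup as the earlier snippet

# Hand the model your schema and a plain-English question, get SQL back.
schema = "orders(id, customer_id, amount, created_at)"
question = "What was total revenue per month this year?"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"Schema: {schema}\nWrite a SQL query to answer: {question}\nSQL:",
    max_tokens=120,
    temperature=0,  # keep the output deterministic-ish for code generation
)
print(response["choices"][0]["text"].strip())
```

Wrap that in a nice UI and you have the rough shape of a whole crop of current startups.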
I have my gripes with the no-code universe, but it is undoubtedly true that we’re going to see non-technical people empowered by this kind of stuff. Which is why you subscribe to Technically!
The era of the idea guy
Taken to an extreme, AI like ChatGPT is going to make building things as simple as coming up with the idea (now: the prompt), and the model will do the work for you. This is not where we’re at today, but early applications like generating decent marketing copy have definitely got me scared, as a writer.
Is the dawn of the era of the idea guy beginning? What do you think about what’s been going on in ML? How do you think it might help or hurt you at work? Chime in with a comment.
[1] OpenAI is a for-profit research company with very, very deep pockets, focused on building powerful AI models that benefit humanity (or something like that).