Databricks is apparently worth $100B. What do they even do?
Or perhaps more accurately: what does Databricks NOT do?
If you’re like me, you are probably a little bit confused about Databricks.
To the innocent onlooker, it sometimes feels like they are constantly announcing some new fundraising round, multiple times per year, each with a comically larger valuation than the last. Their most recent of these comically large valuations is $100B (yes, one hundred billion), making them one of the 5 most valuable private companies in the world. At this point, Databricks has raised so many funding rounds that they are running out of letters in the alphabet to designate them (this is their Series K, by the way).
All of this raises the obvious question. What does Databricks actually do?
It’s not like their website headline clears things up at all.
What is a data intelligence platform? Judging by the imagery, you’d be justified to believe Databricks somehow got a $100B valuation by selling these:
The magic eight ball says: Don’t count on it.
I first wrote about this frankly very odd company back in 2020, when they were only worth a paltry $6B. Here’s the TL;DR that I put atop the post:
Databricks sells a data science and analytics platform – i.e., a place to query and share data – built on top of an open source package called Apache Spark.
Apache Spark is an open source engine for running analytics and machine learning across giant, distributed datasets.
Spark is notoriously hard to run on your own infrastructure, and companies often don’t have the expertise to do that.
Databricks provides a managed service for running Spark clusters, as well as notebooks for visualization and exploration, plus the ability to schedule pipelines.
More recently, Databricks has been expanding the product portfolio to include ML and data warehousing.
This is a pretty big company, all things considered - $6.2B was their most recent valuation, and they’re planning on going public in 2021.
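If "running analytics across distributed, giant datasets" feels abstract, here is a toy sketch of the core idea in plain Python. To be clear, this is not Spark's API, and every name in it (`split_into_partitions`, `partial_totals`, the made-up `orders` data) is hypothetical. It only illustrates the pattern: chop the data into chunks, let separate workers summarize each chunk on their own, then merge the small summaries.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions chunks."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_totals(partition):
    """Work done independently on one chunk: total revenue per region."""
    totals = Counter()
    for region, revenue in partition:
        totals[region] += revenue
    return totals


def total_revenue_by_region(rows, num_partitions=3):
    """Summarize each chunk in parallel, then merge the partial results."""
    partitions = split_into_partitions(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_totals, partitions))

    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return dict(combined)


# Made-up sample data: (region, revenue) pairs.
orders = [("east", 120), ("west", 80), ("east", 30), ("north", 50), ("west", 20)]
result = total_revenue_by_region(orders)
print(result)
```

In this sketch the "workers" are threads on one laptop. In a real cluster they would be separate machines, and coordinating them (plus handling the ones that fall over) is the hard part that makes a managed service appealing.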
A lot has changed since then (except the fact that Databricks is still private. Planning on going public in 2021. We all fell for that one). The world is awash in Generative AI. The entire corporate universe is uprooting their playbooks and shifting towards AI, OpenAI and Anthropic are worth hundreds of billions of dollars, and NVIDIA is selling GPUs faster than they can make them.
This is all to say that Databricks is no longer just The Spark Company™. Over time, they’ve become a one-stop shop for everything related to a company’s data, from storing it to building pipelines with it to training models on it. Like Snowflake, they are positioning themselves as the all-in-one “data universe” where anything you’d conceivably want to do that involves any sort of data can be done, no other vendors required. And, of course, AI stuff.
What does this all-in-one magic playland consist of? One could break it up into 4 categories:
Storing data
Moving data
Analyzing data
AI stuff
I’ll go through each of these in more depth so we can figure out what this company actually does.
The thing that you have to understand about all of this, before we dive in, is that Databricks has quite literally dozens of SKUs. These are just the high-level categories, each containing several SKUs within:
Remember, anon, most of these do not matter. They are the corporate equivalent of lovebombing, aimed at overwhelming the Fortune 500 customer with so many gadgets, names, and features that they can’t help but think “wow, Databricks has it all.” Because that is why the Fortune 500 customer buys Databricks in the first place: it has it all.