Snowflake is a cloud data warehouse. Data teams use it to store and query their analytical data.
Refresher: what’s a Data Warehouse?
A data warehouse is a specially designed database that holds analytical data. It’s built to handle long, complicated queries written by data scientists, analysts, and machine learning engineers.
If you’re iffy on what data warehouses do, now would be a good time to read the original post here. It covers:
Why we use separate databases for our apps vs. analytics
How data gets into a warehouse via ETL or ELT
Typical types of analytical queries
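To make the app vs. analytics split a bit more concrete, here's a rough sketch of the two kinds of queries. SQLite (bundled with Python) is only a stand-in for a real database here, and the `orders` table, its columns, and the rows in it are made up for illustration.

```python
import sqlite3

# SQLite stands in for a real database; the orders table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, month, amount) VALUES (?, ?, ?)",
    [
        (1, "2021-01", 20.0),
        (2, "2021-01", 35.0),
        (1, "2021-02", 15.0),
        (3, "2021-02", 50.0),
        (2, "2021-02", 10.0),
    ],
)

# App-style query: look up one specific row by its primary key.
app_row = conn.execute(
    "SELECT user_id, amount FROM orders WHERE id = ?", (3,)
).fetchone()

# Analytics-style query: scan the whole table and aggregate it.
monthly_revenue = conn.execute(
    "SELECT month, COUNT(DISTINCT user_id), SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(app_row)
print(monthly_revenue)
```

The first query touches a single row; the second reads every row to answer one question. On a table with billions of rows, that second shape is the workload warehouses are built for.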
Part of where data warehouses get confusing is what they actually are. And the answer is that a data warehouse involves both:
A different place to store your data (a separate database server)
A different way of storing data (new data sources, structures, etc.)
Most companies with data teams will have at least two separate places where they work with data: their production database, and a data warehouse. The production database only has data that’s relevant to the core operations of their app (users, business concepts, etc.). The data warehouse, on the other hand, might have copies of their production data, payment data, website traffic data, and lots of other stuff - whatever the data team wants to analyze.
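Here's a minimal sketch of that two-places setup, loosely in the ELT style: raw data gets copied into the warehouse first, and the reshaping happens afterwards, inside the warehouse. Two in-memory SQLite databases play the roles of production and warehouse, and every table name, column, and row is an assumption for the sake of the example, not how any particular team does it.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two places data lives.
production = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

# The production database only holds what the app needs (hypothetical table).
production.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
production.executemany(
    "INSERT INTO users (id, email, plan) VALUES (?, ?, ?)",
    [(1, "a@example.com", "free"), (2, "b@example.com", "pro"), (3, "c@example.com", "pro")],
)

# Payment data comes from a different source entirely (hypothetical export).
payment_rows = [(2, 30.0), (3, 30.0), (3, 12.5)]  # (user_id, amount)

# Extract + Load: copy raw data from each source into the warehouse as-is.
warehouse.execute("CREATE TABLE raw_users (id INTEGER, email TEXT, plan TEXT)")
warehouse.execute("CREATE TABLE raw_payments (user_id INTEGER, amount REAL)")
warehouse.executemany(
    "INSERT INTO raw_users VALUES (?, ?, ?)",
    production.execute("SELECT id, email, plan FROM users").fetchall(),
)
warehouse.executemany("INSERT INTO raw_payments VALUES (?, ?)", payment_rows)

# Transform: build an analysis-friendly table inside the warehouse.
warehouse.execute(
    """
    CREATE TABLE revenue_by_plan AS
    SELECT u.plan AS plan, COALESCE(SUM(p.amount), 0.0) AS revenue
    FROM raw_users u
    LEFT JOIN raw_payments p ON p.user_id = u.id
    GROUP BY u.plan
    """
)

revenue_by_plan = warehouse.execute(
    "SELECT plan, revenue FROM revenue_by_plan ORDER BY plan"
).fetchall()
print(revenue_by_plan)
```

Note that the production database never sees the payment data or the `revenue_by_plan` table. Those only exist in the warehouse, which is the whole point of keeping the two separate.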
There’s a rich history to data warehouses, and the state of the art keeps changing. For a while, teams ran their data warehouses on top of the same database software that powered their production databases, like PostgreSQL. Then, for much of the past 10 years, the Hadoop ecosystem was all the rage (from personal experience, a very thorny proposition).
Today, most startups (and increasingly big companies) are using cloud-native, fully managed data warehouses. When you buy something like Snowflake or BigQuery, the vendor takes complete care of the infrastructure, and you don’t need to manage any servers whatsoever.
The basic highlights:
Fully managed infrastructure (don’t need to touch servers, automatic scale, etc.)
Pay-per-use pricing: you usually pay for how much data you store and how much data you query
Slick browser-based user interfaces for managing permissions and data
These are completely taking over the market – Snowflake’s 2020 IPO was the largest software IPO ever at the time (lol).
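The pay-per-use idea from the highlights above is easy to sketch as arithmetic: one charge for what you store, one for what you query. The function name and the rates below are placeholders chosen to make the math readable, not real prices from Snowflake, BigQuery, or anyone else, and actual billing models differ from vendor to vendor.

```python
def estimate_monthly_cost(stored_tb, queried_tb, price_per_tb_stored, price_per_tb_queried):
    """Pay-per-use bill: storage cost plus the cost of data scanned by queries."""
    if min(stored_tb, queried_tb, price_per_tb_stored, price_per_tb_queried) < 0:
        raise ValueError("inputs must be non-negative")
    return stored_tb * price_per_tb_stored + queried_tb * price_per_tb_queried


# Placeholder rates, NOT real prices from any vendor.
STORAGE_RATE = 20.0  # hypothetical dollars per TB stored per month
QUERY_RATE = 5.0     # hypothetical dollars per TB scanned by queries

# Same amount of stored data, very different amounts of querying.
quiet_month = estimate_monthly_cost(2, 1, STORAGE_RATE, QUERY_RATE)
busy_month = estimate_monthly_cost(2, 40, STORAGE_RATE, QUERY_RATE)

print(quiet_month, busy_month)
```

The takeaway is that the bill tracks usage: storing the same data costs the same in both months, while the heavier querying in the busy month is what drives the total up.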
The core Snowflake product – storage and compute
At its core, Snowflake is just a data warehouse as a service, and there are several others like it (BigQuery, Redshift, etc.). We’ll run through its major components, but keep in mind that a very similar set of paragraphs could be written about BigQuery too.
When you get right to it, the “product” that Snowflake offers is really just a place to store your analytical data, and then query it. Here’s how they break down their product offering: