“I heard you like databases, so I added a database of databases on my post about databases” -Justin, probably
Just one more database bro. One more and I'll be done. Just one more bro
Super interesting! Thank you
That is such a useful post, and I love the database of databases.
💯💯
Great overview.
I'd pay for something like this but more in depth!
What in particular are you looking for more depth on?
I'm not sure, just more!
Fair enough 😉
This is super interesting and incredibly useful. Thanks for putting this together!
Great overview. Love the database of databases.
Awesome explanation :) thanks
This is super helpful! Can you please write about indexes and how search systems work under the hood?
https://planetscale.com/blog/how-do-database-indexes-work
Very informative post, thank you!
A few remarks on the following:
> Cassandra is a NoSQL database built for really big companies who need to store lots of data and retrieve it fast. Unlike MongoDB, which is built as a document database, Cassandra is columnar, which means data is stored in entire columns (like Snowflake, actually). Using Cassandra feels a lot more like using a relational database.
Cassandra is not a columnar database but implements the concept of a wide-column family.
"Cassandra and HBase have a concept of column families, which they inherited from Bigtable. However, it is very misleading to call them column-oriented: within each column family, they store all columns from a row together, along with a row key, and they do not use column compression. Thus, the Bigtable model is still mostly row-oriented."
- Source: Designing Data-Intensive Applications by Martin Kleppmann.
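To make the distinction in that quote concrete, here is a minimal sketch in plain Python, not Cassandra's actual storage format. The three-row table and the function names `wide_column_layout` and `columnar_layout` are made up for illustration. It only shows the idea: a wide-column (Bigtable-style) layout keeps all of a row's columns together under the row key, while a truly column-oriented layout keeps each column's values together across rows.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"user_id": "u1", "name": "Ada", "country": "UK"},
    {"user_id": "u2", "name": "Linus", "country": "FI"},
    {"user_id": "u3", "name": "Grace", "country": "US"},
]

def wide_column_layout(rows, key="user_id"):
    """Row-oriented: each row key maps to all of that row's columns, kept together."""
    return {r[key]: {c: v for c, v in r.items() if c != key} for r in rows}

def columnar_layout(rows):
    """Column-oriented: each column's values are kept together across all rows."""
    columns = {}
    for r in rows:
        for c, v in r.items():
            columns.setdefault(c, []).append(v)
    return columns

wide = wide_column_layout(rows)
cols = columnar_layout(rows)

# Fetching one whole row touches a single entry in the wide-column layout.
print(wide["u2"])
# Scanning one attribute over every row touches a single list in the columnar layout.
print(cols["country"])
```

The first layout favors "give me everything for this key" lookups; the second favors "aggregate this one column over many rows" queries, which is the workload warehouses like Snowflake are built around.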
> DynamoDB is AWS’s proprietary NoSQL database
DynamoDB is based on the architecture described in the Amazon Dynamo whitepaper. Amazon Dynamo was/is used by the Amazon shopping cart because the database was built for extremely high availability. Apache Cassandra can be considered a spinoff because it was built on the architecture of Amazon Dynamo but with a few different design decisions.
Helpful - just updated. Thank you
Once upon a time I stayed buried in databases all day long. Oracle, IBM, SQLServer, and yes ... even Postgres once in a while. Thank the Lord I'm retired and don't need to worry about them anymore. I use a plain old notebook for my database these days. It's sooooo much more convenient.
Sounds secure
Great overview as usual, Justin! With the hype around generative AI and LLMs, which database models do you think are going to be used a lot more? I've read comments from MongoDB saying it will be an accelerant for their operational database business. I've also read that Snowflake will benefit through its data sharing. Can you talk about which databases you think LLMs will use more, obviously not restricted to these two? Could be all fluff for all I know.
Yea, not surprised marketing teams at these companies are trying to make the case that their database is somehow best positioned for AI. I can't really opine on that stuff, I can't predict the future. Same thing with vector databases. Not sure yet!
I read ‘diabetes’ haha
Hey Justin, recently a few startups building vector databases have been raising lots of money. How would you categorize vector databases? Are they another type of transactional database?
Vector databases are an AI thing, definitely not transactional in the traditional sense. It's not entirely clear that they're legit yet, so I've avoided them here.
Thanks! So it doesn't fit in any of the three types of DB you mentioned here?
If I had to pick one, it would depend on the use case of the models that you're using the vector database for. If they're models that power actual experiences in your app (e.g. Hex Magic https://hex.tech/product/magic-ai/), I would say it's a user-facing DB. If it's just for models that you use internally, I'd say it's operational.
For foundational understanding: when would a product want to be designed with a NoSQL vs. a SQL database?
We've reached the point where most NoSQL databases can "basically" do what SQL databases do, and vice versa. The answer to this question used to be that NoSQL scales better, but that's changing too.
From a fundamentals perspective, the only real answer here today would be when your data model is highly unstructured and irregular, e.g. it doesn't fit nicely into a table format. A good example is a graph database for companies whose data is mostly about relationships between entities. Or if you're storing loads and loads of unstructured text. Some would also say quick, in-memory lookups in something like Redis, but I'm less familiar with that world.
Hope this helps!
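For the graph case mentioned above, a rough sketch in plain Python may help show why that kind of data sits awkwardly in tables. This is not any particular graph database's API; the `follows` data and the `within_hops` helper are invented for illustration. The relationships are stored as an adjacency list, and "who is within N hops of this person?" becomes a short breadth-first walk, where a relational schema would typically need one self-join per hop.

```python
from collections import deque

# Hypothetical "follows" relationships, stored as an adjacency list.
follows = {
    "ana": ["ben", "cho"],
    "ben": ["dee"],
    "cho": ["dee", "eli"],
    "dee": [],
    "eli": ["ana"],
}

def within_hops(graph, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, hops + 1))
    return reached

print(sorted(within_hops(follows, "ana", 2)))
```

Changing `max_hops` changes the depth of the walk without changing the query shape, which is the kind of flexibility graph-oriented stores are designed around.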