6 Comments

Loved the explainer, Justin! The Twitter example reminds me of a really good excerpt about describing "load" from Designing Data-Intensive Applications by Martin Kleppmann. It's more about structured vs. "unstructured" databases in production environments than about data lakes, but I think it does a great job of driving home why the differences matter and what the consequences of the trade-offs are. I found the excerpt here: https://ebrary.net/64604/computer_science/scalability

Huge fan btw! Thanks for doing this.

That's quite straightforward.

A warehouse implies the stuff is organized (structured).

A lake is a place where, once you drop something in, you can't easily find it again (unstructured).

Thanks, Justin, this is super clear and helpful. I'd appreciate it if you could shed more light on Databricks. For example, what is its architecture like? How does it make querying a data lake as fast as querying a data warehouse?

Would you say the data lake is the central part of the modern cloud infrastructure?