Loved the explainer, Justin! The Twitter example reminds me of a really good excerpt about describing "load" from Designing Data-Intensive Applications by Martin Kleppmann. It's more about structured vs. "unstructured" databases in production environments than about data lakes, but I think it does a great job of driving home why the differences matter and what the trade-offs cost. Found the excerpt here: https://ebrary.net/64604/computer_science/scalability
Huge fan btw! Thanks for doing this.
That's quite straightforward.
A warehouse implies that things are organized (structured).
A lake is a place where, once you drop something in the water, you cannot easily find it again (unstructured).
Thanks Justin, this is super clear and helpful. I'd appreciate it if you could shed more light on Databricks. For example, what is the architecture like? How does it make querying a data lake as fast as querying a data warehouse?
Check this out: https://technically.substack.com/p/what-does-databricks-do
Thanks so much! ^_^
Would you say the data lake is the central part of modern cloud infrastructure?