Best explanation of Delta Lake and Parquet and their features that I've seen so far.
Sir, I commend your work, and I have all the best wishes in my heart for you. I'm working as a junior data engineer at a company here in Canada; your teachings perfectly align with the skills required to shine in this field. As a junior, I always have so many questions about data, and your course addresses those above and beyond. Please don't stop making videos; we need you. If you could do some lectures on PySpark, that would be great.
You are the best teacher I have come across; your lectures have increased my interest in data lakes many fold.
Complete and helpful content! I scheduled my exam for 21 March 2025, so 60 days to complete the series! Let's go! Thank you for the amazing content.
You're just blowing me away with your knowledge... you're an addiction.
A sheer genius way of teaching. My passion for learning has been rekindled. Thank you, master.
DP-203: 08 - Notes:
Delta Lake is a storage format, like Parquet or CSV; a data lake is the storage itself.
Delta Lake = Parquet files + a transaction log (JSON). Each update produces a new log file plus the Parquet files that were updated.
- ACID guarantees!
  - atomicity; partitioning (subdirectories) like country=us, country=ua enables parallelism
  - consistency: state 1 to state 2
  - isolation: concurrency via an optimistic model (SQL is pessimistic by default), snapshot isolation
  - durability: redundancy strategy
- basic history audit is possible
  > DESCRIBE HISTORY tableName
- time traveling
  > SELECT * FROM tableName TIMESTAMP AS OF '2023-10-07T16:09:18.000+0000' -- show the table as of this timestamp
  or
  > SELECT * FROM tableName VERSION AS OF 2
- easy rollback of changes (in case of accidental updates)
- vacuuming: removes uncommitted files that are no longer needed (retention, default 7 days); running this operation impacts time travel
- schema enforcement: Parquet ✗ vs. Delta ✓ (a schema mismatch is detected, while in Parquet it usually is not and can lead to data corruption)
- check constraints
  > ALTER TABLE tableName ADD CONSTRAINT GenderCheck CHECK (gender IN ('F','M'))
- schema evolution capabilities (data changes over time)
  > df.write.option("mergeSchema","true").mode("append").format("delta").save(...)
- MERGE: insert or update new data by ID (see the PySpark sketch after these notes)
- OPTIMIZE and Z-ORDER: combine small files into bigger ones; Z-ORDER controls how data is physically ordered within files
- unified batch and streaming (as a result, we can use the same code for both)
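For anyone who wants to try these notes hands-on, here is a minimal PySpark sketch of the features above: writing a Delta table, schema evolution with mergeSchema, time travel, and a MERGE upsert. It assumes Spark with the delta-spark package installed; the /tmp path, data, and column names are hypothetical, for illustration only.

# Minimal sketch, assuming Spark + delta-spark; path and data are hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/people"  # hypothetical location

# Initial write: Parquet data files + a JSON transaction log appear under `path`
spark.createDataFrame([(1, "Anna"), (2, "Ben")], ["id", "name"]) \
    .write.format("delta").mode("overwrite").save(path)

# Schema evolution: the new `country` column is merged into the table schema
spark.createDataFrame([(3, "Cara", "us")], ["id", "name", "country"]) \
    .write.option("mergeSchema", "true").mode("append").format("delta").save(path)

# Time travel: read the table as it was at version 0, before the append
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

# MERGE (upsert): insert or update incoming rows by id
updates = spark.createDataFrame([(2, "Ben Updated", "ua")], ["id", "name", "country"])
(
    DeltaTable.forPath(spark, path).alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

The same DataFrame code also works through spark.readStream / writeStream with format("delta"), which is what the "unified batch and streaming" note refers to.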
Thanks a lot! After watching your playlist, my interest in the field of data engineering has increased a lot (I rarely comment on anyone's videos).
I just started watching your lectures. This is the first time I've understood the different file formats and Delta Lake.
Best teacher! Thank you so much for your work.
Explained in a very simple manner and easy to digest. However, for a cat lover like me, the cats in the background steal my attention from the main course, and I have to rewatch to understand! :)
Thanks for these videos. They have helped me better understand the different file formats available in Azure. I think this is the only tutorial I found extremely beneficial on YouTube. You also speak slowly, so a non-native English speaker can easily understand and follow. Thanks a lot.
Hi Tybul, your content is simply phenomenal. The way you explain the concepts keeps us connected with the flow. Please keep posting videos more frequently. Thank you.
Enjoying your videos!
Pure and Original content. Thanks a lot!
Thank you so much for these videos. The practical examples are gold.
Videos are really helpful, and your cats are absolutely adorable; they're fighting in the background, so they're hard to miss. 🤣🤣🤣
Thanks for the videos. They are very well done and helping big time.
Please continue the series and keep spreading knowledge.