![Raja's Data Engineering](/img/default-banner.jpg)
- 131 videos
- 2,017,022 views
Raja's Data Engineering
Joined July 1, 2021
Welcome to Raja's Data Engineering!
Are you ready to embark on a thrilling journey into the world of Azure Databricks and Apache Spark? Look no further! Our channel is your go-to destination for all things related to these powerful data processing and analytics tools.
Join us as we delve into the depths of Azure Databricks and Apache Spark, unraveling their capabilities, exploring best practices, and unlocking the secrets to harnessing their true potential. Whether you're a data engineer, data scientist, or a curious learner passionate about big data technologies, our channel offers a wealth of knowledge to fuel your growth.
Here's what you can expect:
In-depth Tutorials
Best Practices and Tips
Use Case Discussions
Performance Optimization
Interview Preparation
Get ready to unlock the full potential of Azure Databricks and Apache Spark with our engaging and informative videos. Don't forget to subscribe to our channel and hit the notification bell, so you never miss an update.
131. Databricks | Pyspark| Built-in Function: ZIP_WITH
============================================
🚀 New YouTube Video Alert! 🚀
I just released a new video on YouTube where I dive into the powerful zip_with function in PySpark! 📊🔧
In this video, you'll learn:
The basics of the zip_with function.
Practical examples of using zip_with for element-wise operations.
How to apply custom binary functions to array elements.
Tips and tricks for handling array data operations in PySpark.
👉 ua-cam.com/video/8LVmUpFLMzA/v-deo.html
Don't forget to like, share, and subscribe for more data engineering content! Your feedback and comments are always welcome. Let's dive into the world of real-time data together! 💡💻
#Databricks #PySpark #BigData #DataEngineering #DataScience #MachineLearning #ApacheSpark #zip_with #ArrayOperations #DataTransformation #ETL #DataProcessing #CloudData #DataAnalytics #TechTutorial #YouTubeLearning #DataCommunity #AI #ML #DataOps #RealTimeData #DataEngineeringProjectUsingPyspark #PysparkAdvancedTutorial #BestPysparkTutorial #BestDatabricksTutorial #BestSparkTutorial #DatabricksETLPipeline #AzureDatabricksPipeline #AWSDatabricks #GCPDatabricks
============================================
529 views
Videos
130. Databricks | Pyspark| Delta Lake: Change Data Feed
1.1K views · 14 days ago
130. Databricks | Pyspark| Delta Lake: Change Data Feed 🚀 New YouTube Video Alert: Exploring Change Data Feed in Databricks! 🚀 I am excited to announce the release of my latest YouTube video where I delve into the powerful Change Data Feed (CDF) feature in Databricks. 📊✨ In this video, you'll learn: 🔹 What Change Data Feed is and how it works 🔹 How to enable and use CDF in your Databricks environ...
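As a rough sketch of the workflow the video covers (the table name `my_table` and the version numbers are hypothetical), CDF is enabled as a Delta table property and the captured row-level changes are queried with `table_changes`:

```sql
-- Enable Change Data Feed on an existing Delta table
ALTER TABLE my_table SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Read the changes captured between two table versions
SELECT * FROM table_changes('my_table', 1, 3);
```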
129. Databricks | Pyspark| Delta Lake: Deletion Vectors
879 views · 21 days ago
129. Databricks | Pyspark| Delta Lake: Deletion Vectors Delta Lake Internal Architecture: ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=EEgkoZZKJ7F4QsaH Optimize Command : ua-cam.com/video/F9tc8EgIn3c/v-deo.htmlsi=9KknJFJeHJunYJ_h Vacuum Command : ua-cam.com/video/G_RzisFeA5U/v-deo.htmlsi=FDNusdn2U4vjIlup 🚀 Excited to announce my latest YouTube video on the new Databricks Deletion Vectors feature! 🎥...
128. Databricks | Pyspark| Built-In Function: TRANSFORM
950 views · 1 month ago
128. Databricks | Pyspark| Built-In Function: TRANSFORM The transform function in PySpark is a versatile and powerful feature that plays a crucial role in data engineering and data science use cases. In this tutorial video, learn how to develop concise and more readable solutions in Databricks development. ua-cam.com/video/eNUYxJBMrh8/v-deo.html #Databricks #TRANSFORM, #PysparkBuilt-InFunction #...
127. Databricks | Pyspark| SQL Coding Interview:LeetCode-1045: Customers Who Bought All Products
1.7K views · 2 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the customers who bought all the products available. This is also one of the common coding exercises asked in ...
126. Databricks | Pyspark | Downloading Files from Databricks DBFS Location
2.6K views · 4 months ago
Quick Guide: Downloading Files from Databricks DBFS Location In this short tutorial video, learn how to effortlessly download files from a Databricks DBFS (Databricks File System) location. Whether you're a data engineer, data scientist, or analyst working with Databricks, accessing and retrieving files from DBFS is a fundamental skill. * Accessing DBFS: Learn how to navigate to the DBFS locati...
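One common approach is to copy the file into /FileStore, which the workspace exposes over HTTP. This sketch runs only inside a Databricks notebook (where `dbutils` is available), and the paths shown are hypothetical:

```python
# Copy from an arbitrary DBFS path to the browser-downloadable FileStore area
dbutils.fs.cp("dbfs:/tmp/report.csv", "dbfs:/FileStore/report.csv")
# The file can then be fetched in a browser at:
#   https://<workspace-url>/files/report.csv
```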
125. Databricks | Pyspark| Delta Live Table: Data Quality Check - Expect
3.8K views · 5 months ago
Azure Databricks Learning: Databricks and Pyspark: Delta Live Table: Data Quality Check - Expect 🚀 Excited to share my latest YouTube video discussing the powerful data quality checks feature of "expect" in Delta Live Tables on Databricks! In today's data-driven world, ensuring data accuracy and reliability is paramount. With "expect", we can effortlessly define and enforce data quality constrai...
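In DLT's SQL syntax an expectation looks roughly like this (the table and constraint names are hypothetical); by default, rows that violate an EXPECT constraint are recorded in the pipeline metrics but still written to the target:

```sql
CREATE OR REFRESH LIVE TABLE clean_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL)
)
AS SELECT * FROM LIVE.raw_orders;
```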
124. Databricks | Pyspark| Delta Live Table: Datasets - Tables and Views
7K views · 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp 4. DLT Declarative vs Procedural - ua-cam.com/video/-ia78A2QMN0/v-deo.htmlsi=MgkO7zfwYRjK6843 5. DLT Datasets - ua-c...
123. Databricks | Pyspark| Delta Live Table: Declarative VS Procedural
6K views · 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp 4. DLT Declarative vs Procedural - ua-cam.com/video/-ia78A2QMN0/v-deo.htmlsi=MgkO7zfwYRjK6843 Azure Databricks Learn...
122. Databricks | Pyspark| Delta Live Table: Introduction
15K views · 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp Azure Databricks Learning: Databricks and Pyspark: Delta Live Table: Introduction Delta Live Tables in Databricks is...
121. Databricks | Pyspark| AutoLoader: Incremental Data Load
15K views · 8 months ago
Azure Databricks Learning: Databricks and Pyspark: AutoLoader: Incremental Data Load AutoLoader in Databricks is a crucial feature that streamlines the process of ingesting and processing large volumes of data efficiently. This automated data loading mechanism is instrumental for real-time or near-real-time data pipelines, allowing organizations to keep their data lakes up-to-date with minimal ...
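A minimal Auto Loader sketch looks roughly like this. Note it is Databricks-only (the cloudFiles source is not part of open-source Spark), and the paths are hypothetical:

```python
stream = (spark.readStream
          .format("cloudFiles")                               # Auto Loader source
          .option("cloudFiles.format", "json")                # format of incoming files
          .option("cloudFiles.schemaLocation", "/mnt/schema") # schema-tracking location
          .load("/mnt/landing/"))                             # directory to watch
```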
120. Databricks | Pyspark| SQL Coding Interview: Employees Earning More Than Department Avg Salary
3.3K views · 9 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the employees who are earning more than their department's average salary. This is also one of the common codi...
119. Databricks | Pyspark| Spark SQL: Except Columns in Select Clause
2.8K views · 9 months ago
Azure Databricks Learning: Pyspark and Spark SQL: Except Columns in Select Clause The EXCEPT function provided by Databricks in Spark SQL is a powerful feature when performing data analytics on datasets with 1000s of columns. It is a lifesaver for developers on data engineering and data analytics projects. To get more understanding, watch this video ua-cam.com/video/Aj0kTlD9IgI/v-deo.html #Exc...
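For reference, the Databricks SQL form looks like this (table and column names are hypothetical); in open-source PySpark the closest equivalent is `df.drop(...)`:

```sql
-- Select every column except the ones listed
SELECT * EXCEPT (ssn, internal_id) FROM employees;
```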
118. Databricks | PySpark| SQL Coding Interview: Employees Earning More than Managers
3.1K views · 10 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the employees who are earning more than their managers. This is LeetCode SQL exercise number 181. This...
117. Databricks | Pyspark| SQL Coding Interview: Total Grand Slam Titles Winner
3.4K views · 10 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the total number of grand slam titles won by each player. This is LeetCode SQL exercise number 1783. This ...
116. Databricks | Pyspark| Query Dataframe Using Spark SQL
4.1K views · 10 months ago
116. Databricks | Pyspark| Query Dataframe Using Spark SQL
115. Databricks | Pyspark| SQL Coding Interview: Number of Calls and Total Duration
4K views · 10 months ago
115. Databricks | Pyspark| SQL Coding Interview: Number of Calls and Total Duration
114. Databricks | Pyspark| Performance Optimization: Re-order Columns in Delta Table
4.5K views · 10 months ago
114. Databricks | Pyspark| Performance Optimization: Re-order Columns in Delta Table
113. Databricks | PySpark| Spark Reader: Skip Specific Range of Records While Reading CSV File
4.3K views · 1 year ago
113. Databricks | PySpark| Spark Reader: Skip Specific Range of Records While Reading CSV File
112. Databricks | Pyspark| Spark Reader: Skip First N Records While Reading CSV File
4.1K views · 1 year ago
112. Databricks | Pyspark| Spark Reader: Skip First N Records While Reading CSV File
111. Databricks | Pyspark| SQL Coding Interview: Exchange Seats of Students
6K views · 1 year ago
111. Databricks | Pyspark| SQL Coding Interview: Exchange Seats of Students
110. Databricks | Pyspark| Spark Reader: Reading Fixed Length Text File
2.4K views · 1 year ago
110. Databricks | Pyspark| Spark Reader: Reading Fixed Length Text File
109. Databricks | Pyspark| Coding Interview Question: Pyspark and Spark SQL
14K views · 1 year ago
109. Databricks | Pyspark| Coding Interview Question: Pyspark and Spark SQL
108. Databricks | Pyspark| Window Function: First and Last
4.2K views · 1 year ago
108. Databricks | Pyspark| Window Function: First and Last
107. Databricks | Pyspark| Transformation: Subtract vs ExceptAll
3.4K views · 1 year ago
107. Databricks | Pyspark| Transformation: Subtract vs ExceptAll
106.Databricks|Pyspark|Automation|Real Time Project:DataType Issue When Writing to Azure Synapse/SQL
3.5K views · 1 year ago
106.Databricks|Pyspark|Automation|Real Time Project:DataType Issue When Writing to Azure Synapse/SQL
105. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - V
3.8K views · 1 year ago
105. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - V
104. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - IV
3.3K views · 1 year ago
104. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - IV
103. Databricks | Pyspark |Delta Lake: Spark/Databricks Interview Question Series - III
7K views · 1 year ago
103. Databricks | Pyspark |Delta Lake: Spark/Databricks Interview Question Series - III
102. Databricks | Pyspark |Performance Optimization: Spark/Databricks Interview Question Series - II
7K views · 1 year ago
102. Databricks | Pyspark |Performance Optimization: Spark/Databricks Interview Question Series - II
```sql
declare @Player table (PlayerId int, PlayerName varchar(10))
insert into @Player values (1,'Nadal'),(2,'Federer'),(3,'Novak')

declare @Matches table (Year int, Wimbledon tinyint, Fr_Open tinyint, US_Open tinyint, Au_Open tinyint)
insert into @Matches values (2017,2,1,1,2),(2018,3,1,3,2),(2019,3,1,1,3)

select P.PlayerId, P.PlayerName
     , count(distinct W.Year) as Wimbledon
     , count(distinct F.Year) as Fr_Open
     , count(distinct U.Year) as Us_Open
     , count(distinct A.Year) as Au_Open
     , count(distinct W.Year) + count(distinct F.Year) + count(distinct U.Year) + count(distinct A.Year) as Total_Grand_Slams
from @Player P
left join @Matches W on P.PlayerId = W.Wimbledon
left join @Matches F on P.PlayerId = F.Fr_Open
left join @Matches U on P.PlayerId = U.US_Open
left join @Matches A on P.PlayerId = A.Au_Open
group by P.PlayerId, P.PlayerName
order by P.PlayerId
```
Great, thank you. You have explained it in a simple way that anyone can understand.
Thank you
A passionate teacher... Hats off... Keep updating... This is like a contribution to India's growth... Heartfelt thanks
Thanks a ton!
Hi Sir, we want a video on the performance issues that come up while developing notebooks, and their solutions.
Is it required to learn Delta Live Tables for a person with 3 years' experience to switch to a data engineer role?
Yes it is good to learn DLT
@@rajasdataengineering7585 I have 3 years' experience in another role and want to switch to an Azure Data Engineer role. Can you please create a project that we can put on a resume when applying for jobs?
Use the window function row_number, then filter out row numbers 11 to 20:

```python
from pyspark.sql.functions import col, lit, row_number
from pyspark.sql.window import Window

w = Window.orderBy(lit("A"))
(df.withColumn("ROWNUM", row_number().over(w))
   .filter((col("ROWNUM") < 11) | (col("ROWNUM") > 20))
   .drop("ROWNUM")
   .show())
```
Can I get the files you worked with for the demo?
Thanks for this video. But I am curious why you didn't directly use min/max with groupBy, which would have fetched the same result:

```python
result = (df.withColumn("event_date", F.to_date("event_date"))
          .groupBy("event_status")
          .agg(F.min("event_date").alias("event_start_date"),
               F.max("event_date").alias("event_end_date"))
          .orderBy("event_start_date"))
result.show()
```
Thanks for sharing your approach. Yes there are various approaches
This won't work, please check your code
It worked
Please make Video on Salting in Performance optimization
Sure will create a video on salting technique
Awesome work bro. Have you put these notebooks somewhere in your github? Can you share that with us if possible?
Finally found one person who can explain Broadcast variable in a clear and understandable way. Huge respect bro. Subscribed and off I go to other videos in the playlist :)
Thanks and welcome!
You are a great teacher.
Glad you think so! Thanks
Very good video, thank you
Glad you liked it! Welcome!
thank you sir
Most welcome!
Excellent job. Can you please provide me the data set and code? Or please give me the Git link to download the dataset and code for your tutorials. Thanks.
Thanks for the content!
My pleasure! Welcome
Please make video on liquid clustering..
Sure will create soon
The file shared at the beginning is incorrect; it has an extra comma delimiter, which causes all columns to be added to _corrupt_record
Awesome explanation on Z-ordering. Greatly appreciated your efforts making this video.
Thank you
Best means BEST video ever if you are interested in learning.
Thank you
At least you should have made one video on how to open a Databricks notebook
so helpful ..thanks for helping the community
Happy to help! Thanks for your comment
Where can I find the datasets used in this demo? is there any github repo you can share?
How to explode two columns at the same time? I am getting an error in the below code:

```python
brand = [('Raja', ["TV", "Oven", "Ref", "AC"], ["Scooter", "Car"]),
         ('supriya', ["AC", "Washimach", None], ["Car", None])]
df_app = spark.createDataFrame(data=brand, schema=['name', 'appliances', 'vehicles'])
df_explode = df_app.select(df_app.name, explode(df_app.appliances), explode(df_app.vehicles))
```
add more videos on this delta live tables
Sure, will do
Hi Raja ,please do azure synapse analytics,eagerly waiting
Sure Krishna, will create videos on synapse analytics
Superb explanation Raja 👌 👏 👍
Thank you so much 🙂
Are any prerequisites required for this PySpark series?
No, nothing needed. I have covered from basic
Thank you!
Very Nice
Thanks
Thanks for your videos. DLT is a bit of a confusing topic for me; can you create a video on DLT constraints in SQL and Python in this DLT series?
Sure, I will create a video on this requirement
Keep going
Thank you
Sir, please explain cluster mode vs client mode
Sure, will create a video for this requirement
Thanks a lot for the in-depth explanation 😊
Hope it helps! Thanks and welcome
```python
union_df = (call_df.select("from_id", "to_id", "duration")
            .unionAll(call_df.select("to_id", "from_id", "duration")))

final_df = (union_df.filter(F.col("from_id") < F.col("to_id"))
            .groupBy(F.col("from_id").alias("person_1"),
                     F.col("to_id").alias("person_2"))
            .agg(F.sum(F.col("duration")).alias("total_duration"),
                 F.count(F.lit(1)).alias("total_calls")))

final_df.display()
```
```python
input_data = [("01-06-2020", 'Won'), ("02-06-2020", 'Won'), ("03-06-2020", 'Won'),
              ("04-06-2020", 'Lost'), ("05-06-2020", 'Lost'), ("06-06-2020", 'Lost'),
              ("07-06-2020", 'Won')]
input_schema = "event_date STRING, event_status STRING"
df = (spark.createDataFrame(data=input_data, schema=input_schema)
      .withColumn("event_date", F.to_date(F.col("event_date"), "dd-MM-yyyy")))

windowSpec = Window.partitionBy(F.col("event_status")).orderBy(F.col("event_date"))
rnk_df = df.withColumn("rnk", F.row_number().over(windowSpec))

final_df = rnk_df.withColumn("grp_id", F.dayofmonth(F.col("event_date")) - F.col("rnk"))

result_df = (final_df.groupBy(F.col("grp_id"), F.col("event_status"))
             .agg(F.max(F.col("event_status")).alias("eventstatus"),
                  F.min(F.col("event_date")).alias("start_date"),
                  F.max(F.col("event_date")).alias("end_date"))
             .drop(F.col("grp_id"), F.col("event_status")))
result_df.display()
```
Sir, your explanations are very clear and concise. Thank you
Thanks and welcome
Hi sir, I don't understand the use of spot instances.
As always, well explained and very useful. Thank you!
Glad it was helpful! Thanks for your comment
You explained it so simply... I hope I will be able to explain it to the interviewer the same way you did 😅
Thank you! All the best!
Awesome
Thank you
Hi Raja, is there anyway to get data sets / code which you are explaining for practice
Good video Raja. In real time we don't know the exact versions; how can we deal with them dynamically?
The DESCRIBE HISTORY command can give you the version history; also, if you don't specify a starting version, it'll consider the latest version by default
Example query:

```python
streaming_query = (spark.readStream
                   .option("readChangeData", True)
                   .table(f"{tablename}")
                   .writeStream
                   .outputMode("append")
                   .foreachBatch(udf)
                   .option("mergeSchema", "true")
                   .option("checkpointLocation", "location")
                   .start())

batch_query = (spark.read
               .option("readChangeData", True)
               .table(f"{sourcetablename}")
               .write.format("delta")
               .mode("overwrite")
               .saveAsTable("targettablename"))
```
Invaluable explanation sir! I went through many blogs but couldn't get it. You made it crystal clear😊
Glad to hear that! Thanks for your comment
Great Explanation and You covered all topics which will help in interview and real time projects. Thanks for your effort...
You are most welcome! Thanks for your comment
Some of the videos are not showing. Are they paid ones? Can you please let us know
Hi, all videos are visible. I didn't make anything paid version
Hi sir, where can I get the notebooks for the classes? Could you please share
Can we attend daily classes?
A higher order function is a function that takes one or more functions as arguments, or returns a function as its result.
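A tiny plain-Python illustration of that definition (the function names are invented for the example):

```python
# apply_each is higher-order: it receives another function as an argument
def apply_each(func, values):
    return [func(v) for v in values]

def double(x):
    return x * 2

print(apply_each(double, [1, 2, 3]))  # [1*2, 2*2, 3*2] -> [2, 4, 6]
```

PySpark's zip_with and transform follow the same idea: they take a lambda as one of their arguments.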
Crystal clear explanation... very much helpful
Glad it was helpful!
Very nice explanations. thanks
Glad it was helpful! Thanks