Raja's Data Engineering
  • 131
  • 2 017 022
131. Databricks | Pyspark| Built-in Function: ZIP_WITH
============================================
🚀 New YouTube Video Alert! 🚀
I just released a new video on YouTube where I dive into the powerful zip_with function in PySpark! 📊🔧
In this video, you'll learn:
The basics of the zip_with function.
Practical examples of using zip_with for element-wise operations.
How to apply custom binary functions to array elements.
Tips and tricks for handling array data operations in PySpark.
👉 ua-cam.com/video/8LVmUpFLMzA/v-deo.html
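As a quick taste of what's covered, here is a minimal sketch (not the exact notebook from the video) of zip_with applying a custom binary function element-wise across two array columns:
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, [1, 2, 3], [10, 20, 30])],
    ["id", "qty", "price"]
)

# zip_with pairs elements by position and applies the lambda to each pair
df.withColumn("amount", expr("zip_with(qty, price, (q, p) -> q * p)")) \
  .show(truncate=False)
# amount: [10, 40, 90]
```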
Don't forget to like, share, and subscribe for more data engineering content! Your feedback and comments are always welcome. Let's dive into the world of PySpark together! 💡💻
#Databricks #PySpark #BigData #DataEngineering #DataScience #MachineLearning #ApacheSpark #zip_with #ArrayOperations #DataTransformation #ETL #DataProcessing #CloudData #DataAnalytics #TechTutorial #YouTubeLearning #DataCommunity #AI #ML #DataOps #RealTimeData #DataEngineeringProjectUsingPyspark #PysparkAdvancedTutorial #BestPysparkTutorial #BestDatabricksTutorial #BestSparkTutorial #DatabricksETLPipeline #AzureDatabricksPipeline #AWSDatabricks #GCPDatabricks
Views: 529

Videos

130. Databricks | Pyspark| Delta Lake: Change Data Feed
Views: 1.1K • 14 days ago
130. Databricks | Pyspark| Delta Lake: Change Data Feed 🚀 New YouTube Video Alert: Exploring Change Data Feed in Databricks! 🚀 I am excited to announce the release of my latest YouTube video where I delve into the powerful Change Data Feed (CDF) feature in Databricks. 📊✨ In this video, you'll learn: 🔹 What Change Data Feed is and how it works 🔹 How to enable and use CDF in your Databricks environ...
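For orientation, a minimal sketch of enabling and reading the Change Data Feed (the table name and starting version are hypothetical, and a Databricks `spark` session is assumed):
```
# Enable CDF on an existing Delta table
spark.sql("""
    ALTER TABLE orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read row-level changes captured since a chosen table version
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 2)
           .table("orders"))
changes.show()
```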
129. Databricks | Pyspark| Delta Lake: Deletion Vectors
Views: 879 • 21 days ago
129. Databricks | Pyspark| Delta Lake: Deletion Vectors Delta Lake Internal Architecture: ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=EEgkoZZKJ7F4QsaH Optimize Command : ua-cam.com/video/F9tc8EgIn3c/v-deo.htmlsi=9KknJFJeHJunYJ_h Vacuum Command : ua-cam.com/video/G_RzisFeA5U/v-deo.htmlsi=FDNusdn2U4vjIlup 🚀 Excited to announce my latest YouTube video on the new Databricks Deletion Vectors feature! 🎥...
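As a rough illustration (hypothetical table name, Databricks `spark` session assumed), deletion vectors are switched on through a Delta table property, after which deletes mark rows in small deletion-vector files instead of rewriting whole data files:
```
spark.sql("""
    ALTER TABLE sales
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)
""")

# Subsequent DELETE/UPDATE/MERGE operations use deletion vectors
spark.sql("DELETE FROM sales WHERE order_status = 'CANCELLED'")
```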
128. Databricks | Pyspark| Built-In Function: TRANSFORM
Views: 950 • a month ago
128. Databricks | Pyspark| Built-In Function: TRANSFORM The transform function in PySpark is a versatile and powerful feature that plays a crucial role in data engineering and data science use cases. In this tutorial video, learn how to develop concise and more readable solutions in Databricks development. ua-cam.com/video/eNUYxJBMrh8/v-deo.html #Databricks #TRANSFORM, #PysparkBuilt-InFunction #...
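A minimal sketch of the higher-order transform function (sample data made up for illustration; a Spark/Databricks session named `spark` is assumed):
```
from pyspark.sql.functions import transform, col

df = spark.createDataFrame([(1, [1, 2, 3]), (2, [10, 20])], ["id", "values"])

# transform applies the lambda to every element of the array column
df.withColumn("doubled", transform(col("values"), lambda x: x * 2)) \
  .show(truncate=False)
```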
127. Databricks | Pyspark| SQL Coding Interview:LeetCode-1045: Customers Who Bought All Products
Views: 1.7K • 2 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the customers who bought all the products available. This is also one of the common coding exercises asked in ...
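One common way to solve this exercise (not necessarily the exact solution shown in the video; the customer/product tables follow the LeetCode 1045 schema and are assumed to be registered, with a `spark` session available):
```
result = spark.sql("""
    SELECT customer_id
    FROM customer
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(product_key) FROM product)
""")
result.show()
```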
126. Databricks | Pyspark | Downloading Files from Databricks DBFS Location
Views: 2.6K • 4 months ago
Quick Guide: Downloading Files from Databricks DBFS Location In this short tutorial video, learn how to effortlessly download files from a Databricks DBFS (Databricks File System) location. Whether you're a data engineer, data scientist, or analyst working with Databricks, accessing and retrieving files from DBFS is a fundamental skill. * Accessing DBFS: Learn how to navigate to the DBFS locati...
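One common approach from a notebook (paths are hypothetical; `dbutils` and `display` are available in Databricks notebooks):
```
# Inspect the DBFS location first
display(dbutils.fs.ls("dbfs:/FileStore/output/"))

# Copy the file from DBFS to the driver's local disk; files under /FileStore
# can also be downloaded through the workspace's /files/ URL pattern.
dbutils.fs.cp("dbfs:/FileStore/output/report.csv", "file:/tmp/report.csv")
```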
125. Databricks | Pyspark| Delta Live Table: Data Quality Check - Expect
Views: 3.8K • 5 months ago
Azure Databricks Learning: Databricks and Pyspark: Delta Live Table: Data Quality Check - Expect 🚀 Excited to share my latest YouTube video discussing the powerful data quality checks feature of "expect" in Delta Live Tables on Databricks! In today's data-driven world, ensuring data accuracy and reliability is paramount. With "expect", we can effortlessly define and enforce data quality constrai...
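A minimal sketch of DLT expectations (table names and rules are hypothetical; this runs inside a Delta Live Tables pipeline where the `dlt` module and `spark` are available):
```
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Orders with basic data quality checks")
@dlt.expect("valid_amount", "amount >= 0")                     # warn: keep rows, record the metric
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop violating rows
def clean_orders():
    return spark.readStream.table("raw_orders").where(col("order_date").isNotNull())
```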
124. Databricks | Pyspark| Delta Live Table: Datasets - Tables and Views
Views: 7K • 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp 4. DLT Declarative vs Procedural - ua-cam.com/video/-ia78A2QMN0/v-deo.htmlsi=MgkO7zfwYRjK6843 5. DLT Datasets - ua-c...
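To illustrate the two dataset types (a hedged sketch with hypothetical names, for a DLT pipeline where `dlt` and `spark` are available): a DLT view is recomputed on each update and not persisted, while a DLT table is materialized as a Delta table:
```
import dlt

@dlt.view(comment="Intermediate dataset, recomputed on each pipeline update")
def raw_events():
    return spark.read.format("json").load("/mnt/landing/events/")

@dlt.table(comment="Materialized Delta table")
def events_cleaned():
    return dlt.read("raw_events").dropDuplicates(["event_id"])
```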
123. Databricks | Pyspark| Delta Live Table: Declarative VS Procedural
Views: 6K • 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp 4. DLT Declarative vs Procedural - ua-cam.com/video/-ia78A2QMN0/v-deo.htmlsi=MgkO7zfwYRjK6843 Azure Databricks Learn...
122. Databricks | Pyspark| Delta Live Table: Introduction
Views: 15K • 8 months ago
Delta Live Table Tutorial: 1. Delta Lake Internal Architecture - ua-cam.com/video/YmqkMZ4MxJg/v-deo.htmlsi=GbX3Fi1SH4sb_elw 2. Auto Loader - ua-cam.com/video/GjV2m8b9fNY/v-deo.htmlsi=gY9K3MISDYkRlImA 3. DLT Introduction - ua-cam.com/video/ryOe64wwLuw/v-deo.htmlsi=JS-izYpggbm1H1Wp Azure Databricks Learning: Databricks and Pyspark: Delta Live Table: Introduction Delta Live Tables in Databricks is...
121. Databricks | Pyspark| AutoLoader: Incremental Data Load
Views: 15K • 8 months ago
Azure Databricks Learning: Databricks and Pyspark: AutoLoader: Incremental Data Load AutoLoader in Databricks is a crucial feature that streamlines the process of ingesting and processing large volumes of data efficiently. This automated data loading mechanism is instrumental for real-time or near-real-time data pipelines, allowing organizations to keep their data lakes up-to-date with minimal ...
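A minimal Auto Loader sketch (paths, file format and target table are hypothetical; assumes a Databricks notebook with `spark`):
```
stream = (spark.readStream
               .format("cloudFiles")
               .option("cloudFiles.format", "csv")
               .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
               .load("/mnt/landing/orders/"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/orders")
       .trigger(availableNow=True)   # process only newly arrived files, then stop
       .toTable("bronze_orders"))
```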
120. Databricks | Pyspark| SQL Coding Interview: Employees Earning More Than Department Avg Salary
Views: 3.3K • 9 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the employees who are earning more than their department average salary. This is also one of the common codi...
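One possible approach (not necessarily the video's exact solution; a hypothetical employees(emp_id, dept_id, salary) table and the `spark` session are assumed):
```
result = spark.sql("""
    SELECT emp_id, dept_id, salary
    FROM (
        SELECT e.*, AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg_salary
        FROM employees e
    ) t
    WHERE salary > dept_avg_salary
""")
result.show()
```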
119. Databricks | Pyspark| Spark SQL: Except Columns in Select Clause
Views: 2.8K • 9 months ago
Azure Databricks Learning: Pyspark and Spark SQL: Except Columns in Select Clause The EXCEPT option provided by Databricks in Spark SQL is a powerful feature when performing data analytics on datasets with thousands of columns. It is a lifesaver for developers on data engineering and data analytics projects. To get more understanding, watch this video ua-cam.com/video/Aj0kTlD9IgI/v-deo.html #Exc...
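For context, Databricks SQL allows excluding columns from a star expansion; a small hedged sketch (hypothetical table and column names, `spark` session assumed):
```
# Select every column except a couple that are not needed
spark.sql("SELECT * EXCEPT (ssn, internal_notes) FROM customers").show()
```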
118. Databricks | PySpark| SQL Coding Interview: Employees Earning More than Managers
Views: 3.1K • 10 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the employees who are earning more than their managers. This is Leet Code SQL Exercise number 181. This...
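One possible approach (hypothetical employee(id, name, salary, managerId) table following the LeetCode schema; `spark` session assumed):
```
result = spark.sql("""
    SELECT e.name AS employee
    FROM employee e
    JOIN employee m ON e.managerId = m.id
    WHERE e.salary > m.salary
""")
result.show()
```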
117. Databricks | Pyspark| SQL Coding Interview: Total Grand Slam Titles Winner
Views: 3.4K • 10 months ago
Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL Coding exercises are very common in most Big Data interviews. It is important to develop coding skills before appearing for Spark/Databricks interviews. In this video, I have explained a coding scenario to find the total number of grand slam titles won by each player. This is Leet Code SQL Exercise number 1783. This ...
116. Databricks | Pyspark| Query Dataframe Using Spark SQL
Views: 4.1K • 10 months ago
115. Databricks | Pyspark| SQL Coding Interview: Number of Calls and Total Duration
Views: 4K • 10 months ago
114. Databricks | Pyspark| Performance Optimization: Re-order Columns in Delta Table
Views: 4.5K • 10 months ago
113. Databricks | PySpark| Spark Reader: Skip Specific Range of Records While Reading CSV File
Views: 4.3K • a year ago
112. Databricks | Pyspark| Spark Reader: Skip First N Records While Reading CSV File
Views: 4.1K • a year ago
111. Databricks | Pyspark| SQL Coding Interview: Exchange Seats of Students
Views: 6K • a year ago
110. Databricks | Pyspark| Spark Reader: Reading Fixed Length Text File
Views: 2.4K • a year ago
109. Databricks | Pyspark| Coding Interview Question: Pyspark and Spark SQL
Views: 14K • a year ago
108. Databricks | Pyspark| Window Function: First and Last
Views: 4.2K • a year ago
107. Databricks | Pyspark| Transformation: Subtract vs ExceptAll
Views: 3.4K • a year ago
106.Databricks|Pyspark|Automation|Real Time Project:DataType Issue When Writing to Azure Synapse/SQL
Views: 3.5K • a year ago
105. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - V
Views: 3.8K • a year ago
104. Databricks | Pyspark |Pyspark Development: Spark/Databricks Interview Question Series - IV
Views: 3.3K • a year ago
103. Databricks | Pyspark |Delta Lake: Spark/Databricks Interview Question Series - III
Views: 7K • a year ago
102. Databricks | Pyspark |Performance Optimization: Spark/Databricks Interview Question Series - II
Views: 7K • a year ago

COMMENTS

  • @landchennai8549
    @landchennai8549 4 hours ago

    declare @Player table (PlayerId int, PlayerName varchar(10))
    insert into @Player values (1,'Nadal'),(2,'Federer'),(3,'Novak')

    declare @Matches table (Year int, Wimbledon tinyint, Fr_Open tinyint, US_Open tinyint, Au_Open tinyint)
    insert into @Matches values (2017,2,1,1,2),(2018,3,1,3,2),(2019,3,1,1,3)

    select P.PlayerId, P.PlayerName,
           count(distinct W.Year) as Wimbledon,
           count(distinct F.Year) as Fr_Open,
           count(distinct U.Year) as Us_Open,
           count(distinct A.Year) as Au_Open,
           count(distinct W.Year) + count(distinct F.Year) + count(distinct U.Year) + count(distinct A.Year) as Total_Grand_Slams
    from @Player P
    left join @Matches W on P.PlayerId = W.Wimbledon
    left join @Matches F on P.PlayerId = F.Fr_Open
    left join @Matches U on P.PlayerId = U.US_Open
    left join @Matches A on P.PlayerId = A.Au_Open
    group by P.PlayerId, P.PlayerName
    order by P.PlayerId

  • @annaduraip3182
    @annaduraip3182 7 hours ago

    Great, thank you. You have explained it in a simple way that anyone can understand.

  • @prabakaran-g5x
    @prabakaran-g5x 13 hours ago

    A passionate teacher... hats off... keep updating... this is a contribution to India's growth... heartfelt thanks.

  • @RamaiahChenna
    @RamaiahChenna a day ago

    Hi Sir, we would like a video on the performance issues that come up while developing notebooks, and their solutions.

  • @ramswaroop1560
    @ramswaroop1560 a day ago

    Is it required to learn Delta Live Tables for a person with 3 years of experience to switch to a data engineer role?

    • @rajasdataengineering7585
      @rajasdataengineering7585 a day ago

      Yes, it is good to learn DLT.

    • @ramswaroop1560
      @ramswaroop1560 a day ago

      @@rajasdataengineering7585 I have 3 years of experience in another role and want to switch to an Azure Data Engineer role. Can you please create a project that we can put on a resume when applying for jobs?

  • @welcometojungle1234
    @welcometojungle1234 2 days ago

    Use the window function row_number, then use filter to drop row numbers 11 to 20:
    from pyspark.sql.functions import col, lit, row_number
    from pyspark.sql.window import Window

    w = Window.orderBy(lit("A"))
    df.withColumn("ROWNUM", row_number().over(w)) \
      .filter((col("ROWNUM") < 11) | (col("ROWNUM") > 20)) \
      .drop("ROWNUM").show()

  • @vinoda3480
    @vinoda3480 2 days ago

    Can I get the files you worked with for the demo?

  • @jinsonfernandez
    @jinsonfernandez 2 days ago

    Thanks for this video. But I am curious why you didn't directly use min/max with group by, which would have fetched the same result:
    ```
    result = df.withColumn("event_date", F.to_date("event_date")) \
               .groupBy("event_status") \
               .agg(F.min("event_date").alias("event_start_date"),
                    F.max("event_date").alias("event_end_date")) \
               .orderBy("event_start_date")
    result.show()
    ```

  • @swathi6472
    @swathi6472 2 days ago

    Please make a video on salting in performance optimization.

  • @shakthimaan007
    @shakthimaan007 3 days ago

    Awesome work bro. Have you put these notebooks somewhere in your github? Can you share that with us if possible?

  • @shakthimaan007
    @shakthimaan007 3 days ago

    Finally found one person who can explain Broadcast variable in a clear and understandable way. Huge respect bro. Subscribed and off I go to other videos in the playlist :)

  • @Elkhamasi
    @Elkhamasi 3 days ago

    You are a great teacher.

  • @epicure07
    @epicure07 3 days ago

    Very good video, thank you

  • @VISHVABATTULA-p7l
    @VISHVABATTULA-p7l 3 days ago

    thank you sir

  • @shamsmalek
    @shamsmalek 3 days ago

    Excellent job. Can you please provide me the data set and code? Or please give me the Git link to download the dataset and code for your tutorials. Thanks.

  • @divyamariyameldo6495
    @divyamariyameldo6495 5 days ago

    Thanks for the content!

  • @bollywoodbadshah.796
    @bollywoodbadshah.796 6 days ago

    Please make video on liquid clustering..

  • @ezaghal
    @ezaghal 6 days ago

    The file that was shared at the beginning is incorrect; it has an extra comma delimiter which causes all columns to be added to _corrupt_record.

  • @ravisunkara6664
    @ravisunkara6664 6 days ago

    Awesome explanation on Z-ordering. Greatly appreciated your efforts making this video.

  • @ashishbarwad9471
    @ashishbarwad9471 6 days ago

    BEST means BEST video ever if you are interested to learn.

  • @Thescienceworld652
    @Thescienceworld652 6 days ago

    At least you should have made one video on how to open a Databricks notebook.

  • @pulastyadas3351
    @pulastyadas3351 7 days ago

    so helpful ..thanks for helping the community

  • @lalitsalunkhe9422
    @lalitsalunkhe9422 8 days ago

    Where can I find the datasets used in this demo? is there any github repo you can share?

  • @supriyakoura7755
    @supriyakoura7755 9 days ago

    How do I explode two columns at the same time? I am getting an error with the code below:
    brand = [('Raja', ["TV", "Oven", "Ref", "AC"], ["Scooter", "Car"]),
             ('supriya', ["AC", "Washimach", None], ["Car", None])]
    df_app = spark.createDataFrame(data=brand, schema=['name', 'appliances', 'vehicles'])
    df_explode = df_app.select(df_app.name, explode(df_app.appliances), explode(df_app.vehicles))

  • @ravulapallivenkatagurnadha9605
    @ravulapallivenkatagurnadha9605 9 days ago

    Please add more videos on Delta Live Tables.

  • @krishnamurthy1243
    @krishnamurthy1243 9 days ago

    Hi Raja, please do Azure Synapse Analytics, eagerly waiting.

  • @sravankumar1767
    @sravankumar1767 9 days ago

    Superb explanation Raja 👌 👏 👍

  • @dineshtadepalli4584
    @dineshtadepalli4584 10 days ago

    Are any prerequisites required for this PySpark series?

  • @subhashyadav9262
    @subhashyadav9262 10 days ago

    Very Nice

  • @sanrhn
    @sanrhn 10 days ago

    Thanks for your videos; DLT is a bit of a confusing topic for me. Can you create a video on DLT constraints in SQL and Python in this DLT series?

  • @Basket-hb5jc
    @Basket-hb5jc 10 days ago

    Keep going

  • @tadojuvishwa2509
    @tadojuvishwa2509 11 days ago

    Sir, please explain cluster mode vs client mode.

  • @ranyasri1092
    @ranyasri1092 11 days ago

    Thanks a lot for the in-depth explanation 😊

  • @grim_rreaperr
    @grim_rreaperr 11 days ago

    union_df = call_df.select("from_id", "to_id", "duration") \
                      .unionAll(call_df.select("to_id", "from_id", "duration"))

    final_df = (union_df.filter(F.col("from_id") < F.col("to_id"))
                        .groupBy(F.col("from_id").alias("person_1"), F.col("to_id").alias("person_2"))
                        .agg(F.sum(F.col("duration")).alias("total_duration"),
                             F.count(F.lit(1)).alias("total_calls")))

    final_df.display()

  • @grim_rreaperr
    @grim_rreaperr 11 days ago

    input_data = [("01-06-2020", 'Won'), ("02-06-2020", 'Won'), ("03-06-2020", 'Won'),
                  ("04-06-2020", 'Lost'), ("05-06-2020", 'Lost'), ("06-06-2020", 'Lost'),
                  ("07-06-2020", 'Won')]
    input_schema = "event_date STRING, event_status STRING"
    df = (spark.createDataFrame(data=input_data, schema=input_schema)
               .withColumn("event_date", F.to_date(F.col("event_date"), "dd-MM-yyyy")))

    windowSpec = Window.partitionBy(F.col("event_status")).orderBy(F.col("event_date"))
    rnk_df = df.withColumn("rnk", F.row_number().over(windowSpec))

    final_df = rnk_df.withColumn("grp_id", F.dayofmonth(F.col("event_date")) - F.col("rnk"))

    result_df = (final_df.groupBy(F.col("grp_id"), F.col("event_status"))
                         .agg(F.max(F.col("event_status")).alias("eventstatus"),
                              F.min(F.col("event_date")).alias("start_date"),
                              F.max(F.col("event_date")).alias("end_date"))
                         .drop(F.col("grp_id"), F.col("event_status")))
    result_df.display()

  • @PRUTHVIRAJ-wp9vu
    @PRUTHVIRAJ-wp9vu 11 days ago

    Sir, Your explanations are very clear & concise. Thank you

  • @Kohli079
    @Kohli079 12 days ago

    Hi sir, I don't understand the use of spot instances?

  • @mateen161
    @mateen161 12 days ago

    As always, well explained and very useful. Thank you!

  • @coolraviraj24
    @coolraviraj24 13 days ago

    You explained it so simply... I hope I will be able to explain it to the interviewer the same way you did 😅

  • @807diatm
    @807diatm 13 days ago

    Awesome

  • @lakshminarayanag2888
    @lakshminarayanag2888 14 days ago

    Hi Raja, is there any way to get the datasets / code you are explaining, for practice?

  • @venkatasai4293
    @venkatasai4293 14 days ago

    Good video Raja. In real time we don't know the exact versions; how can we deal with them dynamically?

    • @sumitchandwani9970
      @sumitchandwani9970 12 days ago

      The DESCRIBE HISTORY command can give you the version history, or if you don't specify a starting version it will consider the latest version by default.

    • @sumitchandwani9970
      @sumitchandwani9970 12 days ago

      Example query:
      streaming_query = (spark.readStream
                              .option("readChangeData", True)
                              .table(f"{tablename}")
                              .writeStream
                              .outputMode("append")
                              .foreachBatch(udf)
                              .option("mergeSchema", "true")
                              .option("checkpointLocation", "location")
                              .start())

      batch_query = (spark.read
                          .option("readChangeData", True)
                          .table(f"{sourcetablename}")
                          .write.format("delta")
                          .mode("overwrite")
                          .saveAsTable("targettablename"))

  • @navdeepjha2739
    @navdeepjha2739 15 days ago

    Invaluable explanation sir! I went through many blogs but couldn't get it. You made it crystal clear😊

  • @bababallon1785
    @bababallon1785 15 days ago

    Great explanation, and you covered all the topics which will help in interviews and real-time projects. Thanks for your effort...

  • @krishnamurthy1243
    @krishnamurthy1243 17 days ago

    Some of the videos are not showing. Are they paid? Can you please let us know?

  • @chappasiva
    @chappasiva 17 days ago

    Hi sir, where can I get the notebooks for the classes? Could you please share them?

  • @MrMallesh1
    @MrMallesh1 18 days ago

    Can we attend daily classes?

  • @SidharthanPV
    @SidharthanPV 18 days ago

    A higher order function is a function that takes one or more functions as arguments, or returns a function as its result.

  • @rajasekharkondamidi4554
    @rajasekharkondamidi4554 18 days ago

    Crystal clear explanation... very helpful.

  • @shalendrakumar5546
    @shalendrakumar5546 18 days ago

    Very nice explanations. Thanks!