In this article, we discuss the different joins available on RDDs: inner, left outer, right outer, and Cartesian.

Inner Join: It returns only the records whose keys are present in both RDDs.
Say one RDD contains (K,V1) and the other contains (K,V2); the inner join of the two returns (K,(V1,V2)). If a key occurs more than once on either side, every combination of its values is returned.
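To make the (K,(V1,V2)) shape concrete, here is a minimal plain-Python sketch of what an inner join computes. It does not use Spark and is not how Spark implements `join`; the helper name `inner_join` and the small sample lists are made up for illustration.

```python
from collections import defaultdict

def inner_join(left, right):
    """Pair every left value with every right value that shares its key."""
    # Index the right side by key (hypothetical helper, not a Spark API).
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    # Keys missing on the right produce no output at all.
    return [(key, (lv, rv))
            for key, lv in left
            for rv in right_by_key.get(key, [])]

# Illustrative sample data, not read from the CSV files.
marks = [("Joseph", 80), ("Joseph", 82), ("Bob", 45)]
grades = [("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")]

result = inner_join(marks, grades)
print(result)
```

Jimmy appears only on the right side, so he is dropped, while Joseph appears twice on the left and therefore twice in the result. On a real RDD the order of the joined records is not guaranteed.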

Q-1 We have one dataset containing student names and marks and another dataset containing student names and grades. Find the students present in both datasets, along with their marks and grade.

#First dataSet student_marks.csv contains
#Joseph,Physics,80,100
#Joseph,Chemistry,82,100
#Bob,Physics,45,100
#Bob,Chemistry,35,100

#Second dataset student_grade.csv contains
#Joseph,A
#Bob,B
#Jimmy,A
#Harry,C
#Stephanie,A
#Ron,A
#Tina,A
#Williamson,A
#Rocky,A

# Load each CSV file as an RDD of text lines
students_marks = spark.sparkContext.textFile("/FileStore/tables/student_marks.csv")
students_grade = spark.sparkContext.textFile("/FileStore/tables/student_grade.csv")
# Key both RDDs by student name (the first field) so they can be joined
students_marks_tran = students_marks.map(lambda x: (x.split(",")[0], (x.split(",")[1], x.split(",")[2], x.split(",")[3])))
students_grade_tran = students_grade.map(lambda x: (x.split(",")[0], x.split(",")[1]))
# Inner join on the student name
student_name_join = students_marks_tran.join(students_grade_tran)
student_name_join.collect()


#Output
#[(Tina,((Computer Science,100,100),A))
#(Tina,((Physics,100,100),A))
#(Jimmy,((Hindi,92,100),A))
#(Harry,((Physics,40,100),C))
#….
#….
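The map steps that build the key-value pairs can be tried on single lines without a cluster. The sketch below is plain Python; the helper names `parse_marks` and `parse_grade` are hypothetical, and the sample lines are copied from the dataset listings above.

```python
def parse_marks(line):
    """Turn 'name,f1,f2,f3' into (name, (f1, f2, f3)), splitting only once."""
    fields = line.split(",")
    return (fields[0], (fields[1], fields[2], fields[3]))

def parse_grade(line):
    """Turn 'name,grade' into (name, grade)."""
    fields = line.split(",")
    return (fields[0], fields[1])

# Sample lines taken from the dataset listings.
marks_pair = parse_marks("Joseph,Physics,80,100")
grade_pair = parse_grade("Joseph,A")
print(marks_pair)
print(grade_pair)
```

Note that every field stays a string after `split`; converting the numeric fields with `int()` would be a separate step. Named functions like these could be passed to `map` in place of the lambdas.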

Left Outer Join: It returns every record from the left RDD, paired with the matching value from the right RDD, or with None when the key has no match on the right.
Say one RDD contains (K,V1),(K1,V2) and the other contains (K,V2),(K2,V2); the left outer join of the two returns (K,(V1,V2)),(K1,(V2,None)).
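The same K/K1/K2 example can be traced in plain Python. This is only a sketch of the semantics, assuming that a left record without a match is paired with None, which is what `leftOuterJoin` does; the helper `left_outer_join` is a made-up name, not a Spark function.

```python
def left_outer_join(left, right):
    """Keep every left record; pair it with None if the key is absent on the right."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    joined = []
    for key, lv in left:
        # A missing key falls back to a single None placeholder.
        for rv in right_by_key.get(key, [None]):
            joined.append((key, (lv, rv)))
    return joined

left = [("K", "V1"), ("K1", "V2")]
right = [("K", "V2"), ("K2", "V2")]

result = left_outer_join(left, right)
print(result)
```

K2 exists only on the right, so it does not appear in the result at all, while K1 is kept with None in the second slot.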

Q-2 We have one dataset containing student names and marks and another dataset containing student names and grades. Find all records of the left RDD, with the matching grade where one exists.

student_name_leftjoin=students_marks_tran.leftOuterJoin(students_grade_tran)
student_name_leftjoin.collect()

Right Outer Join: It returns every record from the right RDD, paired with the matching value from the left RDD, or with None when the key has no match on the left.
Say one RDD contains (K,V1),(K1,V2) and the other contains (K,V2),(K2,V2); the right outer join of the two returns (K,(V1,V2)),(K2,(None,V2)).
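A right outer join mirrors the left one: the right side drives the output and the None placeholder moves to the first slot of the value tuple. The plain-Python sketch below illustrates that, and also shows one way to pick out the unmatched keys afterwards; `right_outer_join` is a hypothetical helper, not Spark's implementation.

```python
def right_outer_join(left, right):
    """Keep every right record; pair it with None if the key is absent on the left."""
    left_by_key = {}
    for key, value in left:
        left_by_key.setdefault(key, []).append(value)
    joined = []
    for key, rv in right:
        # A missing key falls back to a single None placeholder on the left.
        for lv in left_by_key.get(key, [None]):
            joined.append((key, (lv, rv)))
    return joined

left = [("K", "V1"), ("K1", "V2")]
right = [("K", "V2"), ("K2", "V2")]

result = right_outer_join(left, right)
# Keys that exist only in the right input have None as their left value.
unmatched = [key for key, (lv, _) in result if lv is None]
print(result)
print(unmatched)
```

On an RDD, the same check for None could be written as a `filter` over the result of `rightOuterJoin`.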

Q-3 We have one dataset containing student names and marks and another dataset containing student names and grades. Find all records of the right RDD, with the matching marks where they exist.

student_name_rightjoin=students_marks_tran.rightOuterJoin(students_grade_tran)
student_name_rightjoin.collect()

Cartesian Join: The Cartesian join is also known as a cross join. It pairs each element of one RDD with every element of the other RDD, so two RDDs with m and n elements produce m * n pairs. Keys are not compared.
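The pairing can be reproduced locally with `itertools.product`, which makes the m * n growth easy to see. This is a plain-Python sketch with made-up sample lists, not the Spark operation itself.

```python
from itertools import product

# Illustrative sample data, not read from the CSV files.
marks = [("Joseph", "Physics"), ("Bob", "Chemistry")]
grades = [("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")]

# Every element of the first list is paired with every element of the second.
pairs = list(product(marks, grades))

print(len(pairs))
print(pairs[0])
```

Each result is a pair of whole records, such as (("Joseph", "Physics"), ("Joseph", "A")), and records with different names are paired as well. Because the output size is the product of the two input sizes, calling `collect()` on a Cartesian join of large RDDs can be very expensive.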

Q-4 We have one dataset containing student names and marks and another dataset containing student names and grades. Join each row of one RDD to every row of the other RDD.

student_name_cartesianjoin = students_marks_tran.cartesian(students_grade_tran)
student_name_cartesianjoin.collect()

