In this article, we discuss the different joins available on RDDs: inner, left outer, right outer, and cartesian.

Inner Join: It returns only the records whose keys are present in both RDDs.
Let’s say one RDD contains (K,V1) and the other RDD contains (K,V2); then an inner join between the two RDDs returns (K,(V1,V2)).
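
The pairing rule can be tried without a Spark cluster. The sketch below is plain Python, not Spark code: `inner_join` is a hypothetical helper that mimics what `rdd1.join(rdd2)` returns for small lists of (key, value) pairs, and the sample rows are a cut-down version of the datasets used in this article.

```python
from collections import defaultdict

def inner_join(left, right):
    """Mimic RDD.join on plain lists of (key, value) pairs (illustration only)."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    # Emit one (key, (left_value, right_value)) pair for every matching combination;
    # keys that appear on only one side produce nothing.
    return [(key, (lv, rv)) for key, lv in left for rv in right_by_key.get(key, [])]

marks = [("Joseph", ("Physics", "80", "100")), ("Bob", ("Physics", "45", "100"))]
grades = [("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")]

result = inner_join(marks, grades)
print(result)
```

Jimmy has a grade but no marks, so he does not appear in the result. Note that Spark itself does not guarantee the order of the joined records, so sort the output before comparing it.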

Q-1 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find the matching student names along with their marks and grades.

#First dataset student_marks.csv contains
#Joseph,Physics,80,100
#Joseph,Chemistry,82,100
#Bob,Physics,45,100
#Bob,Chemistry,35,100

#Second dataset (student names and grades) contains
#Joseph,A
#Bob,B
#Jimmy,A
#Harry,C
#Stephanie,A
#Ron,A
#Tina,A
#Williamson,A
#Rocky,A

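students_marks=spark.sparkContext.textFile("/FileStore/tables/student_marks.csv")
students_marks_tran=students_marks.map(lambda x:(x.split(",")[0],(x.split(",")[1],x.split(",")[2],x.split(",")[3])))
#Grade file path is an assumption: the original does not name this file
students_grade_tran=spark.sparkContext.textFile("/FileStore/tables/student_grade.csv").map(lambda x:(x.split(",")[0],x.split(",")[1]))
student_name_join=students_marks_tran.join(students_grade_tran)
student_name_join.collect()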

#Output
#[(Tina,((Computer Science,100,100),A))
#(Tina,((Physics,100,100),A))
#(Jimmy,((Hindi,92,100),A))
#(Harry,((Physics,40,100),C))
#….
#….

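Left Outer Join: It returns every record from the left RDD. Keys that also exist in the right RDD are paired with the matching value; keys that do not are paired with None.
Let’s say the left RDD contains (K,V1),(K1,V2) and the right RDD contains (K,V2),(K2,V2); then a left outer join between the two RDDs returns (K,(V1,V2)),(K1,(V2,None)).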
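
A minimal plain-Python sketch of this behaviour follows. `left_outer_join` is a hypothetical helper that imitates the result shape of `rdd1.leftOuterJoin(rdd2)`; it is not Spark code, and the grades list is deliberately trimmed so that Bob has no match.

```python
from collections import defaultdict

def left_outer_join(left, right):
    """Mimic RDD.leftOuterJoin on plain lists of (key, value) pairs (illustration only)."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    joined = []
    for key, lv in left:
        # Every left record is kept; a key with no match on the right is paired with None.
        for rv in right_by_key.get(key) or [None]:
            joined.append((key, (lv, rv)))
    return joined

marks = [("Joseph", "80"), ("Bob", "45")]
grades = [("Joseph", "A")]  # trimmed on purpose: no grade for Bob

result = left_outer_join(marks, grades)
print(result)
```

Bob is still present in the result, with None standing in for the missing grade.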

Q-2 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find the matching records together with all the remaining records from the left RDD.

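student_name_leftjoin=students_marks_tran.leftOuterJoin(students_grade_tran) #students_grade_tran: (name,grade) pair RDD built from the grade dataset
student_name_leftjoin.collect()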

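Right Outer Join: It returns every record from the right RDD. Keys that also exist in the left RDD are paired with the matching value; keys that do not are paired with None.
Let’s say the left RDD contains (K,V1),(K1,V2) and the right RDD contains (K,V2),(K2,V2); then a right outer join between the two RDDs returns (K,(V1,V2)),(K2,(None,V2)).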
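
The same idea can be sketched in plain Python for the right side. `right_outer_join` is a hypothetical helper that imitates the result shape of `rdd1.rightOuterJoin(rdd2)`; it is an illustration under that assumption, not Spark code.

```python
from collections import defaultdict

def right_outer_join(left, right):
    """Mimic RDD.rightOuterJoin on plain lists of (key, value) pairs (illustration only)."""
    left_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)
    joined = []
    for key, rv in right:
        # Every right record is kept; a key with no match on the left is paired with None.
        for lv in left_by_key.get(key) or [None]:
            joined.append((key, (lv, rv)))
    return joined

marks = [("Joseph", "80"), ("Bob", "45")]
grades = [("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")]

result = right_outer_join(marks, grades)
print(result)
```

Jimmy appears only in the grades list, so his marks slot is filled with None.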

Q-3 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find the matching records together with all the remaining records from the right RDD.

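student_name_rightjoin=students_marks_tran.rightOuterJoin(students_grade_tran) #students_grade_tran: (name,grade) pair RDD built from the grade dataset
student_name_rightjoin.collect()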

Cartesian Join: The cartesian join is also known as a cross join. It pairs each record of one RDD with every record of the other RDD, regardless of keys, so two RDDs with m and n records produce m*n pairs.

Q-4 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Join each row of one RDD to every row of the other RDD.
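
On RDDs this is done with the `cartesian` method, for example `students_marks_tran.cartesian(students_grade_tran)`, where `students_grade_tran` is an assumed name for the (name, grade) pair RDD. The shape of the result can be previewed in plain Python with `itertools.product`; the sketch below is an illustration on small lists, not Spark code.

```python
from itertools import product

marks = [("Joseph", "80"), ("Bob", "45")]
grades = [("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")]

# Every marks row is paired with every grades row; keys are not compared.
pairs = list(product(marks, grades))

print(len(pairs))  # 2 rows * 3 rows
for pair in pairs:
    print(pair)
```

Because the output grows as the product of the two input sizes, a cartesian join on large RDDs is expensive and is usually followed by a filter.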
