In this article, we discuss the different joins available on RDDs: inner, left outer, right outer, and Cartesian.

Inner Join: It returns only the records whose keys are present in both RDDs.
Let’s say one RDD contains (K,V1) and the other RDD contains (K,V2); then the inner join between the two RDDs returns (K,(V1,V2)).

Q-1 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find the students present in both datasets, along with their marks and grades.

//First dataset student_marks.csv contains rows such as
//Joseph,Physics,80,100
//Joseph,Chemistry,82,100
//Bob,Physics,45,100
//Bob,Chemistry,35,100

//Second dataset student_grade.csv contains
//Joseph,A
//Bob,B
//Jimmy,A
//Harry,C
//Stephanie,A
//Ron,A
//Tina,A
//Williamson,A
//Rocky,A

// Load both CSV files as RDDs of lines
val students_marks = spark.sparkContext.textFile("/FileStore/tables/student_marks.csv")
val students_grade = spark.sparkContext.textFile("/FileStore/tables/student_grade.csv")
// Split each line once and key it by student name: (name, (subject, marks, maximum marks))
val students_marks_tran = students_marks.map { x => val f = x.split(","); (f(0), (f(1), f(2), f(3))) }
// (name, grade)
val students_grade_tran = students_grade.map { x => val f = x.split(","); (f(0), f(1)) }
// Inner join on the student name
val student_name_join = students_marks_tran.join(students_grade_tran)
student_name_join.collect().foreach(println)
//Output (partial; row order may vary between runs)
//(Tina,((Computer Science,100,100),A))
//(Tina,((Physics,100,100),A))
//(Jimmy,((Hindi,92,100),A))
//(Harry,((Physics,40,100),C))
//….
//….
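The inner join behaviour described above can be tried without the CSV files. The following is a minimal sketch, assuming Spark is on the classpath; the in-memory rows and the names `session`, `sampleMarks`, `sampleGrades` and `innerJoined` are illustrative stand-ins, not part of the original datasets.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session if one exists; otherwise starts a local one (assumption).
val session = SparkSession.builder().master("local[*]").appName("rdd-inner-join").getOrCreate()
val sc = session.sparkContext

// Hypothetical in-memory stand-ins for the two CSV files
val sampleMarks = sc.parallelize(Seq(
  ("Joseph", ("Physics", "80", "100")),
  ("Joseph", ("Chemistry", "82", "100")),
  ("Bob", ("Physics", "45", "100"))
))
val sampleGrades = sc.parallelize(Seq(("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")))

// Inner join keeps only keys present on both sides; Jimmy has no marks and is dropped.
// Sorting by (name, subject) after collect() makes the printed order deterministic.
val innerJoined = sampleMarks.join(sampleGrades).collect().sortBy(r => (r._1, r._2._1._1))
innerJoined.foreach(println)
// (Bob,((Physics,45,100),B))
// (Joseph,((Chemistry,82,100),A))
// (Joseph,((Physics,80,100),A))
```

Note that a key appearing twice on one side (Joseph here) produces one output row per matching pair.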

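Left Outer Join: It returns every record from the left RDD, together with the matching value from the right RDD where one exists. Let’s say the left RDD contains (K,V1),(K1,V3) and the right RDD contains (K,V2),(K2,V4); then the left outer join returns (K,(V1,Some(V2))),(K1,(V3,None)). The right-side value is wrapped in an Option because it may be missing.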

Q-2 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find all records from the left RDD (marks), with the matching grade where one exists.

val student_name_leftjoin=students_marks_tran.leftOuterJoin(students_grade_tran)
student_name_leftjoin.collect().foreach(println)
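Since `leftOuterJoin` wraps the right-side value in an `Option`, downstream code usually has to unwrap it. Below is a small sketch of one way to do that, assuming Spark is on the classpath; the sample rows, the `"N/A"` placeholder and the names `leftSession`, `leftMarks`, `leftGrades` and `withDefault` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Reuses an existing session if there is one; otherwise starts a local one (assumption).
val leftSession = SparkSession.builder().master("local[*]").appName("rdd-left-outer-join").getOrCreate()

// Hypothetical sample data: "Maya" has marks but no grade
val leftMarks = leftSession.sparkContext.parallelize(Seq(("Joseph", 80), ("Maya", 70)))
val leftGrades = leftSession.sparkContext.parallelize(Seq(("Joseph", "A"), ("Jimmy", "A")))

// Result type: RDD[(String, (Int, Option[String]))]
val leftJoined = leftMarks.leftOuterJoin(leftGrades)

// Unwrap the Option, substituting a placeholder when no grade was found
val withDefault = leftJoined
  .mapValues { case (mark, grade) => (mark, grade.getOrElse("N/A")) }
  .collect()
  .sortBy(_._1)
withDefault.foreach(println)
// (Joseph,(80,A))
// (Maya,(70,N/A))
```

Jimmy appears only in the right RDD, so he is absent from the result.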

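Right Outer Join: It returns every record from the right RDD, together with the matching value from the left RDD where one exists. Let’s say the left RDD contains (K,V1),(K1,V3) and the right RDD contains (K,V2),(K2,V4); then the right outer join returns (K,(Some(V1),V2)),(K2,(None,V4)). Here the left-side value is the one wrapped in an Option.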

Q-3 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Find all records from the right RDD (grades), with the matching marks where they exist.

val student_name_rightjoin=students_marks_tran.rightOuterJoin(students_grade_tran)
student_name_rightjoin.collect().foreach(println)
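A right outer join is also a convenient way to find right-side keys that have no partner on the left. The sketch below illustrates this under the assumption that Spark is on the classpath; the sample rows and the names `rightSession`, `rightMarks`, `rightGrades` and `gradesWithoutMarks` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Reuses an existing session if there is one; otherwise starts a local one (assumption).
val rightSession = SparkSession.builder().master("local[*]").appName("rdd-right-outer-join").getOrCreate()

// Hypothetical sample data: "Jimmy" has a grade but no marks
val rightMarks = rightSession.sparkContext.parallelize(Seq(("Joseph", 80), ("Bob", 45)))
val rightGrades = rightSession.sparkContext.parallelize(Seq(("Joseph", "A"), ("Bob", "B"), ("Jimmy", "A")))

// Result type: RDD[(String, (Option[Int], String))]
val rightJoined = rightMarks.rightOuterJoin(rightGrades)

// Keep only the students whose left-side value is missing (None)
val gradesWithoutMarks = rightJoined
  .filter { case (_, (mark, _)) => mark.isEmpty }
  .keys
  .collect()
  .sorted
gradesWithoutMarks.foreach(println)
// Jimmy
```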

Cartesian Join: The Cartesian join is also known as a cross join. It pairs each element of one RDD with every element of the other RDD, without matching on keys, so two RDDs with m and n elements produce m × n pairs.

Q-4 We have one dataset that contains student names and marks, and another dataset that contains student names and grades. Join each row of one RDD to every row of the other RDD.

val student_name_cartesian=students_marks_tran.cartesian(students_grade_tran)
student_name_cartesian.collect().foreach(println)
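Because `cartesian` does not match on keys, it works on any two RDDs, not only key-value RDDs, and the result size is the product of the two input sizes. A minimal sketch, assuming Spark is on the classpath; the sample values and the names `crossSession`, `names`, `letters` and `pairCount` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Reuses an existing session if there is one; otherwise starts a local one (assumption).
val crossSession = SparkSession.builder().master("local[*]").appName("rdd-cartesian").getOrCreate()

// Two plain (non key-value) RDDs with 2 and 3 elements
val names = crossSession.sparkContext.parallelize(Seq("Joseph", "Bob"))
val letters = crossSession.sparkContext.parallelize(Seq("A", "B", "C"))

// Every name is paired with every letter: RDD[(String, String)]
val pairs = names.cartesian(letters)
val pairCount = pairs.count()
println(pairCount) // 6, i.e. 2 * 3
```

The output grows multiplicatively, so calling `collect()` on the Cartesian product of two large RDDs can exhaust driver memory; `count()` or `take(n)` is a safer way to inspect it.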

