Have you ever thought of running SQL-style logic on a Spark DataFrame? Can you apply conditions in Spark to shape a DataFrame into the output you want?
Most programming languages we have practiced offer case or switch statements, and Spark is no different: it provides "when otherwise" and "case when" constructs to reframe a DataFrame's existing columns according to your own conditions.
Let's first do the needed imports and create a DataFrame. I have chosen a student-based DataFrame.
// Imports
import spark.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, IntegerType, DoubleType}

// Sample data
val seqdata = Seq(
  Row("Candis", 27, 20, 89.2, "F"),
  Row("Robert", 60, 21, 85.1, "Other"),
  Row("Nanki", 46, 21, 80.1, "F"),
  Row("Deck", 30, 19, 90.7, "M"))

// Defining the schema
val schema = List(
  StructField("Name", StringType, false),
  StructField("RollNo", IntegerType, true),
  StructField("Age", IntegerType, false),
  StructField("Percentage", DoubleType, false),
  StructField("Gender", StringType, true))

// Creating the DataFrame
val df = spark.createDataFrame(spark.sparkContext.parallelize(seqdata), StructType(schema))

// View DataFrame
df.show()

// View schema
df.printSchema()

1. when otherwise
Since col and when are Spark functions, we need to import them first. We'll use the withColumn() function: the first argument is the name of the column (existing or new), and the second argument holds the when otherwise condition.
If a condition is satisfied, the row gets the corresponding when value; otherwise it gets the otherwise value.
Question: Create a new column (or modify the existing one) that converts the Gender column to its full form.
Hint: F denotes Female
a) Existing Column
// Importing the needed functions
import org.apache.spark.sql.functions.{col, when}

val df_when = df.withColumn("Gender",
  when(col("Gender") === "M", "Male")
    .when(col("Gender") === "F", "Female")
    .otherwise("Other"))

// View DataFrame
df_when.show()
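Since this article opened with SQL, it is worth noting that the same mapping can be written as plain SQL against a temporary view. A minimal sketch, assuming the `df` and `spark` session created above (`students` is an assumed view name):

```scala
// Register the DataFrame as a temporary view so SQL can query it
df.createOrReplaceTempView("students")

// The same Gender mapping, expressed as a SQL CASE WHEN
val df_sql = spark.sql(
  """SELECT Name, RollNo, Age, Percentage,
    |  CASE WHEN Gender = 'M' THEN 'Male'
    |       WHEN Gender = 'F' THEN 'Female'
    |       ELSE 'Other'
    |  END AS Gender
    |FROM students""".stripMargin)

df_sql.show()
```

Which form to prefer is largely a matter of taste; the Column API catches typos at compile time, while the SQL string is often easier to share with SQL-first teammates.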

b) New Column
val df_when2 = df.select(col("*"),
  when(col("Gender") === "M", "Male")
    .when(col("Gender") === "F", "Female")
    .otherwise("Unknown")
    .alias("New_gender"))

// View DataFrame
df_when2.show()

2. case when
Similar to when otherwise, we can write a SQL-style case when inside the expr() function.
Question: Create a column "Performance" and derive it from the Percentage column.
// case when with expr()
import org.apache.spark.sql.functions.expr

val df_case = df.withColumn("Performance",
  expr("case when Percentage > 88.0 then 'Excellent' " +
       "when Percentage < 83.0 then 'Average' " +
       "else 'Great' end"))

// View DataFrame
df_case.show()
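For comparison, here is a sketch of the same Performance logic expressed with the when/otherwise API from section 1; both forms produce the same column (assumes `df` and the col/when imports shown earlier):

```scala
// Same thresholds as the expr() version, written with the Column API
val df_case2 = df.withColumn("Performance",
  when(col("Percentage") > 88.0, "Excellent")
    .when(col("Percentage") < 83.0, "Average")
    .otherwise("Great"))

df_case2.show()
```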

3. Using the && and || operators
To apply multiple conditions in a DataFrame with case when or when otherwise, we can combine them with the AND (&&) and OR (||) operators.
Question: Find each student's Result according to multiple conditions of your choice.
val data = df.withColumn("Results",
  when(col("Age") > 19 && col("Percentage") > 87.0, "Pass")
    .when(col("Age") === 19 || col("Percentage") === 87.0, "Fail")
    .otherwise("Unknown"))

data.show()
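The same conditions also work inside expr(), where SQL's AND and OR stand in for && and ||. A sketch assuming the `df` and the expr import used earlier:

```scala
// Multiple conditions in SQL form: AND/OR replace && and ||
val data2 = df.withColumn("Results",
  expr("case when Age > 19 and Percentage > 87.0 then 'Pass' " +
       "when Age = 19 or Percentage = 87.0 then 'Fail' " +
       "else 'Unknown' end"))

data2.show()
```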

In this article, we have learned how to use Spark functions like when and expr(), and how to combine conditions with && and ||. Leave a comment if you liked it or ran into any issues.
Happy Sparking!!
-Gargi Gupta