Have you ever thought of using SQL-style conditional logic on a Spark DataFrame? Is it possible to apply conditions in Spark to derive the desired columns in a DataFrame?

We have all studied case and switch statements in whatever programming language we practice. Spark, too, provides us with “when otherwise” and “case when” expressions to reshape a DataFrame from its existing columns according to your own conditions.

Let’s first do the needed imports and create a DataFrame. I have chosen a student-based DataFrame.

// Sequence of rows: Name, RollNo, Age, Percentage, Gender
import org.apache.spark.sql.Row

val seqdata = Seq(
  Row("Candis", 27, 20, 89.2, "F"),
  Row("Robert", 60, 21, 85.1, "Other"),
  Row("Nanki", 46, 21, 80.1, "F"),
  Row("Deck", 30, 19, 90.7, "M"))

import spark.implicits._
import org.apache.spark.sql.types.{StructField, StructType, StringType, IntegerType, DoubleType}

// Defining schema
val Schema = List(
  StructField("Name", StringType, false),
  StructField("RollNo", IntegerType, true),
  StructField("Age", IntegerType, false),
  StructField("Percentage", DoubleType, false),
  StructField("Gender", StringType, true))

// Creating Dataframe
val df = spark.createDataFrame(spark.sparkContext.parallelize(seqdata), StructType(Schema))

// View Dataframe
df.show()

// View Schema
df.printSchema()

1. when otherwise

Since col and when are Spark functions, we need to import them first. We’ll use the withColumn() function: the first argument is the name of the column (an existing one to overwrite, or a new one), and the second argument is the when otherwise expression.

If a condition is satisfied, the row gets that when value; otherwise it gets the otherwise value.

Question: Convert the Gender column to its full form, either by changing the existing column or by creating a new one.

Hint: F denotes Female

a) Existing Column

// Importing
import org.apache.spark.sql.functions.{col, when}

val df_when = df.withColumn("Gender", when(col("Gender") === "M", "Male")
                        .when(col("Gender") === "F", "Female")
                        .otherwise("Other"))
// View Dataframe
df_when.show()
b) New Column

val df_when2 = df.select(col("*"), when(col("Gender") === "M", "Male")
      .when(col("Gender") === "F", "Female")
      .otherwise("Other").as("NewGender"))  // alias names the derived column
// View Dataframe
df_when2.show()
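One detail worth making explicit (an addition of mine, not from the original post): if the when chain has no otherwise(), rows that match no condition are filled with null. A quick sketch reusing the df defined above; the column name GenderFull is just an illustrative choice:

```scala
import org.apache.spark.sql.functions.{col, when}

// No otherwise() here: "Other" matches neither branch, so those rows get null
val df_nulls = df.withColumn("GenderFull",
      when(col("Gender") === "M", "Male")
     .when(col("Gender") === "F", "Female"))
df_nulls.show()
```

This is handy when null is the right marker for "no rule applied", but it can surprise downstream code that expects every row to carry a value.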

2. case when

Similar to when otherwise, we can write a SQL-style case when inside the expression function expr().

Question: Create a column “Performance” and derive it from the Percentage column.

// case when
import org.apache.spark.sql.functions.{expr}
val df_case = df.withColumn("Performance", 
      expr("case when Percentage>88.0 then 'Excellent' " +
                       "when Percentage<83.0 then 'Average' " +
                       "else 'Great' end"))
// View Dataframe
df_case.show()
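As a side note, the same SQL string can be handed to selectExpr(), which parses each argument as a SQL expression. A small sketch using the df from above (the alias Performance simply mirrors the column created earlier):

```scala
// selectExpr() parses plain SQL, so the case when can be written inline
val df_case2 = df.selectExpr("*",
  "case when Percentage > 88.0 then 'Excellent' " +
  "when Percentage < 83.0 then 'Average' " +
  "else 'Great' end as Performance")
df_case2.show()
```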

3. Using && and || operator

To apply multiple conditions in a DataFrame with case when or when otherwise, we can combine them with the AND (&&) and OR (||) operators.

Question: Find students’ Results according to multiple conditions of your choice.

val data = df.withColumn("Results",
                      when(col("Age") > 19 && col("Percentage") > 87.0, "Pass")
                     .when(col("Age") === 19 || col("Percentage") === 87.0, "Fail")
                     .otherwise("Retest"))  // fallback for rows matching neither condition
// View Dataframe
data.show()
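The same multi-condition logic can also be expressed in SQL form, with and/or inside expr(). A sketch assuming the df defined above; the 'Retest' fallback label is my own choice, not something from the original:

```scala
import org.apache.spark.sql.functions.expr

// and/or in SQL syntax play the role of && and ||
val dataExpr = df.withColumn("Results",
  expr("case when Age > 19 and Percentage > 87.0 then 'Pass' " +
       "when Age = 19 or Percentage = 87.0 then 'Fail' " +
       "else 'Retest' end"))
dataExpr.show()
```

Which form to use is mostly a matter of taste: the Column API gets compile-time checking of column references, while the SQL string is often easier to read for long condition chains.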

In this article, we have learned how to use Spark functions like when() and expr(), and how to combine conditions with && and ||. Leave a comment if you liked it or ran into any issues.

Happy Sparking!!

-Gargi Gupta

