You probably already know that data type conversion is an important step when transforming a dataframe. Let’s say we want to add a number to a dataframe column whose data type is String.

Can we reliably perform the addition now? The answer is no.

For any mathematical operation, the column should have a numeric data type such as Integer. So, we have to convert the data type of the column to Integer.
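A minimal sketch of this, assuming a local SparkSession and a small made-up dataframe whose “Marks” column holds text (the names raw, withBonus and MarksPlusTen are illustrative only): the column is cast to Integer first, and only then is the number added.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.IntegerType

// Local session for the example; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("cast-before-add").getOrCreate()
import spark.implicits._

// Hypothetical data: marks stored as text, as they would be after reading a file
val raw = Seq(("Nikita", "890"), ("Ayush", "780")).toDF("Name", "Marks")

// Convert the column to Integer first, then do the arithmetic on the numeric column
val withBonus = raw
  .withColumn("Marks", col("Marks").cast(IntegerType))
  .withColumn("MarksPlusTen", col("Marks") + lit(10))

withBonus.printSchema()
withBonus.show()
```

Casting explicitly keeps the type of the result predictable instead of leaving it to implicit conversion rules.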

Now the question arises: how do we convert the data type of a column?

One Way: Using StructType

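When an external file such as a CSV is read as a dataframe without schema inference, every column’s data type is “String” by default. To get suitable types, we create a list of StructField entries (one per column, with its name, data type and nullability) and wrap it in a StructType that defines the data type of each dataframe column.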

val schema = List(
  StructField("Name", StringType, false),
  StructField("Roll No", IntegerType, true),
  StructField("Father's Name", StringType, false),
  StructField("Age", IntegerType, false),
  StructField("Marks", IntegerType, false))
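To make the reading step concrete, here is a sketch that applies such a schema while reading a CSV file. The file and its contents are hypothetical (written to a temp location so the example runs on its own), and a local SparkSession is assumed. Without a schema or inferSchema, every column comes back as String; with the StructType, each column gets the declared type.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("csv-with-schema").getOrCreate()

// Hypothetical input file, written to a temp location so the example is self-contained
val csvPath = Files.createTempFile("students", ".csv")
Files.write(csvPath,
  "Name,Roll No,Father's Name,Age,Marks\nNikita,65,Mr.Pradeep,19,890\nAyush,22,Mr.Gopal,20,780\n"
    .getBytes(StandardCharsets.UTF_8))

val schema = List(
  StructField("Name", StringType, false),
  StructField("Roll No", IntegerType, true),
  StructField("Father's Name", StringType, false),
  StructField("Age", IntegerType, false),
  StructField("Marks", IntegerType, false))

// Without a schema (and without inferSchema) every CSV column is read as String
val untyped = spark.read.option("header", "true").csv(csvPath.toString)

// With the StructType, Spark reads each column with the declared data type
val typed = spark.read.option("header", "true").schema(StructType(schema)).csv(csvPath.toString)

untyped.printSchema()
typed.printSchema()
```

Note that Spark may still mark the columns as nullable when reading from files, even where the StructField says false; the data types, however, follow the schema.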


Let’s create a dataframe to work with.

Question: Prepare the dataframe and convert each column to a suitable data type using StructField and StructType.

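import spark.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, IntegerType, FloatType, DoubleType}

// Column names (for reference; the schema below is what Spark actually uses)
val columns = Seq("Name", "Roll No", "Father's Name", "Age", "Marks")

// Create the rows as a collection sequence
val rows = Seq(
  Row("Nikita", 65, "Mr.Pradeep", 19, 890),
  Row("Ayush", 22, "Mr.Gopal", 20, 780),
  Row("Parth", 27, "Mr.Bharat", 21, 865),
  Row("Ankit", 15, "Mr.Naresh", 20, 680))

// Define the data type of each column
val schema = List(
  StructField("Name", StringType, false),
  StructField("Roll No", IntegerType, true),
  StructField("Father's Name", StringType, false),
  StructField("Age", IntegerType, false),
  StructField("Marks", IntegerType, false))

// Creating dataframe: Row objects need an RDD (or java.util.List) plus a StructType
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), StructType(schema))

// View Dataframe
df.show()

// View Schema
df.printSchema()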
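As an alternative to building a List first, the same schema can be assembled field by field with StructType’s add method. This sketch only constructs the two schemas and compares them, so it does not need a SparkSession.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Schema from a List of StructField, as in the example above
val fromList = StructType(List(
  StructField("Name", StringType, false),
  StructField("Roll No", IntegerType, true),
  StructField("Father's Name", StringType, false),
  StructField("Age", IntegerType, false),
  StructField("Marks", IntegerType, false)))

// The same schema built field by field with add(name, dataType, nullable)
val fromAdd = new StructType()
  .add("Name", StringType, false)
  .add("Roll No", IntegerType, true)
  .add("Father's Name", StringType, false)
  .add("Age", IntegerType, false)
  .add("Marks", IntegerType, false)

// Print the tree view of the schema (names, types, nullability)
println(fromAdd.treeString)
```

Which form to use is mostly a matter of taste; the builder style avoids a separate List value when the schema is only needed once.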

Another Way: Column DataType Conversion

By using Spark’s withColumn on a dataframe, we can convert the data type of any column. withColumn takes the column name and a column expression; calling cast on that column with the target type changes its data type. We need to import the “col” function to refer to the column. “$” (available through import spark.implicits._) can also be used to refer to a column of the dataframe.

Question: Convert the data type of the “Age” column from Integer to String.

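import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for the $"..." column syntax

// Change the data type of a column using col()
val df_datatype = df.withColumn("Age", col("Age").cast("String"))

// Another way to change the data type, using $
val df_datatype2 = df.withColumn("Age", $"Age".cast("String"))

// View Schema
df_datatype.printSchema()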
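cast also accepts a DataType object such as StringType instead of a type name string. A small sketch, assuming a local SparkSession and a two-row slice of the students data, that casts “Age” this way and then checks the result with dtypes:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType}

val spark = SparkSession.builder().master("local[*]").appName("cast-age").getOrCreate()
import spark.implicits._

// A small slice of the students dataframe, with Age as an Integer column
val df = Seq(("Nikita", 19), ("Ayush", 20)).toDF("Name", "Age")

// Cast with a DataType object instead of the type name "String"
val ageAsString = df.withColumn("Age", col("Age").cast(StringType))

// Inspect the types programmatically: dtypes returns (column name, type name) pairs
ageAsString.dtypes.foreach(println)
```

Using the DataType object lets the compiler catch a misspelled type, whereas a misspelled type name string only fails when Spark parses it.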

Question: Convert the datatype of the “Marks” column from Integer to Float.

import spark.implicits._ // provides the $"..." column syntax
// Change the data type of a column
val df_datatype = df.withColumn("Marks", $"Marks".cast("Float"))
// View Dataframe
df_datatype.show()
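A short sketch, assuming a local SparkSession and a made-up two-row dataframe, showing that the type name "Float" and the FloatType object give the same result, and how to confirm the new type from the schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.FloatType

val spark = SparkSession.builder().master("local[*]").appName("cast-marks").getOrCreate()
import spark.implicits._

// Hypothetical subset of the students data, with Marks as Integer
val df = Seq(("Nikita", 890), ("Ayush", 780)).toDF("Name", "Marks")

// The type name "Float" and the FloatType object produce the same column type
val byName = df.withColumn("Marks", $"Marks".cast("Float"))
val byType = df.withColumn("Marks", $"Marks".cast(FloatType))

// Confirm the conversion from the schema rather than only from show()
byName.printSchema()
byName.show()
```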

I hope you now understand how to change the data type of any column. So, here’s a task for you: comment below with how you would convert the data type of “Roll No” from Integer to Double.

We would love to hear your answers or any queries. You can also share your own way of changing a column’s data type.

-Gargi Gupta

