Tutorial-2: PySpark DataFrame File Formats

Here we discuss reading and writing different file formats and sources such as Parquet, JSON, Carbon, MySQL (RDBMS), S3, etc. Q-1: How do you read a Parquet file from HDFS and, after some transformations, write it back to HDFS as a Parquet file? Ans: # Read and write Parquet file from HDFS df = spark.read.parquet("parquet Read more…

Spark Analytics on COVID-19

After jumbling around with some Spark DataFrame functions, operations, and creation, let's catch up on doing analysis on a particular dataset. These days, we are all fighting against Corona #COVID-19, so I opted for a COVID-19 dataset with columns depicting the number of cases, deaths, and other fields. Question Read more…


Spark's filter() function is used to filter rows from a DataFrame based on a given condition or expression. If you are familiar with SQL, it will be much simpler for you to filter out rows according to your requirements, for example, a list of students who got marks more than Read more…
