Tutorial-2 PySpark DataFrame File Formats

Here we discuss reading and writing different file formats and sources such as Parquet, JSON, Carbon, MySQL (RDBMS), S3, etc. Q-1: How do you read a Parquet file from HDFS and, after some transformations, write it back to HDFS as a Parquet file? Ans: #Read and write Parquet file from hdfs df=spark.read.parquet("parquet …
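The excerpt cuts off before the full answer, so here is a minimal sketch of what such a read, transform, and write round trip could look like. The function name `roundtrip_parquet`, its parameters, and the choice of `dropDuplicates()` as the transformation are assumptions made for illustration and are not taken from the original post. Only the standard `spark.read.parquet` and `DataFrame.write` calls are used.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for type hints, not needed at runtime
    from pyspark.sql import DataFrame, SparkSession


def roundtrip_parquet(spark: "SparkSession", src: str, dst: str) -> "DataFrame":
    """Read Parquet from src, apply a placeholder transformation, write Parquet to dst."""
    # Read the Parquet dataset; src may be an HDFS URI.
    df = spark.read.parquet(src)

    # Placeholder transformation (an assumption): drop exact duplicate rows.
    cleaned = df.dropDuplicates()

    # Write the result back as Parquet, replacing any existing output at dst.
    cleaned.write.mode("overwrite").parquet(dst)
    return cleaned
```

With a live session, this might be called as `roundtrip_parquet(spark, "hdfs:///data/in", "hdfs:///data/out")`, where both paths are hypothetical. The `"overwrite"` mode is one choice among several; `"append"` or `"errorifexists"` may suit a job that must not clobber earlier output.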

Spark Analytics on COVID-19

After experimenting with some Spark DataFrame functions, operations, and creation, let's move on to analyzing a particular dataset. These days, we are all fighting the coronavirus (#COVID-19), so I chose a COVID-19 dataset whose columns record the number of cases, deaths, and other fields. Question …

