A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as structured data Read more…

Apache Spark Architecture – How Does Spark Work?

Apache Spark is a distributed computing platform that handles and efficiently processes “Big Data”. Spark supports in-memory processing (claimed here to be about 10x faster than Hadoop). Key features: fast processing; support for both batch and real-time processing; flexibility; better analytics; compatibility with Hadoop. How does Spark execute our programs on a cluster? Driver: The Read more…

Tutorial-1 Spark RDD, Map, FlatMap

We will discuss and practice each of the following transformations and actions on Spark RDDs. Transformations: Map, FlatMap, MapPartitions, Filter, Sample, Union, Intersection, Distinct, ReduceByKey, GroupByKey, AggregateByKey, Join, Repartition, Coalesce, etc. Actions: Reduce, Collect, Count, First, Take, Foreach, saveAsTextFile, etc. Q-1 What are the different ways to create an RDD? Ans: Parallelize the Read more…
