Here we are going to discuss the mapPartitions, mapPartitionsWithIndex and filter transformations.

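(a) mapPartitions: This transformation is similar to map, but runs separately on each partition (block) of the RDD. The function receives an iterator over all elements of one partition and must return an iterator of results. mapPartitions can be used as an alternative to map when some setup work, such as opening a database connection, should run once per partition instead of once per element (its action counterpart, foreachPartition, plays the same role for foreach).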
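To make the "once per partition" point concrete, here is a plain-Python model (no Spark needed) in which a list of lists stands in for an RDD's partitions. The helpers `model_map` and `model_map_partitions` are hypothetical names for illustration, not Spark APIs; they simply count how often the user function is invoked under each style.

```python
from itertools import chain

# Hypothetical stand-in for an RDD: each inner list is one partition.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

calls = {"map": 0, "mapPartitions": 0}

def model_map(parts, f):
    """Like RDD.map: f is called once per element."""
    result = []
    for part in parts:
        for x in part:
            calls["map"] += 1
            result.append(f(x))
    return result

def model_map_partitions(parts, f):
    """Like RDD.mapPartitions: f receives an iterator over a whole partition
    and returns an iterable; results are concatenated in partition order."""
    outputs = []
    for part in parts:
        calls["mapPartitions"] += 1
        outputs.append(f(iter(part)))
    return list(chain.from_iterable(outputs))

def double_all(it):
    # Per-partition setup (e.g. opening a connection) would go here,
    # running once per partition instead of once per element.
    for x in it:
        yield x * 2

print(model_map(partitions, lambda x: x * 2))
print(model_map_partitions(partitions, double_all))
print(calls)  # {'map': 9, 'mapPartitions': 3}
```

Both calls produce the same doubled values; only the number of function invocations differs (9 versus 3).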

Q-1: How do you remove a specific number (here, 3) from the input lists in an RDD using the mapPartitions transformation?

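# Assumes an existing SparkSession named spark (as in a Databricks notebook)
rdd_list = spark.sparkContext.parallelize([[1, 2, 3], [3, 5, 6], [7, 8, 9]])

# Called once per partition: receives an iterator over that partition's
# elements (here, lists) and returns an iterator of results.
def mapPartitionRemoveNumber(list_aa):
    iterable_list = []
    for ele_list in list_aa:
        # Keep every number except 3
        iterable_list.append([subele for subele in ele_list if subele != 3])
    return iter(iterable_list)

mapPartition_rdd = rdd_list.mapPartitions(mapPartitionRemoveNumber)
mapPartition_rdd.collect()
# Output:
[[1, 2], [5, 6], [7, 8, 9]]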
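Building the whole `iterable_list` in memory works, but mapPartitions only needs an iterator back, so a generator can stream the results instead. A sketch, assuming the same input shape (each RDD element is a list of numbers); the name `remove_number_lazy` and its `target` parameter are hypothetical:

```python
def remove_number_lazy(partition, target=3):
    """Yield each list from the partition with every occurrence of target removed.

    Usable as rdd_list.mapPartitions(remove_number_lazy); as a generator it
    avoids materializing a result list for the whole partition.
    """
    for ele_list in partition:
        yield [subele for subele in ele_list if subele != target]

# Plain-Python check: one "partition" holding all three lists.
print(list(remove_number_lazy(iter([[1, 2, 3], [3, 5, 6], [7, 8, 9]]))))
# [[1, 2], [5, 6], [7, 8, 9]]
```

With Spark this would be called as `rdd_list.mapPartitions(remove_number_lazy).collect()`; to remove a different value, bind it first, e.g. `rdd_list.mapPartitions(lambda it: remove_number_lazy(it, target=5))`.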

(b) mapPartitionsWithIndex: This transformation is similar to mapPartitions, but the function also receives an integer representing the index of the partition, so it takes (index, iterator) and returns an iterator.
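A common reason to need the partition index is to treat one partition differently, for example dropping a CSV header that sits at the start of partition 0. The sketch below models this in plain Python; the `partitions` data and the `drop_header` name are illustrative assumptions. With Spark, the same function would be passed as `rdd.mapPartitionsWithIndex(drop_header)`.

```python
from itertools import islice

def drop_header(index, iterator):
    """Skip the first record of partition 0 only; pass other partitions through.

    Assumes the header is the first record of the first partition, which is
    typically the case when reading a single text file with textFile.
    """
    if index == 0:
        return islice(iterator, 1, None)
    return iterator

# Hypothetical partitions of a small CSV file (header in partition 0).
partitions = [["id,name", "1,alice"], ["2,bob", "3,carol"]]

rows = [row for i, part in enumerate(partitions) for row in drop_header(i, iter(part))]
print(rows)  # ['1,alice', '2,bob', '3,carol']
```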

Q-2: How do you remove a specific number from the input lists in an RDD and tag each remaining number with the index of its partition using the mapPartitionsWithIndex transformation?

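# Three partitions, so each inner list lands in its own partition
rdd_list = spark.sparkContext.parallelize([[1, 2, 3], [3, 5, 6], [7, 8, 9]], 3)

# Called once per partition with (partition index, iterator over its elements)
def mapPartitionRemoveNumber(index, list_aa):
    iterable_list = []
    for ele_list in list_aa:
        # Drop 3 and tag each remaining number with its partition index
        iterable_list.append([str(subele) + "-->" + str(index) for subele in ele_list if subele != 3])
    return iter(iterable_list)

mapPartition_rdd = rdd_list.mapPartitionsWithIndex(mapPartitionRemoveNumber)
mapPartition_rdd.collect()

# Output:
[['1-->0', '2-->0'], ['5-->1', '6-->1'], ['7-->2', '8-->2', '9-->2']]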
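The indices in this output come from the explicit `3` passed to `parallelize`, which places each of the three lists in its own partition. The plain-Python model below (no Spark; the two partition layouts are written out by hand as assumptions, and `model_map_partitions_with_index` is a hypothetical helper) reruns the tagging logic as a generator for both layouts, showing how the tags change when the lists share a partition.

```python
def tag_with_partition(index, partition, target=3):
    """Generator version of the tagging function: drop target and tag each
    remaining number as 'number-->index'."""
    for ele_list in partition:
        yield [str(subele) + "-->" + str(index) for subele in ele_list if subele != target]

def model_map_partitions_with_index(partitions, f):
    # Call f once per partition with (index, iterator) and concatenate results.
    return [out for index, part in enumerate(partitions) for out in f(index, iter(part))]

three_partitions = [[[1, 2, 3]], [[3, 5, 6]], [[7, 8, 9]]]  # like numSlices=3
one_partition = [[[1, 2, 3], [3, 5, 6], [7, 8, 9]]]         # like numSlices=1

print(model_map_partitions_with_index(three_partitions, tag_with_partition))
print(model_map_partitions_with_index(one_partition, tag_with_partition))
```

In the single-partition layout every number is tagged `-->0`, so the indices reflect how the data was partitioned, not the position of each list.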

(c) filter: This transformation returns a new dataset formed by selecting those elements of the source on which the function returns true.
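filter is closely related to the two transformations above: in PySpark, `RDD.filter` is implemented by passing `mapPartitions` a function that applies Python's built-in `filter` to each partition's iterator. The plain-Python model below shows that equivalence; `filter_partition`, `model_per_partition` and the partition contents are hypothetical names and data for illustration.

```python
def filter_partition(pred):
    """Return a per-partition function that keeps elements where pred is true,
    i.e. filter expressed as a mapPartitions-style function."""
    def func(iterator):
        return filter(pred, iterator)
    return func

def model_per_partition(partitions, f):
    # Apply f to each partition's iterator, keeping the partition structure.
    return [list(f(iter(part))) for part in partitions]

partitions = [["The", "Sun", "rises"], ["in", "the", "Sun"]]

def is_sun(word):
    return word == "Sun"

per_partition = model_per_partition(partitions, filter_partition(is_sun))
print(per_partition)                       # [['Sun'], ['Sun']]
print(sum(len(p) for p in per_partition))  # 2, like .count()
```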

Q-3: How many times does the keyword "Sun" appear in the input RDD?

# Input data.txt contains:
# The Sun rises in the East and sets in the West.
# The Sun sets in the West and rises in the East

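data_rdd = spark.sparkContext.textFile("/FileStore/tables/data.txt")
# Split each line into words (one RDD element per word)
flat_rdd = data_rdd.flatMap(lambda x: x.split(" "))
# Keep only the words equal to "Sun" and count them
flat_rdd.filter(lambda x: x == "Sun").count()
# or, as a substring test (also true for "", "S", "un", ...):
flat_rdd.filter(lambda x: x in "Sun").count()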

# Output:
2
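The two predicates agree on this file, but they test different things: `x == "Sun"` is exact equality, while `x in "Sun"` asks whether `x` is a substring of "Sun". The plain-Python sketch below runs the same split-then-filter-then-count logic on the two sample lines, then adds one hypothetical line containing a double space (an assumption introduced only to expose the difference). `count_matches` is an illustrative helper, not a Spark API.

```python
lines = [
    "The Sun rises in the East and sets in the West.",
    "The Sun sets in the West and rises in the East",
]

def count_matches(lines, pred):
    # flatMap step: split every line on single spaces; then filter and count.
    words = [w for line in lines for w in line.split(" ")]
    return sum(1 for w in words if pred(w))

exact = lambda x: x == "Sun"
substring = lambda x: x in "Sun"  # also true for "", "S", "u", "un", ...

print(count_matches(lines, exact), count_matches(lines, substring))  # 2 2

# Hypothetical extra line with a double space: split(" ") yields an empty
# string, which counts as a substring of "Sun".
noisy = lines + ["Sun  up"]
print(count_matches(noisy, exact), count_matches(noisy, substring))  # 3 4
```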

