In this article, we are going to discuss coalesce and repartition transformations.

Coalesce:

  • Useful only to reduce the number of partitions.
  • It avoids full data shuffle.
  • It may have unequal partitions length.

Example: Let’s say we have four machine or nodes which contains equal number of partitions in each node like below:

Node 1 = 1,2,3
Node 2 = 4,5,6
Node 3 = 7,8,9
Node 4 = 10,11,12

And suppose we require only two partitions then coalesce work here:
Node 1 = (1,2,3) and (4,5,6)
Node 3 = (7,8,9) and (10,11,12)

Notice that Node 1 and Node 3 did not require its original data to move.

Q-1 How does Coalesce reduce the number of partitions?

val initialRdd = spark.sparkContext.parallelize(1 to 15 ,5)
initialRdd.saveAsTextFile(“/output_rdd/”) #hdfs path

#Now apply the coalesce method to reduce the number of partitions

val coalesceRDD =initialRdd.coalesce(3)
coalesceRDD.saveAsTextFile(“/coalesce-output/”) #hdfs path

As we can see,initially there were five partitions and then applied coalesce to make three partitions and also data is partially shuffled.

Q-2 How does Repartion reduce or increase the number of partitions?

val initialRdd = spark.sparkContext.parallelize(1 to 15 ,5)
initialRdd.saveAsTextFile(“/output_rdd/”) #hdfs path

#Now apply the coalesce method to reduce the number of partitions

val repartitionRDD =initialRdd.repartition(3)
repartitionRDD.saveAsTextFile(“/repartition-output/”) #hdfs path

Repartition:

  • The repartition method can be used to either increase or decrease the number of partitions in a RDD..
  • Full data shuffle.
  • It will have equal partitions length.

Example: Let’s say we have four machine or nodes which contains equal number of partitions in each node like below:

Node 1 = 1,2,3
Node 2 = 4,5,6
Node 3 = 7,8,9
Node 4 = 10,11,12


And repartition convert two partition here:
Node 1 = (7,8,9) and (4,5,6)
Node 3 = (1,2,3) and (10,11,12)

Notice that data is fully shuffled.

Q-3 What is the difference between Repartition and Coalesce:-


0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Insert math as
Block
Inline
Additional settings
Formula color
Text color
#333333
Type math using LaTeX
Preview
\({}\)
Nothing to preview
Insert