Spark Analytics on MovieLens Dataset

Hey people! Do you know how Netflix recommends movies to us? How does it classify them? It predicts movie ratings from users' ratings and other basic factors. But don't you think we first need to analyze the data and get some insights from it? So, we'll perform Spark analysis on …

Tutorial-6 PySpark Coalesce and Repartition

In this article, we are going to discuss the coalesce and repartition transformations. Coalesce is useful only for reducing the number of partitions. It avoids a full data shuffle, so the resulting partitions may be of unequal size. Example: let's say we have four machines, or nodes, each containing an equal number of partitions …
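To make the difference concrete, here is a minimal pure-Python sketch that models partitions as plain lists. It is a toy model, not Spark's implementation. The helper names `coalesce` and `repartition` are hypothetical stand-ins for PySpark's real `DataFrame.coalesce(numPartitions)` and `DataFrame.repartition(numPartitions)`. The coalesce model merges whole existing partitions without moving individual records. The repartition model assumes a round-robin full shuffle.

```python
from itertools import chain


def coalesce(partitions, n):
    """Toy model of coalesce: merge whole existing partitions into n groups.

    Records are never redistributed one by one (no full shuffle), so the
    output partitions can end up with unequal sizes.
    """
    if n >= len(partitions):
        # Coalesce only reduces the partition count; asking for more
        # partitions leaves the layout unchanged in this model.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each whole input partition is appended to a single output partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy model of repartition: a full shuffle that deals every record
    round-robin across n new partitions, giving near-equal sizes."""
    shuffled = [[] for _ in range(n)]
    for i, record in enumerate(chain.from_iterable(partitions)):
        shuffled[i % n].append(record)
    return shuffled


if __name__ == "__main__":
    # Four "nodes", each holding one partition of 10 records.
    nodes = [list(range(k * 10, k * 10 + 10)) for k in range(4)]
    print([len(p) for p in coalesce(nodes, 3)])     # → [20, 10, 10]
    print([len(p) for p in repartition(nodes, 3)])  # → [14, 13, 13]
```

Reducing four equal partitions to three with coalesce leaves one merged partition twice as large as the others. Repartition pays for a full shuffle but spreads the 40 records almost evenly. In real PySpark, `df.rdd.getNumPartitions()` reports how many partitions a DataFrame currently has.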
