Explain Combiner and Partitioner in Hadoop – Hadoop Tutorial

8/23/2025


Hadoop’s MapReduce framework is designed to process large-scale data efficiently. Two important components that play a crucial role in optimizing performance and controlling data flow are the Combiner and Partitioner. Understanding these concepts helps developers fine-tune their MapReduce programs for scalability and efficiency.

In this Hadoop tutorial, we will explain what a Combiner and Partitioner are, how they work, and why they are essential in the MapReduce execution flow.



What is a Combiner in Hadoop?

A Combiner is also known as a mini-reducer. It is an optional optimization step in the MapReduce process: it runs on the output of each Mapper, aggregating records locally before the intermediate data is written out and transferred to the Reducers during the Shuffle and Sort phase.

Key Points about Combiner:

  • It reduces the volume of data transferred between the Mapper and Reducer.

  • It works on the output of the Mapper and performs local aggregation.

  • Its input and output key-value types must be identical and must match the Mapper's output types (which are also the Reducer's input types).

  • Hadoop may run the Combiner zero, one, or several times for a given key, so it must not change the final result; the aggregation should be commutative and associative (e.g., sum, count, max).

  • Example: In a word count program, the Combiner can locally sum word counts before sending them to the Reducer.

Example of Combiner Usage:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Reducer
import scala.jdk.CollectionConverters._

class WordCountCombiner extends Reducer[Text, IntWritable, Text, IntWritable] {
  // Sum the partial counts emitted locally by one Mapper for each key
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val sum = values.asScala.map(_.get()).sum
    context.write(key, new IntWritable(sum))
  }
}

This reduces the amount of intermediate data transferred across the network.
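The Combiner is enabled in the job driver with `Job.setCombinerClass`. A minimal sketch follows; the `WordCountMapper` and `WordCountReducer` class names are assumptions for illustration, not classes defined in this tutorial:

```scala
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance()
job.setJobName("word count")
job.setMapperClass(classOf[WordCountMapper])     // assumed mapper class
job.setCombinerClass(classOf[WordCountCombiner]) // optional local mini-reduce step
job.setReducerClass(classOf[WordCountReducer])   // assumed reducer class
```

Because the Combiner is optional, the job produces the same result with or without this line; it only changes how much intermediate data crosses the network.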


What is a Partitioner in Hadoop?

A Partitioner controls how the intermediate key-value pairs generated by the Mapper are distributed to the Reducers.

Key Points about Partitioner:

  • By default, Hadoop uses HashPartitioner, which assigns keys to Reducers based on their hash values.

  • It ensures that all values for the same key go to the same Reducer.

  • Developers can create a custom Partitioner to control data distribution.
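The logic inside the default HashPartitioner (`org.apache.hadoop.mapreduce.lib.partition.HashPartitioner`) boils down to one expression. Here is a self-contained plain-Scala sketch of that logic, with no Hadoop dependency:

```scala
// Sketch of the default HashPartitioner logic: masking with Int.MaxValue
// clears the sign bit, so keys with a negative hashCode still map to a
// non-negative partition index
def hashPartition(key: Any, numReduceTasks: Int): Int =
  (key.hashCode & Int.MaxValue) % numReduceTasks

// The same key always maps to the same partition index
println(hashPartition("hadoop", 4))
println(hashPartition("hadoop", 4))
```

This determinism is what guarantees that all values for one key reach the same Reducer.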

Example of Partitioner Usage:

Suppose we want to separate even and odd numbers into different Reducers:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

class CustomPartitioner extends Partitioner[IntWritable, Text] {
  override def getPartition(key: IntWritable, value: Text, numPartitions: Int): Int = {
    // Even keys go to partition 0, odd keys to partition 1; the final
    // modulo guards against jobs configured with a single Reducer
    (if (key.get() % 2 == 0) 0 else 1) % numPartitions
  }
}

With at least two Reducers configured, this sends even-keyed records to Reducer 0 and odd-keyed records to Reducer 1.
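Like the Combiner, a custom Partitioner must be registered in the job driver, and the even/odd split only takes effect with at least two reduce tasks. A sketch of the relevant driver lines, assuming a `job` instance already configured elsewhere:

```scala
// Driver fragment (sketch): wire up the custom Partitioner
job.setPartitionerClass(classOf[CustomPartitioner])
job.setNumReduceTasks(2) // partitions 0 and 1 must both exist
```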


Difference Between Combiner and Partitioner

Feature       Combiner                                  Partitioner
Purpose       Reduces intermediate data size locally    Distributes keys across Reducers
Execution     Runs after Mapper, before Shuffle         Runs on Mapper output, before Reducer
Requirement   Optional                                  Mandatory if custom data distribution is needed
Example       Local word-count sum                      Distribute data by key range or category

Advantages of Combiner and Partitioner

  • Combiner reduces network congestion by minimizing data transfer.

  • A well-designed Partitioner keeps the workload balanced across Reducers.

  • Together, they optimize the overall efficiency of MapReduce jobs.


Conclusion

The Combiner and Partitioner are key components in the Hadoop MapReduce framework that improve performance and control how data is processed. By using a Combiner, developers can reduce the volume of intermediate data, while a Partitioner ensures efficient distribution of data to Reducers. Understanding these concepts is crucial for writing optimized and scalable Hadoop applications.