What are accumulators? Explain briefly.

What are accumulators in Spark?

Apache Spark is a framework for processing large datasets, similar in purpose to the Hadoop framework.
It is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters.

We can write a Spark program that creates an accumulator:

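from pyspark import SparkContext

sc = SparkContext.getOrCreate()
accumulator = sc.accumulator(0)  # numeric accumulator starting at 0

def demo_acc(value):
    # Runs on the executors: each task adds its element to the accumulator.
    global accumulator
    accumulator += value

data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)
rdd.foreach(demo_acc)  # an action is needed, otherwise demo_acc never runs
print("Accumulator value:", accumulator.value)  # the value is read on the driver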


Sometimes a variable needs to be shared across different tasks, or between tasks and the driver program. Spark supports two types of shared variables: broadcast variables, which cache a read-only value in memory on all nodes, and accumulators, which are variables that are only "added" to, such as counters and sums.


In distributed computing with Apache Spark, an accumulator is a variable that can be used to aggregate values across multiple tasks running in parallel.
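Spark ensures that these variables are updated in a way that is both efficient and fault-tolerant: for updates made inside actions, each task's update is applied only once, even if the task is restarted.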
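This is particularly useful in distributed computing scenarios where you want to perform a parallel operation on a large dataset and need to aggregate results (for example a count or a sum) across different nodes. The aggregation should be associative and commutative, because partial results from different tasks can be merged in any order; operations such as subtraction or division do not have this property.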
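
The following is a small pure-Python simulation, not Spark code, of why the order in which partial results are merged matters. The function merged_results and the sample partitions are hypothetical. Each "task" folds its own partition, and the "driver" then merges the partial results in every possible order. Addition of numbers always gives the same answer, while string concatenation (which is not commutative) does not.

```python
from functools import reduce
from itertools import permutations
import operator

def merged_results(partitions, op, zero):
    """Return the set of final values over every possible merge order."""
    # Each simulated task folds its own partition, starting from the zero value.
    partials = [reduce(op, part, zero) for part in partitions]
    # The simulated driver merges the partials; tasks may finish in any order.
    return {reduce(op, order, zero) for order in permutations(partials)}

numbers = [[1, 2], [3], [4, 5]]
letters = [["a", "b"], ["c"], ["d", "e"]]

sum_results = merged_results(numbers, operator.add, 0)
concat_results = merged_results(letters, operator.add, "")

print(sum_results)          # one value: numeric addition is order-independent
print(len(concat_results))  # several values: concatenation depends on merge order
```

This is why counters and sums are the typical use of accumulators: their result does not depend on which task finishes first.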