
What are accumulators? Explain briefly.

12/9/2023


What are accumulators in Spark?

Apache Spark is a framework for processing large datasets, similar in purpose to the Hadoop framework.
It is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters.

In a Spark program we can create an accumulator on the driver and update it from the tasks:

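from pyspark import SparkContext

# Reuse the active SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()

# Accumulator with an initial value of 0
accumulator = sc.accumulator(0)
def demo_acc(value):
    # Runs inside the tasks: each element is added to the accumulator
    global accumulator
    accumulator += value

data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)
rdd.foreach(demo_acc)  # foreach is an action, so the updates happen here
# Only the driver reads the accumulated total
print("Accumulator value:", accumulator.value)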

 

Sometimes a variable needs to be shared across different tasks, or between tasks and the driver program. Spark supports two types of shared variables: broadcast variables, which can be used to cache a read-only value in memory on all nodes, and accumulators, which are variables that are only “added” to, such as counters and sums. Tasks running on the cluster can add to an accumulator but cannot read its value; only the driver program can read it, through the accumulator's value attribute.
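
To make the "only added to" idea concrete, here is a minimal sketch in plain Python, with no Spark required. It is a toy model and not Spark's implementation: the names ToyAccumulator and run_task, and the way the data is split into partitions, are hypothetical. Each simulated task adds into its own local copy that starts at zero, and the driver merges the per-task partial results.

```python
class ToyAccumulator:
    """Toy stand-in for an add-only shared variable (hypothetical, not a Spark class)."""

    def __init__(self, zero=0):
        self.zero = zero
        self.value = zero  # in this model, only the driver reads this

    def merge(self, partial):
        # Driver side: fold one task's partial result into the total
        self.value += partial


def run_task(partition, zero=0):
    # Task side: start from the zero value and only ever add to a local copy
    local = zero
    for x in partition:
        local += x
    return local


# Hypothetical split of the data [1, 2, 3, 4, 5] into three partitions
partitions = [[1, 2], [3], [4, 5]]

acc = ToyAccumulator(0)
for part in partitions:
    acc.merge(run_task(part))

print("Toy accumulator value:", acc.value)
```

In this model a task never needs to see the running total; it only contributes its own partial result, which is why the variable can be shared without the tasks coordinating with each other.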

Conclusion

In this article we looked at accumulator variables in distributed computing with Apache Spark. An accumulator is a shared variable that can be used to aggregate values across multiple tasks in parallel.
Spark updates these variables in a way that is both efficient and fault-tolerant: for updates performed inside actions, each task's update is applied only once, even if the task is restarted. Updates made inside transformations may be applied more than once if a task or stage is re-executed.
This is particularly useful in distributed computing scenarios where you want to perform a parallel operation on a large dataset and need to aggregate results, such as sums or counts, across different nodes. The aggregation must be associative and commutative, so an operation like division is not suitable.
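
Aggregating across nodes works only if the final result does not depend on the order in which the partial results arrive. The sketch below, in plain Python with hypothetical partial values, combines the same three partial results in every possible order. Addition always gives one answer, while division gives several, which is why division is not a good fit for an accumulator.

```python
from functools import reduce
from itertools import permutations
import operator

# Hypothetical partial results reported by three tasks
partials = [8.0, 4.0, 2.0]


def results_over_all_orders(op, values):
    # Combine the values in every possible arrival order
    # and collect the distinct outcomes
    return {reduce(op, order) for order in permutations(values)}


sum_results = results_over_all_orders(operator.add, partials)
div_results = results_over_all_orders(operator.truediv, partials)

print("add:", sum_results)     # a single outcome
print("divide:", div_results)  # more than one outcome
```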
