Running Spark on YARN, Mesos, and Kubernetes: Spark Tutorial

8/17/2025

[Diagram: Apache Spark deployment on YARN, Mesos, and Kubernetes, showing cluster managers and resource allocation]


Apache Spark is a versatile big data processing framework that can run on multiple cluster managers. Understanding how to deploy Spark on YARN, Mesos, and Kubernetes is essential for building scalable and production-ready applications.

In this tutorial, we will explore the steps, advantages, and best practices of running Spark on different cluster managers.



1. Running Spark on YARN

YARN (Yet Another Resource Negotiator) is Hadoop’s cluster manager and is widely used in big data ecosystems.
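
Before submitting, make sure Spark can find the cluster's client configuration: HADOOP_CONF_DIR (or YARN_CONF_DIR) must point to the directory containing the Hadoop configuration files. A typical setup looks like this; the path is illustrative and varies by distribution:

# Point Spark at the Hadoop/YARN client configuration.
# /etc/hadoop/conf is common, but the path varies by distribution.
export HADOOP_CONF_DIR=/etc/hadoop/conf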

Steps to Run Spark on YARN

  1. Configure Spark for YARN:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4G \
  --executor-cores 2 \
  app.py

  2. Deploy Mode:

    • cluster – Application driver runs on a YARN node.

    • client – Driver runs on the submitting machine (a client-mode example follows this list).

  3. Resource Management:

    • Use --num-executors, --executor-memory, and --executor-cores to control resources.
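
For comparison, here is a minimal client-mode submission; the resource values are illustrative, and app.py stands in for your application:

# Client mode: the driver runs on the submitting machine,
# which is convenient for interactive debugging.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --executor-memory 2G \
  --executor-cores 2 \
  app.py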

Advantages of YARN

  • Seamless integration with the Hadoop ecosystem.

  • Supports multi-tenancy.

  • Mature and widely used in enterprise deployments.


2. Running Spark on Mesos

Apache Mesos is a general-purpose cluster manager that can run Spark alongside other frameworks. Note that Spark's Mesos support has been deprecated since Spark 3.2, so new deployments typically favor YARN or Kubernetes.

Steps to Run Spark on Mesos

  1. Start a Mesos Cluster: install and launch the Mesos master and agents.

  2. Configure Spark for Mesos (cluster mode also requires the MesosClusterDispatcher to be running; the mesos-master host below is a placeholder):

spark-submit \
  --master mesos://mesos-master:5050 \
  --deploy-mode cluster \
app.py

  3. Mesos Modes:

    • coarse-grained (default) – Spark acquires long-lived executors that hold their resources for the lifetime of the application (see the configuration sketch after this list).

    • fine-grained – Spark requests resources per task; this mode is deprecated and rarely used.
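
Coarse-grained mode can be selected explicitly with spark.mesos.coarse (it is the default in modern Spark releases); a minimal sketch, assuming a reachable Mesos master:

# Coarse-grained mode: executors are long-lived and hold their
# resources for the lifetime of the application.
spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.coarse=true \
  app.py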

Advantages of Mesos

  • Supports multiple frameworks on the same cluster.

  • Fine-grained resource sharing.

  • Dynamic allocation of resources.


3. Running Spark on Kubernetes

Kubernetes has become a popular choice for Spark deployments thanks to its containerization and cloud-native features.

Steps to Run Spark on Kubernetes

  1. Build a Docker Image containing Spark and your application (an image-build sketch using Spark's bundled tooling follows this list).

  2. Deploy Spark Application:

spark-submit \
  --master k8s://https://<k8s-api-server> \
  --deploy-mode cluster \
  --name spark-app \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=spark-app-image:latest \
  local:///opt/spark/app.py

The local:/// scheme indicates that app.py already lives inside the container image rather than on the submitting machine.

  3. Kubernetes Features:

    • Autoscaling pods.

    • Native container orchestration.

    • Easy integration with cloud services.
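
As referenced in step 1, the Spark distribution bundles a helper script for building Kubernetes-ready images. A minimal sketch, run from the root of an unpacked Spark distribution; the repository name myrepo and tag v1 are illustrative, and your application code is typically baked into an image derived from this base:

# Build a Spark image with PySpark support (-p selects the Python Dockerfile)
# and push it to the registry.
./bin/docker-image-tool.sh -r myrepo -t v1 \
  -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
./bin/docker-image-tool.sh -r myrepo -t v1 push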

Advantages of Kubernetes

  • Cloud-native deployment.

  • Container isolation and portability.

  • Easy scaling and monitoring.


Best Practices for Running Spark on Cluster Managers

  1. Resource Tuning: Adjust memory, cores, and executor count according to workload.

  2. Monitoring: Use Spark UI, YARN ResourceManager, Mesos UI, or Kubernetes dashboards.

  3. Fault Tolerance: Enable checkpointing and retry mechanisms.

  4. Containerization: For Kubernetes, always build optimized Docker images.

  5. Dynamic Allocation: Enable dynamic allocation to optimize cluster resource usage (a configuration sketch follows this list).
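
As referenced in item 5, dynamic allocation is enabled through Spark configuration at submit time. A minimal sketch; on YARN it is typically paired with the external shuffle service, while on Kubernetes shuffle tracking is the usual companion setting (executor bounds are illustrative, and other required flags are omitted for brevity):

# YARN: dynamic allocation backed by the external shuffle service.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  app.py

# Kubernetes: shuffle tracking replaces the external shuffle service.
spark-submit \
  --master k8s://https://<k8s-api-server> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  local:///opt/spark/app.py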


Real-World Use Cases

  • Enterprise ETL Pipelines: Spark on YARN is widely used for Hadoop-based pipelines.

  • Multi-framework Clusters: Mesos allows Spark, Hadoop, and other frameworks to coexist.

  • Cloud-native Applications: Kubernetes is ideal for deploying Spark in AWS, GCP, or Azure.


Conclusion

Running Spark on YARN, Mesos, and Kubernetes provides flexibility for different environments. By understanding deployment modes, resource tuning, and best practices, Spark developers can build scalable, efficient, and fault-tolerant applications across various cluster managers.

