Apache Spark Installation Guide
Updated: May 20, 2025
Apache Spark is a powerful open-source framework for distributed data processing, widely used for big data analytics and machine learning. Whether you're working on Windows or Ubuntu, installing Spark is a straightforward process if you follow the right steps. This guide will walk you through the installation process for both operating systems, including prerequisites, installation steps, and verification.
Prerequisites
Before installing Apache Spark, ensure your system meets the following requirements:
- Java 8 or later
- Scala
- Windows or Ubuntu
- A utility for extracting .tar/.tgz archives (tar is built into Ubuntu and recent versions of Windows)

Step 1: Verify Java Installation
Apache Spark requires Java 8 or later. To check whether Java is installed on your system:

On Windows
Open Command Prompt or PowerShell and run:
java -version
If Java is installed, you'll see output like:
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)
If Java is not installed, download it from the official Java website and install it.
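Some tools also expect the JAVA_HOME environment variable to point at your JDK. If you want to set it, the command below is a sketch; the install path is only an example and depends on where your JDK actually lives:
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_281"
Open a new Command Prompt afterwards so the change takes effect.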
On Ubuntu
Run the following command in the terminal:
java -version
If Java is not installed, install it using:
sudo apt install default-jdk -y
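Afterwards, java -version should report the installed JDK. The exact output depends on the OpenJDK version Ubuntu ships; it will look roughly like:
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu1, mixed mode, sharing)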
Step 2: Verify Scala Installation
Scala is another prerequisite for Apache Spark. To check if Scala is installed:

On Windows
Run the following command in Command Prompt or PowerShell:
scala -version
If Scala is installed, you'll see output like:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
If Scala is not installed, proceed to the next step.
On Ubuntu
Run:
scala -version
If Scala is not installed, install it using:
sudo apt install scala -y
Step 3: Install Scala Manually
If Scala is not installed and you aren't using a package manager, download the latest version from the official Scala website. For this tutorial, we'll use Scala 2.11.6. Note that Spark 3.5.x is built against Scala 2.12 (or 2.13), so if you plan to compile your own Spark applications, match your Scala version to your Spark build; spark-shell itself ships with a bundled Scala and works regardless.

On Windows
Extract the downloaded .tar file:
tar xvf scala-2.11.6.tgz
Move the extracted folder to C:\Scala, then add it to your PATH:
set PATH=%PATH%;C:\Scala\bin
Note that set only affects the current session; use setx or the System Properties dialog to make the change permanent.
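To confirm the PATH change, open a new Command Prompt and run:
scala -version
With the archive above, it should report Scala code runner version 2.11.6.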
On Ubuntu
Download the Scala binaries:
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
Extract the .tar file:
tar xvf scala-2.11.6.tgz
Move the extracted folder to /usr/local/scala:
sudo mv scala-2.11.6 /usr/local/scala
echo "export PATH=$PATH:/usr/local/scala/bin" >> ~/.bashrc
source ~/.bashrc
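At this point scala -version should succeed in the same terminal:
scala -version
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL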
Step 4: Install Apache Spark
Download the latest version of Apache Spark from the official Spark website. For this tutorial, we'll use Spark 3.5.3 prebuilt for Hadoop 3. Download the .tgz file:
wget https://dlcdn.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
To verify the download, fetch the published checksum and check the archive against it:
wget https://downloads.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz.sha512
shasum -a 512 -c spark-3.5.3-bin-hadoop3.tgz.sha512
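If the archive is intact, shasum should report:
spark-3.5.3-bin-hadoop3.tgz: OK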
Extract the .tgz file:
tar xvf spark-3.5.3-bin-hadoop3.tgz

On Windows
Move the extracted folder to C:\Spark, then add it to your PATH:
set PATH=%PATH%;C:\Spark\bin

On Ubuntu
Move the extracted folder to /opt/spark:
sudo mv spark-3.5.3-bin-hadoop3 /opt/spark
echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export PATH=$PATH:$SPARK_HOME/bin" >> ~/.bashrc
source ~/.bashrc
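You can confirm the environment is set up:
echo $SPARK_HOME
This should print /opt/spark.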
Step 5: Verify the Installation

On Windows
Open Command Prompt or PowerShell and run:
spark-shell
If Spark is installed correctly, you'll see the Spark logo and the Scala shell prompt.
On Ubuntu
Run:
spark-shell
You should see the Spark logo and the Scala shell prompt.
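As a quick sanity check, run a small job at the prompt. The snippet below uses the SparkContext (sc) that spark-shell creates for you:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050
Type :quit to leave the shell.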
Step 6: Start a Standalone Cluster (Optional)
To run Spark as a standalone cluster on Ubuntu, start a master and attach a worker to it. The cluster scripts live in $SPARK_HOME/sbin rather than bin:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
Replace <master-host> with your machine's hostname; the exact spark:// URL appears at the top of the master's web UI at http://localhost:8080.
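With the cluster running, you can submit the SparkPi example that ships with the distribution. The jar name below matches the default Spark 3.5.3 build (Scala 2.12) and may differ for other builds:
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<master-host>:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.5.3.jar 100
The job prints an approximation of pi when it finishes.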
Frequently Asked Questions

What is Apache Spark?
Apache Spark is a distributed computing framework for big data processing and analytics.

Can Spark run without Hadoop?
Yes, Spark can run in standalone mode without Hadoop.

How do I verify my Spark installation?
Run spark-shell and check for the Spark logo and Scala prompt.

What are the prerequisites for installing Spark?
Java, Scala, and a compatible operating system (Windows or Ubuntu).

How do I start a standalone Spark cluster?
Use the start-master.sh and start-worker.sh commands.
By following this guide, you’ll be able to install and set up Apache Spark on both Windows and Ubuntu efficiently. Happy coding! 🚀