Apache Spark Installation Guide
Updated: May 20, 2025
Apache Spark is a powerful open-source framework for distributed data processing, widely used for big data analytics and machine learning. Whether you're working on Windows or Ubuntu, installing Spark is a straightforward process if you follow the right steps. This guide will walk you through the installation process for both operating systems, including prerequisites, installation steps, and verification.
Prerequisites
Before installing Apache Spark, ensure your system meets the following requirements:
- Java 8 or later
- Scala
- Windows or Ubuntu
- A utility for extracting .tar/.tgz archives (tar is built into Ubuntu and recent versions of Windows)

Step 1: Verify Java Installation
Apache Spark requires Java 8 or later. To check whether Java is installed on your system:

On Windows
Open Command Prompt or PowerShell and run:
java -version
If Java is installed, you'll see output like:
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)
If Java is not installed, download it from the official Java website and install it.
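Some tools also expect the JAVA_HOME environment variable to point at your JDK. If you want to set it, the command below is a sketch; the install path is only an example and depends on where your JDK actually lives:
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_281"
Open a new Command Prompt afterwards so the change takes effect.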
On Ubuntu
Run the following command in the terminal:
java -version
If Java is not installed, install it using:
sudo apt install default-jdk -y
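Afterwards, java -version should report the installed JDK. The exact output depends on the OpenJDK version Ubuntu ships; it will look roughly like:
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu1, mixed mode, sharing)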
Step 2: Verify Scala Installation
Scala is another prerequisite for Apache Spark. To check if Scala is installed:

On Windows
Run the following command in Command Prompt or PowerShell:
scala -version
If Scala is installed, you'll see output like:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
If Scala is not installed, proceed to the next step.
On Ubuntu
Run:
scala -version
If Scala is not installed, install it using:
sudo apt install scala -y
Step 3: Install Scala Manually
If Scala is not installed and you aren't using a package manager, download the latest version from the official Scala website. For this tutorial, we'll use Scala 2.11.6. Note that Spark 3.5.x is built against Scala 2.12 (or 2.13), so if you plan to compile your own Spark applications, match your Scala version to your Spark build; spark-shell itself ships with a bundled Scala and works regardless.

On Windows
Extract the downloaded .tar file:
tar xvf scala-2.11.6.tgz
Move the extracted folder to C:\Scala, then add it to your PATH:
set PATH=%PATH%;C:\Scala\bin
Note that set only affects the current session; use setx or the System Properties dialog to make the change permanent.
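To confirm the PATH change, open a new Command Prompt and run:
scala -version
With the archive above, it should report Scala code runner version 2.11.6.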
On Ubuntu
Download the Scala binaries:
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
Extract the .tar file:
tar xvf scala-2.11.6.tgz
Move the extracted folder to /usr/local/scala:
sudo mv scala-2.11.6 /usr/local/scala
echo "export PATH=$PATH:/usr/local/scala/bin" >> ~/.bashrc
source ~/.bashrc
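At this point scala -version should succeed in the same terminal:
scala -version
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL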
Step 4: Install Apache Spark
Download the latest version of Apache Spark from the official Spark website. For this tutorial, we'll use Spark 3.5.3 prebuilt for Hadoop 3. Download the .tgz file:
wget https://dlcdn.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
To verify the download, fetch the published checksum and check the archive against it:
wget https://downloads.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz.sha512
shasum -a 512 -c spark-3.5.3-bin-hadoop3.tgz.sha512
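If the archive is intact, shasum should report:
spark-3.5.3-bin-hadoop3.tgz: OK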
Extract the .tgz file:
tar xvf spark-3.5.3-bin-hadoop3.tgz

On Windows
Move the extracted folder to C:\Spark, then add it to your PATH:
set PATH=%PATH%;C:\Spark\bin

On Ubuntu
Move the extracted folder to /opt/spark:
sudo mv spark-3.5.3-bin-hadoop3 /opt/spark
echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export PATH=$PATH:$SPARK_HOME/bin" >> ~/.bashrc
source ~/.bashrc
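You can confirm the environment is set up:
echo $SPARK_HOME
This should print /opt/spark.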
Step 5: Verify the Installation

On Windows
Open Command Prompt or PowerShell and run:
spark-shell
If Spark is installed correctly, you'll see the Spark logo and the Scala shell prompt.
On Ubuntu
Run:
spark-shell
You should see the Spark logo and the Scala shell prompt.
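As a quick sanity check, run a small job at the prompt. The snippet below uses the SparkContext (sc) that spark-shell creates for you:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050
Type :quit to leave the shell.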
Step 6: Start a Standalone Cluster (Optional)
To run Spark as a standalone cluster on Ubuntu, start a master and attach a worker to it. The cluster scripts live in $SPARK_HOME/sbin rather than bin:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
Replace <master-host> with your machine's hostname; the exact spark:// URL appears at the top of the master's web UI at http://localhost:8080.
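With the cluster running, you can submit the SparkPi example that ships with the distribution. The jar name below matches the default Spark 3.5.3 build (Scala 2.12) and may differ for other builds:
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<master-host>:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.5.3.jar 100
The job prints an approximation of pi when it finishes.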
Frequently Asked Questions

What is Apache Spark?
Apache Spark is a distributed computing framework for big data processing and analytics.

Can Spark run without Hadoop?
Yes, Spark can run in standalone mode without Hadoop.

How do I verify my Spark installation?
Run spark-shell and check for the Spark logo and Scala prompt.

What are the prerequisites for installing Spark?
Java, Scala, and a compatible operating system (Windows or Ubuntu).

How do I start a standalone Spark cluster?
Use the start-master.sh and start-worker.sh commands.
By following this guide, you’ll be able to install and set up Apache Spark on both Windows and Ubuntu efficiently. Happy coding! 🚀