How to Install Apache Spark on Ubuntu Linux

Written by Richard
Feb 13, 2021 Updated Mar 18, 2026

Installing Apache Spark on Ubuntu Linux lets you process and analyze large datasets efficiently across distributed systems.

Apache Spark is an open-source, distributed computing system designed for big data workloads. It provides powerful APIs for batch and stream processing, machine learning, and graph computation.

This guide walks you through installing Spark on Ubuntu 22.04 or 20.04, using the Spark 2.4.6 release that the download command below fetches. You will run a standalone Spark master and worker on a single machine, which is enough to start working through data science tasks on your PC.

⚡ Quick Answer

Install the Java JDK and Scala using `sudo apt install default-jdk scala`. Download and extract the Spark archive, then move it to `/opt/spark`. Set `SPARK_HOME` and update your `PATH` in `~/.bashrc`. Finally, start Spark with `start-master.sh` and `start-slave.sh spark://localhost:7077`.

Install Java JDK

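Apache Spark runs on the Java Virtual Machine, so it needs a Java JDK. On Ubuntu, the commands below install the distribution's default JDK package, which is not necessarily the newest Java release.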

🐧Bash / Shell
sudo apt update
sudo apt install default-jdk

After installing, run the command below to verify the installed Java version.

💻Code
java --version

This should show you lines like the ones below:

💻Code
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
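If you want to check the Java version from a script rather than by eye, a minimal sketch could look like the one below. It assumes the first line of output has the same shape as the sample above (`openjdk 11.0.10 2021-01-19`); `java_major` is a hypothetical helper name, not part of Java or Spark.

```shell
# Sketch: pull the major version number out of a `java --version` first line.
# java_major is a hypothetical helper; it assumes the version is the second field.
java_major() {
  # $1 is a line such as "openjdk 11.0.10 2021-01-19"
  printf '%s\n' "$1" | awk '{ split($2, parts, "."); print parts[1] }'
}

if command -v java >/dev/null 2>&1; then
  echo "Java major version: $(java_major "$(java --version 2>&1 | head -n 1)")"
else
  echo "java not found on PATH; install default-jdk first"
fi
```

Given the sample line shown above, `java_major` prints `11`.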

Install Scala

Spark is written in Scala, so this guide also installs the Scala package from Ubuntu's repositories. To install it, run the command below:

🐧Bash / Shell
sudo apt install scala

To verify the installed Scala version, run the command below:

💻Code
scala -version

This should display a line like this:

💻Code
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Install Apache Spark

With the prerequisites installed, you can now install Spark itself.

Download the Spark 2.4.6 release from the Apache archive.

🐧Bash / Shell
cd /tmp
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz
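Before extracting the archive, you may want to confirm that the download is intact. The sketch below compares the archive's SHA-512 digest with a published checksum file. It assumes a checksum file is available next to the archive with `.sha512` appended to the same URL, and that the file may wrap the digest in uppercase hex groups with a filename prefix; `verify_sha512` is a hypothetical helper name.

```shell
# Sketch: compare a file's SHA-512 digest with a published checksum file.
# verify_sha512 is a hypothetical helper, not an Apache or Spark tool.
verify_sha512() {
  archive=$1
  checksum_file=$2
  # Digest of the downloaded archive, as lowercase hex
  actual=$(sha512sum "$archive" | awk '{print $1}')
  [ -n "$actual" ] || { echo "could not hash $archive"; return 1; }
  # Strip whitespace and lowercase the hex so differently wrapped layouts compare equal
  published=$(tr -d ' \t\r\n' < "$checksum_file" | tr 'A-F' 'a-f')
  case "$published" in
    *"$actual"*) echo "checksum OK"; return 0 ;;
    *)           echo "checksum MISMATCH"; return 1 ;;
  esac
}
```

Under those assumptions, you would fetch the checksum with `wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz.sha512` and then run `verify_sha512 spark-2.4.6-bin-hadoop2.7.tgz spark-2.4.6-bin-hadoop2.7.tgz.sha512`.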

Then, extract the downloaded archive and move the resulting directory to /opt/spark.

🐧Bash / Shell
tar -xvzf spark-2.4.6-bin-hadoop2.7.tgz
sudo mv spark-2.4.6-bin-hadoop2.7 /opt/spark
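To confirm that the extract-and-move step produced the expected layout, a quick check along these lines may help. It only tests for the launcher scripts this guide uses later; `check_spark_home` is a hypothetical helper name.

```shell
# Sketch: verify that a Spark directory contains the scripts used in this guide.
# check_spark_home is a hypothetical helper, not shipped with Spark.
check_spark_home() {
  dir=${1:-/opt/spark}
  for f in bin/spark-shell bin/spark-submit sbin/start-master.sh sbin/start-slave.sh; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f"
      return 1
    fi
  done
  echo "Spark layout looks complete in $dir"
}

check_spark_home /opt/spark || echo "Re-check the tar and mv commands above"
```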

After that, open `~/.bashrc` to add the environment variables Spark needs.

💻Code
nano ~/.bashrc

Add the following lines to the bottom of the file and save.

💻Code
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
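As an alternative to editing the file by hand in nano, the two lines can be appended from a script. The sketch below adds a line only if it is not already present, so running it twice does not duplicate entries; `append_once` is a hypothetical helper name.

```shell
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper, not part of Spark or Ubuntu.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a literal string, -q stays silent
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

You could then run `append_once 'export SPARK_HOME=/opt/spark' ~/.bashrc` followed by `append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' ~/.bashrc`. The single quotes matter: they keep `$PATH` and `$SPARK_HOME` unexpanded so the variables are resolved each time the shell starts.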

Finally, apply your environment changes to the current shell by running this command.

💻Code
source ~/.bashrc

Start Apache Spark

At this point, Apache Spark is installed and ready to use. Run the command below to start the standalone master.

💻Code
start-master.sh

Next, start a Spark worker process and point it at the master by running the command below.

💻Code
start-slave.sh spark://localhost:7077

You can swap `localhost` with your server's hostname or IP address. Once both processes are running, open your browser and go to port 8080 on the server's hostname or IP address to see the master's web interface.

💻Code
http://localhost:8080
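To illustrate swapping `localhost` for the machine's own name, the sketch below builds the master URL from a host and port. The defaults mirror the values used in this guide (`localhost`, port 7077); `spark_master_url` is a hypothetical helper name.

```shell
# Sketch: build a spark:// master URL from a host and port.
# spark_master_url is a hypothetical helper; defaults match the values in this guide.
spark_master_url() {
  host=${1:-localhost}
  port=${2:-7077}
  printf 'spark://%s:%s\n' "$host" "$port"
}

# uname -n prints this machine's network node name
echo "Master URL: $(spark_master_url "$(uname -n)")"
echo "Web UI:     http://$(uname -n):8080"
```

A worker could then be started with `start-slave.sh "$(spark_master_url "$(uname -n)")"`. If the worker fails to register, compare this value with the master URL shown at the top of the web interface, since that is the address the master is actually using.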

If you wish to connect to Spark via its command shell, run the command below:

💻Code
spark-shell

The command above launches the Spark shell, and you should see output similar to this:

💻Code
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/
         
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.10)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
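For a non-interactive check that the shell can actually run a job, a small smoke test like the one below may be useful. It is a sketch that assumes `spark-shell` is on your `PATH` and that it evaluates Scala fed through standard input; `spark_smoke_test` is a hypothetical helper name, and the optional argument is the master to run against (`local[*]` by default).

```shell
# Sketch: run one tiny Spark job through spark-shell without an interactive session.
# spark_smoke_test is a hypothetical helper, not shipped with Spark.
spark_smoke_test() {
  if ! command -v spark-shell >/dev/null 2>&1; then
    echo "spark-shell not found on PATH; check SPARK_HOME and PATH"
    return 1
  fi
  # Feed a single Scala expression to the shell on standard input
  printf '%s\n' 'println("count = " + spark.range(100).count())' |
    spark-shell --master "${1:-local[*]}"
}
```

Running `spark_smoke_test` should, among the usual startup messages, include a line reading `count = 100`. Passing `spark://localhost:7077` as the argument sends the same job to the standalone master started earlier.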

That should do it!

Conclusion:

This post showed you how to install Apache Spark on Ubuntu Linux, start a standalone master and worker, and open the Spark shell.
