How to Install Apache Spark on Ubuntu Linux

Written by Richard | Feb 13, 2021 | Updated Mar 18, 2026

Installing Apache Spark on Ubuntu Linux lets you process and analyze large datasets across distributed systems.

Apache Spark is an open-source, distributed computing system designed for big data workloads. It provides powerful APIs for batch and stream processing, machine learning, and graph computation.

This guide shows you how to install Spark on Ubuntu 20.04 or 18.04. The commands below use the Spark 2.4.6 release and start a standalone master and one worker on a single machine, which is enough to begin experimenting with data science workloads on your PC.

⚡ Quick Answer

Install the Java JDK and Scala with `sudo apt install default-jdk scala`. Download the Spark archive from the Apache archive site, extract it, and move it to `/opt/spark`. Set `SPARK_HOME` and extend `PATH` in `~/.bashrc`. Finally, start Spark with `start-master.sh` and `start-slave.sh spark://localhost:7077`.

Install Java JDK

Apache Spark requires the Java JDK. On Ubuntu, the commands below install the default JDK packaged for your release.

🐧Bash / Shell

sudo apt update
sudo apt install default-jdk

After installing, run the command below to verify which version of Java is installed.

💻Code

java --version

This should show you lines like the ones below:

💻Code

openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
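If you want to check the Java version from a script rather than by eye, the version string can be reduced to its major number. The following is a minimal sketch; `java_major_version` is a hypothetical helper name, and the parsing assumes the usual `"11.0.10"` or legacy `"1.8.0_292"` formats. Which major versions a given Spark release supports is something to confirm in that release's documentation.

```shell
# Hypothetical helper: reduce a Java version string to its major number.
# Handles modern numbering ("11.0.10") and the legacy scheme ("1.8.0_292").
java_major_version() {
  # $1: a version string such as "11.0.10" or "1.8.0_292"
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;  # legacy scheme: 1.8.x means Java 8
    *)   echo "$1" | cut -d. -f1 ;;  # modern scheme: first field is the major version
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; the version is the first quoted string
  installed=$(java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/')
  echo "Detected Java major version: $(java_major_version "$installed")"
else
  echo "java not found on PATH"
fi
```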

Install Scala

You’ll also need Scala to run Apache Spark. To install it on Ubuntu, run the command below:

🐧Bash / Shell

sudo apt install scala

To verify the version of Scala installed, run the command below:

💻Code

scala -version

This should display a line like this:

💻Code

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Install Apache Spark

Now that you have installed the required packages to run Apache Spark, continue below to install it.

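Download the Spark release archive. The commands below fetch version 2.4.6 from the Apache archive; to install a different release, change the version in the URL and in the file names used in the following steps.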

🐧Bash / Shell

cd /tmp
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz
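Before extracting, it can be worth confirming that the download is intact. Apache release directories typically publish a `.sha512` file next to each archive. The sketch below assumes you have copied the expected digest from that file by hand; `verify_sha512` is a hypothetical helper name, not part of Spark.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_sha512() {
  file=$1
  expected=$2
  [ -f "$file" ] || return 2
  actual=$(sha512sum "$file" | cut -d' ' -f1)
  # Normalize the expected value: drop whitespace, lowercase the hex digits
  expected=$(printf '%s' "$expected" | tr -d ' \n\t' | tr 'A-F' 'a-f')
  [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder; paste the published value in its place):
# verify_sha512 /tmp/spark-2.4.6-bin-hadoop2.7.tgz "<expected digest>" && echo "checksum OK"
```

The normalization step exists because published digests are sometimes written in uppercase or split into groups separated by spaces.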

Then, extract the downloaded file and move it to the /opt directory.

🐧Bash / Shell

tar -xvzf spark-2.4.6-bin-hadoop2.7.tgz
sudo mv spark-2.4.6-bin-hadoop2.7 /opt/spark
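One thing to watch for: if `/opt/spark` already exists, `mv` places the extracted folder inside it instead of renaming it, and the later steps will not find the scripts. A quick layout check can catch this. The sketch below uses a hypothetical helper name, `check_spark_layout`, and checks only three files that ship with the Spark distribution.

```shell
# Hypothetical helper: report whether a directory looks like an unpacked Spark distribution.
# Defaults to /opt/spark, the location used in this guide.
check_spark_layout() {
  dir=${1:-/opt/spark}
  for f in bin/spark-shell bin/spark-submit sbin/start-master.sh; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "Spark layout looks complete in $dir"
}

# Usage:
# check_spark_layout /opt/spark
```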

After that, create the necessary environment variables to run Spark.

💻Code

nano ~/.bashrc

Add the following lines to the bottom of the file and save.

💻Code

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Finally, apply your environment changes to the current shell by running this command.

💻Code

source ~/.bashrc
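If you prefer to script the `~/.bashrc` change instead of editing the file in nano, the main risk is adding the same lines twice when the script is run again. A minimal sketch of an idempotent append follows; `append_once` is a hypothetical helper name, and the usage lines are commented out so that pasting the block changes nothing by itself.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a literal string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH and $SPARK_HOME unexpanded in the file):
# append_once 'export SPARK_HOME=/opt/spark' "$HOME/.bashrc"
# append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' "$HOME/.bashrc"
```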

Start Apache Spark

At this point, Apache Spark is installed and ready to use. Run the command below to start the Spark master process.

💻Code

start-master.sh

Next, start the Spark worker process and point it at the master by running the command below.

💻Code

start-slave.sh spark://localhost:7077

You can replace `localhost` with your server’s hostname or IP address. Once both processes are running, open your browser and go to port 8080 on the server’s hostname or IP address to view the Spark master web UI.

💻Code

http://localhost:8080
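On a server without a browser, the same check can be made from the terminal. The sketch below assumes `curl` is installed and that the master web UI is on port 8080 as in this guide; `check_ui` is a hypothetical helper name.

```shell
# Hypothetical helper: return 0 if the URL answers with a successful HTTP status.
# Returns 2 if curl is missing; otherwise returns curl's own exit code.
check_ui() {
  url=${1:-http://localhost:8080}
  command -v curl >/dev/null 2>&1 || { echo "curl is not installed" >&2; return 2; }
  # -f: fail on HTTP errors, -sS: quiet but still show errors, --max-time: give up after 5s
  curl -fsS -o /dev/null --max-time 5 "$url"
}

# Usage:
# check_ui && echo "Master web UI is reachable"
```

If the check fails right after startup, give the master a few seconds and try again before digging into its logs.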

If you wish to connect to Spark via its command shell, run the command below:

💻Code

spark-shell

The command above launches the Spark shell, which prints output similar to this:

💻Code

Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/
         
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.10)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
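To confirm that the shell can actually run a job, you can feed it a single expression without typing at the prompt. The following is a sketch under a few assumptions: `spark_smoke_test` is a hypothetical helper name, `spark-shell` is on your `PATH`, and the shell exits when its input ends. It counts the rows of a 100-element range and looks for the result in the output.

```shell
# Hypothetical helper: run one small job through spark-shell and check the result.
# Returns 127 if spark-shell is not on PATH, otherwise grep's exit status.
spark_smoke_test() {
  if ! command -v spark-shell >/dev/null 2>&1; then
    echo "spark-shell not found on PATH; check SPARK_HOME and PATH" >&2
    return 127
  fi
  # Feed one Scala expression to the shell on stdin and search its output
  printf '%s\n' 'println("count=" + spark.range(100).count())' \
    | spark-shell 2>/dev/null \
    | grep -q 'count=100'
}

# Usage:
# spark_smoke_test && echo "Spark ran a job successfully"
```

Starting the shell takes several seconds, so this check is noticeably slower than the web UI check.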

That should do it!

Conclusion:

This post showed you how to install Apache Spark on Ubuntu 20.04 and 18.04: install Java and Scala, download and extract Spark to `/opt/spark`, set the environment variables, and start the master and worker processes.
