
How to Install Apache Spark on Ubuntu Linux

Written by Richard | Feb 13, 2021 | Updated Mar 18, 2026 | 4 min read

This brief tutorial shows students and new users how to install Apache Spark on Ubuntu 20.04 | 18.04.

Apache Spark is a powerful open-source big data processing and analysis framework. It has higher-level libraries that support SQL queries, streaming data, machine learning, and graph processing.

By installing Apache Spark on Ubuntu Linux, you can take advantage of its features and analyze large amounts of data, distribute it across clusters, and process it in parallel.

Additionally, Ubuntu Linux is a popular and reliable operating system widely used by developers, making it a great choice for running Apache Spark.

To get started, follow the steps below to install Apache Spark on Ubuntu.

Install Java JDK

Apache Spark requires the Java JDK. On Ubuntu, the commands below install the default OpenJDK version.

🐧Bash / Shell
sudo apt update
sudo apt install default-jdk

After installing, run the commands below to verify the version of Java installed.

💻Code
java --version

That should display lines similar to the ones shown below:

💻Code
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
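
If you'd rather pin a specific JDK release than rely on the default-jdk metapackage, Ubuntu's repositories also carry versioned packages. Here's a minimal sketch; the package name and JVM path assume Ubuntu 20.04 on amd64, so verify yours with ls /usr/lib/jvm:

🐧Bash / Shell
# Install a specific OpenJDK release instead of the default-jdk metapackage
sudo apt install openjdk-11-jdk
# Optionally point JAVA_HOME at it (path assumes amd64; check with: ls /usr/lib/jvm)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64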

Install Scala

Another package you'll need to run Apache Spark is Scala. To install it on Ubuntu, simply run the commands below:

🐧Bash / Shell
sudo apt install scala

To verify the version of Scala installed, run the commands below:

💻Code
scala -version

Doing that will display a line similar to the one below:

💻Code
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
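
If you'd like to sanity-check the Scala toolchain itself, the scala runner can evaluate a one-line expression directly. This quick check is optional and not required for Spark:

🐧Bash / Shell
# Evaluate a one-line Scala expression without opening the REPL
scala -e 'println("Scala is working: " + (21 * 2))'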

Install Apache Spark

Now that you have installed the required packages to run Apache Spark, continue below to install it.

Run the commands below to download Spark 2.4.6, the version used in this guide. Newer releases are available in the same archive.

🐧Bash / Shell
cd /tmp
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz

Next, extract the downloaded file and move it to the /opt directory.

🐧Bash / Shell
tar -xvzf spark-2.4.6-bin-hadoop2.7.tgz
sudo mv spark-2.4.6-bin-hadoop2.7 /opt/spark
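
Before moving on, it's worth confirming that the layout under /opt/spark matches what the rest of this guide expects, with the bin and sbin script directories at the top level:

🐧Bash / Shell
ls /opt/spark
# Expect entries such as: bin  conf  examples  jars  sbin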

Next, set up environment variables so you can execute and run Spark from any directory.

💻Code
nano ~/.bashrc

Then, add the lines below at the bottom of the file and save it.

💻Code
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

After that, run the commands below to apply your environment changes.

💻Code
source ~/.bashrc
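
To confirm the variables took effect and the Spark scripts are now on your PATH, you can run a quick check. The expected paths assume the /opt/spark location used above:

💻Code
echo $SPARK_HOME        # should print /opt/spark
which spark-shell       # should print /opt/spark/bin/spark-shell
spark-submit --version  # prints the Spark version banner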

Start Apache Spark

At this point, Apache Spark is installed and ready to use. Run the commands below to start it up.

💻Code
start-master.sh
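
The start script prints the path of a log file. If the master doesn't seem to come up, tail that log to see why; the pattern below assumes the default log directory under /opt/spark:

💻Code
# The log file name includes your username and hostname
tail -n 20 /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out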

Next, start the Spark worker process by running the commands below.

💻Code
start-slave.sh spark://localhost:7077
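
By default, the worker offers all CPU cores and most of the machine's memory to the master. If you want to cap what it advertises, start-slave.sh passes resource flags through to the worker; the values below are purely illustrative:

💻Code
# Offer only 2 cores and 2 GB of RAM to the master (illustrative values)
start-slave.sh spark://localhost:7077 -c 2 -m 2G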

You can replace localhost with the server's hostname or IP address. Once the worker starts, open your browser and browse to the master's web UI at the address below.

💻Code
http://localhost:8080

If you wish to connect to Spark via its command shell, run the commands below:

💻Code
spark-shell

The commands above will launch Spark Shell.

💻Code
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/
         
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.10)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
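
From the scala> prompt, you can run a small job to confirm everything works end to end. This trivial example sums the numbers 1 to 100 using the spark session the shell provides:

💻Code
// Distribute the numbers 1 to 100 and sum them
val rdd = spark.sparkContext.parallelize(1 to 100)
println(rdd.sum())   // should print 5050.0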

That should do it!
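
When you're finished, you can shut the worker and master down with the matching stop scripts, which the PATH change above also made available:

💻Code
stop-slave.sh
stop-master.sh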

Conclusion

This post showed you how to install Apache Spark on Ubuntu 20.04 | 18.04. If you find any error above, please use the comment form below to report it.

Frequently Asked Questions

What are the system requirements to install Apache Spark on Ubuntu?

To install Apache Spark on Ubuntu, you need to have Java JDK and Scala installed. Additionally, ensure your system has sufficient memory and CPU resources to handle big data processing tasks.

How do I verify if Java is installed correctly on my Ubuntu system?

You can verify if Java is installed correctly by running the command 'java --version' in the terminal. If Java is installed, it will display the version information.

What is the purpose of installing Scala for Apache Spark?

Apache Spark is written in Scala, and its interactive shell is a Scala REPL. Installing Scala allows you to write and execute Spark jobs using the Scala programming language.

How can I start Apache Spark after installation?

After installing Apache Spark, you can start it by running the command 'start-master.sh' in the terminal. Then, to start the Spark worker process, use 'start-slave.sh spark://localhost:7077'.

How do I access the Spark web UI after starting it?

Once Apache Spark is running, you can access the Spark web UI by opening a web browser and navigating to 'http://localhost:8080'. This interface provides insights into the Spark cluster and job status.


About the Author

Richard

Tech Writer, IT Professional

Richard, the owner and lead writer at Geek Rewind, is a tech enthusiast passionate about simplifying complex IT topics. His years of hands-on experience in system administration and enterprise IT operations have honed his ability to provide practical insights and solutions. Richard aims to make technology more accessible and actionable. He's deeply committed to the Geek Rewind community, always ready to answer questions and engage in discussions.
