How to install Apache Spark on Ubuntu using Apache Bigtop

by Volodymyr Miz on April 4, 2019 under Tutorial


Has this ever happened to you? A new version of Spark comes out and you want to try it. To do that, you have to remove the previous version, download and extract the new one, and hope that everything still works. Or perhaps you are just getting started with a Spark project and want the installation to be seamless and easy. A one-line command would be nice, wouldn’t it? Making this process more organized and user-friendly is one of the goals of Apache Bigtop. This post is for ML and infrastructure engineers, data scientists, and anyone who just wants to try Spark out.

Apache Bigtop gives ML engineers, infrastructure engineers, and data scientists a convenient tool for packaging, deploying, and integrating Hadoop-related projects such as HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper, Spark, and many others.

In this tutorial, I will show how to install Apache Bigtop and how to use it to install Apache Spark. Here, I will focus on Ubuntu. For other distributions, check out this link.

Bigtop installation

This tutorial is for Bigtop version 1.3.0. If you want to install another version, change the version number in the commands below accordingly.

Make sure that you have a JDK installed on your system (so far, JDK 8 works well).
Install the Apache Bigtop GPG key.

wget -O- http://archive.apache.org/dist/bigtop/bigtop-1.3.0/repos/GPG-KEY-bigtop | sudo apt-key add -

Download the repo file (the URL below is for Ubuntu 16.04).

sudo wget -O /etc/apt/sources.list.d/bigtop-1.3.0.list http://archive.apache.org/dist/bigtop/bigtop-1.3.0/repos/ubuntu16.04/bigtop.list

Update the apt cache.

sudo apt-get update

Browse through the available artifacts, for example by searching for Mahout.

apt-cache search mahout

Install bigtop-utils.

sudo apt-get install bigtop-utils
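
If you expect to switch versions, a small sketch like the one below keeps the version number in one place. The variable names and the setup_bigtop function are my own, chosen for illustration, and I am assuming that other releases follow the same URL layout as 1.3.0. The function only wraps the commands from the steps above; it is defined here but not called.

```shell
# Hypothetical variables: change these to target another release.
BIGTOP_VERSION="1.3.0"
UBUNTU_RELEASE="16.04"

# URLs and paths built from the variables (layout assumed to match 1.3.0).
BIGTOP_REPOS="http://archive.apache.org/dist/bigtop/bigtop-${BIGTOP_VERSION}/repos"
KEY_URL="${BIGTOP_REPOS}/GPG-KEY-bigtop"
LIST_URL="${BIGTOP_REPOS}/ubuntu${UBUNTU_RELEASE}/bigtop.list"
LIST_FILE="/etc/apt/sources.list.d/bigtop-${BIGTOP_VERSION}.list"

# Same commands as in the steps above, chained so a failure stops the sequence.
setup_bigtop() {
    wget -O- "$KEY_URL" | sudo apt-key add - &&
    sudo wget -O "$LIST_FILE" "$LIST_URL" &&
    sudo apt-get update &&
    sudo apt-get install bigtop-utils
}

# Print the resolved URLs so you can check them before running anything.
echo "key:  $KEY_URL"
echo "list: $LIST_URL"
```

Once the printed URLs look right, run setup_bigtop to perform the installation.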

Now you can install Spark and other Hadoop-related projects.

Spark installation

Install Spark.

sudo apt-get install spark\*
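
To check that the installation worked, something like the following sketch may help. It assumes that the Bigtop packages put Spark's standard launchers, spark-shell and spark-submit, on your PATH. check_cmd is a hypothetical helper of mine, not part of Bigtop or Spark.

```shell
# Report whether a command is available on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# spark-shell and spark-submit are Spark's standard launchers; their presence
# on the PATH after the Bigtop install is an assumption.
for cmd in spark-shell spark-submit; do
    check_cmd "$cmd"
done

# Show the dpkg status of packages whose names match spark*.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l 'spark*' 2>/dev/null || echo "no spark packages installed"
fi
```

If spark-submit is found, running spark-submit --version prints the installed Spark version.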

Take a look at the Wiki of the Bigtop project for more information concerning other Hadoop-related projects.

If you are looking for an easier way to try out Spark, check out another tutorial on how to create a Spark Scala project in IntelliJ IDEA. This way you do not have to install anything except IntelliJ IDEA.

Apache Spark, Apache Bigtop, Development, Machine Learning, Tutorial, Install

I love feedback.
Let me know what you think of this article on Twitter @mizvladimir or leave a comment below!