The Project for the Semester

The Goal

The Raspberry Pi is a marvel of modern computing: a credit-card-sized single-board computer that offers full-fledged computing power on the go. Raspberry Pis have been used for a variety of purposes, including but not limited to gaming emulation and cluster computing. For our project, however, we shall try to run Hadoop on a Raspberry Pi cluster. Initially, we will run Hadoop on a single Raspberry Pi 2 and figure out what hurdles we might face. Once the setup process works on a single Pi, we shall scale it out to a cluster of Pis. Our current setup is 1 x Raspberry Pi 2 Model B with 1 x 32 GB Class 10 MicroSD card. This version of the Raspberry Pi has a quad-core ARMv7 processor with 1 GB of RAM. We will be using the Lite version of the Raspbian OS on the Pi.

Requirements, or what we’ve used

  • Raspberry Pi 2 Model B
  • Samsung Class 10 32GB MicroSD Card
  • Dell KM113 Bluetooth Keyboard and Mouse
  • Acer H236HL Monitor
  • 2 Amp power adapter
  • Micro USB Cable
  • HDMI Cable
  • Patience!

Pictures, please?

Some pictures of the setup.

Details? Details!

First things first: setting up Raspbian Jessie Lite on the Raspberry Pi

Install Raspbian Jessie Lite onto your micro SD card:

  • Download Raspbian Jessie Lite from https://www.raspberrypi.org/downloads/raspbian/
  • Format your micro SD card as FAT32
    • Use the SDFormatter program with ‘format size adjustment’ enabled
  • Extract the .zip of Raspbian Jessie Lite
  • Write the .img file onto your clean micro SD card
    • Use Win32DiskImager
  • You can now use the micro SD card to boot into Raspbian
  • Boot into Raspbian
  • Gain root access

Why Hadoop 2.0?

  • YARN (Yet Another Resource Negotiator) – Next generation MapReduce (MRv2)
  • Separation of the processing engine from resource management, which were coupled in Hadoop 1.x MapReduce
  • In Hadoop 1.x, all processing ran through the MapReduce framework; Hadoop 2.x makes it possible to use other data-processing frameworks as well
  • TaskTracker slots are replaced with containers which are more generic
  • Hadoop 2.x MapReduce programs are backward compatible with Hadoop 1.x MapReduce
  • Overall increased scalability and performance
  • HDFS Federation – Possible to use multiple namenode servers to manage namespace which allows for horizontal scaling

Configure Network

Install a text editor of your choice and edit as root or with sudo:
/etc/network/interfaces

auto eth0
iface eth0 inet static
address 192.168.0.110
netmask 255.255.255.0
gateway 192.168.0.1

Update system and install Oracle Java

sudo apt-get update
sudo apt-get install oracle-java8-jdk

Configure Hadoop users

Create new users for use with Hadoop:

sudo addgroup hadoop
sudo adduser --ingroup hadoop faraz
sudo adduser faraz sudo
sudo adduser --ingroup hadoop atul
sudo adduser atul sudo

 

Configure SSH

Generate passwordless SSH keys for each Hadoop user (shown here for faraz; repeat for atul). This will enable nodes to communicate with each other in the cluster.

su faraz
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Login as faraz, atul

su faraz
ssh localhost
exit
su atul
ssh localhost
exit

Install the latest Hadoop 2.x

Download and extract it. The tarball from the Apache mirrors is a prebuilt binary distribution, so there is nothing to compile:

wget https://www.trieuvan.com/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -xzvf hadoop-2.7.2.tar.gz

Copy the binaries to /opt

sudo cp -R hadoop-2.7.2 /opt/hadoop

Give access to faraz and atul

Only one user can own the directory, so give ownership to one and grant the rest access through the hadoop group:

sudo chown -R faraz:hadoop /opt/hadoop/
sudo chmod -R g+rwX /opt/hadoop/

Check version

hadoop version

Configure environment variables

Append the following to /etc/bash.bashrc (run as root), then source the file or log out and back in:

echo 'export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:jre/bin/java::")' >> /etc/bash.bashrc
echo 'export HADOOP_INSTALL=/opt/hadoop' >> /etc/bash.bashrc
echo 'export PATH=$PATH:$HADOOP_INSTALL/bin' >> /etc/bash.bashrc
echo 'export PATH=$PATH:$HADOOP_INSTALL/sbin' >> /etc/bash.bashrc
echo 'export HADOOP_MAPRED_HOME=$HADOOP_INSTALL' >> /etc/bash.bashrc
echo 'export HADOOP_COMMON_HOME=$HADOOP_INSTALL' >> /etc/bash.bashrc
echo 'export HADOOP_HDFS_HOME=$HADOOP_INSTALL' >> /etc/bash.bashrc
echo 'export YARN_HOME=$HADOOP_INSTALL' >> /etc/bash.bashrc
echo 'export HADOOP_HOME=$HADOOP_INSTALL' >> /etc/bash.bashrc

Edit and change variables in hadoop-env.sh at /opt/hadoop/etc/hadoop/:

export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt

Enable the use of native hadoop library and IPv4 stack:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_INSTALL/lib/native -Djava.net.preferIPv4Stack=true"

Configure:

  • core-site.xml
  • hdfs-site.xml
  • yarn-site.xml
  • mapred-site.xml
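Minimal single-node versions of these files (placed in /opt/hadoop/etc/hadoop/) might look like the sketch below. The localhost address, port 9000 and the /hdfs/tmp directory are assumptions for a single-Pi setup; once there are multiple nodes, fs.defaultFS should point at the master’s static IP instead.

```xml
<!-- core-site.xml: default filesystem and temp directory
     (address, port and tmp path are our assumptions) -->
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/hdfs/tmp</value></property>
</configuration>

<!-- hdfs-site.xml: replication of 1 makes sense with a single DataNode -->
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>

<!-- yarn-site.xml: enable the shuffle service MapReduce needs -->
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
```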

Format HDFS filesystem

sudo mkdir -p /hdfs/tmp
sudo chown faraz:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp
hdfs namenode -format

Start Hadoop

Run the following commands as faraz:

start-dfs.sh
start-yarn.sh

Verify that all services started correctly; on a single node you should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager listed (plus Jps itself):

jps

You are all set!
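A quick way to smoke-test the cluster is a word count via Hadoop Streaming, which lets you write the mapper and reducer in any language. Below is a minimal sketch in Python; the function names and sample input are our own, and it simulates the map, shuffle and reduce phases locally so you can verify the logic before submitting a real job.

```python
from itertools import groupby

def map_words(lines):
    # Mapper: emit a (word, 1) pair for every word, just as a Streaming
    # mapper would print tab-separated key/value lines to stdout.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    # Reducer: sum the counts per word. Hadoop's shuffle delivers pairs
    # grouped by key; sorting first emulates that locally.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["hello hadoop", "hello pi"]
    print(dict(reduce_counts(map_words(sample))))  # → {'hadoop': 1, 'hello': 2, 'pi': 1}
```

On the cluster itself the equivalent job would be submitted through the hadoop-streaming jar shipped under $HADOOP_INSTALL/share/hadoop/tools/lib/, passing the mapper and reducer scripts with -mapper and -reducer.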

 

 
