Friday 24 April 2015

Installing Single Node Hadoop 2.6 using Bash Script

Get Started

We are going to write a simple bash script and execute it to install Hadoop 2.6 with all of its dependencies. The steps below install a stable version of Apache Hadoop on a server running Ubuntu 14.04 x64, but they should work on any Debian-based system. We will write the script to a file using any text editor you like, give it permission to execute, and run it. I hope it works without error; if it does not, let me know the issue so I can update it. It works fine for me on two different machines, running Linux Mint 17 and Ubuntu 14.04 respectively.

What is Bash?

Descended from the Bourne shell, Bash is a GNU product, the "Bourne Again SHell." It is the standard command line interface on most Linux machines. It excels at interactivity, supporting command line editing, completion, and recall. It also supports configurable prompts; most people know this, but few realize how much can be done with them.
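
For example, a single line in ~/.bashrc gives a colored user@host:directory prompt (the color codes below are just one illustrative choice):

# Green user@host, blue working directory, then the usual $ prompt
export PS1='\[\e[32m\]\u@\h\[\e[0m\]:\[\e[34m\]\w\[\e[0m\]\$ '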

What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:
  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Write Script File

To write the installation script, open a new file named install_hadoop.sh and put the following content in it.
#!/bin/bash  

# Script to install Sun Java and Hadoop 2.6 

clear  

# Tell the shell to run the installation in non-interactive mode
# and auto-accept the license agreement for Sun Java
export DEBIAN_FRONTEND=noninteractive  
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections  
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections  


echo "Bash Script for Installing Sun Java for Ubuntu!"  

echo "Now Script will try to purge OpenJdk if installed..."  

# purge openjdk if installed to remove conflict  
apt-get purge openjdk-\* -y  

echo "Now we will update repository..."  

apt-get update -y  

echo "Adding Java Repository...."  

apt-get install python-software-properties -y  
add-apt-repository ppa:webupd8team/java -y  

echo "Updating Repository to load java repository"  

apt-get update -y  

echo "Installing Sun Java....."  
sudo -E apt-get purge oracle-java7-installer -y  
sudo -E apt-get install oracle-java7-installer -y  


echo "Installation completed...."  

echo "Installed java version is...."  

java -version  


# Install and start the SSH server; Hadoop uses ssh to manage its daemons
apt-get install openssh-server -y
/etc/init.d/ssh status
/etc/init.d/ssh start

# Set up passwordless ssh to localhost
mkdir -p ~/.ssh
ssh-keyscan -H localhost > ~/.ssh/known_hosts
yes | ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh-add is optional here since the key has no passphrase
ssh-add

# Download Hadoop 2.6.0 and unpack it into /usr/local/hadoop
cd /usr/local
wget http://mirror.sdunix.com/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz
mkdir hadoop
mv hadoop-2.6.0/* hadoop/

echo "Now script is updating Bashrc for export Path etc"  

# Quote the heredoc delimiter ('EOL') so $PATH is written literally into ~/.bashrc
cat >> ~/.bashrc << 'EOL'
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export YARN_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export JAVA_HOME=/usr
export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin:$JAVA_HOME/bin
EOL

cat ~/.bashrc

source ~/.bashrc

echo "Now script is updating hadoop configuration files"  

# Hadoop reads JAVA_HOME from hadoop-env.sh; appending it at the end overrides the default
cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh << EOL
export JAVA_HOME=/usr
EOL

cd /usr/local/hadoop/etc/hadoop

# core-site.xml: URI of the default filesystem (the NameNode)
cat > core-site.xml << EOL
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOL

# mapred-site.xml: run MapReduce jobs on the YARN framework
cp mapred-site.xml.template mapred-site.xml
cat > mapred-site.xml << EOL
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOL

# yarn-site.xml: enable the shuffle service that MapReduce needs on the NodeManager
cat > yarn-site.xml << EOL
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOL

# hdfs-site.xml: replication factor 1 for a single node,
# plus local storage paths for the NameNode and DataNode
cat > hdfs-site.xml << EOL
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
</configuration>
EOL
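
# The storage paths configured above must be creatable and writable;
# making them up front avoids failures when formatting the NameNode
# or starting the DataNode.
mkdir -p /home/hadoop/hadoopinfra/hdfs/namenode /home/hadoop/hadoopinfra/hdfs/datanode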

echo "Completed process Now Reloading Bash Profile...."  
cd ~  

echo "You may require reloading bash profile, you can reload using following command."  
echo "source ~/.bashrc"  

echo "To Start you need to format Name Node Once you can use following command."  
echo "hdfs namenode -format"  

echo "Hadoop configured. now you can start hadoop using following commands. "  
echo "start-dfs.sh"  
echo "start-yarn.sh"  

echo "To stop hadoop use following scripts."  
echo "stop-dfs.sh"  
echo "stop-yarn.sh"  


Now we make the script executable.
 ~$ chmod 755 install_hadoop.sh

Now we can execute the script using the following command. Since the installation requires root access, you need to log in as root or switch to root using the command "su -".
 ~$ ./install_hadoop.sh  
After the script completes successfully, you can move forward to formatting the NameNode and starting Hadoop.

You may face a "command not recognized" issue, which means the bash profile was not reloaded. The safe way is to reload it manually using the following command.
 ~$ source ~/.bashrc
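
As a quick optional sanity check, confirm the environment variables are set and the Hadoop binary is on your PATH (hadoop version ships with the distribution):

 ~$ echo $HADOOP_HOME
 /usr/local/hadoop
 ~$ hadoop version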

Initializing the Single-Node Cluster

Formatting the Name Node:

When setting up the cluster for the first time, we need to format the NameNode in HDFS.

 ~$ cd ~  
 ~$ hdfs namenode -format  

Starting Hadoop dfs daemons:

 ~$ start-dfs.sh  

Starting Yarn daemons:

 ~$ start-yarn.sh  

Check all daemon processes:

 ~$ jps

 6069 NodeManager  
 5644 DataNode  
 5827 SecondaryNameNode  
 4692 ResourceManager  
 6165 Jps  
 5491 NameNode  

* Process IDs will differ on each run; the main idea is to check that all of these daemons are running.
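
Optionally, you can run one of the example jobs bundled with the release to confirm that MapReduce jobs actually execute; the jar path below assumes the install location used by our script.

 ~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5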

You should now be able to reach the NameNode web interface in your browser (after a short start-up delay) at the following URL:

name-node: http://localhost:50070/
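
On a headless server you can check the same endpoint from the shell; assuming curl is installed, this should print 200 once the NameNode web UI is up.

 ~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
 200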

Stopping all daemons:

 ~$ stop-dfs.sh  
 ~$ stop-yarn.sh  
   
Enjoy.
