Friday 24 April 2015

Installing Single Node Hadoop 2.6 using Bash Script

Get Started

We are going to write a simple bash script and execute it to install Hadoop 2.6 with all of its dependencies. The steps below install a stable version of Apache Hadoop on a server running Ubuntu 14.04 x64, but they should work on any Debian-based system. We will write the script to a file using any text editor you like, give it permission to execute, and run it. I hope it works without error; if it does not, let me know the issue so I can update it. It works fine for me on two different machines, running Linux Mint 17 and Ubuntu 14.04 respectively.

What is Bash?

Descended from the Bourne shell, Bash is a GNU product, the "Bourne Again SHell." It is the standard command line interface on most Linux machines. It excels at interactivity, supporting command line editing, completion, and recall. It also supports configurable prompts; most people know this, but few realize how much can be done with them.
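
For example, a single line in ~/.bashrc gives a colored user@host:directory prompt (the color codes below are just one illustrative choice):

# Green user@host, blue working directory, then the usual $ prompt
export PS1='\[\e[32m\]\u@\h\[\e[0m\]:\[\e[34m\]\w\[\e[0m\]\$ '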

What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:
  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Write Script File

To write the installation script, open a new file named install_hadoop.sh and put the following content in it.
#!/bin/bash  

# Script to install Sun Java and Hadoop 2.6 

clear  

# Tell the shell to run the installation in non-interactive mode
# and auto-accept the license agreement for Sun Java
export DEBIAN_FRONTEND=noninteractive  
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections  
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections  


echo "Bash Script for Installing Sun Java for Ubuntu!"  

echo "Now Script will try to purge OpenJdk if installed..."  

# purge openjdk if installed to remove conflict  
apt-get purge openjdk-\* -y  

echo "Now we will update repository..."  

apt-get update -y  

echo "Adding Java Repository...."  

apt-get install python-software-properties -y  
add-apt-repository ppa:webupd8team/java -y  

echo "Updating Repository to load java repository"  

apt-get update -y  

echo "Installing Sun Java....."  
sudo -E apt-get purge oracle-java7-installer -y  
sudo -E apt-get install oracle-java7-installer -y  


echo "Installation completed...."  

echo "Installed java version is...."  

java -version  


# Install and start the SSH server; Hadoop uses ssh to manage its daemons
apt-get install openssh-server -y
/etc/init.d/ssh status
/etc/init.d/ssh start

# Set up passwordless ssh to localhost
mkdir -p ~/.ssh
ssh-keyscan -H localhost > ~/.ssh/known_hosts
yes | ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh-add is optional here since the key has no passphrase
ssh-add

# Download Hadoop 2.6.0 and unpack it into /usr/local/hadoop
cd /usr/local
wget http://mirror.sdunix.com/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz
mkdir hadoop
mv hadoop-2.6.0/* hadoop/

echo "Now script is updating Bashrc for export Path etc"  

# Quote the heredoc delimiter ('EOL') so $PATH is written literally into ~/.bashrc
cat >> ~/.bashrc << 'EOL'
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export YARN_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export JAVA_HOME=/usr
export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin:$JAVA_HOME/bin
EOL

cat ~/.bashrc

source ~/.bashrc

echo "Now script is updating hadoop configuration files"  

# Hadoop reads JAVA_HOME from hadoop-env.sh; appending it at the end overrides the default
cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh << EOL
export JAVA_HOME=/usr
EOL

cd /usr/local/hadoop/etc/hadoop

# core-site.xml: URI of the default filesystem (the NameNode)
cat > core-site.xml << EOL
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOL

# mapred-site.xml: run MapReduce jobs on the YARN framework
cp mapred-site.xml.template mapred-site.xml
cat > mapred-site.xml << EOL
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOL

# yarn-site.xml: enable the shuffle service that MapReduce needs on the NodeManager
cat > yarn-site.xml << EOL
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOL

# hdfs-site.xml: replication factor 1 for a single node,
# plus local storage paths for the NameNode and DataNode
cat > hdfs-site.xml << EOL
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
</configuration>
EOL
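
# The storage paths configured above must be creatable and writable;
# making them up front avoids failures when formatting the NameNode
# or starting the DataNode.
mkdir -p /home/hadoop/hadoopinfra/hdfs/namenode /home/hadoop/hadoopinfra/hdfs/datanode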

echo "Completed process Now Reloading Bash Profile...."  
cd ~  

echo "You may require reloading bash profile, you can reload using following command."  
echo "source ~/.bashrc"  

echo "To Start you need to format Name Node Once you can use following command."  
echo "hdfs namenode -format"  

echo "Hadoop configured. now you can start hadoop using following commands. "  
echo "start-dfs.sh"  
echo "start-yarn.sh"  

echo "To stop hadoop use following scripts."  
echo "stop-dfs.sh"  
echo "stop-yarn.sh"  


Now we make the script executable.
 ~$ chmod 755 install_hadoop.sh

Now we can execute the script using the following command. Since the installation requires root access, you need to log in as root or switch to root using the command "su -".
 ~$ ./install_hadoop.sh  
After the script completes successfully, you can move forward to formatting the NameNode and starting Hadoop.

You may face a "command not recognized" issue, which means the bash profile was not reloaded. The safe way is to reload it manually using the following command.
 ~$ source ~/.bashrc
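
As a quick optional sanity check, confirm the environment variables are set and the Hadoop binary is on your PATH (hadoop version ships with the distribution):

 ~$ echo $HADOOP_HOME
 /usr/local/hadoop
 ~$ hadoop version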

Initializing the Single-Node Cluster

Formatting the Name Node:

When setting up the cluster for the first time, we need to format the NameNode in HDFS.

 ~$ cd ~  
 ~$ hdfs namenode -format  

Starting Hadoop dfs daemons:

 ~$ start-dfs.sh  

Starting Yarn daemons:

 ~$ start-yarn.sh  

Check all daemon processes:

 ~$ jps

 6069 NodeManager  
 5644 DataNode  
 5827 SecondaryNameNode  
 4692 ResourceManager  
 6165 Jps  
 5491 NameNode  

* Process IDs will differ on each run; the main idea is to check that all of these daemons are running.
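
Optionally, you can run one of the example jobs bundled with the release to confirm that MapReduce jobs actually execute; the jar path below assumes the install location used by our script.

 ~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5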

You should now be able to reach the NameNode web interface in your browser (after a short start-up delay) at the following URL:

name-node: http://localhost:50070/
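
On a headless server you can check the same endpoint from the shell; assuming curl is installed, this should print 200 once the NameNode web UI is up.

 ~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
 200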

Stopping all daemons:

 ~$ stop-dfs.sh  
 ~$ stop-yarn.sh  
   
Enjoy.
