Get Started
We are going to write a simple bash script that installs Hadoop 2.6 together with all of its dependencies, and then execute it. The steps below target a server running Ubuntu 14.04 x64, but they should work on any Debian-based system. We will write the script content to a text file using any text editor you like, give the file permission to execute, and enjoy. I hope it works without error; if not, let me know the issue so I can update it. It ran fine for me on two different machines, running Linux Mint 17 and Ubuntu 14.04 respectively.
What is Bash?
Descended from the Bourne Shell, Bash is a GNU product, the "Bourne Again SHell." It's the standard command line interface on most Linux machines. It excels at interactivity, supporting command line editing, completion, and recall. It also supports configurable prompts - most people realize this, but don't know how much can be done.
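As a small illustration of that prompt configurability (the prompt strings below are just examples, not anything this guide requires):

```shell
# PS1 controls what Bash prints before each command.
# \u = user, \h = host, \w = current directory.
PS1='\u@\h:\w\$ '
# Colors work too; \[ and \] mark non-printing escape sequences
# so Bash can compute the prompt width correctly:
PS1='\[\e[32m\]\u@\h\[\e[0m\]:\w\$ '
echo "$PS1"
```

Put a line like this in ~/.bashrc to make the prompt permanent.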
What is Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Write the Script File
To write the installation script, open a new file named install_hadoop.sh and put the following content in it.
#!/bin/bash
# Script to install Sun Java and Hadoop 2.6
clear
# These commands tell the shell to run the installation non-interactively
# and to auto-accept the license agreement for Sun Java
export DEBIAN_FRONTEND=noninteractive
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
echo "Bash script for installing Sun Java on Ubuntu!"
echo "The script will now try to purge OpenJDK if it is installed..."
# purge openjdk if installed to remove conflict
apt-get purge openjdk-\* -y
echo "Now we will update the repository..."
apt-get update -y
echo "Adding the Java repository...."
apt-get install python-software-properties -y
add-apt-repository ppa:webupd8team/java -y
echo "Updating the repository to load the Java packages"
apt-get update -y
echo "Installing Sun Java....."
sudo -E apt-get purge oracle-java7-installer -y
sudo -E apt-get install oracle-java7-installer -y
echo "Installation completed...."
echo "Installed java version is...."
java -version
apt-get install openssh-server -y
/etc/init.d/ssh status
/etc/init.d/ssh start
mkdir -p ~/.ssh
ssh-keyscan -H localhost >> ~/.ssh/known_hosts
yes | ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh-add
cd /usr/local
sudo wget http://mirror.sdunix.com/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz
mkdir hadoop
mv hadoop-2.6.0/* hadoop/
echo "Now the script is updating .bashrc to export PATH and other variables"
cat >> ~/.bashrc << EOL
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export YARN_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export JAVA_HOME=/usr/
export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin:$JAVA_HOME/bin
EOL
cat ~/.bashrc
source ~/.bashrc
echo "Now the script is updating the Hadoop configuration files"
cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh << EOL
export JAVA_HOME=/usr/
EOL
cd /usr/local/hadoop/etc/hadoop
cat > core-site.xml << EOL
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOL
cp mapred-site.xml.template mapred-site.xml
cat > mapred-site.xml << EOL
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOL
cat > yarn-site.xml << EOL
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOL
cat > hdfs-site.xml << EOL
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
</configuration>
EOL
echo "Configuration complete...."
cd ~
echo "You may need to reload your bash profile; you can do so with the following command."
echo "source ~/.bashrc"
echo "Before starting Hadoop for the first time, you need to format the Name Node once, using the following command."
echo "hdfs namenode -format"
echo "Hadoop is configured. You can now start it using the following commands."
echo "start-dfs.sh"
echo "start-yarn.sh"
echo "To stop Hadoop, use the following scripts."
echo "stop-dfs.sh"
echo "stop-yarn.sh"
Now we will give the script permission to make it executable.
~$ chmod 755 install_hadoop.sh
Now we can execute the script with the following command. Since the installation requires root access, you must either log in as root or switch to root using the command "su -".
~$ ./install_hadoop.sh
After the script completes successfully, you can move on to formatting the Name Node and starting Hadoop.
If you hit a "command not recognized" issue, it usually means the bash profile has not been reloaded. The safe way is to reload it manually with the following command.
~$ source ~/.bashrc
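Once the profile is reloaded, a quick sanity check can confirm the environment is in place. This is a minimal sketch, assuming the install locations used by the script above:

```shell
# Show the variables the install script exported into ~/.bashrc:
echo "HADOOP_HOME=$HADOOP_HOME"
echo "JAVA_HOME=$JAVA_HOME"
# Only invoke the hadoop binary if it actually resolved on PATH:
if command -v hadoop >/dev/null 2>&1; then
    hadoop version | head -n 1
else
    echo "hadoop not found on PATH -- try: source ~/.bashrc"
fi
```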
Initializing the Single-Node Cluster
Formatting the Name Node:
When setting up the cluster for the first time, we need to format the Name Node in HDFS.
~$ cd ~
~$ hdfs namenode -format
Starting Hadoop dfs daemons:
~$ start-dfs.sh
Starting Yarn daemons:
~$ start-yarn.sh
Check all daemon processes:
~$ jps
6069 NodeManager
5644 DataNode
5827 SecondaryNameNode
4692 ResourceManager
6165 Jps
5491 NameNode
* The process IDs will differ on each run; the main idea is to check that all of these processes are running.
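With the daemons up, a short smoke test can confirm that HDFS actually accepts files. The /user/<name> path below is just an example location, not something the install script creates:

```shell
# Guard against a stale shell where hdfs is not yet on PATH:
if command -v hdfs >/dev/null 2>&1; then
    echo "hello hadoop" > /tmp/hello.txt
    hdfs dfs -mkdir -p /user/$(whoami)                  # create a home dir in HDFS
    hdfs dfs -put -f /tmp/hello.txt /user/$(whoami)/    # copy the local file in
    hdfs dfs -ls /user/$(whoami)                        # list it
    hdfs dfs -cat /user/$(whoami)/hello.txt             # read it back
else
    echo "hdfs not on PATH; reload your bash profile first" >&2
fi
```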
You should now be able to browse the name-node web UI in your browser (after a short start-up delay) at the following URL:
name-node: http://localhost:50070/
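If you prefer the shell to a browser, you can probe the same UI with curl (assuming curl is installed; 50070 is the default NameNode HTTP port in Hadoop 2.x):

```shell
# %{http_code} prints the HTTP status; it is 000 when the connection fails.
code=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:50070/ 2>/dev/null) || true
if [ "$code" = "200" ]; then
    echo "NameNode UI is up"
else
    echo "NameNode UI not reachable yet (HTTP code: ${code:-none})"
fi
```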
Stopping all daemons:
~$ stop-dfs.sh
~$ stop-yarn.sh
Enjoy.