Get Started
Now we will look at how to install a stable version of Apache Hadoop on a server running Ubuntu 14 x64; the steps should work on all Debian-based systems. To start we need to acquire the Hadoop package and have Java installed. If Java is not already installed, follow my install Java post; to check which versions of Java are supported by Hadoop, see Hadoop Java Versions.
Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Apache Hadoop 2.6 Installation
Configuring Secure Shell (SSH)
Communication between master and slave nodes uses SSH, so we need to make sure an SSH server is installed and the SSH daemon is running.
Install the server with the following command:
~$ sudo apt-get install openssh-server
You can check the status of the server with this command:
~$ /etc/init.d/ssh status
To start the SSH server use:
~$ /etc/init.d/ssh start
Now that the SSH server is running, we need to set up a local SSH connection without a password. To enable passphraseless SSH use:
~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Or, using RSA:
~$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
To check SSH:
~$ ssh localhost
~$ exit
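Optionally, you can confirm that key-based login works without a password prompt; with -o BatchMode=yes, ssh fails instead of asking for one:
~$ ssh -o BatchMode=yes localhost echo "passwordless ssh works"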
Disabling IPv6
We need to make sure IPv6 is disabled; it is best to disable IPv6 because all Hadoop communication between nodes is IPv4-based. For this, first open the file /etc/sysctl.conf:
~$ sudo nano /etc/sysctl.conf
Add the following lines at the end:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save and exit
Reload sysctl for changes to take effect
~$ sudo sysctl -p /etc/sysctl.conf
If the following command returns 1 (after reboot), it means IPv6 is disabled.
~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Install Hadoop
Download version 2.6.0 (stable version):
~$ su -
~$ cd /usr/local
~$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
~$ tar xzf hadoop-2.6.0.tar.gz
~$ mkdir hadoop
~$ mv hadoop-2.6.0/* hadoop/
~$ exit
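If you plan to run Hadoop as your regular (non-root) user rather than as root, you will likely want that user to own the install directory; a possible command, with the user and group adjusted to your own account:
~$ sudo chown -R $(whoami):$(whoami) /usr/local/hadoop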
Update .bashrc with Hadoop-related environment variables
~$ sudo nano ~/.bashrc
Add the following lines at the end:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export JAVA_HOME=/usr/
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin
Save & Exit
Reload bashrc
~$ source ~/.bashrc
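As a quick sanity check that the new variables are picked up (assuming Java is already installed and the paths above match your layout), print HADOOP_HOME and ask Hadoop for its version:
~$ echo $HADOOP_HOME
~$ hadoop version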
Update JAVA_HOME in hadoop-env.sh
~$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add the following line at the end:
export JAVA_HOME=/usr/
Or, if Java was installed manually, double-check your installed version of Java and update the path accordingly; I have assumed 1.7.0_51:
export JAVA_HOME=/usr/local/java/jdk1.7.0_51
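If you are not sure where your JDK is installed, one way to locate it (assuming java is already on your PATH) is to resolve the symlink behind the java binary and strip the trailing /bin/java from the output:
~$ readlink -f $(which java)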
Save and exit
Hadoop Configurations
Now we will update the configuration files for the Hadoop installation:
~$ cd /usr/local/hadoop/etc/hadoop
Modify core-site.xml – Core Configuration
The core-site.xml file contains settings such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing data, and the size of the read/write buffers.
Open the core-site.xml and add the following properties in between the <configuration> and </configuration> tags.
~$ sudo nano core-site.xml
Add the following lines between configuration tags
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Your file will look like:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Modify mapred-site.xml – MapReduce configuration
This file is used to specify which MapReduce framework we are using. By default, Hadoop ships only a template, so we need to copy mapred-site.xml.template to mapred-site.xml:
~$ sudo cp mapred-site.xml.template mapred-site.xml
~$ sudo nano mapred-site.xml
Add the following lines between configuration tags.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
your file should look like:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
* Note: you may have other configurations defined later; here we assume a fresh install.
Modify yarn-site.xml – YARN
This file is used to configure YARN within Hadoop.
~$ sudo nano yarn-site.xml
Add the following lines between the configuration tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
your file should look like:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Modify hdfs-site.xml – File Replication
This file contains settings such as the replication factor (we have used 1), the name-node path, and the data-node path on your local file system; these paths are where Hadoop will store its data.
~$ sudo nano hdfs-site.xml
Add the following lines between the configuration tags and check the file paths:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
your file should look like:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
</configuration>
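The name-node and data-node directories referenced above do not exist yet. A minimal way to create them before formatting the name node (assuming you keep the paths above and run the daemons as your current user; adjust the paths and owner otherwise):
~$ sudo mkdir -p /home/hadoop/hadoopinfra/hdfs/namenode /home/hadoop/hadoopinfra/hdfs/datanode
~$ sudo chown -R $(whoami):$(whoami) /home/hadoop/hadoopinfra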
Initializing the Single-Node Cluster
Formatting the Name Node:
When setting up the cluster for the first time, we need to format the Name Node in HDFS.
~$ cd ~
~$ hdfs namenode -format
Starting Hadoop dfs daemons:
~$ start-dfs.sh
Starting Yarn daemons:
~$ start-yarn.sh
Check all daemon processes:
~$ jps
6069 NodeManager
5644 DataNode
5827 SecondaryNameNode
4692 ResourceManager
6165 Jps
5491 NameNode
* Process IDs will differ on each run; the main point is to check that all of these processes are running.
You should now be able to view the name-node web interface in your browser (after a short delay for start-up) at the following URL:
name-node: http://localhost:50070/
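As an optional smoke test while the daemons are running, you can list the HDFS root and run one of the example jobs bundled with the distribution (the jar path below assumes the 2.6.0 install laid out above):
~$ hdfs dfs -ls /
~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5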
Stopping all daemons:
~$ stop-dfs.sh
~$ stop-yarn.sh
Now you can run examples. If you are looking for examples to run without changing your style of code, I am going to cover Python MapReduce on the new version of Hadoop in an upcoming post.