Monday, 3 March 2014

Running Your First Example on Hadoop Using Python


Overview

Even though the Hadoop framework is written in Java, we can use other languages such as Python and C++ to write MapReduce programs for Hadoop. However, Hadoop’s documentation suggests translating your Python code into a Java jar file using Jython. This is not very convenient, and it can even be problematic if you depend on Python features that Jython does not provide.

Example

We will write a simple WordCount MapReduce program in pure Python. The input is a set of text files, and the output is a file listing each word and its count. You can also use other languages, such as Perl.

Prerequisites

You should have a Hadoop cluster running. If you do not have a cluster ready yet, try this to start with a single-node cluster.

MapReduce

The Python code uses the Hadoop Streaming API to pass data between our Map and Reduce code through STDIN (sys.stdin) and STDOUT (sys.stdout). Each script reads its input from STDIN and prints its output to STDOUT.

mapper.py


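#!/usr/bin/env python
"""mapper.py: emit "word<TAB>1" for every word read from STDIN."""
import sys

for line in sys.stdin:
    # Remove leading/trailing whitespace and split the line into words
    words = line.strip().split()
    for word in words:
        # Tab-separated key/value pair; Hadoop sorts these by key
        print('%s\t%s' % (word, 1))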


reducer.py

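#!/usr/bin/env python
"""reducer.py: sum the counts for each word emitted by mapper.py.

Hadoop Streaming sorts the mapper output by key before it reaches
the reducer, so all lines for the same word arrive one after another.
"""
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    line = line.strip()
    # Parse the "word<TAB>count" pair produced by mapper.py
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip malformed lines (no tab, or a count that is not a number)
        continue

    if current_word == word:
        current_count += count
    else:
        # A new word has started: write out the total for the previous one
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Write out the total for the last word (only if there was any input)
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))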
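You can check the map, sort and reduce logic locally before you submit anything to the cluster. The real scripts can be tested with a shell pipeline such as `cat book.txt | ./mapper.py | sort | ./reducer.py`, where book.txt is any local text file. The sketch below does the same thing in pure Python.

Some notes on what it assumes:
- The function names `map_words`, `reduce_counts` and `run_local` are hypothetical.
- The `sorted()` call only approximates the shuffle/sort phase that Hadoop performs between the map and reduce tasks.

```python
from itertools import groupby


def map_words(lines):
    """Mirror of mapper.py: yield a (word, 1) pair for every word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Mirror of reducer.py: sum the counts of consecutive equal words.

    Like reducer.py, this relies on the pairs already being sorted by word.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Map, sort by word (standing in for Hadoop's shuffle/sort), then reduce."""
    mapped = sorted(map_words(lines), key=lambda pair: pair[0])
    return list(reduce_counts(mapped))
```

For example, `run_local(["foo foo quux", "labs foo bar quux"])` returns `[('bar', 1), ('foo', 3), ('labs', 1), ('quux', 2)]`.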


Running the Hadoop Job

Download the example data to a local directory such as /home/elite/Downloads/examples/:
Book1
Book2
Book3



Start Cluster

$ bin/start-all.sh
Copy the data from the local file system to HDFS:
$ bin/hadoop dfs -copyFromLocal /home/elite/Downloads/examples/ /home/hdpuser/wordscount/

Check the files on HDFS:
$ bin/hadoop dfs -ls /home/hdpuser/wordscount

Run MapReduce Job

Both mapper.py and reducer.py are in /home/hdpuser/. Hadoop Streaming runs them as commands, so make them executable first:
$ chmod +x /home/hdpuser/mapper.py /home/hdpuser/reducer.py
Then run the job with this command:
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
-file /home/hdpuser/mapper.py -mapper /home/hdpuser/mapper.py \
-file /home/hdpuser/reducer.py -reducer /home/hdpuser/reducer.py \
-input /home/hdpuser/wordscount/* -output /home/hdpuser/wordscount.out

You can check the job status in the terminal or on the JobTracker web page at http://localhost:50030/ (the port configured in your cluster setup). After the job completes, get the results back by copying the output directory from HDFS to the local file system:

$ bin/hadoop dfs -copyToLocal /home/hdpuser/wordscount.out /home/hdpuser/

Check Result

$ vi /home/hdpuser/wordscount.out/part-00000
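Instead of paging through the file in vi, a short script can list the most frequent words. This is a minimal sketch. It assumes the output format written by reducer.py, which is one `word<TAB>count` pair per line. The function name `top_words` is hypothetical.

```python
import heapq


def top_words(lines, n=10):
    """Return the n (word, count) pairs with the highest counts.

    `lines` is any iterable of "word<TAB>count" strings, for example an
    open part-00000 file. Lines that do not parse are skipped.
    """
    pairs = []
    for line in lines:
        try:
            word, count = line.rstrip('\n').split('\t', 1)
            pairs.append((word, int(count)))
        except ValueError:
            continue
    # Highest counts first
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])
```

To use it, pass the opened /home/hdpuser/wordscount.out/part-00000 file as `lines`.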

Stop the running cluster

$ bin/stop-all.sh

Sunday, 2 March 2014

Installing Hadoop Single Node

Get Started

Now we will look at how to install a stable version of Apache Hadoop on a laptop running Linux Mint 15. The steps should also work on all Debian-based systems, including Ubuntu.

To start, we need the Hadoop package and a Java installation. If Java is not already installed, follow my install Java post. To check which Java versions Hadoop supports, see Hadoop Java Versions. Next, download Hadoop from the Hadoop web page. In this post we use hadoop-1.2.1.

Create Dedicated Hadoop User

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hdpuser

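Give the user sudo rights. Use visudo to edit the sudoers file: it checks the syntax before saving, and a broken sudoers file can lock you out of sudo.

$ sudo visudo
Add this line to the end of the file:
hdpuser ALL=(ALL:ALL) ALL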

Configuring Secure Shell (SSH)   

Communication between the master and slave nodes uses SSH, so we need to make sure an SSH server is installed and the SSH daemon is running.

Install the server with this command:

$ sudo apt-get install openssh-server

You can check the status of the server with this command:

$ /etc/init.d/ssh status

To start the SSH server, use:

$ /etc/init.d/ssh start

Now that the SSH server is running, we need to set up a local SSH connection that works without a password. To enable passphraseless SSH, use:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

To test the SSH connection:

$ ssh localhost
$ exit

Disabling IPv6

We need to make sure IPv6 is disabled, because all Hadoop communication between nodes is IPv4-based.

To do this, first open the file /etc/sysctl.conf:

$ sudo nano /etc/sysctl.conf
Add the following lines to the end:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save and exit

Reload sysctl for the changes to take effect:

$ sudo sysctl -p /etc/sysctl.conf

If the following command returns 1 (after reboot), it means IPv6 is disabled.

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
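The same check can be scripted, for example as part of a setup script. This is a small sketch. The helper name `ipv6_disabled` is hypothetical, and the default path is the same file the `cat` command above reads.

```python
from pathlib import Path

# Same file that the `cat` command above reads
DISABLE_IPV6_FLAG = '/proc/sys/net/ipv6/conf/all/disable_ipv6'


def ipv6_disabled(flag_path=DISABLE_IPV6_FLAG):
    """Return True if the flag file contains 1 (IPv6 disabled), else False.

    Raises FileNotFoundError if the file does not exist, e.g. on a
    kernel built without IPv6 support.
    """
    return Path(flag_path).read_text().strip() == '1'
```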

Install Hadoop

Download Version 1.2.1 (Stable Version)

Create the Hadoop installation directory:

$ sudo mkdir -p /usr/hadoop

Copy the Hadoop archive to the installation directory:

$ sudo cp -r ~/Downloads/hadoop-1.2.1.tar.gz /usr/hadoop

Extract the Hadoop archive:

$ cd /usr/hadoop
$ sudo tar xvzf hadoop-1.2.1.tar.gz

Rename it to hadoop:

$ sudo mv hadoop-1.2.1 hadoop

Change the owner of this folder to hdpuser:

$ sudo chown -R hdpuser:hadoop hadoop

Update .bashrc with Hadoop-related environment variables

$ sudo nano ~/.bashrc
Add the following lines at the end:
# Set HADOOP_HOME
export HADOOP_HOME=/usr/hadoop/hadoop
# Set JAVA_HOME
# Important: if you installed Java with apt-get,
# use /usr instead of /usr/local/java/jdk1.7.0_51
export JAVA_HOME=/usr/local/java/jdk1.7.0_51
# Add Hadoop bin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Save & Exit

Reload bashrc

$ source ~/.bashrc
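After reloading .bashrc, you can confirm that both variables point at real installations. This is a small sketch. The helper name `check_hadoop_env` is hypothetical. It only checks that $JAVA_HOME/bin/java and $HADOOP_HOME/bin/hadoop exist as files; it does not run them.

```python
import os


def check_hadoop_env(environ=None):
    """Return a list of problems with JAVA_HOME / HADOOP_HOME (empty if OK).

    For each variable, checks that it is set and that the expected
    launcher file exists underneath it.
    """
    environ = os.environ if environ is None else environ
    expected = {
        'JAVA_HOME': os.path.join('bin', 'java'),
        'HADOOP_HOME': os.path.join('bin', 'hadoop'),
    }
    problems = []
    for var, relative in expected.items():
        home = environ.get(var)
        if not home:
            problems.append('%s is not set' % var)
        elif not os.path.isfile(os.path.join(home, relative)):
            problems.append('%s: %s not found' % (var, os.path.join(home, relative)))
    return problems
```

If both variables are set correctly, running `print(check_hadoop_env())` in a new shell should print an empty list.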


Update JAVA_HOME in hadoop-env.sh

$ cd /usr/hadoop/hadoop
$ sudo nano conf/hadoop-env.sh

Add the line:
export JAVA_HOME=/usr/local/java/jdk1.7.0_51

Save and exit

Create a Directory to hold Hadoop’s Temporary Files:

$ sudo mkdir -p /usr/hadoop/tmp

Give hdpuser ownership of this directory:

$ sudo chown hdpuser:hadoop /usr/hadoop/tmp


Hadoop Configurations

Modify conf/core-site.xml – Core Configuration

$ sudo nano conf/core-site.xml

Add the following lines between the <configuration> tags:
<property>
   <name>hadoop.tmp.dir</name>
   <value>/usr/hadoop/tmp</value>
   <description>Hadoop's temporary directory</description>
</property>
<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:54310</value>
   <description>Specifying HDFS as the default file system.</description>
</property>

Modify conf/mapred-site.xml – MapReduce configuration

$ sudo nano conf/mapred-site.xml

Add the following lines between the <configuration> tags:
<property>
   <name>mapred.job.tracker</name>
   <value>localhost:54311</value>
   <description>The URI is used to monitor the status of MapReduce tasks</description>
</property>

Modify conf/hdfs-site.xml – File Replication

$ sudo nano conf/hdfs-site.xml

Add the following lines between the <configuration> tags:
<property>
   <name>dfs.replication</name>
   <value>1</value>
   <description>Default block replication.</description>
</property>
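All three files share the same `<property>` layout, so a short script can read any of them back and double-check the values. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `read_hadoop_conf` is hypothetical. It assumes each file has the `<configuration>` root element that the Hadoop 1.x conf templates contain.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(source):
    """Parse a Hadoop *-site.xml file into a {name: value} dict.

    `source` is a filename or a file object. Each <property> element
    contributes its <name> and <value> text; <description> is ignored.
    """
    root = ET.parse(source).getroot()
    conf = {}
    for prop in root.iter('property'):
        name = prop.findtext('name')
        if name is not None:
            conf[name.strip()] = (prop.findtext('value') or '').strip()
    return conf
```

For example, `read_hadoop_conf('conf/core-site.xml')['fs.default.name']` should give `hdfs://localhost:54310` with the settings above.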

Initializing the Single-Node Cluster


Formatting the NameNode:

When setting up the cluster for the first time, we need to format the NameNode in HDFS:
$ bin/hadoop namenode -format

Starting all daemons:

$ bin/start-all.sh

After a short delay for startup, you should be able to open the NameNode and JobTracker web interfaces in your browser at the following URLs:

NameNode: http://localhost:50070/
JobTracker: http://localhost:50030/
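If a page does not load, try a TCP connection to each port to see which daemon is not listening. This is a minimal sketch. The names `port_open` and `HADOOP_UIS` are hypothetical, and the ports are the Hadoop 1.x defaults listed above. A successful connection only shows that something is listening on the port, not that the daemon is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False


# Default web UI ports used in this post (Hadoop 1.x)
HADOOP_UIS = {'NameNode': 50070, 'JobTracker': 50030}
```

Usage: `for name, port in HADOOP_UIS.items(): print(name, port_open('localhost', port))`.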

Stopping all daemons:

$ bin/stop-all.sh

You can also start and stop the HDFS and MapReduce daemons separately:

HDFS:

$ bin/start-dfs.sh
$ bin/stop-dfs.sh

MapReduce:

$ bin/start-mapred.sh
$ bin/stop-mapred.sh


Now you can run the Java Word Count example. If you want examples that don't require you to change your coding style, my next post runs a Python MapReduce job.

Saturday, 1 March 2014

Installing Sun-Java JDK 7

Install Java JDK

OS: Linux Mint 15. This should also work on all Debian-based systems, including Ubuntu.

Easy Way:

The simplest way to install the JDK is from an apt-get repository (PPA), but note that the PPA sometimes becomes outdated.
This installs JDK 7, which includes the Java JDK, the JRE and the Java browser plugin.

Remove any installed version of OpenJDK:
$ sudo apt-get purge openjdk-\*

Add the PPA and update the apt-get repository:

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update

Install it:
$ sudo apt-get install oracle-java7-installer

Check the version and process ID
Check your Java version to verify the installation and settings:
$ java -version
Verify that jps (the JVM Process Status tool) runs:
$ jps

Manual way:

1) Remove any previous OpenJDK installations
$ sudo apt-get purge openjdk-\*

2) Make a directory to hold Sun Java
$ sudo mkdir -p /usr/local/java

3) Download Oracle (Sun) Java (JDK/JRE) from Oracle’s website:

JDK Download and JRE Download. Downloaded files are normally placed in the /home/<your_user_name>/Downloads folder.

4) Copy the downloaded files to the Java directory
$ cd /home/<your_user_name>/Downloads
$ sudo cp -r jdk-7u51-linux-x64.tar.gz /usr/local/java
$ sudo cp -r jre-7u51-linux-x64.tar.gz /usr/local/java

5) Unpack the compressed binaries
$ cd /usr/local/java
$ sudo tar xvzf jdk-7u51-linux-x64.tar.gz
$ sudo tar xvzf jre-7u51-linux-x64.tar.gz

6) Cross-check the extracted binaries:
$ ls -a
The following two folders should be created: jdk1.7.0_51 and jre1.7.0_51

7) To add the JDK/JRE paths to the system PATH (set in /etc/profile), first open the file:
$ sudo nano /etc/profile

and add the following lines at the end:
JAVA_HOME=/usr/local/java/jdk1.7.0_51
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_51
PATH=$PATH:$HOME/bin:$JRE_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH

Save and exit (CTRL+O then Enter, then press CTRL+X then Enter)

8) Tell the OS where Oracle Java is located, so that it is ready for use:

JDK is available:
$ sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_51/bin/javac" 1

JRE is available:
$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_51/bin/java" 1

Java Web Start is available:
$ sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_51/bin/javaws" 1

9) Make Oracle Sun JDK/JRE the default on your system:
Set JRE:
$ sudo update-alternatives --set java /usr/local/java/jre1.7.0_51/bin/java

Set javac Compiler:
$ sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_51/bin/javac

Set Java Web Start:
$ sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_51/bin/javaws

10) Re-load the /etc/profile
$ source /etc/profile

11) Check your Java version to ensure installations and settings:
$ java -version

12) Verify that jps (the JVM Process Status tool) runs:
$ jps

This shows the process ID of the jps process.








Big Data and Analytics - Hadoop

What is Big Data?

Big data is a buzzword for a massive volume of structured or unstructured data. The data is so large and complex that it is impractical to manage with traditional software tools. Enterprises now have data that is too large, or moves too fast, for their current data processing capacity. Examples are petabytes or exabytes of data, or billions to trillions of records. But big data is not only about data that is too large:

"Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infra-structure to address efficiently. Said differently, the volume, velocity or variety of data is too great." - MongoDB

Today's technologies have made it possible to evaluate big data and realize value from it. For example, retailers can track users' web clicks to identify behavioral trends and improve their campaigns. Big data covers data creation, storage, retrieval and analysis at a scale that is remarkable in terms of volume, velocity and variety:

Volume: ordinary computers have from 250 gigabytes to 1 terabyte of storage, while today Facebook ingests 500 terabytes of new data every day.

Velocity: capturing ad impressions or user web clicks can require handling millions of events per second.

Variety: big data is not only about numbers, dates and strings. It also includes geospatial data, 3D data, audio, video, etc.

Big Data Analytics?

Big data analytics refers to the process of collecting, organizing and analyzing large sets of data to discover patterns and other useful information. It helps you understand the information within the data. It also helps identify the data that matters most to the business and to future business decisions. Big data analysts basically want the knowledge that comes from analyzing the data.

Hadoop?

Hadoop is a software technology designed to store and process large volumes of data using a cluster of commodity servers and storage. It is an open-source Apache project that originated in 2005 and was developed largely at Yahoo. It consists of a distributed file system called HDFS and a data processing and execution model called MapReduce. The next post covers how to install and configure it, after which you can practice MapReduce.

Tuesday, 1 May 2012

Reset Password for CMS Made Simple

How to Reset Admin Password for CMSMS?


For the last few hours I had been trying to recover the admin password for a client's admin panel. I searched a lot on the net but had no luck. Since I had database access, I followed a suggestion from one post: replace the MD5-hashed password in the cms_users table. I generated the MD5 hash with the MD5 creator at http://md5encryption.com/


That didn't solve my issue, even though many forums suggest it, so I dug a little deeper.

Here is the solution:



After replacing the password in cms_users with the new MD5 hash created at http://md5encryption.com/, you have to make a few changes to the classes found in lib\classes\


class.user.inc.php



Change:


function SetPassword($password)
    {
        $this->password = md5(get_site_preference('sitemask','').$password);
    }



To:


function SetPassword($password)
    {
        $this->password = md5($password);

    }

class.useroperations.inc.php


Change:

if ($password != '')
        {
          $where[] = 'password = ?';
          $params[] = md5(get_site_preference('sitemask','').$password);
        }


To:

if ($password != '')
        {
          $where[] = 'password = ?';
          $params[] = md5($password);
        }


CMSMS actually prepends an extra value, the site mask (the sitemask site preference), to the password before hashing it. The changes above make it use a plain MD5 of the password instead.
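If you would rather not edit the core classes, there is a possible alternative: compute the hash that the unmodified code already expects. According to the SetPassword() code above, that hash is the MD5 of the site mask followed by the password, as a lowercase hex string (the same format PHP's md5() returns).

This sketch makes two assumptions:
- You can read the `sitemask` value from the site preferences in the database.
- The function name `cmsms_password_hash` is hypothetical.

```python
import hashlib


def cmsms_password_hash(password, sitemask=''):
    """Mirror of the original SetPassword(): md5(sitemask . password).

    Returns a lowercase hex digest, the same format PHP's md5() returns.
    """
    return hashlib.md5((sitemask + password).encode('utf-8')).hexdigest()
```

Store the returned value in the password column of cms_users for the admin user.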


Now you can log in to the admin site with the new password: you have recovered the CMSMS admin password.







Saturday, 19 November 2011

SQL Server 2005 Management Studio 29506 Error

This error is caused by permissions. Setup requires administrator rights, while a normal installation runs with some permission restrictions. To overcome the issue, you need to install Management Studio with administrator permissions.

To do this, follow these steps:
On Windows 7 64-bit you have to use the separate CMD prompt in C:\Windows\SysWOW64 (the 32-bit cmd.exe on 64-bit Windows; I didn't even know there was a separate version) and run it as administrator. So I did the following:
1. Right click on desktop and click NEW – SHORTCUT
2. Create shortcut to C:\Windows\SysWOW64\cmd.exe
3. Right click on the new shortcut and Run AS ADMINISTRATOR
4. Enter full path and file name:
e.g. C:\PathToYourSetupFolder\SQLServer2005_SSMSEE_x64.msi

Monday, 28 March 2011

Trip To Rani Kot Wall


The Great Wall of Sindh, also known as Deware Sindh in the Sindhi language, is the world's largest fort, with a circumference of about 26 km (16 miles). Since 1993, it has been on the list of tentative UNESCO World Heritage Sites.

Location

It is located in the Kirthar Range, about 30 km southwest of Sann, in Jamshoro District, Sindh, Pakistan. It is approximately 90 km north of Hyderabad and 245 km from Karachi. To see how we verified the route and planned this trip, check this map.





Dimensions

It has an approximate diameter of 6 km. Its walls are on the average 6 meters high and are made of gypsum and lime cut sandstone and total circumference is about 20 km. While originally constructed for bow and arrow warfare it was later expanded to withstand firearms.

It is reputed to be the largest unexplored fort in the world. The purpose of its construction and the reason for the choice of its location are still unknown.

Ranikot is the most talismanic wonder of Sindh. Visible from five kilometers away, its massive undulating walls twist and dip over the hills. The circumference is about twenty kilometers. The walls are built with dressed sandstone and reinforced with 45 bastions along the outer wall; 7 of the bastions are rectangular and the rest are round. All of them were modified through the ages to accommodate the use of gunpowder. This perhaps makes it the largest fort in the world.


Our Trip
Our trip was exciting. After seeing a documentary about the fort on GEO TV, my friends and I decided to go on this incredible adventure the very next Sunday, because we could not wait. We planned and made the trip from Karachi as described below, which might help you plan yours.


7 AM: loaded all our equipment and got ready to go

Baleno filled up with CNG, petrol and whatever else you need


*Remember: there is no water there, so carry enough

*Remember: there are no canteens, so arrange food before you go

*Remember: keep your tank full and fill up with CNG at Jamshoro; after that there are no CNG stations, only petrol

*Remember: don't forget to charge your digital camera


Breakfast        -> 07:30 AM
Depart Karachi   -> 08:00 AM
Arrived Rani Kot -> 11:00 AM
Forced to depart -> 04:30 PM
Arrived back     -> 07:15 PM


History

The original purpose and architects of Ranikot Fort are unknown. Some archaeologists attribute it to the Arabs. It was possibly built under the Abbasids by the Persian noble Imran Bin Musa Barmaki, who was the Governor of Sindh in 836 CE. Others have suggested a much earlier period of construction, attributing it at times to the Sassanian Persians and at times to the Greeks. Although the prehistoric site of Amri is nearby, there is no trace of any old city inside the fort. The present structure also shows little evidence of prehistoric origins.

Archaeologists point to the 17th century CE as the time of its first construction. Sindh archaeologists now agree that some of the present structure was reconstructed by Mir Karam Ali Khan Talpur and his brother Mir Murad Ali in 1812 CE, at a cost of 1.2 million rupees (Sindh Gazetteer, 677).

Fort Ranikot is located in the Lakki Mountains of the Kirthar range, to the west of the mighty River Indus, about 30 kilometers from the present-day town of Sann. A mountainous ridge, Karo Takkar (Black Hill), runs north to south and forms its western boundary, and the 'Lundi Hills' form its eastern boundary.

Mohan Nai, a rain-stream, enters the fort through its rarely used western 'Mohan Gate', which is guarded by a small fortification. Inside, the stream changes its name to 'Reni' or 'Rani Nai' (rain-stream) and gives the fort its name. Ranikot is thus the 'fort of a rain stream' - Rani. The stream runs through the fort, tumbles down a series of turquoise pools to irrigate fields, and leaves through the most used gate, the 'Sann Gate' on the eastern side. It then travels about 33 kilometers more to enter the Lion River - Indus.

Most of the twenty-six-kilometer-long wall is made of natural cliffs and mountains, which in places rise as high as two thousand feet above sea level! Only about 8.25 km of the wall is man-made, built with yellow sandstone. This was first measured on foot by Badar Abro along with local guide Sadiq Gabol. Inside Ranikot you can find all of the following, which add to its beauty and mystery:
- hills, valleys, streams, ditches, ponds and pools
- fossils
- building structures, bastions, watchtowers, ammunition depots and fortresses

A spring emerging from an underground water source near the Mohan Gate is named 'Parryen jo Tarr' (the spring of fairies).

According to a tale told by the local inhabitants, fairies come from far and wide on Ponam nights (full moon) to bathe at this spring near 'Karo Jabal'! The splashing sound of water falling on the rocks can be heard at another spring, Waggun jo Tarr or "the Crocodile Spring", so named because crocodiles once lived there.

Within Ranikot there are two more fortresses, Meeri and Shergarh, with 5 bastions each. Meerikot takes its name from the word 'Mir', meaning top (for instance the top of a hill, or the chief of a Baloch tribe). M.H. Panhwar, a Sindhologist, disagrees that the name is related to the Mirs of Sindh. "Of two forts inside the main Rani Kot fort, the lower one is called Miri and is a word used in Seistan for small fortress. It has nothing to do with Mirs of Sindh," he writes.

Both the main Ranikot and the inner Meerikot have similar entrances: curved and angled, with a safe, tortuous path. From a military point of view, Meerikot sits in a very safe and central place in the heart of Ranikot. It also has residential arrangements, including a water well.

The Talpur Mirs used Meerikot as their fortified residence. Inside it you can explore the ruins of the court, harem, guest rooms and soldiers' quarters. Its 1435-foot-long wall has five bastions. Every structure in Ranikot has its own uniqueness and beauty.

Looking up from Meerikot, you can see another fortified citadel, Shergarh (Abode of Lions). It is built with whitish stone and also has five bastions. Its location at 1480 feet above sea level makes this fortress unique. It also makes supplying it with water difficult, since water can only be had from the brooks and rain streams hundreds of feet below. The steep climb up to Shergarh gives a commanding view over the whole fort and its entrance and exit points. On a clear day you can even see the Indus, 37 kilometers away to the east.

Besides the Mohan Gate and the Sann Gate, there are two more gates, or rather pseudo-gates.

The first, towards the ancient town of Amri, is called the 'Amri Gate'. It certainly takes its name from the prehistoric ruins of Amri. However, it must have acquired this name much later than the time of Amri, because the fort itself does not appear to be as old as Amri. In fact, this 'gate' is a breach in the fort wall made by the rain stream 'Toming Dhoro' as it exits the fort, and a bridge called 'Budhi Mori' crosses the stream there.

Similarly, the Shahpir Gate to the south also appears to be a pseudo-gate. It takes its name from a limestone rock imprinted with the rough shape of a foot. The sacred footprint supposedly belongs to Hazrat Ali or some other religious personality, and locals venerate it. It seems to be a later breach in the fort wall rather than a formal gate. There is no bastion or watchtower at the site, nor any remains of one, and a formal entrance or exit would have needed one to guard it.

A mosque in the fort appears to be either a later modification of a watchtower or a later construction. Scattered animal skeletons and prehistoric fossils can be found on top of the Lundi Hills.

There are three graveyards:
- The first has about four hundred graves made of Chowkundi-like sandstone, with engraved motifs of sunflowers and peacocks. Whether we can call these theriomorphic and phytomorphic motifs is an open question.
- The second appears to be a graveyard of Arabs.
- The third, about a mile from the Sann Gate, once had sixteen or seventeen graves, but only four remain now. The local inhabitants call it the Romans' graveyard.


Research

"The size of Ranikot defies all reasons. It stands in the middle of nowhere, defending nothing," writes Isobel Shaw. So why was this fort built here, in the desolate terrain of the Kirthar range? Many theories have been developed to answer this question:
- According to Ishtiaq Ansari, the Talpurs had sent their families to Thar and Kachchh when the Afsharids attacked Sindh in the time of the Kalhoras. After gaining the rule of Sindh, they wanted a safe and secure place to send their families during troubled times, and this might have prompted them to rebuild the fort to suit their needs.
- Rahimdad Khan Molai Sheedai holds the view that its location in Kohistan, on the western frontier of Sindh, gave it its strategic value.
- Mazher Ansari thinks it was first constructed under the Achaemenid Dynasty of the Persian Empire (550 - 330 BC). That empire stretched from Turkey in the west to the River Indus in Sindh in the east, where this majestic fort is located. A similar wall, the 155 km long Great Wall of Gorgan, was built near the Caspian Sea.

Access to this man-made marvel of ancient times is possible via a metalled road, which goes up to Meeri Kot.


For more adventures, visit my blog http://mypakadventures.blogspot.com/