Friday, 14 October 2016

Big Data & Retail VoIP

Big Data is a reality, and it is nothing new to telecom service providers, who are sitting on gold mines of data. Information collected about calls and customer experience can be used for analytics: forecasting traffic patterns, detecting fraud, and monitoring customer-experience metrics such as ASR (Answer-Seizure Ratio) and NER (Network Effectiveness Ratio).

Data is the raw material from which knowledge is created, and it is a common understanding that knowledge is power. For businesses, it is the power to make good decisions: the more a business knows about its customers and operations, the better the decisions it can make and the lower the chance of costly mistakes.

Retail VoIP companies generate a good amount of data daily; from every call a customer makes, the company can extract valuable information. To best exploit this ever-increasing amount of data, service providers need big data solutions that deliver the best possible insight and take business problem-solving to new dimensions.

Call detail records (CDRs) have been recorded for decades for billing purposes. Communication service providers that want to maximize their revenue potential must have the right solution in place to turn recorded data into actionable insight. Globally, most service providers struggle with real-time decision making: operational decisions are made manually or are partially hard-coded into operations support systems, so they tend to be subjective and suboptimal. The promise of data-driven decisions is widely recognized; to exploit its full potential, service providers need to explore what they can do with big data analytics and decipher the information to support decision making.
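As a toy illustration of CDR analytics, the answer-seizure ratio mentioned above (ASR = answered calls / total call attempts) can be computed from a flat CDR extract with standard Unix tools. The CDR format here is invented for the example; real CDR layouts vary by switch vendor.

```shell
# Hypothetical CDR extract: call_id,destination,disposition
cat > /tmp/cdrs.csv << 'EOF'
1,4471234,ANSWERED
2,4475678,NO ANSWER
3,92321111,ANSWERED
4,92321112,FAILED
5,4479999,ANSWERED
EOF

# ASR = answered calls / total attempts * 100
awk -F, '
  { total++ }
  $3 == "ANSWERED" { answered++ }
  END { printf "ASR: %.1f%%\n", (answered / total) * 100 }
' /tmp/cdrs.csv
# prints: ASR: 60.0%
```

At scale, the same aggregation would run as a distributed job over the full CDR store rather than a single awk pass.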

Telecommunications service providers across the globe are experiencing an unprecedented rise in the volume, variety and velocity of data. Those who successfully address this big data challenge will have a competitive edge, gaining market share and increasing revenue and profits through new, innovative services that help them achieve their business objectives.

Friday, 24 April 2015

Installing Single Node Hadoop 2.6 using Bash Script

Get Started

We are going to write a simple bash script and execute it to install Hadoop 2.6 with all its dependencies. The steps below install a stable version of Apache Hadoop from a bash script on a server running Ubuntu 14.04 x64, but they should work on all Debian-based systems. Write the content below to a file using your favorite text editor, give it permission to execute, and enjoy. I hope it works without error; if not, let me know the issue so I can update it. It works fine for me on two different machines, running Linux Mint 17 and Ubuntu 14.04 respectively.

What is Bash?

Descended from the Bourne Shell, Bash is a GNU product, the "Bourne Again SHell." It's the standard command line interface on most Linux machines. It excels at interactivity, supporting command line editing, completion, and recall. It also supports configurable prompts - most people realize this, but don't know how much can be done.
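One Bash feature the installation script below leans on heavily is the heredoc (`cat > file << EOL`), which writes a block of literal text into a file. A minimal sketch (the file path is just an example):

```shell
# Write a file using a heredoc; quoting the delimiter ('EOL')
# prevents variable expansion inside the body.
cat > /tmp/demo.txt << 'EOL'
export DEMO_HOME=/opt/demo
export PATH=$PATH:$DEMO_HOME/bin
EOL

# Both lines were written verbatim, with $PATH left unexpanded.
grep -c '^export' /tmp/demo.txt   # prints 2
```

With an unquoted delimiter (`<< EOL`), `$PATH` would be expanded at the moment the heredoc is written rather than when the target file is later sourced.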

Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:
  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
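MapReduce's split-map-shuffle-reduce flow can be loosely mimicked with a Unix pipeline. The classic word count looks like this; it is a single-machine analogy only, not how Hadoop actually executes jobs:

```shell
# map: emit one word per line; shuffle: sort groups identical keys;
# reduce: uniq -c counts each group; final sort ranks by frequency
echo "big data big insight data data" \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
# most frequent word first: 3 data, 2 big, 1 insight
```

Hadoop runs the same logical phases, but with the map and reduce steps distributed across the cluster and the intermediate data shuffled over the network.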

Write Script File

To write the installation script, open a new file (for example install-hadoop.sh) and put the following content in it.

#!/bin/bash
# Script to install Sun Java and Hadoop 2.6


# Tell the shell to run installations in non-interactive mode
# and auto-accept the license agreement for Sun Java
export DEBIAN_FRONTEND=noninteractive  
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections  
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections  

echo "Bash Script for Installing Sun Java for Ubuntu!"  

echo "Now Script will try to purge OpenJdk if installed..."  

# purge openjdk if installed to remove conflict  
apt-get purge openjdk-\* -y  

echo "Now we will update repository..."  

apt-get update -y  

echo "Adding Java Repository...."  

apt-get install python-software-properties -y  
add-apt-repository ppa:webupd8team/java -y  

echo "Updating Repository to load java repository"  

apt-get update -y  

echo "Installing Sun Java....."  
sudo -E apt-get purge oracle-java7-installer -y  
sudo -E apt-get install oracle-java7-installer -y  

echo "Installation completed...."  

echo "Installed java version is...."  

java -version  

apt-get install openssh-server -y  
/etc/init.d/ssh status  
/etc/init.d/ssh start  

mkdir -p ~/.ssh
ssh-keyscan -H localhost > ~/.ssh/known_hosts
# generate a passwordless key for localhost SSH; "yes" auto-confirms overwrite
yes | ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

cd /usr/local  
sudo wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz  
mkdir hadoop  
mv hadoop-2.6.0/* hadoop/  

echo "Now script is updating Bashrc for export Path etc"  

cat >> ~/.bashrc << 'EOL'
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export YARN_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export JAVA_HOME=/usr/
export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin:$JAVA_HOME/bin
EOL

cat ~/.bashrc

source ~/.bashrc

echo "Now script is updating hadoop configuration files"  

cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh << 'EOL'
export JAVA_HOME=/usr/
EOL

cd /usr/local/hadoop/etc/hadoop  

# minimal single-node (pseudo-distributed) configuration
cat > core-site.xml << 'EOL'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOL

cp mapred-site.xml.template mapred-site.xml
cat > mapred-site.xml << 'EOL'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOL

cat > yarn-site.xml << 'EOL'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOL

cat > hdfs-site.xml << 'EOL'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
  </property>
</configuration>
EOL

echo "Completed process Now Reloading Bash Profile...."  
cd ~  

echo "You may require reloading bash profile, you can reload using following command."  
echo "source ~/.bashrc"  

echo "Before starting Hadoop, you need to format the Name Node once, using the following command."  
echo "hdfs namenode -format"  

echo "Hadoop is configured. You can now start it using the following commands."  
echo "start-dfs.sh"  
echo "start-yarn.sh"  

echo "To stop Hadoop, use the following scripts."  
echo "stop-dfs.sh"  
echo "stop-yarn.sh"  

Now we will give it permission to make it executable.
 ~$ chmod 755 install-hadoop.sh

Now we can execute the script using the following command. Since installation requires root access, you need to log in as root or switch to root using the command "su -".
 ~$ ./install-hadoop.sh
After successful completion of the script, you can move forward to formatting the Name Node and starting Hadoop.

You may face a "command not recognized" issue, which means the bash profile was not reloaded. The safe way is to reload it manually using the following command.
 ~$ source ~/.bashrc
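Before sourcing, it can be worth confirming the script's exports actually landed in ~/.bashrc. A self-contained sketch of such a check, run here against a sample file so it works anywhere; point it at ~/.bashrc on a real install:

```shell
# Sample bashrc fragment standing in for the real ~/.bashrc
RC=/tmp/bashrc.sample
cat > "$RC" << 'EOL'
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/
EOL

# Report whether each expected variable is exported in the file
for var in HADOOP_HOME JAVA_HOME; do
  if grep -q "^export $var=" "$RC"; then
    echo "$var: OK"
  else
    echo "$var: MISSING"
  fi
done
```

If any variable reports MISSING, re-run the heredoc step of the script rather than exporting the variables by hand, so they survive future logins.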

Initializing the Single-Node Cluster

Formatting the Name Node:

While setting up the cluster for the first time, we need to initially format the Name Node in HDFS.

 ~$ cd ~  
 ~$ hdfs namenode -format  

Starting Hadoop dfs daemons:

 ~$ start-dfs.sh  

Starting Yarn daemons:

 ~$ start-yarn.sh  
Check all daemon processes:

 ~$ jps  

 6069 NodeManager  
 5644 DataNode  
 5827 SecondaryNameNode  
 4692 ResourceManager  
 6165 Jps  
 5491 NameNode  

* Process IDs will differ on each run; the main idea is to check that all of these processes are running fine.
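The eyeball check above can be scripted. A small sketch that takes jps-style output and confirms each expected daemon is present; it is fed the captured sample text here so the example is self-contained, but on a live node you would pass it `"$(jps)"`:

```shell
# Check jps-style output for the daemons a single-node cluster should run
check_daemons() {
  local output="$1" missing=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if ! echo "$output" | grep -qw "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all daemons running"
}

# Sample output captured from the jps run above
sample_output="6069 NodeManager
5644 DataNode
5827 SecondaryNameNode
4692 ResourceManager
5491 NameNode"

check_daemons "$sample_output"   # on a live node: check_daemons "$(jps)"
# prints: all daemons running
```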

You should now be able to browse the name-node web UI in your browser (after a short delay for start-up) at the following URL:

name-node: http://localhost:50070/

Stopping all daemons:

 ~$ stop-dfs.sh  
 ~$ stop-yarn.sh  