Cutting Edge: compile monetdb using ubuntu


Tuesday, 21 April 2015

Installing Hadoop Single Node - 2.6

Get Started

This post shows how to install a stable release of Apache Hadoop on a server running Ubuntu Linux 14 x64; the same steps should work on any Debian-based system. To start, we need the Hadoop package and a Java installation. If Java is not already installed, follow my install Java post. To check which Java versions Hadoop supports, see Hadoop Java Versions.

Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Apache Hadoop 2.6 Installation

Configuring Secure Shell (SSH)

Communication between master and slave nodes uses SSH, so we need to make sure an SSH server is installed and the SSH daemon is running.

Install the server with the following command:

 ~$ sudo apt-get install openssh-server

You can check the status of the server with this command:

 ~$ /etc/init.d/ssh status

To start the SSH server (this needs root privileges), use:

 ~$ sudo /etc/init.d/ssh start

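Now that the SSH server is running, we need to set up a local SSH connection that works without a password. To enable passphraseless SSH, generate a key pair and append the public key to authorized_keys. Either key type below is enough; newer OpenSSH releases (7.0 and later) disable DSA keys by default, so RSA is the safer choice.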

 ~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 ~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

 ~$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
 ~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

To check the SSH connection:

 ~$ ssh localhost  
 ~$ exit

Disabling IPv6

We need to make sure IPv6 is disabled. It is best to disable it because all Hadoop communication between nodes is IPv4-based.

To do this, first open the file /etc/sysctl.conf:

 ~$ sudo nano /etc/sysctl.conf

Add the following lines to the end:

 net.ipv6.conf.all.disable_ipv6 = 1  
 net.ipv6.conf.default.disable_ipv6 = 1  
 net.ipv6.conf.lo.disable_ipv6 = 1

Save and exit

Reload sysctl for changes to take effect

 ~$ sudo sysctl -p /etc/sysctl.conf

If the following command returns 1 (after reboot), it means IPv6 is disabled.

 ~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
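
The same check can be scripted. The following is a minimal sketch, assuming Python 3 is available on the machine; the function name ipv6_disabled is made up for illustration, and the path is the one read with cat above.

```python
from pathlib import Path

# Same file the post reads with `cat`; kept as a parameter so it can be overridden.
DEFAULT_FLAG = "/proc/sys/net/ipv6/conf/all/disable_ipv6"


def ipv6_disabled(flag_path=DEFAULT_FLAG):
    """Return True if the flag file contains 1, False otherwise, None if unreadable."""
    try:
        value = Path(flag_path).read_text().strip()
    except OSError:
        return None  # the file is missing or cannot be read on this system
    return value == "1"


print("IPv6 disabled:", ipv6_disabled())
```

A result of None means the flag file could not be read at all, which is worth telling apart from a plain "not disabled".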

Install Hadoop

Download Version 2.6.0 (Stable Version)

 ~$ su -
 ~$ cd /usr/local
 ~$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
 ~$ tar xzf hadoop-2.6.0.tar.gz
 ~$ mkdir hadoop
 ~$ mv hadoop-2.6.0/* hadoop/
 ~$ exit

Update .bashrc with the Hadoop-related environment variables. The file belongs to your own user, so sudo is not needed:

 ~$ nano ~/.bashrc

Add the following lines at the end:

export HADOOP_HOME=/usr/local/hadoop  
export HADOOP_MAPRED_HOME=$HADOOP_HOME  
export HADOOP_COMMON_HOME=$HADOOP_HOME  
export HADOOP_HDFS_HOME=$HADOOP_HOME  
export YARN_HOME=$HADOOP_HOME  
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native  
export JAVA_HOME=/usr/  
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin

Save & Exit

Reload bashrc

 ~$ source ~/.bashrc
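
A quick way to confirm that the variables were picked up is to check them from a new shell. This is only a sketch, assuming Python 3; the helper name missing_hadoop_vars is hypothetical, and the list of names is taken from the .bashrc lines above.

```python
import os

# Variables exported in ~/.bashrc in the step above.
REQUIRED_VARS = [
    "HADOOP_HOME", "HADOOP_MAPRED_HOME", "HADOOP_COMMON_HOME",
    "HADOOP_HDFS_HOME", "YARN_HOME", "JAVA_HOME",
]


def missing_hadoop_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_hadoop_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```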

Update JAVA_HOME in hadoop-env.sh

 ~$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add the following line at the end:

 export JAVA_HOME=/usr/

If Java was installed manually, double-check your installed Java version and update the path accordingly. I have assumed 1.7.0_51:

 export JAVA_HOME=/usr/local/java/jdk1.7.0_51

Save and exit

Hadoop Configurations

Next we update the configuration files for the Hadoop installation:

 ~$ cd /usr/local/hadoop/etc/hadoop

Modify core-site.xml – Core Configuration

The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of read/write buffers.

 ~$ sudo nano core-site.xml

Add the following lines between the <configuration> and </configuration> tags. The fs.default.name key is the older name of this setting; Hadoop 2.x still accepts it but reports it as deprecated in favor of fs.defaultFS.

   <property>   
    <name>fs.default.name</name>   
    <value>hdfs://localhost:9000</value>   
   </property>

Your file will look like this:

 <configuration>  
   
   <property>   
    <name>fs.default.name</name>   
    <value>hdfs://localhost:9000</value>   
   </property>  
     
 </configuration>

Modify mapred-site.xml – MapReduce configuration

This file specifies which MapReduce framework we are using. Hadoop ships only a template for it, so we first copy mapred-site.xml.template to mapred-site.xml.

 ~$ sudo cp mapred-site.xml.template mapred-site.xml  
 ~$ sudo nano mapred-site.xml

Add the following lines between configuration tags.

   <property>   
    <name>mapreduce.framework.name</name>   
    <value>yarn</value>   
   </property>

Your file should look like:

 <configuration>  
   
   <property>   
    <name>mapreduce.framework.name</name>   
    <value>yarn</value>   
   </property>  
   
 </configuration>

* Note: your file may already contain other properties if it was configured earlier; this post assumes a fresh install.

Modify yarn-site.xml – YARN

This file is used to configure YARN for Hadoop.

 ~$ sudo nano yarn-site.xml

Add the following lines between the configuration tags:

   <property>   
    <name>yarn.nodemanager.aux-services</name>   
    <value>mapreduce_shuffle</value>   
   </property>

Your file should look like:

 <configuration>  
   
   <property>   
    <name>yarn.nodemanager.aux-services</name>   
    <value>mapreduce_shuffle</value>   
   </property>  
     
 </configuration>

Modify hdfs-site.xml – File Replication

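This file contains settings such as the replication factor (we use 1 for a single node) and the name-node and data-node paths on your local file system, which is where Hadoop stores its data.

 ~$ sudo nano hdfs-site.xml

Add the following lines between the configuration tags and check that the file paths match your system. In Hadoop 2.x, dfs.name.dir and dfs.data.dir are the older names of dfs.namenode.name.dir and dfs.datanode.data.dir; they still work but are reported as deprecated.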

   <property>
    <name>dfs.replication</name>
    <value>1</value>
   </property>
   <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>

Your file should look like:

 <configuration>  
   
   <property>   
    <name>dfs.replication</name>   
    <value>1</value>   
   </property>   
   <property>   
    <name>dfs.name.dir</name>   
    <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>  
     
 </configuration>
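
Since all four files share the same property layout, it can help to read them back and confirm the values before starting the cluster. Below is a minimal sketch using only the Python 3 standard library; the function name read_properties is an assumption for illustration, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop *-site.xml content and return {name: value}, values stripped."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


sample = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""
print(read_properties(sample))  # {'dfs.replication': '1'}
```

For a real file, ET.parse(path).getroot() can replace ET.fromstring, for example with the path /usr/local/hadoop/etc/hadoop/hdfs-site.xml used in this post.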

Initializing the Single-Node Cluster

Formatting the Name Node:

While setting up the cluster for the first time, we need to initially format the Name Node in HDFS.

 ~$ cd ~  
 ~$ hdfs namenode -format

Starting Hadoop dfs daemons:

 ~$ start-dfs.sh

Starting Yarn daemons:

 ~$ start-yarn.sh

Check all daemon processes:

 ~$ jps

 6069 NodeManager  
 5644 DataNode  
 5827 SecondaryNameNode  
 4692 ResourceManager  
 6165 Jps  
 5491 NameNode

* The process IDs will differ on each run; the point is to check that all of these daemons are running.
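
Checking the list by eye is easy to get wrong, so the comparison can be automated. The sketch below assumes Python 3 and the "pid name" line format shown in the jps output above; the function name missing_daemons is hypothetical.

```python
# Daemons expected on a single-node setup, as listed in the jps output above.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}


def missing_daemons(jps_output):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each line is "<pid> <name>"
            running.add(parts[1])
    return sorted(EXPECTED - running)


sample = """\
6069 NodeManager
5644 DataNode
5827 SecondaryNameNode
4692 ResourceManager
6165 Jps
5491 NameNode
"""
print(missing_daemons(sample))  # prints []
```

To run it against a live system, the sample text could be replaced with subprocess.run(["jps"], capture_output=True, text=True).stdout.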

You should now be able to open the name-node web interface in your browser (after a short delay for start-up) at the following URL:

name-node: http://localhost:50070/

Stopping all daemons:

 ~$ stop-dfs.sh  
 ~$ stop-yarn.sh

Next, run some examples. If you are looking for examples that run without changing your style of code, I am going to run Python MapReduce on the new version of Hadoop; wait for that post.

Monday, 13 April 2015

MonetDB Basic Example with Python

Overview

When the size of your application database grows into millions of records, distributed over different tables, and business intelligence/science becomes the prevalent application domain, a column-store database management system is called for. Unlike traditional row-stores, such as MySQL and PostgreSQL, a column-store provides a modern and scalable solution without calling for substantial hardware investments.

In an earlier blog post we compiled MonetDB from the source tarball and connected to its shell to test SQL from the SQL reference manual. Now we are going to explore the Python API for connecting to a MonetDB database and executing SQL commands.

Python Package:

The Python package maintained by MonetDB itself is available from the PyPI repository and can be installed using the following command:

pip install python-monetdb

or download the source tarball and install it manually using:

wget https://pypi.python.org/packages/source/p/python-monetdb/python-monetdb-11.19.3.2.tar.gz#md5=9031fd2ea4b86a2bc2d5dd1ab4b10a77
tar xvf python-monetdb-11.19.3.2.tar.gz
cd python-monetdb-11.19.3.2
python setup.py install

Create Test Table:

Now we will connect to the database created in the last post; you can change the database name to your own.

mclient -u monetdb -d mydatabase

Create the table using the following SQL:

CREATE TABLE "sys"."test" (
"id" INTEGER,
"data" VARCHAR(30)
);

Here is the Python code to insert data and read it back:

 import monetdb.sql
 connection = monetdb.sql.connect(username="monetdb", password="monetdb", hostname="localhost", database="mydatabase")
 cursor = connection.cursor()
 cursor.arraysize = 100
 for a in range(1, 200):
   # Pass values as parameters so the driver quotes and escapes them
   cursor.execute("INSERT INTO sys.test(id, data) VALUES (%s, %s)", (a, 'testing %s' % a))
 connection.commit()
 cursor.execute("SELECT * FROM sys.test")
 # Fetch the first row
 print(cursor.fetchone())
 # Fetch all remaining rows as a list
 print(cursor.fetchall())
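
The script above sets cursor.arraysize but then reads everything in one go. For larger tables it is usually nicer to read in batches. The following is only a sketch: the helper iter_rows is hypothetical, it assumes the cursor follows the Python DB-API fetchmany call, and it uses the standard-library sqlite3 module purely as a stand-in so the example runs without a MonetDB server.

```python
import sqlite3


def iter_rows(cursor, batch_size=100):
    """Yield rows from a DB-API cursor, fetching batch_size rows at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Stand-in database (sqlite3, not MonetDB) with the same test table layout.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id INTEGER, data VARCHAR(30))")
cur.executemany("INSERT INTO test (id, data) VALUES (?, ?)",
                [(i, "testing %s" % i) for i in range(1, 200)])
cur.execute("SELECT id, data FROM test ORDER BY id")
rows = list(iter_rows(cur, batch_size=50))
print(len(rows), rows[0])  # 199 (1, 'testing 1')
```

With a MonetDB connection, the same generator would be called on the cursor returned by connection.cursor() after the SELECT has been executed.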

You can run any query with cursor.execute. For queries and SQL syntax, see the MonetDB SQL Reference manual.

Why Monetdb? Compile Monetdb with Ubuntu.

Overview

Column-store technology has found its way into the product offerings of all major commercial vendors. The market for applications empowered by these techniques provides ample space for further innovation.

WHY?

When your database grows into millions of rows, a traditional row-store can start to struggle, and a column-store database management system is a good choice.

MonetDB innovates at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.

MonetDB pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal by innovations at all layers of a DBMS. It is based on the SQL 2003 standard with full support for foreign keys, joins, views, triggers, and stored procedures. It is fully ACID compliant and supports a rich spectrum of programming interfaces (JDBC, ODBC, PHP, Python, RoR, C/C++, Perl).

INSTALL MONETDB:

OS: UBUNTU 14.10

Download a copy of the MonetDB source tarball (I fetched the latest available version, 11.19.9, using the commands below), extract it, and go to the MonetDB directory:

~ # wget https://www.monetdb.org/downloads/sources/Oct2014-SP2/MonetDB-11.19.9.tar.bz2
~ # tar xvf MonetDB-11.19.9.tar.bz2
~ # cd MonetDB-11.19.9/

Now we will compile the source for installation.

To configure and compile, we need the following packages installed:

make
pkg-config
bison
openssl
pcre
libxml2

To install the packages listed above on Ubuntu, use the following commands (sudo users should prefix each command with sudo, e.g. "sudo apt-get update"):

apt-get update
apt-get install make
apt-get install pkg-config
apt-get install bison
apt-get install openssl
apt-get install libssl-dev
apt-get install libpcre3 libpcre3-dev
apt-get install libxml2 libxml2-dev

To configure and install MonetDB from source, use the following commands:

./configure
make
make install

To add the missing path to the MonetDB libraries, use:

ldconfig -v

Now MonetDB is installed. To use it we need to create and start a dbfarm (the directory where MonetDB stores its databases) with the following commands:

monetdbd create /root/my_dbform
monetdbd start /root/my_dbform

After starting the dbfarm, you can create a database and release it from maintenance mode using the following commands:

monetdb create mydatabase

monetdb release mydatabase

To start the database shell, use the following command and hit Enter; you will be asked for the password. The default username and password for a fresh installation are both monetdb.

mclient -u monetdb -d mydatabase

For the SQL reference, see the following link:

https://www.monetdb.org/Documentation/SQLreference