Saturday 1 March 2014

Big Data and Analytics - Hadoop

What is Big Data?

Big data is a buzzword for massive volumes of structured or unstructured data that are too large and complex to manage practically with traditional software tools. Enterprises now hold data that is so large, or moves so fast, that it exceeds their current processing capacity: think petabytes or exabytes, or billions to trillions of records. But big data is not only about sheer size, as this definition puts it:

"Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently. Said differently, the volume, velocity or variety of data is too great." - MongoDB

Today's technologies have made it possible to analyze big data and realize value from it. For example, retailers can track users' web clicks to identify behavioral trends and improve their campaigns. Big data covers data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety:

Volume: a typical computer has between 250 gigabytes and 1 terabyte of storage. By comparison, Facebook ingests 500 terabytes of new data every day.
Velocity: capturing ad impressions or user web clicks can mean handling millions of events per second.
Variety: big data is not only numbers, dates and strings, but also geospatial data, 3D data, audio, video and more.
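
To get a feel for these numbers, here is a small back-of-the-envelope sketch in Python. It only reuses the figures quoted above (500 terabytes per day, 1 terabyte disks, one million events per second); the function names are made up for illustration, and decimal terabytes (10^12 bytes) are assumed.

```python
# Back-of-the-envelope arithmetic for the volume and velocity figures above.
# Assumption: decimal units, so 1 terabyte = 10**12 bytes.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def disks_needed_per_day(daily_bytes, disk_bytes):
    """How many disks of the given size are filled by one day of data."""
    return -(-daily_bytes // disk_bytes)  # ceiling division


def average_ingest_rate(daily_bytes):
    """Average bytes per second needed to absorb one day of data."""
    return daily_bytes / SECONDS_PER_DAY


def events_per_day(events_per_second):
    """Total events collected in a day at a constant event rate."""
    return events_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    daily = 500 * TB  # daily ingest figure quoted in the post
    print("1 TB disks filled per day:", disks_needed_per_day(daily, 1 * TB))
    print("average ingest rate (GB/s):", round(average_ingest_rate(daily) / 1e9, 1))
    print("events per day at 1M events/s:", events_per_day(1_000_000))
```

In other words, a single day of data at this scale would fill hundreds of ordinary disks, which is why one machine is not enough.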

Big Data Analytics?

As the name suggests, big data analytics refers to the process of collecting, organizing and analyzing large sets of data to discover patterns and other useful information. It not only helps you understand the information within the data, but also helps identify the data that matters most to the business and to future business decisions. In short, big data analysts want the knowledge that comes from analyzing the data.

Hadoop?

Hadoop is a software technology designed to store and process large volumes of data using a cluster of commodity servers and storage. It is an open-source Apache project that dates back to 2005 and was developed largely at Yahoo in its early years. It consists of a distributed file system, called HDFS, and a data processing and execution model called MapReduce. Stay tuned for the next post, where we will install and configure Hadoop and then practice MapReduce.
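
Until then, here is a rough sketch of the MapReduce idea in plain Python, using the classic word count example. This is not Hadoop code and does not use any Hadoop API: it runs on one machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. It only shows the three conceptual steps: map each input record to key/value pairs, group the pairs by key, then reduce each group to a result.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on many servers, with the input lines read from HDFS, but the shape of the computation is the same.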
