Big Data Cloud Trends — Data-First Philosophy and the Cloud: Part 2 of 3

Jinesh Varia
Jan 20, 2012


In the previous post, I discussed my views on the importance of Big Data. In this post, I will discuss what has changed and why Big Data matters today.

The Big Data phenomenon is not new. For ages, companies have collected, stored, and analyzed massive amounts of data. So why is there so much buzz around Big Data today?

In the old days, a senior executive at a large company might know in advance what questions he was going to ask (for example, “I want nightly sales reports for every product”). Those questions drove the data model, which in turn drove which data to collect and where and how to store the raw data. As a result, the schema (the structure and organization of the data) was optimized to answer a given set of questions. The problem with this approach is that it became prohibitively expensive to answer ad-hoc questions. Each new question submitted to a standard data warehouse required a full scan of the data, since no pre-computed indices were available, and this limited the algorithms that could be used to analyze those tables. When market dynamics changed, executives could not get answers to new questions: either the raw data had not been preserved, or queries ran so long that deep exploration wasn’t possible in a reasonable time frame, or there simply wasn’t enough computing capacity available to perform these full scans at reasonable intervals.
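To make the contrast concrete, here is a minimal sketch in Python with pandas, using entirely hypothetical products and numbers. The nightly report is the question the warehouse schema was built for; the ad-hoc question can only be answered by going back over the raw, event-level records.

```python
import pandas as pd

# Raw event-level sales records (hypothetical sample rows).
raw_sales = pd.DataFrame({
    "product":   ["widget", "widget", "gadget", "gadget"],
    "city":      ["Seattle", "Portland", "Seattle", "Seattle"],
    "channel":   ["online", "store", "online", "online"],
    "timestamp": pd.to_datetime(["2012-01-19 10:05", "2012-01-19 19:40",
                                 "2012-01-19 20:15", "2012-01-19 09:30"]),
    "amount":    [40, 25, 90, 90],
})

# The known question the schema was built for: nightly sales per product.
nightly_report = raw_sales.groupby("product", as_index=False)["amount"].sum()

# An ad-hoc question the schema never anticipated ("evening online sales by
# city") requires scanning the raw, event-level data.
evening_online = raw_sales[(raw_sales["timestamp"].dt.hour >= 18) &
                           (raw_sales["channel"] == "online")]
ad_hoc = evening_online.groupby("city", as_index=False)["amount"].sum()

print(nightly_report)
print(ad_hoc)
```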

Nowadays, with increased competitive pressure and changing market conditions, the philosophy towards data has changed. Instead of starting with a particular question, the senior executive says, “Measure everything and collect as much data as possible,” without worrying about which questions to ask. To make this work, it is vital to collect data in all phases and aspects of a project and then apply the right algorithms, analytical technology stacks, and tools to that data in order to gain business insight. This shift towards a “data first” philosophy has changed how we store and analyze data.

Public and Private Big Data Remix

There are two types of Big Data: public (data that is generated and created by an organization or community and is available to everybody) and private (data that is generated and consumed within the organization). Companies have found that when they analyze public and private datasets together, they learn far more than when they analyze those datasets separately. For example, when we analyze e-commerce sales data (private data) for a given city alongside that city’s demographics, large events, and festivals (public data), we can obtain valuable insights, such as how the city’s purchasing patterns will change as it prepares for those events and festivals. Likewise, when we “mash up” historical flight data with real-time weather forecast data, we can predict the probability of our flight getting delayed a few hours before the airlines themselves know it.
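Here is a minimal sketch of such a “mash up”, again in Python with pandas and with made-up sample data: private daily sales are joined with a public calendar of city events so that both can be analyzed as one dataset.

```python
import pandas as pd

# Private data: daily e-commerce sales per city (hypothetical sample values).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-14", "2012-01-15", "2012-01-14"]),
    "city": ["Seattle", "Seattle", "Portland"],
    "revenue": [120_000, 310_000, 95_000],
})

# Public data: a calendar of large events and festivals per city.
events = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-15"]),
    "city": ["Seattle"],
    "event_name": ["Winter Festival"],
})

# "Mash up" the two datasets on (date, city); days without an event get NaN.
combined = sales.merge(events, on=["date", "city"], how="left")
combined["has_event"] = combined["event_name"].notna()

# Compare purchasing patterns on event days versus ordinary days.
print(combined.groupby(["city", "has_event"])["revenue"].mean())
```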

Hence, Big Data is not just about gaining insight from internal private datasets but about analyzing disparate datasets and exploring new dimensions of analysis.

Enter Cloud Computing

Deriving value from this Big Data requires massive computing and storage resources, and the emergence and proliferation of Cloud Computing has accelerated this phenomenon.

Cloud computing ensures that our ability to analyze data and extract business intelligence is not limited by capacity or computing power. The cloud gives us access to virtually limitless capacity, on demand. In doing so, it lowers total cost, maximizes revenue, and gets the work done faster at scale.

Elasticity, the ability to grow or shrink compute and storage capacity on demand, is the fundamental property of cloud computing that drives its cost benefits. While the infrastructure for data warehouses tuned to answer regularly asked questions (like generating the nightly sales report) is easy to provision for, because the load is predictable, the analytics needed to discover new trends and correlations in the data (and in other public datasets) requires an unpredictable amount of compute cycles and storage. In a traditional on-premises setup, businesses have to provision for the maximum capacity they might need at some point in the future. To process Big Data in the cloud, businesses can expand and contract their infrastructure depending on how much they need at the present moment.
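As a rough sketch of what elasticity means in practice, the Python below sizes a transient analysis cluster to the day’s backlog and notes where it would be provisioned and released. The sizing rule, node counts, and record volumes are all hypothetical, and the actual provisioning calls would depend on the cloud provider’s API.

```python
import math

def nodes_for_backlog(records_pending: int,
                      records_per_node: int = 10_000_000,
                      max_nodes: int = 500) -> int:
    """Size a transient analysis cluster to today's workload, not to peak capacity."""
    return min(max(1, math.ceil(records_pending / records_per_node)), max_nodes)

# A quiet day, a spike, and a return to normal: capacity follows the workload.
for pending in (2_000_000, 1_500_000_000, 40_000_000):
    nodes = nodes_for_backlog(pending)
    print(f"{pending:>13,} pending records -> {nodes} nodes")
    # In the cloud, a cluster of this size would be provisioned here, used for
    # the analysis, and released as soon as the job completes; on-premises,
    # the business would have had to buy for the 150-node spike up front.
```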

Cloud computing empowers businesses to quickly leverage their data and derive valuable insights from it. They no longer have to wait weeks or months to procure and set up physical servers and storage. With cloud computing, businesses can roll out hundreds or thousands of servers in hours rather than months and analyze their data faster than their competitors. The cloud helps businesses realize the value of the data they already own and convert it into a competitive advantage.

In the next post, I will discuss some emerging Big Data Cloud Trends…
