What is Big Data?
20/11/12 | in: Storage
Big Data seems to be another “buzz phrase” flying around these days, following in the footsteps of Information Lifecycle Management (ILM), Virtualisation, Cloud Computing and the like.
The question is whether Big Data is just more marketing hype or whether it has real relevance to the IT masses and, if so, what that relevance is.
S3 has been providing data management consultancy, services and solutions for over 24 years, and in that time we have seen a number of different waves in IT, from Mainframe and Thin Client to Virtualisation and Cloud.
Currently we are in the midst of a data explosion, with an insatiable appetite for information being fuelled by the Cloud, Social Media, Collaboration and the rise of Big Data.
We can define Big Data as a collection of data sets and individual files that become so large and complex that our existing infrastructure can no longer scale to manage the storage and analysis of those data sets. What is important to note here is that Big Data isn’t always measured in terabytes or petabytes; rather, it is about an organisation’s ability to manage those growing data sets. This means that any organisation, whether it has 10TB or 10PB, can face the challenges associated with storing and analysing rapidly growing data sets.
I think we can expand on the definition above by segmenting the challenges associated with Big Data into Infrastructure (1.0) vs. Analytics (2.0).
Big Data 1.0 is the efficient use of hardware infrastructure to store ever-growing sets of unstructured data. This has been enabled by advances in technology, including the use of commodity components and “scale-out” file systems, which allow one to store petabytes of unstructured data without the management overhead and cost associated with traditional enterprise SAN infrastructure.
Big Data 2.0 is the value we can extract from those data sets in real time using Massively Parallel Processing (MPP) and in-database analytics of unstructured data. The key here is the use of open-source distributed frameworks such as Hadoop and NoSQL databases to carry out analytics on large unstructured data sets.
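To make the Hadoop-style approach concrete, here is a minimal sketch of the MapReduce pattern it popularised: a toy, single-process Python version (not Hadoop itself) counting words across a handful of text records standing in for a large unstructured data set.

```python
from collections import defaultdict
from itertools import chain

# Toy in-memory MapReduce: count words across "unstructured" text records.
# Hadoop runs the same map/shuffle/reduce phases distributed across nodes.

def map_phase(record):
    # Emit (key, value) pairs -- here, (word, 1) for each word in a record.
    return [(word.lower(), 1) for word in record.split()]

def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

records = ["Big Data is big", "data about data"]
pairs = chain.from_iterable(map_phase(r) for r in records)
counts = reduce_phase(shuffle_phase(pairs))
```

The point of the pattern is that the map and reduce steps are independent per key, so a framework like Hadoop can spread them across hundreds of commodity machines without changing the logic.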
In the real world, use cases such as 3D modelling, Monte Carlo analysis, Web 2.0 and predictive modelling are driving the requirement for big data infrastructure and analytics. A massive increase in unstructured data sets, coupled with a desire for increased competitiveness or efficiency, means today’s organisations need a real-time view of the value that can be extracted from their data; a rear-view-mirror approach is no longer enough.
An example of this is the US Presidential Election, where Barack Obama used big data analytics to drive his fundraising campaign. The Obama machine created a single massive system that could merge the information collected from pollsters, fundraisers, field workers and consumer databases, as well as social-media and mobile contacts. The new megafile didn’t just tell the campaign how to find voters and get their attention; it also allowed the number crunchers to run tests predicting which types of people would be persuaded by certain kinds of appeals. This is a fantastic example of how the right infrastructure, coupled with big data analysis, can deliver real-time competitive advantage.
One could argue that Big Data is something we are already doing in the form of Business Intelligence (BI) and data warehousing/mining. To some extent I agree; however, I also believe Big Data refers to a new wave of technologies that are truly transforming the way in which we behave, both as enterprises and as consumers.
Data is Big Data when it enables one or more of the following:
- Value creation
- Productivity enhancement
- Greater competitiveness
Why has Big Data happened only now?
- Enabled by advances in technology
- Moore’s Law, virtualisation
- Lowering of cost barriers
- Adoption of commodity components
- Cloud computing for new levels of scale, agility and flexibility
- Scale-Out vs. Scale-Up technologies – For both processing and storage
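The scale-out idea in the last bullet can be sketched in a few lines of Python. This is a toy: threads stand in for what would be separate commodity nodes in a real scale-out cluster, and summing stands in for real analytics.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration of scale-out vs. scale-up. Scale-up means buying a
# bigger machine; scale-out means partitioning the work across more
# commodity workers and combining their partial results.

def process_partition(partition):
    # Each worker processes its own slice of the data independently.
    return sum(partition)

def scale_out_sum(data, workers=4):
    # Split the data set into roughly one partition per worker...
    size = max(1, len(data) // workers)
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # ...fan the partitions out, then combine the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_partition, partitions))
```

Growing capacity then becomes a matter of adding partitions and workers (nodes) rather than replacing the machine, which is exactly why commodity components lower the cost barrier.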