
Dan Olds on Why Big Data Is a Big Deal — And it’s only going to get bigger…


So what is ‘Big Data,’ and why should you care about it? To take the first question first: the specific term ‘Big Data’ refers to the radically increasing size of the data sets that need to be gathered, moved, and analyzed by research, government, and business organizations. In a more general sense, the Big Data trend is about organizations of all types and sizes finding that their ability to gather, analyze, and act upon available data increasingly determines whether or not they achieve their particular mission.


For traditional research organizations (national labs, academic institutions) Big Data is what they’re working with on a daily basis – although even their data sets are growing at a much faster pace than many envisioned. Likewise, many private organizations in energy production, life sciences, financial services, and even retail have also been analyzing massive data sets and using the results to guide their actions and strategies for many years. Now we’re seeing this trend become broader and deeper as organizations that are already doing ‘big analytics’ extend their capabilities, and newcomers start to dip their toes into the pool.

What’s driving this? Why is it happening now? To me, it comes down to economics and societal changes. At the most basic level, globalization and the instantaneous communications enabled by the wired economy have put almost all of the economic power into the hands of buyers.

Think about how you purchase goods and services today. Do you walk into a store or two and make your buying decision based on what a salesperson or store display tells you? Or do you research the alternatives online – examining features/functions, looking at product and vendor reviews, and finding the best value for your money?

I’m firmly in the latter camp, rigorously shopping online for every significant purchase. Business buyers are behaving the same way when looking for suppliers or even employees. They search for the best combination of quality, terms, and price from suppliers who are located anywhere in the world. Globalization, with the lowering of trade barriers worldwide, makes it much easier to manage a global supply chain or set of end-user customers.

In this kind of economic environment, it’s very hard to achieve and maintain a competitive advantage over your rivals. Successful new innovations on the product or service front are quickly adopted by competitors. Your market can suddenly be disrupted by new entrants or entirely different offerings that threaten to make your product or service obsolete. The end result is a constant churn that keeps margins low and makes it hard for a company to break away from the pack for any significant period of time.

The use of analytics and predictive analytics is fast becoming a valuable tool that enterprises are using to cut costs and maximize opportunities. Virtual prototyping allows Boeing to design and test an airplane wing without having to physically build it first. Wall Street firms use analytics to build models and trading algorithms that maximize their market returns. Cities can use analytics to deploy the level of police presence needed to reduce crime in trouble areas. One very large retailer uses analytics to track cold and flu season as it moves across the world so that it doesn’t have to over-buy tissues and cough drops – it just makes sure to have the right quantities in the right locations.

Many companies may think that they’re already doing Big Data (or, more accurately, enterprise analytics) because they’re using Business Intelligence (BI). From what I can tell, many of them aren’t really in the same zip code yet. Their BI systems, while impressive in many cases, just don’t make the cut in terms of providing true insight and actionable intelligence. The biggest downfall is that these BI operations rely heavily (almost exclusively, in many cases) on data generated by the organization itself. Thus predictions of future conditions are built primarily on past experience – past experience that’s limited to a single organization.

This is a lot like driving down the freeway with your eyes locked on a very large and shiny rearview mirror. Your steering decisions are based on the landscape that just rushed by your window. This worked fine, for the most part, in “normal” times (what are those?). But today, curves, washed-out bridges, and mountains materialize much more quickly. In order to predict these changes, or at least figure out how you should react to them, you need a much wider and larger set of data inputs to analyze.

The processing required to discover, quantify, and predict business conditions isn’t all that different from what HPC researchers are doing daily. The data and the questions being posed are different, of course, but the statistical techniques and the math are the same. The IT infrastructure supporting both efforts will be much the same as well. However, many corporate data centers will need to consider how (or whether) their existing infrastructure can handle more computationally intense workloads and much larger data sets.

Inside BigData will track this trend as it develops, providing news on a wide variety of topics, including both technical and business innovations. We’ll evolve as the trend evolves, but the focus will always be on providing our readers with interesting and timely content that helps them get the most out of their data. And feel free to let us know how well we’re hitting (or missing) the target; we appreciate any and all feedback.
