How To Make Big Data Headache Go Away
Let’s start with the good news: Your big data problem is getting smaller every day.
The emergence of big data analysis has been described in many ways, from the ridiculous to the sublime. In some quarters, it’s derided as the next buzzword, a vendor-driven fad that is designed to simply sell more software. Others consider it the defining shift in information management that leaps the chasm and brings data into the useful domain of business and government.
Me? I think it’s a little of both. But what I think barely matters—in fact, doesn’t matter at all. In this worldwide economy of information trading and access, all that really matters is the ability to find and use information hidden away in those vast storage repositories.
So why would you consider reading any more of this navel-gazing? Partly because I have unique access to the best minds in the business. Mine is not one of them, I assure you. But I am fortunate to have the phone numbers of those minds that are. I’ll get to those in a minute.
Let’s get some dictionary work out of the way first. Big data is not only big. It’s complex. It has (as they say) not just volume but also variety (rich data, media forms, structured and unstructured content), and it’s riddled with duplicate files, junk files and mistakes. Let’s not even TALK about the stuff that people take home on their iPhones and laptops. So big data is more than just big.
Let’s also mention the other “Vs.” The analysts like to boil the issue down to shorthand, and to be honest, I do too. Besides the great volume and variety, there’s also the velocity at which data and content enter your house, and then there are two more: veracity (can you rely on the truth of the content?) and, most importantly, the value of the information. Much of it is junk, let’s face it. But there are many elements hidden within content that can make or break today’s deal.
I know what you’re thinking. Business intelligence tools have been around for a long time, and have pretty much mastered the analysis of structured AND unstructured content.
Not so fast, bucko. The advent of big data, really big data, and its underlying complexity have changed the game. And more than 70 vendors in the space have recognized the need to address it (each in its own impenetrable way, of course).
I love how Timo Elliott at SAP put it recently: “What’s the difference between business analytics and business intelligence?” And working for SAP, he should know. “The correct answer,” he says, is: “Everybody has an opinion, but nobody knows, and you shouldn’t care.
“At the end of the day,” Timo writes, “the first is the business aspect of BI—the need to get the most value out of information. This need hasn’t really changed in more than 50 years (although the increasing complexity of the world economy means it’s ever harder to deliver). And the majority of real issues that stop us from getting value out of information (information culture, politics, lack of analytic competence, etc.) haven’t changed in decades either.
“The second is the IT aspect of BI—what technology is used to help provide the business need. This obviously does change over time—sometimes radically. The problems in nomenclature typically arise because ‘business intelligence’ is commonly used to refer to both of these according to the context, thus confusing the heck out of everyone.
“In particular, as the IT infrastructure inevitably changes over time, analysts and vendors (especially new entrants) become uncomfortable with what increasingly strikes them as a ‘dated’ term, and want to change it for a newer term that they think will differentiate their coverage/products.
“When people introduce a new term, they inevitably (and deliberately, cynically?) dismiss the old one as ‘just technology driven’ and ‘backward looking,’ while the new term is ‘business oriented’ and ‘actionable.’ This is complete rubbish, and I encourage you to boo loudly whenever you hear a pundit say it.”
TechTarget’s (WhatIs) writer Margaret Rouse put it well when she said that “Big data analytics is the process of examining large data sets containing a variety of data types—i.e., big data—to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits.”
The effect that big data analytics has on marketing and sales cannot be overstated. It seems to have been built for that purpose, and rightly so. Collecting sales information, consumer feedback, market potential and fashion trends is the bread and butter of big data analytics. I don’t know a single CEO or CTO who would ever say, “Learning more about customers? Forget about it. I got shareholders on the line right now who are trying to talk me down about costs.” In fact, the best business leaders understand the inherent value in big data analytics. It’s a gut feeling for them; they can’t always explain it, but they “just know.”
The Urge for Big Data Analytics
And I haven’t even mentioned the open-source tools that are dominating the market. Big data analytics is one place where open-source tools have made a real difference. You can’t swing a cat at a big data meeting without hitting Hadoop, for instance.
Hadoop is an open-source technology that anyone can improve or at least comment on. Here’s the part I stole from Wikipedia (because my feeble mind can’t seem to grasp it): “To process the data, Hadoop Map/Reduce transfers code (specifically Jar files) to nodes that have the required data, which the nodes then process in parallel. This approach takes advantage of data locality to allow the data to be processed faster and more efficiently via distributed processing than by using a more conventional supercomputer architecture that relies on a parallel file system where computation and data are connected via high-speed networking.”
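For those of us who grasp things better by poking at them, here is a toy Python sketch of the idea in that quote. It is not Hadoop and uses none of Hadoop's APIs; the node names, the sample text, and the functions `map_block`, `reduce_counts` and `run_job` are all made up for illustration, and threads stand in for separate machines. What it tries to show is the shape of the thing: the counting code goes to where each block of data already sits, the blocks are processed in parallel, and only the small per-node results are combined at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" already holds its own block of text.
# In real Hadoop these would be file blocks stored on separate machines.
NODE_BLOCKS = {
    "node-1": ["big data is big", "data is complex"],
    "node-2": ["big data has value"],
    "node-3": ["junk data is junk"],
}

def map_block(lines):
    """Map step: runs against one node's block and counts its words."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merges the small per-node counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run_job(node_blocks):
    # Send the same function to every block and process them in parallel.
    # Only the compact partial counts come back, not the raw data.
    with ThreadPoolExecutor(max_workers=len(node_blocks)) as pool:
        partials = list(pool.map(map_block, node_blocks.values()))
    return reduce_counts(partials)

word_counts = run_job(NODE_BLOCKS)
print(word_counts.most_common(3))
```

The point of the design is that the bulky part (the text) never moves; only the code and the tallies do. A real cluster adds scheduling, fault tolerance and a distributed file system on top, none of which this sketch attempts.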