The knowledge challenge: Surging volumes of content
The metaphors for big data baffle me. I think in terms of data sets, collections and curated information. I ran a Google Images query for "big data," and the system quickly displayed an image of buildings covered with numbers; an image of an elephant (the logo of the big data system Hadoop) in a tunnel of numbers; and a swirling pipe of numbers with a light at the end of it.
I struggle with the challenge that large volumes of content pose, and definitions of big data are of little help. The fanciful graphics make clear that there is no precise or agreed-upon definition. Big data is one of those soup-du-jour terms that seem to mean a lot while leaving a great deal to the restaurant owner's method of recycling chicken bones.
In an article entitled "Big Data Causing Big Changes, Analysts Predict," IDC is said to define big data as: "a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis. Big data is a horizontal cross-section of the digital universe and can include transactional data, warehoused data, metadata and other data residing in ridiculously large files." (See computerworlduk.com/business-it-hub/tech-briefing/3371764/big-data-causing-big-changes-analysts-predict.)
Big numbers
Dave Feinleib, writing on the Forbes magazine blog, lit up the Internet with his Big Data Trends presentation (available at slideshare.net/bigdatalandscape/big-data-trends), which takes the form of a series of slides. Several startling facts caught my attention, most notably the numbers attributed to big data, which Feinleib describes as "the next big thing." He also points out the three "I's" of big data: immediate, intimidating and ill defined. Among his assertions are the following:
- Feinleib tallied 10 different business segments plus enabling technologies like Hadoop and Cassandra. The market segments include business intelligence and analytics and some that were new to me: data as a service, structured databases and something called operational.
- Feinleib asserted that the big data market was worth $5.1 billion in 2011 and will balloon to $54.3 billion by 2017. To support that astounding estimate, he pointed out that Facebook had 900 million users in April [and reached a billion in September], that Twitter distributed 400 million tweets per day in June 2012 and that the corporate data growth rate is ripping along at between 50 percent and 97 percent per year.
- Feinleib lists eight "Big Data Laws." Big Data Law #1 is: "The faster you analyze your data, the greater its predictive value ... Companies are moving away from batch processing to real time to gain competitive advantage."
- Feinleib advises in Big Data Law #4: "Data has value far beyond what you originally anticipate. Don't throw it away."
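Feinleib's market figures imply a steep compound annual growth rate. The short Python sketch below works out that implied rate from the two numbers he cites ($5.1 billion in 2011, $54.3 billion in 2017) and, for comparison, how quickly corporate data would double at the low and high ends of the 50 to 97 percent growth range. The function names are illustrative, and the calculation assumes smooth compound growth, which real markets rarely follow.

```python
import math


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    # Market size figures attributed to Feinleib, in billions of US dollars.
    market_rate = cagr(5.1, 54.3, 2017 - 2011)
    print(f"Implied market growth: {market_rate:.1%} per year")

    # Low and high ends of the cited corporate data growth range.
    for rate in (0.50, 0.97):
        print(f"At {rate:.0%} growth, data doubles every {doubling_time(rate):.2f} years")
```

Run as written, the sketch reports an implied market growth rate of roughly 48 percent per year. At the cited data growth rates, corporate data would double roughly every one to two years.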
"Digital landfills"
Keeping all data may carry risks of its own. In his article "Compliance, Meet Big Data" (see http://www3.cfo.com/article/2012/6/applications_big-data-compliance-security-regulations-finra), David Rosenbaum quotes Brian Hill, principal analyst at Forrester, as saying that businesses are struggling with surging volumes of content. According to Hill, much of it is unstructured (Word documents, PDFs, e-mails, voicemails, video and, increasingly, social media inputs) and therefore inaccessible to traditional enterprise resource planning systems, with their stored invoices and tables of numbers. All that data can cause organizations to accumulate "digital landfills" that generate storage costs, tax IT resources, degrade application performance and clog up a company's information pathways.
In the article, Hill suggests that in the age of big data, it's critical for companies to have policies for data disposal as well as retention. "Disposal needs to be done in light of potential litigation concerns," he is quoted as saying.
Figuring out what is in big data collections places a significant burden on traditional search, content management and database management systems. Most of those systems were designed to process digital content that has boundaries and limits. Perhaps those fences were poorly constructed, but system administrators and senior managers could get their arms around the notion of indexing contracts, business-related e-mails and PowerPoint presentations. Big data collections have no such fences. Still, Hill sees a payoff: "There's a hard return on investment for businesses and CFOs in digging out from under the digital landfill by using new archiving and discovery technologies."
Other experts sidestep the compliance issue and focus on the notion of "big changes." According to the Computerworld UK article mentioned earlier: "The net result of this data explosion is, according to Andrew Buss, service director at Freeform Dynamics, that many organizations, large and small, are becoming increasingly overwhelmed by storage: by the amount they have to store, by getting it to the systems that matter and by the huge variety of the types of data being held. He pointed out that while many companies may not think of it as such, this is very much a big data problem, whatever the scale."
The issues ahead
If big data is the future, organizations face some challenges in the coming months:
- First, enterprise software installed before the onset of the big data revolution was not designed to account for massive storage scattered across many devices. When sales data are updated in near real time, a traditional accounting system built around serialized processes may assume that its database is up to date and contains all the needed information. If that database does not pull together the data from multiple storage devices, the invoicing process may issue incomplete statements. Other enterprise systems may struggle with information about orders stored on tablets, smartphones or a sales professional's laptop.
- Second, existing knowledge management systems may need to be overhauled or subjected to the costly rip-and-replace method that disrupts many work routines. In addition to the cost of the new system, there are soft costs for training, lost opportunities and mistakes resulting from unfamiliarity.
- Third, management in many organizations is more art than science. Crunching numbers means working on the quarterly financials, not poring over the output of sophisticated statistical routines and making the many decisions about thresholds, margins of error and data noise that such output demands.
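The invoicing problem in the first point can be illustrated with a toy example. The Python sketch below is hypothetical: the record layout, customer name and amounts are invented for illustration, and the two lists stand in for a legacy accounting database and orders still held on a sales professional's laptop. It shows only that a statement built from the central store alone understates what the customer owes until the scattered records are merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer: str
    amount: float


def statement_total(customer: str, *sources: list[Order]) -> float:
    """Sum a customer's orders across every source, counting each order ID once."""
    seen: dict[str, Order] = {}
    for source in sources:
        for order in source:
            # The same order may be synced to more than one store; keep one copy.
            seen.setdefault(order.order_id, order)
    return sum(o.amount for o in seen.values() if o.customer == customer)


# Hypothetical data: orders already loaded into the central accounting database...
central_db = [Order("A-1", "Acme", 1200.0), Order("A-2", "Acme", 300.0)]
# ...and orders on a sales professional's laptop, one of which was already synced.
laptop = [Order("A-2", "Acme", 300.0), Order("A-3", "Acme", 450.0)]

print("Central database only:", statement_total("Acme", central_db))
print("All sources merged:   ", statement_total("Acme", central_db, laptop))
```

The statement drawn from the central database alone totals 1,500, while the merged view totals 1,950. Deduplicating by order ID is one simple design choice for records that exist in more than one place; real systems would need a richer reconciliation rule.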
Organizations do have useful data. Doug Levin of Quant5 said, "Many companies have these Fort Knox data depots, where they collect enormous amounts of critical data that's inaccessible, or just not analyzed regularly."
Many enterprise search companies are now grafting big data capabilities onto their existing systems. Can enterprise search, content management and traditional document management systems embrace big data? I am confident of one fact: Talking about big data is much easier than converting a legacy enterprise solution into a big data solution. As Elizabethan playwright Ben Jonson noted, "To speak and to speak well are two things. A fool may talk, but a wise man speaks."