Big data: hype or transformation?
IDC predicts that the market for big data technology and services will hit $32.4 billion in 2017, nearly double its predicted size for 2015 and roughly 10 times its size in 2010. That estimate includes infrastructure software, such as security and datacenter management tools, as well as high-performance data analysis. Predictions from Wikibon are even higher, at $47 billion for 2017, although its definition of big data is broader than some.
The factors driving the growth are well recognized at this point: the availability of large volumes of incoming data and new technology for processing it are combining to produce an ideal opportunity. But how much is hype and how much is real? "Big data represents a true paradigm shift," says Brian Hopkins, VP and principal analyst at Forrester. "The scale fundamentally changes the way that business gets done, because when you put all the information together, you really can find the needle in the haystack."
A large telecommunications company wanted to improve its customer experience by detecting and resolving issues as they arose. The company selected the Vitria Operational Intelligence (OI) solution, with the goal of analyzing network events and customer-related streaming data to produce insights that could be acted on immediately to improve service and increase customer retention.
The company uses Vitria OI to aggregate and correlate events from many different sources, detect patterns as they occur and respond proactively. "Vitria OI correlates network and cell site performance issues with call failure rates in real time to help the client identify the specific customers that are being affected," says Dale Skeen, co-founder and CTO of Vitria. The client can then initiate re-routing of calls or load shedding to prevent a shutdown, or take other action.
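The internals of Vitria OI are proprietary, but the core idea Skeen describes, correlating call outcomes with cell sites inside a sliding time window, can be sketched in a few lines of Python. Everything below, from the window length to the failure threshold and the event shape, is an illustrative assumption rather than Vitria's actual design:

```python
from collections import defaultdict, deque
from time import time

WINDOW_SECONDS = 60        # sliding correlation window (assumed)
FAILURE_THRESHOLD = 0.05   # flag a cell when over 5% of calls fail (assumed)

# Recent call outcomes per cell site, kept within the sliding window.
call_log = defaultdict(deque)   # cell_id -> deque of (timestamp, failed)

def record_call(cell_id, failed, now=None):
    """Record one call outcome and evict events older than the window."""
    now = now if now is not None else time()
    log = call_log[cell_id]
    log.append((now, failed))
    while log and now - log[0][0] > WINDOW_SECONDS:
        log.popleft()

def degraded_cells():
    """Return cell sites whose in-window call failure rate exceeds the threshold."""
    flagged = {}
    for cell_id, log in call_log.items():
        if log:
            rate = sum(1 for _, failed in log if failed) / len(log)
            if rate > FAILURE_THRESHOLD:
                flagged[cell_id] = rate
    return flagged

# Feed in some events and check which cells need attention.
record_call("CELL-17", failed=True)
record_call("CELL-17", failed=True)
record_call("CELL-17", failed=False)
print(degraded_cells())   # a downstream process could re-route calls from these cells
```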
Front window view
The telecommunications company also uses customer location as a trigger for real-time promotional offers. Instead of blanketing all of its customers with a promotion, the company targets offers at those most likely to respond, at the moment the offers are most relevant. As an example, Skeen says, "This company uses Vitria OI to determine when a customer is about to leave the United Kingdom for France on the Eurostar."
By correlating customer movements over time along the set of cell sites that serve the Eurostar route, and by masking out other local train services on the same line, the company can predict which customers are about to leave the country. "The company can then send the appropriate customers a targeted marketing offer, before they reach their destination and turn off data roaming," Skeen explains.
The information used to determine the customer's travel plans comes from network events generated by the customer's mobile device indicating that the traveler is following the set of best-serving cells along the train route in the correct sequence. "All this must take place in real time," Skeen says, "or the promotion will not be useful."
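Skeen does not describe the implementation, but the underlying technique, matching a device's handover events against an ordered list of route cells and using a timing bound to mask slower local trains on the same corridor, might look something like the sketch below. The cell IDs, route and traversal bound are all hypothetical:

```python
import time

# Ordered best-serving cells along the Eurostar route (hypothetical IDs).
EUROSTAR_ROUTE = ["CELL-STP", "CELL-EBBSFLEET", "CELL-ASHFORD", "CELL-TUNNEL"]
# A Eurostar clears these cells far faster than a local service (assumed bound).
MAX_TRAVERSAL_SECONDS = 45 * 60

progress = {}   # device_id -> (index of next expected cell, time of first match)

def on_network_event(device_id, cell_id, now=None):
    """Advance a device's route match when it attaches to the next expected cell."""
    now = now if now is not None else time.time()
    idx, started = progress.get(device_id, (0, now))
    if cell_id != EUROSTAR_ROUTE[idx]:
        return                      # ignore cells off the expected sequence
    if idx == 0:
        started = now               # sequence begins: start the clock
    if idx + 1 == len(EUROSTAR_ROUTE):
        progress.pop(device_id, None)
        # Timing mask: a slower local train on the same cells is filtered out.
        if now - started <= MAX_TRAVERSAL_SECONDS:
            send_offer(device_id)
    else:
        progress[device_id] = (idx + 1, started)

def send_offer(device_id):
    # In production this would push the promotion before roaming is disabled.
    print(f"Targeted offer sent to {device_id}")
```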
Vitria's streaming big data analytics capabilities are partly a response to the limitations of Hadoop, which was designed for batch processing (although Hadoop 2.0 promises improvements in that area). Vitria OI's Hadoop connector allows data to move from OI to Hadoop so that it can subsequently be queried, and data can also be streamed from Hadoop back into Vitria OI. "With streaming analytics, you can take a prospective view. Instead of looking out the rear window, now you're looking out your front window to see what's coming," Skeen adds.
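The connector itself is a Vitria product, but the general pattern, landing streamed events where a batch system can later query them, is easy to illustrate. This sketch simply appends newline-delimited JSON into Hive-style hourly partitions under a hypothetical local path; a real deployment would write to HDFS instead:

```python
import json
import os
from datetime import datetime, timezone

LANDING_DIR = "data/landing/events"   # assumed path; in production, an HDFS location

def land_event(event):
    """Append one streamed event as newline-delimited JSON into an hourly
    partition, a layout that later batch jobs (Hive, MapReduce) can scan."""
    now = datetime.now(timezone.utc)
    partition = os.path.join(LANDING_DIR, now.strftime("dt=%Y-%m-%d/hour=%H"))
    os.makedirs(partition, exist_ok=True)
    with open(os.path.join(partition, "events.jsonl"), "a") as f:
        f.write(json.dumps(event) + "\n")

land_event({"cell_id": "CELL-17", "type": "call", "failed": False})
```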
According to Gartner, by 2017 more than half of analytics implementations will use data streams from sensors or other machines, applications and people. Since many of those streams will reach big data scale, more companies will be entering the field to support real-time applications. One recent entrant is Splice Machine, which offers a transactional SQL-on-Hadoop database for real-time big data applications and describes its product as the only such database available.
Getting started with big data
Hadoop technology has brought the opportunities inherent to big data to a much broader group of organizations. "Data-intensive applications such as seismic data analysis for oil and gas exploration have been addressing problems requiring analysis of very large data sets for some time," says Alex Gorbachev, CTO of Pythian. "However, in the past, only a few organizations could afford to store and analyze the volumes of data we associate with big data." Now, the use of the Hadoop Distributed File System (HDFS), which distributes data across commodity servers, makes the management of big data affordable to a much wider range of companies.
Pythian is a data management company that helps plan, deploy and manage data infrastructures. Among the big data products it deploys for its clients are MongoDB and Apache Cassandra, both NoSQL databases that often support large-scale data applications with millions of users. Pythian also provides a full spectrum of services for Hadoop and its ecosystem, which has become the de facto platform for applications requiring affordable, massively parallel data processing at huge scale.
Enterprises typically start with Hadoop by optimizing some of their existing operational processes. "Hadoop often is deployed as a landing pad for many different data sources, including unstructured information," says Gorbachev. "It takes over the ETL [extract, transform and load] and frees up the data warehouse for BI users, and the organization learns how it works."
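To make the ETL-offload idea concrete, here is a minimal sketch of the kind of job Hadoop might take over, written as a Hadoop Streaming mapper and reducer in Python. The input layout (tab-separated timestamp, customer ID and amount) is assumed for illustration:

```python
#!/usr/bin/env python3
# mapper.py - Hadoop Streaming mapper: parse raw, tab-separated log lines
# (assumed layout: timestamp, customer_id, amount) and emit key/value pairs.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        continue                      # drop malformed records during the load
    _, customer_id, amount = fields
    print(f"{customer_id}\t{amount}")
```

```python
#!/usr/bin/env python3
# reducer.py - Hadoop Streaming reducer: input arrives grouped and sorted by
# key, so per-customer totals can be accumulated in a single pass.
import sys

current_id, total = None, 0.0
for line in sys.stdin:
    customer_id, amount = line.rstrip("\n").split("\t", 1)
    if customer_id != current_id:
        if current_id is not None:
            print(f"{current_id}\t{total:.2f}")
        current_id, total = customer_id, 0.0
    total += float(amount)
if current_id is not None:
    print(f"{current_id}\t{total:.2f}")
```

Run under Hadoop Streaming, the framework supplies the parallelism and the sort between the two scripts; a launch command might resemble `hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /raw/logs -output /etl/daily_totals`, though the jar name and paths vary by installation.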
Changing the mind-set
Companies just getting started with big data projects should be open-minded about how the technology can be applied, advises Gorbachev. "For example, they might find ways to improve their decision-making processes, to evaluate investment opportunities or to make operational processes more efficient," he says. The learning process is also important, so organizations need to be proficient in managing big data before taking on more sweeping projects.
Gorbachev cites one example of an incremental project in managing inventory. "In a large retail chain, each outlet used RFID technology to track the location of 20,000 items in the store," he says. "The system could verify whether an item was in storage or on the floor." Traditionally, the inventory system notified salespeople when an item was sold out, but sometimes the employees did not restock the item. "By constantly monitoring all the items and proactively reporting empty shelves, the system could improve customer service as well as sales," Gorbachev says.
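Gorbachev gives no implementation detail, but the monitoring logic is simple enough to sketch in Python. The event fields, locations and tag IDs below are assumptions, not details of the retailer's actual system:

```python
from collections import defaultdict

on_floor = defaultdict(set)   # sku -> set of RFID tag IDs currently on the sales floor

def on_rfid_read(tag_id, sku, location):
    """Update an item's location from an RFID read ('floor' or 'storeroom')."""
    if location == "floor":
        on_floor[sku].add(tag_id)
    else:
        on_floor[sku].discard(tag_id)

def empty_shelves(stockroom_counts):
    """Flag SKUs with no items on the floor but stock in the back, so staff
    can be told to restock before a customer finds an empty shelf."""
    return [sku for sku, count in stockroom_counts.items()
            if count > 0 and not on_floor[sku]]

on_rfid_read("TAG-001", "SKU-42", "storeroom")
print(empty_shelves({"SKU-42": 12}))   # -> ['SKU-42']: on hand in back, shelf empty
```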