

In the realm of big data


Academic institutions at all levels are working hard to get a better understanding of their constituents. "These organizations have a lot of data that indicates how they are doing," says Richard Rodts, manager of global academic programs, predictive analytics, at IBM. "They are using that data to plan for the future and circumvent issues before they happen."

With the advent of big data, institutions can look at the lifetime of the individual after he or she leaves the educational environment and enters the job market. "With the greater insights able to be derived from analytics tools to big data, it's possible to look at variables among hundreds of variables in thousands of cases and find out whether a person will persist or not persist in a given setting," Rodts says.

Unified analytical architecture

Big data has brought in a new approach to analysis in which discovery plays a stronger role. "In the past, companies focused more on traditional analytics when business users knew what questions they wanted to ask," says John Dinning, VP of product management and marketing at Teradata. "That will continue to be really important, but now businesses are looking at new ways in which to explore data with a less predictable set of questions." The purpose is to reveal information that may be contained in the data, yet is unexplored.

To achieve that goal, a new vision is emerging for an "analytical ecosystem" that integrates traditional analytics with discovery. Teradata offers both an analytics appliance that serves as a traditional data warehouse and the Teradata Aster platform for discovery. Output and insight from the discovery platform can then be put into operation in the Teradata warehouse. Teradata also partners with BI companies such as SAS to analyze customer data or transactional information.

A typical example is provided by the case of a global bank that wanted to reduce churn in its profitable customer segments. "The key challenge was the need to integrate customer interactions from many different repositories containing web click data, but also campaign management, marketing, customer and transaction data from within the data warehouse," says Dinning. "The volume of data—billions of records a month—made it complex."

A path analysis running within Aster allowed the company to sessionize the information and see what path the customer took just before closing the account. "After the 10 most common paths were identified, that data was fed into a business intelligence application (like SAS) to develop a predictive model for current customers," explains Dinning. "The company was then able to set up interventions to improve customer retention." 
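The sessionize-then-count idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Aster's implementation: the 30-minute inactivity gap, the event layout (customer, timestamp in seconds, action), the action names and the sample events are all assumptions made for the example.

```python
from collections import Counter

SESSION_GAP = 30 * 60  # assumed inactivity threshold, in seconds


def sessionize(events, gap=SESSION_GAP):
    """Group (customer, timestamp, action) events into per-customer sessions.

    A new session starts whenever a customer has been idle longer than `gap`.
    """
    sessions = []
    last_seen = {}  # customer -> (timestamp of last event, current session)
    for customer, ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(customer)
        if prev is None or ts - prev[0] > gap:
            session = []
            sessions.append((customer, session))
        else:
            session = prev[1]
        session.append(action)
        last_seen[customer] = (ts, session)
    return sessions


def paths_before(sessions, target="close_account", length=3):
    """Count the `length` actions that immediately precede `target`."""
    counts = Counter()
    for _, actions in sessions:
        if target in actions:
            idx = actions.index(target)
            path = tuple(actions[max(0, idx - length):idx])
            if path:
                counts[path] += 1
    return counts


# Hypothetical click events: (customer, seconds since start, action)
events = [
    ("c1", 0, "login"), ("c1", 60, "view_fees"),
    ("c1", 120, "contact_support"), ("c1", 180, "close_account"),
    ("c2", 0, "login"), ("c2", 50, "view_fees"),
    ("c2", 90, "contact_support"), ("c2", 200, "close_account"),
    ("c3", 0, "login"), ("c3", 30, "view_balance"),
    ("c1", 10000, "login"),  # far enough apart to start a new session
]

top_paths = paths_before(sessionize(events)).most_common(10)
```

In a workflow like the one Dinning describes, a ranked list such as `top_paths` is the kind of output that would be handed to a separate predictive modeling tool.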

Transforming data into insight

Dan Garrett, U.S. health IT practice leader at PwC, says, "The industry has digitized a lot of medical data, and all the information is getting to a volume where people want to produce actionable information that creates value." Garrett encourages organizations to think in terms of "intelligence at the moment," which PwC defines as the process of turning data into insight and delivering it when and where needed for better decision-making. In addition, institutions should modify their information paradigm to develop an integrated record that provides a longitudinal patient view.

Increasingly, such analyses are the result of data integrated from multiple sources, as enterprises are now beginning to deploy Hadoop and MapReduce together with their central data warehouse environments, according to Teradata's Dinning. Teradata's position is that the most useful results are produced by analyzing all of the available data, integrated in a unified data architecture.

With Hadoop, raw data is loaded directly to low-cost commodity servers one time, and only the higher-value, refined results are passed to other systems (increasingly, large data warehouses), where complex queries can quickly deliver integrated insights to analysts.
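The "raw in, refined out" pattern can be illustrated with a small map and reduce step. The sketch below runs in plain Python rather than on Hadoop. The log layout (timestamp, customer, page) and the sample lines are assumptions for the example, not a format described by Teradata.

```python
from collections import defaultdict


def map_phase(raw_lines):
    """Emit (page, 1) for each well-formed raw log line; skip malformed ones."""
    for line in raw_lines:
        parts = line.split()
        if len(parts) == 3:  # assumed layout: timestamp customer page
            yield parts[2], 1


def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical raw click log, loaded once and left unrefined
raw_lines = [
    "1700000000 c1 /fees",
    "1700000005 c2 /fees",
    "garbled",
    "1700000009 c1 /close",
]

# Only this small, refined summary would be passed on to the warehouse
refined = reduce_phase(map_phase(raw_lines))
```

The point of the design is that the bulky raw lines stay on the low-cost cluster, while the compact `refined` result is what a downstream warehouse would need to store and query.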

"By integrating data, structured and unstructured, from other sources such as Hadoop in an enterprise data warehouse, analysts can have much more complete visibility at a detailed level, plus the power to manage complex queries quickly," Dinning says. "It is beneficial when the data from all sources is combined for analysis, because a more complete and clear analytic picture emerges."

Ultimately, one of the most valuable aspects of big data is its ability to allow integration and interpretation of vast amounts of data in a systemic fashion. "Product development teams in pharma and other health industry sectors are trying to aggregate and understand patient behavior," explains Garrett, "so that the information can guide research and commercialization."

In addition, the ability to have a longitudinal view of the patient as digital data accumulates will allow improved care for that patient and also for others. "Looking at social and family history, treatments and outcomes will allow professionals to make recommendations about what works or does not work for similar individuals," Garrett says.

New storage paradigm for big data

A redundant array of independent disks (RAID) has been the de facto standard for data storage for the last 30 years. Despite the decreasing costs of hard drive storage, however, when petabytes are the benchmark, costs can still soar, especially when backup copies are made. Moreover, restoring data after a storage site fails can take a very long time. Cleversafe uses a technology called information dispersal, which performs mathematical calculations that transform the data so that it can be reconstructed if disk drives, servers or even sites fail.

"The data is virtualized, encrypted and dispersed through this transformation," says Russ Kennedy, VP of product strategy and customer solutions at Cleversafe, "and if one location is compromised, there is no security breach or impact on business operations." From the viewpoint of analytics tools, the data appears to reside on a standard cluster of servers and can be analyzed by Hadoop's MapReduce software or other software. "For big data customers, this technology picks up where traditional storage has reached its limits," says Kennedy.
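The general idea of dispersal (split the data into slices, add redundancy, rebuild from the surviving slices) can be shown with a deliberately simple stand-in. The sketch below uses a single XOR parity slice, which tolerates the loss of only one slice and performs no encryption. It is not Cleversafe's algorithm, and the function names and slice count are assumptions for illustration.

```python
def disperse(data: bytes, k: int = 3):
    """Split data into k equal slices plus one XOR parity slice.

    Returns the k + 1 slices and the original length (needed to strip padding).
    """
    pad = (-len(data)) % k
    padded = data + bytes(pad)
    size = len(padded) // k
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for s in slices:
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return slices + [parity], len(data)


def reconstruct(slices, original_len):
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one lost slice")
    if missing:
        size = len(next(s for s in slices if s is not None))
        rebuilt = bytes(size)
        for s in slices:
            if s is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, s))
        slices = list(slices)
        slices[missing[0]] = rebuilt  # XOR of the survivors equals the lost slice
    return b"".join(slices[:-1])[:original_len]


# Each slice could live at a different site; here one site is lost
stored, n = disperse(b"patient record 42")
stored[1] = None
recovered = reconstruct(stored, n)
```

Production dispersal schemes typically rely on stronger erasure codes so that several slices can be lost at once. The XOR version only shows why no single location has to hold a complete copy of the data.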
