

In the realm of big data


Companies that develop consumer intelligence programs can acquire immense amounts of data, and require powerful analytics solutions to make the best use of that data. Rewards programs, for example, have proliferated as retailers and other merchants work to develop loyalty from their customers. They have proven to be popular with both merchants and consumers, but effective matching of consumer groups to the appropriate offer is critical.

One problem with rewards programs from the consumer viewpoint is that they are often cumbersome, requiring consumers to download and print coupons or wait for delivery of gift cards. Cardlytics offers a program for banks in which consumers receive reward offers while they are logged into their bank accounts. The consumer can accept the offer while online, then make a qualifying purchase online or in a store. At the end of the month, the discount is credited to the account.

Cardlytics identifies merchants who may want to participate and matches them with major financial institutions. All bank customer data remains behind the bank's firewall; Cardlytics' role is to provide market intelligence. The company started with a pilot group of 10,000 consumers from which it derived marketing information, then expanded dramatically and now taps into a base of 70 million households. As a result of that rapid growth, Cardlytics found that its existing analytics platform could no longer keep up with the queries that identify the most relevant offers for consumers and track opportunities for retailers.

Faster queries

After exploring available options for managing a volume of data that placed it squarely in the big data realm, Cardlytics selected the HP Vertica analytics platform. "We had tried to deal with the data by partitioning, but there were limits in what we could achieve," says Doug Harmon, director of data innovation at Cardlytics. "HP Vertica allowed us to do a proof of concept using their software, and we found that it worked for us."

Cardlytics began by warehousing its operational data in HP Vertica, and will be building data marts in Vertica in the near future. Because the query function operates so much more rapidly, Cardlytics has been able to identify 10 times as many targeted consumers in the same amount of time. Previously, a query might take from 20 minutes to 20 hours to complete, whereas with HP Vertica, the response time is about five minutes.

The pricing model's predictability was also seen as beneficial. "The price depends on the amount of data we are analyzing," says Harmon. "If we want to increase the speed further, we can add more servers, but there is no additional cost for the software."

New insights

The analytical process has produced more insights than were previously possible. "We can test hypotheses and analyze them quickly," Harmon says, "then come up with new ideas and test them." Cardlytics also plans to install an enterprise-level business intelligence (BI) solution on top of HP Vertica to make use of a more user-friendly interface and visualization capabilities.

HP Vertica is complementary to other big data technologies such as Hadoop and traditional BI tools. "Many legacy data warehouses are slowing down because they are stretched to capacity and their reporting is inflexible," says Chris Selland, VP of marketing at HP Vertica. "Now, organizations are moving into a 'conversational relationship' with the data, in response to the need to sift through massive amounts of data."

At the core of HP Vertica's platform is a columnar database, which stores the values of each column together rather than storing each row as a unit, as traditional databases do. Because an analytic query typically reads only a few columns across many records, that layout offers more flexibility and more speed for analytics. "We were an early entrant in the market, and the platform was purposely built for analytics," says Selland. "We can often improve speed by 50 to 1,000 times, sometimes more."
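The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how HP Vertica is implemented; the sample purchase records, field names and helper functions are hypothetical.

```python
# Illustrative only: a toy contrast between row-oriented and
# column-oriented storage. All data and names here are made up.

# Row store: each record is kept together as a unit.
rows = [
    {"household": 1, "merchant": "grocer", "amount": 42.50},
    {"household": 2, "merchant": "cafe", "amount": 6.25},
    {"household": 3, "merchant": "grocer", "amount": 18.00},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column store: all values of one field are kept together.
columns = to_columns(rows)


def total_from_rows(records):
    """Touches every record (and so every field) to read one field."""
    return sum(record["amount"] for record in records)


def total_from_columns(cols):
    """Reads only the single column the query needs."""
    return sum(cols["amount"])
```

Both functions return the same total, but the column version never touches the household or merchant values. On tables with many columns and many millions of rows, skipping the unneeded data is one reason analytic queries can run faster on a columnar layout.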

Academic achievement

Economic and social stability hinges on having an educated, employable workforce, so increasing the high school graduation rate is a top priority in the educational community. Unemployment rates are highest and incomes lowest for individuals who have not completed high school, in comparison to other educational groups. One school that had a graduation rate of 25 percent in 2005 raised that rate to 80 percent by 2012 as a result of a continuous process of targeted interventions designed to address the needs of the students most at risk. The program relies on analysis of a large and growing body of data that tracks many aspects of the students' educational history as well as teacher performance.

Standardized test results for students in Hamilton County, Tenn., which includes the city of Chattanooga and surrounding suburbs, were well below state benchmarks and the school was "on notice." Dr. Kirk Kelly, director of accountability and testing in the county's Department of Education, began an intensive analytics program.

"We looked at dropouts and discovered that about 70 percent were overage for their grade as compared to their peer group," says Kelly. "We then looked back longitudinally and found that kindergarten, first grade and sixth grade were points at which students were typically held back. We realized that to address the issue effectively, we had to start looking at much earlier interventions."

The Department of Accountability and Testing built a model that included core factors such as attendance, behavior, academics and testing. "We can change and update this model," Kelly says. "Missing data can be filled in as we obtain it for students so that we can build predictors for those students. The database has a constant stream of information coming in."
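To make the idea of a model that tolerates missing data concrete, here is a minimal Python sketch. It is not the county's model, which the article does not describe in detail; the `StudentRecord` class, the 0-to-1 scale for each factor and the simple averaging rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StudentRecord:
    """Hypothetical record: each factor is scored 0.0 (worst) to 1.0 (best).

    None means the value has not been obtained yet.
    """
    attendance: Optional[float] = None
    behavior: Optional[float] = None
    academics: Optional[float] = None
    testing: Optional[float] = None


def risk_score(record: StudentRecord) -> Optional[float]:
    """Assumed rule: 1 minus the mean of the factors that are known.

    Returns None when no factor is available at all.
    """
    known = [
        value
        for value in (record.attendance, record.behavior,
                      record.academics, record.testing)
        if value is not None
    ]
    if not known:
        return None
    return 1.0 - sum(known) / len(known)


# A student with only two factors on file so far.
partial = StudentRecord(attendance=0.5, academics=0.75)

# A test result arrives later; the record is updated and rescored.
updated = replace(partial, testing=0.25)
```

Scoring `partial` and then `updated` shows how a prediction can be produced from incomplete data and refined as new information arrives, which is the behavior Kelly describes.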

More timely data, better results

IBM's SPSS is being used for retroactive and predictive analysis. The analysis starts each night at midnight, and results are ready by 5:30 a.m. the next day. The data is much more current than is often the case in a data warehouse. "We can stay up to date on an individual basis," Kelly explains. "We even know if there is a pattern to the days on which a student is absent, for example."
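The kind of absence pattern Kelly mentions can be found with a simple tally by weekday. The Python sketch below is an illustration only, not the SPSS analysis the county runs; the function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def absences_by_weekday(absence_dates):
    """Count how many absences fall on each day of the week."""
    return Counter(WEEKDAYS[d.weekday()] for d in absence_dates)


# Hypothetical absence dates for one student.
sample_absences = [
    date(2012, 10, 1),
    date(2012, 10, 8),
    date(2012, 10, 15),
    date(2012, 10, 17),
]

counts = absences_by_weekday(sample_absences)
most_common_day, most_common_count = counts.most_common(1)[0]
```

For this made-up student, three of the four absences fall on a Monday, the sort of signal a counselor might follow up on.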

Although understanding the factors that affect performance and dropout rates depends on technology, the interventions are very much human-based. Once at-risk students are identified, teachers and counselors attempt to build relationships with those students. Because contact with the families of such students is often unreliable when the families are transient, counselors may reach out using the social media that students are likely to use, mainly Facebook.

The combination of better identification of at-risk students and more proactive intervention had the desired effect. Now, the county is off the notice list, and the average graduation rate is an impressive 80 percent. In addition, a project to track staff development was launched, using another IBM analytic tool, Cognos. "In this initiative we were trying to determine which staff development interventions were most effective," Kelly explains. "We have measured 20,000 so far and are working toward 30,000." Those may be anything from a lesson demonstration from an experienced teacher to a day of side-by-side teaching.
