TEXT ANALYTICS gains clout to capture insights from the data maze
Text analytics continues to be a growth area, driven by the desire of organizations to extract value from the rapidly increasing volume of unstructured information. According to Mordor Intelligence, the market will grow 17 percent per year from 2018 through 2023, from about $4 billion to $10 billion. The broad applicability of text analytics to research, customer engagement and numerous other situations will sustain that growth.
“Text analytics is a foundational technology for knowledge management,” says Tom Reamy, chief knowledge architect at the KAPS Group. “It is closely tied to applications such as search, e-discovery, business intelligence and taxonomy.”
The Center for Drug Evaluation and Research (CDER) in the U.S. Food and Drug Administration (FDA) is responsible for ensuring that drugs marketed in the United States are safe and effective. It regulates prescription drugs and over-the-counter drugs, a category that includes not only medications but also products such as toothpaste and antiperspirants. CDER evaluates the results of clinical trials to determine whether drugs should be allowed on the market, and it monitors them post-market, including tracking adverse events. Post-market adverse event reports come from numerous sources, including physicians, pharmacists and public health supervisors.
Pulling the full story together
The post-market data includes both structured and unstructured information. The unstructured data is typical of big data in its volume, velocity and variety. “To build the full story, we had to use both structured and unstructured information,” says Qais Hatim, computer scientist in the Office of Computational Science, part of the Office of Translational Sciences in CDER, “but the amount of information we receive is beyond what a human team would be able to interpret.” Both statistical modeling and text analytics solutions are being used to find patterns and associations in the large volumes of data so that it can be interpreted quickly.
Hatim uses statistical analysis tools to carry out analyses that allow researchers to test models. “I built a variety of models to reflect major adverse events occurring. If one event is dominating or there is an association between events, text mining extracts key items among these adverse events,” Hatim explains. “It also presents a variety of visualizations to clarify the insights that result from the analyses.” He worked with domain experts in fields such as medical surveillance when building the models, so that they reflect real-world problems rather than purely statistical constructs.
“Discovering relationships between multiple adverse events required building ‘market basket’ models (association rules) to discover adverse events that are associated with other adverse events, cluster them and assign informative names for such associations,” Hatim explains. “These rules can be either trivial, inexplicable or actionable. By working with medical domain experts, we were able to focus on actionable rules.”
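For readers unfamiliar with market basket analysis, the idea can be sketched in a few lines of Python. This is a minimal illustration, not CDER's actual method or tooling: the reports, event names, thresholds and the pair_rules function are all hypothetical, and only rules between pairs of events are considered. Each rule "A -> B" is scored by support (how often A and B appear together), confidence (how often B appears when A does) and lift (how much more often than chance).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the adverse events named in one report.
reports = [
    {"nausea", "headache"},
    {"nausea", "headache", "rash"},
    {"nausea", "dizziness"},
    {"headache", "rash"},
    {"nausea", "headache"},
]

def pair_rules(reports, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence, lift) rules, highest lift first."""
    n = len(reports)
    # How many reports mention each single event.
    item_counts = Counter(event for report in reports for event in report)
    # How many reports mention each pair of events together.
    pair_counts = Counter(
        pair for report in reports for pair in combinations(sorted(report), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be worth reporting
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)

for lhs, rhs, support, confidence, lift in pair_rules(reports):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests two events co-occur more often than chance would predict, while a lift near or below 1 points to the kind of trivial rule Hatim describes. Deciding which of the surviving rules are actionable remains a job for domain experts.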
A major goal of conducting the analyses is to be able to predict adverse events before they happen. “A powerful analysis can be conducted by linking the clinical trials data with post-market data. The next step in my research will be to link the pre-market data with post-market data in our models so that we can have a full understanding of the adverse events based on complete data,” Hatim adds.