
Understanding What Matters With Text Analytics and NLP


Context and semantics

Aasman identified four levels of textual understanding: phonetics, syntax, semantics (or context), and pragmatics. Some of the numerous methods for facilitating semantic understanding with NLP include the following:

Glossaries or databases: These tools involve resources defined by sources internal or external to the enterprise to “figure out the meanings of words,” Aasman said. “And, assuming that at some point you can find out the meaning, there are systems for that like WordNet or dictionaries and thesauri.” According to Shankar, even the grammar checker in Word is “bordering on NLP because it is trying to understand the context of what you’re trying to do and is actually proposing options.” (See the first sketch after this list.)

Learning models: Machine learning models can supersede the conventional approach of gleaning context based on keywords. Pairing language models with transfer learning approaches results in a “massive, pre-trained understanding of context that enables us to take training data a customer gives us in a particular vernacular of their use case, and apply this pragmatic understanding of the language to make sure we’re getting the meaning correct,” Wilde said. Thus, organizations are tailoring an enormous deep learning model, which, Wilde observed, contains “billions of byte-pair encodings that are at the sub-word level, even down to the text level, that automatically generate billions of these inferences and vectors that define meaning in a much more complex, robust way than a human rule ever could.” (See the second sketch after this list.)

Knowledgebases: Both contextual and pragmatic understanding of text are supported by formal knowledgebases. Directly addressing AI’s symbolic reasoning side and underpinned by taxonomies, such repositories “consist of symbols of concepts,” Mishra remarked. “It’s about: what are the different personas for users, what are the financial terms, what are the metrics people generally talk about, what is Sunday, what is Monday, what is a weekday? We keep all of that as concepts. What is rank, what is market share? We keep that as concepts.” Applying these concepts to text analytics reinforces contextual understanding. (See the third sketch after this list.)
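
To make the dictionary approach concrete, here is a minimal lookup sketch. It assumes the NLTK library with its WordNet corpus downloaded (the speakers name WordNet, but no particular toolkit); enumerating a word's senses is the easy part, while picking the right sense for a given context is the harder disambiguation step.

```python
# A minimal sketch of dictionary-backed lookup, assuming NLTK with its
# WordNet corpus installed (pip install nltk; then nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def word_senses(word: str) -> list[str]:
    """Return the gloss (definition) of every WordNet sense of `word`."""
    return [synset.definition() for synset in wn.synsets(word)]

# "bank" is ambiguous; WordNet enumerates its financial, river, and other senses.
for gloss in word_senses("bank"):
    print(gloss)
```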
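
The byte-pair encodings Wilde describes are easy to inspect. The second sketch assumes the Hugging Face transformers library, whose GPT-2 tokenizer is byte-pair based: a word the model never saw whole is split into sub-word pieces it already has pre-trained vectors for.

```python
# A minimal sketch of byte-pair (sub-word) tokenization, assuming the
# Hugging Face transformers library; the tokenizer downloads on first use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 uses byte-pair encoding

# Rare or domain-specific words decompose into familiar sub-word pieces,
# so the model can still assemble a meaning for them.
print(tokenizer.tokenize("The underwriter reinsured the annuity."))
# e.g. ['The', 'Ġunder', 'writer', ...] -- 'Ġ' marks a token that begins with a space
```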
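
Mishra's concept repositories can be pictured, very roughly, as a mapping from surface forms to concepts. The toy taxonomy in this third sketch is invented for illustration and stands in for what would, in practice, be a formal, taxonomy-backed knowledgebase.

```python
# A minimal sketch of concept tagging; the taxonomy is invented for illustration.
CONCEPTS = {
    "sunday": "weekend_day",
    "monday": "weekday",
    "rank": "metric",
    "market share": "metric",
}

def tag_concepts(text: str) -> list[tuple[str, str]]:
    """Return (surface form, concept) pairs found in the text."""
    lowered = text.lower()
    return [(term, concept) for term, concept in CONCEPTS.items() if term in lowered]

print(tag_concepts("What was our market share rank on Monday?"))
# [('monday', 'weekday'), ('rank', 'metric'), ('market share', 'metric')]
```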

Classification and extraction

Classification and extraction are among the most common text analytics actions once NLP is employed to parse and understand unstructured data, and both correlate directly with the business value text analytics engenders. Often, entities defined during the preceding processes (whether implemented with taxonomies, knowledgebases, or machine learning models) are extracted and serve as the focus of analysis. With sentiment analysis, for example, organizations can scrutinize product reviews to assess market reaction to products and services. Entity extraction can assist this process by enabling them to focus on specific features, competitor comparisons, and other relevant concepts in text. Organizations can then “aggregate that with the information about the product they’re referencing and pass it back to the product teams that are trying to understand and make improvements,” Shankar said.
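
As a rough illustration of pairing sentiment analysis with entity extraction, the sketch below assumes the Hugging Face transformers library and its default pipeline models (downloaded on first use); the review text is invented.

```python
# A minimal sketch of sentiment analysis plus entity extraction, assuming
# the Hugging Face transformers library and its default pipeline models.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
ner = pipeline("ner", aggregation_strategy="simple")  # merge sub-word entity pieces

review = "The Acme X200 is great, but its battery dies faster than the Nikon's."

print(sentiment(review))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
print(ner(review))        # entities such as 'Acme X200' and 'Nikon' to aggregate by product
```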

Many document workflows hinge on classification and extraction. Wilde detailed an email use case germane to healthcare, financial services, and insurance in which text analytics classifies attached documents according to type before extracting information required downstream. “Now we know what kind of document it is; it’s an application, let’s pretend,” Wilde said. “On the application there may be 20 or 30 pieces of information I need to lift off that application to put in my CRM system and my underwriting system. That’s extraction.”
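
Wilde's classify-then-extract pattern can be sketched in a few lines. The keyword classifier and field patterns below are illustrative stand-ins for the trained models and extraction logic a production system would use.

```python
# A minimal sketch of classify-then-extract; patterns are illustrative only.
import re

def classify(text: str) -> str:
    """Crude document-type classifier; real systems use trained models."""
    return "application" if "application" in text.lower() else "other"

FIELD_PATTERNS = {
    "applicant_name": re.compile(r"Name:\s*(.+)"),
    "policy_amount": re.compile(r"Amount:\s*\$?([\d,]+)"),
}

def extract(text: str) -> dict[str, str]:
    """Lift the fields a downstream CRM or underwriting system needs."""
    return {
        name: match.group(1).strip()
        for name, pattern in FIELD_PATTERNS.items()
        if (match := pattern.search(text))
    }

doc = "Insurance Application\nName: Jane Doe\nAmount: $250,000"
if classify(doc) == "application":
    print(extract(doc))  # {'applicant_name': 'Jane Doe', 'policy_amount': '250,000'}
```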

Structuring the unstructured

There’s no limit to the business value that can be derived from parsing, appropriately contextualizing, classifying, and extracting text with NLP. This blueprint applies to speech recognition systems for real-time customer support and recommendations, as well as to summarizing mounds of reports for financial analysts via natural language generation (NLG). In these situations, text analytics’ chief value proposition is translating unstructured data into structured data that computers understand and enterprises act on. “You need to convert from this unstructured world to the structured SQL and pass it to the analytics,” Shankar said. “We need that interpreter in between to go from a text or voice analytics to the backend analytics.”

That interpreter, of course, is NLP.
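
To picture that interpreter step, here is a minimal sketch, assuming SQLite as the structured backend; the table, columns, and parsed values are invented for illustration.

```python
# A minimal sketch of the unstructured-to-structured handoff: output parsed
# from text becomes a row ordinary SQL analytics can query. Schema is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, sentiment TEXT, score REAL)")

# Pretend these fields came out of the NLP pipeline upstream.
parsed = {"product": "Acme X200", "sentiment": "NEGATIVE", "score": 0.91}

conn.execute(
    "INSERT INTO reviews (product, sentiment, score) VALUES (?, ?, ?)",
    (parsed["product"], parsed["sentiment"], parsed["score"]),
)

# The backend analytics now run over structured rows, not free text.
for row in conn.execute("SELECT product, sentiment, score FROM reviews"):
    print(row)  # ('Acme X200', 'NEGATIVE', 0.91)
```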
